There is a lot of noise around machine learning, but what are companies actually doing? What kind of insights are they really getting, and what are they planning to work towards in the future? For most companies, their press about machine learning talks a good game but is light on details, leaving investors and competitors to wonder what is really going on.
I’ve spent the last 6 months working on this question by gathering data science and analytics job postings from over 200 companies in eComm/retail, gaming, finance, insurance, healthcare, manufacturing, and professional services (big 5 consulting, Microsoft, Oracle, Teradata, Lumen Data, etc.). The job postings offer a revealing look at how these businesses staff for their current and planned projects. By analyzing the requested skill sets and, in many cases, the detailed project descriptions within the postings, I have identified several trends within industries. I am keeping it high level for brevity.
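For readers who want to try a similar tally on their own set of postings, the skill-counting step could be sketched roughly as follows. This is an illustration only, not the tooling used for this study: the keyword list, the function name `count_skill_mentions`, and the sample postings are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword list; the terms actually tracked in the study are not given.
SKILLS = ["Spark", "Python", "R", "Tableau", "AWS", "SQL"]


def count_skill_mentions(postings, skills=SKILLS):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for text in postings:
        for skill in skills:
            # Require non-alphanumeric characters (or string edges) on both
            # sides so that "R" does not match inside "React" or "Spark".
            pattern = r"(?<![A-Za-z0-9])" + re.escape(skill) + r"(?![A-Za-z0-9])"
            # Single-letter names such as "R" are matched case-sensitively;
            # longer names are matched case-insensitively.
            flags = re.IGNORECASE if len(skill) > 1 else 0
            if re.search(pattern, text, flags):
                counts[skill] += 1
    return counts


# Made-up postings, purely for illustration.
sample_postings = [
    "Data scientist: Python, Spark, and SQL required. AWS is a nice to have.",
    "Analyst with R or Python experience; Tableau dashboards.",
    "Engineer to build Spark pipelines on AWS.",
]

mentions = count_skill_mentions(sample_postings)
for skill, n in mentions.most_common():
    print(skill, n)
```

Counting each posting once per skill, rather than counting every occurrence, keeps one keyword-stuffed posting from skewing the totals. Simple keyword matching like this will still miss synonyms and misspellings, so the counts are best read as rough indicators.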
Across the Board Trends – What Tools Are Companies Using?
Spark is everywhere.
Python is by far the most mentioned statistical language with R close behind it.
Natural Language Processing is being used across the board with some custom development but mostly leveraging open source tools.
There is no clear leader in the NoSQL space.
While unstructured data is here to stay, the relational database is far from dead.
Companies say “Tableau” when they talk about visualization more than any other platform.
Amazon Web Services is gaining traction. AWS is often mentioned as a nice to have skill.
Across the Board Trends – What Insights Are Most Looked For?
Customer behavior is #1 except in manufacturing. Companies want to know what makes their customers tick. The push for ever finer granularity in customer segmentation is a significant trend across job descriptions. This suggests that companies want to interact with their customers on a one-to-one basis and are using analytics to accomplish that goal.
Marketing performance and optimization is a close second. What image will make this customer most likely to click on an ad? How should the site display change based on how the customer came to our site? These and many more are the types of questions marketing departments are looking to answer with analytics. It appears that data-driven marketing has gained wide acceptance and that marketing departments across industries have embraced the trend.
Gaming is an interesting case study because the industry already seems to have the analytics capabilities it needs. The product management job descriptions from EA call for heavy data analytics skills, describing something close to a data scientist without the coding background. This is true at several smaller gaming companies as well. It suggests an advanced game analytics infrastructure is already in place.
What are they learning? After talking with some people inside EA, I learned that they are building data-driven in-game economies. Their freemium games’ in-game purchases are almost completely data driven, and EA is not alone in its heavy reliance on analytics. Online gambling sites’ customer loyalty systems are extremely advanced in their use of analytics for player acquisition and retention. I have worked with three of them now, so I know that from an insider’s perspective. Aside from social media companies like Facebook, LinkedIn, and Twitter, and outliers like Trulia, Apple, and Google, it looks like the most advanced uses of customer analytics are deployed in gaming companies.
Retail and eComm
Size matters in this space. Judging by their data scientist job requirements, large companies like Macy’s and Best Buy have sophisticated analytics capabilities. Smaller players are just getting started. Many smaller retailers have generic requirements and phrases like “you know what data can do for retail and you’re ready to put analytics to work driving revenue.”
Recommender systems, demand forecasting, and real-time pricing are the three most frequently mentioned insights. From the job descriptions, it looks like retailers are after the same insights no matter their size. However, larger retailers are further ahead in understanding what it takes to get those insights.
Online retailers like Amazon are significantly ahead of the curve, so you would think their job descriptions would be complex, but they are not. They are very similar to smaller companies’ descriptions but without the vague applications. Their interview questions are focused on applied statistics and software engineering. They are very tight-lipped about the algorithms in use. This class of business has a holistic understanding of analytics. They control the information they release and thoroughly analyze the data available to them. What is Amazon learning from analytics? It is safe to assume the answer is a lot more than we really know, and a lot more than their competitors.
Finance is another interesting case study. Investing and banking have used analytics for a very long time, and the new tools have allowed the industry to expand those efforts significantly. Uses of analytics for establishing creditworthiness, fraud detection, and investment strategy are well publicized and are supported by the qualifications listed in their job postings.
What is interesting is how far they are taking fraud detection. JP Morgan Chase is hiring data scientists for a project to detect patterns of securities fraud by combing through not only trading patterns but also individuals’ communications. Phone messages, text, email, IM, and many other channels are being run through algorithms to detect patterns of fraud and the intent to defraud. They are looking not just to detect fraud forensically but to detect it proactively, before it happens. It is a highly sophisticated project, which suggests that the industry’s analytics capabilities are very mature.
There is one trend worth noting from my study of insurance data science job postings. The industry is adopting a personalized approach to risk assessment. Their job descriptions suggest they are working not only on marketing personalization but also on increasing granularity in their ability to predict each customer’s specific risk profile. It will be interesting to see how individual agents leverage this data, and it has clear implications for building personalized rates and policies. The concept of personalization has applications beyond marketing. Insurance looks to be creating the infrastructure for personalized pricing and products. That trend will likely spread to other industries if it is successful at increasing revenue and loyalty.
The uses of patient data to improve care and diagnosis accuracy are well publicized in healthcare. The interesting data point is the divide between data scientist hiring at small and large hospitals. Small hospitals are hiring analysts familiar with third-party analytics software and relying on that software to overcome the costs of data science, while larger hospitals are hiring teams of data scientists. If the third-party approach proves successful in reducing the costs of analytics, it will likely spread to other industries.
The diversity of requirements for data scientists in the manufacturing industry was a big surprise to me. It has been one of the slower industries to adopt analytics, but companies like GE are teaching the industry just how much it can gain from data science. One job description I came across described a project to build real-time automated machine control systems using advanced deep learning. Two simpler applications showed up frequently in job descriptions: machine control systems, and failure detection systems that predict a failure before it happens so maintenance can be done during off hours.
Data-driven supplier discovery has been a big trend for the last two years, and it looks like a growing number of manufacturers are using analytics systems to build their supply chain. Data analyst job descriptions asked for familiarity with a few different third-party supplier discovery systems. Just-in-time inventory is another long-time trend that analytics is being used to optimize. Here again, knowledge of third-party systems is in demand.
The volume of hiring at consulting companies like Deloitte, Accenture, Infosys, Tech Mahindra, and others suggests that the outsourcing of “low level” data scientists has already begun at large US companies. Companies like mine are trickle-down beneficiaries of this trend, as Corp2Corp hiring is an easy way to pick up several qualified data scientists quickly. Software providers like Microsoft and Teradata are hiring data scientists to build products that automate the “low level” data science work, putting machine learning within reach of small and mid-sized businesses.
Professional services companies are also hiring data scientists to help with their own sales and marketing efforts. The job and project descriptions suggest that these companies are using analytics to better understand their clients’ decision process in relation to buying professional services and business software products. It looks like analytics are being used by sales staff to overcome the complexities of the business to business sales process as well as to improve the overall customer experience. Marketing is being customized by company, role and influence over the decision process using a combination of custom and 3rd party applications.
I really expected there to be more hype and less substance. I thought I would be calling businesses out for talking big while delivering small. That has not been the case. Most companies are ramping up for what they say they are working on. There were a couple of exceptions, but the trend seems to be that the hype is real. Based on surveys from GE, Accenture and many others, the vast majority of businesses are satisfied with the results of their machine learning initiatives and plan to continue to grow their capabilities.
I thought there would be a big divide between data haves and have-nots. It looks like professional services companies are stepping in quickly to fill the gap. The big capabilities gap is really between companies like Amazon, Apple, Facebook, LinkedIn, and Google and the rest of the business world. These analytics titans have taken on the toughest challenges of data science and are reaping the rewards. According to a group of recruiters I spoke with who are working to fill positions there, Apple rejects all but the top 1% of data science candidates. The scale at these businesses is truly beyond comprehension except by the most skilled data scientists. I would really like to know what they’re doing.