Machine Learning’s Big Pitfalls
There are many mistakes a business can make on its path to implementing data science and machine learning. Here are the most common.
Vin Vashishta | Originally Published: September 27th, 2014
When I wrote this, we were still calling it Big Data. Businesses were getting into big data and the first wave of data science was getting started. I saw firsthand the positive impact machine learning initiatives made on businesses. The cost savings, revenue streams, and competitive advantages are well known now. The pitfalls were and still are not. Here are a few ways I have seen data initiatives go wrong.
This is a common pitfall of any change: taking the old way of thinking and applying it to new tools. Data science drives new products and prescriptive analytics. Where BI and analytics were able to tell a business that 80% of all customers… or 40% of all employees…, data science can be much more specific and granular. It reveals insights such as: customer Ryan A. has a 52% likelihood of making a second purchase in the next 3 months and a 91% likelihood if we send him a special offer of free shipping. To realize the potential of machine learning, the business needs to raise its expectations.
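To make the step from a per-customer prediction to a prescriptive decision concrete, here is a minimal sketch. The two probabilities are the ones from the Ryan A. example; the function name, the profit margin, and the cost of free shipping are hypothetical values chosen only for illustration.

```python
def expected_gain_from_offer(p_base, p_offer, margin, offer_cost):
    """Expected profit change from sending the offer to one customer.

    p_base     -- purchase probability with no offer (model output)
    p_offer    -- purchase probability if the offer is sent (model output)
    margin     -- hypothetical profit on the purchase, before the offer
    offer_cost -- hypothetical cost of the offer, paid only if they buy
    """
    profit_without = p_base * margin
    profit_with = p_offer * (margin - offer_cost)
    return profit_with - profit_without


# Probabilities from the Ryan A. example; the dollar amounts are made up.
gain = expected_gain_from_offer(p_base=0.52, p_offer=0.91,
                                margin=20.0, offer_cost=5.0)
recommendation = "send offer" if gain > 0 else "do not send offer"
print(f"expected gain per customer: {gain:.2f} -> {recommendation}")
```

The point of the sketch is that the output is an action for one named customer, not a summary statistic that someone still has to interpret.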
Machine learning for decision support shows a clear course of action. Analytics typically requires significant interpretation to determine a plan of action. Analytics packaged as machine learning often leads to paralysis by analysis and conflicting conclusions. Incomplete analysis should not be tolerated.
Machine learning insights reach conclusions about causality while analytics focuses on correlation. When the two get confused in a presentation, it leads to poor decision making. Google used a correlational model for flu predictions. It worked in the short term but failed publicly and catastrophically in the long term. Fortunately, no one was taking any actions based on the model, but businesses often use correlational models to inform business strategy decisions, with erratic results.
When I see data point correlation, I use this example to show why it is a logic trap. Over the last 200 years, as the number of pirates has decreased, global temperatures have increased.
Based on these two data points, should we be fighting global warming by increasing the number of pirates worldwide? On its face, that is ridiculous because we have prior knowledge telling us this conclusion should be dismissed. What if the two data points were the number of products on a web page and the average sale amount? Those two sound plausibly linked when shown increasing on a graph together. The graph presents no more solid proof than pirates and global temperatures.
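The trap is easy to reproduce. The sketch below builds two synthetic series that each depend only on time plus their own random noise, so neither can influence the other. All numbers are invented for illustration and are not real pirate or temperature data. The raw series still show a strong correlation because both follow a trend, while the year-over-year changes show almost none.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def differences(values):
    """Year-over-year changes, which removes a steady trend."""
    return [b - a for a, b in zip(values, values[1:])]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = range(200)

# Synthetic, made-up series: each depends only on time plus its own noise.
pirates = [1000 - 4 * t + rng.gauss(0, 20) for t in years]
temperature = [14 + 0.005 * t + rng.gauss(0, 0.1) for t in years]

r_levels = pearson(pirates, temperature)
r_changes = pearson(differences(pirates), differences(temperature))

print(f"correlation of raw series:     {r_levels:.2f}")
print(f"correlation of yearly changes: {r_changes:.2f}")
```

Any two metrics that both drift over time, such as products per page and average sale amount, can produce the same pattern.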
What correlation shows is cause for a hypothesis and justification for an experiment. Experimentation is a key tactic of machine learning strategy. It allows us to establish a causal relationship between multiple variables. That is why we say machine learning reveals deep insights. It reveals why something is happening rather than telling us something is happening and leaving the rest to our interpretation. Again, the business needs to raise its expectations to realize the potential of machine learning.
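As one illustration of what such an experiment can look like, the sketch below compares purchase rates between two randomly assigned customer groups using a standard two-proportion z-test. This is a common textbook technique, not necessarily the method used in the engagements described here, and the counts are hypothetical. Random assignment is the step that allows a difference to be attributed to the offer rather than to some other factor.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: customers were randomly split into two groups.
z, p_value = two_proportion_z_test(conv_a=520, n_a=1000,   # no offer
                                   conv_b=600, n_b=1000)   # free shipping
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value here says the difference is unlikely to be chance. It is the randomized design, not the statistic, that supports the causal reading.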
The lesson from these stories is that initiatives need to go all in. An analytics initiative needs to stay that way even with access to larger datasets and machine learning analytical tools. A machine learning initiative needs to think in terms of complex datasets and machine learning tools. A mixture leads to failures. They also show that the business needs to expect more from machine learning. Machine learning tools and datasets should lead to prescriptive analytics.
Algorithms are theories / equations that help us make predictions under certainty. That means we know all the variables, options, probabilities, and outcomes. It is the low-hanging fruit of machine learning and so it gets done first.
As the business becomes more accustomed to data enabling decisions, the questions being asked of data become more complex. That leads to a greater number of increasingly complex algorithms. These take significant skill and infrastructure to create and then implement. They also make visualization increasingly difficult.
As a result, job descriptions for data scientists become increasingly hard to fill because they require in-depth knowledge of complex scientific and statistical principles coupled with high-end programming skills. Costs rise as hardware needs increase and the company starts to produce customized solutions to its specific business needs. This is the machine learning maturity chasm and it is a result of the law of diminishing returns.
An analytics approach has significant limitations and needs to be replaced early in the adoption of machine learning with a heuristic approach. Heuristics, simply put, are what allow us as people to recognize patterns. Heuristics in machine learning come down to the concept of model generalization. These deeper patterns, those that generalize to more than one business problem, are the big insights of machine learning.
If no one gets it, no one will use it. That is true of a lot of technology. With machine learning, complexity is inherent and that scares people away. Machine learning is pigeonholed as a marketing-only tool or as not ready for prime time because the complexity escapes from the data science group. As soon as a business user sees a differential equation, their perception of the tool changes and that is a difficult thing to undo. It slows adoption of machine learning in a lot of companies.
Uncertainty has much the same effect on business users. Not knowing what machine learning can do and what the overall strategy for machine learning is within the company makes it hard to get a handle on how machine learning will impact them specifically. It is hard to ask the right questions and propose initiatives that would benefit the organization. Goals, a machine learning strategy, and people explaining machine learning in business terms are all critical pieces to removing uncertainty.
Even groups that do not benefit from machine learning need to be included. They may not need a voice at the table, but they do need a clear understanding of what is happening. I have seen some very irrational reactions to being left in the dark about the business’s machine learning strategy and goals. Avoiding those reactions is well worth the few hours of education required.
Many machine learning pitfalls revolve around data governance. Data governance covers a range of topics, including data quality, privacy, security, and ethics.
Ignoring these issues creates hurdles the business will have to face later. Facebook has recently (this was 2016…) generated some backlash for their data experiments. Target and other retailers are dealing with the costs of customer data breaches. Google frequently deals with concerns stemming from their wide-ranging collection and use of personal data.
In the best-case scenario, poor data governance still increases the cost of machine learning. In the case of data quality issues, it can cause a business to stop trusting the data. Privacy, security, and ethical issues can cause customers to lose faith in the brand and business.
A business needs policies and processes to manage its machine learning. Collection and usage policies need to be well communicated to customers. Policies must be consistent with other customer brand experiences. Just like any other product, data needs quality testing regimes to ensure it meets the expectations of those using it. These are not complicated steps in and of themselves, but the combination of all the issues surrounding data governance usually leads to something being left out. An oversight team or program manager can prevent that pitfall.
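A quality testing regime for data can start very small. The sketch below is one possible shape for it, assuming records arrive as dictionaries. The function name, field names, and limits are hypothetical and would need to be replaced with the business's own rules.

```python
def find_quality_issues(records, required_fields, valid_ranges):
    """Return a list of (row_index, description) for each problem found."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(records):
        # Required fields must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append((i, f"missing {field}"))
        # Numeric fields must fall inside their allowed range.
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                issues.append((i, f"{field} out of range: {value}"))
        # The same customer should not appear twice.
        row_id = row.get("customer_id")
        if row_id is not None:
            if row_id in seen_ids:
                issues.append((i, f"duplicate customer_id: {row_id}"))
            seen_ids.add(row_id)
    return issues


# Made-up records; the field names and limits are illustrative only.
records = [
    {"customer_id": 1, "age": 34, "order_total": 59.90},
    {"customer_id": 2, "age": 212, "order_total": 20.00},   # impossible age
    {"customer_id": 2, "age": 41, "order_total": None},     # duplicate, missing
]
issues = find_quality_issues(
    records,
    required_fields=["customer_id", "order_total"],
    valid_ranges={"age": (0, 120), "order_total": (0, 100_000)},
)
for row_index, description in issues:
    print(f"row {row_index}: {description}")
```

Running checks like these before data reaches a model or a dashboard gives the oversight team a concrete report to act on.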
Machine learning is no longer a wild, wild west type of technology. It has matured and stabilized quickly. Trial and error is no longer necessary. There are great products and a lot of expertise available to help businesses realize the promise of machine learning in a well-managed way.
However, as with any other technology rollout, it is not problem free. Knowing what the pitfalls are allows for better planning and a smoother implementation. That is key for successful initiatives and companywide adoption.
Expectations are still lagging capabilities. Machine learning still falls short of causal inference. Decision support systems are built on poorly managed data. These are all still realities now. The solutions here were driven by my time with early adopters. Businesses can avoid a lot of pitfalls by working on a few key areas.