Why Data Science Needs Predictive Analytics but Predictive Analytics Does Not Need Data Science

I wrote this five years ago, and it is even more relevant today. Predictive analytics, and now prescriptive analytics, have moved past data science methodologies. To provide business value, companies need to take the next step.

Vin Vashishta | Originally Published: June 7th, 2015


We are firmly in the trough of disillusionment over data science. Across my clients, from startups to the Fortune 100, I am seeing a trend that is driving that disillusionment: the results of data science are often failing to meet expectations. The fall of Tesco and the pending sale of its analytics business is one of several telling case studies.

What follows a business failure is the postmortem: a deep dive into what went wrong and how it will be fixed. As the fog of data science fades, an uncomfortable truth is settling in. I was working with a midsized retail client to turn around a data science lab that was not producing what the rest of the business expected. After a painful postmortem, the CMO pulled me aside. He stared at me for an uncomfortable length of time before asking his question: “Why doesn’t any of this feel especially actionable? Why do I feel like I’m overpaying for what I’m getting?”

Moving from Descriptive to Predictive

He nailed it. The long and the short of my response was this. Businesses start data science initiatives with a three-slide presentation. Slide 1 reads something like: Gather Data. Slide 2: Get Insights. Slide 3: Growth and Profit. Businesses trust data science labs, outsourced data science teams and data science applications to provide the insights on slide 2. In the short term, that works out well.

Data science is incredibly good at descriptive analytics. These are the data visualizations that describe the as-is, current state of everything from the business to competitors to customers and more. Data science is not as good at extrapolating what will happen next because most of its tools do not work well for mid-range and long-range predictive models.

Said another way, data science tools are great at telling a business what it should do right now and bad at telling leadership what it should plan to do after that. After a while, descriptive analytics begin to feel like a visit from Captain Obvious. Those are not the insights the business was looking for on slide 2. Through these postmortems, data science is revealed as a one-trick pony, while businesses expect a second act that the methodology does not deliver.

I was fortunate to spend some time on the inside of business strategy with some extremely smart strategists before starting my current business. I learned that the focus of business strategy is firmly on the future. What is happening right now is great context, but what a business strategist really wants is a picture of what’s coming in order to build the right 3- to 5-year plan. That is the second act, and predictive analytics delivers where data science stops. To realize the promise of actionable insights on slide 2, the business needs predictive analytics.

Differentiating Predictive Analytics from Data Science

Data science is excellent at connecting two or more data endpoints. This reveals the relationships between those data points, which opens up a rich set of real-time insights: when X changes, those insights allow the business to understand what is happening to Y. It is tempting to say that data science predicts what is happening to Y based on what is happening to X, but that is not really what is going on. It is more accurate to say that data science describes what is happening to Y based on what is happening to X.

Why am I parsing terms so closely? Let’s use American football as an example of describing what happens to Y based on X. If the quarterback completes a pass to a receiver who is faster than anyone on the opposing team and has no defenders around him, the receiver will score a touchdown. We can make that statement with a high level of certainty. Black swans, like the receiver celebrating too soon and dropping the ball or injuring himself during the run, make our certainty less than 100%, but we are all comfortable making that statement.

However, we have not predicted a touchdown. We have defined a set of circumstances under which a touchdown is highly likely, and we can only use that definition while the touchdown is being scored. Let’s say the TV signal drops at the instant the fast receiver, all alone near the end zone, catches the pass. Using the relationship between known data points, we can describe what is happening right now even though we cannot see it ourselves. That is how most data comes to us in business, as an incomplete picture, which is why data science proves so useful in the short term.

Let’s wind the clock back to the end of the previous play. The coaches on both sides are deciding on their play calls. Predictive analytics is good at connecting two or more event endpoints. Events are little data ecosystems in and of themselves: an event is all the data that describes a specific point in time.

In our example, we have all the information each coach is using, as well as a model of each coach’s decision-making process. Using the event, we can model their play calls. Those calls lead to the next set of events, which range from the quarterback or defensive captain making changes at the line to individual player behaviors. We run this series of nested models forward to predict the outcome of the play, which is the second event endpoint: in our case, the touchdown.
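The chain of nested models described above can be sketched as a small probability tree. This Python sketch assumes a toy world in which each decision is a discrete probability distribution. The names `offense_play_call`, `defense_play_call`, `line_adjustment`, and `TOUCHDOWN_RATE` are hypothetical stand-ins with invented probabilities, not real coaching models. Running the models forward means multiplying probabilities along every path from the first event (the situation before the snap) to the second event endpoint (the touchdown), then summing over the paths.

```python
from itertools import product

# Every model and probability below is a hypothetical stand-in, invented for illustration.

def offense_play_call(event):
    """Model of the offensive coach's decision, given the event before the snap."""
    if event["yards_to_go"] > 5:
        return {"pass": 0.6, "run": 0.4}
    return {"pass": 0.3, "run": 0.7}

def defense_play_call(event):
    """Model of the defensive coach's decision (ignores the event in this toy version)."""
    return {"blitz": 0.35, "zone": 0.65}

def line_adjustment(offense, defense):
    """Nested event: the quarterback may audible after reading the defense."""
    if offense == "run" and defense == "blitz":
        return {"pass": 0.5, "run": 0.5}
    return {offense: 1.0}

# Hypothetical touchdown probability for each (final offensive play, defense) pairing.
TOUCHDOWN_RATE = {
    ("pass", "blitz"): 0.25,
    ("pass", "zone"): 0.10,
    ("run", "blitz"): 0.05,
    ("run", "zone"): 0.08,
}

def predict_touchdown(event):
    """Run the nested models forward and sum the probability of every touchdown path."""
    total = 0.0
    for (call, p_call), (defense, p_def) in product(
        offense_play_call(event).items(), defense_play_call(event).items()
    ):
        for final_play, p_adj in line_adjustment(call, defense).items():
            total += p_call * p_def * p_adj * TOUCHDOWN_RATE[(final_play, defense)]
    return total

print(round(predict_touchdown({"yards_to_go": 8}), 4))  # prints 0.1333
```

Changing the starting event, for example to `{"yards_to_go": 3}`, changes the play-call model at the top of the chain. That change then flows through every nested model to the final prediction.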

The complexity difference between a data science model and a predictive model is significant, and the tendency to extrapolate a data science model into a predictive model is a huge pitfall. Stringing data science models together to form a predictive model fails very quickly because of the way a data science model handles uncertainty: it essentially ignores it. Although data science-based analytics gives a degree of certainty, such as 85%, it does not describe what happens the other 15% of the time. Predictive analytics does, and that is where the increased complexity comes from.
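A short worked example shows why ignoring the other 15% matters when models are strung together. The sketch below assumes three hypothetical models in sequence. Each model is 85% confident in its expected outcome, and any other outcome ends the play. The stage names and probabilities are invented for illustration. Trusting only the last model's confidence reports an 85% chance of a touchdown. Carrying the uncertainty forward gives 0.85 cubed, about 0.614, and it also shows where the remaining probability goes.

```python
# Hypothetical chain of three models; in each, the expected outcome has 85% probability.
STAGES = [
    {"pass completed": 0.85, "pass incomplete": 0.15},
    {"receiver in the clear": 0.85, "tackled short": 0.15},
    {"touchdown": 0.85, "dropped ball or injury": 0.15},
]

def naive_chain(stages):
    """Follow the most likely outcome at each stage and trust the last model's confidence."""
    path = [max(stage, key=stage.get) for stage in stages]
    return path[-1], stages[-1][path[-1]]

def propagate(stages):
    """Carry probability forward; a non-expected outcome ends the play at that stage."""
    outcomes = {}
    reach = 1.0  # probability that every earlier stage went as expected
    for stage in stages:
        expected = max(stage, key=stage.get)
        for outcome, p in stage.items():
            if outcome != expected:
                outcomes[outcome] = reach * p
        reach *= stage[expected]
    outcomes[expected] = reach  # the final expected outcome gets what is left
    return outcomes

print(naive_chain(STAGES))  # prints ('touchdown', 0.85)
for outcome, p in propagate(STAGES).items():
    print(f"{outcome}: {p:.4f}")
```

The propagated probabilities sum to 1, so the chain describes every way the play can end, not only the expected one.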

That is where the distinction between event endpoints and data endpoints becomes important. Predictive models look at a more complete picture and, as a result, can provide a more accurate description of not only what is happening now but also what will happen a number of events into the future. A predictive analytics approach includes methodologies for obtaining the missing pieces of information through very large datasets or experimentation.

Strategy: The Business Case for Predictive Analytics

The evolution of analytics capabilities must move quickly from descriptive to predictive to keep pace with business needs. The driver of that process is business strategy. The decisions that surround the strategy planning process are filled with uncertainty. The goal of predictive analytics is to remove as much uncertainty as possible from the process so strategists can make better decisions based on more complete information.

When I show clients the fruit of predictive models, it is like giving them a bright flashlight while they are walking through a dark forest. That is the key business need justifying the effort behind predictive analytics. Why does a child run into the street without looking? The child cannot see the potential consequences of that action. Parents have that foresight and can make better decisions that lead to better outcomes; in this case, keeping their child from being hit by a car. In business, that same foresight can help executives keep the company from being hit by a disruption like a new competitor.

Imagine the advantage of being able to see one step farther than competitors. That is the shift in capabilities that predictive analytics brings to business strategy, and it is one that data science methodologies cannot deliver. The solution to businesses’ disillusionment with data science is predictive analytics. Let’s move on to act 2.