AI hype is feeding naïve business analytics

My problem with the current hype is not that AI lacks the potential to drive the next industrial revolution. Truly amazing progress is being made in AI research labs and companies around the globe. My problem is that a great many things are being dubbed AI, and way too often the AI jargon is nothing but a pretty wrapper for sloppy analytical thinking and naïve statistical methodology. Somehow people forget that correlation does not imply causation when they have a cutting-edge deep learning algorithm in their hands.

All too often we see the cool new hammer from the machine learning toolbox being offered as a solution even when the problem looks nothing like a nail. This is particularly problematic when predictive models are being offered as solutions to business problems requiring identification of causal effects. Simple regressions—and more complex predictive models for that matter—are often given causal interpretations without a second thought and marketed as cutting-edge AI or state-of-the-art econometrics. It is important to keep in mind that good prediction accuracy has nothing to do with the validity of the causal interpretation. Weight may be a great predictor of height in your model, but that does not mean you can make a middle-aged consultant any taller by putting him on a high-calorie diet. While this is clear to most people, the lesson that correlation need not imply causation often goes out the window when we turn to business problems.

The confusion around correlation and causation is particularly common in pricing and marketing optimization. Correlations between prices and volumes are often treated as measures of price elasticity, even though the relationship is also driven by supply-side pricing responses to shifts in demand. Similarly, associations between marketing spend and business outcomes are used to quantify the impact of marketing, despite the fact that most marketing departments do not pick their spending levels randomly. If prices are raised when demand is high and marketing budgets expanded when good ideas and marketing opportunities abound, the correlations in the data are far from causal. Nor is the problem one of slight biases or small errors. Naively estimated effects can be off by an order of magnitude or even go in the wrong direction compared to the actual causal effect. For example, it is not uncommon to see positive price elasticity estimates—implying higher volumes at higher prices—even for basic products, when predictive algorithms are haphazardly applied to observational sales data. Yet such money machines hardly exist in practice.
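
To make this concrete, here is a small simulation sketch; the numbers and the pricing rule are invented purely for illustration. Because prices respond to the same demand conditions that drive volumes, a naive regression of volume on price produces a positive elasticity estimate even though the true elasticity is negative:

```python
# Hypothetical pricing example: prices are raised when demand is high,
# so the observational price-volume correlation gets the elasticity wrong.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_elasticity = -1.5                         # demand truly falls when the price rises

demand_shock = rng.normal(size=n)              # unobserved demand conditions
log_price = 0.5 * demand_shock + 0.1 * rng.normal(size=n)   # seller raises prices when demand is high
log_volume = (2.0 * demand_shock
              + true_elasticity * log_price
              + 0.3 * rng.normal(size=n))

# Naive regression of volume on price, as a purely predictive model would fit it.
X = np.column_stack([np.ones(n), log_price])
naive_elasticity = np.linalg.lstsq(X, log_volume, rcond=None)[0][1]

print(f"true elasticity: {true_elasticity:+.2f}")
print(f"naive estimate:  {naive_elasticity:+.2f}")   # comes out positive, i.e. the wrong sign
```

The naive slope is not just a bit off; in this setup it flips the sign entirely, exactly the kind of "money machine" result that should set off alarm bells.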

To make progress on causal inference, we need to understand in great detail how the data we are using has been generated. Only then can we identify and isolate random-like variation in the variable of interest to help us uncover its causal effect. And when observational data cannot provide such variation, we should design randomized controlled experiments to help us tease out causal effects. This is important because basing key decisions or actions on naïve approaches and sloppy thinking can be extremely costly. Former Illinois Governor Rod Blagojevich, for example, wanted to spend $26 million a year to send a book per month to every child in Illinois, because children from book-filled homes perform better at school. Luckily for Illinois taxpayers, the plan was rejected by the legislature as they undoubtedly realized—channeling their inner economist and Freakonomics author Steven Levitt—that “a book is in fact less a cause of intelligence than an indicator”.
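
Returning to the hypothetical pricing simulation above: if prices are instead set at random, independently of demand conditions, the very same regression recovers the true elasticity. This is, in a nutshell, what a randomized pricing experiment buys us:

```python
# Same made-up demand model as before, but prices are now randomized,
# so the regression slope has a causal interpretation.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_elasticity = -1.5

demand_shock = rng.normal(size=n)
log_price = 0.3 * rng.normal(size=n)           # randomized: set independently of demand
log_volume = (2.0 * demand_shock
              + true_elasticity * log_price
              + 0.3 * rng.normal(size=n))

X = np.column_stack([np.ones(n), log_price])
experimental_estimate = np.linalg.lstsq(X, log_volume, rcond=None)[0][1]

print(f"true elasticity:       {true_elasticity:+.2f}")
print(f"experimental estimate: {experimental_estimate:+.2f}")   # close to -1.5
```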

Overselling predictive models as causal and hyping them up as AI or cutting-edge econometrics is quite misleading; at their core, these are still the same methods economists have largely dismissed as naïve since the 1980s. To push beyond correlations, the economics profession has focused on developing analytical strategies and statistical methods for credible estimation of causal effects. These strategies and methods—especially when coupled with advances in machine learning—can be very powerful, but they are still largely missing from mainstream machine learning and data science toolkits. Once we combine the power of machine learning with serious identification strategies to estimate causal effects, we can really start expanding the role of AI in solving critical business problems.
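
As a taste of what such a combination can look like, here is a minimal sketch of the double/debiased machine learning idea (Chernozhukov et al.), applied to the hypothetical pricing example from above under the added assumption that the demand conditions driving pricing decisions are observed in the data. Flexible ML models partial those covariates out of both price and volume, and a simple residual-on-residual regression then estimates the causal elasticity:

```python
# Sketch of double/debiased machine learning with a made-up data-generating
# process: ML handles the flexible nuisance functions, the identification
# strategy (partialling out observed confounders) handles the causal part.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 5_000
true_elasticity = -1.5

W = rng.normal(size=(n, 3))                    # demand conditions, here assumed observed
log_price = np.sin(W[:, 0]) + 0.5 * W[:, 1] ** 2 + 0.3 * rng.normal(size=n)
log_volume = (true_elasticity * log_price
              + 2.0 * np.cos(W[:, 0]) + W[:, 1] * W[:, 2]
              + 0.3 * rng.normal(size=n))

# Cross-fitted ML predictions of price and volume from the covariates.
ml = GradientBoostingRegressor(random_state=0)
price_residual = log_price - cross_val_predict(ml, W, log_price, cv=5)
volume_residual = log_volume - cross_val_predict(ml, W, log_volume, cv=5)

# Residual-on-residual regression (partialling out) estimates the causal effect.
dml_estimate = (price_residual @ volume_residual) / (price_residual @ price_residual)

print(f"true elasticity: {true_elasticity:+.2f}")
print(f"DML estimate:    {dml_estimate:+.2f}")   # typically lands close to the truth here
```

The point of the sketch is not the particular algorithm but the division of labour: the machine learning models soak up the messy, nonlinear relationships, while the identification strategy determines which variation in prices we are allowed to interpret causally.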