The Future of Everything

February 7, 2017

Big data versus big theory

Filed under: Forecasting — David @ 4:05 pm

The Winter 2017 edition of Foresight magazine includes my commentary on the article Changing the Paradigm for Business Forecasting by Michael Gilliland from SAS. A longer version of Michael’s argument can be read on his SAS blog, and my response is below.

Michael Gilliland argues convincingly that we need a paradigm shift in forecasting, away from an “offensive” approach that is characterized by a reliance on complicated models, and towards a more “defensive” approach which uses simple but robust models. As he points out, we have been too focussed on developing highly sophisticated models, as opposed to finding something that actually works in an efficient way.

Gilliland notes that part of this comes down to a fondness for complexity. While I agree completely with his conclusion that simple models are usually preferable to complicated models, I would add that the problem is less an obsession with complexity per se than with building detailed mechanistic models of complexity. And the problem is less big data than big theory.

The archetype for the model-centric approach is the complex computer models of the atmosphere used in weather forecasting, which were pioneered around 1950 by the mathematician John von Neumann. These weather models divide the atmosphere (and sometimes the oceans) into a three-dimensional grid, and use equations based on principles of fluid flow to compute the flow of air and water. However, many key processes, such as the formation and dissipation of clouds, cannot be derived from first principles, so they need to be approximated. The result is highly complex models that are prone to model error (the “butterfly effect” is a secondary concern) but still do a reasonable job of predicting the weather a few days ahead. Their success inspired a similar approach in other areas such as economics and biology.
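To make the grid-based idea concrete, here is a minimal sketch in Python. It is purely illustrative, not any operational weather code: a one-dimensional advection equation stepped forward on a grid, with a crude extra term standing in for a parameterized process such as cloud formation.

```python
import numpy as np

# Toy 1D "atmosphere": advect a temperature-like field along a grid.
# Illustrative only: real weather models are three-dimensional, contain far
# more physics, and parameterize sub-grid processes such as clouds separately.
nx, dx, dt, u = 100, 1.0, 0.5, 1.0                   # grid points, spacing, time step, wind speed
field = np.exp(-0.01 * (np.arange(nx) - 50.0) ** 2)  # initial blob of warm air

def step(f):
    # Upwind finite difference for advection (stable since u*dt/dx <= 1).
    advected = f - u * dt / dx * (f - np.roll(f, 1))
    # Crude stand-in for a parameterized source/sink term (e.g. cloud processes).
    return advected - 0.001 * advected

for _ in range(200):
    field = step(field)

print(round(field.max(), 3))  # the blob has drifted downstream and decayed slightly
```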

The problem comes when these models are pushed to make forecasts beyond their zone of validity, as in climate forecasts. And here, simple models may actually do better. For example, a 2011 study by Fildes and Kourentzes showed that, for a limited set of historical data, a neural network model out-performed the conventional climate model approach; and a combination of a Holt linear trend model with a conventional model led to an improvement of 18 percent in forecast accuracy over a ten-year period.[1]
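To see how lightweight the simpler approach can be, here is a rough sketch in Python: it fits a Holt linear trend model to a made-up temperature-like series and averages its forecast with a stand-in “conventional” forecast. This only illustrates the form of the model, not the setup used by Fildes and Kourentzes.

```python
import numpy as np
from statsmodels.tsa.holtwinters import Holt

# Synthetic temperature-anomaly-like series (made up for illustration).
rng = np.random.default_rng(0)
years = np.arange(1900, 2001)
series = 0.007 * (years - 1900) + 0.1 * rng.standard_normal(len(years))

# Holt's linear trend model: just a level and a trend, each with a smoothing parameter.
holt_forecast = Holt(series).fit().forecast(10)

# Stand-in for a forecast from a "conventional" model (hypothetical numbers).
conventional_forecast = 0.007 * (np.arange(2001, 2011) - 1900)

# A simple 50/50 combination of the two forecasts.
combined = 0.5 * holt_forecast + 0.5 * conventional_forecast
print(np.round(combined, 3))
```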

As the authors noted, while there have been many studies of climate models, “few, if any, studies have made a formal examination of their comparative forecasting accuracy records, which is at the heart of forecasting research.” This is consistent with the idea that complex models are favored, not because they are necessarily better, but for institutional reasons.

Another point shown by this example, though, is that models associated with big data, complexity theory, etc., can actually be simpler than the models associated with the reductionist, mechanistic approach. So, for example, a neural network model might run happily on a laptop, while a full climate model needs a supercomputer. We therefore need to distinguish between model complexity and complexity science. A key lesson of complexity science is that many phenomena (e.g. clouds) are emergent properties which are not amenable to a reductionist approach, so simple models may be more appropriate.

Complexity science also changes the way we think about uncertainty. Under the mechanistic paradigm, uncertainty estimates can be determined by making random perturbations to parameters or initial conditions. In weather forecasting, for example, ensemble forecasting ups the complexity level by making multiple forecasts and analysing the spread. A similar approach is taken in economic forecasts. However, if error is due to the model being incapable of capturing the complexity of the system, then there is no reason to think that perturbing model inputs will tell you much about the real error (because the model structure is wrong). So again, it may be more appropriate to simply estimate error bounds based on past experience and update them as more information becomes available.
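The contrast can be sketched in a few lines of Python, using a toy forecast model and made-up error records (illustrative assumptions only): perturbing inputs measures the model’s sensitivity to those inputs, while empirical bounds come straight from how wrong past forecasts actually were.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_forecast(initial_value):
    # Stand-in for some forecasting model (hypothetical).
    return 1.02 * initial_value + 0.5

# Mechanistic approach: perturb the initial condition and measure the ensemble spread.
ensemble = toy_forecast(10.0 + 0.1 * rng.standard_normal(500))
ensemble_spread = ensemble.std()

# Empirical approach: take error bounds directly from the record of past forecast errors.
past_errors = 2.0 * rng.standard_normal(50)           # stand-in for a real error record
empirical_bounds = np.quantile(past_errors, [0.05, 0.95])

print(round(ensemble_spread, 3))      # small: reflects only sensitivity to the inputs
print(np.round(empirical_bounds, 2))  # wider: reflects how wrong past forecasts actually were
```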

Complexity versus simplicity

An example from a different area is the question of predicting heart toxicity for new drug compounds. Drug makers screen their compounds early in the development cycle by testing to see whether they interfere with several cellular ion channels. One way to predict heart toxicity based on these test results is to employ teams of researchers to build an incredibly complicated mechanistic model of the heart, consisting of hundreds of differential equations, and use the ion channel readings as inputs. Or you can use a machine learning model. Or, most complicated, you can combine these in a multi-model approach. However, my colleague Hitesh Mistry at Systems Forecasting found that a simple model, which simply adds or subtracts the ion channel readings – the only parameters are +1 and -1 – performs just as well as the multi-model approach using three large-scale models plus a machine learning model (see Complexity v Simplicity, the winner is?).
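Mistry’s actual analysis is described in the linked post; the toy sketch below, in Python with made-up channel readings, sign pattern and threshold, just illustrates the form such a ±1 model takes.

```python
import numpy as np

# Hypothetical standardized ion-channel readings for three compounds.
# Columns might correspond to channels such as hERG, Nav and Cav; the
# names, numbers, signs and threshold below are all made up for illustration.
readings = np.array([
    [0.8, 0.1, 0.2],
    [0.2, 0.7, 0.6],
    [0.1, 0.1, 0.1],
])

# A "+1 / -1" style model: each channel simply adds to or subtracts from a score.
weights = np.array([+1, -1, -1])
scores = readings @ weights

threshold = 0.0              # arbitrary cut-off for illustration
print(scores > threshold)    # flag compounds predicted to be risky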

Now, to obtain the simple model Mistry used some fairly sophisticated data analysis tools. But what counts is not the complexity of the methods, but the complexity of the final model. And in general, complexity-based models are often simpler than their reductionist counterparts. Clustering algorithms employ some fancy mathematics, but the end result is clusters, which isn’t a very complicated concept. Even agent-based models, which simulate a system using individual software agents that interact with one another, can involve a relatively small number of parameters if designed carefully.
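For instance, a standard k-means run involves an iterative optimization under the hood, yet its output is just a cluster label for each point. The sketch below, in Python on random two-dimensional points, is purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random two-dimensional points scattered around three made-up centres.
rng = np.random.default_rng(2)
points = np.vstack([
    rng.normal(loc=centre, scale=0.3, size=(50, 2))
    for centre in [(0, 0), (3, 3), (0, 3)]
])

# The algorithm alternates between assigning points and moving centres,
# but the final output is simply a cluster label for each point.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
print(np.bincount(labels))            # roughly 50 points per cluster
```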

People who work with big data, meanwhile, are keenly aware of the problem of overfitting – more so, it would appear, than the designers of reductionist models, which often have hundreds of parameters. Perhaps the ultimate example of such models is the dynamic stochastic general equilibrium models used in macroeconomics. Studies show that these models have effectively no predictive value (which is why they are not used by e.g. hedge funds), and one reason is that key parameters cannot be determined from data so have to be made up (see The Trouble With Macroeconomics by Paul Romer, chief economist at the World Bank).

One reason we have tended to prefer mechanistic-looking models is that they tell a rational cause-and-effect story. When making a forecast it is common to ask whether a certain effect has been taken into account, and if not, to add it to the model. Business forecasting models may not be as explicitly reductionist as their counterparts in weather forecasting, biology, or economics, but they are still often inspired by the need to tell a consistent story. A disadvantage of models that come out of the complexity approach is that they often appear to be black boxes. For example, the equations in a neural network model of the climate system might not tell you much about how the climate works, and sometimes that kind of understanding is what people are really looking for.

When it comes to prediction, as opposed to description, I therefore again agree with Michael Gilliland that a ‘defensive’ approach makes more sense. But I think the paradigm shift he describes is part of, or related to, a move away from reductionist models, which we are realising don’t work very well for complex systems. With this new paradigm, models will be simpler, but they can also draw on a range of techniques that have been developed for the analysis of complex systems.

[1] Fildes, R., and N. Kourentzes. “Validation and forecasting accuracy in models of climate change.” International Journal of Forecasting 27 (2011): 968–995.


