I'd like to introduce a young man whose studies in chemistry at the University of London were interrupted by the Second World War. As a chemist, he was assigned to a chemical defense experimental station with the British Army Engineers and undertook work determining the effect of poison gas. There, he was faced with mountains of data on the effects of various doses of various compounds on rats and mice. Since the Army could provide no statistician, and since this bright lad had once read R.A. Fisher's revolutionary Statistical Methods for Research Workers, he was drafted to do the work. Thus was born a statistician who would become the Director of the Statistical Research Group at Princeton (where he married one of Fisher's daughters), create the Department of Statistics at the University of Wisconsin, and exert incredible influence in the fields of statistical inference, robustness (a word he defined and introduced into the statistical lexicon), and modelling; experimental design and response surface methodology; time series analysis and forecasting; and distribution theory, transformation of variables, and nonlinear estimation ... one might just as well say he influenced "statistics" and note that a working statistician owes much of his or her craft to this man whether they know it or not. Ladies and gentlemen, I'd like to introduce George Edward Pelham Box.
George Box |
We've all heard this aphorism, "all models are wrong, but some are useful," even if we do not know it springs from George Box. While true in the most fundamental sense possible and a profoundly important insight, what isn't widely understood or acknowledged is how incredibly dangerous and damaging this idea is and why, despite its veracity, we should ruthlessly suppress its use. True but dangerous? How is this possible?
Listen carefully the next time you hear this mantra produced. The key is the manner in which most use the statement, emphasizing the first half as exculpatory ("It doesn't matter that my model is wrong, since all models are") and the latter half as permissive. The forgiveness of intellectual sins implicit in the first half of the statement requires of the analyst or programmatic and planning partisan no examination of the sin and its consequences; we are forgiven, for we know not what we do ... though we should know and should not be forgiven for turning a blind eye.
Once forgiven, the utility of the model is elevated as the only criterion of interest, but this is a criterion with no definition. As such it admits all manner of pathologies that contradict the intent of Box in framing this discussion of statistical methods. Consider a somewhat more expansive discussion of the same concept. In the same work quoted above, Box and Draper wrote,
Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.
And earlier, in a marvelous 1976 paper, Box averred,
Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so over-elaboration and over-parameterization is often the mark of mediocrity.
In each case, the question of utility is tied explicitly to the question of how wrong the model is or is not. Similarly, this is precisely the emphasis in Einstein's famous injunction, often quoted as "as simple as possible, but no simpler." He actually said,
It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.
In all cases, it is the phenomenon under investigation, the datum of experience, that forms the basis for evaluating the relative utility of the model that is, of necessity, not a perfect representation of the phenomenon. The question is not whether the model is right or wrong, useful or useless. The real question of interest is whether the model is right in the ways that matter. The two concepts, separated by a lowly conjunction in Box's famed quote, are not separate in his intellectual construct. Yet they are very much so in the framing of the quote as typically (ab)used.
A Spherical Chicken and a Vacuum? |
So why write about these things? The answer is simple. These questions affect every aspect of nearly every problem a military analyst will face--whether that analyst is an operations research analyst, an intelligence analyst, a strategist engaged in course of action analysis, etc. Examples abound.
Consider the ubiquitous 1-n list, a model of decision making that problematically imposes a strict, transitive order in preferences, treats cost and benefit as marginal with all the path dependence possibilities that entails, and does not typically account for interaction and dependencies across the list, all of which compromise the utility of the list as a tool. The model is, however, simple, easy to explain, and conveys some sense of rigor in the construction of the list ... even if none exists. Useful indeed.
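To make the interaction problem concrete, here is a minimal sketch, not a real prioritization tool. Every item, cost, value, and synergy figure is invented for illustration, and the names `ITEMS`, `SYNERGY`, `fund_down_the_list`, and `best_portfolio` are hypothetical. It compares funding a ranked list from the top down against searching portfolios directly when two low-ranked items are worth more together than apart.

```python
from itertools import combinations

# Hypothetical items: name -> (cost, standalone value). All numbers are invented.
ITEMS = {"A": (5, 10), "B": (5, 9), "C": (5, 6), "D": (5, 5)}
# Hypothetical interaction: C and D together are worth more than their sum.
SYNERGY = {frozenset({"C", "D"}): 12}
BUDGET = 10


def portfolio_value(names):
    """Standalone values plus any synergy whose members are all funded."""
    funded = set(names)
    base = sum(ITEMS[n][1] for n in funded)
    bonus = sum(v for group, v in SYNERGY.items() if group <= funded)
    return base + bonus


def portfolio_cost(names):
    return sum(ITEMS[n][0] for n in names)


def fund_down_the_list(budget):
    """The 1-n model: rank by standalone value, fund from the top until money runs out."""
    ranked = sorted(ITEMS, key=lambda n: ITEMS[n][1], reverse=True)
    chosen, spent = [], 0
    for name in ranked:
        if spent + ITEMS[name][0] <= budget:
            chosen.append(name)
            spent += ITEMS[name][0]
    return chosen


def best_portfolio(budget):
    """Brute-force search over all affordable portfolios (fine for a toy list)."""
    best = ()
    for size in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            if portfolio_cost(combo) <= budget and portfolio_value(combo) > portfolio_value(best):
                best = combo
    return list(best)


if __name__ == "__main__":
    ranked_choice = fund_down_the_list(BUDGET)
    searched_choice = best_portfolio(BUDGET)
    print("1-n list funds:", ranked_choice, "value", portfolio_value(ranked_choice))
    print("search funds:  ", searched_choice, "value", portfolio_value(searched_choice))
```

With these invented numbers the ranked list funds the two items that look best in isolation and misses the pair that is best in combination. The brute-force search is only workable for toy lists; the point is the gap between the two answers, not the search method.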
Or consider the notion of risk as an expected value expressed via the product of probability and consequence. With no meaningful characterization of the underlying distribution in the probability half of this formula, risk degenerates to a simple point estimate with no consideration of the heaviness of the probability tails and the relative likelihood of extremity. Or worse, it is implicitly viewed as a Gaussian distribution because that is what we've been taught to expect, and extreme outcomes are unwittingly eliminated from our calculus. On a related note, when considering a given scenario (within the scope of the various Defense Planning Scenarios) and speaking of risk, are we considering the likelihood of a given scenario (by definition asymptotically close to zero) or the likelihood of some scenario in a given class? This sounds a bit academic, but it is also the sort of subtle phenomenon that can influence our thinking based on the assessment frame we adopt. As such, characterizing the output of such a model as a description of risk is specious at best.
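A small sketch can show how much a point estimate hides. It assumes two hypothetical consequence models with the same mean: a Gaussian, and a Pareto distribution standing in for a heavy-tailed world. The parameters (mean 100, standard deviation 20, tail index 1.5, event probability 0.1, threshold 300) are invented for illustration and are not drawn from any real assessment. The tail probabilities use the standard closed forms for each distribution.

```python
import math


def point_risk(p_event, mean_consequence):
    """The probability-times-consequence model: a single number."""
    return p_event * mean_consequence


def normal_tail(threshold, mean, sd):
    """P(X > threshold) for a Gaussian with the given mean and standard deviation."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def pareto_scale_for_mean(mean, alpha):
    """Scale x_m such that a Pareto(alpha, x_m) has the requested mean (needs alpha > 1)."""
    if alpha <= 1:
        raise ValueError("a Pareto distribution has a finite mean only for alpha > 1")
    return mean * (alpha - 1) / alpha


def pareto_tail(threshold, scale, alpha):
    """P(X > threshold) for a Pareto distribution with minimum value `scale`."""
    if threshold <= scale:
        return 1.0
    return (scale / threshold) ** alpha


if __name__ == "__main__":
    # Hypothetical numbers, chosen only so that both models share one mean.
    p_event, mean_loss, extreme = 0.1, 100.0, 300.0
    scale = pareto_scale_for_mean(mean_loss, alpha=1.5)

    print("point risk, either model:", point_risk(p_event, mean_loss))
    print("P(loss > 300), Gaussian: ", normal_tail(extreme, mean_loss, sd=20.0))
    print("P(loss > 300), Pareto:   ", pareto_tail(extreme, scale, alpha=1.5))
```

Both models return the same probability-times-consequence figure. Under these assumed parameters the Gaussian treats a loss three times the mean as effectively impossible, while the Pareto model gives it a chance of roughly one in twenty-seven. The expected-value model of risk cannot distinguish between the two.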
John Maynard Keynes |
Frank H. Knight |
Further, the question of consequence is no less problematic. What do we mean by consequence and how do we quantify it (because the probability/consequence model of risk demands an ordered consequence)? And how does the need for a quantifiable expression of consequence shape the classes of events and outcomes we consider? Does it bias the questions we ask and information we collect, shifting the world subtly into a frame compatible with the probability/consequence mode of orienting to it? What are the consequences of being wrong in such a case?
Continuum of Conflict, 2015 U.S. National Military Strategy |
It offers another kind of list, though, based on the Defense Planning Scenarios. Since each scenario is assigned a numerical value, and since the real numbers are ordered, we suddenly have a continuum of conflict. This model may be useful--it certainly makes the complex simple--but is it right in the ways that matter? The continuum makes each of the types of conflict shown effectively similar, differing only in degree. Even the implication of such a continuum is dangerous if it leads military planners to believe the ways and means associated with these forms of conflict identical or that one form of conflict can be compartmentalized in our thinking. Perhaps some room should be made for the notion that more is not always simply more; sometimes more is different, but this is an idea explicitly excluded from an ordering like that presented here.
Blind Men Building Models of an Elephant |
Modeling Counterinsurgency in Afghanistan |
Or what about our models of human nature and the international system? Are we classical realists, structural realists, institutionalists, Romantics, Marxists, or something else? The structural realism of Kenneth Waltz is famously parsimonious, abstracting a great deal into billiard balls that interact on the basis of power alone (a description that is itself more parsimonious than is fair). But this leaves us with a model that cannot explain critical phenomena and necessitates expansion and refinement--see Stephen Walt's balance of threat, for example, a socially constructed concept outside the Waltz model. In the end, we are faced with a model and not with reality, with approximations of truth and not with truth itself.
This notion is particularly important in thinking about the veracity and utility of our models. They are, in fact, models. In all cases, the intent is an "adequate representation of a single datum of experience." But in studying our models we can become detached from experience and attach ourselves to the models themselves, associate our intellectual value with their form and behavior, and make them into things worthy of study unto themselves. In short, we are wont to reify them, a process Peter Berger and Thomas Luckmann describe as
... the apprehension of the products of human activity as if they were something else than human products--such as facts of nature, results of cosmic laws, or manifestations of divine will. Reification implies that man is capable of forgetting his own authorship of the human world, and further, that the dialectic between man, the producer, and his products is lost to consciousness. The reified world is ... experienced by man as a strange facticity, an opus alienum over which he has no control rather than as the opus proprium of his own productive activity.
Auguste Rodin, The Thinker |
Finally, if the model is wrong, we must demand a new model more closely aligned to the question of interest, a model right enough to be useful. And this is not just a task for analysts and mathematicians, though it is our duty. This is a task for planners, strategists, operators, decision makers, and everyone else. We must seek the truth, even if we may not find it.
First, however, we should probably scrub Box's exculpatory and damaging aphorism from our decision-making discourse.
Merf, very much agree. Just like "How to Lie with Statistics" is a very unfortunate name for a book that is all about getting it right, not about lying with statistics. Once a catchy phrase or idea enters the vernacular or mainstream it is almost impossible to turn off. In my MORS Ethics pitch I say this book is our "JAWS" meaning, Peter Benchley never set out to scare the world into fearing sharks...with an end result of the lives of many sharks paying the price. How many good statistics found their end in front of a decision maker opposed to a concept and chose to believe that statistics didn't necessarily have to be telling them the truth? Same with useful models...all wrong. Nope, some very right...maybe the climate change models are currently vying for more "rightness" before they can be useful...even though they can't be entirely wrong...some would cast them as that way. Now, all that said, there are two cases you haven't considered. I, for instance, never took the Box expression to mean, literally the model we are using is wrong, please forgive, it's the best we've got. I always took it to mean the model you are using has errors...it's impossible to model reality. Useful models are closer to reality than useless models...which is why we cannot predict with STORM, for instance. And it follows that a model can thus be very very wrong (with regard to reality), but still be found useful. Now in this second case, and I believe Kent Taylor will chime in here, a model that is very wrong, might still have usefulness if the analyst using the model has half a brain. Kent has always used the expression to mean, this model is crap, but we don't care about the model we care about the analysis. So we are smart enough to know where the crap is hiding so our analysis, we let the model do what doesn't require thinking, we think, and thus the analysis is still gold. I'm guilty of this usage as well. So it was never exculpatory for me. But I see exactly what you are saying.
Shame on us! And I will thus pay more attention to what others might perceive if the expression is used to make excuses for our proud profession.
It's like many expressions inside a professional community. We know what we mean, and we tend to mean it with great precision. The problem doesn't become critical until 1) someone, analyst or otherwise, has a vested interest or 2) we are outside the guild. Further, if the model is truly crap, we do need to care about that fact. Either the analysis does not need it or the analysis does need it. In the former case, it is a waste and a distraction; in the latter, it invalidates your intellectual position. If we mean "crap in almost every way except this limited frame in which it illuminates a particular issue," that's different.
Merf, You threw Kent under the bus for something I attributed to him...so I have to rise in defense. Because I am an engineer I believe the statement. And since you are a mathematician you have a visceral reaction to using something that is "crap" to get you close enough (There it is again, LOL!). I'm not going to throw any particular model under the bus, but there is one near and dear to our hearts that is so wrong as to rise to the level of crappy and to be out and out dangerous if used incorrectly if it were to find itself in the wrong hands. Yet we beat on, boats against the current, borne back ceaselessly into the past...Because there is nothing else that can gonculate the extreme complexities within the problem...something back of the envelope or even a spreadsheet can't track because of the number of moving parts. Some would throw it out (CAPE). Others would say, we got nothing else (USAF) so let's do the best we can. I'm split, three ways. I know it's crap. I know we can use it to compare things unrelated to real world outcomes. And, I agree with you, I know we can use it to illuminate issues unforeseen. I guess you might argue that crap is in the eye of the beholder and an extremely relative thing. In the right hands, which I've allowed for, those hands might elevate the model to slightly higher than a steamy pile, thus it could transcend crap, if only for a short time. I contend, time and time again, years after that transcendence, we've discovered something, deep in the bowels of said model, that makes us say, OMG! We were using a steamy pile all along, and we are propelled back along the lines of CAPE..ready to put a cap in its ass, only to see a new problem come along that we need it for, because it's the only thing on the street. Personally, I believe in the green light, the orgastic future that year by year recedes before us. It eluded us then, but that’s no matter -- tomorrow we will run faster, stretch out our arms farther....
And one fine morning —— (I take full responsibility for using FSF quotes out of order from the original text--but they just worked that way)
Absolutely wasn't intending to throw Kent under the bus. In fact, I agree completely. And despite being a dirty mathematician, I'm perfectly comfortable with wrong but useful. (Have you noticed that you keep using "mathematician" as a way to impute positions to me that I don't hold and paint me as some sort of ivory-tower academic in opposition to a practical-man-of-the-world engineer who is, by definition, better? Strange way to argue.) In the longish post, I'm not demanding perfection from our models. I'm asking if they are right in the ways that matter and if they are good enough. Newton's Laws are wrong, but they also got us to the moon and back. When you say "close enough" you've stated and applied a standard of utility and a tolerance for error that is appropriate. You've linked how wrong the model is to how well it suits your question, and that means the model is not crap. If it were, you couldn't do those things. There is another standard to apply--how easy it is to use a tool incorrectly--that we might apply as well, and some tools are worse than others in this regard, but that's a question separate from whether or not the model is wrong. (See PowerPoint, for example, and the model "near and dear to our hearts.")
All is forgiven of a person who uses the Green Light in a discussion about models.