Friday, August 7, 2015

Right, Wrong, and Relevance


I'd like to introduce a young man whose studies in chemistry at the University of London were interrupted by the Second World War. As a chemist, he was assigned to a chemical defense experimental station with the British Army Engineers and undertook work determining the effect of poison gas. There, he was faced with mountains of data on the effects of various doses of various compounds on rats and mice. Since the Army could provide no statistician, and since this bright lad had once read R.A. Fisher's revolutionary Statistical Methods for Research Workers, he was drafted to do the work. Thus was born a statistician who would become the Director of the Statistical Research Group at Princeton (where he married one of Fisher's daughters), create the Department of Statistics at the University of Wisconsin, and exert incredible influence in the fields of statistical inference, robustness (a word he defined and introduced into the statistical lexicon), and modelling; experimental design and response surface methodology; time series analysis and forecasting; and distribution theory, transformation of variables, and nonlinear estimation ... one might just as well say he influenced "statistics" and note that a working statistician owes much of his or her craft to this man whether they know it or not. Ladies and gentlemen, I'd like to introduce George Edward Pelham Box.

George Box
But Box hardly needs an introduction. You already know him and no doubt quote him regularly, even if you are not a professional analyst or statistician. Let me prove it to you. George Box, in his work with Norman Draper on Empirical Model-Building and Response Surfaces, coined the phrase, "Essentially, all models are wrong, but some are useful."

We've all heard this aphorism, even if we do not know it springs from George Box. It is true in the most fundamental sense possible, and it is a profoundly important insight. What isn't widely understood or acknowledged is how incredibly dangerous and damaging this idea is, and why, despite its veracity, we should ruthlessly suppress its use. True but dangerous? How is this possible?

Listen carefully the next time you hear this mantra invoked. The key is the manner in which most use the statement, emphasizing the first half as exculpatory ("It doesn't matter that my model is wrong, since all models are") and the latter half as permissive. The forgiveness of intellectual sins implicit in the first half of the statement requires of the analyst or programmatic and planning partisan no examination of the sin and its consequences; we are forgiven, for we know not what we do ... though we should know and should not be forgiven for turning a blind eye.

Once forgiven, the utility of the model is elevated as the only criterion of interest, but this is a criterion with no definition. As such it admits all manner of pathologies that contradict the intent of Box in framing this discussion of statistical methods. Consider a somewhat more expansive discussion of the same concept. In the same work quoted above, Box and Draper wrote,
Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.
And earlier, in a marvelous 1976 paper, Box averred,
Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so over-elaboration and over-parameterization is often the mark of mediocrity.
In each case, the question of utility is tied explicitly to the question of how wrong the model is or is not. Similarly, this is precisely the emphasis in Einstein's famous injunction, often rendered as "as simple as possible, but no simpler." He actually said,
It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.
In all cases, it is the phenomenon under investigation, the datum of experience, that forms the basis for evaluating the relative utility of the model that is, of necessity, not a perfect representation of the phenomenon. The question is not whether the model is right or wrong, useful or useless. The real question of interest is whether the model is right in the ways that matter. The two concepts, separated by a lowly conjunction in Box's famed quote, are not separate in his intellectual construct. Yet they are very much separate in the framing of the quote as typically (ab)used.

A Spherical Chicken and a Vacuum?
Why does this matter? As a separate idea, "useful" is applied without considering the meaning of the word. Does it mean useful in the sense that it illustrates and illuminates some natural phenomenon, as Box would have it? Or does it mean that the model is simple enough to be understood by a non-technical audience? On the other hand, it may mean that our tools appear to deliver answers to life, the universe, and everything, a chimera some see as worth chasing. Or perhaps it means that the model is simple in the sense that Herbert Weisberg uses to characterize "willful ignorance" which "entails simplifying our understanding in order to quantify our uncertainty as mathematical probability." And it is no great stretch to extend this notion of willful ignorance to the ways in which we frame underlying assumptions regarding the structure of the elements and interactions in a model to facilitate their mathematical and/or digital representation. There is a great physics joke in this vein involving a spherical chicken in a vacuum, but that's a story for another day. If any of these begin to affect our assertions regarding utility, we have crossed over into a territory where utility becomes a permissive cause for intellectual failure, and that is a dangerous territory.

So why write about these things? The answer is simple. These questions affect every aspect of nearly every problem a military analyst will face--whether that analyst is an operations research analyst, an intelligence analyst, a strategist engaged in course of action analysis, etc. Examples abound.

Consider the ubiquitous 1-n list, a model of decision making that problematically imposes a strict, transitive order on preferences, treats cost and benefit as marginal with all the path dependence possibilities that entails, and does not typically account for interactions and dependencies across the list, all of which compromise the utility of the list as a tool. The model is, however, simple, easy to explain, and conveys some sense of rigor in the construction of the list ... even if none exists. Useful indeed.
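
To make the failure concrete, here is a minimal sketch in Python (the items and preferences are invented for illustration): when pairwise preferences contain a cycle, no 1-n list can represent them faithfully, so even the best list quietly reverses at least one stated preference.

    # Hypothetical data: pairwise preferences over four programs containing a
    # cycle (A over B, B over C, C over A). A 1-n list imposes a strict total
    # order, so it must silently reverse at least one of these preferences.
    from itertools import permutations

    prefers = {("A", "B"), ("B", "C"), ("C", "A"),
               ("A", "D"), ("B", "D"), ("C", "D")}

    def violations(order):
        """Count the pairwise preferences that a given ranking reverses."""
        pos = {item: i for i, item in enumerate(order)}
        return sum(1 for hi, lo in prefers if pos[hi] > pos[lo])

    # Even the best possible 1-n list violates a stated preference:
    best = min(permutations("ABCD"), key=violations)
    print(best, violations(best))  # ('A', 'B', 'C', 'D') with 1 violation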

Or consider the notion of risk as an expected value expressed via the product of probability and consequence. With no meaningful characterization of the underlying distribution in the probability half of this formula, risk degenerates to a simple point estimate with no consideration of the heaviness of the probability tails and the relative likelihood of extremity. Or worse, it is implicitly viewed as a Gaussian distribution because that is what we've been taught to expect, and extreme outcomes are unwittingly eliminated from our calculus. On a related note, when considering a given scenario (within the scope of the various Defense Planning Scenarios) and speaking of risk, are we considering the likelihood of a given scenario (by definition asymptotically close to zero) or the likelihood of some scenario in a given class? This sounds a bit academic, but it is also the sort of subtle phenomenon that can influence our thinking based on the assessment frame we adopt. As such, characterizing the output of such a model as a description of risk is specious at best.
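
A toy numerical sketch (all numbers invented) makes the danger visible: two consequence distributions with the same mean yield the same probability-times-consequence point estimate of risk, while the likelihood of an extreme outcome differs by orders of magnitude.

    # Hypothetical sketch: identical expected consequence, very different tails.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    thin  = rng.normal(loc=100.0, scale=15.0, size=n)        # Gaussian tails
    heavy = 100.0 + 15.0 * rng.standard_t(df=2.5, size=n)    # heavy tails

    for name, draws in (("Gaussian", thin), ("heavy-tailed", heavy)):
        print(f"{name:12s} mean consequence = {draws.mean():6.1f}   "
              f"P(consequence > 200) = {np.mean(draws > 200):.4%}")
    # The means (the point-estimate 'risk') nearly coincide; the Gaussian
    # model puts essentially zero mass beyond 200, the heavy-tailed one
    # does not.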

John Maynard Keynes
This isn't the end of the issue vis-à-vis probability, though, and there are deeper questions about the model we use as we seek some objective concept of probability to drive our decisions. The very notion of an objective probability is (or at least once was and probably still should be) open to doubt. Consider A Treatise on Probability, a seminal work of John Maynard Keynes--a mathematician and philosopher long before he became one of the fathers of modern macroeconomics--or Risk, Uncertainty, and Profit by Frank H. Knight, both first published in 1921. Both, in the formative days of the modern theory of probability, put forward a notion that probability is inherently subjective. Knight, for example, includes in his notion of risk (i.e., probability) the question of confidence: "The action which follows upon an opinion depends as much upon the confidence in that opinion as upon the favorableness of the opinion itself." But if subjective confidence is inherent to assessments of probability and risk, we enter into all manner of human cognitive shenanigans. Does increasing information increase the objective assessment of probability, the subjective assessment of confidence, both, or neither? There is some evidence to suggest the second and not the first, with all manner of consequences for how we conceive of risk (and for notions of information dominance and network-centric warfare). But these are central questions for models of decision making under risk.
Frank H. Knight

Further, the question of consequence is no less problematic. What do we mean by consequence and how do we quantify it (because the probability/consequence model of risk demands an ordered consequence)? And how does the need for a quantifiable expression of consequence shape the classes of events and outcomes we consider? Does it bias the questions we ask and information we collect, shifting the world subtly into a frame compatible with the probability/consequence mode of orienting to it? What are the consequences of being wrong in such a case?

Continuum of Conflict, 2015 U.S. National Military Strategy
There is an interesting corollary relationship between the numerical-output model of risk and 1-n lists, in the sense that the numerical output provides a de facto list. Et voilà! The model is useful, at least in one sense.

It offers another kind of list, though, based on the Defense Planning Scenarios. Since each scenario is assigned a numerical value, and since the real numbers are totally ordered, we suddenly have a continuum of conflict. This model may be useful--it certainly makes the complex simple--but is it right in the ways that matter? The continuum makes each of the types of conflict shown effectively similar, differing only in degree. Even the implication of such a continuum is dangerous if it leads military planners to believe the ways and means associated with these forms of conflict to be identical or that one form of conflict can be compartmentalized in our thinking. Perhaps some room should be made for the notion that more is not always simply more; sometimes more is different, but this is an idea explicitly excluded from an ordering like that presented here.
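
A small sketch (scenario attributes and scores wholly invented) shows how the continuum arises: once each scenario is collapsed to a single scalar, the total order of the real numbers does the rest, and differences in kind become differences in degree.

    # Hypothetical sketch: projecting qualitatively distinct scenarios onto a
    # single "intensity" scalar yields a tidy ordering -- and nothing else.
    scenarios = {
        "humanitarian relief": {"violence": 1, "ambiguity": 2, "duration": 4},
        "insurgency":          {"violence": 4, "ambiguity": 9, "duration": 9},
        "major theater war":   {"violence": 9, "ambiguity": 3, "duration": 5},
    }
    intensity = {name: sum(v.values()) for name, v in scenarios.items()}
    for name in sorted(intensity, key=intensity.get):
        print(f"{intensity[name]:3d}  {name}")
    # The continuum is well defined, but insurgency and major theater war now
    # differ only in degree; the qualitative differences between them are gone.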

Blind Men Building Models of an Elephant
Another interesting question arises from the ways in which these conflicts are modeled as we seek to develop computations of the consequences in them or to develop recommendations for the force structures best aligned with the demands of given scenarios. How will we represent the scenarios, our forces, the forces of the adversary, and their respective strategies? Will attrition define the objectives, and, if so, what is the model for attrition we will use, and how does that model apply across the continuum of conflict? Will our enemies be volitional, dynamic, and devious or static and inanimate? Will we make simplifying assumptions of linearity? The assumption sounds esoteric, but it matters: a nonlinear model exhibits behaviors a linear model cannot replicate, may be more difficult to develop and interpret, and is generally more reflective of reality. Stanislaw Ulam's adage--"Using a term like nonlinear science is ... like referring to the bulk of zoology as the study of non-elephant animals"--is a trenchant reminder of this principle.
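
As one concrete illustration (a sketch only, with invented parameters), the classical Lanchester "square law" of aimed-fire attrition shows how quickly even a simple coupled model defeats linear intuition: combat outcomes scale with the square of force size, so more is not simply more.

    # Illustrative Lanchester aimed-fire attrition: dB/dt = -kr*R, dR/dt = -kb*B.
    # At equal effectiveness, fighting power scales with the square of numbers.
    def lanchester(blue, red, kb, kr, dt=0.01, steps=200_000):
        """Euler integration of aimed-fire attrition until one side is gone."""
        for _ in range(steps):
            if blue <= 0 or red <= 0:
                break
            blue, red = blue - kr * red * dt, red - kb * blue * dt
        return max(blue, 0.0), max(red, 0.0)

    # Equal per-shooter effectiveness, 2:1 numbers: the larger force wins with
    # roughly sqrt(2000**2 - 1000**2) ~ 1732 survivors, not the 1000 a linear
    # (attrition-by-subtraction) intuition would suggest.
    print(lanchester(blue=2000.0, red=1000.0, kb=0.01, kr=0.01))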
Modeling Counterinsurgency in Afghanistan
But this does not mean linear representations are necessarily inappropriate or without value, and precise emulation can be taken too far. Will we proceed down a path of nonlinear interactions and voluminous detail, toeing Box's line of "excessive elaboration," as we often do with large-scale campaign simulations or the (perhaps unfairly) infamous effort to model the dynamics of counterinsurgency in Afghanistan? What does utility mean in each of these cases, and what does "right in the ways that matter" mean here?

Or what about our models of human nature and the international system? Are we classical realists, structural realists, institutionalists, Romantics, Marxists, or something else? The structural realism of Kenneth Waltz is famously parsimonious, abstracting a great deal into billiard balls that interact on the basis of power alone (a description that is itself more parsimonious than is fair). But this leaves us with a model that cannot explain critical phenomena and necessitates expansion and refinement--see Stephen Walt's balance of threat, for example, a socially constructed concept outside the Waltz model. In the end, we are faced with a model and not with reality, with approximations of truth and not with truth itself.

This notion is particularly important in thinking about the veracity and utility of our models. They are, in fact, models. In all cases, the intent is an "adequate representation of a single datum of experience." But in studying our models we can become detached from experience and attach ourselves to the models themselves, associate our intellectual value with their form and behavior, and make them into things worthy of study unto themselves. In short, we are wont to reify them, a process Peter Berger and Thomas Luckmann describe as
... the apprehension of the products of human activity as if they were something else than human products--such as facts of nature, results of cosmic laws, or manifestations of divine will. Reification implies that man is capable of forgetting his own authorship of the human world, and further, that the dialectic between man, the producer, and his products is lost to consciousness. The reified world is ... experienced by man as a strange facticity, an opus alienum over which he has no control rather than as the opus proprium of his own productive activity.
Auguste Rodin, The Thinker
This suggests an important remedy to the problem of models that are wrong in ways that matter. If we recognize them as the products of human activity, as opus proprium, and not as handed down from authority, then they can be superseded by new products of human ingenuity. They are not sacred, and when we say a model is wrong, our next thought should never be to apologize for the model ("but it is useful"). Rather, our thoughts should turn to whether the model is right in the ways that matter. This is the only proper way to defend our work.

Finally, if the model is wrong, we must demand a new model more closely aligned to the question of interest, a model right enough to be useful. And this is not just a task for analysts and mathematicians, though it is our duty. This is a task for planners, strategists, operators, decision makers, and everyone else. We must seek the truth, even if we may not find it.

First, however, we should probably scrub Box's exculpatory and damaging aphorism from our decision-making discourse.

4 comments:

  1. Merf, very much agree. Just like "How to Lie with Statistics" is a very unfortunate name for a book that is all about getting it right, not about lying with statistics. Once a catchy phrase or idea enters the vernacular or mainstream, it is almost impossible to turn off. In my MORS Ethics pitch I say this book is our "JAWS," meaning Peter Benchley never set out to scare the world into fearing sharks...with an end result of the lives of many sharks paying the price. How many good statistics found their end in front of a decision maker who, opposed to a concept, chose to believe that the statistics didn't necessarily have to be telling them the truth? Same with useful models...all wrong. Nope, some very right...maybe the climate change models are currently vying for more "rightness" before they can be useful...even though they can't be entirely wrong, some would cast them that way. Now, all that said, there are two cases you haven't considered. I, for instance, never took the Box expression to mean, literally, the model we are using is wrong, please forgive, it's the best we've got. I always took it to mean the model you are using has errors...it's impossible to model reality. Useful models are closer to reality than useless models...which is why we cannot predict with STORM, for instance. And it follows that a model can thus be very, very wrong (with regard to reality) but still be found useful. Now in this second case, and I believe Kent Taylor will chime in here, a model that is very wrong might still have usefulness if the analyst using the model has half a brain. Kent has always used the expression to mean: this model is crap, but we don't care about the model, we care about the analysis. We are smart enough to know where the crap is hiding, so we let the model do what doesn't require thinking, we do the thinking, and thus the analysis is still gold. I'm guilty of this usage as well. So it was never exculpatory for me. But I see exactly what you are saying. Shame on us! And I will thus pay more attention to what others might perceive if the expression is used, so as not to make excuses for our proud profession.

    1. It's like many expressions inside a professional community. We know what we mean, and we tend to mean it with great precision. The problem doesn't become critical until 1) someone, analyst or otherwise, has a vested interest, or 2) we are outside the guild. Further, if the model is truly crap, we do need to care about that fact. Either the analysis does not need it or the analysis does need it. In the former case, it is a waste and a distraction; in the latter, it invalidates your intellectual position. If we mean "crap in almost every way except this limited frame in which it illuminates a particular issue," that's different.

  2. Merf, you threw Kent under the bus for something I attributed to him...so I have to rise in defense. Because I am an engineer, I believe the statement. And since you are a mathematician, you have a visceral reaction to using something that is "crap" to get you close enough (There it is again, LOL!). I'm not going to throw any particular model under the bus, but there is one near and dear to our hearts that is so wrong as to rise to the level of crappy and to be out and out dangerous if used incorrectly or if it were to find itself in the wrong hands. Yet we beat on, boats against the current, borne back ceaselessly into the past...because there is nothing else that can gonculate the extreme complexities within the problem...something back-of-the-envelope or even a spreadsheet can't track because of the number of moving parts. Some would throw it out (CAPE). Others would say we've got nothing else (USAF), so let's do the best we can. I'm split, three ways. I know it's crap. I know we can use it to compare things unrelated to real-world outcomes. And, I agree with you, I know we can use it to illuminate issues unforeseen. I guess you might argue that crap is in the eye of the beholder and an extremely relative thing. In the right hands, which I've allowed for, those hands might elevate the model to slightly higher than a steamy pile; thus it could transcend crap, if only for a short time. I contend, time and time again, years after that transcendence, we've discovered something, deep in the bowels of said model, that makes us say, OMG! We were using a steamy pile all along, and we are propelled back along the lines of CAPE...ready to put a cap in its ass, only to see a new problem come along that we need it for, because it's the only thing on the street. Personally, I believe in the green light, the orgastic future that year by year recedes before us. It eluded us then, but that's no matter--tomorrow we will run faster, stretch out our arms farther.... And one fine morning-- (I take full responsibility for using FSF quotes out of order from the original text--but they just worked that way)

    1. Absolutely wasn't intending to throw Kent under the bus. In fact, I agree completely. And despite being a dirty mathematician, I'm perfectly comfortable with wrong but useful. (Have you noticed that you keep using "mathematician" as a way to impute positions to me that I don't hold and to paint me as some sort of ivory-tower academic in opposition to a practical-man-of-the-world engineer who is, by definition, better? Strange way to argue.) In the longish post, I'm not demanding perfection from our models. I'm asking if they are right in the ways that matter and if they are good enough. Newton's laws are wrong, but they also got us to the moon and back. When you say "close enough," you've stated and applied a standard of utility and a tolerance for error that is appropriate. You've linked how wrong the model is to how well it suits your question, and that means the model is not crap. If it were, you couldn't do those things. There is another standard--how easy it is to use a tool incorrectly--that we might apply as well, and some tools are worse than others in this regard, but that's a question separate from whether or not the model is wrong. (See PowerPoint, for example, and the model "near and dear to our hearts.")

      All is forgiven of a person who uses the Green Light in a discussion about models.
