
Friday, August 7, 2015

Right, Wrong, and Relevance


I'd like to introduce a young man whose studies in chemistry at the University of London were interrupted by the Second World War. As a chemist, he was assigned to a chemical defense experimental station with the British Army Engineers and undertook work determining the effects of poison gases. There, he was faced with mountains of data on the effects of various doses of various compounds on rats and mice. Since the Army could provide no statistician, and since this bright lad had once read R.A. Fisher's revolutionary Statistical Methods for Research Workers, he was drafted to do the work. Thus was born a statistician who would become the Director of the Statistical Techniques Research Group at Princeton (where he married one of Fisher's daughters), create the Department of Statistics at the University of Wisconsin, and exert incredible influence in the fields of statistical inference, robustness (a word he defined and introduced into the statistical lexicon), and modeling; experimental design and response surface methodology; time series analysis and forecasting; and distribution theory, transformation of variables, and nonlinear estimation ... one might just as well say he influenced "statistics" and note that a working statistician owes much of his or her craft to this man whether they know it or not. Ladies and gentlemen, I'd like to introduce George Edward Pelham Box.

George Box
But Box hardly needs an introduction. You already know him and no doubt quote him regularly even if you are not a professional analyst or statistician. Let me prove it to you. George Box, in his work with Norman Draper on Empirical Model-Building and Response Surfaces, coined the phrase, "Essentially, all models are wrong, but some are useful."

We've all heard this aphorism, even if we do not know it springs from George Box. It is true in the most fundamental sense possible, and it is a profoundly important insight. What isn't widely understood or acknowledged, though, is how incredibly dangerous and damaging the idea is and why, despite its veracity, we should ruthlessly suppress its use. True but dangerous? How is this possible?

Listen carefully the next time you hear this mantra invoked. The key is the manner in which most use the statement, emphasizing the first half as exculpatory ("It doesn't matter that my model is wrong, since all models are") and the latter half as permissive. The forgiveness of intellectual sins implicit in the first half of the statement demands no examination of the sin and its consequences from the analyst or from the programmatic and planning partisan; we are forgiven, for we know not what we do ... though we should know and should not be forgiven for turning a blind eye.

Once we are forgiven, the model's utility is elevated to the sole criterion of interest, but it is a criterion with no definition. As such, it admits all manner of pathologies that contradict Box's intent in framing this discussion of statistical methods. Consider a somewhat more expansive treatment of the same concept. In the same work quoted above, Box and Draper wrote,
Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.
And earlier, in a marvelous 1976 paper, Box averred,
Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so over-elaboration and over-parameterization is often the mark of mediocrity.
In each case, the question of utility is tied explicitly to the question of how wrong the model is or is not. This is precisely the emphasis of Einstein's famous injunction, often rendered as "as simple as possible, but no simpler." He actually said,
It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.
In all cases, it is the phenomenon under investigation, the datum of experience, that forms the basis for evaluating the relative utility of the model that is, of necessity, not a perfect representation of the phenomenon. The question is not whether the model is right or wrong, useful or useless. The real question of interest is whether the model is right in the ways that matter. The two concepts, separated by a lowly conjunction in Box's famed quote, are not separate in his intellectual construct. Yet they are very much so in the framing of the quote as typically (ab)used.

A Spherical Chicken and a Vacuum?
Why does this matter? As a separate idea, "useful" is applied without considering the meaning of the word. Does it mean useful in the sense that it illustrates and illuminates some natural phenomenon, as Box would have it? Or does it mean that the model is simple enough to be understood by a non-technical audience? On the other hand, it may mean that our tools appear to deliver answers to life, the universe, and everything, a chimera some see as worth chasing. Or perhaps it means that the model is simple in the sense that Herbert Weisberg uses to characterize "willful ignorance" which "entails simplifying our understanding in order to quantify our uncertainty as mathematical probability." And it is no great stretch to extend this notion of willful ignorance to the ways in which we frame underlying assumptions regarding the structure of the elements and interactions in a model to facilitate their mathematical and/or digital representation. There is a great physics joke in this vein involving a spherical chicken in a vacuum, but that's a story for another day. If any of these begin to affect our assertions regarding utility, we have crossed over into a territory where utility becomes a permissive cause for intellectual failure, and that is a dangerous territory.

So why write about these things? The answer is simple. These questions affect every aspect of nearly every problem a military analyst will face--whether that analyst is an operations research analyst, an intelligence analyst, a strategist engaged in course of action analysis, or something else entirely. Examples abound.

Consider the ubiquitous 1-n list, a model of decision making that problematically imposes a strict, transitive order on preferences, treats cost and benefit as marginal with all the path-dependence possibilities that entails, and does not typically account for interactions and dependencies across the list, all of which compromise the utility of the list as a tool. The model is, however, simple, easy to explain, and conveys some sense of rigor in the construction of the list ... even if none exists. Useful indeed.
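To make the transitivity problem concrete, here is a minimal sketch (hypothetical programs and criteria, not drawn from any real portfolio) of how perfectly reasonable pairwise judgments can refuse to fit any 1-n list:

```python
# A minimal sketch (hypothetical criteria and programs) of why a strict 1-n
# ordering can misrepresent preferences: three evaluation criteria, each
# ranking three programs differently, produce a Condorcet-style cycle under
# majority aggregation -- no transitive 1-n list represents them faithfully.

criteria_rankings = [
    ["A", "B", "C"],   # e.g., cost
    ["B", "C", "A"],   # e.g., schedule
    ["C", "A", "B"],   # e.g., capability delivered
]

def majority_prefers(x, y):
    """True if a majority of criteria rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in criteria_rankings)
    return votes > len(criteria_rankings) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")
# Prints True for A>B, B>C, and C>A -- a cycle. Any 1-n list imposes a
# transitive order that at least one of these pairwise judgments contradicts.
```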

Or consider the notion of risk as an expected value expressed via the product of probability and consequence. With no meaningful characterization of the underlying distribution in the probability half of this formula, risk degenerates to a simple point estimate with no consideration of the heaviness of the probability tails and the relative likelihood of extremity. Or worse, it is implicitly viewed as a Gaussian distribution because that is what we've been taught to expect, and extreme outcomes are unwittingly eliminated from our calculus. On a related note, when considering a given scenario (within the scope of the various Defense Planning Scenarios) and speaking of risk, are we considering the likelihood of a given scenario (by definition asymptotically close to zero) or the likelihood of some scenario in a given class? This sounds a bit academic, but it is also the sort of subtle phenomenon that can influence our thinking based on the assessment frame we adopt. As such, characterizing the output of such a model as a description of risk is specious at best.
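A toy illustration of the point-estimate problem, with made-up numbers rather than any real threat data: two loss distributions share the same expected value, and therefore the same probability-times-consequence "risk," while implying radically different exposure to extreme outcomes.

```python
# A minimal sketch (illustrative numbers only): two loss distributions with
# identical means collapse to the same "risk" under the expected-value
# formula, yet differ enormously in their tails.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000

# Thin-tailed scenario: losses roughly Gaussian around 100.
thin = rng.normal(loc=100.0, scale=15.0, size=n)

# Heavy-tailed scenario: lognormal losses rescaled to the same mean of 100.
heavy = rng.lognormal(mean=3.0, sigma=1.5, size=n)
heavy *= 100.0 / heavy.mean()

for name, losses in [("thin-tailed", thin), ("heavy-tailed", heavy)]:
    print(f"{name:12s} mean={losses.mean():7.1f}  "
          f"p99={np.percentile(losses, 99):8.1f}  "
          f"max={losses.max():11.1f}")
# Identical "risk" by the expected-value calculus; the heavy-tailed scenario
# nonetheless admits losses orders of magnitude beyond the point estimate.
```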

John Maynard Keynes
This isn't the end of the issue vis-à-vis probability, though, and there are deeper questions about the model we use as we seek some objective concept of probability to drive our decisions. The very notion of an objective probability is (or at least once was and probably still should be) open to doubt. Consider A Treatise on Probability, a seminal work of John Maynard Keynes--a mathematician and philosopher long before he became one of the fathers of modern macroeconomics--or Risk, Uncertainty, and Profit by Frank H. Knight, both first published in 1921. Both, in the formative days of the modern theory of probability, put forward the notion that probability is inherently subjective. Knight, for example, includes in his notion of risk (i.e., probability) the question of confidence: "The action which follows upon an opinion depends as much upon the confidence in that opinion as upon the favorableness of the opinion itself." But if subjective confidence is inherent to assessments of probability and risk, we enter into all manner of human cognitive shenanigans. Does increasing information increase the objective assessment of probability, the subjective assessment of confidence, both, or neither? There is some evidence to suggest the second and not the first, with all manner of consequences for how we conceive of risk (and for notions of information dominance and network-centric warfare). But these are central questions for models of decision making under risk.
Frank H. Knight

Further, the question of consequence is no less problematic. What do we mean by consequence and how do we quantify it (because the probability/consequence model of risk demands an ordered consequence)? And how does the need for a quantifiable expression of consequence shape the classes of events and outcomes we consider? Does it bias the questions we ask and information we collect, shifting the world subtly into a frame compatible with the probability/consequence mode of orienting to it? What are the consequences of being wrong in such a case?

Continuum of Conflict,
2015 U.S. National Military Strategy
There is an interesting corollary relationship between the numerical output of the risk model and 1-n lists, in the sense that the numerical output provides a de facto list. Et voilà! The model is useful, at least in one sense.

The risk model offers another kind of list, though, based on the Defense Planning Scenarios. Since each scenario is assigned a numerical value, and since the real numbers are well ordered, we suddenly have a continuum of conflict. This model may be useful--it certainly makes the complex simple--but is it right in the ways that matter? The continuum makes each of the types of conflict shown effectively similar, differing only in degree. Even the implication of such a continuum is dangerous if it leads military planners to believe the ways and means associated with these forms of conflict are identical or that one form of conflict can be compartmentalized in our thinking. Perhaps some room should be made for the notion that more is not always simply more; sometimes more is different, but this is an idea explicitly excluded from an ordering like that presented here.

Blind Men Building Models of an Elephant
Another interesting question arises from the ways in which these conflicts are modeled as we seek to compute their consequences or to develop recommendations for the force structures best aligned with the demands of given scenarios. How will we represent the scenarios, our forces, the forces of the adversary, and their respective strategies? Will attrition define the objectives, and, if so, what model of attrition will we use and how does that model apply across the continuum of conflict? Will our enemies be volitional, dynamic, and devious or static and inanimate? Will we make simplifying assumptions of linearity? That assumption sounds esoteric, but it matters: a nonlinear model exhibits behaviors a linear model cannot replicate, may be more difficult to develop and interpret, and is generally more reflective of reality. Stanislaw Ulam's adage--"Using a term like nonlinear science is ... like referring to the bulk of zoology as the study of non-elephant animals"--is a trenchant reminder of this principle.
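To see what "behaviors a linear model cannot replicate" means in practice, here is a minimal sketch using the logistic map as a stand-in (not any particular campaign model): the same one-in-a-million perturbation dies away in a stable linear recurrence and grows to order one in a simple nonlinear one.

```python
# A minimal sketch (logistic map as a stand-in for "a nonlinear model"):
# identical tiny perturbations vanish under a stable linear recurrence but
# are amplified beyond recognition by a simple nonlinear recurrence.
def iterate(f, x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

linear = lambda x: 0.9 * x + 0.04         # stable linear recurrence
logistic = lambda x: 4.0 * x * (1.0 - x)  # logistic map in its chaotic regime

for name, f in [("linear", linear), ("nonlinear", logistic)]:
    a = iterate(f, 0.400000, 50)
    b = iterate(f, 0.400001, 50)          # perturbed by one part in a million
    print(f"{name:9s} |difference after 50 steps| = {abs(a[-1] - b[-1]):.6f}")
# The linear trajectories stay within ~1e-6 of each other; the nonlinear
# trajectories diverge to order one from the very same perturbation.
```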
Modeling Counterinsurgency
in Afghanistan
But this does not mean linear representations are necessarily inappropriate or without value, and precise emulation can be taken too far. Will we proceed down a path of nonlinear interactions and voluminous detail, toeing Box's line of "excessive elaboration," as we often do with large-scale campaign simulations or the (perhaps unfairly) infamous effort to model the dynamics of counterinsurgency in Afghanistan? What does utility mean in each of these cases, and what does "right in the ways that matter" mean here?

Or what about our models of human nature and the international system? Are we classical realists, structural realists, institutionalists, Romantics, Marxists, or something else? The structural realism of Kenneth Waltz is famously parsimonious, abstracting states into billiard balls that interact on the basis of power alone (a description that is itself more parsimonious than is fair). But this leaves us with a model that cannot explain critical phenomena and necessitates expansion and refinement--see Stephen Walt's balance of threat, for example, a socially constructed concept outside the Waltz model. In the end, we are faced with a model and not with reality, with approximations of truth and not with truth itself.

This notion is particularly important in thinking about the veracity and utility of our models. They are, in fact, models. In all cases, the intent is an "adequate representation of a single datum of experience." But in studying our models we can become detached from experience and attach ourselves to the models themselves, associate our intellectual value with their form and behavior, and make them into things worthy of study unto themselves. In short, we are wont to reify them, a process Peter Berger and Thomas Luckmann describe as
... the apprehension of the products of human activity as if they were something else than human products--such as facts of nature, results of cosmic laws, or manifestations of divine will. Reification implies that man is capable of forgetting his own authorship of the human world, and further, that the dialectic between man, the producer, and his products is lost to consciousness. The reified world is ... experienced by man as a strange facticity, an opus alienum over which he has no control rather than as the opus proprium of his own productive activity.
Auguste Rodin, The Thinker
This suggests an important remedy to the problem of models that are wrong in ways that matter. If we recognize them as the products of human activity, as opus proprium, and not as handed down from authority, then they can be superseded by new products of human ingenuity. They are not sacred, and when we say a model is wrong, our next thought should never be to apologize for the model ("but it is useful"). Rather, our thoughts should turn to whether the model is right in the ways that matter. This is the only proper way to defend our work.

Finally, if the model is wrong, we must demand a new model more closely aligned to the question of interest, a model right enough to be useful. And this is not just a task for analysts and mathematicians, though it is our duty. This is a task for planners, strategists, operators, decision makers, and everyone else. We must seek the truth, even if we may not find it.

First, however, we should probably scrub Box's exculpatory and damaging aphorism from our decision-making discourse.

Tuesday, July 14, 2015

Ladies Tasting Tea, Determinism, and Comfort With Contingency

Karl Pearson, Public Domain

At the encouragement of a statistician friend of mine, I recently read David Salsburg's book, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. This was a delightful read in a hundred ways, especially in its effort to humanize the statistical luminaries of the 20th century, and I highly recommend it. That said, there was one idea--it seems, in fact, as if this idea is a main objective for writing the book--that left me wondering whether the author had taken leave of his senses. Bear with me while I work through this idea and its implications for analysis.

Here is what Salsburg has to say:
Over a hundred years ago, Karl Pearson proposed that all observations arise from probability distributions and that the purpose of science is to estimate the parameters of those distributions. Before that, the world of science believed that the universe followed laws, like Newton's laws of motion, and that any apparent variation in what was observed were due to errors ... Eventually, the deterministic approach to science collapsed because the differences between what the models predicted and what was actually observed grew greater with more precise measurements. Instead of eliminating the errors that Laplace thought were interfering with the ability to observe the true motion of the planets, more precise measurements showed more and more variation. At this point, science was ready for Karl Pearson and his distributions with parameters.
The first thing that troubles me is a strange, Platonic (metaphysical) realism in the statement that "observations arise from probability distributions" as if these distributions were actual things rather than mathematical characterizations of the observed probabilistic behavior of actual things (or the probabilistic observations of the deterministic behaviors of actual things). At the risk of labeling myself some sort of radical nominalist, this seems an odd and difficult pill to swallow. This does not mean, however, that Pearson's effort to shift our attention from individual observations to more fundamental concepts of parameters that describe the totality of observations is problematic. It only means that the notion of an unobservable pure distribution of which we observe only imperfect shadows is an infelicitous representation of Pearson's work. So, this is not the major objection to Salsburg's purpose and point, but it is the (shaky) foundation on which he proceeds to build his house, and it is the house that presents the more significant problem.

Salsburg seems to fundamentally misunderstand the concept of Kuhn's paradigm shift, the accumulation of anomalies (i.e., the growing differences between observations and expectations in planetary motions, in this case), and the relationship between these phenomena and the philosophical positions of determinism and probabilism. (Incidentally, he also seems to reify this model of science, but that's a problem for another day.) The increasing variation from prediction lamented as a flaw of worldview is in fact such a flaw--not a flaw in determinism as such, but a flaw in the model of planetary dynamics derived from Newton's laws of gravity and motion. The model of planetary motion was wrong--as models are--and this manifested more clearly once methods of measurement improved. This leads not to a revolution in probabilistic worldviews but rather to a revolution in the model of gravity and planetary motion (i.e., relativity). So, while the errors of measurement are probabilistic, the source of the changing error is systemic. These are different and need to be treated differently (one statistically and one deterministically).

Henri Poincaré
Public Domain
That means there is no fundamental disagreement between the worldviews--probabilistic and deterministic--that Salsburg sets in opposition to each other (at least as he's characterized them ... there are deeper philosophical divides, but Salsburg is really a determinist in disguise). Henri Poincaré writes in Chapter IV of The Foundations of Science that "we have become absolute determinists, and even those who want to reserve the rights of human free will let determinism reign undividedly in the inorganic world at least." He then goes on to discuss in detail the nature of chance, or "the fortuitous phenomena about which the calculus of probabilities will provisionally give information" and describe two fundamental forms of chance: statistically random phenomena and sensitivity to initial conditions. He writes:
If we could know exactly the laws of nature and the situation of the universe at the initial instant, we should be able to predict exactly the situation of this same universe at a subsequent instant. But even when the natural laws should have no further secret for us, we could know the initial situation only approximately.
Since we can know the exact condition of the universe only approximately (because we are finite, because humans have freedom of non-rational choice, because we are irrational and our models shape our observations, because Heisenberg dictates that imprecision is fundamental, etc.), all phenomena are thus to some degree or another functionally probabilistic for even the most determined determinist.

Carl von Clausewitz
Public Domain
The form of chance observed is then a product of the underlying dynamics and laws of the system under observation. Are we dealing with statistically random phenomena in which, when we have eliminated large and systemic errors, "there remain many small ones which, their effects accumulating, may become dangerous" and produce results attributed "to chance because their causes are too complicated and too numerous?" (The similarity to Clausewitz's discussion of friction is no coincidence.) Or are we dealing with nonlinear phenomena in which the single small error (or the butterfly flapping its wings) yields outcomes all out of proportion to the error? Is there a structural reason for the particular distribution we see in the chance behavior? And what parameters describe these distributions?
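Poincaré's first form of chance is easy to exhibit. Here is a minimal sketch (hypothetical error sources, not real measurement data): sum many small, independent errors, each drawn from a decidedly non-Gaussian source, and a predictable distribution with estimable parameters emerges.

```python
# A minimal sketch (hypothetical error sources) of Poincare's first form of
# chance: many small, independent, individually negligible errors accumulate
# into a predictable distribution. Each source here is uniform, yet the sum
# tends toward a Gaussian whose parameters we can state in advance.
import numpy as np

rng = np.random.default_rng(seed=1)
n_trials, n_sources = 100_000, 200

# Each trial sums 200 small errors whose individual causes are, in
# Poincare's phrase, "too complicated and too numerous" to track.
errors = rng.uniform(-0.01, 0.01, size=(n_trials, n_sources)).sum(axis=1)

print(f"mean  ~ {errors.mean():+.5f}  (theory: 0)")
print(f"stdev ~ {errors.std():.5f}  (theory: {0.01 * (n_sources / 3) ** 0.5:.5f})")
# The structure of the process (many small independent causes) dictates both
# the distribution (Gaussian, by the central limit theorem) and its
# parameters -- precisely Pearson's "distributions with parameters."
```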

These are important questions for analysts, with important implications. We bound our systems in time, space and scope for the purposes of tractability, introducing error. We make assumptions regarding the structure of our systems (analogous to the application of Newton's laws to planetary motion), introducing more errors. We measure, anticipate, and assume all manner of inputs to our analytic systems, introducing yet more error.
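One practical response, sketched below with a hypothetical toy model and assumed (not measured) input uncertainties: propagate the errors we introduce through the analysis by Monte Carlo and examine the structure of the output error, rather than reporting a single point estimate.

```python
# A minimal sketch (toy model, assumed input uncertainties) of interrogating
# introduced error: propagate each input's assumed uncertainty through the
# model by Monte Carlo and inspect the structure of the resulting output
# error, rather than reporting one point estimate.
import numpy as np

rng = np.random.default_rng(seed=2)
n = 100_000

def model(a, b):
    """A toy nonlinear model standing in for any bounded, assumed system."""
    return a * np.exp(b)

a = rng.normal(10.0, 1.0, size=n)   # input with an assumed ~10% error
b = rng.normal(0.5, 0.2, size=n)    # uncertain rate parameter

out = model(a, b)
point = model(10.0, 0.5)            # the single answer a point estimate gives

print(f"point estimate : {point:8.2f}")
print(f"MC mean        : {out.mean():8.2f}  (nonlinearity shifts the mean)")
print(f"MC 5th-95th pct: {np.percentile(out, 5):7.2f} to {np.percentile(out, 95):7.2f}")
# The propagated error is skewed and wider than intuition suggests; the
# point estimate is not even the mean of the output distribution.
```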

So what does this mean for us? As analysts we must ask ourselves every day, "What errors are we introducing, what is their character, what is their structure, and how will they interact with other errors and the system itself?" And we must become comfortable facing these uncertainties (something occasionally difficult for those of us with too many math classes under our belts).

Reading, thinking and writing about something for analysts to consider.