Friday, September 17, 2010

Lies, damn lies, and ...

“There is always a well-known solution to every human problem—neat, plausible, and wrong.” H.L. Mencken

On the subject of analytic and scientific truth…I haven’t entirely sussed out what this means to analysis, to analysts, and to me, but it leaves me questioning an awful lot of the things I’ve seen, read, and done. If nothing else, it leaves me with an even greater skepticism than I had when I woke up this morning.

I've been reading a little book titled Wrong: Why Experts Keep Failing Us--and How to Know When Not to Trust Them, by David H. Freedman. In this case, “experts” refers to “scientists, finance wizards, doctors, relationship gurus, celebrity CEOs, high-powered consultants, health officials, and more”—pretty much everyone who offers advice or conclusions, in other words—and the book is all about the many and varied ways they (and we) get it…well…wrong most of the time. According to Freedman, we live in a world of “punctuated wrongness,” a world where, according to one expert (the irony here is intentional on my part and acknowledged on Freedman’s), “The facts suggest that for many, if not the majority, of fields, the majority of published studies are likely to be wrong…[probably] the vast majority.” This is a pretty stunning claim. In fact, if I think about this issue as a mathematician—the area of emphasis for most of my formal training and publication—I’m simply staggered by it. But my field is a little special, I suppose, since “truth” (within the axioms) is pretty easy to spot. We may be the only discipline wherein one can actually lay legitimate claim to proving anything, since ours is probably the only completely deductive intellectual endeavor. (That still doesn't mean we have any greater access to Truth, though.) In other fields of inquiry, the fundamental process is inductive—observe, hypothesize, observe, adjust, observe, adjust, etc.—and claims to proof are problematic in the extreme—which doesn’t stop anyone and everyone from using the phrase “studies show” as if they’re quoting from the Book of Heaven. But I also have a fair bit of training in statistics—both on the theory side and in applications—and one of Freedman’s explorations of “wrongness” really hit home.

Why do we use statistical methods in our research? Basically, we want to account for the fact that the world—as we observe it—is stochastic (although whether it is fundamentally stochastic might be an interesting debate) and ensure that the measurements we make, and the inferences we draw from them, are not (likely to be) statistical flukes. So, when we make a claim that some observation is “statistically significant” (not to be confused with a claim that something is “true”—a mistake we see far too often, even in our professional crowd), we mean there is some known probability—the level of significance—that we'll make a (Type I) mistake in our conclusion based on observing a statistical fluke. So, for example, a level of significance of .05 indicates (kinda sorta) a 5% chance that the results we observed are due to chance alone—and that our inferences/conclusions/recommendations are “wrong.” 1 in 20? Not so bad. How do we make the leap from there to “the majority of published studies are…wrong”?
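
For the programmers in the crowd, here's what that 5% looks like in practice. This is just my own toy simulation (Python, leaning on numpy and scipy, with made-up sample sizes), not anything from Freedman's book: test a "theory" that is flat-out false a few thousand times and watch chance hand you "significant" results about 5% of the time.

    # A sketch of what alpha = 0.05 buys you: test a true null hypothesis
    # over and over and count how often chance alone "finds" an effect.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    alpha = 0.05
    n_experiments = 10_000
    false_positives = 0

    for _ in range(n_experiments):
        # Two groups drawn from the SAME distribution: the "theory" is false.
        a = rng.normal(loc=0.0, scale=1.0, size=30)
        b = rng.normal(loc=0.0, scale=1.0, size=30)
        _, p_value = stats.ttest_ind(a, b)
        if p_value < alpha:          # a statistical fluke
            false_positives += 1

    print(f"Fraction of 'significant' flukes: {false_positives / n_experiments:.3f}")
    # Expect something close to 0.05 -- one study in twenty wrong, by design.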

As an exercise for the student, suppose 20 teams of researchers are all studying some novel hypothesis/theory and that this theory is “actually” false. Well, (very roughly speaking) we can expect that 19 of these teams will come up with the correct ("true negative") conclusion and that the 20th will experience a “data fluke” and conclude the mistaken theory is correct (a "false positive"). With me so far? Good. The problem is that this makes for a wonderful theoretical construct and ignores the confounding effects of reality—real researchers with real staff doing real research at real universities/companies/laboratories and submitting results to real journals for actual publication. Freedman has estimates from another set of experts (again with the irony!) indicating that “positive studies” confirming a theory are (on the order of) 10 times more likely to be submitted and accepted for publication than negative studies. So, we don’t get 19 published studies claiming “NO!” and one study crying “YES!” We see 2 negative studies and 1 positive study (using “squint at the blackboard” math)...and 2 out of 3 ain’t bad. (Isn’t that a line from a song by Meatloaf? I think it’s right before “Paradise by the Dashboard Light” on Bat Out of Hell. Anyway…) The other 17 studies go in a drawer, go in the trash, or are simply rejected. Cool, huh? Still…we don’t have anything like a majority of published studies coming out in the category of “wrong.” In the immortal words of Ron Popeil, “Wait! There’s more!”
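
If you want the squint math spelled out, here it is as a few lines of Python. The 10-to-1 publication ratio is the expert estimate Freedman cites; the 10% baseline publication rate for negative studies is my own assumption, picked so the ratio lands roughly where the squinting does.

    # Back-of-the-envelope: 20 teams test a false theory at alpha = 0.05.
    alpha = 0.05
    teams = 20

    false_positives = alpha * teams            # ~1 fluke shouting "YES!"
    true_negatives = teams - false_positives   # ~19 teams saying "NO!"

    # Assumed publication rates: positives ~10x more likely to appear in print.
    pub_rate_negative = 0.1
    pub_rate_positive = 1.0

    published_neg = true_negatives * pub_rate_negative   # ~2 published "NO!"
    published_pos = false_positives * pub_rate_positive  # ~1 published "YES!"

    print(f"Published: ~{published_neg:.0f} negative vs ~{published_pos:.0f} positive")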

Statistical flukes and “publication bias” aren’t the only pernicious little worms of wrongness working their way into the heart of science. “Significance” doesn’t tell us anything about study design, measurement methods, data or meta/proxy-data used, the biases of the researchers, or a brazillion other factors that bear on the outcome of an experiment, and ALL of these affect the results of a study. Each of these is a long discussion in itself, but it suffices to say “experts agree” (irony alert) that these are all alive and well in most research fields. So, suppose some proportion of studies have their results “pushed” in the direction of positive results—after all, positive studies are more likely to get published and result in renewed grants and professional accolades and adoring looks from doe-eyed freshman girls (because chicks dig statistics)—and suppose that proportion is in the neighborhood of an additional 20%. Accepting all these (not entirely made up) numbers, we now have 5 false positives from the original 20 studies. If all five of the “positive” studies and the expected proportion (one tenth) of the “negative” studies get published, we expect to see 7 total studies published, of which 5 come to the wrong conclusion. 5 of 7! Holy Crappy Conclusions, Batman! (Don't go reaching for that bottle of Vioxx to treat the sudden pain in your head, now.)
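
Here is that last step of the toy arithmetic carried through in the same sketchy Python, using the post's "not entirely made up" numbers (the publication rates are still my assumptions from the earlier sketch):

    # One chance fluke plus an assumed extra 20% of studies "pushed" positive.
    import math

    teams = 20
    flukes = 1                                # expected alpha = 0.05 false positive
    pushed = int(0.20 * teams)                # 4 more studies nudged toward "YES!"
    false_positives = flukes + pushed         # 5 wrong "positive" studies
    true_negatives = teams - false_positives  # 15 honest "negative" studies

    # Squint math for publication: all positives appear in print,
    # about one tenth of negatives do -- call it 2.
    published_pos = false_positives
    published_neg = math.ceil(true_negatives * 0.1)

    total = published_pos + published_neg
    print(f"{published_pos} of {total} published studies reach the wrong conclusion")
    # -> 5 of 7. Holy Crappy Conclusions, Batman.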

Freedman, following all of this, goes on to warn that we should not hold science as a method, or scientists themselves, in low regard because of these issues. They are, in fact, our most trustworthy experts (as opposed to diet gurus, self-help goobers, television investment wankers, and other such random wieners). They're the very best we have, at the top of the heap, but “that doesn’t mean we shouldn’t have a good understanding of how modest a compliment it may be to say so.”

KUMBAYA! It’s no wonder we poor humans muddle through life and screw up on such a grand scale so often! I need a drink, and recent studies show that drinking one glass of red wine each day may have certain health benefits…

13 comments:

  1. Eureka Merf! The difference between engineers and mathematicians clearly manifests itself in your response to my question, "...why are we awash in seemingly incompetent decisions?" Five bad apples out of seven constitutes "awash". El Guapo was awash in Gringos with two out of three Amigos identified even before the third fell from the sky. My only issue here is: when do we ever get twenty (20) studies (in our profession)? And when do we get purely quantitative studies that could actually make verifiable factual errors...provided we knew the true answer a priori? Looking back on the history of medical study and science surrounding Lou Gehrig's disease, which apparently Lou Gehrig did not have, there seems to be something less than exact about the exacting research that could lead to a cure (or at least a preventative measure)...some clue that might be "good enough," as we say in the engineering profession. Analysis for public policy decisions is a softer science...even a social one, to annoy the purists of our profession. Should the goal be "good enough," and can we even produce at that level?

    ReplyDelete
  2. Great point that we never get 20 studies in our profession. I was talking about the broader scientific community more than our particular field--and thinking aloud about a set of issues that shocked me because I had never thought through them before.

    But having fewer studies makes the problem worse, in some sense, doesn't it? If you have only 1 study--just one chance to get it right--doesn't the pressure go up? And doesn't the pressure on our program managers increase to get the program right? And doesn't that exacerbate all of the issues associated with biased outcome (that I alluded to but didn't discuss in detail above)?

    Much of the work we do is not really statistically based, of course--no matter how much some people shout about stochastic models--but the issues of bias remain. Which facts do you choose to use? Which mental model of the world do you adopt? What assumptions do you make? How do you weight your factors? Which "experts" (screaming irony) do you enlist to testify on your behalf? What are the consequences of all those decisions? And what are the reasons (both conscious and unconscious) you made them? How does a fine American who loves his/her kids, goes to church, pays their taxes, and wants to do the right thing become “the program manager?”

    And I absolutely buy "good enough." If nothing else, the problems with which we wrestle are so complex and riddled with so many necessary assumptions that there really isn't a "right" answer, and we'll definitely need to adjust fire down the road anyway. But what is "good" and what is "enough," and how do we arbitrate these? Who sets the goal posts, and how do you keep from moving them when the answer isn’t what you thought it should be?

    As an aside...Is this a new way to conceive of public policy decisions and research? Is the scientific method (observe, hypothesize, observe, change hypothesis, etc.) the model we actually use in formulating an evolving policy through which we guide our interaction with an evolving world? Do we just not recognize it because it isn’t “quantitative” (as though quantitative demonstrations aren’t rife with nonsense…see main post…and don’t get me started on false precision)? Does conceiving of it in this way change the rules for policy analysis? Is that world too ideologically driven to even pretend objectivity? Or is it too easy to fake objectivity?

    Now I’m just rambling, so I’ll pause for breath…

    ReplyDelete
  3. I will not "Ramble On" after your ramble, but I will point out that a Zeppelin reference wedged into a blog trumps a Meatloaf reference...

    So definitely you are speaking more broadly about science...because to speak about science (observe, hypothesize, experiment, observe...etc) within the analytic work that we do is to shout loudly into the abyss.

    Nobody wants to hear about the analysis, or that it was done correctly; they only want to hear that you finished the analysis so they can defend what they were going to do with the money in the first place. This history lesson has been learned countless times. The way to win is to do your homework…classic Boyd. Although the process may be flawed and the analysis might not be complete, what is done should strive to turn over as many stones as possible. Science that turns out to be wrong usually comes about because the researchers missed the paradigm, not because they did the math wrong. If you are on the right track, even if incomplete or flawed, a partial answer is better than no answer. If you are on the wrong track, a complete answer is always wrong. Few understand this, which is why we repeat the same studies about every five years or so.

    It's frustrating, but it shouldn’t dissuade us from trying…because I, like the Great Gatsby, believe in the green light, the orgastic future that year by year recedes before us. It eluded us then, but that’s no matter—tomorrow we will run faster, stretch out our arms farther…And one fine morning---

    So we beat on, boats against the current, borne back ceaselessly into the past.

    ReplyDelete
  4. One small quibble, and then I'll cease for a few hours...I don't know if I buy that "science that turns out to be wrong comes about because they missed the paradigm not because they did the math wrong." In the large sense and the long term, that may be true. In the short term, though, you can pick up any newspaper or read any professional or popular scientific publication, and on any given day find stories of intentional and unintentional failures of truth in science. The reasons are legion, but as long as we’re talking paradigms, Kuhn once said that scientists choose what to measure, how they measure it, what measurements they keep, and what they conclude from it all. A paradigm is only one of the influences on these decisions. And in many cases a bad piece of work—patently false—can be a far “stickier” message (to borrow from Gladwell). That’s the really pernicious problem—it’s far harder to speak truth to power—since truth must come with humility, caveats, assumptions, and all—than to speak false certainties.

    This is not to say we cannot seek the truth, and it does not mean we cannot do good work. If I thought good work were not possible, I couldn't get out of bed, and I certainly wouldn’t be having this conversation. But understanding the issues which infringe on good work helps us to ensure we CAN do good work. It helps us to understand how and why work that isn’t good gets done and is believed. This knowledge is a weapon, and armed with such weapons “I will not cease from Mental Fight / Nor shall my Sword sleep in my hand.”

    ReplyDelete
  5. To invoke paradigm is indeed to reference Kuhn, and I did not mean the world is flat. As I wrote it I actually stumbled on the word, thought about Kuhn, dismissed the context of the larger meaning (which is perhaps the better and more accurate definition), and went with pattern or model (I forgot my audience). To solve the problem using the wrong model...which can indeed be very broad, like missing the forest for the trees, or very specific, like using a linear approximation of something that is not linear (I'm guilty of that almost daily, by the way)...is where I was going.

    Of course now that I've been publicly found wanting (called out at least) with regard to my word choice I am forced to pull down my copy of Kuhn and find where he actually uses paradigm in the broader sense. When he said, "... scientists choose what to measure, how they measure it, and what measurements they keep..." he easily could have been talking about the lesser underlying structure of experimentation as a paradigm before jumping to the broader, most quoted, definition of a scientific revolution that changes everything. Maybe, just maybe…

    ReplyDelete
  6. I'm suddenly concerned we're arguing with each other, when I think we're saying much the same thing in different ways. What I intended by the above is that "paradigms" certainly shape the outcomes of experimentation, but they are also shaped by other, non-paradigmatic factors--selfishness, greed, fear, failures of competence, etc.
    The choices we make in conducting some analysis--methodological, statistical, mathematical, yadda yadda yadda--are important, and we need to approach those choices with our eyes wide open and a healthy respect for the factors which bias the choices and our subsequent interpretation.
    BTW...Kuhn's statement that I paraphrased above is in the context of paradigm. I took it out of that context because to assume the only issue which shapes thinking is the preconceived community theory (paradigm) is to ignore fundamental aspects of human behaviour. I was also not working from The Structure of Scientific Revolutions. The better reference is “The Function of Measurement in Modern Physical Science” (Isis, 1961).

    ReplyDelete
  7. I've also been thinking about your original comment, and I wonder if thinking about "the problem" through the statistical lens doesn't have more value in the social and public policy analysis worlds than either of us first implied.

    Isn't there, for example, an implicit statement of expectation about the world--or at least about our ability to meet the challenges in it--in an analysis of future geopolitical states, force structures, capabilities, etc.? Since all of our "models" are limited in some sense and since the future is fundamentally indeterminate—aren't we really always, at some level, speaking of a probability that our recommendations will bear fruit? Consider that any scenario-based assessment supposes a distribution in "scenario space" and is implicitly statistical in its foundation, if not in the methodological or mathematical language used.
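
    A throwaway sketch of what I mean (every number here is invented on the spot): the moment you weight a recommendation's payoff across an assumed distribution over scenarios, you're doing statistics whether you use the word or not.

        # Toy scenario-weighted assessment; all probabilities and payoffs invented.
        scenario_probs = {            # a subjective distribution over "scenario space"
            "status quo": 0.5,
            "regional crisis": 0.3,
            "major conflict": 0.2,
        }
        payoff = {                    # how well each notional option fares (0-1)
            "Option A": {"status quo": 0.9, "regional crisis": 0.6, "major conflict": 0.3},
            "Option B": {"status quo": 0.7, "regional crisis": 0.7, "major conflict": 0.6},
        }

        for option, by_scenario in payoff.items():
            expected = sum(scenario_probs[s] * by_scenario[s] for s in scenario_probs)
            print(f"{option}: expected payoff {expected:.2f}")

        # Change the assumed probabilities and the "best" option can change with
        # them, which is exactly the implicit statistical claim in the recommendation.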

    Just thinking out loud...

    ReplyDelete
  8. Don't worry, we are definitely not arguing -- that's what's so fun about blogging...we are having a productive discussion by challenging each other's thoughts...even if it's just to be clear. This is a really cool way to exchange ideas, and I'm better at it here than in person. If we were in a room having this conversation I would be as quiet as a mouse...I need to think about what was said and take some time to think about a response. I'm like George Costanza...I always think of the zinger after I've left the room (if you are gonna yada yada, I'm gonna drop a Seinfeld reference as well). And don't worry...even if you said, "I think your ideas are full of crap" and we were indeed arguing, nothing in the blogosphere is personal. If someone attacks in public there will be ten people who rise in their defense. Plus I have very thick skin. If we were in person the challenge might never occur, or if it did, feelings might get involved. But a lively discussion, even if we are violently agreeing, gets more readers...which reminds me...where is everyone else?

    No question, if we had access to the right data for social and public policy analysis it would have more value. Collecting the right data and getting it done on time are the two major impediments. In some cases the data doesn't exist, and in most cases there is very little time. Then of course, once we were in possession of the right data and the appropriate amount of time...the actual problems we are discussing crop up. So I think you are correct...it's there, it's always been there, but the question is one of precision...if 75% is as good as we can ever get, even with all the data in the world...and we experts (irony) do the work correctly...in some cases the expert intuition of others might actually be better than the scientific experiment. We battle those who think their intuition is always right with tools and methods that on our best day might get us to the fabled 75%...and on our worst day (let's not talk about that)...

    Oh snap! Spurs and Wolves just kicked off...I will return later.

    ReplyDelete
  9. Hello-- Natasha here, first time blogger, long time listener.... I do agree that social and public policy analysis is a different animal, and I don't know that experts do that well, even if they are good at analyzing data.

    I read a good book over the summer that looked at the weaknesses of "experts" who try to forecast political outcomes. (Expert Political Judgment by Phil Tetlock) No book report here, I just wanted to highlight one section of the book that I think is applicable to your discussion: the self-defenses people use when their analysis is "wrong" or their forecast didn't come true...
    (1) making the point that unlikely things sometimes happen (a la black swan)
    (2) insisting they made the "right" mistake (i.e. even if we didn't find WMD, going into Iraq was a good idea)
    (3) declaring that politics is hopelessly indeterminate (your crystal ball worked better than mine)
    (4) just off on timing
    (5) close-call counterfactual (I was almost right, if not for x, y, z)
    (6) the exogenous shock
    (7) challenging whether the conditions for hypothesis testing were fulfilled (if x had been satisfied, then y would have occurred)

    I think these defenses crop up in all analysis worlds, and being aware of them has made it easier for me to spot one when someone whips one out.

    ReplyDelete
  10. Great point, Natasha. It's funny that the "explaining away" happens in every field, and several of the justifications you describe above are applied in fields beyond political outcomes and policy predictions.

    This is my first whack at the blogosphere, too. Fun so far...but I think math is fun, so take that for what it's worth.

    ReplyDelete
  11. @ Tasha. These are awesome points and ones I think deserve their own blog post. I think we should return to them again and again. If we are calling this blog "Truth in Our Profession," then these points become the "Seven Deadly Sins of Our Profession." I'm going to give each one of these some more thought, but right off the bat: if any of us were ever to be caught uttering one of these seven deadly sins to a customer, we should be immediately excommunicated from our midst. To utter one is to have fundamentally misunderstood the work we are supposed to be doing.

    ReplyDelete
  12. Thank you Merf, Mooch, and Natasha for the cool discussion. Is a foursome okay? We can play golf later…

    Permit me to make an observation using a paraphrase/adaptation from Clausewitz. The theory of strategy is all well and good for understanding war (talk about gutsy – claiming to understand war), but one of its main values is to give to the genius (Pierce’s “intuit”) language to explain what he sees with his coup d'oeil. So, even if my analysis is faulty, it may still have value. That is, it can support the “right” cause – even with its imperfection. I am not, however, implying that the merit of our analysis, in its own right, isn’t important or of the highest value to our profession. I just mean to propose an explanation for a phenomenon that you are describing. Now what do we do when our boss lacks the genius’ insight? I suppose we are left to share our best and strive to be correct. Or find the genius and work for him/her. I find it instructive that Boyd knew his Ps curves long before he proved them. He had the “truth” of it, but first had to learn his math, steal computer time, and burn a hole or two in the boss’s tie before he was heard. Fortunately he had top-cover from some who recognized his genius.

    So it’s not just that our analysis is “correct,” “sticky,” proves our point, or that we picked the correct frameworks (paradigms being one category) for what we are analyzing; we also need to learn to look for the genius of an idea (finding the genius person would be ideal, but for any given situation any mere mortal has the potential to “get it”) – how do we do that? Do you three have that in your checklists? I don’t hear much talk about it – perhaps it’s not possible. Is it possible? How do you do it?

    Okay, eyes wide open, but not seeing exactly what I am looking at. Do you see it too?

    Mooch, did I just do one of the seven deadly sins?

    ReplyDelete
  13. I think it's our collective position that intuition must be learned and practiced...that sounds odd, but Napoleon's glance was possible only because the situations that would unfold before him were so similar to many situations that had come before.

    My position is that 95% of the analysis we do, at most, will reinforce the intuition of those with the proper experience to understand it...the problem is we have many decision makers without that experience, and that's when the analysis runs into trouble...and when we are simply faced with something new and everyone stares at the problem like a pig looking at a watch. We also have tools and techniques to decompose the problem into things we might then understand.

    Along with the 95% of work that reinforces intuition come the helpful attributes of analysis: the framework to capture and visualize the decision space; the fact that the same framework makes the analysis repeatable and traceable, so we know where the results came from and can do it again; and finally our ability to use that framework for sensitivity analysis, changing our subjective inputs to see how badly the picture changes if they are wrong.
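
    Something like this toy weighted-score sweep (all the weights and scores are invented, obviously) is the kind of sensitivity check I mean: nudge each subjective weight and see whether the preferred option flips.

        # Toy one-at-a-time sensitivity sweep over subjective criterion weights.
        criteria = ["cost", "risk", "capability"]
        weights = {"cost": 0.5, "risk": 0.3, "capability": 0.2}   # subjective inputs
        scores = {                                                # notional 0-10 judgments
            "Option A": {"cost": 8, "risk": 3, "capability": 6},
            "Option B": {"cost": 5, "risk": 8, "capability": 7},
        }

        def total(option, w):
            return sum(w[c] * scores[option][c] for c in criteria)

        def renormalize(w):
            s = sum(w.values())
            return {c: v / s for c, v in w.items()}

        baseline = max(scores, key=lambda o: total(o, weights))
        print("Baseline pick:", baseline)

        for c in criteria:
            for delta in (-0.1, +0.1):
                w = dict(weights)
                w[c] = max(0.0, w[c] + delta)
                w = renormalize(w)
                pick = max(scores, key=lambda o: total(o, w))
                if pick != baseline:
                    print(f"Pick flips to {pick} when the {c} weight shifts by {delta:+.1f}")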

    Of course, every once in a while the 5% pokes up its head and the results of the analysis are counterintuitive. That's the work we live for...except 4% of that turns out to be counterintuitive because we screwed up or we missed something. That leaves 1% for real discovery.

    So I don't think you are wrong about intuition...but very few people have the right intuition and a large percentage of people think they do.

    ReplyDelete