Survivorship Bias And The Stories You Never Hear
Abraham Wald and the Planes
Abraham Wald's bomber analysis is the canonical example of survivorship bias for good reason: the logic is clean, the stakes were high, and the correct answer is counterintuitive in exactly the way the bias predicts.
Wald was a Hungarian-born Jewish mathematician who fled to the United States in 1938 after the Nazi annexation of Austria. During the war he worked for the Statistical Research Group at Columbia University, a group that applied statistical methods to military problems. The bullet hole problem is his most famous contribution.
The military's data showed that returning bombers had damage concentrated in specific areas — wings, fuselage, certain sections of the tail. The obvious inference was that these areas were being hit by enemy fire and should be reinforced. Wald argued that this reasoning was backwards.
The planes the military was examining had survived enemy fire. The fact that they had damage in the wings and fuselage and had returned meant that wing and fuselage damage was survivable. The planes that had been shot down — the ones not in the data — were presumably being taken down by other kinds of hits. The absence of engine damage in the survivor data was evidence that engine damage was lethal. It didn't show up because planes with engine damage didn't return.
Wald's recommendation: reinforce the areas with no bullet holes. The military's initial intuition was the opposite. The correct answer required recognizing that the sample (returning planes) was not representative of the full population (all planes), and that the selection mechanism — planes that survive return; planes that don't, don't — created systematic bias in the visible data.
This is a logical structure that appears everywhere, and recognizing it requires the same move Wald made: asking who's not in the sample and why.
The Selection Mechanism
Survivorship bias is a special case of selection bias: the sample you observe is not representative of the population you want to draw conclusions about, because the sampling process is systematically non-random.
In the plane case, the selection mechanism is physical survival. In other contexts:
Reporting bias: Only newsworthy events get reported. The base rate of unremarkable outcomes is invisible. This distorts perception of risk (news coverage of plane crashes vastly overstates flight risk relative to car travel), of politics (dramatic events dominate coverage; slow structural changes are invisible), and of medical harms (reported side effects are not a random sample of all side effects; which ones get reported correlates with severity and with the characteristics of the reporting population).
Publication bias: Scientific journals prefer to publish positive findings. Studies that find no effect are less likely to be submitted, and when submitted, less likely to be accepted. This means the published literature systematically overstates effect sizes and the prevalence of positive results. Meta-analyses that don't correct for publication bias overestimate effects. The replication crisis in psychology is partly a consequence: the published record was biased toward dramatic, positive findings that were easier to publish but harder to replicate. A short simulation after this list shows how large that distortion can be.
Recall bias: Survivors of an experience recall it; non-survivors may not be around to recall it, or may not be asked. Soldiers who survived combat are interviewed about what made them effective; those who didn't survive are absent. The advice of combat veterans is useful, but it's advice drawn from a specific subset — the ones who survived to give advice.
Sample attrition: In longitudinal studies, participants who drop out tend to differ systematically from those who stay. If the dropouts are doing worse, and worse outcomes are the thing you're measuring, your final sample will look better than the original population. Many therapy and addiction treatment outcome studies suffer from this: the people who don't respond to treatment stop showing up, leaving a sample biased toward responders.
Market selection: Businesses that are still operating have demonstrated only that they could survive market selection. Studying surviving businesses to understand what makes businesses successful confounds success-relevant characteristics with survival-relevant characteristics, which overlap but are not identical. Surviving businesses may have survived because of genuine quality, or because of luck, or because of characteristics (flexibility, low overhead, conservative financing) that correlate with survival but not necessarily with long-term excellence.
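The publication-bias mechanism is easy to make concrete. Here is a minimal simulation in Python; the numbers (a true standardized effect of 0.2, thirty subjects per study, a crude one-sided significance cutoff) are invented for illustration, not drawn from any particular literature.

```python
# A toy model of publication bias: many small studies estimate the same
# modest true effect, but only the "significant" ones get published.
# The average published estimate ends up well above the truth.
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2      # assumed true effect size (standardized)
n_per_study = 30       # assumed sample size per study
n_studies = 10_000

# Each study's estimate is the mean of n_per_study noisy observations.
estimates = rng.normal(true_effect, 1.0, (n_studies, n_per_study)).mean(axis=1)

# Crude filter: a study is "published" only if its estimate exceeds
# roughly 1.96 standard errors (a one-sided significance cutoff).
standard_error = 1.0 / np.sqrt(n_per_study)
published = estimates[estimates > 1.96 * standard_error]

print(f"true effect:                {true_effect:.2f}")
print(f"mean across all studies:    {estimates.mean():.2f}")
print(f"mean of published studies:  {published.mean():.2f}")
print(f"share of studies published: {len(published) / n_studies:.0%}")
```

Under these assumptions the mean of the published studies comes out at roughly twice the true effect, even though every individual study was run honestly. A meta-analysis that pools only the published rows inherits that inflation, which is exactly the distortion described above.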
The Success Literature Problem
The business success genre is a billion-dollar industry and one of the most systematically biased information sources in common use.
The canonical problem: Jim Collins's Good to Great (2001) identified eleven companies that transitioned from good to sustained greatness and analyzed what they had in common. The book generated enormous consulting revenue and is still widely cited. Phil Rosenzweig's The Halo Effect (2007) pointed out the fundamental methodology problem: Collins selected the companies based on outcomes and then identified characteristics that the successful companies shared. This confounds correlation with causation and ignores survivorship.
Companies that failed might have shared many of the same characteristics as the successful ones at the time of the transition. You can't know from Collins's method, because he only looked at successes. The characteristics identified as driving success might actually be ubiquitous characteristics of large companies at that stage, some of which happened to succeed and some of which didn't.
The halo effect compounds this: when we know a company succeeded, we perceive its characteristics more positively. The CEO of a company that thrives is described as "decisive and visionary"; the same behavior in a CEO of a company that fails is described as "stubborn and out of touch." Retrospective assessments of company culture, leadership, and strategy are contaminated by outcome knowledge.
Nassim Taleb makes a related point about "black swans" in business: rare, extreme events drive outcomes in ways that make the past record of survivors a poor guide to strategy. If the distribution of business outcomes has a fat tail — a small number of businesses generating enormous returns while most fail — then strategies optimized to avoid failure may be systematically different from strategies optimized to achieve extreme success. The survivor record tells you about the strategies of the extreme successes, but not about the distribution of outcomes from those strategies, which included many failures you don't see.
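A toy version of that asymmetry, with invented figures (a 2 percent survival rate and a 30x payoff for survivors), shows the gap between what the survivor record says and what the strategy is worth on average:

```python
# A toy model of a fat-tailed, high-variance strategy: most firms go bust,
# a few win enormously. The survivor record looks spectacular even when
# the strategy's expected value is poor.
import numpy as np

rng = np.random.default_rng(1)

n_firms = 100_000
p_survive = 0.02          # assumed: 2% of firms survive
survivor_payoff = 30.0    # assumed: survivors return 30x starting capital
failure_payoff = 0.0      # everyone else loses everything

survived = rng.random(n_firms) < p_survive
payoffs = np.where(survived, survivor_payoff, failure_payoff)

print(f"mean payoff, survivors only: {payoffs[survived].mean():.1f}x")  # 30x
print(f"mean payoff, all attempts:   {payoffs.mean():.2f}x")            # ~0.6x
```

The survivor record describes a 30x strategy; the full distribution of attempts describes a strategy that loses money on average. Nothing in the survivors' own accounts distinguishes the two.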
Seeking Out the Failures
Correcting for survivorship bias requires proactive effort because failure is invisible by default. You have to go looking for it.
Base rate information. Before drawing conclusions from success stories, find the base rate. What fraction of people who tried this approach succeeded? What's the failure rate for businesses in this industry? What's the success rate for this type of medical treatment? Base rates aren't always available, but when they are, they're essential context. A small worked example after this list shows how much the unknown number of attempts changes the picture.
Failure studies. Some domains have systematic failure research. Aviation has accident investigation (the NTSB investigates every significant crash and makes its findings public, a systematic effort to learn from failures rather than only from survivors). Medicine has complication registries, adverse event reporting, and morbidity and mortality conferences. These are institutionalized mechanisms for keeping failure visible.
Active solicitation of failure narratives. When talking to practitioners in a domain, don't just ask about successes. Ask about failures: "What have you tried that didn't work? What do you see people do that causes projects like this to fail?" The failure narratives contain information the success narratives can't provide.
Pre-mortem as anti-survivorship tool. The pre-mortem forces you to imagine failure before it happens, preventing the success stories from dominating your planning. It's a technique to synthetically introduce failure data into a decision process that would otherwise be driven by available success examples.
Tracking the missing. Ask who is not in the conversation, whose experience is not represented, what stories you haven't heard. If you're evaluating a therapy, ask about people who didn't respond. If you're evaluating a business strategy, ask about companies that tried it and failed. If you're evaluating an educational approach, ask about the students it didn't work for.
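The base-rate point can be reduced to arithmetic. Suppose, hypothetically, you have heard twenty success stories about some approach. What that implies depends entirely on how many people tried it, a number the success stories themselves never reveal:

```python
# Hypothetical numbers: twenty visible success stories, unknown attempts.
# The same visible evidence is consistent with wildly different base rates.
visible_successes = 20

for attempts in (40, 400, 4_000, 40_000):
    base_rate = visible_successes / attempts
    print(f"{attempts:>6} attempts -> success rate {base_rate:.2%}")
```

Twenty stories out of forty attempts is a coin flip; twenty out of forty thousand is a lottery. The stories look identical either way, which is why the denominator has to be hunted down separately.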
The Visibility Asymmetry
A deeper structural point: success generates evidence that persists; failure generates evidence that disappears.
The successful startup founder writes a book, gives keynotes, is cited in the press. The failed startup founder goes to work in a different industry and doesn't publicize the failure. The successful book sells for decades; the unsuccessful books go out of print. The successful organization lives to tell its story; the failed one does not. The successful surgical technique gets adopted and documented; the abandoned approaches leave sparse records.
This creates a built-in distortion in the accumulation of knowledge. The evidence that persists is systematically biased toward success. Over time, this means that the received wisdom in any domain is disproportionately drawn from surviving examples. Conventional wisdom about how to run a business, how to write a book, how to structure an organization, how to execute a strategy — all of it is drawn from the record of what survived selection processes, not from the full distribution of attempts.
This doesn't mean the conventional wisdom is wrong. But it means you should apply additional skepticism when the supporting evidence is heavily weighted toward surviving exemplars. And it means that the most valuable research in any domain often involves actively studying failures: what went wrong, what looked good at the time, what was indistinguishable from eventual successes at the early stages.
The Epistemological Consequence
Survivorship bias is ultimately an epistemological problem. It's a systematic distortion in the information available to you, not a failure of logic given accurate information. Even if you reason perfectly, if your inputs are biased by survivorship, your conclusions will be biased.
This means that corrections for survivorship bias have to happen at the data-collection stage, not just the analysis stage. You can't think your way out of survivorship bias if the failure data simply isn't there. You have to actively build systems that capture failure data: the near-miss reports, the dropped-out trial participants, the closed businesses, the abandoned strategies.
In personal decision-making, this translates to humility about the examples you're drawing on. Your examples of successful people, successful relationships, successful businesses, successful careers are not a random sample of attempts. They're the survivors. The people, relationships, businesses, and careers that failed are not in your mental model in the same way, because they didn't generate the same persistent evidence.
Wald's insight wasn't that the military data was wrong. It was that the military was answering the wrong question — or asking the right question about the wrong population. The population that mattered was all planes that flew into combat, not just planes that returned. Keeping that distinction — between the sample you can see and the population you care about — is the ongoing work of thinking clearly in a world that mostly shows you the survivors.