Friday, April 1, 2011

Simpson's Paradox

Simpson's Paradox is a particularly well-known statistical paradox. At least, it's well known among statisticians. It is less well known among everyone else.

A layman's version goes something like this: imagine there are two ways of transporting emergency patients to the hospital. One is by helicopter, the other is by ambulance. 50% of patients who go to the hospital by helicopter die, while only about 21% of patients who go by ambulance die. Clearly, there is something wrong with helicopters, and we should make all patients go to the hospital by ambulance.

But wait! There are two categories of emergency patients at this hospital: those in normal condition and those in critical condition. Of the normal patients who travel to the hospital by ambulance, 12.5% die, while only 10% of normal patients who go by helicopter die. Likewise, of the critical patients who travel by ambulance, 75% die, while only 70% of critical patients who go by helicopter die. What's going on?


             Normal  Normal died  Critical  Critical died  Total  Total died
Helicopter      100           10       200            140    300         150
Ambulance       600           75       100             75    700         150
Total           700           85       300            215   1000         300
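For anyone who wants to check the percentages, here is a minimal Python sketch that recomputes them from the helicopter and ambulance rows of the table. The names `counts`, `death_rate`, and `summarize` are made up for this illustration.

```python
# Counts copied from the table: (patients, deaths) per transport mode and condition.
counts = {
    "Helicopter": {"Normal": (100, 10), "Critical": (200, 140)},
    "Ambulance": {"Normal": (600, 75), "Critical": (100, 75)},
}


def death_rate(patients, deaths):
    """Fraction of patients who died."""
    return deaths / patients


def summarize(counts):
    """Return per-condition and overall death rates for each transport mode."""
    summary = {}
    for mode, strata in counts.items():
        # Death rate within each condition (normal, critical).
        rates = {cond: death_rate(n, d) for cond, (n, d) in strata.items()}
        # Overall rate: pool the patients and deaths across both conditions.
        total_patients = sum(n for n, _ in strata.values())
        total_deaths = sum(d for _, d in strata.values())
        rates["Overall"] = death_rate(total_patients, total_deaths)
        summary[mode] = rates
    return summary


summary = summarize(counts)
for mode, rates in summary.items():
    print(mode, {name: f"{rate:.1%}" for name, rate in rates.items()})
```

Helicopters come out ahead in both the normal and the critical rows, yet behind in the pooled "Overall" figure, which is the paradox in a nutshell.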


What happens is that two thirds of critical patients go by helicopter while only one seventh of normal patients do. Since normal patients die at a much lower rate than critical patients (roughly 12% as compared to roughly 72%), this skews the data to make ambulances look much safer than helicopters, even though helicopters have the lower death rate within each category.
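One way to see the skew is to treat each overall death rate as a weighted average of the normal and critical rates, with the weights given by the patient mix. The sketch below is only an illustration: `weighted_rate` and `shared_mix` are hypothetical names, and handing both transport modes the same hospital-wide mix (700 normal, 300 critical) is an assumption made for the comparison, not something the hospital data tells us directly.

```python
# Per-condition death rates and patient counts, taken from the table.
rates = {
    "Helicopter": {"Normal": 10 / 100, "Critical": 140 / 200},
    "Ambulance": {"Normal": 75 / 600, "Critical": 75 / 100},
}
patients = {
    "Helicopter": {"Normal": 100, "Critical": 200},
    "Ambulance": {"Normal": 600, "Critical": 100},
}


def weighted_rate(mode_rates, mix):
    """Average the per-condition death rates, weighted by a patient mix."""
    total = sum(mix.values())
    return sum(mode_rates[cond] * mix[cond] / total for cond in mix)


# Weighting by each mode's own patients reproduces the misleading overall figures.
own_mix = {mode: weighted_rate(rates[mode], patients[mode]) for mode in rates}

# Hypothetical comparison: give both modes the same hospital-wide mix.
shared_mix = {"Normal": 700, "Critical": 300}
same_mix = {mode: weighted_rate(rates[mode], shared_mix) for mode in rates}

print("Own mix: ", {mode: f"{rate:.1%}" for mode, rate in own_mix.items()})
print("Same mix:", {mode: f"{rate:.1%}" for mode, rate in same_mix.items()})
```

With its own mix, the helicopter rate is dominated by critical patients. With the shared mix, the helicopter figure drops below the ambulance figure, which matches what the per-category rates were saying all along.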

Most of the time, that is.


The fascinating moral of this story: don't immediately assume a causal relation based on a single correlation. Look deeper for causes.
