Why aren’t we doing the maths?
The practical implications of misplaced confidence when dealing with statistical evidence are obvious and worrying
A little two-part test for you. Imagine you’re a doctor, considering whether to recommend a particular kind of cancer screening, “A”. You discover that this form of screening improves five-year survival rates from 68 per cent to 99 per cent. (The five-year survival rate is the proportion of patients alive five years after the cancer was discovered.) The question is: does the screening test “A” save lives?
Part two: now you consider an alternative screening test, “B”. You discover that test “B” reduces cancer deaths from two per 1,000 people to 1.6 per 1,000 people. So: does screening test “B” save lives?
The second question is easier. Screening test “B” unambiguously saves lives: to be precise it saves 0.4 lives per 1,000 people. That might not seem a lot – and if the test is expensive or has unpleasant side-effects it might not be worth it – but that is the nature of cancer screening. Most people don’t have the cancer in question so most people cannot be helped by the test.
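The arithmetic behind test “B” can be sketched in a few lines. This is only an illustration using the two figures quoted above; the variable names, and the "people screened per life saved" figure derived from them, are ours rather than the study's.

```python
# Figures quoted in the article: cancer deaths per 1,000 people.
deaths_without_screening = 2.0
deaths_with_screening = 1.6

# Absolute reduction: lives saved per 1,000 people.
absolute_reduction = deaths_without_screening - deaths_with_screening

# Relative reduction: share of would-be cancer deaths that are prevented.
relative_reduction = absolute_reduction / deaths_without_screening

# People screened for each life saved (a derived, illustrative figure).
screened_per_life_saved = 1000 / absolute_reduction

print(f"{absolute_reduction:.1f} lives saved per 1,000 people")
print(f"{relative_reduction:.0%} of cancer deaths prevented")
print(f"about {screened_per_life_saved:,.0f} people screened per life saved")
```

The same benefit looks small in absolute terms and large in relative terms, which is why it matters which of the two numbers is reported.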
What about screening test “A”? This question is harder. The numbers look impressive, but survival rates are a treacherous way to evaluate a screening programme. Imagine a group of 60-year-olds who all develop an incurable cancer that will kill them at 70. They have no symptoms until age 67, so the five-year survival rate when they are diagnosed at 67 is, I’m afraid, zero. Introduce a screening programme and you can discover the cancer much earlier, at age 62. The five-year survival rate is now 100 per cent. But the screening hasn’t saved any lives: it’s merely given early warning of a disease that cannot be treated.
In general, screening programmes look impressive when evaluated by survival rates, because the purpose of screening is to detect the cancer earlier. Whether any lives are saved or not is a different issue entirely.
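The thought experiment above can be run as a toy calculation. This is a minimal sketch, assuming the ages given in the example (death at 70, diagnosis at 67 without screening and at 62 with it); the cohort size and function name are hypothetical and chosen only for illustration.

```python
AGE_AT_DEATH = 70  # the cancer is incurable, so screening cannot change this


def five_year_survival_rate(ages_at_diagnosis, age_at_death=AGE_AT_DEATH):
    """Fraction of patients still alive five years after diagnosis."""
    alive = [age_at_death > age + 5 for age in ages_at_diagnosis]
    return sum(alive) / len(alive)


cohort_size = 1000  # hypothetical cohort size

# Without screening, the cancer is found when symptoms appear at 67.
rate_without = five_year_survival_rate([67] * cohort_size)

# With screening, the same cancer is found at 62.
rate_with = five_year_survival_rate([62] * cohort_size)

print(f"Five-year survival without screening: {rate_without:.0%}")
print(f"Five-year survival with screening:    {rate_with:.0%}")
print(f"Age at death in both cases:           {AGE_AT_DEATH}")
```

The survival rate moves from one extreme to the other while the age at death stays exactly where it was. Only the date of diagnosis has changed.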
I’ll admit, this is a tricky pair of questions. You’d have to be a doctor, rigorously trained in how to handle the evidence base for medical treatments, to get this sort of thing right. But here’s the bad news: doctors do not get this sort of thing right.
An article published in the Annals of Internal Medicine in March put these questions to a panel of more than 400 doctors with relevant clinical experience. Eighty-two per cent thought they’d been shown evidence that test “A” saved lives – they hadn’t – and of those, 83 per cent thought the benefit was large or very large. Only 60 per cent thought that test “B” saved lives, and fewer than one-third thought the benefit was large or very large – which is intriguing, because of the few people on course to die from cancer, the test saves 20 per cent of them. In short, the doctors simply did not understand the statistics on cancer screening.
The practical implications of this are obvious and worrying. It seems that doctors may need a good deal of help interpreting the evidence they are likely to be exposed to on clinical effectiveness, while epidemiologists and statisticians need to think hard about how they present their discoveries.
The situation could be worse. A recent survey by the Royal Statistical Society’s “getstats” campaign asked MPs to give the probability of getting two heads when tossing a coin twice. More than half failed to get the answer correct – including a humiliating three-quarters of Labour MPs.
The answer, of course, is 25 per cent, and is appallingly basic stuff. If I try to translate from numeracy to literacy, I’d say that the doctors’ failure was the equivalent of being unable to write a decent essay about “The Waste Land”, while the MPs’ failure was more like the inability to read a newspaper.
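For anyone who would like to check the coin question by brute force, here is a short sketch that lists every equally likely outcome of two fair tosses and counts the ones with two heads.

```python
from itertools import product

# All equally likely outcomes of tossing a fair coin twice.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT

# Outcomes in which both tosses come up heads.
two_heads = [o for o in outcomes if o == ("H", "H")]

probability = len(two_heads) / len(outcomes)
print(probability)  # 0.25
```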
The Royal Statistical Society reported that about three-quarters of MPs said they felt confident when dealing with numbers. This confidence is misplaced.
Also published at ft.com.

25 Comments
Mo says:
“about three-quarters of MPs said they felt confident when dealing with numbers. This confidence is misplaced.”
And, thanks to the Dunning-Kruger and related effects, it’s difficult to persuade them of their failing. I wonder how many of those MPs, on being told the right answer, either refused to believe it or else laughed off their wrongness as unimportant.
27th of October, 2012
Michael Hales says:
Yes but how are you supposed to learn about it? Probability and statistics is not part of the national curriculum, probably ‘cos it’s too interesting, and if as an adult I’d like to learn more about the subject the alternatives are Wikipedia or take three years off work and use my entire life savings!
There’s no point in bleating on about the lack of knowledge about a subject if you don’t offer ways of learning about said subject. It’s almost as if the powers-that-be don’t want us to know.
27th of October, 2012
phayes says:
Of course even if they learn how to interpret the intrinsically unintuitive ‘inferences’ they find in the orthodox stats based literature, the poor doctors still won’t be able to trust them. It really is a scandal that these fundamentally flawed post-Laplacian and pre-Jaynesian concepts and methods are still being taught.
27th of October, 2012
Norm says:
Point me to any online training you may know of. Probability always escapes me.
27th of October, 2012
Ivar Sonbo Kristiansen says:
Although the last paragraph is well placed, I have problems with some of the previous arguments. First, health care does not save lives, it only prolongs them. Second, if a screening program prolongs life (on average!), it should be measurable in terms of all-cause mortality, not only cause-specific mortality. Example: screening for colorectal cancer reduces colorectal cancer mortality, but has no impact on all-cause mortality according to a Cochrane report. Third, statements about prolongation of life should be based on longer follow-up than five years because of potential “crossing survival curves”.
29th of October, 2012
Mike says:
Reminds me of an exercise in my statistics course.
Assume that an HIV test has a 95% chance of detecting HIV and a 1% chance of a false positive. Assume further that (can’t remember, let’s say) 0.05% of the population have HIV.
Should everyone be tested for HIV?
Seemed counterintuitive to me at the time, but the false positives outweigh the actual detections by quite a lot, so the answer would be no.
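The exercise can be worked through with the numbers given in the comment. This is a rough sketch only: the 0.05% prevalence is the commenter's own guess, the population of 100,000 is a hypothetical round number, and the code shows the arithmetic rather than settling the policy question.

```python
# Figures from the comment above; the prevalence is the commenter's guess.
sensitivity = 0.95          # chance the test detects HIV when it is present
false_positive_rate = 0.01  # chance of a positive result when HIV is absent
prevalence = 0.0005         # 0.05% of the population

population = 100_000  # hypothetical population, for illustration only

infected = population * prevalence
true_positives = infected * sensitivity
false_positives = (population - infected) * false_positive_rate

# Chance that a person who tests positive actually has HIV.
positive_predictive_value = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives:.1f}")
print(f"False positives: {false_positives:.1f}")
print(f"Chance a positive result is genuine: {positive_predictive_value:.1%}")
```

On these assumptions there are roughly twenty false positives for every genuine detection, so a positive result is right less than one time in twenty.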
29th of October, 2012
hoover says:
Why aren’t we doing the English? Screening doesn’t improve survival rates; what you do after screening may.
As mathematicians and economists you may think this is pedantry. As somebody who writes and teaches English, I disagree.
29th of October, 2012
Michelle Taylor says:
For the people saying ‘point me to online training resources for statistics’ – http://www.khanacademy.org/math/statistics
29th of October, 2012
David says:
“You discover that test “B” reduces cancer deaths from two per 1,000 people to 1.6 per 1,000 people.”
Is it possible that the reason the doctors were confused is that the questions ask whether testing, in and of itself, saves lives? The question seems to suggest some kind of Schrodinger’s-Cat or Goodhart-Law effect where the act of measuring changes the outcome. I guess that might be the case (bringing this type of cancer to people’s attention causes them to alter their lifestyles), but presumably that isn’t what the question was aiming at.
D’oh, I’ve just read Hoover’s comment, which basically says what I’ve said (although I’d point out that I’m, roughly speaking, an economist)
29th of October, 2012
ProfDC says:
In addition to the “early warning” effect, we ought to be careful about which cancers are in the sample. Test “A” can show a 99% 5-year survival rate by selectively identifying only the cancers that have a longer than 5-year mortality window, and giving a false negative for the other cancers.
If it ignores all the shorter-mortality cancers (and thus doesn’t diagnose anyone suffering from them as “having cancer” at all), the 5-year survival rate (as defined as ‘the proportion of patients alive five years after the cancer was discovered’) can even be 100%. What an amazing test! It’s worse than no test at all at predicting cancer deaths, but its 5-year survival rate is perfect!
29th of October, 2012
ProfDC says:
Similarly, if Test “B” kills 0.5 people per thousand through a side-effect of the test, they didn’t die of cancer — and thus Test B can ‘reduce[s] cancer deaths from two per 1,000 people to 1.6 per 1,000 people’ while increasing deaths from *cancer or testing* by 0.1 per 1K. Would you still say Test B “saves lives” then?
29th of October, 2012
Mark says:
Very well put, ‘hoover’. I liked both your comment and your perfect use of punctuation. Please carry on being pedantic.
29th of October, 2012
George says:
For anyone tempted to look at the Khan Academy stuff on statistics –
http://learnandteachstatistics.wordpress.com/2012/07/30/khan-not-good/
Also it is slightly unfair to have a go at doctors for not understanding lead time bias. Doctors are not responsible for evaluating screening programmes, this is the job of public health professionals.
29th of October, 2012
martin says:
May I recommend “Bad Medicine” to one and all.
30th of October, 2012
Mark says:
This is another case where a statistician really just only looks at the statistics. Medical professionals, on the other hand, have to pay closer attention to the practicality of screening very cost/benefit.
For Test B, it would mean that you would have to test 10,000 people for 4 lives to be saved. And you have not even factored in specificity, which could artificially raise disease incidence and subject patients to treatment they do not need.
Test A has merit too, for all those positively screened it is a very significant value to look upon. The whole point of cancer screening is not curative, it is to ensure patients get treated before it becomes too advanced, and they can live as long as possible after treatment. As far as curative treatment comes, we all keep our fingers crossed, which is why the five-year survival is significant.
If all a statistician can pompously say after staring at numbers is “bad bad medicine”, sounds to me you are missing the big picture.
31st of October, 2012
Matt Page says:
No link/source for the RSS report? How do we know if the data reported is statistically significant?
Matt
PS Martin I presume you mean “Bad Science”? I don’t think Bon Jovi songs can offer much help.
31st of October, 2012
Pete says:
Of course MPs can do the Maths, just look at their expense claims. Some of those are highly creative manipulation of numbers.
31st of October, 2012
B D McCullough, Ph.D. says:
I deduced that Screening A saves lives. I did so based on a ceteris paribus assumption. If your question had been, “Can you come up with assumptions under which Screening A does not save lives?” I’d have answered correctly. You asked the wrong question, and I’m disappointed in your journalistic skills. Your question, given your answer, was patently misleading.
31st of October, 2012
Gary says:
Check out Ben Goldacre: http://www.badscience.net/
31st of October, 2012
Doctor Skeptic says:
I agree with Ivar and ProfDC, screening B does not “unambiguously save lives” as it only reports disease-specific mortality. Often, this does not equate to changes in all-cause mortality, which is more important.
31st of October, 2012
Campbell says:
B-
31st of October, 2012
LER says:
Mark, you’ve demonstrated the author’s point. You are correct that there are issues about overdiagnosis and overtreatment, given that even a test with a high specificity will return many false positives where prevalence is low. But the application of NNT to a screening test is misleading (of course screening tests have very high NNT), and the point about lead-time bias is that no one lives any longer or gets any closer to that future cure you imagine–you just notice the cancer sooner. Med students all learn about lead-time bias in first year and then go on to forget it, making them vulnerable to marketing pitches for expensive, harmful and ineffectual treatments. The author’s mistake is to compare this medical error to not being able to write an essay on the Waste Land. It should be compared to a doctor’s not knowing the normal range for blood pressure.
1st of November, 2012
Mark says:
The 5 year survival rate is a decent measure of the effectiveness of treatments to prolong (rather than save) lives. In the context of treatments this makes perfect sense, and it’s only when applied to screening that it loses its meaning.
The doctors failed to notice the subtlety of the scenario, which, to be fair, was designed to catch them out. It’s not a very meaningful test of doctors’ knowledge.
1st of November, 2012
Nicola Ward Petty says:
Thanks for this reminder. I have written a post related to this, showing a straightforward way (I hope) of making sense of probabilities related to screening.
http://learnandteachstatistics.wordpress.com/2012/11/05/probability/
4th of November, 2012
phayes says:
Hmmm… If Khan’s feelings have been hurt by the criticisms of his (poor) teaching of the flaky pseudo-inference he’s learned I hope he reads this:
http://www.bio.ri.ccf.org/robrien/K12shortcourse0707/PDFsToCopy/Section5(5)/5.3_GoodmanAnnIntMed99all.pdf
and then this:
http://www.getstats.org.uk/getstats-stats-glossary/statistical-significance/
6th of November, 2012