I know plenty of economists who are fond of Guinness, but not many who realise just how important the beer has been to the profession. The man who laid the foundations for the global success of Guinness also produced one of the most important tools in economics – and a tool that is widely mishandled today.
Faced with an apparent pattern in any data, a key question is always: “Does this pattern represent something real, or is it just chance?” The simplest example: if I measure the heights of five men and five women and discover that the men tend to be taller than the women, I might be on to something, or I might just have some tall men and some short women in my sample. Based on this small sample, how confident should I be that men are in general taller than women?
The statistical apparatus to check this is a test called Student’s t-test. Student was the pseudonym of William Sealy Gosset, an amiable, rucksack-wearing chemist who – beginning in 1899 – worked all his adult life for Guinness and eventually rose to the rank of head brewer. So nervous was the company about commercial confidentiality that Gosset published surreptitiously under his pseudonym.
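The idea behind Gosset's test can be sketched in a few lines. The heights below are made-up illustrative numbers, and the 2.306 cut-off is the standard two-sided 95 per cent critical value for 8 degrees of freedom from t-distribution tables:

```python
# Sketch of Student's (equal-variance) t-test on two small samples.
# All height figures are hypothetical, chosen purely for illustration.
from statistics import mean, variance

men = [178.0, 182.0, 175.0, 180.0, 185.0]    # cm, hypothetical
women = [165.0, 170.0, 160.0, 168.0, 172.0]  # cm, hypothetical

n1, n2 = len(men), len(women)
# Pool the two sample variances, weighting by degrees of freedom.
sp2 = ((n1 - 1) * variance(men) + (n2 - 1) * variance(women)) / (n1 + n2 - 2)
# The t statistic: difference in means, scaled by its estimated noise.
t = (mean(men) - mean(women)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
df = n1 + n2 - 2

# For a two-sided test at the 95 per cent level with 8 degrees of
# freedom, the critical value from t tables is about 2.306.
print(f"t = {t:.2f} on {df} degrees of freedom")
print("significant at 95%" if abs(t) > 2.306 else "not significant at 95%")
```

The larger the t statistic, the harder it is to attribute the gap between the two groups to sampling luck; the point of Gosset's work was that this calculation stays honest even when each sample is tiny.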
From the outset, Gosset’s focus was practical – as the economist and historian Steve Ziliak has discovered through his work in the Guinness archives. To produce beer to a high standard on an industrial scale, Gosset needed to sample and experiment with hops, malt and barley. But experiments are expensive and Gosset developed his small-sample methods because he wanted to understand how many experiments were necessary to be confident of his results. That was a clear trade-off: how much confidence is “enough” depends on the costs of further research and the benefits of extra precision.
Ziliak and his co-author Deirdre McCloskey argue in a recent book, The Cult of Statistical Significance, that most academic disciplines have forgotten this trade-off. Instead, they use an artificial standard propagated not by Gosset but by the famous statistician and mathematical geneticist Ronald Fisher, who took Gosset’s calculations and turned them to his own devices. Fisher proposed ignoring any finding that failed to reach the 95 per cent confidence level. In other words, unless the odds against a pattern having emerged by chance are longer than 19 to 1, disregard the pattern completely.
That might seem a reasonable precaution – and it is certainly standard practice today – but a sharp line for statistical significance makes no sense, and it has a cost. In a recent interview, Ziliak told me about an employment promotion programme in Illinois in the recession of the early 1980s. Researchers estimated that every dollar spent on the programme saved $4.30 and were 87 per cent confident that the result was real. But that was below Fisher’s 95 per cent standard, so the programme was seen as having done nothing. Fisher would have approved.
This is strange: if I offered you the chance to spend a dollar and get back $4.30 87 per cent of the time, you would be right to see this as a good bet. Gosset would have agreed with you.
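The arithmetic of that bet is simple enough to write down. This back-of-envelope sketch makes the pessimistic assumption, not stated in the research, that the 13 per cent "fluke" case returns nothing at all:

```python
# Expected return per dollar on the Illinois programme, treated as a bet.
# Assumption (mine, not the researchers'): a fluke result returns $0.
p_real = 0.87            # confidence that the effect was real
payoff_if_real = 4.30    # dollars saved per dollar spent, if real
payoff_if_fluke = 0.0    # pessimistic payoff if the result was chance

expected = p_real * payoff_if_real + (1 - p_real) * payoff_if_fluke
print(f"expected return per dollar: ${expected:.2f}")  # → $3.74
```

Even on that worst-case assumption, a dollar in is worth about $3.74 out in expectation, which is why dismissing the result outright looks so odd.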
In any case, what seems like a precaution can be reckless. If a painkiller seems to cause heart attacks, Fisher’s standard says this risk can be ignored unless the statisticians are 95 per cent sure. A more reasonable standard is not to ask, “Are we certain there is an effect?”, but to weigh not only the precision of our estimates but also the importance of the pattern that may be emerging. That is what Gosset did: none of his experiments for Guinness was statistically significant at the 95 per cent level. But economically significant? We can say that they were – with confidence.
Also published at ft.com.