Tim Harford The Undercover Economist

Articles published in February, 2015

Overconfidence man

‘We don’t have a good sense of our own fallibility. Checking my answers, it was the one I felt the most certain of that I got wrong’

In 1913 Robert Millikan published the results of one of the most famous experiments in the history of physics: the “oil drop” experiment that revealed the electric charge on an electron and, indirectly, its mass. The experiment led in part to a Nobel Prize for Millikan but it is simple enough for a school kid to carry out. I was one of countless thousands who did just that as a child, although I found it hard to get my answers quite as neat as Millikan’s.

We now know that even Millikan didn’t get his answers quite as neat as he claimed he did. He systematically omitted observations that didn’t suit him, and lied about those omissions. Historians of science argue about the seriousness of this cherry-picking, ethically and practically. What seems clear is that if the scientific world had seen all of Millikan’s results, it would have had less confidence that his answer was right.

This would have been no bad thing, because Millikan’s answer was too low. The error wasn’t huge — about 0.6 per cent — but it was vast relative to his stated confidence in the result. (For the technically minded, Millikan’s answer is several standard deviations away from modern estimates: that’s plenty big enough.)

There is a lesson here for all of us about overconfidence. Think for a moment: how old was President Kennedy when he was assassinated? How high is the summit of Mount Kilimanjaro? What was the average speed of the winner of last year’s Monaco F1 Grand Prix? Most people do not know the exact answers to these questions but we can all take a guess.

Let me take a guess myself. JFK was a young president but I’m pretty sure he was over 40 when elected. I’m going to say that when he died he was older than 40 but younger than 60. I climbed Kilimanjaro many years ago and I remember it being 6,090-ish metres high. Let’s say, more than 6,000m but less than 6,300m. As for the racing cars, I think they can do a couple of hundred miles an hour but I know that Monaco is a slow and twisty track. I’ll estimate that the average speed was above 80mph but below 150mph.

Psychologists have conducted experiments asking people to answer such questions with upper and lower bounds for their answers. We don’t do very well. Asked to produce wide margins of error, such that 98 per cent of answers fall within that margin, people usually miss the target 20-40 per cent of the time; asked to produce a tighter margin, such that half the answers are correct, people miss the target two-thirds of the time.

We don’t have a good sense of our own fallibility. Despite the fact that I am well aware of such research, when I went back to check my own answers, it was the one I felt most certain of that I got wrong: Kilimanjaro is just 5,895m high. It seemed bigger at the time.
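The scoring behind such calibration experiments is simple enough to sketch. The snippet below is an illustration, not the psychologists' actual protocol: each guess is a lower bound, an upper bound and the true value, and the hit rate is the share of intervals that contain the truth. The name `hit_rate` is made up for the example, JFK's age at death (46) is a fact from outside this column, and the Monaco guess is left out because the true speed is not given here.

```python
def hit_rate(estimates):
    """Share of (low, high, truth) intervals that contain the true value."""
    hits = sum(1 for low, high, truth in estimates if low <= truth <= high)
    return hits / len(estimates)


# (lower bound, upper bound, true value)
guesses = [
    (40, 60, 46),        # JFK's age when he died, in years
    (6000, 6300, 5895),  # height of Kilimanjaro, in metres
]

rate = hit_rate(guesses)
print(f"Intervals containing the truth: {rate:.0%}")
```

A well-calibrated guesser who aims for 98 per cent intervals should see a hit rate close to 0.98 over many questions; two questions are far too few to judge anyone.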

But there’s another issue here. The charismatic Nobel laureate Richard Feynman pointed out in the early 1970s that the process of fixing Millikan’s error with better measurements was a strange one: “One is a little bit bigger than Millikan’s, and the next one’s a little bit bigger than that, and the next one’s a little bit bigger than that, until finally they settle down to a number which is higher. Why didn’t they discover the new number was higher right away?”

What was probably happening was that whenever a number was close to Millikan’s, it was accepted without too much scrutiny. When a number seemed off it would be viewed with scepticism and reasons would be found to discard it. And since Millikan’s estimate was too low, those suspect measurements would typically be larger than Millikan’s. Accepting them was a long and gradual process.

Feynman added that scientists have learnt their lesson and don’t make such mistakes any more. Perhaps that’s true, although a paper published by the decision scientists Max Henrion and Baruch Fischhoff, almost 15 years after Feynman’s lecture, found the same pattern of gradual convergence in estimates of other physical constants such as Avogadro’s number and Planck’s constant. From the perspective of the 1980s, convergence continued throughout the 1950s and 1960s and sometimes into the 1970s.

Perhaps that drift continues today even in physics. Surely it continues in messier fields of academic inquiry such as medicine, psychology and economics. The lessons seem clear enough. First, to be open to ourselves and to others about the messy fringes of our experiments and data; they may not change our conclusions but they should reduce our overconfidence in those conclusions. Second, to think hard about the ways in which our conclusions may be wrong. Third, to seek diversity: diversity of views and of data-gathering methods. Once we look at the same problem from several angles, we have more chances to spot our errors.

But humans being what they are, this problem isn’t likely to go away. It’s very easy to fool ourselves at the best of times. It’s particularly easy to fool ourselves when we already think we have the answer.

Written for and first published at ft.com.

Is it possible to just click with someone?

‘Whether the computer reckons you’re a love match or not isn’t something that anyone should take seriously’

I’ve occasionally wondered whether the secret to love is mathematics, and I’m not the only one. Mathematics is full of perky ideas about matching or sorting that have a veneer of romantic promise. But for all their beauty and cleverness, one often feels that such ideas are a far better introduction to mathematics than they are to dating and mating.

Consider the Gale-Shapley algorithm, which dates from 1962 but won Lloyd Shapley a Nobel Memorial Prize in economics just a couple of years ago. The algorithm is a way of assigning matching pairs in a stable way. By “stable”, we mean that no two people would do better ignoring the algorithm and instead making a side-arrangement with each other. The Gale-Shapley algorithm can be used for matching students to university places, or kidney donors to kidney recipients. However, it is most famously described as a way of allocating romantic partners. It is, alas, ill suited to this task, since it skips over the possibility of homosexuality, bisexuality, polyamory or even something as simple as divorce. (1962 is on the phone . . . it wants its algorithm back.)
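For the curious, the algorithm fits in a few lines. What follows is a minimal sketch of the one-to-one version, assuming equal numbers on each side and complete preference lists; the students, universities and function names are invented for illustration. One side proposes in order of preference, and the other side holds on to the best offer received so far.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict {receiver: proposer}."""
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged = {}                                  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p                # r had no offer, so holds p
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p                # r trades up and releases the old offer
            free.append(current)
        else:
            free.append(p)                # r rejects p, who will try the next choice
    return engaged


def is_stable(matching, proposer_prefs, receiver_prefs):
    """True if no proposer and receiver both prefer each other to their match."""
    partner_of = {p: r for r, p in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == partner_of[p]:
                break  # everything further down the list is worse for p
            r_list = receiver_prefs[r]
            if r_list.index(p) < r_list.index(matching[r]):
                return False  # p and r would rather make a side-arrangement
    return True


# Hypothetical students and universities, one place each
students = {"ann": ["x", "y", "z"], "bob": ["x", "z", "y"], "cat": ["y", "x", "z"]}
universities = {"x": ["bob", "ann", "cat"], "y": ["ann", "cat", "bob"], "z": ["ann", "bob", "cat"]}

matching = gale_shapley(students, universities)
print(matching, is_stable(matching, students, universities))
```

The `is_stable` check spells out the definition above: it looks for any pair who would both do better with a side-arrangement, and finds none.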

But if pure mathematics cannot help, surely statistics can? Internet dating promises to move us away from abstractions to the more gritty reality of data. Simply type in everything you have to offer, in great detail, and let the computer algorithm find your match. What could be simpler or more efficient?

Perhaps we should be a little cautious before buying into the hype. After all, such promises have been made before. The journalist Matt Novak has unearthed an article from 1924’s Science and Invention magazine in which the magazine’s publisher Hugo Gernsback explained that humans would soon enjoy the same scientific matchmaking approach then lavished on horses. The science included the “electrical sphygmograph” (it takes your pulse) and a “body odor test” (sniffing a hose attached to a large glass capsule that contains your beau or belle).

Then, in the 1960s, enterprising Harvard students set up “Operation Match”. It was a matchmaking service powered by a punch-card IBM computer. Despite breathless media coverage, this was no more scientific than Gernsback’s sphygmograph. According to Dan Slater’s Love in the Time of Algorithms, the men who founded Operation Match were hoping for the first pick of the women themselves.

One subscriber expressed the advantages and limitations of digital dating very well: “I approve of it as a way to meet people, although I have no faith in the questionnaire’s ability to match compatible people.”

Quite so. Operation Match was a numbers game in the crudest sense. It was an easy way to reach lots of nearby singles. There should be no pretence that the computer could actually pair up couples who were ideally suited to each other.

Perhaps we simply need more data? OkCupid, a dating site with geek appeal and a witty, naughty tone, allows you to answer thousands of questions: anything from “Do you like the taste of beer?” to “Would you ever read your partner’s email?” Users typically answer several hundred such questions, as well as indicating what answer they would hope for from a would-be date, and how important they feel the question is.

Again, media reaction has been credulous. Every now and then we hear of nerds who are living the dream, playing OkCupid’s algorithms with such virtuosity that love is theirs to command. Wired magazine introduced us to Chris McKinlay, “the math genius who hacked OkCupid”. McKinlay, we are told, downloaded a dataset containing 20,000 women’s profiles and six million questionnaire answers, optimised his own profile and unleashed an army of software bots to draw women in. He was a data-driven love-magnet.

But OkCupid’s own research suggests this is all rather futile. In one controversial experiment, it took a collection of pairs of users who were a poor match, according to the OkCupid algorithm — and then told them instead that they were highly compatible. One might expect that these not-really-compatible couples would find that their conversations quickly fizzled. In fact, they did scarcely less well than couples where the algorithm genuinely predicted a match. In short, whether the computer reckons you’re a love match or not isn’t a piece of information that anyone should take seriously.

. . .

Hannah Fry, author of The Mathematics of Love, expresses the problem neatly. The algorithm, she says, “is doing exactly what it was designed to do: deliver singles who meet your specifications. The problem here is that you don’t really know what you want.”

Quite so. The list of qualities that we might want in a partner (“fascinating, sexy, fun, handsome, hilarious”) is a poor match for the list of qualities one could share with a computer database (“likes beer, boardgames, Malcolm Gladwell and redheads”). If the computer cannot pose the right questions it is hardly likely to produce the right answers.

As for Chris McKinlay, no doubt we all wish him well. He announced his engagement to Christine Tien Wang — the 88th woman he met in person after spending months in the middle of a perfect dating storm. His experience suggests that just as with Operation Match, the matching process is nonsense and the secret to finding love is to date a lot of people.

Written for and first published at ft.com.

Why the high street is overdosing on caffeine

‘If Starbucks opens a café just round the corner from another Starbucks, is that really about selling more coffee?’

“New Starbucks Opens in Restroom of Existing Starbucks”, announced The Onion, satirically, in 1998. It was a glimpse of the future: there were fewer than 2,000 Starbucks outlets back then and there are more than 21,000 now. They are also highly concentrated in some places. Seoul has nearly 300 Starbucks cafés, London has about 200 — a quarter of all the Starbucks outlets in the UK — and midtown Manhattan alone has 100. It raises the question: how many Starbucks shopfronts are too many?

Such concerns predate the latte boom. In the late 1970s, Douglas Adams (also satirically) posited the Shoe Event Horizon. This is the point at which so much of the retail landscape is given over to shoe shops that utter economic collapse is inevitable.

And in 1972, the US Federal Trade Commission issued an entirely non-satirical complaint against the leading manufacturers of breakfast cereal, alleging that they were behaving anti-competitively by packing the shelves with frivolous variations on the basic cereals. That case dragged on for years before eventually being closed down by congressional action.

The intuition behind these complaints is straightforward. If Starbucks opens a café just round the corner — or in some cases, across the road — from another Starbucks, could that really be about selling more coffee, or is it about creating a retail landscape so caffeinated that no rival could survive? Similarly, the arrival on the supermarket shelves of Cinnamon Burst Cheerios might seem reasonable enough, were they not already laden with Apple Cinnamon Cheerios and Cheerios Protein Cinnamon Almond and 12 other variants on the Cheerios brand.

Conceptually, there is little difference between having outlets that are physically close together and having products that differ only in subtle ways. But it is hard to be sure exactly why a company is packing its offering so densely, at the risk of cannibalising its own sales.

A crush of products or outlets may arise because apparently similar offerings reflect differences that matter to consumers. I do not much care whether I am eating Corn Flakes or Shreddies (the overall effect seems much the same to me) but others may care very much indeed. It might well be that in midtown Manhattan, few people will bother walking an extra block to get coffee, so if Starbucks wants customers it needs to be on every corner.

But an alternative explanation is that large companies deliberately open too many stores, or launch too many products, because they wish to pre-empt competitors. Firms could always slash prices instead to keep the competition away but that may not be quite as effective — a competitor might reasonably expect any price war to be temporary. It is less easy to un-launch a new product or shut down a brand-new outlet. A saturated market is likely to stay saturated for a while, then, and that should make proliferation a more credible and effective deterrent than low prices.

A recent paper by two economists from Yale, Mitsuru Igami and Nathan Yang, studies this question in the market for fast-food burgers. Igami and Yang used old telephone directories to track the expansion of the big burger chains into local markets across Canada from 1970 to 2005. After performing some fancy analysis, they concluded that big burger chains did seem to be trying to pre-empt competition. If Igami and Yang’s model is to be believed, McDonald’s was opening more outlets, more quickly than would otherwise have been profitable.

It is the consumer who must ultimately pay for these densely packed outlets and products. But perhaps the price is worthwhile. The econometrician Jerry Hausman once attempted to measure the value to consumers of Apple Cinnamon Cheerios. He concluded that it was tens of millions of dollars a year — not much in the context of an economy of $17tn a year, but not nothing either. Perhaps competitors were shut out of the market by Apple Cinnamon Cheerios but that doesn’t mean that consumers didn’t value them.

. . .

It may be helpful to consider what life would be like if every café, cereal brand or fast-food joint were owned by a separate company. Steven Salop, an economist at Georgetown University, produced an elegant economic analysis of this scenario in 1979. He found that even a market full of independents will seem a little too crowded. This is because firms will keep showing up and looking for customers until there is not enough demand to cover their costs. The last entrepreneur to enter is the one that just breaks even, scraping together enough customers to pay for the cost of setting up the business. She is indifferent to whether she is in business or doing something else entirely. However, every other entrepreneur in the crowded market is wishing that she had stayed away.
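The logic can be put into numbers with the textbook version of Salop's model. This is an assumption on my part, since no equations appear above: customers are spread evenly around a circle of length one, each pays a travel cost per unit of distance, and each firm pays a fixed set-up cost. In that version firms keep entering until profit per firm is zero, while the efficient number of firms minimises total set-up costs plus total travel costs. The function and parameter names are invented for the sketch.

```python
import math


def free_entry_firms(travel_cost, setup_cost):
    """Number of firms at which the last entrant just breaks even.

    In the textbook model each of n firms earns travel_cost / n**2 before
    set-up costs, so entry stops when that equals setup_cost.
    """
    return math.sqrt(travel_cost / setup_cost)


def efficient_firms(travel_cost, setup_cost):
    """Number of firms minimising n * setup_cost + travel_cost / (4 * n)."""
    return math.sqrt(travel_cost / (4 * setup_cost))


# Hypothetical costs, chosen only to give round numbers
entrants = free_entry_firms(travel_cost=100, setup_cost=1)
ideal = efficient_firms(travel_cost=100, setup_cost=1)
print(entrants, ideal)
```

Under these assumptions free entry produces twice as many firms as would be efficient, whatever the two costs happen to be.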

Whether the products are shoes or cereal, lattes or cheeseburgers, markets will often seem wastefully crowded. That perception is largely an illusion, but not entirely. In big city markets, there really are too many cereals, too many cafés and too many fast-food restaurants. But even if they were all mom-and-pop independents, that might still be true.

Written for and first published at ft.com.

Making a lottery out of the law

‘The cure for “bad statistics” isn’t “no statistics” — it’s using statistical tools properly’

The chances of winning the UK’s National Lottery are absurdly low — almost 14 million to one against. When you next read that somebody has won the jackpot, should you conclude that he tampered with the draw? Surely not. Yet this line of obviously fallacious reasoning has led to so many shaky convictions that it has acquired a forensic nickname: “the prosecutor’s fallacy”.

Consider the awful case of Sally Clark. After her two sons each died in infancy, she was accused of their murder. The jury was told by an expert witness that the chance of both children in the same family dying of natural causes was 73 million to one against. That number may have weighed heavily on the jury when it convicted Clark in 1999.

As the Royal Statistical Society pointed out after the conviction, a tragic coincidence may well be far more likely than that. The figure of 73 million to one assumes that cot deaths are independent events. Since siblings share genes, and bedrooms too, it is quite possible that both children may be at risk of death for the same (unknown) reason.
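The 73 million figure is what you get by squaring the odds of a single death, which is only valid if the two deaths are independent. The sketch below uses the widely reported trial figure of 1 in 8,543 for one cot death in a family like Clark's, a number that is not stated in this column. The 1 in 100 for a second death after a first is invented purely to show how much the arithmetic changes once the events are allowed to be linked.

```python
# Odds against a single cot death: widely reported trial figure, not from this column
single_death_odds = 8543

# Treating the two deaths as independent means multiplying the odds together
independent_odds = single_death_odds ** 2

# HYPOTHETICAL: suppose shared genes or environment make a second death
# a 1-in-100 event once a first has occurred
hypothetical_second_death_odds = 100
dependent_odds = single_death_odds * hypothetical_second_death_odds

print(f"Independent: 1 in {independent_odds:,}")
print(f"Dependent (illustrative): 1 in {dependent_odds:,}")
```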

A second issue is that probabilities may be sliced up in all sorts of ways. Clark’s sons were said to be at lower risk of cot death because she was a middle-class non-smoker; this factor went into the 73-million-to-one calculation. But they were at higher risk because they were male, and this factor was omitted. Which factors should be included and which should be left out?

The most fundamental error would be to conclude that if the chance of two cot deaths in one household is 73 million to one against, then the probability of Clark’s innocence was also 73 million to one against. The same reasoning could jail every National Lottery winner for fraud.
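The lottery comparison can be made concrete. This sketch assumes the six-from-49 format the UK draw used at the time of writing, and it treats every ticket as an independent random pick, which real players' choices are not. The number of tickets sold is a made-up figure for illustration.

```python
import math

# Ways to choose 6 numbers from 49: the odds against any one ticket
jackpot_odds = math.comb(49, 6)


def chance_someone_wins(tickets, odds=jackpot_odds):
    """Probability that at least one of `tickets` random tickets hits the jackpot."""
    return 1 - (1 - 1 / odds) ** tickets


# HYPOTHETICAL sales figure for a single draw
hypothetical_tickets = 30_000_000

print(f"One ticket: 1 in {jackpot_odds:,}")
print(f"Somebody wins: {chance_someone_wins(hypothetical_tickets):.0%}")
```

Each individual win is wildly improbable, yet a winner somewhere is close to the expected outcome. The tiny number answers the question "will this ticket win?", not the question "did this winner cheat?".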

Lottery wins are rare but they happen, because lots of people play the lottery. Lots of people have babies too, which means that unusual, awful things will sometimes happen to those babies. The court’s job is to weigh up the competing explanations, rather than musing in isolation that one explanation is unlikely. Clark served three years for murder before eventually being acquitted on appeal; she drank herself to death at the age of 42.

Given this dreadful case, one might hope that the legal system would school itself on solid statistical reasoning. Not all judges seem to agree: in 2010, the UK Court of Appeal ruled against the use of Bayes’ Theorem as a tool for evaluating how to put together a collage of evidence.

As an example of Bayes’ Theorem, consider a local man who is stopped at random because he is wearing a distinctive hat beloved of the neighbourhood gang of drug dealers. Ninety-eight per cent of the gang wear the hat but only 5 per cent of the local population do. Only one in 1,000 locals is in the gang. Given only this information, how likely is the man to be a member of the gang? The answer is about 2 per cent. If you randomly stop 1,000 people, you would (on average) stop one gang member and 50 hat-wearing innocents.
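The arithmetic behind that answer is a single line of Bayes' Theorem. Here is a minimal sketch using only the three numbers given in the example; the function and variable names are mine.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence


p_gang = 1 / 1000        # one in 1,000 locals is in the gang
p_hat_given_gang = 0.98  # 98 per cent of the gang wear the hat
p_hat = 0.05             # 5 per cent of the local population wear the hat

p_gang_given_hat = posterior(p_gang, p_hat_given_gang, p_hat)
print(f"Chance the hat-wearer is in the gang: {p_gang_given_hat:.1%}")
```

The hat is strong evidence in one sense, because it makes gang membership roughly 20 times more likely than the one-in-1,000 starting point. It is still nowhere near proof, because the starting point was so low.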

We should ask some searching questions about the numbers in my example. Who says that 5 per cent of the local population wear the special hat? What does it really mean to say that the man was stopped “at random”, and do we believe that? The Court of Appeal may have felt it was spurious to put numbers on inherently imprecise judgments; numbers can be deceptive, after all. But the cure for “bad statistics” isn’t “no statistics” — it’s using statistical tools properly.

Professor Colin Aitken, the Royal Statistical Society’s lead man on statistics and the law, comments that Bayes’ Theorem “is just a statement of logic. It’s irrefutable.” It makes as much sense to forbid it as it does to forbid arithmetic.

. . .

These statistical missteps aren’t a uniquely British problem. Lucia de Berk, a paediatric nurse, was thought to be the most prolific serial killer in the history of the Netherlands after a cluster of deaths occurred while she was on duty. The court was told that the chance this was a coincidence was 342 million to one against. That’s wrong: statistically, there seems to be nothing conclusive at all about this cluster. (The death toll at the unit in question was actually higher before de Berk started working there.)

De Berk was eventually cleared on appeal after six years behind bars; Richard Gill, a British statistician based in the Netherlands, took a prominent role in the campaign for her release. Professor Gill has now turned his attention to the case of Ben Geen, a British nurse currently serving a 30-year sentence for murdering patients in Banbury, Oxfordshire. In his view, Geen’s case is a “carbon copy” of the de Berk one.

Of course, it is the controversial cases that grab everyone’s attention, so it is difficult to know whether statistical blunders in the courtroom are commonplace or rare, and whether they are decisive or merely part of the cut and thrust of legal argument. But I have some confidence in the following statement: a little bit of statistical education for the legal profession would go a long way.

Written for and first published at ft.com.
