Tim Harford The Undercover Economist

Articles published in April, 2014

The random risks of randomised trials

‘There are perils to treating patients not as human beings but as means to some glorious end’

The backlash against randomised trials in policy has begun. Randomised controlled trials (RCTs) are widely accepted as the foundation for evidence-based medicine. Yet a decade ago, they were extremely rare in other contexts such as economics, criminal justice or social policy. That is changing.

In the UK, Downing Street’s newly privatised Behavioural Insights Team has made it cool to test new ideas for conducting policy by running experiments in which many thousands of participants receive various treatments at random. The Education Endowment Foundation, set up with £125m of UK government money, has begun 59 RCTs involving 2,300 schools. In the aid industry, RCTs have been popularised by MIT’s Poverty Action Lab, which celebrated its 10th anniversary last summer – one estimate is that 500 RCTs are under way in the field of education policy alone.

With such a dramatic expansion of the use of randomised trials, it’s only right that we ask some hard questions about how they are being used. The World Bank’s development impact blog has been hosting a debate about the ethics of these trials; they have been criticised in The New York Times and in an academic article by economists Steve Ziliak and Edward Teather-Posadas.

Objections to the idea of randomisation aren’t new. The great epidemiologist Archie Cochrane once ran an RCT of coronary care units, with the alternative treatment being care at home. He was vigorously attacked by cardiologists: how could he justify randomly denying treatment to patients? The counter-argument is simple: how could we justify prescribing treatments without knowing whether or not they work?

Yet that should not give carte blanche for evaluators to do whatever they like. Hanging in the background of this debate are awful abuses such as the “Tuskegee Study of Untreated Syphilis in the Negro Male”, which began in 1932. Researchers went to extraordinary lengths to ensure 400 African-American men with syphilis went untreated, although a proven treatment was available from 1947. When the experiment ended in 1972, many men were dead, 40 wives had been infected and 19 children with congenital syphilis had been born.

The Tuskegee study was not a randomised trial, but it demonstrates the perils of treating patients not as human beings but as means to some glorious end. This topic is rightly sensitive in development aid, as there is a clear power imbalance between the agencies who pay for new interventions and the poverty-stricken citizens on the receiving end.

In a perfect world, everyone involved in a trial would give informed consent, and everyone in the control group would receive the best available alternative to the approach being tested. (These are the basic guidelines laid out for medical trials by the World Medical Association’s “Helsinki” declaration.)

Yet compromises are common. Dean Karlan is professor of economics at Yale and founder of Innovations for Poverty Action, which evaluates development projects using randomisation. He points out that telling participants too much about the trial destroys the validity of the results by changing everyone’s behaviour.

Then there is the question of who consents. Camilla Nevill of the Education Endowment Foundation says that trials are often agreed to and conducted by schools. Trying to persuade every parent to agree explicitly to the trial “decimates” the number of participants, she says.

Is this ethically troubling? At first glance, yes. But there is a risk of a double standard. Without the EEF funding, some schools would adopt the new teaching approach anyway. It is only when a researcher proposes a meaningful evaluation that suddenly there is talk of informed consent.

Ben Goldacre, an epidemiologist and author of Bad Pharma, says “it’s reasonable to hold researchers to a higher standard” if only to protect the reputation of rigorous research. But how high a standard is high enough?

Steve Ziliak, a critic of RCTs, complains about one conducted in China in which some visually impaired children were given glasses while others received nothing. The case against the trial is that we no more need a randomised trial of spectacles than we need a randomised trial of parachutes.

The case for the defence is that we know that spectacles work but we don’t know how important it might be to pay for spectacles rather than, say, textbooks or vitamin supplements. None of these children was in line to receive glasses anyway, so what harm have the researchers inflicted?

I should leave the final word to Archie Cochrane. In his trial of coronary care units, run in the teeth of vehement opposition, early results suggested that home care was at the time safer than hospital care. Mischievously, Cochrane swapped the results round, giving the cardiologists the (false) message that their hospitals were best all along.

“They were vociferous in their abuse,” he later wrote, and demanded that the “unethical” trial stop immediately. He then revealed the truth and challenged the cardiologists to close down their own hospital units without delay. “There was dead silence.”

The world often surprises even the experts. When considering an intervention that might profoundly affect people’s lives, if there is one thing more unethical than running a randomised trial, it’s not running the trial.

Also published at ft.com.

Big Data: Are we making a big mistake?

Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US. What’s more, they could do it more quickly than the Centers for Disease Control and Prevention (CDC). Google’s tracking had only a day’s delay, compared with the week or more it took for the CDC to assemble a picture based on reports from doctors’ surgeries. Google was faster because it was tracking the outbreak by finding a correlation between what people searched for online and whether they had flu symptoms.

Not only was “Google Flu Trends” quick, accurate and cheap, it was theory-free. Google’s engineers didn’t bother to develop a hypothesis about what search terms – “flu symptoms” or “pharmacies near me” – might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work.
The success of Google Flu Trends became emblematic of the hot new trend in business, technology and science: “Big Data”. What, excited journalists asked, can science learn from Google?
As with so many buzzwords, “big data” is a vague term, often thrown around by people with something to sell. Some emphasise the sheer scale of the data sets that now exist – the Large Hadron Collider’s computers, for example, store 15 petabytes a year of data, equivalent to about 15,000 years’ worth of your favourite music.
But the “big data” that interests many companies is what we might call “found data”, the digital exhaust of web searches, credit card payments and mobiles pinging the nearest phone mast. Google Flu Trends was built on found data and it’s this sort of data that interests me here. Such data sets can be even bigger than the LHC data – Facebook’s is – but just as noteworthy is the fact that they are cheap to collect relative to their size, they are a messy collage of data points collected for disparate purposes and they can be updated in real time. As our communication, leisure and commerce have moved to the internet and the internet has moved into our phones, our cars and even our glasses, life can be recorded and quantified in a way that would have been hard to imagine just a decade ago.
Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”.
Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”
Found data underpin the new internet economy as companies such as Google, Facebook and Amazon seek new ways to understand our lives through our data exhaust. Since Edward Snowden’s leaks about the scale and scope of US electronic surveillance, it has become apparent that the security services are just as fascinated by what they might learn from our data exhaust.
Consultants urge the data-naive to wise up to the potential of big data. A recent report from the McKinsey Global Institute reckoned that the US healthcare system could save $300bn a year – $1,000 per American – through better integration and analysis of the data produced by everything from clinical trials to health insurance transactions to smart running shoes.
But while big data promise much to scientists, entrepreneurs and governments, they are doomed to disappoint us if we ignore some very familiar statistical lessons.
“There are a lot of small data problems that occur in big data,” says Spiegelhalter. “They don’t disappear because you’ve got lots of the stuff. They get worse.”
. . .
Four years after the original Nature paper was published, Nature News had sad tidings to convey: the latest flu outbreak had claimed an unexpected victim, Google Flu Trends itself. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak, but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.
The problem was that Google did not know – could not begin to know – what linked the search terms with the spread of flu. Google’s engineers weren’t trying to figure out what caused what. They were merely finding statistical patterns in the data. They cared about correlation rather than causation. This is common in big data analysis. Figuring out what causes what is hard (impossible, some say). Figuring out what is correlated with what is much cheaper and easier. That is why, according to Viktor Mayer-Schönberger and Kenneth Cukier’s book, Big Data, “causality won’t be discarded, but it is being knocked off its pedestal as the primary fountain of meaning”.
But a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. One explanation of the Flu Trends failure is that the news was full of scary stories about flu in December 2012 and that these stories provoked internet searches by people who were healthy. Another possible explanation is that Google’s own search algorithm moved the goalposts when it began automatically suggesting diagnoses when people entered medical symptoms.
Google Flu Trends will bounce back, recalibrated with fresh data – and rightly so. There are many reasons to be excited about the broader opportunities offered to us by the ease with which we can gather and analyse vast data sets. But unless we learn the lessons of this episode, we will find ourselves repeating it.
Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster and cheaper these days – but we must not pretend that the traps have all been made safe. They have not.
. . .
In 1936, the Republican Alfred Landon stood for election against President Franklin Delano Roosevelt. The respected magazine, The Literary Digest, shouldered the responsibility of forecasting the result. It conducted a postal opinion poll of astonishing ambition, with the aim of reaching 10 million people, a quarter of the electorate. The deluge of mailed-in replies can hardly be imagined but the Digest seemed to be relishing the scale of the task. In late August it reported, “Next week, the first answers from these ten million will begin the incoming tide of marked ballots, to be triple-checked, verified, five-times cross-classified and totalled.”
After tabulating an astonishing 2.4 million returns as they flowed in over two months, The Literary Digest announced its conclusions: Landon would win by a convincing 55 per cent to 41 per cent, with a few voters favouring a third candidate.
The election delivered a very different result: Roosevelt crushed Landon by 61 per cent to 37 per cent. To add to The Literary Digest’s agony, a far smaller survey conducted by the opinion poll pioneer George Gallup came much closer to the final vote, forecasting a comfortable victory for Roosevelt. Mr Gallup understood something that The Literary Digest did not. When it comes to data, size isn’t everything.
Opinion polls are based on samples of the voting population at large. This means that opinion pollsters need to deal with two issues: sampling error and sampling bias.
Sampling error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The “margin of error” reported in opinion polls reflects this risk, and the larger the sample, the smaller the margin of error. A thousand interviews is a large enough sample for many purposes, and Mr Gallup is reported to have conducted 3,000 interviews.
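The way the margin of error shrinks as samples grow can be sketched with a short simulation. The 52 per cent support figure, the poll sizes and the number of repeated polls are all invented for illustration:

```python
import random

def poll_margin(true_support, n, trials=2000, seed=1):
    """Simulate repeated polls of size n and return the spread
    (standard deviation) of the estimated support share."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        votes = sum(1 for _ in range(n) if rng.random() < true_support)
        estimates.append(votes / n)
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / trials
    return var ** 0.5

# The spread of poll results falls roughly as 1/sqrt(n):
small = poll_margin(0.52, 100)    # polls of 100 people
large = poll_margin(0.52, 1000)   # polls of 1,000 people
print(f"spread with n=100:  {small:.3f}")
print(f"spread with n=1000: {large:.3f}")
```

Ten times the sample buys only about a threefold reduction in the spread, which is why a thousand or so interviews is often deemed good enough.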
But if 3,000 interviews were good, why weren’t 2.4 million far better? The answer is that sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all. George Gallup took pains to find an unbiased sample because he knew that was far more important than finding a big one.
The Literary Digest, in its quest for a bigger data set, fumbled the question of a biased sample. It mailed out forms to people on a list it had compiled from automobile registrations and telephone directories – a sample that, at least in 1936, was disproportionately prosperous. To compound the problem, Landon supporters turned out to be more likely to mail back their answers. The combination of those two biases was enough to doom The Literary Digest’s poll. For each person George Gallup’s pollsters interviewed, The Literary Digest received 800 responses. All that gave them for their pains was a very precise estimate of the wrong answer.
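A toy simulation makes the Digest’s predicament concrete. Every number below is invented, but the setup echoes 1936: a huge poll drawn only from a prosperous, unrepresentative slice of the electorate against a small poll drawn at random from everyone:

```python
import random

rng = random.Random(0)

# Invented electorate: a prosperous fifth leans against candidate A,
# everyone else leans for, so overall support is roughly 60 per cent.
population = ([(True, rng.random() < 0.30) for _ in range(20_000)] +
              [(False, rng.random() < 0.68) for _ in range(80_000)])

true_share = sum(v for _, v in population) / len(population)

# Huge poll, but drawn only from the prosperous (car and phone owners):
biased = rng.choices([v for p, v in population if p], k=50_000)
biased_share = sum(biased) / len(biased)

# Tiny poll, drawn at random from the whole electorate:
unbiased = rng.choices([v for _, v in population], k=1_000)
unbiased_share = sum(unbiased) / len(unbiased)

print(f"true support:    {true_share:.3f}")
print(f"big biased poll: {biased_share:.3f}")
print(f"small fair poll: {unbiased_share:.3f}")
```

The 50,000-strong biased poll is precise and badly wrong; the 1,000-strong random poll lands close to the truth.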
The big data craze threatens to be The Literary Digest all over again. Because found data sets are so messy, it can be hard to figure out what biases lurk inside them – and because they are so large, some analysts seem to have decided the sampling problem isn’t worth worrying about. It is.
Professor Viktor Mayer-Schönberger of Oxford’s Internet Institute, co-author of Big Data, told me that his favoured definition of a big data set is one where “N = All” – where we no longer have to sample, but we have the entire background population. Returning officers do not estimate an election result with a representative tally: they count the votes – all the votes. And when “N = All” there is indeed no issue of sampling bias because the sample includes everyone.
But is “N = All” really a good description of most of the found data sets we are considering? Probably not. “I would challenge the notion that one could ever have all the data,” says Patrick Wolfe, a computer scientist and professor of statistics at University College London.
An example is Twitter. It is in principle possible to record and analyse every message on Twitter and use it to draw conclusions about the public mood. (In practice, most researchers use a subset of that vast “fire hose” of data.) But while we can look at all the tweets, Twitter users are not representative of the population as a whole. (According to the Pew Research Internet Project, in 2013, US-based Twitter users were disproportionately young, urban or suburban, and black.)
There must always be a question about who and what is missing, especially with a messy pile of found data. Kaiser Fung, a data analyst and author of Numbersense, warns against simply assuming we have everything that matters. “N = All is often an assumption rather than a fact about the data,” he says.
Consider Boston’s Street Bump smartphone app, which uses a phone’s accelerometer to detect potholes without the need for city workers to patrol the streets. As citizens of Boston download the app and drive around, their phones automatically notify City Hall of the need to repair the road surface. Solving the technical challenges involved has produced, rather beautifully, an informative data exhaust that addresses a problem in a way that would have been inconceivable a few years ago. The City of Boston proudly proclaims that the “data provides the City with real-time information it uses to fix problems and plan long term investments.”
Yet what Street Bump really produces, left to its own devices, is a map of potholes that systematically favours young, affluent areas where more people own smartphones. Street Bump offers us “N = All” in the sense that every bump from every enabled phone can be recorded. That is not the same thing as recording every pothole. As Microsoft researcher Kate Crawford points out, found data contain systematic biases and it takes careful thought to spot and correct for those biases. Big data sets can seem comprehensive but the “N = All” is often a seductive illusion.
. . .
Who cares about causation or sampling bias, though, when there is money to be made? Corporations around the world must be salivating as they contemplate the uncanny success of the US discount department store Target, as famously reported by Charles Duhigg in The New York Times in 2012. Duhigg explained that Target has collected so much data on its customers, and is so skilled at analysing that data, that its insight into consumers can seem like magic.
Duhigg’s killer anecdote was of the man who stormed into a Target near Minneapolis and complained to the manager that the company was sending coupons for baby clothes and maternity wear to his teenage daughter. The manager apologised profusely and later called to apologise again – only to be told that the teenager was indeed pregnant. Her father hadn’t realised. Target, after analysing her purchases of unscented wipes and magnesium supplements, had.
Statistical sorcery? There is a more mundane explanation.
“There’s a huge false positive issue,” says Kaiser Fung, who has spent years developing similar approaches for retailers and advertisers. What Fung means is that we didn’t get to hear the countless stories about all the women who received coupons for babywear but who weren’t pregnant.
Hearing the anecdote, it’s easy to assume that Target’s algorithms are infallible – that everybody receiving coupons for onesies and wet wipes is pregnant. This is vanishingly unlikely. Indeed, it could be that pregnant women receive such offers merely because everybody on Target’s mailing list receives such offers. We should not buy the idea that Target employs mind-readers before considering how many misses attend each hit.
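A back-of-envelope calculation shows why the misses matter. The figures below are entirely invented – nothing public is known about Target’s actual hit rates – but they show how a seemingly accurate predictor can still be wrong about most of the people it flags:

```python
# Invented numbers illustrating the base-rate problem:
# suppose 2 per cent of mailing-list customers are pregnant, the
# predictor flags 60 per cent of pregnant customers (true positives)
# but also 5 per cent of everyone else (false positives).
base_rate = 0.02
sensitivity = 0.60
false_positive_rate = 0.05

flagged_pregnant = base_rate * sensitivity              # 0.012 of customers
flagged_not = (1 - base_rate) * false_positive_rate     # 0.049 of customers

# Probability that someone receiving baby coupons really is pregnant:
precision = flagged_pregnant / (flagged_pregnant + flagged_not)
print(f"{precision:.1%}")  # → 19.7%: four in five recipients are not pregnant
```

Under these assumptions the hits (the pregnant teenager) are vivid and the far more numerous misses are invisible, which is exactly Fung’s point.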
In Charles Duhigg’s account, Target mixes in random offers, such as coupons for wine glasses, because pregnant customers would feel spooked if they realised how intimately the company’s computers understood them.
Fung has another explanation: Target mixes up its offers not because it would be weird to send an all-baby coupon-book to a woman who was pregnant but because the company knows that many of those coupon books will be sent to women who aren’t pregnant after all.
None of this suggests that such data analysis is worthless: it may be highly profitable. Even a modest increase in the accuracy of targeted special offers would be a prize worth winning. But profitability should not be conflated with omniscience.
. . .
In 2005, John Ioannidis, an epidemiologist, published a research paper with the self-explanatory title, “Why Most Published Research Findings Are False”. The paper became famous as a provocative diagnosis of a serious issue. One of the key ideas behind Ioannidis’s work is what statisticians call the “multiple-comparisons problem”.
It is routine, when examining a pattern in data, to ask whether such a pattern might have emerged by chance. If it is unlikely that the observed pattern could have emerged at random, we call that pattern “statistically significant”.
The multiple-comparisons problem arises when a researcher looks at many possible patterns. Consider a randomised trial in which vitamins are given to some primary schoolchildren and placebos are given to others. Do the vitamins work? That all depends on what we mean by “work”. The researchers could look at the children’s height, weight, prevalence of tooth decay, classroom behaviour, test scores, even (after waiting) prison record or earnings at the age of 25. Then there are combinations to check: do the vitamins have an effect on the poorer kids, the richer kids, the boys, the girls? Test enough different correlations and fluke results will drown out the real discoveries.
There are various ways to deal with this but the problem is more serious in large data sets, because there are vastly more possible comparisons than there are data points to compare. Without careful analysis, the ratio of genuine patterns to spurious patterns – of signal to noise – quickly tends to zero.
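A simulation illustrates how multiple comparisons manufacture flukes. Here the “treatment” is pure noise, yet a crude two-sample test at the familiar 5 per cent significance threshold still declares a handful of comparisons significant (the group sizes, number of comparisons and threshold are all arbitrary choices for illustration):

```python
import random
import statistics

rng = random.Random(7)

def spurious_hits(n_comparisons=200, n=50, z_threshold=1.96):
    """Run many 'treatment vs placebo' comparisons in which the
    treatment truly does nothing, and count how many nonetheless
    look statistically significant by a crude z-test."""
    hits = 0
    for _ in range(n_comparisons):
        treated = [rng.gauss(0, 1) for _ in range(n)]
        placebo = [rng.gauss(0, 1) for _ in range(n)]
        se = (statistics.variance(treated) / n +
              statistics.variance(placebo) / n) ** 0.5
        z = (statistics.mean(treated) - statistics.mean(placebo)) / se
        if abs(z) > z_threshold:
            hits += 1
    return hits

hits = spurious_hits()
print(f"{hits} of 200 null comparisons look 'significant'")
```

Roughly one comparison in twenty comes up “significant” by chance alone; multiply that across the thousands of slices a large found data set invites and the flukes pile up.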
Worse still, one of the antidotes to the multiple-comparisons problem is transparency, allowing other researchers to figure out how many hypotheses were tested and how many contrary results are languishing in desk drawers because they just didn’t seem interesting enough to publish. Yet found data sets are rarely transparent. Amazon and Google, Facebook and Twitter, Target and Tesco – these companies aren’t about to share their data with you or anyone else.
New, large, cheap data sets and powerful analytical tools will pay dividends – nobody doubts that. And there are a few cases in which analysis of very large data sets has worked miracles. David Spiegelhalter of Cambridge points to Google Translate, which operates by statistically analysing hundreds of millions of documents that have been translated by humans and looking for patterns it can copy. This is an example of what computer scientists call “machine learning”, and it can deliver astonishing results with no preprogrammed grammatical rules. Google Translate is as close to a theory-free, data-driven algorithmic black box as we have – and it is, says Spiegelhalter, “an amazing achievement”. That achievement is built on the clever processing of enormous data sets.
But big data do not solve the problem that has obsessed statisticians and scientists for centuries: the problem of insight, of inferring what is going on, and figuring out how we might intervene to change a system for the better.
“We have a new resource here,” says Professor David Hand of Imperial College London. “But nobody wants ‘data’. What they want are the answers.”
To use big data to produce such answers will require large strides in statistical methods.
“It’s the wild west right now,” says Patrick Wolfe of UCL. “People who are clever and driven will twist and turn and use every tool to get sense out of these data sets, and that’s cool. But we’re flying a little bit blind at the moment.”
Statisticians are scrambling to develop new methods to seize the opportunity of big data. Such new methods are essential but they will work by building on the old statistical lessons, not by ignoring them.
Recall big data’s four articles of faith. Uncanny accuracy is easy to overrate if we simply ignore false positives, as with Target’s pregnancy predictor. The claim that causation has been “knocked off its pedestal” is fine if we are making predictions in a stable environment but not if the world is changing (as with Flu Trends) or if we ourselves hope to change it. The promise that “N = All”, and therefore that sampling bias does not matter, is simply not true in most cases that count. As for the idea that “with enough data, the numbers speak for themselves” – that seems hopelessly naive in data sets where spurious patterns vastly outnumber genuine discoveries.
“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.

This article was first published in the FT Magazine, 29/30 March 2014.

Have living standards really stopped rising?

‘People drift in and out of all income groups as a result of luck or the life-cycle of a career’

Income inequality is soaring in the US and the UK. The income earned by all but a few has been stagnating for a generation. So I claimed in my column of March 22. But was I right?

Several readers contacted me to suggest alternative interpretations of what look like grim data. Their objections are worth considering: they teach us both about the way numbers can lead us astray, and about the way our economy is evolving.

The first claim is that what looks like stagnation isn’t, because real gains have been mislabelled as mere inflation. The everyday technology of today was the stuff of science fiction in the 1970s when this apparent stagnation began. Perhaps inflation measures haven’t kept up.

There is truth in this argument, although we will never know how much truth unless somebody figures out how many Sinclair ZX81s an iPad is worth. But we should be cautious. In the US, 40 per cent of the consumer price index (CPI) tracks the cost of housing and related costs such as domestic heating; another 30 per cent tracks the cost of food and drink.

If two-thirds of my income goes on basics such as food and shelter, and my income is barely keeping pace with the price of such basics, there is a limit to how ecstatic I am likely to feel about the fact that iPhones exist.

There is a more technical version of the “inflation is lower than we think” argument. Customers can switch between different goods to avoid some price increases: from apples to oranges, from Cox’s Orange Pippin to Granny Smith, from apples at Whole Foods to apples at Tesco. Inflation measures may miss some of that. Or perhaps measured inflation has failed to take full account of quality improvements such as safer, more comfortable, more efficient and more durable cars.

In the US, the Boskin Commission was convened to evaluate such questions and concluded, late in 1996, that the CPI was indeed overstating true inflation by about 1.1 percentage points a year. That’s a shockingly large figure: large enough to matter and large enough to raise questions about whether it can be true. Official statistics have changed as a result, so even if the estimate was right at the time, the overstatement should be smaller now.

Can it really be that most American families have enjoyed rising incomes which have simply been missed because of errors in measuring inflation? I am not qualified to judge, and the commission became a political football because many government benefits are indexed using variants of CPI.

All this raises the question of whether prices are rising faster for the rich, exaggerating the measured rise in inequality. This is not true in the long run, and recently the opposite has been true: the poor have faced higher inflation than the rich.

Let’s move away from inflation. There are other ways in which things may be cheerier than we think. Russ Roberts, author of econ-novels such as The Invisible Heart, invites us to think harder about flatlining median household income. The “household” has changed over time, getting smaller in the US. So, says Roberts, what looks like stagnation may simply be singledom. If two-person households today make less than four-person households in 1980, that is hardly a problem.

This is a fair point but the effect doesn’t seem large enough to help us much. US household sizes have not shrunk much over the past generation (from 2.63 in 1990 to 2.59 in 2010). Looking not at households but at individuals, real median income for men in the US was higher in 1990 than in 2012. And it was higher in 1978 than in 1990. That is hardly reassuring, even though women have enjoyed strong gains in median income.

A third claim has been made by my colleague Merryn Somerset Webb, among others. It’s that talking about the income share of “the 1 per cent” over time is simply an error, because there is no “1 per cent” over time. People drift in and out of all income groups as a result of luck (being sacked; earning a bonus) or the life-cycle of a career from trainee through the senior ranks to eventual retirement.

. . .

Merryn has a point, but not a killer argument. I strongly suspect (but cannot prove) that no more than 3 per cent of people spend at least a decade enjoying membership of the “top 1 per cent” – in the UK, the bar is £164,000 a year. The majority of people never reach those heights.

Most of these objections should lead us to conclude that growth is not quite as slow as we fear, and increasing inequality not quite as stark as it first seems. None of them is powerful enough to put my mind at rest.

Yet there is genuine encouragement for optimists from the developing world. While inequality in many countries is increasing, it’s gently falling globally, because the likes of China, India and Indonesia are growing much faster than rich countries. And the situation is better than that. As last year’s UN Human Development Report argued, inequality in health and education is being reduced much faster than inequality in income. For once, that is good news that matters.

Also published at ft.com.

Economists aren’t all bad

‘Some research on students suggests economics either attracts or creates sociopaths’

Justin Welby, the Archbishop of Canterbury, recently bemoaned the way that “we are all reduced to being Homo financiarius or Homo economicus, mere economic units … for whom any gain is someone else’s loss in a zero-sum world.”

The remarks were reported on the 1st of April, but I checked, and the Archbishop seems serious. He set out two ways to see the world: the way a Christian sees it, full of abundance and grace; and the way he claims Milton Friedman saw it, as a zero-sum game.

Whatever the faults one might find in Friedman’s thinking, seeing the world as a zero-sum game was not one of them. So what do we learn from this, other than that the Archbishop of Canterbury was careless in his choice of straw man? The Archbishop does raise a troubling idea. Perhaps studying economics is morally corrosive and may simply make you a meaner, narrower human being.

That might seem to be taking the economics-bashing a bit far but there is a hefty body of evidence to consult here. (Two recent short survey articles, by psychologist Adam Grant and by economist Timothy Taylor, provide a good starting point.) Several studies have compared the attitudes or behaviour of economics students or teachers with those of people learning or teaching other academic disciplines.

Typically, these studies find that economists are less co-operative in classroom games: they contribute less to collective goods and they act selfishly in the famous prisoner’s dilemma (where two people have a strong incentive to betray each other but would collectively be better off if both stayed loyal). In 1993, Robert Frank (an economist) and Thomas Gilovich and Dennis Regan (both psychologists) surveyed academics and found that although almost everyone claimed to give money to charity, almost 10 per cent of economists said they gave nothing.

Frank and his colleagues also gave hypothetical dilemmas to students. Would they correct a billing error in their favour? Would they return a lost but addressed envelope containing cash? (And what did they think other people would do in these situations?) Those taking traditional microeconomics classes were less likely than other students to give the honest response, and slightly less likely to expect honesty from others. Most students said they would return an addressed envelope with cash in it but economics students were more likely to admit to baser motives.

Such research suggests that economics either attracts or creates sociopaths – and that should give economics instructors pause for thought.

Yet I am not totally persuaded. Economists did actually give more to charity in Frank’s survey. They were richer, and while they gave less as a percentage of their income they did give more in cash terms.

What about those hypothetical questions about envelopes full of cash? Were economics students selfish or merely truthful? Anthony Yezer and Robert Goldfarb (economists) and Paul Poppen (a psychologist) conducted an experiment to find out, surreptitiously dropping addressed envelopes with cash in classrooms to see if economics students really were less likely to return the money. Yezer and colleagues found quite the opposite: the economics students were substantially more likely to return the cash. Not quite so selfish after all.

Most importantly, classroom experiments with collective goods or the prisoner’s dilemma don’t capture much of economic life. The prisoner’s dilemma is a special case, and a counter-intuitive one. It is not surprising that economics students behave differently, nor does it tell us much about how they behave in reality. If there is a single foundational principle in economics it is that when you give people the chance to trade with each other, both of them tend to become better off. Maybe that’s naive but it’s all about “abundance” and is the precise opposite of a zero-sum mentality.

In fact, some of the more persuasive criticisms of economics are that it is too optimistic about abundance and peaceful gains from trade. From this perspective, economists should give more attention to the risks of crime and violence and to the prospect of inviolable environmental limits to economic growth. Perhaps economists don’t realise that some situations really are zero-sum games.

. . .

Economists may appear ethically impoverished on the question of co-operating in the prisoner’s dilemma but they seem to have a far more favourable attitude to immigration from poorer countries. To an economist, foreigners are people too.

This viewpoint infuriates some critics of economics, to the extent that it earned the famous nickname of “the dismal science”. Too few people know the context in which Thomas Carlyle hurled that epithet: it was in a proslavery article, first published in 1849, some years after slavery had been abolished in the British empire. Carlyle attacked the idea that “black men” might simply be induced to work for pay, according to what he sneeringly termed the “science of supply and demand”. Scorning the liberal views of economists, he believed Africans should be put to work by force.

Economics puts us at risk of some ethical mistakes, but with its respect for individual human agency it also inoculates us against some true atrocities. I’m not ashamed to be a dismal scientist.

Also published at ft.com.

“An evening with Tim Harford”

…sounds like the world’s worst date, but in fact I’ll be talking about my book “The Undercover Economist Strikes Back”, which I hope will be a lot more fun.

It’s on 24 April, 6.30pm in central London – full details here.

Prospect Magazine is organising and I am afraid there is a ticket price, but it’s a small venue and there will be drinks laid on. Come along if you like that sort of thing!

If you don’t fancy paying money, here’s a FREE video of me speaking about “How to Prevent Financial Meltdowns”.

10th of April, 2014

Why long-term unemployment matters

‘Research shows that employers ignore people who have been out of work for more than six months’

“Quantity has a quality of its own.” Whether or not Stalin ever said this about the Red Army, it is true of being out of work. Evidence is mounting that the long-term unemployed aren’t merely the short-term unemployed with the addition of a little waiting time. They are in a very different situation – and an alarming one at that.

Researchers in the US are setting the pace on this topic, because it is in America that a sharp and unique shift has occurred. Broad measures of unemployment reached high but not unprecedented levels during the recent great recession. Yet long-term unemployment (lasting more than six months) surged off the charts. It has been extremely rare for long-term unemployment to make up more than 20 per cent of US unemployment, but it was at 45 per cent during the depths of the recession. In the UK and eurozone, long-term unemployment is pervasive but that, alas, is not news.

As long as there is a recovery, why does this matter? A clue emerges when we look at two statistical relationships that are famous to econo-nerds like me: the Phillips curve and the Beveridge curve. (They are named after two greats of the London School of Economics, Bill Phillips and William Beveridge.)

The Phillips curve shows a relationship between inflation and unemployment. The Beveridge curve shows a relationship between vacancies and unemployment. Both of these relationships have been doing strange things recently: given the number of people out of work, both inflation and vacancies are higher than we’d expect.

What that means is that we can hear the engine of US economic activity revving away and yet the economy is still moving slowly. The gears aren’t meshing properly; economic growth is not being converted into jobs as smoothly as we would hope. So what’s going on?

Here’s a thought experiment: what if the long-term unemployed didn’t exist? What if we replotted the Phillips curve and the Beveridge curve using statistics on short-term unemployment? It turns out that the old statistical relationships would work just fine. We can solve the statistical puzzle – all we need to do is assume that the long-term unemployed are irrelevant to the way the economy works.

A recent Brookings Institution research paper by Alan Krueger (a senior adviser to Barack Obama during the recession), Judd Cramer and David Cho examines this discomfiting thesis in greater depth. The researchers conclude that people who have been out of work for more than six months are indeed marginalised: employers ignore them, bidding up wages if necessary to attract workers from the ranks of the short-term unemployed.

I’ve written before about an experiment conducted by a young economist, Rand Ghayad. He mailed out nearly 5,000 carefully calibrated job applications, using a computer to tweak key parameters. He found that employers were three times more likely to call an applicant with irrelevant but recent employment experience than someone who had relevant experience but had been out of work for more than six months. Long-term unemployment had become a trap.

In Ghayad’s experiment, the long-term unemployed were identical in every other way to other applicants. In reality, of course, it may be that people also become demotivated after a long spell of looking for work. The “benefits culture” at work? It seems not. Earlier research by Krueger and Andreas Mueller tracked job hunters over time and showed them becoming ever less active in the job market – and ever more depressed. They could not rouse themselves, even when unemployment insurance payments were about to expire. It wasn’t that the people joined the ranks of the long-term unemployed because they were demotivated to start with: the long-term unemployment came first, and the unhappiness and the lack of drive came later.

. . .

There is a silver lining to all this: it suggests that those of us worried about deep, technology-driven weaknesses in the US economy may be wrong. Instead, the US economy has a cyclical problem so serious that it left lasting scars – but they will heal eventually. One can hope, anyway. Experience in Canada and Sweden during the past two decades suggests that it is possible to chip away at long-term unemployment but it takes time.

Is there a policy cure for this challenge? The rightwing intuition is tough love, based on the theory that overgenerous unemployment support merely incentivises people to sit on the sidelines of the labour market until they become unemployable. Leftwingers retort that the long-term unemployed are the victims of circumstance and need our support.

Recent evidence gathered by two economists, Bart Hobijn and Aysegul Sahin, suggests the rightwingers have a point in the case of Sweden, whereas in the UK, Spain and Portugal the labour market has been hit not by overgenerous benefits but by a structural shift in the economy away from construction. The supply of jobs no longer matches the supply of workers.

As for the US, Krueger’s research paints a picture of the long-term unemployed as people who are not very different to the rest of us – merely unluckier.

Also published at ft.com.

What next for behavioural economics?

The past decade has been a triumph for behavioural economics, the fashionable cross-breed of psychology and economics. First there was the award in 2002 of the Nobel Memorial Prize in economics to a psychologist, Daniel Kahneman – the man who did as much as anything to create the field of behavioural economics. Bestselling books were launched, most notably by Kahneman himself (Thinking, Fast and Slow, 2011) and by his friend Richard Thaler, co-author of Nudge (2008). Behavioural economics seems far sexier than the ordinary sort, too: when last year’s Nobel was shared three ways, it was the behavioural economist Robert Shiller who grabbed all the headlines.
Behavioural economics is one of the hottest ideas in public policy. The UK government’s Behavioural Insights Team (BIT) uses the discipline to craft better policies, and in February was part-privatised with a mission to advise governments around the world. The White House announced its own behavioural insights team last summer.
So popular is the field that behavioural economics is now often misapplied as a catch-all term to refer to almost anything that’s cool in popular social science, from the storycraft of Malcolm Gladwell, author of The Tipping Point (2000), to the empirical investigations of Steven Levitt, co-author of Freakonomics (2005).
Yet, as with any success story, the backlash has begun. Critics argue that the field is overhyped, trivial, unreliable, a smokescreen for bad policy, an intellectual dead-end – or possibly all of the above. Is behavioural economics doomed to reflect the limitations of its intellectual parents, psychology and economics? Or can it build on their strengths and offer a powerful set of tools for policy makers and academics alike?
A recent experiment designed by BIT highlights both the opportunity and the limitations of the new discipline. The trial was designed to encourage people to sign up for the Organ Donor Register. It was huge; more than a million people using the Driver and Vehicle Licensing Agency website were shown a webpage inviting them to become an organ donor. One of eight different messages was displayed at random. One was minimalist, another spoke of the number of people who die while awaiting donations, yet another appealed to the idea of reciprocity – if you needed an organ, wouldn’t you want someone to donate an organ to you?
BIT devoted particular attention to an idea called “social proof”, made famous 30 years ago by psychologist Robert Cialdini’s book Influence. While one might be tempted to say, “Too few people are donating their organs, we desperately need your help to change that”, the theory of social proof says that’s precisely the wrong thing to do. Instead, the persuasive message will suggest: “Every day, thousands of people sign up to be donors, please join them.” Social proof describes our tendency to run with the herd; why else are books marketed as “bestsellers”?
Expecting social proof to be effective, the BIT trial used three different variants of a social proof message, one with a logo, one with a photo of smiling people, and one unadorned. None of these approaches was as successful as the best alternatives at persuading people to sign up as donors. The message with the photograph – for which the team had high hopes – was a flop, proving worse than no attempt at persuasion at all.
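The logic of such a trial is simple enough to sketch in a few lines of code. Everything below is invented for illustration – the eight sign-up rates and the sample size are hypothetical, not BIT’s data – but it shows the design: random assignment of one message per visitor, then a comparison of sign-up proportions to separate the winning message from the also-rans.

```python
import math
import random

# Hypothetical re-creation of a randomised messaging trial. The eight
# underlying sign-up rates are invented; only the design mirrors the trial.
random.seed(0)
true_rates = [0.023, 0.025, 0.026, 0.024, 0.029, 0.027, 0.022, 0.031]

shown = [0] * 8
signups = [0] * 8
for _ in range(1_000_000):          # a million site visitors
    arm = random.randrange(8)       # each sees one message at random
    shown[arm] += 1
    if random.random() < true_rates[arm]:
        signups[arm] += 1

def z_stat(s1, n1, s2, n2):
    """Two-proportion z-test: is arm 1's sign-up rate genuinely higher?"""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

best = max(range(8), key=lambda i: signups[i] / shown[i])
print("best message:", best,
      "z vs message 0:", round(z_stat(signups[best], shown[best],
                                      signups[0], shown[0]), 2))
```

With a million participants, even a difference of a fraction of a percentage point between messages shows up clearly – which is why the real trial could confidently crown a winner.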
[Photo caption: Daniel Kahneman, one of the fathers of behavioural economics, receiving an award from Barack Obama, November 2013]
Three points should be made here. The first is that this is exactly why running trials is an excellent idea: had the rival approaches not been tested with an experiment, it would have been easy for well-meaning civil servants acting on authoritative advice to have done serious harm. The trial was inexpensive, and now that the most persuasive message is in use (“If you needed an organ transplant, would you have one? If so, please help others”), roughly 100,000 additional people can be expected to sign up for the donor register each year.
The second point is that there is something unnerving about a discipline in which our discoveries about the past do not easily generalise to the future. Social proof is a widely accepted idea in psychology but, as the donor experiment shows, it does not always apply and it can be hard to predict when or why.
This patchwork of sometimes-fragile psychological results hardly invalidates the whole field but complicates the business of making practical policy. There is a sense that behavioural economics is just regular economics plus common sense – but since psychology isn’t mere common sense either, applying psychological lessons to economics is not a simple task.
The third point is that the organ donor experiment has little or nothing to do with behavioural economics, strictly defined. “The Behavioural Insights Team is widely perceived as doing behavioural economics,” says Daniel Kahneman. “They are actually doing social psychology.”
. . .
The line between behavioural economics and psychology can get a little blurred. Behavioural economics is based on the traditional “neoclassical” model of human behaviour used by economists. This essentially mathematical model says human decisions can usefully be modelled as though our choices were the outcome of solving differential equations. Add psychology into the mix – for example, Kahneman’s insight (with the late Amos Tversky) that we treat the possibility of a loss differently from the way we treat the possibility of a gain – and the task of the behavioural economist is to incorporate such ideas without losing the mathematically-solvable nature of the model.
Why bother with the maths? Consider the example of, say, improving energy efficiency. A psychologist might point out that consumers are impatient, poorly-informed and easily swayed by what their neighbours are doing. It’s the job of the behavioural economist to work out how energy markets might work under such conditions, and what effects we might expect if we introduced policies such as a tax on domestic heating or a subsidy for insulation.
It’s this desire to throw out the hyper-rational bathwater yet keep the mathematically tractable baby that leads to difficult compromises, and not everyone is happy. Economic traditionalists argue that behavioural economics is now hopelessly patched-together; some psychologists claim it’s still attempting to be too systematic.
Nick Chater, a psychologist at Warwick Business School and an adviser to the BIT, is a sympathetic critic of the behavioural economics approach. “The brain is the most rational thing in the universe”, he says, “but the way it solves problems is ad hoc and very local.” That suggests that attempts to formulate general laws of human behaviour may never be more than a rough guide to policy.
The best-known critique of behavioural economics comes from a psychologist, Gerd Gigerenzer of the Max Planck Institute for Human Development. Gigerenzer argues that it is pointless to keep adding frills to a mathematical account of human behaviour that, in the end, has nothing to do with real cognitive processes.
I put this critique to David Laibson, a behavioural economist at Harvard University. He concedes that Gigerenzer has a point but adds: “Gerd’s models of heuristic decision-making are great in the specific domains for which they are designed but they are not general models of behaviour.” In other words, you’re not going to be able to use them to figure out how people should, or do, budget for Christmas or nurse their credit card limit through a spell of joblessness.
Richard Thaler of the University of Chicago, who with Kahneman and Tversky is a founding father of behavioural economics, agrees. To discard the basic neoclassical framework of economics means “throwing away a lot of stuff that’s useful”.
For some economists, though, behavioural economics has already conceded too much to the patchwork of psychology. David K Levine, an economist at Washington University in St Louis, and author of Is Behavioral Economics Doomed? (2012), says: “There is a tendency to propose some new theory to explain each new fact. The world doesn’t need a thousand different theories to explain a thousand different facts. At some point there needs to be a discipline of trying to explain many facts with one theory.”
The challenge for behavioural economics is to elaborate on the neoclassical model to deliver psychological realism without collapsing into a mess of special cases. Some say that the most successful special case comes from Harvard’s David Laibson. It is a mathematical tweak designed to represent the particular brand of short-termism that leads us to sign up for the gym yet somehow never quite get around to exercising. It’s called “hyperbolic discounting”, a name that refers to a mathematical curve, and which says much about the way behavioural economists represent human psychology.
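Laibson’s tweak is usually written as “quasi-hyperbolic” or “beta-delta” discounting: payoffs today get full weight, while everything in the future is scaled down by an extra factor. A minimal sketch of the gym-membership logic – the costs, benefits and parameter values here are invented for illustration, not Laibson’s own numbers:

```python
# Quasi-hyperbolic ("beta-delta") discounting: the weight on a payoff
# t periods away is 1 when t == 0, and beta * delta**t otherwise.
def qh_discount(t, beta=0.7, delta=0.95):
    return 1.0 if t == 0 else beta * delta ** t

# Gym example: an effort cost of 6 today buys a health benefit of 8 tomorrow.
# Planned a week in advance, both dates are in the future, so the plan
# looks worthwhile...
plan_value = -6 * qh_discount(7) + 8 * qh_discount(8)

# ...but when the day arrives, the immediate cost gets full weight while
# the benefit is still discounted, and the workout is skipped.
today_value = -6 * qh_discount(0) + 8 * qh_discount(1)

print(plan_value > 0, today_value < 0)  # True True
```

The same preferences rank the plan as good in advance and bad on the day – exactly the sign-up-but-never-go pattern the model was built to capture.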
The question is, how many special cases can behavioural economics sustain before it becomes arbitrary and unwieldy? Not more than one or two at a time, says Kahneman. “You might be able to do it with two but certainly not with many factors.” Like Kahneman, Thaler believes that a small number of made-for-purpose behavioural economics models have proved their worth already. He argues that trying to unify every psychological idea in a single model is pointless. “I’ve always said that if you want one unifying theory of economic behaviour, you won’t do better than the neoclassical model, which is not particularly good at describing actual decision making.”
. . .
Meanwhile, the policy wonks plug away at the rather different challenge of running rigorous experiments with public policy. There is something faintly unsatisfying about how these policy trials have often confirmed what should have been obvious. One trial, for example, showed that text message reminders increase the proportion of people who pay legal fines. This saves everyone the trouble of calling in the bailiffs. Other trials have shown that clearly-written letters with bullet-point summaries provoke higher response rates.
None of this requires the sophistication of a mathematical model of hyperbolic discounting or loss aversion. It is obvious stuff. Unfortunately it is obvious stuff that is often neglected by the civil service. It is hard to object to inexpensive trials that demonstrate a better way. Nick Chater calls the idea “a complete no-brainer”, while Kahneman says “you can get modest gains at essentially zero cost”.
David Halpern, a Downing Street adviser under Tony Blair, was appointed by the UK coalition government in 2010 to establish the BIT. He says that the idea of running randomised trials in government has now picked up steam. The Financial Conduct Authority has also used randomisation to develop more effective letters to people who may have been missold financial products. “This shift to radical incrementalism is so much more important than some of the grand proposals out there,” says Halpern.
Not everyone agrees. In 2010, behavioural economists George Loewenstein and Peter Ubel wrote in The New York Times that “behavioural economics is being used as a political expedient, allowing policy makers to avoid painful but more effective solutions rooted in traditional economics.”
For example, in May 2010, just before David Cameron came to power, he sang the praises of behavioural economics in a TED talk. “The best way to get someone to cut their electricity bill,” he said, “is to show them their own spending, to show them what their neighbours are spending, and then show what an energy-conscious neighbour is spending.”
But Cameron was mistaken. The single best way to promote energy efficiency is, almost certainly, to raise the price of energy. A carbon tax would be even better, because it not only encourages people to save energy but to switch to lower-carbon sources of energy. The appeal of a behavioural approach is not that it is more effective but that it is less unpopular.
Thaler points to the experience of Cass Sunstein, his Nudge co-author, who spent four years as regulatory tsar in the Obama White House. “Cass wanted a tax on petrol but he couldn’t get one, so he pushed for higher fuel economy standards. We all know that’s not as efficient as raising the tax on petrol – but that would be lucky to get a single positive vote in Congress.”
Should we be trying for something more ambitious than behavioural economics? “I don’t know if we know enough yet to be more ambitious,” says Kahneman, “But the knowledge that currently exists in psychology is being put to very good use.”
Small steps have taken behavioural economics a long way, says Laibson, citing savings policy in the US. “Every dimension of that environment is now behaviourally tweaked.” The UK has followed suit, with the new auto-enrolment pensions, directly inspired by Thaler’s work.
Laibson says behavioural economics has only just begun to extend its influence over public policy. “The glass is only five per cent full but there’s no reason to believe the glass isn’t going to completely fill up.”

First published on FT.com, Life and Arts, 22 March 2014

How investors get it wrong

‘We trade too often because we’re too confident in our ability to spot the latest bargain’

Flip through the pages of this august newspaper and you will often see reference to how particular investments are doing: gold is up, oil is down and the S&P 500 is going sideways.

Yet illuminating as all this might be, such reporting draws a veil across what we might call the Investor’s Tragedy: that the typical investor doesn’t do nearly as well as the typical investment.

This isn’t just because Wall Street and the City of London cream off all the money, although of course there is something in that. (In 1940, the author Fred Schwed invited us to contemplate the yachts of all the brokers and bankers riding at anchor off downtown Manhattan; the title of Schwed’s book was Where Are the Customers’ Yachts?)

No, the Investor’s Tragedy wouldn’t be much of a tragedy if it was all somebody else’s fault. Alas, the fault is not in our stars but in ourselves: we underperform the market because we’re doing it wrong.

Our first tragic flaw is that we buy and sell too often. In 2000, Brad Barber and Terrance Odean studied the trading performance of more than 65,000 retail investors with accounts at a large discount broker. Looking at the early 1990s – happy days for investors – Barber and Odean found that while an index reflecting US stock markets returned 17.9 per cent a year, the investors who traded most actively earned just 11.4 per cent a year – a huge shortfall that becomes even more dramatic after a few years of compounding.
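To see how that 6.5-point gap compounds, here is the arithmetic over a couple of decades – an illustrative projection using the column’s figures, not Barber and Odean’s own calculation:

```python
# Grow $10,000 for 20 years at the market's 17.9 per cent a year
# versus the most active traders' 11.4 per cent.
def grow(principal, annual_return, years):
    return principal * (1 + annual_return) ** years

market = grow(10_000, 0.179, 20)   # roughly $269,000
traders = grow(10_000, 0.114, 20)  # roughly $87,000
print(round(market), round(traders))
```

Over 20 years the passive pot ends up roughly three times the size of the hyperactive one – which is what “dramatic after a few years of compounding” means in cash terms.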

Hyperactive investors paid corrosive trading costs while failing to improve their underlying investment performance. The typical investor traded less and underperformed by 1.5 per cent per year, a substantial margin. The investors who hardly traded at all were rewarded with market-matching investment performance.

Our second tragic flaw is our tendency to buy high and sell low. Apologies if this is all a bit technical but it turns out that buying high and selling low is not the aim of the investment game.

Here’s how the self-deception works. You put $10,000 into the stock market. It promptly doubles, leaving you with $20,000. So pleased are you that things are going well that you double up, putting a further $20,000 into the market. Now the market falls back to its original level. Licking your wounds, you sell half your shares for $10,000. The market promptly doubles again, leaving you holding $10,000 in cash and $20,000 in shares after investing a total of $30,000. The market has, after a rollercoaster ride, risen by 100 per cent – but somehow you haven’t made a penny of profit.
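The bookkeeping in that example is worth making explicit. A minimal sketch, using exactly the cash flows and market moves described above:

```python
# The worked example: the market doubles, halves, then doubles again
# (up 100 per cent overall), while the investor buys high and sells low.
periods = [
    (2.0, 10_000),   # put in $10,000; market doubles
    (0.5, 20_000),   # double up with $20,000 more; market falls back
    (2.0, -10_000),  # sell half the shares for $10,000; market doubles
]

shares = 0.0      # value of shares held
cash_out = 0.0    # proceeds from sales, held as cash
invested = 0.0    # total paid in

for multiplier, flow in periods:
    if flow > 0:
        invested += flow
        shares += flow
    else:
        shares += flow       # shares sold...
        cash_out -= flow     # ...become cash in hand
    shares *= multiplier

print(invested, shares + cash_out)  # 30000.0 30000.0 -- not a penny of profit
```

The investor’s dollar-weighted return is zero even though a buy-and-hold investor would have doubled their money – the gap Dichev measures below.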

The most influential study of such behaviour was published in the American Economic Review in 2007 by Ilia Dichev, now at Emory University. Dichev found that dollar-weighted returns were several percentage points lower than buy-and-hold returns: the market did better when only a few people were in it. A number of subsequent studies have examined this tendency, and while not all of them reach the same gloomy conclusion, many do.

Dichev’s work makes sense in the light of research on the psychology of investment. Robert Shiller, one of the most recent winners of the Nobel memorial prize in economics, has found that stock markets tend to revert to long-run average valuations. When things are booming a bust is on the way, and vice versa.

Meanwhile Stefan Nagel and Ulrike Malmendier have discovered that stock market returns in our formative years shape a lifetime of investment behaviour. An awful bear market scares a generation of young investors away, just as they are being presented with a buying opportunity.

Two tragic flaws are probably enough but here’s a third: Odean also showed, in 1998, that investors had a tendency to sell shares that had risen in value while holding on to losing investments, despite tax incentives pushing in the opposite direction. In Odean’s sample of investors, this bias pulled down investors’ returns.

Explanations for these shortcomings aren’t hard to find. We trade too often because we’re too confident in our ability to spot the latest bargain. We buy at the top and sell at the bottom because we’re influenced by what others are doing. And we hold on to shares that have fallen in value because to sell them at a loss would be admitting defeat. (Anyway, those shares in Lehman Brothers are sure to bounce back at some point.)

Armed with a diagnosis, a cure is also readily available: make regular, automated investments in boring, low-cost funds and try to sell in a similarly bloodless fashion.

Unfortunately this advice doesn’t really fit the modern world. The default option for financial reporting is to tell us what the market has done in the past few hours and how everyone is feeling about that. It is hard to think of a brand of journalism more calculated to breed a herd mentality.

Meanwhile, it has never been easier to fidget with our investment portfolios. Investment platform providers have every incentive to turn their websites into something like Facebook, constantly poking us for attention.

Our investments would be far healthier in the equity market equivalent of an old-fashioned piggy bank; the sort that needs a hammer to open.

Also published at ft.com.

