Articles published in June 2013
The data aren’t useful because they’re spread across a gazillion spreadsheets, says Tim Harford
‘Finding government statistics is not easy. Both expert users and occasional users struggle to navigate their way through the multiple places in which statistics are published.’ UK House of Commons public administration select committee report, May 2013
How hard can it be to find a few statistics? And since when is this a matter for a parliamentary committee?
You’ve obviously never tried to use the Office for National Statistics website. Try a simple-sounding query – such as what households are currently spending in a week, or retail price inflation for the past 50 years – and you are highly unlikely to get anywhere using the search window. It’s like Google on an acid trip, throwing several thousand random results at you.
It can’t be that hard.
I recently sat down with one of the UK’s finest economic journalists, Evan Davis of the BBC, and we tried to get the results we wanted either through the search window or by trying to second-guess the tormented mind of the person who constructed the branches of the database’s hierarchy. It was hopeless. Even when Mr Davis used his expertise to shortcut the process, we found ourselves thwarted at every turn. (As an aside, Google delivered the correct result in seconds.)
I am sure Chris Giles, the FT’s economics editor, would not be defeated so easily.
Perhaps not, but Mr Giles testified to the public administration committee and took the trouble to run through, step by step, just how difficult it would be to find the answer to a simple, practical statistical question – such as whether unemployment today is lower or higher than it was in the mid-1990s. For an expert user, who knows that the relevant code for the data in question is MGSX, finding an answer to that question is slow and awkward. For a more typical user, finding an answer might be impossible.
Let’s return to the question: why should a parliamentary committee care?
It’s encouraging that MPs do care, because professional researchers at the Commons library will do all the hard work for them and they need never do battle with the ONS website. Most other people have to do the legwork themselves; and, if the ONS site is hard to use, they will turn to other sources, which are more likely to be wrong or to contain partisan spin.
Why is this such a hard problem?
I suspect the ONS is making it look harder than it really is, but making statistics accessible isn’t a straightforward task. Our official statistics have their own longstanding categorisation system, which makes little sense to the lay person, so a user-friendly navigation system must help someone sidestep that. There’s a lot of data available, in principle, and there are many different ways in which users might reasonably want to see them presented, not to mention the difficulty of dealing with synonyms such as “family spending” instead of “household expenditure”. All that said: the ONS website is a national embarrassment.
Should I conclude that other countries do a better job of this?
The US’s Fred database (short for Federal Reserve Economic Data) is well-respected for being comprehensive and easy to use. The World Development Indicators, under the guardianship of the World Bank, are impressive if fiddly. But the truth is that this stuff isn’t terribly easy.
I thought the government was going to release more data. Does that mean the problem will get worse?
Demand for data can only rise, so the ONS needs to get its house in order. But the government’s “open data” plan is a somewhat separate issue. The sort of data released under it could include almost anything – for instance, the real-time location of every bus in London, to enable applications and websites to help people plan their journeys. Handy stuff – but processed official statistics, which are quality-assured, are something different.
Aren’t government departments and councils meant to be releasing highly detailed data about what they’re spending?
They are – for every item of spending over £25,000. But the data aren’t very useful at the moment. They are often a bit unreliable and spread across a gazillion spreadsheets. There are many such problems but, as with the ONS website, we are promised improvements in due course. The journey of a thousand miles begins with a single step, as they say. I’ll grant the government this much: the steps have been in the right direction.
Also published at ft.com.
Stories of the formula for the perfect penalty kick are cheaper than ads, writes Tim Harford
‘People who have surgery towards the end of the week are more likely to die than those who have procedures earlier on, researchers say’ BBC.co.uk, May 29
This is presumably the National Health Service’s equivalent of Detroit’s lemons. If you buy a car that was assembled on a Friday afternoon, woe betide you . . .
It is conceivable that surgeons operate after a boozy Friday lunch. A more plausible explanation is that the NHS is short-staffed at weekends and so if your surgery leads to complications you may be less likely to get prompt attention from experienced staff. Several studies have suggested it’s not a great idea to be stuck in hospital over the weekend, but there has been a suspicion that the problem may not be the staff but the patients. Those who rock up for emergency surgery at three o’clock on a Saturday morning may just be different sorts of people with different conditions, compared with those who arrive at lunchtime on Wednesday. This research looked at planned surgery, not emergency surgery, which (one hopes) removes that source of confusion.
I’m sceptical. Haven’t we heard “researchers say” all sorts of things about different days of the week?
“Researchers say” the strangest things, at least according to the newspapers. You may be thinking of the “Blue Monday” equation, which purported to show the last Monday in January was, scientifically speaking, the year’s most depressing day.
That’s the one!
It’s nonsense. Harry Frankfurt’s essay “On Bullshit” pointed out that while a liar knows the truth and is determined to conceal it, the bull merchant has no interest in whether something is true or not. This particular piece of nonsense is arbitrary pseudoscience.
Researchers publish nonsensical pseudoscience in the media all the time, which is why I was sceptical about the “don’t get sick on Friday” study.
The problem is that we constantly read that “researchers say” one thing or another. Perhaps that phrase once conveyed something meaningful – that experts had conducted a rigorous study on a topic and that we needn’t trouble ourselves with the details, but could skip to the punchline. If so, public relations companies have hijacked the phrase, using it as a vector to infect the nervous systems of journalists and their editors. The Blue Monday study was paid for by a travel agency. Other nonsensical equations have been commissioned by ice cream makers, lingerie manufacturers, supermarkets and bookmakers. Some academic is persuaded to attach his good name to the sorry affair – and the definition of “academic” is often very loose. Nobody cares. Stories breathlessly relating the discovery of the mathematical formula for the perfect penalty kick, the perfect pair of breasts or the perfect weekend are routinely published. They are cheaper than paying for advertising.
But you’re going to tell me that the hospital mortality study was different, because it wasn’t sponsored by some corporate PR outfit?
You’re missing the point. The real flaw with Blue Monday wasn’t that it was commissioned by a corporation. It was that it had no scientific content. Science isn’t just whatever emerges from the mouth of someone with a tenuous university affiliation. Science is a process. The hospital study is part of that scientific process. It identified a hypothesis of importance. It gathered data – more than 4m inpatient admissions for every hospital in England over the course of three years. It analysed the data, with enough statistical power that the observed patterns were enormously unlikely to be the result of chance. It found an effect that was large enough to be of real practical concern. The research refers to, and supplements, previous studies in the area – and future studies will refer to, and supplement, this one. It was peer-reviewed and published in the British Medical Journal, an organ with a reputation to defend.
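Why sample size matters here can be seen with a stylised two-proportion test. The death rates below are invented for illustration, not taken from the BMJ paper; the point is that with millions of admissions, even a small difference in rates produces a test statistic far beyond any reasonable threshold for chance.

```python
import math

# Hypothetical illustration only -- these rates are NOT from the BMJ study.
n1, p1 = 2_000_000, 0.0100   # surgeries early in the week, assumed death rate 1.00%
n2, p2 = 2_000_000, 0.0105   # surgeries later in the week, assumed death rate 1.05%

# Two-proportion z-test: under the null of equal rates, pool the two samples.
pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
print(round(z, 1))  # roughly 5 -- far beyond the ~2 needed at the 5% level
```

A half-a-tenth-of-a-percentage-point gap would be invisible in a small trial; at this scale it is unmistakable.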
So the research checks out.
With apologies to Star Trek, I’m an economist, Jim, not a doctor. But it looks solid to me. Whether or not the BMJ study ultimately turns out to be correct, it is a world away from Blue Monday or the equation for the beer-goggles effect. Yet to the casual consumer of newspaper reporting, the difference is far from clear. So now half the country is credulous about pseudoscience, while the other half is disbelieving of perfectly good research. It’s all far more disheartening than a Monday morning.
Also published at ft.com.
Economics will have to change what it recognises as a question, and what it recognises as an answer
According to IBM, the computers with which we have surrounded ourselves are now generating 2.5 quintillion bytes of data a day around the world. That’s about half a CD’s worth of data per person per day. “Big data” is the topic of countless breathless conference presentations and consultants’ reports. What, then, might it contribute to economics?
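The back-of-envelope arithmetic checks out; a minimal sketch, assuming a world population of roughly seven billion and a standard 700MB CD (neither figure is stated in the text):

```python
# Rough check of IBM's figure: 2.5 quintillion bytes a day, shared
# across the world's population, compared with the capacity of a CD.
bytes_per_day = 2.5e18      # 2.5 quintillion bytes generated daily (IBM's figure)
world_population = 7e9      # assumed: ~7 billion people (early 2010s)
cd_capacity = 700e6         # assumed: a standard 700MB CD

per_person = bytes_per_day / world_population
print(round(per_person / cd_capacity, 2))  # fraction of a CD per person per day
```

That works out at about 0.5, i.e. half a CD per person per day, as claimed.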
Not everyone means the same thing when they talk about “big data”, but here are a few common threads. First, the dataset is far too big for a human to comprehend without a lot of help from some sort of visualisation software. The time-honoured trick of plotting a scatter graph to see what patterns or anomalies it suggests is no use here. Second, the data are often available at short notice, at least to some people. Your mobile phone company knows where your phone is right now. Third, the data may be heavily interconnected – in principle Google could have your email, your Android phone location, knowledge of who your friends are on the Google Plus social network, and your online search history. Fourth, the data are messy: videos that you store on your phone are “big data” but a far cry from neat database categories – date of birth, employment status, gender, income.
This hints at problems for economists. We have been rather spoiled: in the 1930s and 1940s pioneers such as Simon Kuznets and Richard Stone built tidy, intellectually coherent systems of national accounts. Literally billions of individual transactions are summarised as “UK GDP in 2012”; billions of price movements are represented by a single index of inflation. The data come in nice “rectangular” form – inflation for x countries over y years, for instance.
The big data approach is very different. Take, for instance, credit card data. In principle Mastercard has a wonderful dataset: it knows who is spending how much, where, and on what kind of product, and it knows it instantly. But this is what economists Liran Einav and Jonathan Levin call a “convenience sample” – not everyone has a Mastercard, and not everyone who has one will use it much.
It would be astonishing if the Mastercard dataset couldn’t tell economic researchers something useful, but it’s very poorly matched to the kind of data we normally use or even the kind of questions we normally ask. We like to find causal links, not just patterns – and for everyone, or a representative sample of people, not for an arbitrary sub-group.
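The convenience-sample problem can be made concrete with a toy simulation; every number below is invented for illustration. If card ownership correlates with spending power, the average the card company observes overstates the population average.

```python
import random

random.seed(0)

# Toy population: two spending groups. Assumption for illustration only:
# higher spenders are more likely to hold the card, so the card company's
# "convenience sample" is skewed towards them.
population = []
for _ in range(100_000):
    high_spender = random.random() < 0.3
    spend = random.gauss(600 if high_spender else 300, 50)
    has_card = random.random() < (0.8 if high_spender else 0.2)
    population.append((spend, has_card))

pop_mean = sum(s for s, _ in population) / len(population)
card_sample = [s for s, c in population if c]
card_mean = sum(card_sample) / len(card_sample)

print(round(pop_mean), round(card_mean))  # the card-holder mean is markedly higher
```

No amount of extra card data fixes this: the bias comes from who is in the sample, not from how many observations there are.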
Perhaps it’s no surprise that the most immediate use of big data in economics has been in forecasting (or “nowcasting”), which has always been a pragmatic and academically marginal activity. Analyses of tweets, of Google searches for unemployment benefit or motor insurance, or of trackers of trucks in Germany, have been used ad hoc to understand how the economy is doing, and they seem to work well enough. MIT’s “billion prices project” provides daily estimates of inflation from around the world.
More traditional attempts to use big data have been influential. For instance, Raj Chetty, John Friedman and Jonah Rockoff linked administrative data on 2.5m schoolchildren from New York City to information on what they earned as adults decades later. A single year’s exposure to a poor teacher turns out to have large and long-lasting effects on career success. Amy Finkelstein and a team of colleagues evaluated Medicaid, the low-income US healthcare programme, linking data on hospital records to credit history and other variables. Without large datasets such research would be impossible.
These recent studies promise much more to come for economics. But to take full advantage of the data revolution, the profession will have to change both what it recognises as a question, and what it recognises as an answer.
Also published at ft.com.