Why bad times call for good data

20th May, 2021

Watching the Ever Given wedge itself across the Suez Canal, it would have taken a heart of stone not to laugh. But it was yet another unpleasant reminder that the unseen gears in our global economy can all too easily grind or stick. From the shutdown of Texas’s plastic polymer manufacturing to a threat to vaccine production from a shortage of giant plastic bags, we keep finding out the hard way that modern life relies on weak links in surprising places.

So where else is infrastructure fragile and taken for granted? I worry about statistical infrastructure — the standards and systems we rely on to collect, store and analyse our data. Statistical infrastructure sounds less important than a bridge or a power line, but it can mean the difference between life and death for millions.

Consider Recovery (Randomised Evaluations of Covid-19 Therapy). Set up in a matter of days by two Oxford academics, Martin Landray and Peter Horby, over the past year Recovery has enlisted hospitals across the UK to run randomised trials of treatments such as the antimalarial drug hydroxychloroquine and the cheap steroid dexamethasone. With minimal expense and paperwork, it turned the guesses of physicians into simple but rigorous clinical trials. The project quickly found that dexamethasone was highly effective as a treatment for severe Covid-19, thereby saving a million lives.

Recovery relied on data accumulated as hospitals treated patients and updated their records. It wasn’t always easy to reconcile the different sources — some patients were dead according to one database and alive on another. But such data problems are solvable and were solved.

A modest amount of forethought about collecting the right data in the right way has produced enormous benefits. In the geek community, statistical infrastructure is cool: the latest World Development Report from the World Bank describes the huge potential for data to do good and laments the lost opportunities that result from weak statistical infrastructure in poor countries.

But it isn’t just poor countries that have suffered. In the US, data about Covid-19 testing was collected haphazardly by states. This left the federal government flying blind, unable to see where and how quickly the virus was spreading. Eventually volunteers, led by the journalists Robinson Meyer and Alexis Madrigal of the Covid Tracking Project, put together a serviceable data dashboard.

“We have come to see the government’s initial failure here as the fault on which the entire catastrophe pivots,” wrote Meyer and Madrigal in The Atlantic. They are right.

What is more striking is that the weakness was there in plain sight. Madrigal recently told me that the government’s plan for dealing with a pandemic assumed that good data would be available — but did not build the systems to create them. It is hard to imagine a starker example of taking good statistical infrastructure for granted.

Instead, as with the Ever Given, we only notice the problems afterwards. Back in October, almost 16,000 positive Covid-19 cases disappeared somewhere between testing labs and England’s contact-tracing system. An outdated Excel file format had been used, one that simply didn’t have enough rows.
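The failure mode is easy to simulate. The legacy .xls format stores at most 65,536 rows per worksheet, so any conversion that ignores the cap silently discards whatever falls below it. The sketch below is illustrative only: the function name and the toy case data are invented, not taken from the real reporting pipeline.

```python
# Toy simulation of silent data loss from a legacy spreadsheet row cap.
# The old .xls (BIFF) format allows at most 65,536 rows per worksheet;
# rows beyond that limit simply vanish on export.

XLS_MAX_ROWS = 65_536  # hard row limit of the legacy .xls worksheet format

def save_as_legacy_sheet(rows):
    """Return only the rows that would survive export to one .xls sheet."""
    return rows[:XLS_MAX_ROWS]

# Imagine 80,000 positive test results arriving from the labs.
incoming = [f"case-{i}" for i in range(80_000)]
saved = save_as_legacy_sheet(incoming)
lost = len(incoming) - len(saved)
print(f"saved {len(saved)} rows, silently lost {lost}")
```

The danger is that nothing fails loudly: the export succeeds, the file opens, and the missing cases only surface when someone reconciles the totals.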

The effects of this absurd Excel lapse have been fatal. One estimate, by the economists Thiemo Fetzer and Thomas Graeber, is that it led to 125,000 further cases and 1,500 deaths.

Happily, there are some inspiring examples of good practice. OpenSAFELY is a project jointly led by Ben Goldacre, another Oxford academic. The OpenSAFELY tools allow researchers to pose statistical queries of an immensely detailed data set — the medical records of millions of NHS patients in the UK. Researchers using OpenSAFELY never gain direct access to the patient records themselves and, by design, they automatically share their statistical code for others to examine and adapt. Vital medical questions can be answered in a transparent, collaborative fashion without compromising patient privacy.

Such statistical sorcery is possible when the data infrastructure is thoughtfully designed from the beginning to gather the right data and make sense of it. Once the rules and systems are in place, the data will follow.

Every previous crisis has provoked a realisation that we lacked the data we needed. The Great Depression prompted governments to gather data about unemployment and national income. The banking crisis of 2007-08 showed regulators that they had far too little information about stresses and vulnerabilities in the financial system. The pandemic should prompt us to improve the data we gather on public health. Governments routinely use labour surveys to understand the economic health of households; they should now do the same with literal health. We could assemble a representative panel of volunteers who agreed to medical check-ups every three months. This would provide invaluable data and, in times of crisis, the volunteers could be approached more frequently, for example, for regular swabs to track the spread of a new virus.

Hindsight is a wonderful thing, of course: the next crisis will no doubt demand timely information about something new. But statistical infrastructure can be built to adapt — and it is a great deal cheaper than digging a second canal from the Red Sea to the Mediterranean.

Written for and first published in the Financial Times on 23 April 2021.

The paperback of “How To Make The World Add Up” is now out. US title: “The Data Detective”.

