I never planned to fake my data. My project involved interviewing the customers visiting a games shop in central London, then analysing the distance they had travelled. Arriving at the location with a clipboard, I realised that I didn’t have the nerve. I slunk home and began to dream up some realistic-seeming numbers. I am ashamed but, in mitigation, I was about 14 years old. I am confident that the scientific record has not been corrupted by my sins.
I wish I could say that only schoolchildren fake their data, but the evidence suggests otherwise. Stuart Ritchie’s book Science Fictions argues that “fraud in science is not the vanishingly rare scenario that we desperately hope it to be”.
Some frauds seem comical. In the 1970s, a researcher named William Summerlin claimed to have found a way to prevent skin grafts from being rejected by the recipient. He demonstrated his results by showing a white mouse with a dark patch of fur, apparently a graft from a black mouse. It transpired that the dark patch had been coloured with a felt-tip pen. Yet academic fraud is no joke.
In 2009, Daniele Fanelli estimated that “about 2 per cent of scientists admitted to have fabricated, falsified or modified data or results at least once”. I believe that the majority of researchers would not dream of faking data, but it seems that the dishonest exceptions are not as unusual as we would hope.
This matters. Fraudulent research wastes the time of scientists who try to build on it and the money of funding agencies who support it. It undermines the reputation of good science. Above all, if the insights produced by good science make the world better, then false beliefs produced by fraudulent science make the world worse.
Consider the desperate search for treatments for Covid-19. Medical researchers have scrambled to test out treatments from vitamin D to the deworming drug ivermectin, but the results of these scrambles have often been small or flawed studies. However, an influential working paper, published late last year, described a large trial with very positive results for ivermectin. It gave a lot of people hope and inspired the use of ivermectin around the world, although the European Medicines Agency and the US Food and Drug Administration advise against ivermectin’s use to treat Covid-19.
The research paper was withdrawn on July 14, after several researchers discovered anomalies in the underlying data. Some patients appeared to have died before the study even began, while other patient records seemed to be duplicates. There may be an innocent explanation for this but it certainly raises questions.
On August 17 there was an unsettling development in a quite different field, behavioural science. Data detectives Uri Simonsohn, Joe Simmons, Leif Nelson and anonymous co-authors published a forensic analysis of a well-known experiment about dishonesty.
The experiment, published in 2012, was based on data from a motor insurer in which customers had supplied information about mileage along with a declaration that the information was true. Some signed the declaration at the top of the document, while others signed at the bottom — and those who signed at the top were more likely to tell the truth. It’s an intuitive and influential discovery. The only trouble, conclude Simonsohn and his colleagues, is that it is apparently based on faked data.
“There is very strong evidence that the data were fabricated,” they conclude. Several of the authors of the original article have published statements agreeing. What remains to be seen is who or what was behind the suspected fabrication.
Dan Ariely, the most famous of the authors of the original study, was the one who brought the data to the collaboration. He told me in an email that “at no point did I knowingly use unreliable, inaccurate, or manipulated data in our research”, expressing regret that he did not sufficiently check the data which was supplied to him by the insurance company.
Both episodes are disheartening: science is hard enough when everyone involved is engaged in good faith. Fortunately, science already has the tools it needs to deal with fraud — much the same tools that it needs to deal with more innocent errors. Scientists need to get back to the traditional values of the field, which include the open sharing of scientific ideas and data, and rigorous scrutiny of those ideas.
They should bolster those traditional values with modern tools. For example, journals should demand that scientists publish their raw data unless there is an extraordinary reason not to. This practice dissuades fraud by making it easier to detect but, more importantly, it allows work to be checked, reproduced and extended. Algorithms can now scan research for anomalies such as statistically implausible data. Automatic systems can warn researchers if they are citing a retracted paper. None of this would have been possible in the era of paper journals, but it should be commonplace now.
Our current scientific institutions reward originality, curiosity and inventiveness, which are classic scientific virtues. But those virtues also need to be balanced with the virtues of rigour, scepticism and collaborative scrutiny. Science has long valued the idea that scientific results can be repeated and checked. Scientists need to do more to live up to that ideal.
Written for and first published in the Financial Times on 8 September 2021.