Undercover Economist

Should we trust the young Turkers?

‘MTurk may be something of an unknown quantity but it is more diverse than the traditional study pool’

Everyone knows that Amazon turns industries on their heads, from books and ebooks to cloud computing. But most people do not realise that Amazon is also upending social science research, thanks to a service called Amazon Mechanical Turk — often known as MTurk or Turk.

MTurk is an online labour marketplace, originally designed to recruit people to do small tasks that computers couldn’t manage — for example, training a spam filter by categorising emails, deciding whether a photograph matched its description, transcribing audio recordings or flagging adult content. Thanks to MTurk, an employer can hire freelancers to work cheaply on a wide variety of computerised tasks.

Psychologists, behavioural economists and political scientists have now realised that it is potentially far cheaper to pay MTurk workers — “Turkers” — to answer questionnaires and participate in activities than it is to assemble a bespoke online panel or to conduct the research in a laboratory filled with student participants sitting at computers. For just a few dollars and at very short notice, economists can look at competition and co-operation, psychologists can examine the way memory works and political scientists can investigate how our ideology skews our logic. The opportunities are vast and have been swiftly embraced.

“The majority of papers presented at the conferences I go to now use Turk,” says Dan Goldstein, a cognitive psychologist at Microsoft Research. Goldstein, an academic who has also worked at London Business School and Columbia University, has used MTurk in his own research, for instance, into the impact of distracting online display ads.

This stampede to MTurk has made some researchers uneasy. Dan Kahan of Yale Law School studies “motivated reasoning” — the way our goals or political opinions can influence the way we process conflicting evidence. He has written a number of pieces warning about the careless use of the Amazon Turk platform.

The most obvious objection is that Turkers aren’t representative of any particular population one might wish to examine. As an illustration of this, two political scientists hired more than 500 Turkers to complete a very brief survey on the day of the 2012 US presidential election. (Tellingly, the entire survey cost the researchers just $28 and the results arrived within four hours.) The researchers, Sean Richey and Ben Taylor, found that 73 per cent of their Turkers said they had voted for Barack Obama; 12 per cent had voted for “other” — compared with 1.6 per cent of all voters. Mitt Romney polled vastly worse with the Turkers than with the US public as a whole. Relative to the general population, Turkers were also more likely to vote, and to be young, male, poor but highly educated. Or so they claimed; it is hard to be sure.

Another objection is that Turkers chat with each other on message boards about the work they’re doing. If a piece of research involves some sort of trick question, they may compare answers. If it measures the ability of workers to co-ordinate without communicating, that communication may be happening anyway through back channels. Researchers may pose clever questions designed to measure personality traits or to probe for logical lapses, not realising the Turkers have seen these questions before and yawn when they roll round again.

Kahan wants academic journals to show more scepticism of MTurk research, and to require researchers to justify their methods in some detail. MTurk may be cheap, says Kahan, but sitting at your desk and thinking through a thought experiment is even cheaper. Cheapness alone doesn’t make either method valid.

Other researchers evidently disagree — perhaps because, in a more purely psychological study, what matters is that Turkers are representatives of the human race rather than of a particular rainbow of American political opinion. Many familiar results from psychology have been replicated on MTurk without trouble.

There’s pragmatism at work here too. After all, the traditional piece of psychological research has been conducted on a small number of undergraduates at Ivy League universities. (Historically, such undergraduates were typically white and male, into the bargain.) MTurk may be something of an unknown quantity but it is more diverse than the traditional study pool. Research can be conducted on a much larger group of subjects, and quickly — no more must researchers wait for students to return after the summer break.

For Dan Goldstein, the downsides are manageable and the advantages enormous. The speed of the research means far more rapid progress and, because MTurk is so cheap, much larger samples can be used. He thinks it’s a big improvement on what went before.

“I think it is one of the most important and beneficial innovations in the history of psychology,” he says, before adding the obvious caveat that like any research tool, it must be wielded with skill.

The original Mechanical Turk was an 18th-century chess-playing “robot” that, in reality, concealed a human chess player. There is something of the Wizard of Oz about the idea — and after a few decades of creating wonderment, the Turk was eventually exposed as nothing more than a clever and seductive trick. For Turk-sceptic Dan Kahan, the analogy is delicious. Having been fooled by a Mechanical Turk once, he says, we should be ashamed to be fooled again.

Written for and first published at ft.com.