Research News

Viability of Crowdsourcing for Population Research

IPR sociologist Jeremy Freese compares Mechanical Turk with TESS


Jeremy Freese compares TESS with Amazon's Mechanical Turk.

“A problem with survey experiments is that they’ve long been understood as a good idea, but they’re hard and expensive to actually field,” said IPR sociologist Jeremy Freese at an October 6 IPR colloquium.


Along with IPR political scientist James Druckman, Freese is co-principal investigator of Time-sharing Experiments for the Social Sciences (TESS), an NSF-funded online platform for survey experiments that aims to make them easier and cheaper for researchers to conduct.

Researchers apply to TESS, which collects data for their experiments at no cost to them. For data collection, TESS contracts with GfK Knowledge Networks, a survey panel firm that recruits respondents through random-digit-dialing and address-based sampling to provide a sample representative of the U.S. population.

A fast-growing alternative for conducting survey experiments is a crowdsourcing platform such as Amazon’s Mechanical Turk (MTurk). Researchers can offer their surveys to anyone who signs up for the platform, paying participants as little as one cent per experiment. Benefits of this model include convenience, speed, low cost, and a high volume of responses. On the other hand, if MTurk survey-takers are not representative of the broader U.S. population, experiments conducted on the platform could produce skewed results.

Recognizing that MTurk’s capabilities made it a direct competitor to TESS, Freese and his colleagues, including Druckman and several graduate students, decided to investigate the population participating in MTurk experiments and to test whether the two platforms produced similar experimental results.

Using information supplied by the participants themselves, Freese found that Mechanical Turk participants were mostly male. They also tended to be younger, more liberal, and more highly educated than the GfK sample.

Using comprehension checks, completion times, and other measures, Freese and graduate students Jill Weinberg and David McElhattan compared data quality between the GfK and MTurk samples. Although MTurk participants provided better-quality data, Freese cautions against drawing conclusions from this, because higher-quality data do not necessarily mean a more representative sample. In a phone survey, for example, a respondent who has had one drink too many may provide poor-quality data, yet such misfires are still part of a sample that represents the entire population.

Freese, Druckman, and graduate student Kevin Mullinix then replicated TESS experiments on MTurk using vignettes, in which participants respond to hypothetical situations. MTurk and GfK respondents answered similarly in nearly all of these experiments, with one exception, which Freese attributes to the narrower range of ages among MTurk participants.

Likewise, when Freese, Druckman, and Mullinix replicated TESS experiments whose effects were not moderated by age or education, MTurk and GfK respondents answered similarly. But in experiments where age and education mattered, GfK and MTurk responses differed greatly.

In short, the choice between a survey panel such as GfK and a crowdsourcing platform such as MTurk can affect an experiment’s results, Freese said. Still, the crowdsourcing model offers exciting possibilities for researchers who heed these caveats.

“We’re bullish about the possibility of online labor markets for experimental platforms,” Freese concluded, adding that “continual study” is needed in case the demographics of platform users change.

Jeremy Freese is John and Ethel Lindgren Professor of Sociology and an IPR fellow.