
‘An Existential Crisis’ for Science

What IPR scholars are doing to solve the replication crisis



In 2011, one of the American Psychological Association’s flagship, peer-reviewed journals published a study that claimed to prove that ESP—or extrasensory perception—exists.

The article, authored by Cornell psychologist Daryl Bem, ignited a firestorm of controversy in psychology, with many researchers questioning the finding. In tracing the article’s back story, Slate would go on to call the finding “methodologically sound and logically insane.” The event was a marker in what would come to be known as the “replication crisis”—a crisis that Stanford medical researcher John Ioannidis had started documenting as early as 2005.

In its most basic sense, the replication crisis refers to a pattern of scientists being unable to obtain the same results previous investigators found. At its most expansive, the crisis threatens the scientific enterprise itself, leading to questions not just about research practices and methods, but the very reliability of scientific results. It is what IPR education researcher and statistician Larry Hedges has called “an existential crisis” for science.

Where does addressing the replication crisis stand today? Coming from multiple disciplines, four of our faculty experts shed light on how science can repair itself.

Larry Hedges explains what replication means and why he sees assessing it as a “statistical problem.” Can artificial intelligence help? Brian Uzzi discusses using machine learning to predict replicability. Jennifer Tackett advocates for changes in research practice and scientific culture in clinical psychology and beyond. Jessica Hullman shows that researchers in both social science and machine learning can learn from each other’s mistakes and work to improve science.

Larry Hedges: Defining Replication and Its Methods

As a statistician, Larry Hedges examines the replication crisis from a methodological perspective and brings his broad experience across many disciplines, including sociology, psychology, and educational policy, to bear as well.

To understand the crisis, Hedges began by defining replication, which, as he told a 2017 American Institutes for Research audience, “is not as simple as it seems.” First, he says you have to distinguish “reproducibility” from “replicability.”

To reproduce scientific findings, an investigator takes the original experiment’s data and checks whether analyzing them leads to the same results. To replicate scientific findings, a scientist runs the same experiment, collects new data, and checks whether analyzing the new data leads to the same results.

Although the distinction between the two was not clear in the early years of the replication crisis, it is now widely accepted.

“Most scientists now distinguish reproducibility (Can you get the same answers that I did when you analyze my data?) from replicability (Can you get the same answers that I did when you do my experiment and collect your own data?),” Hedges said.

He noted that reproducibility is important and needed, but that replicability is much more difficult to achieve.

To assess replicability, Hedges applied statistical methods. In a series of articles, including some with former IPR postdoctoral scholar Jacob Schauer, now an assistant professor in biostatistics at Feinberg, he explored methods for assessing replication and the design of studies to replicate research in psychology and other fields.

Hedges explained, “Assessing replication is a statistical problem.”

When the replication crisis burst onto the scene, it seemed as if scientists only needed to redo experiments. Hedges’ statistical analysis, however, showed that this simple idea of replication, checking whether repeating an experiment produces the same results, is often doomed to yield ambiguous answers. Repeating a study multiple times, he found, runs into the same limitations unless the proper analysis is conducted.

Better methods to test for replication, Hedges showed, are similar to those used in meta-analysis. If ensembles of studies are designed and analyzed together, rather than treated as one or more repetitions of a single “original” study, meta-analysis can provide more reliable evaluations of replicability.
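To make the meta-analytic idea concrete, here is a minimal sketch, with invented effect sizes and standard errors, of how an ensemble of studies can be checked for consistency using a standard heterogeneity (Q) test. It illustrates the general approach, not Hedges and Schauer’s published procedures.

```python
# Minimal illustration (not Hedges and Schauer's actual method) of assessing an
# ensemble of studies with a meta-analytic heterogeneity (Q) test.
# All effect sizes and standard errors below are invented for the example.
import numpy as np
from scipy import stats

def q_test(effects, std_errors):
    """Cochran's Q: are these study-level effects consistent with one common effect?"""
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2   # inverse-variance weights
    pooled = np.sum(weights * effects) / np.sum(weights)       # fixed-effect pooled estimate
    q = np.sum(weights * (effects - pooled) ** 2)              # heterogeneity statistic
    p_value = stats.chi2.sf(q, df=len(effects) - 1)            # large Q => studies disagree
    return pooled, q, p_value

# Hypothetical ensemble: an "original" study plus four replications
effects = [0.45, 0.12, 0.20, 0.05, 0.18]       # standardized mean differences (made up)
std_errors = [0.10, 0.09, 0.11, 0.10, 0.12]

pooled, q, p = q_test(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, Q = {q:.2f}, p = {p:.3f}")
# A small p-value suggests the studies are not all estimating the same effect,
# i.e., the finding does not replicate in the strict "same effect" sense.
```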

“Early empirical work on replication did not use appropriate statistical methods and did not use adequate designs,” Hedges said.

Since the early days of the replication crisis, Hedges notes, progress has been made in improving the scientific enterprise. In addition to understanding the distinction between reproducibility and replicability, scientists know they need to check a study’s replicability. Moreover, they have some good incentives to do so.

As Hedges pointed out, “A scientist can make a big splash by discovering that a famous result cannot be replicated.”

Additionally, preregistration, in which scientists document their hypotheses and methods before launching into their research, now exists alongside other registries for study protocols and more elaborate supports, such as the Open Science Framework.

Hedges is cautiously optimistic: “I hope, but am not sure, that the better methods we now have for assessing replicability and for the design of ensembles of studies to assess replication will become more widely used.”

Brian Uzzi: Predicting Replicability

The replication crisis presented science with the big question of how much previous research was true. If studies could not be replicated, were their findings valid in the first place? Were there other ways to predict whether any given study could be replicated without undertaking costly new experiments?

When network scientist, sociologist, and IPR associate Brian Uzzi looked into such questions of replication, he identified two approaches to improve replicability. One would be to improve scientific methods. The other would be to develop procedures to review research that would accurately predict whether it could be replicated. In the last few years, he has focused on the second path—better prediction.

Uzzi used artificial intelligence (AI) to read scientific papers, searching for cues that correlate with replicability. The model he created with his colleagues Youyou Wu at University College London and Yang Yang at Notre Dame worked quickly, and it correctly predicted replication outcomes 65–78% of the time.

“We wanted a system for self-assessment,” Uzzi told Kellogg Insight in 2020. “We begin with the belief that no scientist is trying to publish bad work. A scientist could write a paper and then put it through the algorithm to see what it thinks. And if it gives you a bad answer, maybe you need to go back and retrace your steps, because it’s a clue that something’s not right.”

In 2023, the researchers applied their machine-learning model’s algorithm to over 40,000 articles published over 20 years in top psychology journals. They learned that just over 40% of the studies were likely to replicate. Specific factors, such as the type of research method used, increased the odds of predicted replicability.

Uzzi and his collaborators found that experiments had only a 39% chance of replicability, while other types of research procedures had about a 50% chance. Their use of the algorithm allowed them to survey a large and diverse field of research.
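As a rough illustration of the general idea, not the Wu, Yang, and Uzzi model itself, the sketch below trains a simple text classifier on hypothetical paper texts labeled with invented replication outcomes and then scores a new manuscript.

```python
# Toy sketch of text-based replicability prediction: train a classifier on the
# text of papers whose replication outcomes are known, then score new papers.
# This is NOT the published Wu/Yang/Uzzi model; texts and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

papers = [
    "We preregistered the design and recruited a large representative sample ...",
    "A marginally significant interaction emerged in exploratory analyses ...",
    "The effect was robust across three direct replications with new samples ...",
    "Post hoc subgroup analysis revealed a surprising effect in one condition ...",
]
replicated = [1, 0, 1, 0]   # 1 = later replication succeeded (hypothetical labels)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # turn paper text into word features
    LogisticRegression(max_iter=1000),              # predict probability of replication
)
model.fit(papers, replicated)

new_paper = ["We report a significant three-way interaction in a small convenience sample ..."]
print(model.predict_proba(new_paper)[0, 1])  # estimated probability the finding replicates
```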

Uzzi points to three advantages to using AI to tackle replicability. First, rather than looking at a study’s statistics, AI examines the text of the article.

“Most papers only have a handful of statistics,” he noted. “But the text, which is used to guide the readers in interpretation of the statistics in a paper, may involve 5,000 to 12,000 words, which we showed includes valuable, untapped information.”

Second, manually replicating studies one by one is too expensive to scale.

Third, the algorithmic method, he noted, “is the only self-diagnostic tool currently available to help researchers overcome unavoidable blind spots.”

Uzzi and his colleagues are designing a website tool for scientists to score the likelihood of their research study being replicable.

Uzzi is positive about the future: “The error correcting mechanisms in science have been working, making results stronger for all.”

Jennifer Tackett: Improving Structure and Practice

Psychologist and IPR associate Jennifer Tackett has made the replication crisis part of her scholarly work for nearly a decade. She has worked to identify the issues and propose solutions for her own field of clinical psychology but does not view the problem in her own discipline as unique within psychology or the social sciences more broadly.

“The replication problems in clinical psychology aren't particularly distinctive compared to other areas of psychology or even other behavioral social sciences in general,” she said. “But certainly, I think, since the replication crisis emerged, it’s become clear that there are a lot of problematic practices that need to be changed.”

The replication crisis makes clear, Tackett explains, that scientific culture must change.

“The reason why this issue is so important is trust,” she said. “If we don’t have a scientific literature that we can trust, then it’s not doing its job, it’s not useful, and it might actually be counterproductive and create other problems.”

Tackett advocates for both immediate changes in practice and larger structural changes in incentives for scientific behavior to address the replication crisis.

In an overview published in the Annual Review of Clinical Psychology, Tackett and her colleagues lay out causes of the crisis and strategies to combat it, focusing on making research more open and transparent. Tools include open materials and open data, preregistration, registered reports, and multisite collaborative research studies.

In preregistration, researchers publicly identify the details of their planned study before beginning work. Tackett called preregistration “a good step in the right direction,” but noted it will not “solve everything.” It also places a real burden on the researcher: preregistering is time-consuming, labor-intensive, and therefore expensive.

“As somebody who's been preregistering for years now, I appreciate how much extra work it is,” Tackett said. “But I also appreciate how much it really does keep you from fishing around or fooling yourself into thinking you did things a certain way, after the fact, you know, when you have to really put things down in advance.”

Tackett is encouraged by younger scientists who are adopting improved methods to make replication easier and more accurate. However, she warns, the larger issues in science that the replication crisis uncovered should be addressed by structural changes in the incentives and culture of science.

“Very little of the problems in our replicability come from truly unethical behavior, fraud, or cheating,” she noted. “I think the vast majority of the problems do come from this more business-as-usual approach that's been conducted in these problematic ways for a really long time.”

She also calls on funders, publishers, and universities to reward better scientific research.

“The culture [of science] still prioritizes quantity over quality and innovation over rigor,” Tackett said. “If we don’t reward these behaviors, if we don’t find ways to restructure the way we do science, we’re never going to really fully see the kind of change we're looking for.”

Jessica Hullman: Machine Learning and Replication

Computer scientist and IPR associate Jessica Hullman began thinking about replication and reproducibility as a PhD student, and those questions have motivated much of her research since. She examines “uncertainty communication,” or the challenge of representing the uncertainty inherent in drawing inferences from data.

Machine learning, a type of artificial intelligence, trains algorithms on data to predict new outcomes. Hullman teaches a course, “Explanation and Reproducibility in Data-Driven Science,” that examines replicability issues in both explanatory modeling, as in social science research, and predictive modeling, as in machine learning research.

Hullman explains that machine learning depends on the data the computer is given. As in social science research, the goal is to create a model of a process or mechanism that will predict outcomes. In both types of modeling, researchers apply the model in the real world with varying degrees of success. In both types of research, she notes, failures in reproducibility and robustness occur when conditions differ from the conditions the model was trained on.

“To make robust predictive modeling systems, we still need to think about what types of factors the model is paying attention to, and whether these are a good approximation of the real world process we are trying to model,” Hullman said. “In cases where we want to train a model to do human-like tasks, like understand a passage of text or label the most important object in an image, we can’t avoid having to think about how we would expect a human to do the task.”

A common practice in machine learning is to hold out part of the data during training so that the model does not become too closely attuned to the specific examples it has seen. However, the processes developers typically use to ensure that a model can generalize to new data are not enough, she says, to guarantee that it will predict accurately in new situations. For example, the machine learning behind self-driving cars has failed to read slightly altered stop signs on the road, a mistake human drivers do not make.
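The sketch below, using synthetic data, illustrates the pattern Hullman describes: a model can look accurate on a held-out test set drawn from the same conditions as its training data yet degrade when a correlation it learned to rely on no longer holds.

```python
# Synthetic illustration of the hold-out practice and its limits: good held-out
# accuracy under the training conditions, weaker accuracy once conditions shift.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Training-time world: x1 drives the outcome, x2 happens to track x1 closely
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)              # shortcut feature, correlated with x1
y = (x1 + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = np.column_stack([x1, x2])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy (same conditions):", model.score(X_test, y_test))

# Deployment-time world: the shortcut correlation breaks (x2 is now unrelated noise)
x1_new = rng.normal(size=n)
x2_new = rng.normal(size=n)
y_new = (x1_new + rng.normal(scale=0.5, size=n) > 0).astype(int)
X_new = np.column_stack([x1_new, x2_new])
print("accuracy under shifted conditions:", model.score(X_new, y_new))
# The second number is typically noticeably lower: the hold-out check only
# guarantees performance under the conditions the model was trained on.
```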

“We are always limited by our own ability to imagine how conditions might differ from those we studied,” she said.

In her research, Hullman finds similarities between the replication crisis in fields such as psychology and in machine learning. In both, people may put too much faith in theory or a particular procedure, such as randomization, and act as if it is foolproof. They also tend to see their results as either-or, yes-or-no, rather than acknowledging sources of uncertainty. Most importantly, they may overgeneralize from their study sample in social science or the dataset they input in machine learning, leading to claimed effects that cannot hold up when others attempt to replicate them in different circumstances.

Hullman is encouraged by improvements the open science movement has brought about, such as making data and analysis code widely available. She also appreciates that the replication crisis has sparked greater attention to statistical methods.

However, “I would like to see more progress in the form of greater sensitivity to how little we still know about how to improve science,” she said. “Some of the proposed reforms to science are based on relatively simplistic understandings of what makes good science.”

It is vital, Hullman explains, to define carefully and rigorously what needs to be fixed in science. Even the idea of replicability as a hallmark of good science has its limitations. Just because an experiment’s effects can be replicated does not mean that those effects represent scientific truths.

“If effects vary depending on who you are studying—as we should expect—then it doesn’t make sense to think that we’ll see the same result every time we run the same study,” she said. “There may be subtle differences in the sample of people that explain the difference, which doesn’t necessarily mean there isn’t an effect—it’s just not as simple as estimating an average effect on one sample and expecting it to transfer outside of that particular set of experimental participants.”
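A small simulation, with made-up numbers, illustrates Hullman’s point: when the true effect differs across subpopulations, rerunning the “same” experiment on different samples produces different estimates even though the effect is real in every case.

```python
# Illustrative simulation (synthetic numbers, not from any real study): the same
# two-arm experiment run on samples with different true effects gives different
# estimates, even though none of the underlying effects is zero.
import numpy as np

rng = np.random.default_rng(1)

def run_study(true_effect, n=100):
    """Simulate one two-arm experiment and return the estimated treatment effect."""
    control = rng.normal(loc=0.0, scale=1.0, size=n)
    treated = rng.normal(loc=true_effect, scale=1.0, size=n)
    return treated.mean() - control.mean()

# Different labs draw participants from subpopulations with different true effects
true_effects = {"sample A": 0.6, "sample B": 0.3, "sample C": 0.1}
for name, effect in true_effects.items():
    print(name, "estimated effect:", round(run_study(effect), 2))
# The estimates disagree, yet an effect exists in every subpopulation: failing to
# get the same number is not the same thing as the absence of an effect.
```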

Where We Go from Here

Science rests, as the philosopher of science Karl Popper wrote, on repetition and testing. IPR scholars are intent on bettering that process.

“It’s our responsibility as scientists to figure out how to build a more trustworthy, credible scientific base that serves policymakers, serves the public, serves patients,” Tackett said. “People turn to the science for answers, and we need to feel confident in the answers we’re providing.”

Larry Hedges is Board of Trustees Professor of Statistics and Education and Social Policy, professor of psychology, and of medical social sciences, and an IPR fellow. Jessica Hullman is Ginni Rometty Professor, associate professor of computer science, and an IPR associate. Jennifer Tackett is professor of psychology and an IPR associate. Brian Uzzi is Richard L. Thomas Professor of Leadership and Organizational Change, co-director of Northwestern Institute on Complex Systems (NICO), and an IPR associate.

Graphic by Ysa Quiballo

Published: February 28, 2024.