Report Puts Scientific Replicability and Reproducibility Under the Microscope

MAY 23, 2019

A National Academies report released this month explores the challenges of achieving reproducibility and replicability in scientific research. While encouraging reforms to scientific practices, it stresses that scientific confidence rests less in verifying individual results than in accumulating evidence across studies.

William Thomas

Spencer R. Weart Director of Research in History, Policy, and Culture

Should science be refurbished for the 21st century? A National Academies study has investigated the problem of research replicability and reproducibility in the sciences. In this picture from 2009, workers deinstall Albert Herter’s mural of the founders of the National Academy of Sciences for preservation work.

(Image credit – Cultural Programs of the National Academy of Sciences)

The National Academies released a report this month that examines how reproducibility and replicability function as touchstones of scientific rigor and unpacks the challenges scientists encounter in upholding them.

Congress required the National Science Foundation to initiate the study two years ago through the American Innovation and Competitiveness Act, responding to widespread concerns about the reliability of scientific findings in fields such as biomedicine and psychology. The effort received additional financial support from the Sloan Foundation, and the study committee was chaired by Harvey Fineberg, president of the Gordon and Betty Moore Foundation.

Noting that the concepts of “reproducibility” and “replicability” are variable and often conflated, the report defines reproduction as independently deriving a study’s quantitative results from its original data set, and replication as obtaining consistent results across different studies that have separate data sets. It makes recommendations on how to bolster replicability and reproducibility, but also stresses that scientific reliability does not hinge on the validity of particular scientific results.

The report does not attempt to identify which fields may have particular troubles with reproducibility and replicability, and it generally refrains from evaluating how widespread such troubles may be in science, citing the formidable difficulty of doing so. However, speaking at a webinar marking the report’s release, Fineberg dismissed suggestions that these troubles constitute a “crisis” for science, saying,

The committee found there is no crisis but also no time for complacency with respect to reproducibility and replicability in science. Improvements are needed.

Replication of results not always essential, report advises

Replicability presents scientists with a more “nuanced” problem than reproducibility, the report states. It explains, “A successful replication does not guarantee that the original scientific results of a study were correct, nor does a single failed replication conclusively refute the original claims. Furthermore, a failure to replicate can be due to any number of factors, including the discovery of new phenomena, unrecognized inherent variability in the system, inability to control complex variables, and substandard research practices, as well as misconduct.”

The report stresses that in some cases failures to replicate can benefit science by spotlighting problems requiring attention, whereas in others they simply reflect poor methodology. However, the report also emphasizes that it is not always straightforward to establish whether a replication has been successful, since no two results are likely to be identical.

The difficulty of affirming a successful replication highlights the importance of researchers’ characterizations of how uncertainty manifests in their studies, according to the report. For instance, the report recommends that researchers more carefully and clearly distinguish how their results reflect inherent randomness versus limitations in their ability to control variables or measure phenomena. The report also laments the “frequent misuse of statistics,” such as the careless invocation of p-values to assess the statistical significance of research results.

The report recommends that funding agencies explicitly consider replicability and reproducibility concerns in the merit review process, requiring grant applications to include “thoughtful discussion” of how uncertainties will be handled. In addition, it broadly supports efforts to encourage more transparency in science, observing that sharing data, methods, code, and other components of research allows other scientists both to better assess studies’ quality and to replicate their results. The report points guardedly to some journals’ recent commitment to publish replication studies and null results, noting it will be “useful” to see whether such models prove viable.

However, the report also emphasizes that parsing the reasons for the non-replication of a result “requires time and resources and is often not a trivial undertaking.” And it generally cautions against assuming that every study should be replicated, stating,

A predominant focus on the replicability of individual studies is an inefficient way to assure the reliability of scientific knowledge. Rather, reviews of cumulative evidence on a subject, to assess both the overall effect size and generalizability, is often a more useful way to gain confidence in the state of scientific knowledge.

Computational innovation leading to reproducibility difficulties

The report asserts that reproducibility has become a more pressing issue in science as computation and shared data have taken on a prominence that was “unthinkable” as little as two decades ago. It observes, “While the abundance of data and widespread use of computation have transformed most disciplines and have enabled important scientific discoveries, the revolution is not yet reflected in how scientific results aided by computations are reported, published, and shared.”

To improve reproducibility, the report recommends NSF provide grant recipients with guidance regarding how to identify and use trusted open repositories, that it work to harmonize repository criteria and data management plans with other funding agencies, and that it consider requiring grantees’ data management plans to encompass other digital artifacts such as software.

Even when a project’s data are publicly available, the report notes that analytical methods described in publications often provide insufficient guidance to reproduce reported results. Accordingly, it recommends researchers make available “clear, specific, and complete information” about their methods, data products, and computational tools, in addition to their data. It also calls on educational institutions, professional societies, and researchers to encourage and facilitate practices that enable data reproducibility.

The report observes that, while the traditional goal of reproduction is to obtain a “bitwise identical numeric result” to the one reported, the rising use of analytical tools that use nondeterministic algorithms, such as machine learning, means precise reproduction can no longer always be expected. It therefore recommends NSF consider supporting research that “explores the limits of computational reproducibility” to help establish what constitutes a valid reproduction in such cases.

Importance of ‘meta-research’ stressed

Given the resources required to systematically reproduce and replicate scientific results, the report recommends that such efforts be reserved for cases in which the context of a result endows it with special significance. For example, researchers, funders, and stakeholders might devote resources to replication studies if the original result is important for policy decisions, pertains to a controversial topic, is particularly surprising, or establishes a basis for future “expensive and important” studies.

However, the report finds that research synthesis plays an even greater role than reproduction and replication in building confidence in scientific knowledge. Synthesis, it states, “addresses the central question of how the results of studies relate to each other, what factors may be contributing to variability across studies, and how study results coalesce or not in developing the knowledge network for a particular science domain.” The report calls for the further development and use of “meta-research,” which it describes as a “new field” that includes but is not limited to the kinds of quantitative analyses associated with the term “meta-analysis.”

Addressing concerns that reports about reproducibility and replicability problems have tarnished the reputation of science, the report notes that surveys show public trust in science has for decades remained strong and stable relative to other U.S. institutions. It observes that “there is currently limited evidence that media coverage of a ‘replication crisis’ has significantly influenced public opinion.”

The report does note that surveys have found significant public skepticism about how scientific results are reported in the media. Consistent with its overall message that the strength of science rests in the accumulation of diverse bodies of evidence, the report urges scientists and journalists alike to avoid ascribing undue significance to individual studies.