Peer Review and Replication Are Important for the Progression of Science
What is replication?
- Brian A. Nosek,
- Timothy M. Errington
- Published: March 27, 2020
- https://doi.org/10.1371/journal.pbio.3000691
Abstract
Credibility of scientific claims is established with evidence for their replicability using new data. According to common understanding, replication is repeating a study's procedure and observing whether the prior finding recurs. This definition is intuitive, easy to apply, and incorrect. We propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes. The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Successful replication provides evidence of generalizability beyond the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress.
Citation: Nosek BA, Errington TM (2020) What is replication? PLoS Biol 18(3): e3000691. https://doi.org/10.1371/journal.pbio.3000691
Copyright: © 2020 Nosek, Errington. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from Arnold Ventures, John Templeton Foundation, Templeton World Charity Foundation, and Templeton Religion Trust. The funders had no role in the preparation of the manuscript or the decision to publish.
Competing interests: We have read the journal's policy and the authors of this manuscript have the following competing interests: BAN and TME are employees of the Center for Open Science, a nonprofit technology and culture change organization with a mission to increase openness, integrity, and reproducibility of research.
Provenance: Commissioned; not externally peer reviewed.
Introduction
Credibility of scientific claims is established with evidence for their replicability using new data [1]. This is distinct from retesting a claim using the same analyses and same data (usually referred to as reproducibility or computational reproducibility) and from using the same data with different analyses (usually referred to as robustness). Recent attempts to systematically replicate published claims indicate surprisingly low success rates. For example, across 6 recent replication efforts of 190 claims in the social and behavioral sciences, 90 (47%) replicated successfully according to each study's primary success criterion [2]. Likewise, a large-sample review of 18 candidate gene or candidate gene-by-interaction hypotheses for depression found no support for any of them [3], a particularly stunning result considering that more than 1,000 articles have investigated their effects. Replication challenges have spawned initiatives to improve research rigor and transparency such as preregistration and open data, materials, and code [4–6]. Simultaneously, failures-to-replicate have spurred debate about the meaning of replication and its implications for research credibility. Replications are inevitably different from the original studies. How do we decide whether something is a replication? The answer shifts the conception of replication from a boring, uncreative, housekeeping activity to an exciting, generative, vital contributor to research progress.
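As a quick sense of the statistical uncertainty around that pooled 47% figure, the short sketch below (our illustration, not an analysis from the paper) recomputes the success rate from the cited counts of 90 successes out of 190 claims and attaches an approximate 95% Wilson score interval.

```python
# Back-of-the-envelope check of the pooled replication rate cited above:
# 90 of 190 claims replicated. The Wilson score interval is our own
# illustrative choice; the counts come from the text, the rest is arithmetic.
from math import sqrt

successes, n, z = 90, 190, 1.96  # z for an approximate 95% interval
p = successes / n
center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
half = (z / (1 + z**2 / n)) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
print(f"rate = {p:.3f}, 95% CI = [{center - half:.3f}, {center + half:.3f}]")
# -> rate = 0.474, 95% CI = [0.404, 0.544]
```

Even allowing for sampling error, the pooled rate stays well below what one would hope for from published claims.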
Replication reconsidered
According to common understanding, replication is repeating a study's procedure and observing whether the prior finding recurs [7]. This definition of replication is intuitive, easy to apply, and incorrect.
The problem is this definition's emphasis on repetition of the technical methods—the procedure, protocol, or manipulated and measured events. Why is that a problem? Imagine an original behavioral study was conducted in the United States in English. What if the replication is to be done in the Philippines with a Tagalog-speaking sample? To be a replication, must the materials be administered in English? With no revisions for the cultural context? If minor changes are allowed, then what counts as minor enough to still qualify as repeating the procedure? More broadly, it is not possible to recreate an earthquake, a supernova, the Pleistocene, or an election. If replication requires repeating the manipulated or measured events of the study, then it is not possible to conduct replications in observational research or research on past events.
Repetition of the study procedures is an appealing definition of replication because it often corresponds to what researchers do when conducting a replication—i.e., faithfully follow the original methods and procedures as closely as possible. But the reason for doing so is not because repeating procedures defines replication. Replications often repeat procedures because theories are too vague and methods too poorly understood to productively conduct replications and accelerate theoretical understanding otherwise [8].
Prior commentators have drawn distinctions between types of replication such as "direct" versus "conceptual" replication and argued in favor of valuing one over the other (e.g., [9,10]). By contrast, we argue that distinctions between "direct" and "conceptual" are at least irrelevant and perhaps counterproductive for understanding replication and its role in advancing knowledge. Procedural definitions of replication are masks for underdeveloped theoretical expectations, and "conceptual replications" as they are identified in practice often fail to meet the criteria we develop here and deem essential for a test to qualify as a replication.
Replication redux
We propose an alternative definition for replication that is more inclusive of all research and more relevant for the role of replication in advancing knowledge. Replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes.
To be a replication, two things must be true: outcomes consistent with a prior claim would increase confidence in the claim, and outcomes inconsistent with a prior claim would decrease confidence in the claim. The symmetry promotes replication as a mechanism for confronting prior claims with new evidence. Therefore, declaring that a study is a replication is a theoretical commitment. Replication provides the opportunity to test whether existing theories, hypotheses, or models are able to predict outcomes that have not yet been observed. Successful replications increase confidence in those models; unsuccessful replications decrease confidence and spur theoretical innovation to improve or discard the model. This does not imply that the magnitude of belief change is symmetrical for "successes" and "failures." Prior and existing evidence inform the extent to which replication outcomes alter beliefs. Nevertheless, as a theoretical commitment, replication does imply precommitment to taking all outcomes seriously.
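To make the asymmetry of belief change concrete, here is a minimal Bayesian sketch (our illustration, not the authors' model; the prior, power, and false positive rate are assumed numbers) of how a single replication outcome shifts confidence in a claim.

```python
# Minimal Bayesian update after one replication attempt. All inputs are
# illustrative assumptions: prior confidence in the claim, the chance a true
# claim replicates (power), and the chance a false claim "replicates" anyway.

def update(prior, power, false_pos, replicated):
    """Posterior probability that the claim is true after one replication."""
    like_true = power if replicated else 1 - power
    like_false = false_pos if replicated else 1 - false_pos
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

prior, power, false_pos = 0.50, 0.80, 0.10
print(update(prior, power, false_pos, replicated=True))   # ~0.89: success raises confidence
print(update(prior, power, false_pos, replicated=False))  # ~0.18: failure lowers it, by a different amount
```

Under these assumed inputs, a success moves confidence from 0.50 to about 0.89 and a failure moves it to about 0.18; changing the prior or the error rates changes those magnitudes, which is the sense in which belief change need not be symmetrical even though both outcomes count.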
Because replication is defined based on theoretical expectations, not everyone will agree that one study is a replication of another. Moreover, it is not always possible to make precommitments to the diagnosticity of a study as a replication, often for the simple reason that study outcomes are already known. Deciding whether studies are replications after observing the outcomes can leverage post hoc reasoning biases to dismiss "failures" as nonreplications and "successes" as diagnostic tests of the claims, or the reverse if the observer wishes to discredit the claims. This can unproductively retard research progress by dismissing replication counterevidence. Simultaneously, replications can fail to meet their intended diagnostic aims because of error or malfunction in the process that is only identifiable after the fact. When there is uncertainty about the status of claims and the quality of methods, there is no easy solution for distinguishing between motivated and principled reasoning about evidence. Science's most effective solution is to replicate, again.
At its best, science minimizes the impact of ideological commitments and reasoning biases by being an open, social enterprise. To achieve that, researchers should be rewarded for articulating their theories clearly and a priori so that they can be productively confronted with evidence [4,6]. Better theories are those that make it clear how they can be supported and challenged by replication. Repeated replication is often necessary to resolve confidence in a claim, and, invariably, researchers will have plenty to argue about even when replication and precommitment are normative practices.
Replication resolved
The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Theory advances in fits and starts with conceptual leaps, unexpected observations, and a patchwork of evidence. That is okay; it is fuzzy at the frontiers of knowledge. The dialogue between theory and evidence facilitates identification of contours, constraints, and expectations about the phenomena under study. Replicable evidence provides anchors for that iterative process. If evidence is replicable, then theory must eventually account for it, even if only to dismiss it as irrelevant because of invalidity of the methods. For example, the claims that there are more obese people in wealthier countries compared with poorer countries on average and that people in wealthier countries live longer than people in poorer countries on average could both be highly replicable. All theoretical perspectives about the relations between wealth, obesity, and longevity would have to account for those replicable claims.
There is no such thing as exact replication. We cannot reproduce an earthquake, era, or election, but replication is not about repeating historical events. Replication is about identifying the conditions sufficient for assessing prior claims. Replication can occur in observational research when the conditions presumed essential for observing the evidence recur, such as when a new seismic event has the characteristics deemed necessary and sufficient to observe an outcome predicted by a prior theory or when a new method for reassessing a fossil offers an independent test of existing claims about that fossil. Even in experimental research, original and replication studies inevitably differ in some aspects of the sample—or units—from which data are collected, the treatments that are administered, the outcomes that are measured, and the settings in which the studies are conducted [11].
Individual studies do not provide comprehensive or definitive evidence about all conditions for observing evidence about claims. The gaps are filled with theory. A single study examines only a subset of units, treatments, outcomes, and settings. The study was conducted in a particular climate, at particular times of day, at a particular point in history, with a particular measurement method, using particular assessments, with a particular sample. Rarely do researchers limit their inference to precisely those conditions. If they did, scientific claims would be historical claims because those precise conditions will never recur. If a claim is thought to reveal a regularity about the world, then it is inevitably generalizing to situations that have not yet been observed. The fundamental question is: of the innumerable variations in units, treatments, outcomes, and settings, which ones matter? Time of day for data collection may be expected to be irrelevant for a claim about personality and parenting or critical for a claim about circadian rhythms and inhibition.
When theories are too immature to make clear predictions, repetition of original procedures becomes very useful. Using the same procedures is an interim solution for not having clear theoretical specification of what is needed to produce evidence about a claim. And, using the same procedures reduces uncertainty about what qualifies as evidence "consistent with" earlier claims. Replication is not about the procedures per se, but using similar procedures reduces uncertainty in the universe of possible units, treatments, outcomes, and settings that could be important for the claim.
Because there is no exact replication, every replication test assesses generalizability to the new study's unique conditions. However, not every generalizability test is a replication. Fig 1's left panel illustrates a discovery and the conditions around it to which it is potentially generalizable. The generalizability space is large because of theoretical immaturity; there are many conditions in which the claim might be supported, but failures would not discredit the original claim. Fig 1's right panel illustrates a maturing understanding of the claim. The generalizability space has shrunk because some tests identified boundary conditions (gray tests), and the replicability space has increased because successful replications and generalizations (colored tests) have improved theoretical specification for when replicability is expected.
Fig 1. For underspecified theories, there is a larger space for which the claim may or may not be supported—the theory does not provide clear expectations. These are generalizability tests. Testing replicability is a subset of testing generalizability. As theory specification improves (moving from left panel to right panel), usually interactively with repeated testing, the generalizability and replicability spaces converge. Failures-to-replicate or generalize shrink the space (dotted circle shows original plausible space). Successful replications and generalizations expand the replicability space—i.e., broadening and strengthening commitments to replicability across units, treatments, outcomes, and settings.
Successful replication provides evidence of generalizability beyond the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Repeatedly testing replicability and generalizability across units, treatments, outcomes, and settings facilitates improvement in theoretical specificity and future prediction.
Theoretical maturation is illustrated in Fig 2. A progressive research program (the left path) succeeds in replicating findings across conditions presumed to be irrelevant and also matures the theoretical account to more clearly distinguish conditions for which the phenomenon is expected to be observed or not observed. This is illustrated by a shrinking generalizability space in which the theory does not make clear predictions. A degenerative research program (the right path) persistently fails to replicate the findings and progressively narrows the universe of conditions to which the claim could apply. This is illustrated by shrinking generalizability and replicability spaces because the theory must be constrained to ever-narrowing conditions [12].
Fig 2. With progressive success (left path), theoretical expectations mature, clarifying when replicability is expected. Also, boundary conditions become clearer, reducing the potential generalizability space. A complete theoretical account eliminates generalizability space because the theoretical expectations are so clear and precise that all tests are replication tests. With repeated failures (right path), the generalizability and replicability spaces both shrink, eventually to a theory so weak that it makes no commitments to replicability.
This exposes an inevitable ambiguity in failures-to-replicate. Was the original evidence a false positive, was the replication a false negative, or does the replication identify a boundary condition of the claim? We can never know for certain that earlier evidence was a false positive. It is always possible that it was "real" and that we simply cannot identify or recreate the conditions necessary to replicate successfully. But that does not mean that all claims are true and that science cannot be self-correcting. Accumulating failures-to-replicate could result in a much narrower but more precise set of circumstances in which evidence for the claim is replicable, or it may result in failure to ever establish conditions for replicability and relegate the claim to irrelevance.
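A small worked example (ours, with assumed error rates) shows why a single failure-to-replicate is ambiguous: it computes how strongly one failed attempt favors "the original was a false positive" over "the replication was a false negative."

```python
# Hypothetical numbers, for illustration only: suppose each replication attempt
# has 60% power to detect a true effect, and a 5% chance of a spurious
# "success" when the claim is false.
power, alpha = 0.60, 0.05

p_fail_if_true = 1 - power    # 0.40: a real effect can simply be missed
p_fail_if_false = 1 - alpha   # 0.95: no effect, correctly not found

# Likelihood ratio: how much one failure favors "the claim is false."
print(p_fail_if_false / p_fail_if_true)  # ~2.4: suggestive, far from decisive
```

Under these assumptions, one failure shifts the odds against the claim by a factor of only about 2.4, which is why repeated replication, rather than any single attempt, is what resolves confidence.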
The ambiguity between disconfirming an original claim and identifying a boundary condition also means that understanding whether or not a study is a replication can change with the accumulation of knowledge. For example, the famous experiment by Otto Loewi (1936 Nobel Prize in Physiology or Medicine) showed that the inhibitory factor "vagusstoff," later determined to be acetylcholine, was released from the vagus nerve of frogs, suggesting that neurotransmission was a chemical process. Much later, after his and others' failures-to-replicate his original claim, a crucial theoretical insight identified that the time of year at which Loewi performed his experiment was critical to its success [13]. The original study was performed with so-called winter frogs. The replication attempts performed with summer frogs failed because of seasonal sensitivity of the frog heart to the unrecognized acetylcholine, making the effects of vagal stimulation far more difficult to demonstrate. With subsequent tests providing supporting evidence, the understanding of the claim improved. What had been perceived as replications were no longer, because new evidence demonstrated that they were not studying the same thing. The theoretical understanding evolved, and subsequent replications supported the revised claims. That is not a problem; that is progress.
Replication is rare
The term "conceptual replication" has been applied to studies that utilize different methods to examination the same question as a prior written report. This is a useful enquiry activity for advancing agreement, but many studies with this label are not replications by our definition. Recollect that "to be a replication, 2 things must be true: outcomes consistent with a prior merits would increment confidence in the merits, and outcomes inconsistent with a prior claim would decrease conviction in the merits." Many "conceptual replications" come across the first benchmark and fail the 2d. That is, they are not designed such that a failure to replicate would revise confidence in the original claim. Instead, "conceptual replications" are often generalizability tests. Failures are interpreted, at most, as identifying boundary weather condition. A cocky-cess of whether ane is testing replicability or generalizability is answering—would an outcome inconsistent with prior findings cause me to lose confidence in the theoretical claims? If no, then it is a generalizability test.
Designing a replication with a different methodology requires understanding of the theory and methods so that any outcome is considered diagnostic evidence about the prior claim. In practice, this means that replication is often limited to relatively close adherence to original methods for topics in which theory and methodology are immature—a circumstance commonly called "direct" or "close" replication—because the similarity of methods serves as a stand-in for theoretical and measurement precision. In fact, conducting a replication of a prior claim with a different methodology can be considered a milestone for theoretical and methodological maturity.
Conclusion
Replication is characterized as the boring, rote, clean-up work of science. This misperception makes funders reluctant to fund it, journals reluctant to publish it, and institutions reluctant to reward it. The disincentives for replication are a likely contributor to existing challenges of credibility and replicability of published claims [14].
Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress. Single studies, whether they pursue novel ends or confront existing expectations, never definitively confirm or disconfirm theories. Theories make predictions; replications test those predictions. Outcomes from replications are fodder for refining, altering, or extending theory to generate new predictions. Replication is a central part of the iterative maturing cycle of description, prediction, and explanation. A shift in attitude that includes replication in funding, publication, and career opportunities will accelerate research progress.
Acknowledgments
We thank Alex Holcombe, Laura Scherer, Leonhard Held, and Don van Ravenzwaaij for comments on earlier versions of this paper, and we thank Anne Chestnut for graphic design support.
References
- 1. Schmidt S. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev Gen Psychol. 2009;13(2): 90–100. https://doi.org/10.1037/a0015108
- 2. Camerer CF, Dreber A, Holzmeister F, Ho T-H, Huber J, Johannesson M, et al. Evaluating replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav. 2018;2: 637–644. pmid:31346273
- 3. Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, et al. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. Am J Psychiatry. 2019;176(5): 376–387. https://doi.org/10.1176/appi.ajp.2018.18070881 pmid:30845820
- 4. Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A manifesto for reproducible science. Nat Hum Behav. 2017;1: 0021. https://doi.org/10.1038/s41562-016-0021
- 5. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, et al. Promoting an open research culture. Science. 2015;348(6242): 1422–1425. pmid:26113702
- 6. Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci U S A. 2018;115(11): 2600–2606. https://doi.org/10.1073/pnas.1708274114 pmid:29531091
- 7. Jeffreys H. Scientific inference. 3rd ed. Cambridge: Cambridge University Press; 1973.
- 8. Muthukrishna M, Henrich J. A problem in theory. Nat Hum Behav. 2019;3: 221–229. pmid:30953018
- 9. Crandall CS, Sherman JW. On the scientific superiority of conceptual replications for scientific progress. J Exp Soc Psychol. 2016;66: 93–99.
- 10. Stroebe W, Strack F. The alleged crisis and the illusion of exact replication. Perspect Psychol Sci. 2014;9(1): 59–71. pmid:26173241
- 11. Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. 2nd ed. Boston: Houghton Mifflin; 2002.
- 12. Lakatos I. Falsification and the methodology of scientific research programmes. In: Harding SG, editor. Can theories be refuted? Synthese Library (Monographs on Epistemology, Logic, Methodology, Philosophy of Science, Sociology of Science and of Knowledge, and on the Mathematical Methods of Social and Behavioral Sciences). Dordrecht: Springer; 1976. p. 205–259. https://doi.org/10.1007/978-94-010-1863-0_14
- 13. Bain WA. A method of demonstrating the humoral transmission of the effects of cardiac vagus stimulation in the frog. Q J Exp Physiol. 1932;22(3): 269–274. https://doi.org/10.1113/expphysiol.1932.sp000574
- 14. Nosek BA, Spies JR, Motyl M. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect Psychol Sci. 2012;7(6): 615–631. pmid:26168121