Nature | News

Over half of psychology studies fail reproducibility test

Largest replication study to date casts doubt on many published positive results.


Brian Nosek's team set out to replicate scores of studies.

Don’t trust everything you read in the psychology literature. In fact, two thirds of it should probably be distrusted.

In the biggest project of its kind, Brian Nosek, a social psychologist and head of the Center for Open Science in Charlottesville, Virginia, and 269 co-authors repeated work reported in 98 original papers from three psychology journals, to see if they independently came up with the same results. 

The studies they took on ranged from whether expressing insecurities perpetuates them, to differences in how children and adults respond to fear stimuli, to effective ways to teach arithmetic.

According to the replicators' qualitative assessments, as previously reported by Nature, only 39 of the 100 replication attempts were successful. (There were 100 completed replication attempts on the 98 papers, because in two cases separate teams duplicated the replication effort.) But whether a replication attempt counts as successful is not straightforward. Today in Science, the team report the multiple different measures they used to answer this question [1].

The 39% figure derives from the team's subjective assessments of success or failure (see graphic, 'Reliability test'). Another method assessed whether a statistically significant effect could be found, and produced an even bleaker result. Whereas 97% of the original studies found a significant effect, only 36% of replication studies found significant results. The team also found that the average size of the effects found in the replicated studies was only half that reported in the original studies.
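The two quantitative measures described above can be sketched in a few lines of code. The per-study values below are purely illustrative assumptions, not the project's data; they are chosen only to mirror the pattern the team reported (nearly all originals significant, far fewer replications, and effect sizes roughly halved):

```python
# Sketch of two replication measures: significance rate and relative
# effect size. All numbers below are made up for illustration.

def significance_rate(p_values, alpha=0.05):
    """Fraction of studies reaching statistical significance."""
    return sum(p < alpha for p in p_values) / len(p_values)

def mean_relative_effect(original_effects, replication_effects):
    """Average ratio of replication effect size to original effect size."""
    ratios = [r / o for o, r in zip(original_effects, replication_effects)]
    return sum(ratios) / len(ratios)

# Hypothetical values for five studies.
original_p     = [0.01, 0.03, 0.04, 0.02, 0.001]
replication_p  = [0.04, 0.20, 0.60, 0.03, 0.30]
original_es    = [0.50, 0.40, 0.30, 0.60, 0.45]
replication_es = [0.30, 0.10, 0.05, 0.50, 0.20]

print(significance_rate(original_p))     # 1.0: all originals significant
print(significance_rate(replication_p))  # 0.4: only 2 of 5 replications
print(mean_relative_effect(original_es, replication_es))  # ~0.46: about half
```

Note that the two measures can disagree for a single study, which is one reason the team also relied on subjective judgements of success.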

There is no way of knowing whether any individual paper is true or false from this work, says Nosek. Either the original or the replication work could be flawed, or crucial differences between the two might be unappreciated. Overall, however, the project points to widespread publication of work that does not stand up to scrutiny.

Although Nosek is quick to say that most resources should be funnelled towards new research, he suggests that devoting a mere 3% of scientific funding to replication could make a big difference. The current amount, he says, is near zero.

Replication failure

The work is part of the Reproducibility Project, launched in 2011 amid high-profile reports of fraud and faulty statistical analysis that led to an identity crisis in psychology.

John Ioannidis, an epidemiologist at Stanford University in California, says that the true replication-failure rate could exceed 80%, even higher than Nosek's study suggests. This is because the Reproducibility Project targeted work in highly respected journals, the original scientists worked closely with the replicators, and replicating teams generally opted for papers employing relatively easy methods — all things that should have made replication easier.

But, he adds, “We can really use it to improve the situation rather than just lament the situation. The mere fact that that collaboration happened at such a large scale suggests that scientists are willing to move in the direction of improving.”

The work published in Science is different from previous papers on replication because the team actually replicated such a large swathe of experiments, says Andrew Gelman, a statistician at Columbia University in New York. In the past, some researchers dismissed indications of widespread problems because they involved small replication efforts or were based on statistical simulations.

But they will have a harder time shrugging off the latest study, says Gelman. “This is empirical evidence, not a theoretical argument. The value of this project is that hopefully people will be less confident about their claims.”

Publication bias

The point, says Nosek, is not to critique individual papers but to gauge just how much bias drives publication in psychology. For instance, boring but accurate studies may never get published, or researchers may achieve intriguing results less by documenting true effects than by hitting the statistical jackpot: finding a significant result by sheer luck, or trying various analytical methods until something pans out.
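The "statistical jackpot" can be made concrete with a small simulation. Assume, purely for illustration, that a researcher runs several independent analyses on data containing no true effect and reports any one that reaches p < 0.05 (real analyses of the same data set are usually correlated, which would dampen the inflation somewhat):

```python
import random

# Simulation: with NO true effect, how often does at least one of
# n_analyses tests cross the conventional p < 0.05 threshold?
# All parameters here are illustrative assumptions.

def chance_of_false_positive(n_analyses, n_experiments=20_000):
    """Fraction of null experiments in which at least one of n_analyses
    independent tests yields |z| > 1.96 (two-sided p < 0.05)."""
    rng = random.Random(0)  # fixed seed for a reproducible sketch
    hits = 0
    for _ in range(n_experiments):
        # Under the null, each analysis yields a z-statistic ~ N(0, 1).
        if any(abs(rng.gauss(0, 1)) > 1.96 for _ in range(n_analyses)):
            hits += 1
    return hits / n_experiments

print(chance_of_false_positive(1))   # ~0.05: one honest test
print(chance_of_false_positive(10))  # ~0.40: ten tries at the jackpot
```

With ten independent tries, the chance of a spurious "finding" is roughly 1 − 0.95¹⁰ ≈ 40%, which is why flexible analysis choices can fill the literature with effects that later fail to replicate.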

Nosek believes that other scientific fields are likely to have much in common with psychology. One analysis found that only 6 of 53 high-profile papers in cancer biology could be reproduced [2], and a related reproducibility project in cancer biology is currently under way. The incentives to find results worthy of high-profile publications are very strong in all fields, and can spur people to lose objectivity. “If this occurs on a broad scale, then the published literature may be more beautiful than reality,” says Nosek.

The results published today should spark a broader debate about optimal scientific practice and publishing, says Betsy Levy Paluck, a social psychologist at Princeton University in New Jersey. “It says we don't know the balance between innovation and replication.”

The fact that the study was published in a prestigious journal will encourage further scholarship, she says, and shows that now “replication is being promoted as a responsible and interesting line of enquiry”.

Journal name: Nature
DOI: 10.1038/nature.2015.18248

References

  1. Open Science Collaboration. Science http://dx.doi.org/10.1126/science.aac4716 (2015).

  2. Begley, C. G. & Ellis, L. M. Nature 483, 531–533 (2012).

Comments

  1. Marin Panovic
    Dr. Nosek work definitely goes to 61% result in his research before reading his methodology and hypothesis, no great scientist, no average or no mentally retarded scientist believes that her or his word is final, so theoretically Dr. Nosek can prove 100% research unreliable, or unconservative or Dr. Nosek can prove that we live in year 7525 of Byzantine calendar, that research is 100% reliable, no believer doubts, in science on the other hand the question is important, if the first answer is incorrect we'll get correct answer later, there is no correct answer to question not made, what is he trying to say is that he is white anglo saxon protestant man and those who want to prove him evolutionary not superior shouldn't even dare
  2. jack guy
    The study has problems. Read this response by Dr. Jenny Davis. http://thesocietypages.org/cyborgology/2015/09/08/the-reproducibility-projects-fatal-design-flaw/
  3. Boris Shmagin
    The coordinate systems are the point to start and then discuss reproducibility. Mathematics, technology and natural sciences have different coordinate systems. Mathematics has the most logical and reproducible cases. They are abstract and exist as cultural events. Technology creates sophisticated objects, reproducibility of which is a goal and the difference in their properties (errors) might be very small. This is not the case for natural object like human. This topic was special considered for natural object like river watershed: https://www.researchgate.net/publication/268334171_Modeling_the_Nature_System_Analysis_for_Knowledge__its_Uncertainty https://www.researchgate.net/publication/264555209_Hydrology_Modeling_an_Uncertainty
  4. Peter MetaSkeptic
    reading your comment, I can't help myself thinking about Sokal & Bricmont's book "Intellectual Impostures".
  5. Boris Shmagin
    Peter I put my name because my comment based of my results
  6. Peter MetaSkeptic
    Putting its own name means that you're ready to defend your view and I respect that. However it doesn't mean that I have to agree with your point of view. Obvious statement, isn't it. we won't settle our argument here. That's the pitfall of comments. I wish you the best in your research. Sincerely, Peter.
  7. This comment was deleted.

  8. Peter MetaSkeptic
    Argumentum ad hominem. What a surprise! You could have ask me why I found the lack of clarity of the comment above misleading, but that option has not crossed your mind. Clear expression of ideas, concepts, theories, solutions, problems, ... is required in science and most scientists are trying to do just that, because there is a link with intellectual honesty.
  9. Anna Neumann
    "But contrary to the implication of the Reproducibility Project, there is no replication crisis in psychology. The “crisis” may simply be the result of a misunderstanding of what science is." Dr. Lisa Feldman Barrett offers a sound response to said "crisis" in a NY Times op-ed this week http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html?_r=1
  10. Peter MetaSkeptic
    It reminds me of an old philosophical trick. When reality isn't on your side, redefine it until the new reality you invent can cope with your theories. As Richard Feynman said what we forgot to teach explicitly in science is a kind of utter honesty
  11. Richard Plant
    We’ve been saying this for years in relation to Psychology experiments administered using computers. Put simply researchers may not be doing what they think they are doing when they present a stimulus; synchronise with other equipment, e.g. fMRI, EEG, eye trackers; and record Reaction Times. We’d like to see researchers actually publish timing validation data with their papers to prove the figures they quote are accurate. The majority of researchers simply don’t have any insight into this or how their equipment really works and that’s before you get onto the statistics! Some training could certainly help here. At the moment there’s a lot of focus on new technology and flashy experiments or running large numbers of participants on the web. It’s almost as though some researchers have forgotten the basics of the Scientific Method and constructing Psychology experiments on a computer using an experiment generator is too easy? We don’t care how researchers do this, just that they should. A quick look at a couple of our recent papers might scare institutions and the researchers themselves into doing something? We think that funders and publishers should play a bigger role. At the moment researchers in any discipline won’t care unless there are solid consequences in terms of reduced funding or higher quality thresholds for publications.
    - Could millisecond timing errors in commonly used equipment be a cause of replication failure in some neuroscience studies?
    - A reminder on millisecond timing accuracy and potential replication failure in computer-based psychology experiments: An open letter
  12. Djordje Vilimanovic
    So there is a 61% chance that this too can't be replicated :)
  13. phoebe moon
    The "Truth" Wears Off. http://www.newyorker.com/magazine/2010/12/13/the-truth-wears-off
