Nature | News

Over half of psychology studies fail reproducibility test

Largest replication study to date casts doubt on many published positive results.


Brian Nosek's team set out to replicate scores of studies.

Don’t trust everything you read in the psychology literature. In fact, two thirds of it should probably be distrusted.

In the biggest project of its kind, Brian Nosek, a social psychologist and head of the Center for Open Science in Charlottesville, Virginia, and 269 co-authors repeated work reported in 98 original papers from three psychology journals, to see if they independently came up with the same results. 

The studies they took on ranged from whether expressing insecurities perpetuates them, to differences in how children and adults respond to fear stimuli, to effective ways to teach arithmetic.

According to the replicators' qualitative assessments, as previously reported by Nature, only 39 of the 100 replication attempts were successful. (There were 100 completed replication attempts on the 98 papers, because in two cases separate teams duplicated the replication effort.) But whether a replication attempt counts as successful is not straightforward. Today in Science, the team report the multiple different measures they used to answer this question [1].

The 39% figure derives from the team's subjective assessments of success or failure (see graphic, 'Reliability test'). Another method assessed whether a statistically significant effect could be found, and produced an even bleaker result. Whereas 97% of the original studies found a significant effect, only 36% of replication studies found significant results. The team also found that the average size of the effects found in the replicated studies was only half that reported in the original studies.
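The two quantitative measures described above can be sketched in a few lines of code. The per-study values below are purely illustrative assumptions, not the project's data; they are chosen only to mirror the pattern the team reported (nearly all originals significant, far fewer replications, and effect sizes roughly halved):

```python
# Sketch of two replication measures: significance rate and relative
# effect size. All numbers below are made up for illustration.

def significance_rate(p_values, alpha=0.05):
    """Fraction of studies reaching statistical significance."""
    return sum(p < alpha for p in p_values) / len(p_values)

def mean_relative_effect(original_effects, replication_effects):
    """Average ratio of replication effect size to original effect size."""
    ratios = [r / o for o, r in zip(original_effects, replication_effects)]
    return sum(ratios) / len(ratios)

# Hypothetical values for five studies.
original_p     = [0.01, 0.03, 0.04, 0.02, 0.001]
replication_p  = [0.04, 0.20, 0.60, 0.03, 0.30]
original_es    = [0.50, 0.40, 0.30, 0.60, 0.45]
replication_es = [0.30, 0.10, 0.05, 0.50, 0.20]

print(significance_rate(original_p))     # 1.0: all originals significant
print(significance_rate(replication_p))  # 0.4: only 2 of 5 replications
print(mean_relative_effect(original_es, replication_es))  # ~0.46: about half
```

Note that the two measures can disagree for a single study, which is one reason the team also relied on subjective judgements of success.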

There is no way of knowing whether any individual paper is true or false from this work, says Nosek. Either the original or the replication work could be flawed, or crucial differences between the two might be unappreciated. Overall, however, the project points to widespread publication of work that does not stand up to scrutiny.

Although Nosek is quick to say that most resources should be funnelled towards new research, he suggests that devoting a mere 3% of scientific funding to replication could make a big difference. The current amount, he says, is near zero.

Replication failure

The work is part of the Reproducibility Project, launched in 2011 amid high-profile reports of fraud and faulty statistical analysis that led to an identity crisis in psychology.

John Ioannidis, an epidemiologist at Stanford University in California, says that the true replication-failure rate could exceed 80%, even higher than Nosek's study suggests. This is because the Reproducibility Project targeted work in highly respected journals, the original scientists worked closely with the replicators, and replicating teams generally opted for papers employing relatively easy methods — all things that should have made replication easier.

But, he adds, “We can really use it to improve the situation rather than just lament the situation. The mere fact that that collaboration happened at such a large scale suggests that scientists are willing to move in the direction of improving.”

The work published in Science is different from previous papers on replication because the team actually replicated such a large swathe of experiments, says Andrew Gelman, a statistician at Columbia University in New York. In the past, some researchers dismissed indications of widespread problems because they involved small replication efforts or were based on statistical simulations.

But they will have a harder time shrugging off the latest study, says Gelman. “This is empirical evidence, not a theoretical argument. The value of this project is that hopefully people will be less confident about their claims.”

Publication bias

The point, says Nosek, is not to critique individual papers but to gauge just how much bias drives publication in psychology. For instance, boring but accurate studies may never get published, or researchers may achieve intriguing results less by documenting true effects than by hitting the statistical jackpot: finding a significant result by sheer luck, or trying various analytical methods until something pans out.
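The "statistical jackpot" can be made concrete with a small simulation. Assume, purely for illustration, that a researcher runs several independent analyses on data containing no true effect and reports any one that reaches p < 0.05 (real analyses of the same data set are usually correlated, which would dampen the inflation somewhat):

```python
import random

# Simulation: with NO true effect, how often does at least one of
# n_analyses tests cross the conventional p < 0.05 threshold?
# All parameters here are illustrative assumptions.

def chance_of_false_positive(n_analyses, n_experiments=20_000):
    """Fraction of null experiments in which at least one of n_analyses
    independent tests yields |z| > 1.96 (two-sided p < 0.05)."""
    rng = random.Random(0)  # fixed seed for a reproducible sketch
    hits = 0
    for _ in range(n_experiments):
        # Under the null, each analysis yields a z-statistic ~ N(0, 1).
        if any(abs(rng.gauss(0, 1)) > 1.96 for _ in range(n_analyses)):
            hits += 1
    return hits / n_experiments

print(chance_of_false_positive(1))   # ~0.05: one honest test
print(chance_of_false_positive(10))  # ~0.40: ten tries at the jackpot
```

With ten independent tries, the chance of a spurious "finding" is roughly 1 − 0.95¹⁰ ≈ 40%, which is why flexible analysis choices can fill the literature with effects that later fail to replicate.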

Nosek believes that other scientific fields are likely to have much in common with psychology. One analysis found that only 6 of 53 high-profile papers in cancer biology could be reproduced [2], and a related reproducibility project in cancer biology is currently under way. The incentives to find results worthy of high-profile publications are very strong in all fields, and can spur people to lose objectivity. “If this occurs on a broad scale, then the published literature may be more beautiful than reality,” says Nosek.

The results published today should spark a broader debate about optimal scientific practice and publishing, says Betsy Levy Paluck, a social psychologist at Princeton University in New Jersey. “It says we don't know the balance between innovation and replication.”

The fact that the study was published in a prestigious journal will encourage further scholarship, she says, and shows that now “replication is being promoted as a responsible and interesting line of enquiry”.

Journal name: Nature
DOI: 10.1038/nature.2015.18248

References

  1. Open Science Collaboration. Science http://dx.doi.org/10.1126/science.aac4716 (2015).

  2. Begley, C. G. & Ellis, L. M. Nature 483, 531–533 (2012).

Comments

  1. Marin Panovic
    Dr. Nosek work definitely goes to 61% result in his research before reading his methodology and hypothesis, no great scientist, no average or no mentally retarded scientist believes that her or his word is final, so theoretically Dr. Nosek can prove 100% research unreliable, or unconservative or Dr. Nosek can prove that we live in year 7525 of Byzantine calendar, that research is 100% reliable, no believer doubts, in science on the other hand the question is important, if the first answer is incorrect we'll get correct answer later, there is no correct answer to question not made, what is he trying to say is that he is white anglo saxon protestant man and those who want to prove him evolutionary not superior shouldn't even dare
  2. jack guy
    The study has problems. Read this response by Dr. Jenny Davis. http://thesocietypages.org/cyborgology/2015/09/08/the-reproducibility-projects-fatal-design-flaw/
  3. Boris Shmagin
    The coordinate systems are the point to start and then discuss reproducibility. Mathematics, technology and natural sciences have different coordinate systems. Mathematics has the most logical and reproducible cases. They are abstract and exist as cultural events. Technology creates sophisticated objects, reproducibility of which is a goal and the difference in their properties (errors) might be very small. This is not the case for natural object like human. This topic was special considered for natural object like river watershed: https://www.researchgate.net/publication/268334171_Modeling_the_Nature_System_Analysis_for_Knowledge__its_Uncertainty https://www.researchgate.net/publication/264555209_Hydrology_Modeling_an_Uncertainty
  4. Peter MetaSkeptic
    reading your comment, I can't help myself thinking about Sokal & Bricmont's book "Intellectual Impostures".
  5. Boris Shmagin
    Peter I put my name because my comment based of my results
  6. Peter MetaSkeptic
    Putting its own name means that you're ready to defend your view and I respect that. However it doesn't mean that I have to agree with your point of view. Obvious statement, isn't it. we won't settle our argument here. That's the pitfall of comments. I wish you the best in your research. Sincerely, Peter.
  7. This comment was deleted.

  8. Peter MetaSkeptic
    Argumentum ad hominem. What a surprise! You could have ask me why I found the lack of clarity of the comment above misleading, but that option has not crossed your mind. Clear expression of ideas, concepts, theories, solutions, problems, ... is required in science and most scientists are trying to do just that, because there is a link with intellectual honesty.
  9. Anna Neumann
    "But contrary to the implication of the Reproducibility Project, there is no replication crisis in psychology. The “crisis” may simply be the result of a misunderstanding of what science is." Dr. Lisa Feldman Barrett offers a sound response to said "crisis" in a NY Times op-ed this week http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html?_r=1
  10. Peter MetaSkeptic
    It reminds me of an old philosophical trick. When reality isn't on your side, redefine it until the new reality you invent can cope with your theories. As Richard Feynman said what we forgot to teach explicitly in science is a kind of utter honesty
  11. Richard Plant
    We’ve been saying this for years in relation to Psychology experiments administered using computers. Put simply researchers may not be doing what they think they are doing when they present a stimulus; synchronise with other equipment, e.g. fMRI, EEG, eye trackers; and record Reaction Times. We’d like to see researchers actually publish timing validation data with their papers to prove the figures they quote are accurate. The majority of researchers simply don’t have any insight into this or how their equipment really works and that’s before you get onto the statistics! Some training could certainly help here. At the moment there’s a lot of focus on new technology and flashy experiments or running large numbers of participants on the web. It’s almost as though some researchers have forgotten the basics of the Scientific Method and constructing Psychology experiments on a computer using an experiment generator is too easy? We don’t care how researchers do this, just that they should. A quick look at a couple of our recent papers might scare institutions and the researchers themselves into doing something? We think that funders and publishers should play a bigger role. At the moment researchers in any discipline won’t care unless there are solid consequences in terms of reduced funding or higher quality thresholds for publications.
    - Could millisecond timing errors in commonly used equipment be a cause of replication failure in some neuroscience studies?
    - A reminder on millisecond timing accuracy and potential replication failure in computer-based psychology experiments: An open letter
  12. Djordje Vilimanovic
    So there is a 61% chance that this too can't be replicated :)
  13. phoebe moon
    The "Truth" Wears Off. http://www.newyorker.com/magazine/2010/12/13/the-truth-wears-off
