What is the significance of: 37 Labs, 1,000 Babies, and Psychology's Replication Reckoning?

Large-scale collaborative replication projects in psychology are delivering mixed results, often failing to confirm the very hypotheses they set out to test.

37 Labs, 1,000 Babies, and Psychology's Replication Reckoning

According to the Reproducibility Project: Psychology, a landmark effort led by Brian Nosek at the Center for Open Science, only one-third to one-half of the results published in three psychology journals in 2008 could be replicated. That figure alone should give any researcher pause.

Now a fresh wave of large-scale collaborative projects is trying to find out whether psychology can salvage its credibility. The most ambitious: a coalition of 37 research groups across 18 countries testing the same hypothesis with over 1,000 babies.

The Baby Social Evaluation Experiment That Sparked a Mega-Replication

Kiley Hamlin's 2007 study, in which babies watch a puppet show and overwhelmingly choose the helpful blue square over the hindering yellow triangle, was highly cited. It claimed that the ability to evaluate others' behavior develops before speech. Over the next decade, dozens of attempted replications produced conflicting results.

Frustrated, Hamlin organized a massive collaboration in 2017 to run the exact same experiment across 37 labs with more than 1,000 infant participants. The logic: a sample this large would settle the debate.

What 37 Labs and 1,000 Babies Actually Found

The results have been trickling in, and they haven't always backed the original hypothesis. Michael Frank, a developmental psychologist at Stanford, noted that the field's reproducibility crisis drew heavy press coverage claiming that “psychology was garbage.” Small sample sizes were a major culprit: they distorted results or produced conclusions that applied only to limited groups.

Going big was the obvious fix. But big doesn't guarantee confirmation. Kelsey Lucca of Arizona State University, a co-lead on the study, argues the extra rigor is worth the logistical pain: “Joining forces by combining groups of subjects across labs solves this problem by giving us the statistical power to test important research questions.”
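To see why pooling subjects across labs raises statistical power, here is a minimal Python sketch. It is not the analysis used by the collaboration. It assumes a hypothetical single-lab sample of 16 infants, a hypothetical true preference rate of 60% for the helper, and a one-sided exact binomial test against chance (50%) at the 5% level. Only the pooled sample size of 1,000 comes from the article, and the function names are illustrative.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (requires 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def critical_value(n, alpha=0.05):
    """Smallest count k with P(X >= k | p = 0.5) <= alpha (one-sided test)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, 0.5)
        if tail > alpha:
            return k + 1
    return 0


def power(n, true_p, alpha=0.05):
    """Probability of rejecting chance (p = 0.5) when the true rate is true_p."""
    k_crit = critical_value(n, alpha)
    return sum(binom_pmf(k, n, true_p) for k in range(k_crit, n + 1))


if __name__ == "__main__":
    TRUE_RATE = 0.60        # assumption: hypothetical true preference for the helper
    SINGLE_LAB_N = 16       # assumption: hypothetical single-lab sample size
    POOLED_N = 1000         # from the article: more than 1,000 infants in total

    print(f"single lab (n={SINGLE_LAB_N}): power = {power(SINGLE_LAB_N, TRUE_RATE):.2f}")
    print(f"pooled     (n={POOLED_N}): power = {power(POOLED_N, TRUE_RATE):.2f}")
```

Under these assumed numbers, a single small lab would usually miss a real but modest effect, while the pooled sample would detect it almost every time. That is the sense in which combining labs provides the statistical power Lucca describes.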

Big Team Science Beyond Babies — Dogs, Fish, and the SCORE Project

Psychology isn't alone. Researchers are now running massive cross-lab replications on cognition in dogs, fish, and flamingoes. The SCORE project, published this April and again involving Nosek, assessed 164 papers across business, economics, education, political science, psychology, and social science. Only 49% could be replicated independently.

SCORE went further than most: it also tested whether results held when the data were reanalyzed in different ways, and whether the published findings could be reproduced from the original data using the original code.

Why Replicability Is Harder Than Assumed

Brian Nosek put it bluntly: “Establishing replicability is a lot harder than people assumed.” The Many Labs studies, for instance, found that 10 out of 13 classic psychology effects could be successfully replicated, but that still leaves 3 of the 13, roughly 23%, that could not.

These projects aren't just debunking old results. They're forcing the field to build a new standard of evidence — one where a single flashy study won't cut it, and where armies of babies, dogs, and data analysts become the new normal.

Source: Can an army of babies and dogs rescue psychology from its reproducibility crisis?
Domain: nature.com