Sunday, September 28, 2025

Irredeemable: Reproducibility and Statistical Power in Neuroscience Are Very Bad, and Not Getting Any Better

A recent study offers some encouraging news about psychology research. The paper, entitled "Increasing Sample Sizes in Psychology Over Time," reports this:

"We collected data from 3176 studies across six journals over three years. Results show a significant increase in sample sizes over time (b=44.83, t(6.25)=4.48, p=.004, 95%CI[25.23,64.43]), with median sample sizes being 40 in 1995, 56.5 in 2006, and 122.5 in 2019. This growth appears to be a response to the credibility crisis....The increase in sample sizes is a promising development for the replicability and credibility of psychological science."

The credibility crisis referred to is the widely reported reproducibility crisis in fields such as psychology and neuroscience. For decades it has been reported that experimental studies in psychology and neuroscience tend to be unreliable and poorly reproducible, largely because the sample sizes used were far too small. This was commonly called a "reproducibility crisis in psychology," although it was very much a reproducibility crisis in both psychology and neuroscience: the tendency to produce studies with too-small sample sizes was just as prevalent in neuroscience as in psychology.

Psychology experiments typically involve humans, and advances in internet technology may have been a factor helping to increase study group sizes in psychology. Decades ago a scientist might have found it necessary to recruit subjects to come into some laboratory where an experiment could be done. But now there are online platforms that allow people to sign up to be subjects in psychology experiments, while being paid for their efforts. This provides a very large pool of potential test subjects. A psychologist can now run experiments using subjects from across the USA or even multiple countries, by designing some experiment that subjects can participate in over the internet, while the subjects stay in the comfort of their homes.

But while there may have been an increase in study group sizes used in psychology experiments, there has apparently been no such increase in the field of neuroscience. How could you honestly describe the state of experimental neuroscience? You might describe it as an irredeemable cesspool consisting mostly of junk science studies that continue to have the same old fatal defects, such as the use of way-too-small study group sizes. Well-designed studies in neuroscience seem to be rare, and are greatly outnumbered by junk science studies guilty of very bad Questionable Research Practices.

Scientific studies that use small sample sizes are typically unreliable, and often present false alarms, suggesting a causal relation when there is none. Such small sample sizes are particularly common in neuroscience studies, which often require expensive brain scans, not the type of thing that can be done cheaply with many subjects. In 2013 the leading science journal Nature published a paper entitled "Power failure: why small sample size undermines the reliability of neuroscience." There is something called statistical power: the probability that a study will detect a real effect of the size it is looking for. The Nature paper found that the statistical power of the average neuroscience study is between 8% and 31%. With such low statistical power, real effects will usually be missed, and a large fraction of the positive results that do get reported will be false alarms and false causal suggestions.

A scientific study with a statistical power of 50% is one that will have only about a 50% chance of being successfully reproduced when someone attempts to reproduce it (assuming the reported effect is real and the replication uses the same design). Even when a statistical power of 50% is reached, the statistical power is not high enough for robust evidence to be claimed. In order to be robust evidence for an effect, a study must reach a higher statistical power such as 80%. When that power is reached, there is about an 80% chance that an attempt to reproduce the results will be successful.
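To make these power numbers concrete, here is a minimal sketch in Python (using statsmodels) of how the power of a simple two-group comparison depends on group size. The effect size d = 0.5 is a hypothetical "medium" effect chosen for illustration, not a value taken from any of the papers discussed here.

```python
# A minimal power sketch for a two-sample t-test (alpha = .05, two-sided).
# The effect size d = 0.5 is an assumed "medium" effect, for illustration only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for n in [10, 20, 64]:
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n = {n:3d} per group -> power = {power:.2f}")
# n = 10 gives power around 0.18; n = 20 around 0.33; n = 64 around 0.80.

# Group size needed per group to reach the 80% threshold mentioned above:
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"about {n_needed:.0f} subjects per group needed for 80% power")
```

Under these assumptions, the tiny study groups common in neuroscience land far below even the 50% power level, which is exactly the point of the "Power Failure" paper.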

The Nature paper said, "It is possible that false positives heavily contaminate the neuroscience literature." 

An article on this important Nature paper states the following:

"The group discovered that neuroscience as a field is tremendously underpowered, meaning that most experiments are too small to be likely to find the subtle effects being looked for and the effects that are found are far more likely to be false positives than previously thought. It is likely that many theories that were previously thought to be robust might be far weaker than previously imagined."

Scientific American reported on the paper with a headline of "New Study: Neuroscience Gets an 'F' for Reliability."

So, for example, when some neuroscience paper suggests that some part of your brain controls or mediates some mental activity, there is a large chance that the result is simply a false positive. As this paper makes clear, the more comparisons a study makes, the larger the chance of a false positive. The paper gives an example: if you test whether jelly beans cause acne, you'll probably get a negative result; but if your sample size is small and you test 30 different colors of jelly bean, you'll probably be able to say something like "there's a possible link between green jelly beans and acne," simply because the more comparisons, the larger the chance of a false positive. So when a neuroscientist looks for some part of your brain that causes some mental activity, and makes 30 different comparisons using 30 different brain regions, with a small sample size, he'll probably come up with some link he can report as "such and such a region of the brain is related to this activity." But there will be a high chance that this is simply a false positive.
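A back-of-the-envelope calculation shows how fast the false alarm risk grows with the number of comparisons. This sketch simply computes the chance of at least one false positive among independent tests run at the conventional .05 significance threshold; the 30 comparisons mirror the jelly bean example above.

```python
# Chance of at least one false positive among m independent tests at alpha = .05.
alpha = 0.05
for m in [1, 5, 30]:
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} comparisons -> P(at least one false positive) = {fwer:.2f}")
# 30 comparisons -> about 0.79, so a "significant" green-jelly-bean style
# result is close to guaranteed even when no real effect exists.
```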

[image: bad neuroscience lab]

The 2013 "Power Failure" paper discussed above was widely discussed in the neuroscience field, but a 2017 paper indicated that little or nothing had been done to fix the problem. Referring to an issue of the Nature Neuroscience journal, the author states, "Here I reproduce the statements regarding sample size from all 15 papers published in the August 2016 issue, and find that all of them except one essentially confess they are probably statistically underpowered," which is what happens when too small a sample size is used. 

A 2017 study entitled "Effect size and statistical power in the rodent fear conditioning literature -- A systematic review" looked at what percentage of 410 experiments met the standard of 15 animals per study group (the number needed for a moderately compelling statistical power of 50 percent). The study found that only 12 percent of the experiments met that standard. This means that 88 percent of the experiments had low statistical power, and are not compelling evidence for anything.
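For a sense of the arithmetic behind that 15-animals-per-group threshold, here is a one-line calculation. The effect size d = 0.75 used here is an assumed stand-in for typical effects in this literature; the review derived its threshold from the actual published effect sizes, which are not reproduced here.

```python
# Group size needed for 50% power, assuming a hypothetical effect size d = 0.75.
from statsmodels.stats.power import TTestIndPower

n_for_50 = TTestIndPower().solve_power(effect_size=0.75, power=0.5, alpha=0.05)
print(f"about {n_for_50:.0f} animals per group for 50% power")  # ~15 under these assumptions
```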


[image: low statistical power in neuroscience]


The 2017 scientific paper "Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature" contains some analysis and graphs suggesting that neuroscience is less reliable than psychology. Below is a quote from the paper:


"With specific respect to functional magnetic resonance imaging (fMRI), a recent analysis of 1,484 resting state fMRI data sets have shown empirically that the most popular statistical analysis methods for group analysis are inadequate and may generate up to 70% false positive results in null data. This result alone questions the published outcomes and interpretations of thousands of fMRI papers. Similar conclusions have been reached by the analysis of the outcome of an open international tractography challenge, which found that diffusion-weighted magnetic resonance imaging reconstructions of white matter pathways are dominated by false positive outcomes  Hence, provided that here we conclude that FRP [false report probability] is very high even when only considering low power and a general bias parameter (i.e., assuming that the statistical procedures used were computationally optimal and correct), FRP is actually likely to be even higher in cognitive neuroscience than our formal analyses suggest.

The paper draws a shocking conclusion that most published neuroscience results are false. The paper states the following: "In all, the combination of low power, selective reporting, and other biases and errors that have been well documented suggest that high FRP [false report probability] can be expected in cognitive neuroscience and psychology. For example, if we consider the recent estimate of 13:1 H0:H1 odds, then FRP [false report probability] exceeds 50% even in the absence of bias." The paper says of the neuroscience literature, "False report probability is likely to exceed 50% for the whole literature." 
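The false report probability arithmetic is easy to check for yourself. Using the standard formula FRP = (alpha × prior odds of H0) / (alpha × prior odds of H0 + power), together with the 13:1 odds cited in the paper, a short sketch shows that FRP exceeds 50% whenever power falls below about 65%, with no bias assumed at all:

```python
# False report probability, with the paper's 13:1 H0:H1 prior odds and no bias.
# FRP = (alpha * odds) / (alpha * odds + power)
alpha = 0.05
odds_h0 = 13.0

for power in [0.2, 0.5, 0.8]:
    frp = (alpha * odds_h0) / (alpha * odds_h0 + power)
    print(f"power = {power:.1f} -> FRP = {frp:.0%}")
# With power in the 8%-31% range reported for neuroscience, FRP lands well
# above 50% even before adding any bias.
```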

In June of 2025 I searched on Google Scholar, trying to find some paper reporting an improvement in sample sizes in neuroscience research. I could find no such paper. The sample sizes used in neuroscience research are very bad, and are not getting any better. Today's neuroscience research is a cesspool of dysfunction and misleading claims, and there are no signs that it is mending its ways.

Why does this situation persist? There are two main reasons: economics and ideology. 

The economic explanation for bad science practices is explained rather well in the paper "The Natural Selection of Bad Science" by Paul E. Smaldino and Richard McElreath. In that paper we read this:

"Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery....We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more ‘progeny,’ such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates."

The paper includes a shocking confession by a scientist who has served on committees hiring scientists. The scientist states this:

"I’ve been on a number of search committees. I don’t remember anybody looking at anybody’s papers. Number and IF [impact factor] of pubs are what counts."

This is a description of an economic ecosystem in which what determines a scientist's career advancement is not the quality and reliability of the papers he has published, but the mere quantity of such papers, and the impact factor of the journals in which they appeared.

The paper ("The natural selection of bad science") states this: "In fields such as psychology, neuroscience and medicine, practices that increase false discoveries remain not only common, but normative." In this context "normative" means "more the rule than the exception." The paper states, "Some of the most powerful incentives in contemporary science actively encourage, reward and propagate poor research methods and abuse of statistical procedures." Later the paper gives us some insight on the economics that help to increase the likelihood of scientists producing lots of low-quality research papers:

"If researchers are rewarded for publications and positive results are generally both easier to publish and more prestigious than negative results, then researchers who can obtain more positive results—whatever their truth value—will have an advantage. ...One way to better ensure that a positive result corresponds to a true effect is to make sure one’s hypotheses have firm theoretical grounding and that one’s experimental design is sufficiently well powered. However, this route takes effort and is likely to slow down the rate of production. An alternative way to obtain positive results is to employ techniques, purposefully or not, that drive up the rate of false positives. Such methods have the dual advantage of generating output at higher rates than more rigorous work, while simultaneously being more likely to generate publishable results. Although sometimes replication efforts can reveal poorly designed studies and irreproducible results, this is more the exception than the rule. For example, it has been estimated that less than 1% of all psychological research is ever replicated  and failed replications are often disputed. Moreover, even firmly discredited research is often cited by scholars unaware of the discreditation. Thus, once a false discovery is published, it can permanently contribute to the metrics used to assess the researchers who produced it....Campbell’s Law, stated in this paper’s epigraph, implies that if researchers are incentivized to increase the number of papers published, they will modify their methods to produce the largest possible number of publishable results rather than the most rigorous investigations."

What the paper is suggesting is that junk science is strongly incentivized in today's science research ecosystem. A scientist is more likely to succeed in academia if he produces a high quantity of low-quality research papers than if he produces a small quantity of high-quality research. There are several online sources that keep track of the number of papers a scientist wrote or co-wrote, and the number of citations such papers got. There are no online sources that keep track of the quality and reliability of such papers. In such an environment, a scientist is more likely to get ahead by producing many low-quality papers than by producing a smaller number of papers that are more reliable and truthful in the results they report.

[image: junk science practices]

The economic motivations of badly behaving neuroscientists and similar bad actors are sketched in my diagram below, and explained in the post here. At the top left corner is the starting point of "quick and dirty" experimental designs with way too few subjects. The diagram charts how various types of people in various industries benefit from such malpractice.

[image: academia cyberspace profit complex]

Another huge explanatory factor behind the massive persistence of junk neuroscience studies is ideology. What we should never forget is that neuroscientists are members of a belief community. That belief community is dedicated to promoting various dubious dogmas, such as the dogma that the brain is the source of the human mind, and the dogma that the brain is the storage place of human memories. So in many cases junk science studies that a peer reviewer or an editor would normally be ashamed to approve for publication will be approved for publication, because the study appears to support some dogma or narrative cherished by members of the neuroscientist belief community.

[image: church of academia]

The main beliefs of the neuroscientist belief community are false beliefs.  Because of innumerable reasons discussed on this blog, there is no credibility in the claim that the brain is the source of the human mind, and there is no credibility in the claim that the brain is a storage place of human memories. When the beliefs of a belief community are true, the community does not need to rely on studies involving bad science practices or bad scholarly practices.  But when the beliefs of a belief community are false, that belief community may need to keep producing studies involving bad science practices or bad scholarly practices. That way the belief community can try to maintain an illusion that the evidence is favoring its cherished beliefs. 
