Saturday, June 24, 2023

Why Most Correlation-Fishing Experimental Neuroscience Is Worthless

At the Myths of Vision Science blog, written by a vision scientist, there is a post quoting quite a few juicy tidbits in which neuroscientists speak candidly about how they are stumbling about in the dark and following poor methods. We hear of neuroscientists calling their work BS that doesn't replicate. The author states, "As Mehler and Kording (2018) have discussed, despite the intrinsic inability of post hoc sample correlations to generalize due to massive confounding, neuroscience practitioners continue to employ language in their publications improperly implying causality." In another post, the same scientist explains the situation more clearly, stating this:

Neuroscience, as it is practiced today, is a pseudoscience, largely because it relies on post hoc correlation-fishing....As previously detailed, practitioners simply record some neural activity within a particular time frame; describe some events going on in the lab during the same time frame; then fish around for correlations between the events and the 'data' collected. Correlations, of course, will always be found. Even if, instead of neural recordings and 'stimuli' or 'tasks' we simply used two sets of random numbers, we would find correlations, simply due to chance. What’s more, the bigger the dataset, the more chance correlations we’ll turn out (Calude & Longo (2016)). So this type of exercise will always yield 'results;' and since all we’re called on to do is count and correlate, there’s no way we can fail. Maybe some of our correlations are 'true,' i.e. represent reliable associations; but we have no way of knowing; and in the case of complex systems, it’s extremely unlikely. It’s akin to flipping a coin a number of times, recording the results, and making fancy algorithms linking e.g. the third throw with the sixth, and hundredth, or describing some involved pattern between odd and even throws, etc. The possible constructs, or 'models' we could concoct are endless. But if you repeat the flips, your results will certainly be different, and your algorithms invalid...As Konrad Kording has admitted, practitioners get around the non-replication problem simply by avoiding doing replications.”

Here is an example. Suppose neuroscientist Joe wants to show that some particular region of the brain is activated more strongly than normal when people recall an old memory. Joe has 10 people undergo brain scans, and at a particular point in time Joe asks them to recall some memory from their childhood. Later Joe scrutinizes the brain scans of these people, looking for some tiny hundredth or thousandth of the brain that shows a little more activity when the recall occurred. Since brain areas show random variations in activity from moment to moment, differing by a hundredth or a two-hundredth (1 part in 100 or 1 part in 200) from one minute to the next, it will be rather easy for Joe to find some hundredth or thousandth of the brain that he can declare as showing greater activity when the subjects recalled their old memories.

Joe is engaging in correlation fishing, what is sometimes called data mining. If Joe merely reports a difference in some brain area of a hundredth or a two-hundredth, he has provided no real evidence that this brain area is more active when memory recall occurs. Purely by chance, we would expect some such hundredth or thousandth of the brain to show a hundredth or a two-hundredth more activity at any particular moment.
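To see how easy this is, here is a minimal sketch in Python (made-up numbers, not any real lab's analysis pipeline). It generates pure noise for 1,000 tiny "regions" in 10 subjects, with no real difference at all between the baseline and recall conditions, and then simply picks the region that happens to look most active during recall:

```python
# Minimal sketch, not any real analysis pipeline: pure noise, scanned across
# many "brain regions," will always yield a few that look "more active" at the
# moment of recall, purely by chance.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_regions = 10, 1000          # Joe's 10 subjects, 1,000 tiny regions

# Simulated "activity" with no real effect: baseline and recall are both noise.
baseline = rng.normal(loc=1.0, scale=0.01, size=(n_subjects, n_regions))
recall   = rng.normal(loc=1.0, scale=0.01, size=(n_subjects, n_regions))

# Average "percent signal change" per region across subjects.
change = (recall - baseline).mean(axis=0) / baseline.mean(axis=0) * 100

best = int(np.argmax(change))
print(f"Region {best} shows a {change[best]:.2f}% 'activation' during recall")
# Even though the data are pure noise, the best of 1,000 regions will typically
# show an apparent change near 1%, a publishable-looking "result."
```

Run something like this and the "winning" region will typically show an apparent signal change of around 1 percent, even though nothing real happened.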

There are several conventions or tendencies in the world of experimental neuroscience that aid and abet Joe in this particular piece of junk science he is producing. 

(1) A lack of pre-registration. Pre-registration is when a scientist commits himself to testing one exact hypothesis, and also spells out exactly how the data will be gathered and analyzed, before any data is gathered. It is generally recognized that pre-registration greatly reduces the amount of junk science. If, for example, Joe had to pick some particular hundredth of the brain in advance, and test the hypothesis that this particular hundredth of the brain is more active during memory recall, it would be much less likely that Joe would produce a false-alarm result coming from mere correlation fishing. But, sadly, pre-registration is rarely practiced in experimental neuroscience. Once a neuroscientist has gathered data, he is free to check for correlations in a hundred and one different places, using a hundred and one different analysis pipelines, each a different way of analyzing the data. It is therefore easy for some spurious correlation to be found, one that is not a real sign of a causal relation.
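To put a rough number on this (a back-of-the-envelope calculation, assuming the analysis attempts are independent, which real pipelines only approximate): if each attempt has a 5% chance of turning up a spurious "significant" correlation, the chance of getting at least one grows very fast with the number of attempts.

```python
# Back-of-the-envelope: probability of at least one spurious "significant"
# finding when trying many analyses, each with a 5% false-positive rate.
# (Assumes independent attempts, which real pipelines only approximate.)
for n_attempts in (1, 10, 20, 50, 101):
    p_at_least_one = 1 - 0.95 ** n_attempts
    print(f"{n_attempts:4d} analyses -> {p_at_least_one:.0%} chance of a false positive")
# With 101 analyses, the chance of at least one spurious correlation is about 99%.
```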

(2) Small study group sizes, and failed tests put in file drawers (no guaranteed publication). If you limit yourself to small study group sizes, it's always easier to find spurious correlations that do not involve causal relations. For example, let's suppose you are looking for a correlation between birth year and death year, or birth month and death month, or birth day of week and death day of week, or any of those correlations occurring between a father and a son or a mother and a son. You won't find any such thing examining data on the lives of 1000 children and their parents, but it wouldn't be too hard to find such a correlation if you merely need to show it within a group of ten people. You could try using data from 10 or 20 people, and if you don't find anything, you could just keep trying, using a different set of 10 or 20 people. Before long you would be able to report finding such a correlation, even though it involves no causal relation.

Sadly, this state of affairs matches what goes on in experimental neuroscience. It is very, very common for neuroscientists to publish papers based on small study group sizes, such as those involving only 10 or 15 subjects. Also, very few studies occur as "registered reports," in which publication is guaranteed regardless of the outcome. Neuroscientists are aware of what is called publication bias, meaning the tendency of journals not to publish negative results. So suppose someone like neuroscientist Joe can find no correlation between brain activity and memory recall after testing 10 subjects. He can just "file drawer" his study, and start over using another 10 subjects. He can keep doing this several times, until he has some marginal correlation to report, and then report only on the subjects in the most recent iteration of his experiment.
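Here is a sketch of that "file drawer" strategy in the simplest possible case (made-up numbers): a single fixed test on 10 subjects, with no real effect present at all, repeated with fresh subjects until something reaches p < .05. Combined with the analytic flexibility described under point 1, a reportable result arrives far faster than this.

```python
# Sketch of the "file drawer" strategy: test 10 subjects for a nonexistent
# effect, shelve the study if nothing reaches p < .05, and try again with a
# fresh group of subjects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
attempts = 0
while True:
    attempts += 1
    baseline = rng.normal(size=10)   # no real difference between the conditions
    recall = rng.normal(size=10)
    t_stat, p_value = stats.ttest_rel(recall, baseline)
    if p_value < 0.05:
        break

print(f"'Significant' result (p = {p_value:.3f}) obtained on attempt {attempts}")
# With one fixed test per attempt, about 20 shelved studies are expected before
# a reportable one appears; with the analytic flexibility described above, far fewer.
```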

(3) Weak variations in brain activity regarded as adequate evidence.  Neuroscientist Joe couldn't get away with his shady correlation fishing if there were a standard requiring, say, a 1% difference in brain activity before a correlation could be claimed.  But there are no such standards or conventions.  Many a neuroscientist has published papers reporting correlations involving differences in brain activity of no more than 1 part in 200 or 1 part in 500. Such reports are almost all false alarms.

(4) Easy-to-obtain ".05" statistical significance as a standard for publication.  Somehow there arose in experimental science a convention that if you could show something has a "statistical significance" of .05 (a p-value below .05), then that's good enough for publication in a science paper.  This is a very loose and weak standard that is all too easy to reach. Roughly speaking, with such a standard, anything is regarded as "statistically significant" as long as you would expect it to show up by chance only 1 time in 20 or less.  But when neuroscientists have not committed themselves (by pre-registration) to testing one particular hypothesis in one particular way, they are free to try 101 different ways of analyzing their data while looking for correlations. It's all too easy to find something that can be reported as "statistically significant." Even if you fail after fifty or a hundred attempts, you can just "file drawer" your study, start over, and you'll probably be able to report some "statistically significant" correlation on Version 2 or Version 3 of your experiment. In his book "The Cult of Statistical Significance," economist Stephen T. Ziliak laments this habit of judging science papers acceptable if they reach .05 statistical significance.  He says that science took a giant wrong turn by adopting such a convention.  He says, "Statistical significance is neither necessary nor sufficient for a scientific result." 

[Image caption: "torture the data until it confesses." This can be done when there's no pre-registration.]

Underpowered studies are those with low statistical power, such as a power below 50%, meaning less than a 50% chance of detecting the effect being looked for even when that effect is real. A scientific paper says this about the practice of accepting p=0.05 as a "good enough" mark for publication of science papers:

"If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time. If, as is often the case, experiments are underpowered, you will be wrong most of the time....Button et al. [9] said, 'We optimistically estimate the median statistical power of studies in the neuroscience field to be between about 8% and about 31%' ".

(5) Phony visuals are allowed in correlation-fishing papers. It should be a standard in experimental neuroscience that any paper will be rejected if it includes a visual that misleadingly gives the impression a difference in brain activity was greater than it actually was. Unfortunately, no such standard exists. Correlation-fishing studies routinely include "lying with colors" visuals that dishonestly depict differences of only 1 part in 200 or 1 part in 500, making them look like much greater differences such as 1 part in 10. See here for how this type of visual deception occurs. 
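Here is a hypothetical sketch of how the "lying with colors" trick can work (made-up data, not taken from any particular paper): render the same data twice, once with an honest color scale and once with the color scale squeezed onto the tiny range of the reported difference, and a 1-part-in-200 effect suddenly looks like a dramatic activation.

```python
# Hypothetical sketch of "lying with colors": the same 0.5% signal difference
# rendered two ways. Squeezing the color scale onto the tiny range makes a
# 1-part-in-200 difference look like a dramatic activation.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
slice_data = 100 + rng.normal(scale=0.2, size=(40, 40))  # fake "brain slice," baseline ~100
slice_data[15:25, 15:25] += 0.5                          # a 0.5% bump in one small patch

fig, (ax_honest, ax_stretched) = plt.subplots(1, 2)
ax_honest.imshow(slice_data, cmap="hot", vmin=0, vmax=110)          # honest scale: bump invisible
ax_honest.set_title("Honest color scale")
ax_stretched.imshow(slice_data, cmap="hot", vmin=99.8, vmax=100.7)  # scale squeezed onto the bump
ax_stretched.set_title("Stretched color scale")
plt.show()
```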

At the end of the long quote above was the sentence, "As Konrad Kording has admitted, practitioners get around the non-replication problem simply by avoiding doing replications." That's right. As scientist Randall J. Ellis stated in 2022, "There are no completed large-scale replication projects focused on neuroscience."

Because they involve only very slight reported differences in brain activity, correlation-fishing experimental neuroscience studies do nothing to show that the brain is the source of the human mind or the storage place of human memories. Search for the phrase "percent signal change" and you will find that almost all such studies are reporting changes of only about 1 part in 200 or smaller. In the rare case when a signal change of 1% is reported, it is usually because of head movements, which are a large source of false alarms in such studies. A person being brain scanned who does not perfectly follow instructions to keep his head motionless can cause a blip in the scan that gets reported as a region activation.  A paper states, "The signal derived from functional MRI (fMRI) can also be greatly perturbed by motion; two detailed reports by Power and colleagues describe the complex and variable manner by which different types of motion can impact fMRI acquisitions and increase the proportion of spurious correlations across the brain." Another paper says this:

"A general relationship between head motion and changes in BOLD signal across the brain can be seen in every subject examined in this paper (N=119 in four cohorts)....Any and all movement tends to increase the amplitude of rs-fcMRI signal changes."

Scientists use a variety of types of "motion scrubbing" to try to remove the effects of head movements from brain-scan data, and there is no standard for such data massaging; it's "roll your own." The same paper notes that "motion scrubbing tends to decrease many short-range correlations, and to increase many medium- to long-range correlations." We can imagine how things go for some scientist fishing for correlations between mental activity and brain activity in brain scans. If he is dissatisfied with the size of the correlation he finds, he can just tweak his "motion scrubbing" technique until some stronger correlation can be reported. 
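A toy illustration of how a shared motion artifact manufactures correlation (made-up numbers, not a real fMRI pipeline): two time series that have nothing to do with each other become strongly correlated as soon as the same head-movement spike contaminates both.

```python
# Toy illustration: two unrelated signals acquire a strong correlation as soon
# as a shared head-motion spike contaminates both. Not a real fMRI pipeline.
import numpy as np

rng = np.random.default_rng(3)
n_timepoints = 200
region_a = rng.normal(size=n_timepoints)   # unrelated noise time series
region_b = rng.normal(size=n_timepoints)

print(f"Correlation before motion: {np.corrcoef(region_a, region_b)[0, 1]:+.2f}")

motion = np.zeros(n_timepoints)
motion[100:105] = 8.0                      # a brief head movement hits both signals
region_a = region_a + motion
region_b = region_b + motion

print(f"Correlation after motion:  {np.corrcoef(region_a, region_b)[0, 1]:+.2f}")
# A single shared artifact pushes the correlation from roughly zero to around
# 0.6, and how aggressively it gets "scrubbed" away is up to the researcher.
```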
