Monday, April 4, 2022

"Brains Make Minds" Idea Flunks an Audit of a Large Brain Scan Database

For many years neuroscientists have been claiming important results about brains and minds, after doing brain imaging experiments using small sample sizes.  Typically such claims are based on sample sizes smaller than 15.  A new press release from the University of Minnesota Twin Cities announces results indicating that such small-sample correlation-seeking imaging experiments are utterly unreliable.  The headline of the press release is "Brain studies show thousands of participants are needed for accurate results."

There is a technique to measure the reliability of brain scans when they are used to make claims about supposed neural signs of cognitive activity: measuring what is called the test-retest reliability of brain scans.  This involves determining to what extent some claimed neural sign of cognitive activity shows up both times when two different brain scans are taken of the same person. 

So, to imagine a hypothetical example, suppose some claim is made that the hippocampus of some subject activated more strongly when the subject recalled something. A check can be made as to whether the same thing was seen when the same subject had his brain scanned a second time, doing the same recall task.  If no such increased activation is seen on the second brain scan, we have a good reason for thinking that the claim about the first scan is unwarranted, and that the first scan has simply given a false alarm, a result of random brain fluctuations.  
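The false-alarm scenario above can be sketched with a purely illustrative simulation (the numbers and threshold below are made up, not taken from any study): if an apparent "activation" is nothing but random fluctuation, subjects who exceed a detection threshold on a first scan will rarely exceed it again on a retest.

```python
import random

random.seed(1)

THRESHOLD = 1.5   # arbitrary cutoff, in units of noise standard deviation
SUBJECTS = 10000

flagged_first = 0  # "activation" seen on scan 1
replicated = 0     # "activation" also seen on scan 2 of the same subject

for _ in range(SUBJECTS):
    # Two independent scans of the same subject, pure noise by assumption
    scan1 = random.gauss(0.0, 1.0)
    scan2 = random.gauss(0.0, 1.0)
    if scan1 > THRESHOLD:
        flagged_first += 1
        if scan2 > THRESHOLD:
            replicated += 1

print(f"flagged on first scan: {flagged_first}")
print(f"replicated on retest:  {replicated}")
```

Because the two scans are independent noise, only a small fraction of the first-scan "findings" recur on the retest, which is exactly the signature of a false alarm rather than a real effect.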

Conveniently "covering their tracks," the vast majority of neuroscientists fail to do a retest of subjects when doing brain scanning experiments. However, there are some large databases of brain scans that include scanning retests of many subjects. It is therefore possible to judge how well claimed neural correlations of cognitive activity tend to replicate when a second test is done of the same subject. 

One such brain imaging database is the Adolescent Brain Cognitive Development Database. The database includes scans of thousands of subjects doing particular tasks such as a Monetary Incentive Delay task, a Stop Signal task and an n-back or nBack task (as described here). The database includes brain scans of more than 10,000 adolescents, and for more than 7000 of these adolescents a second set of scans was taken two years later, with the subjects performing the same tasks as in the first scan.  Such a database provides an excellent platform to test whether correlations between brain states and mental activity tend to repeat when the same subjects are scanned two years later.  

Such an examination is reported in the scientific paper entitled "Reliability and stability challenges in ABCD task fMRI data" by James T. Kennedy and others, which you can read here or here.  The study used a measure of retest reliability called the intraclass correlation. An intraclass correlation of less than .4 is generally regarded as "poor." In the wikipedia.org article on the intraclass correlation we read the following:

"Cicchetti (1994) gives the following often quoted guidelines for interpretation for kappa or ICC inter-rater agreement measures:

  • Less than 0.40—poor.
  • Between 0.40 and 0.59—fair.
  • Between 0.60 and 0.74—good.
  • Between 0.75 and 1.00—excellent.

A different guideline is given by Koo and Li (2016):

  • below 0.50: poor
  • between 0.50 and 0.75: moderate
  • between 0.75 and 0.90: good
  • above 0.90: excellent"
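The two grading schemes quoted above can be expressed as a small lookup function. This is my own sketch (the function name and structure are not from either paper; only the cutoffs and labels come from the quoted guidelines):

```python
def icc_category(icc, scheme="cicchetti"):
    """Map an intraclass correlation to the verbal labels quoted above.

    scheme="cicchetti" uses the Cicchetti (1994) cutoffs;
    scheme="koo_li" uses the Koo and Li (2016) cutoffs.
    """
    if scheme == "cicchetti":
        if icc < 0.40:
            return "poor"
        if icc < 0.60:
            return "fair"
        if icc < 0.75:
            return "good"
        return "excellent"
    if scheme == "koo_li":
        if icc < 0.50:
            return "poor"
        if icc < 0.75:
            return "moderate"
        if icc < 0.90:
            return "good"
        return "excellent"
    raise ValueError(f"unknown scheme: {scheme}")

# The average ICCs the Kennedy paper reports (.078 and .054) fall at the
# very bottom of the "poor" band under either scheme:
print(icc_category(0.078))            # poor
print(icc_category(0.054, "koo_li"))  # poor
```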

The results reported in the Kennedy paper were devastatingly negative.  In the paper's abstract we read this: 

"Reliability and stability [quantified via an intraclass correlation (ICC) that focuses on rank consistency] was poor in virtually all brain regions, with an average ICC of .078 and .054 for short (within-session) and long-term (between-session) ICCs, respectively, in regions of interest (ROIs) historically-recruited by the tasks. ICC values in ROIs did not exceed the ‘poor’ cut-off of .4, and in fact rarely exceeded .2 (only 5.9%).... Poor reliability and stability of task-fMRI, particularly in children, diminishes potential utility of fMRI data due to a drastic reduction of effect sizes and, consequently, statistical power for the detection of brain-behavior associations."

What this means is that there was an extremely low level of repetition of effects between one scan of a subject and a later scan of the same subject. As mentioned above, an intraclass correlation of less than .4 or .5 is commonly described as "poor." The very low intraclass correlations reported (only .078 and .054) can be described as extremely poor or appallingly poor.  In the quote below, the authors of the study describe their results as a "particularly disappointing outcome," and wonder what factors contributed to so poor an outcome. We read the following: 
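For concreteness, here is how a test-retest ICC can be computed from two scans per subject. This is a minimal sketch using the common one-way random-effects formula ICC(1,1); the Kennedy paper's exact ICC variant may differ, and the data below are invented for illustration:

```python
def icc_1_1(scan1, scan2):
    """One-way random-effects ICC(1,1) for two measurements per subject."""
    n = len(scan1)
    k = 2  # two sessions per subject
    pairs = list(zip(scan1, scan2))
    grand = sum(scan1 + scan2) / (n * k)
    subj_means = [(a + b) / k for a, b in pairs]
    # Between-subjects mean square
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    # Within-subjects mean square
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, subj_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Measurements that repeat consistently give an ICC near 1;
# inconsistent ones give a low (possibly even negative) ICC.
stable   = icc_1_1([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2])
unstable = icc_1_1([1.0, 2.0, 3.0, 4.0], [3.9, 1.2, 3.1, 1.8])
print(round(stable, 3), round(unstable, 3))
```

Average ICCs of .078 and .054 mean the second scans carried almost no information about which subjects showed which "activations" on the first scans.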

"Our main finding was that within-session reliability and longitudinal stability of individual differences in task-related brain activation was consistently poor for all three ABCD tasks. Data cleaning approaches like outlier removal, movement regression, and rank normalization significantly increased reliability and stability, but by a small, seemingly inconsequential amount (average change of less than .025). While the finding of poor within-session reliability and longitudinal stability in the ABCD task fMRI data did not come as a surprise, given the mounting evidence for generally lackluster reliability of task-fMRI in mostly adult samples (Elliott et al., 2020; Herting et al., 2018; Noble et al., 2021), the present estimates are far below the .397 average reliability of task-fMRI activation estimated in the meta-analysis by Elliott et al. (2020). The question then arises, what factors could contribute to this particularly disappointing outcome? "

These results are what we would expect under the idea that the brain is not the source or cause of human mental activity, and not the storage place of memories.  In such a case we would expect that when scientists claimed some correlation between brain activity and mental activity after brain scanning some subjects, they would almost always be finding mere false alarms that would strongly tend to disappear when a second brain scan was made of the same subjects. 

[Image caption: science illusion]


5 comments:

  1. Hi Mark,

    Enjoyed the article. Off-topic question, if that’s alright: I’ve been studying up on NDE literature and came across this proposed hypothesis https://www.frontiersin.org/articles/10.3389/fnhum.2013.00533/full

    It seems it was proposed in 2013, but after that there’s no follow-up (that I could find), so I was curious if you’ve come across anything about phosphenes and their supposed association with NDEs?

    ReplyDelete
    Replies
    1. Talk about "grasping at straws." The paper tries to suggest "biophotons" as a source of reports of people seeing brilliant light during near death experiences. But people can't see biophotons. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5433113/#:~:text=%5B2%2C3%5D%20Since%20these,observed%20by%20the%20naked%20eye.
      " Since these dynamic metabolic processes are common to most living systems, it is likely all living beings give rise to biophotons. Further, these light emissions are extremely weak and hence cannot be observed by the naked eye." Brains are shut down during all near-death experiences following cardiac arrest, so you can't explain anything occurring during such NDEs as being brain activity. Also, the "light perception" part is only a fraction of what occurs when some "Being of Light" is reported. The perception of such a light is accompanied by reports of dramatic ESP-style sensing of love, acceptance, forgiveness etc. associated with such a "Being of Light." You could never explain that as biophotons.

      Delete
  2. Thanks for the forwarded article, it was an informative read and has given much food for thought. I definitely consider the ‘biophoton’ hypothesis the strangest hypothesis to be advanced thus far.
    On another note, what are your thoughts on the DMT/ketamine hypothesis?
    From what I understand they advance similarities between NDE/DMT etc, but seem to lack an appropriate mechanism for how DMT/ketamine could be the ‘cause’ of near-death experiences.

    ReplyDelete
  3. See my post below on this topic:
    https://futureandcosmos.blogspot.com/2018/08/how-could-mere-microtrace-of-dmt.html

    ReplyDelete
  4. Wonderful article Mark, thanks. It would seem both the ketamine and DMT models rely on a lot of speculation surrounding the purported release of chemicals in the brain (or produced by the brain, as is the case with the DMT model) and their interactions with NMDA receptors etc.
    My preliminary conclusion on the subject of NDE research is that it shares a lot in common with research into the origins of life: both subjects have a wide variety of hypotheses and purported evidence to back them up, but there doesn’t appear to be any consensus, and to some degree any acknowledgement of negative results.

    ReplyDelete