Wednesday, May 4, 2022

EEG Studies Fail to Provide Robust Evidence That Brains Think or Retrieve Memories

To try to provide evidence for their claims that memories are stored in the brain and that brains produce mental phenomena such as thinking and imagination, neuroscientists look for what they call neural correlates of mental activity. A neural correlate of a mental activity would be some sign that a brain acts differently or looks differently when someone does a particular type of mental activity such as thinking or recalling. 

The most common way that a neuroscientist will look for a neural correlate of a mental activity is to image someone's brain while he is doing some mental activity, typically using an fMRI scanner.  In my post "The Brain Shows No Sign of Working Harder During Thinking or Recall," I discussed the failure of such studies to provide robust evidence that brains produce thinking or recall.  The following are tips for analyzing such studies:

(1) Search for the phrase "percent signal change" to quickly find out how much of a difference was found during some mental activity. A large fraction of all fMRI-based neural correlate studies will use such a phrase. 

(2) Find out the sample size used, and whether a sample size calculation was used to determine whether the sample size was adequate. The vast majority of fMRI-based neural correlate studies fail to provide a sample size calculation, and the vast majority of such studies use way-too-small study group sizes, so small that they are not reliable evidence for anything. 

What you will typically find is that such studies show only extremely small changes in brain activity, changes of less than one half of one percent. Such variations of only about one part in 200 or smaller are not robust evidence that brains produce thinking or produce recall. We would expect to get variations of such a size given random moment-to-moment fluctuations in brains, variations that would occur even if a person were not thinking and not recalling anything. And the fact that the vast majority of fMRI-based neural correlate studies use way-too-small study group sizes means that such studies are not robust evidence of anything. As discussed here, a recent large study was announced with a headline of "Brain studies show thousands of participants are needed for accurate results," but a typical fMRI-based neural correlate study will not even use dozens of participants.
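To make the "one part in 200" arithmetic concrete: percent signal change is commonly computed as the task-period signal minus a baseline signal, divided by the baseline, times 100. The minimal sketch below assumes that definition; the signal values are made-up numbers for illustration, not data from any study.

```python
def percent_signal_change(task_signal: float, baseline_signal: float) -> float:
    """Percent change of a task-period signal relative to a baseline signal."""
    if baseline_signal == 0:
        raise ValueError("baseline_signal must be non-zero")
    return 100.0 * (task_signal - baseline_signal) / baseline_signal


def as_one_part_in(percent: float) -> float:
    """Express a percent change as 'one part in N' (0.5% is one part in 200)."""
    return 100.0 / abs(percent)


# Hypothetical values: a baseline of 1000 arbitrary units rising to 1005 during a task.
change = percent_signal_change(1005.0, 1000.0)
print(f"{change:.2f}% signal change, about one part in {as_one_part_in(change):.0f}")
```

A change of 5 units on a baseline of 1000 is exactly the half-percent size discussed above.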

But there is an entirely different way in which neuroscientists can look for neural correlates of memory recall and thinking. Rather than using big fMRI machines to scan the brain, a neuroscientist can hook up brains to electroencephalography machines (EEG machines) that read electrical activity of the brain.  To produce such readings, many different electrodes will be attached to the heads of subjects who are being tested. The output is not an image of the brain, but a reading showing lines that go up and down.  A neuroscientist can study such lines, looking for some neural correlate of thinking or recall that shows up as a difference in a wiggly line. 

Scientists studying such EEG outputs are looking for what they call an event-related potential or ERP.  In theory an ERP is some EEG pattern that might be repeated whenever some mental event occurs such as recognition or recall or concentration. In the literature an ERP is typically described as some blip occurring over less than a second. Figure 5 of the paper here gives us a "heat map" of various claimed ERP effects relating to cognition. The claimed effects have various names listed on the right side of the heat-map, names such as N400 and CDA (standing for contralateral delay activity). 

What typically goes on is a cherry-picking affair.  A neuroscientist will typically use a type of EEG device with 128 electrodes, each of which is attached to a different part of the head.  After the device records neural activity,  there will be 128 different readings, each from a different part of the head.  Each reading will be some long wavy line.  Imagine a paper scroll about three inches high and 100 meters long, with a wavy line stretching from beginning to end, and you'll have a rough idea of the output from any electrode.  Neuroscientists will not typically show us some graph showing the statistical average of all of these lines. Instead, they will be free to choose any group of electrodes, to try to show some correlation effect.  

Imagine you are a neuroscientist. Did you fail to get any correlation effect from averaging the outputs from electrodes 98, 99, 100 and 101? Then you can just keep playing around with electrode combinations until you get something that looks like an effect. For example, maybe you'll get something that looks like an effect if you average the results of electrodes 34, 35, 37 and 38. If the studies were properly designed, using a pre-registration in which an exact methods description was published before data gathering, such dubious "slice and dice until you get a desired result" techniques would not be possible. But we almost never see any such pre-registration in these EEG neural correlate studies. Also, there's no rule against cherry-picking two or three electrodes that are not adjacent.

So, for example, in the study here in Figure 3 we have a diagram showing two graphs of nice-looking ERP effects. The caption tells us the first graph is from electrode 65, and that the second graph is from electrode 91. But Figure 2 shows that 128 electrodes were used, and that electrode 65 is on the other side of the head, nowhere close to electrode 91.  Our authors have apparently cherry-picked the results from 128 electrodes, looking for the results that would best show the desired effect.  
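The effect of this freedom can be illustrated with a small simulation. The sketch below is a hypothetical setup, not a model of any particular study: about 15 subjects, 128 electrodes, and recordings that are pure random noise, so there is no real effect anywhere. It treats each electrode as statistically independent, which is a simplifying assumption (neighboring electrodes on a real head are correlated), and it compares testing one electrode chosen in advance with accepting whichever electrode happens to cross the usual 5% significance threshold.

```python
import math
import random

N_SUBJECTS = 15  # matches the typical group size mentioned in this post
T_CRIT = 2.145   # two-sided 5% critical value of Student's t with N_SUBJECTS - 1 = 14 df


def paired_t(differences):
    """One-sample t statistic testing whether per-subject condition differences average zero."""
    n = len(differences)
    mean = sum(differences) / n
    variance = sum((d - mean) ** 2 for d in differences) / (n - 1)
    return mean / math.sqrt(variance / n)


def simulate(n_studies=400, n_electrodes=128, seed=1):
    """Return false-positive rates for a pre-chosen electrode vs. the best of all electrodes."""
    rng = random.Random(seed)
    preregistered_hits = 0  # only electrode 0 is tested, as if fixed in advance
    cherry_picked_hits = 0  # any electrode crossing the threshold counts as a "finding"
    for _ in range(n_studies):
        # Pure noise: no electrode has any true difference between the two conditions.
        t_values = [
            paired_t([rng.gauss(0.0, 1.0) for _ in range(N_SUBJECTS)])
            for _ in range(n_electrodes)
        ]
        if abs(t_values[0]) > T_CRIT:
            preregistered_hits += 1
        if any(abs(t) > T_CRIT for t in t_values):
            cherry_picked_hits += 1
    return preregistered_hits / n_studies, cherry_picked_hits / n_studies


if __name__ == "__main__":
    prereg, cherry = simulate()
    print(f"false-positive rate, one electrode chosen in advance: {prereg:.3f}")
    print(f"false-positive rate, best of 128 electrodes: {cherry:.3f}")
```

With a single pre-chosen electrode, roughly 5% of the noise-only studies cross the threshold, as expected. When any of 128 independent electrodes may count, the expected rate is 1 - 0.95^128, or about 99.9%, so nearly every simulated study "finds" something.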

A scientific paper about the shortfalls of studies looking for these ERP effects tells us the following:

"An example of this issue is described in a recent paper by Luck and Gaspelin (2017), who demonstrated how 'researcher degrees of freedom' could influence statistical analysis of ERP data. ERP recordings typically employ dozens of electrodes and result in hundreds of time points, which results in an almost unlimited variety of possible data analysis approaches, and, consequently, in the probability of a false significant finding approaching certainty."

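To give a sense of scale for that "almost unlimited variety," the sketch below counts just two of the choices an analyst could make: which group of up to four electrodes to average, and which contiguous time window to measure. The 128 electrodes come from the setup described earlier; the 300 time points (for example, 600 ms sampled at 500 Hz) and the cap of four electrodes per group are assumptions for illustration. Filtering, baseline, and reference choices would multiply the count further.

```python
from math import comb

N_ELECTRODES = 128  # electrode count of the EEG setup described above


def electrode_groups(max_group_size: int) -> int:
    """Count the distinct groups of 1..max_group_size electrodes that could be averaged."""
    return sum(comb(N_ELECTRODES, k) for k in range(1, max_group_size + 1))


def time_windows(n_time_points: int) -> int:
    """Count the contiguous time windows (at least one sample long) in n_time_points samples."""
    return comb(n_time_points + 1, 2)  # pick a start boundary and a later end boundary


groups = electrode_groups(4)    # assumption: groups of at most four electrodes
windows = time_windows(300)     # assumption: 300 samples, e.g. 600 ms at 500 Hz
print(f"{groups:,} electrode groups x {windows:,} time windows = {groups * windows:,} analyses")
```

Even under these narrow assumptions the count runs to hundreds of billions of possible electrode-and-window combinations.
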
Many such studies have been done, but they have failed to produce any robust evidence that human brains produce memory recall or thinking. Let us look at some of these studies, and the results that have been claimed. I will use the "heat map" in Figure 5 of the paper here to select the best-reported claimed ERP effects for cognitive activity. According to that figure, the best-reported ERP effects relating to cognition are:

(1) a CDA (contralateral delay activity) effect having something to do with memory;

(2) an FN400/N400 effect (also called an "ERP old/new effect") having something to do with recognition;

(3) an N170 effect having something to do with categorization;

(4) a P100 effect (also called a P1 effect) having something to do with attention.

It is claimed that an "ERP old/new effect" (apparently the same as or similar to an FN400/N400 effect) is some EEG sign of recognition. Looking at the papers attempting to show this effect, we see nothing that looks very impressive. The claim is that when you have people look at a list of words that includes words they were asked to memorize and words they were not asked to memorize, some type of brain wave looks slightly different, for only about a fifth of a second, when that wave is read from the parietal region of the brain.

No robust evidence has been provided for such an effect, because the study group sizes used in the studies claiming such an effect are too small. Even if such a fraction-of-a-second effect were observed, it could be explained without assuming that a memory has been retrieved from the brain. When somebody recognizes something, there can be a kind of "aha" effect in which muscular responses differ very slightly. For example, after recognizing a face in a crowd, a person's facial expression can be different from his expression when encountering a stranger, with the difference lasting only an instant. Such a difference could easily explain some marginal fraction-of-a-second difference showing up in a reading of brain waves.

In one paper I read claiming to get this fraction of a second "ERP old/new effect," the instructions were for subjects to click an "Old" button if they recognized a word, and a "New" button if they did not. The instructions stated that the "Old" button should only be clicked if the subject was sure he had seen the word before. With such instructions, there easily could be a kind of momentary pausing effect when people thought they recognized a word, during which they were wondering whether they were sure about seeing the word before.  Such a muscular pausing could be the cause of this fleeting "ERP old/new effect," with the effect having nothing to do with a difference in brain activity during recognition. 

This "ERP old/new effect" is apparently the same as, or related to, something called the N400 response. A paper described it like this:

"The N400 is a negative-going wave peaking at about 400 ms, whose amplitude is larger after presentation of a stimulus whose probability of occurrence is low within its semantic context (Kutas & Federmeier, 2011). For example 'He spread the warm bread with socks' would elicit a larger N400 than 'He spread the warm bread with butter' (Kutas & Hillyard, 1980)."

This is another alleged neural correlate of cognitive activity that can easily be explained purely by muscle activity having nothing to do with the mind. A person presented with some crazy sentence may have a different muscle response, perhaps a look of bemusement or a kind of "huh?" look on his face. Since the reported N400 response involves only a fraction-of-a-second difference, we can't tell whether the EEG is picking up evidence of brains thinking, or merely evidence of a slightly different muscle response.

A meta-analysis of studies about this claimed N400 response tells us that the average number of subjects used is only about 15. Is such a sample size large enough? It is not, judging from the paper here. That paper is devoted to estimating how large a study group size would be needed to detect a particular ERP effect, one similar to the claimed N400 response and the claimed "ERP old/new effect." The paper tells us that to get a fairly good 80% statistical power would require at least "30–50 clean trials with a sample of 25 subjects."
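For readers who want to see how such sample-size estimates work in general, here is a minimal sketch using the textbook normal approximation for a two-sided paired (within-subject) comparison: n ≈ ((z_(1-α/2) + z_(1-β)) / d)², where d is the standardized effect size. This is a generic approximation, not necessarily the method used in the paper cited above, it is slightly optimistic for small samples, and the effect sizes below are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def required_subjects(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed for a two-sided paired test (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size_d) ** 2)


def approximate_power(effect_size_d: float, n_subjects: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired test (ignores the negligible opposite tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size_d * math.sqrt(n_subjects) - z_alpha)


for d in (0.8, 0.5, 0.3):  # hypothetical standardized effect sizes
    print(f"d = {d}: about {required_subjects(d)} subjects for 80% power; "
          f"power with 15 subjects is about {approximate_power(d, 15):.2f}")
```

Under this approximation, a medium-sized effect (d = 0.5) needs about 32 subjects for 80% power, and a 15-subject study would detect it only about half the time.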

There's another claimed ERP effect called contralateral delay activity, or CDA. The effect is claimed to occur as a fraction-of-a-second blip when people are shown screens with colored circles or colored squares, and are asked to identify whether a later screen matches the previous screen. Figure 1 of the paper here shows the type of screens used. The visual below shows such screens, and how long each was displayed.

After taking EEG recordings of people's brain waves during such an activity, scientists have claimed that a distinctive blip shows up (lasting only a fraction of a second), something they call contralateral delay activity or CDA. It has been claimed that such an effect is a correlate of working memory. But since the alleged effect is extremely short-lived, it provides no evidence that brains store memories. What is showing up could simply be related to vision, or to some color persistence effect by which a perceived color will hang around in the mind or brain for a second or two.

It is well known that there is something called an "afterimage," in which you can see something after you have stopped looking at it. For example, the web page here has a strangely colored photo of Amy Whitehouse. Look at the dot at the center of the photo for 30 seconds, and then look at the blank white area to the right of the photo. You will then see a ghostly afterimage of Amy Whitehouse. Whatever that type of effect is, it isn't memory. It's just a "lingering of perception" thing. The claimed CDA effect may merely be picking up that type of short-term effect, not something related to brain storage or retrieval of memories.

The N170 effect is an ERP effect supposedly produced when someone is shown a picture of a face. Referring to a mere fraction of a second, the wikipedia.org article on the effect claims that this alleged effect lasts only "130-200 msec after stimulus presentation." Figure 1 of the paper here has a diagram similar to the schematic diagram below, with the black line representing the response from seeing faces, and the gray line representing the response from seeing objects that are not faces.


This meta-analysis tells us that most of the faces used in studies of the N170 effect have been emotional faces. The faces shown usually had expressions such as fear, disgust or joy. You can easily explain the fraction-of-a-second blip without imagining that viewing faces involves some recognition activity by the brain: all that is being picked up may be a slight physiological response to an emotional stimulus. Studies of the N170 effect do not rule out a scenario like this:

(1) You see a face with an emotional expression, and your mind or soul (not your brain) recognizes the emotion. 

(2) Seeing emotion on someone's face produces a slight physiological response, which shows up as a fleeting blip in brain waves. 

The P100 effect (also called a P1 effect) is also some claimed small-fraction-of-a-second effect supposedly occurring for about 50 milliseconds when a person engages in visual selective attention, such as looking at only the left part of a screen. Eye muscles behave differently when you focus on only one side of a screen. Since such an instantaneous effect can easily be explained in terms of muscle activity involving the eyes, it provides no good evidence that brains are producing mental attention.  

Nothing we have discussed provides any good evidence that brains produce thinking, that brains store memories, or that brains retrieve memories. What kind of test can we imagine that would be a good test of such claims? The test might go something like this:

(1) Subjects wearing EEG electrodes on their heads would be asked to look at photos displayed on a computer screen, with each photo shown for five seconds. Most of the photos would be photos of people who were not famous and could not be recognized. One third of the photos would be photos of famous people with neutral expressions, none of whom were scary or threatening. A computer program would ensure a random shuffling of the photos.

(2) Subjects would be asked to remain motionless and expressionless. Subjects would be told to simply say in their mind (without speech)  "Go" if they recognized the face, and "No" if they did not. 

(3) Attempts would be made, by reading brain waves, to find any difference between the brain waves recorded during the perception of recognized faces and those recorded during the perception of faces that were not recognized.

Such a test would fail. No robust evidence would be found for a neural correlate of recognition. 
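The randomized presentation in step (1) of the proposed test is easy to automate. The minimal sketch below builds a shuffled presentation list in which one third of the photos are famous faces and the rest are unfamiliar faces, each shown for five seconds. The file names and the total of 60 photos are hypothetical placeholders; a real experiment would also need display and EEG-synchronization code, which is not shown.

```python
import random

DISPLAY_SECONDS = 5  # each photo is shown for five seconds


def build_schedule(famous, unfamiliar, seed=None):
    """Return a shuffled list of (onset_seconds, photo, category) tuples."""
    trials = [(photo, "famous") for photo in famous]
    trials += [(photo, "unfamiliar") for photo in unfamiliar]
    random.Random(seed).shuffle(trials)  # random order, reproducible when a seed is given
    return [(i * DISPLAY_SECONDS, photo, category) for i, (photo, category) in enumerate(trials)]


# Hypothetical stimulus sets: 20 famous faces and 40 unfamiliar faces (one third famous).
famous_photos = [f"famous_{i:02d}.jpg" for i in range(20)]
unfamiliar_photos = [f"unfamiliar_{i:02d}.jpg" for i in range(40)]

schedule = build_schedule(famous_photos, unfamiliar_photos, seed=42)
for onset, photo, category in schedule[:5]:
    print(f"{onset:>4} s  {photo:<20} {category}")
```

Keeping the category alongside each onset time means the later brain-wave comparison can be made without anyone knowing the order in advance.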

I used the "heat map" in Figure 5 of the paper here to select the best-reported claimed ERP effects for cognitive activity. It is interesting what is not reported in that heat map. According to the map it seems:

(1) There are no strong ERP/EEG effects for learning. 

(2) There are no strong ERP/EEG effects for decision making. 

(3) There are no strong ERP/EEG effects for prediction. 

(4) There are no strong ERP/EEG effects for executive function. 

(5) There are no strong ERP/EEG effects for perception.

(6) There are no strong ERP/EEG effects for speech.

Overall, EEG studies fail to provide robust evidence that thinking or decisions or memory retrieval or memory storage occurs because of the brain. The shape-seeking scientists eagerly looking for these slight, fleeting blip effects in EEG lines can be compared to people eagerly scanning the clouds looking for shapes that resemble animals, to back up some belief that the ghosts of dead animals live in the sky. 

The sample sizes used in these EEG/ERP studies are generally way too small to provide robust evidence for a real effect. The headline of a news release about an important recent study is "Brain studies show thousands of participants are needed for accurate results." But these EEG/ERP studies typically involve only about 15 subjects per experiment. A huge defect calling into question the reliability of all such studies is that the researcher is free to scan the results from 128 electrodes, cherry-pick the output from whichever few electrodes best show the sub-second effect being eagerly sought, and then do additional cherry-picking by looking for the one-second slice of time in which the effect shows most strongly. This is a recipe for "conjuring phantoms." Given such complete freedom to scan data looking for some fleeting blip in wavy lines, it is easy to find almost any imaginary effect you might be hoping to find. In general, the fleeting ERP blips that are found can be explained as brain involvement in muscle activity and physiological activity, without postulating that brains are the source of thinking and memory.
