Tuesday, August 19, 2025

Misleading Tricks of Those Claiming to Decode "Inner Speech"

You can tell when a person is engaging in muscle activity by analyzing the squiggly lines of EEG readings obtained when someone wears on his head a device containing electrodes. Muscle movements of every type (including speech) produce deviations or disturbances in the wavy lines produced by EEG devices picking up brain waves. Because different types of visual images may produce different types of muscle movements, it may be possible to predict above chance which of three photos a person is shown. Different photos may produce muscle movements of different types and different durations. So a computer program analyzing the squiggly lines of EEG readings may score above chance, by considering blips in those readings that have different characteristics when different types of photos are shown. Such an ability is no evidence that brains produce minds, but merely evidence that different visual stimuli may produce different types of reactive muscle movements. 
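
To make this concrete, here is a minimal simulation sketch (my own toy Python code with made-up numbers, not data or code from any real experiment). It assumes only that different photo types evoke reactive muscle bursts of different average strength; a classifier fed nothing but high-frequency signal power then scores above chance, with no "mind reading" occurring anywhere.

# Toy simulation (not real EEG data): if different photo categories evoke
# muscle (EMG) bursts of different strength, a classifier fed only
# high-frequency "EEG" power can score above chance without reading any thought.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
burst_strength = {0: 0.0, 1: 0.5, 2: 1.0}  # assumed EMG contamination per photo type

X, y = [], []
for photo, strength in burst_strength.items():
    for _ in range(100):  # 100 trials per photo type
        baseline = rng.normal(1.0, 0.3)             # baseline high-frequency power
        emg = strength * abs(rng.normal(1.0, 0.3))  # muscle-artifact power
        X.append([baseline + emg])
        y.append(photo)

scores = cross_val_score(LogisticRegression(max_iter=1000), np.array(X), np.array(y), cv=5)
print(f"Three-way accuracy from muscle artifacts alone: {scores.mean():.2f} (chance = 0.33)")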

There is no brain-related technology that allows any person or computer program to figure out what a person is thinking by looking at MRI scans of a brain or EEG electrode readings of brain waves. But there are various tricks and cheats that can be used by someone trying to persuade you that he has decoded a person's thoughts or "inner speech" by analyzing brain states or brain waves. Below are some of these cheats and tricks. 

Trick #1: Leveraging failures to follow fast-paced, hard-to-follow instructions. I have noticed this sleazy trick in some neuroscience papers. It is the trick of doing an experiment that requires an experimental subject to very rapidly switch between speaking a word and merely thinking of a word. So, for example, there may be a computer program that flashes instructions like this, with each instruction appearing on the screen for the time shown below:

Say "hippopotamus" (3 seconds)

Pause (2 seconds)

Say "asparagus" (3 seconds)

Pause (2 seconds)

Think "perfect"  (2 seconds)

Pause (2 seconds)

Say "principle" (3 seconds)

Pause (2 seconds)

Say "asparagus" (3 seconds)

Pause (2 seconds)

Think "inventiveness"  (3 seconds)

Pause (2 seconds)

When instructions like this appear on a computer screen at a very fast pace, with rapid switches between the types of instruction, there is a good chance that a subject will sometimes fail to follow the instructions exactly. So during some percentage of the time that the subject was supposed to be only thinking of a word, the subject may be speaking the word, in audible speech or all-but-silent speech or silent speech involving lip movement. This may allow a neuroscientist to brag about "above chance" results during intervals when supposed "inner speech" occurred. What is going on is that the instructions almost seem designed to produce a fair amount of audible speech, all-but-silent speech, or silent speech involving lip movement during intervals when subjects were supposed to be engaging in only mouth-motionless "inner speech."  
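
A toy simulation makes the arithmetic of this trick plain. In the sketch below (my own illustration, with an assumed spillover rate, not anything taken from a paper), a "decoder" that can read nothing but muscle signatures still scores far above chance on "inner speech" trials, simply because hurried subjects sometimes mouth the word they were told only to think.

# Toy model of Trick #1 (my illustration, not any paper's method): assume that
# on some fraction of "think the word" trials the hurried subject actually
# mouths the word, leaving a readable muscle signature. A decoder that reads
# ONLY muscle signatures then scores above chance on "inner speech" trials.
import random

random.seed(1)
words = ["hippopotamus", "asparagus", "perfect", "principle", "inventiveness"]
spillover_rate = 0.3  # assumed fraction of "think" trials with actual mouthing

def decode(trial_word, mouthed):
    # Reads the muscle signature if present; otherwise guesses at random.
    return trial_word if mouthed else random.choice(words)

n, hits = 10_000, 0
for _ in range(n):
    word = random.choice(words)
    mouthed = random.random() < spillover_rate
    hits += decode(word, mouthed) == word

print(f"'Inner speech' accuracy: {hits/n:.2f} vs. chance {1/len(words):.2f}")
# roughly 0.44 vs. 0.20 chance, purely from trials where thinking spilled into mouthing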

Trick #2: Failing to prevent mouth movement during intervals supposed to be "inner speech."  There are simple ways to prevent or minimize muscle movement of the mouth during testing intervals that are supposed to be thought-only "inner speech."  One way is to have a test subject wear something in his mouth designed to prevent any movement of the lips or tongue, with the subject wearing such a device during any test interval in which he is supposed to be engaging in speechless "inner speech."  Another way is to make use of some specialized motion detector that will sound an alarm whenever the subject moves his lips or tongue. No such devices are used by neuroscientists doing experiments claiming to decode "inner speech." So whenever they claim that something involved only "inner speech," we should distrust such claims, and suspect that there was a lot of actual speech or muscle movement (audible or not) going on during the recorded periods of supposed "inner speech."  

Trick #3: The word length cheat.  I have noticed this sleazy trick in some neuroscience papers. It is the cheat of doing an experiment that attempts to predict which of a small set of words a person is thinking about, while leveraging the fact that some of the words are longer than others. So, for example, in some quick-paced instructions appearing on a screen, a subject may be asked to think (without speaking) one of these words: dog, chameleon, apple, hippopotamus, triangle. If the pace is fast enough, with enough tricky switches between "say this" and "think this,"  some little traces of muscle movement may show up in the EEG readings, even during intervals when the subject is supposed to be only thinking; and from the duration of such muscle movement it may be rather easy to predict which word the subject was asked to think of.
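
Here is a toy sketch of how far duration alone can go (my own code, with assumed per-syllable timings, nothing from any paper). A "decoder" that knows nothing except how long a faint muscle trace lasted identifies the word most of the time.

# Toy illustration of Trick #3: if a faint muscle trace lasts roughly as long
# as the word would take to say, the word can often be identified from trace
# DURATION alone.
import random

random.seed(2)
words = {"dog": 1, "apple": 2, "triangle": 3, "chameleon": 4, "hippopotamus": 5}  # rough syllable counts

def trace_duration(syllables):
    # assumed ~0.2 s of residual mouthing per syllable, plus jitter
    return 0.2 * syllables + random.gauss(0, 0.08)

def decode_by_duration(duration):
    # pick the word whose expected duration is closest to the observed trace
    return min(words, key=lambda w: abs(0.2 * words[w] - duration))

n, hits = 10_000, 0
for _ in range(n):
    word = random.choice(list(words))
    hits += decode_by_duration(trace_duration(words[word])) == word

print(f"Accuracy from duration alone: {hits/n:.2f} (chance = {1/len(words):.2f})")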

Trick #4: No exact specification of the experimental procedure. This is a very bad defect of most papers claiming to decode inner speech from brain scans or EEG readings. Such papers will typically offer some sketchy outline of the experiment that went on, without specifying the exact procedure. The rule of thumb we should follow is: regard as worthless any paper claiming successful experimental results that fails to specify the exact procedure that subjects underwent, in enough detail for someone to attempt a replication of the reported results. 

Trick #5: Cherry-picking best results. Using multiple subjects and many different electrodes reading from different regions of the brain, a researcher can cherry-pick a best result from the many results (a result that might easily be obtainable by pure chance), and then try to give the impression that such a result was a typical result. Something similar would be going on if you had 20 people try to guess 50 five-digit numbers, and then had some graph heading bragging about "60% accuracy," with the fine print revealing that this was for guess target number 35 and guesser number 17 (when the target was 44392 and the guesser guessed 44291, matching three of the five digits). 
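
The number-guessing analogy is easy to simulate (toy code of my own, mirroring the example above). With 20 guessers and 50 targets you have 1,000 cells to cherry-pick from, and pure chance reliably supplies at least one cell with a quotable "60% accuracy" or better.

# Toy illustration of Trick #5: with enough subject/electrode (or
# guesser/target) combinations, pure chance hands you at least one cell with
# an impressive-looking score, which can then be showcased as if typical.
import random

random.seed(3)

def digit_accuracy(target, guess):
    # fraction of the five digits guessed in the correct position
    return sum(t == g for t, g in zip(target, guess)) / 5

best = 0.0
for _guesser in range(20):
    for _target in range(50):
        target = [random.randint(0, 9) for _ in range(5)]
        guess = [random.randint(0, 9) for _ in range(5)]
        best = max(best, digit_accuracy(target, guess))

print(f"Best single cell out of 1000: {best:.0%}")  # almost always 60%+ by luck alone
print("True per-digit accuracy of any ONE cell: 10%")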

Trick #6: Leveraging data backdoors in a sneaky way. This trick goes on when some researcher claims to have gotten an impressive result "from brain scans" or "EEG readings" when brain scans or EEG readings were only part of the inputs used, with the success mainly coming from some data backdoor. An example is when researchers have subjects look at images obtained from the COCO image dataset. That dataset includes text annotations corresponding to each of the images; for example, a picture of an apple may be labeled as "apple" or "fruit."  So a computer program analyzing EEG readings while test subjects saw particular images can find out words corresponding to the observed image, by using the data backdoor of the text annotation corresponding to each image. With a little obfuscation and "muddying the waters," a success so unimpressive might be passed off as "mind reading," even though what is powering the success is 98% simply looking up the text annotations corresponding to the images, a feat no more impressive than looking up the definition of a word. 
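
A sketch of how such a backdoor works (hypothetical code of my own; the image IDs and labels below are made up in the style of COCO annotations, and this is not any researcher's actual pipeline):

# Hypothetical sketch of Trick #6: a "brain decoder" whose inputs quietly
# include the image's identity, so its "decoded word" is really just a lookup
# of the dataset's own text annotation.

annotations = {  # made-up image-ID -> label mapping in the COCO style
    "img_00017": "apple",
    "img_00042": "dog",
    "img_00108": "bicycle",
}

def sham_decoder(eeg_signal, image_id):
    # eeg_signal is ignored; the "decoding" is powered by the data backdoor
    return annotations[image_id]

print(sham_decoder(eeg_signal=None, image_id="img_00042"))  # -> "dog"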

Trick #7: Leveraging sound inputs. Some people with speech problems have the ability to produce sounds when trying to speak, sounds that an average person is unable to understand. This may sound like someone trying to speak with his mouth filled with food. Some scientist may connect such a person to some EEG device, either an invasive one (involving brain-implanted electrodes) or a noninvasive one. Some computer program may then train on the person's speech while he is reading or trying to read something. The computer may get a good idea of the correlations between sounds that a human listener cannot understand and the words that the person is attempting to speak.  Then the computer program may report success at "decoding" something that may be called "inner speech" or "brain states" or "brain outputs," even though the success is coming mainly from sound inputs rather than brain states. The effort may be wrongly called "brain-to-text" or a "decoding of brain speech," although such terms are inappropriate under such circumstances. 

Trick #8:  Leveraging phoneme or attempted-phoneme EEG correlates. I noted before that muscle movements of every type (including speech) produce deviations or disturbances in the wavy lines produced by EEG devices picking up brain waves. There may be particular EEG correlates for particular phonemes or attempted phonemes that a person may make. So when someone makes the sound at the beginning of "achoo" and "apple," that may tend to produce a particular type of EEG blip; and when someone makes the sound in the middle of the words "cheese" and "sneeze," that may tend to produce some other type of EEG blip. So if you have a computer program that is trained to recognize such characteristic EEG blips (by training while someone connected to an EEG device tries to read some long body of text), that program may gain some ability to pick up lots of what a person is saying from his EEG brain wave readings. This may be described as "brain reading," although it is more accurately described as muscle movement EEG correlation reading. A program trained to recognize particular types of EEG correlates of phoneme pronunciation or attempted phoneme pronunciation may use some fancy AI "fill in the blanks" algorithm (possibly involving frequentist word-guessing or syllable-guessing or phoneme-guessing) to enhance some limited success it has at picking up EEG correlates of attempted syllable pronunciations. None of what I describe in this paragraph is correctly described as "decoding inner speech," although it may be described as that. Such a description is particularly likely under some fast-paced hurry-up methodology in which a good deal of actual speech or attempted speech is occurring during two-second intervals in which someone is supposed to be only thinking of a word, because of a study design that almost guarantees a large amount of this spillover "talking or trying to talk when you were supposed to only think."
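
To illustrate just the "fill in the blanks" step, here is a toy sketch (my own code, with a made-up frequency table; real systems are far more elaborate, but the principle is the same). A partial string recovered from muscle correlates gets completed by dictionary frequency, which is guesswork about language statistics, not a reading of the brain.

# Toy sketch of Trick #8's "fill in the blanks" step: suppose EEG muscle
# correlates yield only a partial string with gaps ("_"). A frequency-weighted
# dictionary match can then inflate apparent accuracy.

# hypothetical word-frequency table (counts per million, made up)
freq = {"cheese": 30, "choose": 45, "chase": 20, "cheer": 25}

def fill_in(partial):
    # keep candidates consistent with the recovered letters, then pick
    # the most frequent one -- guesswork, not brain reading
    candidates = [w for w in freq
                  if len(w) == len(partial)
                  and all(p in ("_", c) for p, c in zip(partial, w))]
    return max(candidates, key=lambda w: freq[w]) if candidates else None

print(fill_in("ch__se"))  # -> "choose" (beats "cheese" on the assumed frequencies)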

Trick #9: The "as high as X percent accurate" trick. This trick is as old as the hills. You slice and dice the prediction results into something like 100 different portions, and pick the portion with the highest predictive accuracy. You then say something like "my method is up to 75% accurate," mentioning the accuracy of the most successful little portion, rather than the overall results. 
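
A few lines of toy code (my own illustration) show how cheaply such a boast can be manufactured from pure coin-flipping:

# Toy sketch of Trick #9: coin-flip "predictions" (true accuracy 50%) sliced
# into 100 blocks; the best block is then quotable as "up to X% accurate."
import random

random.seed(4)
results = [random.random() < 0.5 for _ in range(2000)]  # pure chance
blocks = [results[i:i + 20] for i in range(0, 2000, 20)]  # 100 blocks of 20
best = max(sum(b) / len(b) for b in blocks)

print(f"Overall accuracy: {sum(results)/len(results):.0%}")  # about 50%
print(f"'Up to' accuracy of best block: {best:.0%}")         # typically 70% or more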

Trick #10: Leveraging AI and large language models. An AI system that has trained on very many web pages and online books may be able to fill in lots of blanks in sentences, using guesswork based on word frequencies and the frequency of words used in a particular type of sentence or sentence fragment. So for example, if you have a fragment of a sentence such as "I'm hungry so __ ____ __ ____ ______," the AI system might be able to predict "I'm going to make some food" or some similar phrase as the missing part. Leveraging such AI systems, an experiment might produce some success level at "decoding inner speech" much higher than it would get without using such an AI system, particularly if some experiment uses carefully chosen test sentences of a type that allow an AI system to predict the full sentence from only half of the sentence.  
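
Below is a minimal sketch of that kind of completion (my own toy trigram model trained on a tiny made-up corpus; a real large language model is vastly bigger, but the principle is the same: the "decoded" ending comes from text statistics, not from anyone's brain).

# Toy frequency-based sentence completion: the ending is predicted from
# word statistics alone, with no brain signal anywhere in the loop.
from collections import Counter, defaultdict

corpus = (
    "i'm hungry so i'm going to make some food . "
    "i'm hungry so i'm going to grab a snack . "
).split()

model = defaultdict(Counter)  # (word, word) -> next-word counts
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    model[(a, b)][c] += 1

def complete(words, max_new=8):
    out = list(words)
    for _ in range(max_new):
        options = model[tuple(out[-2:])].most_common(1)
        if not options or options[0][0] == ".":
            break
        out.append(options[0][0])
    return " ".join(out)

print(complete(["i'm", "hungry", "so"]))
# -> "i'm hungry so i'm going to make some food", from word frequencies alone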

The latest example of a misspeaking neuroscience paper boasting about decoding inner speech is the paper "Inner speech in motor cortex and implications for speech neuroprostheses," which you can read here.  We get in the paper various boast soundbites that are not backed up by anything reported in the paper. The paper starts out by making the false claim that "Attempted, inner, and perceived speech have a shared representation in motor cortex." Speech is not represented in the cortex or any other part of the brain. The beginning of the paper contains quite a few untrue statements about the previous results of researchers, statements that are untrue because of various defects in the results published by such researchers. 

Many of today's neuroscientists misspeak like crazy when they use the words "represent," "representations," "decode" and "decoding."  Misstatements by neuroscientists using these words are extremely abundant. As a general rule you should never trust a neuroscientist using the words "represent," "representations," "decode" and "decoding." When it comes to "representations," neuroscientists are often guilty of very bad pareidolia and noise-mining, which involves a kind of seeing of things that are not really there. Nowadays it is easy for a scientist to kind of see things that are not there, by using "keep torturing the data until it confesses" tactics that often involve shady manipulations of data by dubious custom-written computer programs. We also should have a default distrust of any statement made by a neuroscientist about a decoding percentage accuracy. Such statements are typically extremely dubious, involving very dubious or easy-to-discredit calculation methods, or claims in which no calculation method is ever adequately specified. Often in a paper some impressive "decoding accuracy" figure is stated, but never justified. 

Our first reason for distrusting the "Inner speech in motor cortex and implications for speech neuroprostheses" paper comes when we read that it involved only four subjects. As a general rule, correlation-seeking neuroscience experiments have no value unless they use a study group size of at least 15 or 20 subjects; and usually the required study group size is much larger. 

Another strong reason for distrusting the "Inner speech in motor cortex and implications for speech neuroprostheses" paper comes when we consider the endangerment-of-the-sickest procedure that its researchers engaged in.  The study involves invasively inserting microelectrodes into the brains of four very sick patients.  This was not done for any medical benefit for these patients.  The very sick patients had diseases such as the muscle-wasting disease ALS, sometimes called Lou Gehrig's disease. The insertion of microelectrodes into brains involves very serious medical hazards, and when used on very sick patients it may worsen their difficulties. In this case the very sick patients were used as "experimental guinea pigs," without any medical benefits coming to them from the medical risks they were enduring. 

Whenever such shady business is going on, we should all-the-more tend to distrust any statements made by the people engaging in the shady business. We should nowhere be giving "the benefit of the doubt" when such researchers make grand boasts, but demand the clearest evidence that such boasts are justified. 

In the case of the paper "Inner speech in motor cortex and implications for speech neuroprostheses" no such clear evidence is given. The paper fails to give any very exact specification of the experimental procedures it followed. But from its Supplemental Information document we should have the strongest suspicion that some of the tricks listed above were used.  

When subjects were asked to produce "inner speech," they were given instructions that seem designed to produce muscle movement rather than pure thought. According to Table S1, the instructions were these:

  • "Imagine mouthing the word. Focus on what your mouth, tongue, lips, jaw and throat would be doing and how they would feel."
  • "Imagine uttering the word aloud. Focus on the sound you would be producing."
  •  "Imagine hearing me (or someone’s voice you know well) say the word, focus on the sound of my (their) voice."

The same table tells us that instructions such as these were alternated with instructions like these:

  • "Say the word aloud (to the best of your ability.'
  • "Mouth the word as if you were mouthing to someone across a room, without sound."
How fast were these instructions switched? We cannot tell exactly, because the paper authors have failed to describe their exact test procedure in a way that would allow anyone to reproduce it exactly.  But from Table S4 in the Supplemental Information, we have every reason to suspect that the authors were guilty of Trick #1 described above. We have some table suggesting that very fast, rapidly switching time intervals were used. The table makes it sound as if the subjects were required to do some super-hurried affair in which they had to very rapidly switch between "speak the word" instructions and "think the word" instructions. 

Now let us look at some of the unwarranted and dubious statements made in the paper:

(1) The caption of Figure 1F refers to a "T16-i6v Decoding Accuracy of 92.1%."  This gives an impression of high accuracy, until you figure out that this is referring to only a single subject (subject T16) and a single electrode location (corresponding to the name i6v). The figure seems to have been cherry-picked from Figure 1E, which shows a grid of 63 percentages ranging from 11 to 97.9. We may note how misleading this is. A casual viewer of the paper, looking at the figures, may get the idea that some high decoding accuracy was achieved, when no such thing occurred.  Something as shady as this should deepen our distrust of this paper. We have no decent explanation of how these numbers in Figure 1E were obtained, and the whole grid should be regarded with suspicion. What little explanation is given (some mention of a "Gaussian naive Bayes" classifier with a 500-millisecond window) is something that does not inspire confidence. Figure 1D graphs a suspiciously hurried-up affair that seems to involve something like Trick #1 described above. 
(2) The careful critical reader of the paper will tend to suspect that what is going on is noise-mining and cherry-picking from electrode data corresponding to many different reading locations in the brain. Each of the four patients had multiple electrodes inserted into their brains. As noted above, the "92.1%" figure in Figure 1F refers to only a single subject and a single electrode location; it is not at all the average accuracy of decoding attempts using that subject. Do we have here any reason for thinking that the results are better than chance, when you consider the results from all of each patient's electrodes?  There seems to be no such reason. 

Each of the four subjects had about 6 electrode arrays in their brains. So with 24 or more possible areas to check, it is hardly surprising that some researcher might be able to report a relatively high "decoding accuracy" involving one of those areas and one of these subjects. Similarly, if I ask 24 people to pick the score and teams of the next Super Bowl, I will probably have one that I can claim as having a high predictive accuracy, even if mere chance is involved. 

We also have some insinuations in the paper "Inner speech in motor cortex and implications for speech neuroprostheses" that some relatively high accuracy was achieved in experiments involving a 125,000-word vocabulary. None of these claims should be trusted, because the procedure involved is not described in adequate detail.  We have a link to a video showing a woman (subject T16) seeing a computer screen that displays some text. The video says, "In this task the target sentence appears at the top of the screen, and the inner speech BCI [brain computer interface] is shown below, generated in real time."  First, the computer displays the sentence "That's probably a good idea." Then we see below it a line slowly appearing: "That's probably a good idea."

We should treat with the greatest skepticism any claim that this is a "decoding" of what the very sick subject was thinking. Some computer program already knew the target sentence. We don't know what tricks are going on for the computer program to go from this known target to a supposed "decoding" matching the target, because the testing procedure and programming is nowhere decently described in the paper or its Supplemental Information. Were the sentences randomly selected from some very large set, such as a group of 100,000 sentences? Or were the sentences only a very limited number of sentences that some AI program had trained on, which would tend to create a vastly higher chance of success? We don't know, because the authors haven't explained their method decently. We have no idea of what kind of tricks and cheats may have helped produce this impressive-looking result.  Part of what is going on seems to be AI prediction based on phrase frequencies in sentences starting a particular way. An AI system can predict "a good idea" as one of the most likely endings of a sentence beginning "That's probably..."

Seeing the video, you might assume that there was some "Chinese wall" affair in which one part of the software knew that the target sentence was "That's probably a good idea," and some other part of the software (a decoding part) did not know that this sentence was the target, and figured out the target from brain waves. But you should not make any such assumption, because no such claim is made explicitly in the paper; and what was going on when you see that video clip is never adequately explained. The paper authors have given us reasons for distrusting their work, and our default attitude should be distrust, rather than granting the generous assumptions the authors are trying to suggest. 

The video is attempting to give us the impression that some randomly generated sentence (created from a vocabulary list of 125,000 words) is being decoded by brain signal analysis. But nowhere in the text do we actually have a claim that any of the sentences were randomly generated from such a vocabulary list; and nowhere in the text do we have an assertion that the sentence was randomly chosen from a very large set of sentences, such as a set of 100,000 sentences.  For all we know there may have been only a very small number of sentences, each of which was previously given to the subjects. So the impressive-looking "decoding" might actually be something a thousand times less impressive, something easily obtainable by a few statistical or programming tricks, even if it is utterly impossible to decode what word someone is thinking of by gathering EEG signals from someone whose mouth is immobile.  

Referring to the subject T16 shown in their little video clip, the paper says, "T16 had online retraining only for the 125,000-word vocabulary evaluation blocks, in which the cued sentences were used as ground truth to retrain the model, but only after those sentences had been decoded online." Although obscure, that sentence should be enough to make us suspect that the video involving subject T16 is just some smoke-and-mirrors affair, not any real decoding of what someone was thinking from the person's brain states or brain waves. 

As the paper lacks adequate documentation of what was going on, we should have no confidence in the results.  The authors of the paper help create a fog of mystery about what they did by having the paper document about five different experiments, none of which is clearly and consistently named, and none of which is very well documented in regard to the exact procedure followed. This is not how to do a persuasive experiment showing an ability to decode "inner speech" from brain waves. The right way would be to do a single experiment in which everything that went on is so well documented that someone else might be able to reproduce the result.  

Whenever a completely silent person's lips and tongue are motionless, and he is not moving any of his muscles, it is impossible to decode what he is thinking or imagining (or a sentence he is trying to speak) using only brain scans produced by MRI machines or the brain waves picked up by EEG readings. But using a variety of misleading tricks such as the ones listed above, and many other possible misleading tricks, researchers can create misleading impressions that they are making progress at a task that is impossible. 
