The reliability of an experimental neuroscience paper is not directly proportional to the reliability of the measurement techniques it uses. There are various reasons why an experiment using reliable measurement techniques might still be utterly unreliable; for example, the study group may have been too small to produce a reliable result. But it can be said (roughly speaking) that the unreliability of an experimental neuroscience paper is directly proportional to the unreliability of any measurement technique on which the experiment depends. That is why, when examining neuroscience experiments, we should always pay extremely close attention to whether reliable measurement techniques were used.
For decades, very many neuroscience researchers have been senselessly relying on a ridiculously unreliable measurement technique: the estimation of "freezing behavior." "Freezing behavior" estimations occur in scientific experiments involving memory, and the judgments work like this:
(1) A rodent is trained to fear some particular stimulus, such as a red-colored shock plate in its cage.
(2) At some later time (maybe days later) the same rodent is placed in a cage containing the stimulus that previously provoked fear (such as the shock plate).
(3) Someone (or perhaps some software) attempts to judge what percent of a certain length of time (such as 30 seconds, 60 seconds, or maybe even four minutes) the rodent is immobile after being placed in the cage. Immobility of the rodent is interpreted as "freezing behavior," in which the rodent is "frozen in fear" because it remembered the fear-causing stimulus such as the shock plate. The percentage of time the rodent is immobile is interpreted as a measurement of how strongly the rodent remembers the fear stimulus.
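To see how arbitrary such scoring can be, consider a minimal sketch, with hypothetical numbers, of how automated freezing-scoring software might work: frame-to-frame motion below some threshold counts as "freezing," and the resulting "memory" score can swing widely depending on which threshold the analyst happens to pick.

```python
# Minimal sketch of automated "freezing" scoring (hypothetical values).
# Motion is some per-frame quantity, e.g. the number of pixels changed
# between consecutive video frames. Frames below `threshold` count as
# "frozen" -- and the threshold is an arbitrary analyst choice.

def freezing_percentage(motion_per_frame, threshold):
    """Percent of frames in which motion falls below the threshold."""
    frozen = sum(1 for m in motion_per_frame if m < threshold)
    return 100.0 * frozen / len(motion_per_frame)

# Hypothetical motion trace for one video clip:
trace = [5, 3, 12, 2, 4, 30, 1, 2, 6, 2]

print(freezing_percentage(trace, threshold=4))  # 50.0
print(freezing_percentage(trace, threshold=7))  # 80.0 -- same video, very different "memory" score
```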
This is a ridiculously subjective and inaccurate way of measuring whether a rodent remembers the fear stimulus. There are numerous problems with this technique, which I explain in my post "All Papers Relying on Rodent 'Freezing Behavior' Estimations Are Junk Science." The technique is so unreliable that all experimental neuroscience studies relying on it should be dismissed as worthless.
There are other techniques used in neuroscience experiments, including various types of maze techniques. A mouse may be trained to find some food that requires traversing a particular maze. After a series of training trials, it is easy to time exactly how long the mouse takes to find the food. Then some modification might be made to the mouse (such as giving it an injection or removing part of its brain). The mouse can be put in the maze again, and a measurement can be made of how long it takes to find the food. If it took much longer to find the food, this might be evidence of a reduction in memory or learned knowledge.
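The measurement here is just a before-and-after comparison of times, as in the minimal sketch below (hypothetical numbers):

```python
# Sketch of the maze-latency comparison (hypothetical numbers).
# Latencies are seconds taken to reach the food.

baseline_latencies = [12.0, 10.5, 11.2]  # final training trials, before the modification
post_latencies = [44.0, 39.5, 51.0]      # trials after the modification

baseline_mean = sum(baseline_latencies) / len(baseline_latencies)
post_mean = sum(post_latencies) / len(post_latencies)

# A large increase in time-to-food is read as evidence of lost memory
# or learned knowledge (though slowness could have other causes).
print(f"Before: {baseline_mean:.1f} s; after: {post_mean:.1f} s")
```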
This seems like a pretty reliable technique. But there's another, much less reliable technique called the "free exploratory paradigm." When this technique is used, a mouse is given some compartments to explore. At first the mouse is allowed to explore only half or two-thirds of the compartments. Later the mouse is given the freedom to explore all of the compartments, including the previously unexplored ones. Some attempt is made to measure what percent of its time the mouse spends in the never-explored compartments compared to the previously explored compartments.
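The paradigm's output boils down to a single preference score, sketched below with hypothetical names and numbers. Since 50 percent is chance level, the score is only meaningful to the extent that it reliably departs from 50 percent:

```python
# Sketch of the "free-exploratory paradigm" score (hypothetical numbers).
# Times are seconds spent in each group of compartments during the test.

def novelty_preference(time_in_novel, time_in_familiar):
    """Percent of compartment time spent in the never-explored compartments."""
    return 100.0 * time_in_novel / (time_in_novel + time_in_familiar)

# Hypothetical 10-minute (600-second) session:
print(novelty_preference(time_in_novel=330.0, time_in_familiar=270.0))
# 55.0 -- barely above the 50% chance level
```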
A figure in the paper "The free-exploratory paradigm as a model of trait anxiety in rats: Test–retest reliability" shows how this method might be used. First the mouse is allowed to explore only the three compartments on the right, with access to the left compartments blocked. Then the mouse is allowed to access all of the compartments, and some attempt is made to judge whether the mouse spent more time in the left compartments than the right.
The assumption is made that this can serve as some kind of test of memory. The experiment designers seem to have assumed that when a mouse goes to compartments it has already visited, the mouse will somehow recognize those compartments and be less likely to explore them, perhaps having some kind of "I need not explore something I've already explored" experience. This is a very dubious assumption.
It's as if the designers of this apparatus were assuming that a mouse is thinking something like this:
"My, my, these experimenter guys have given me six compartments to explore! Well, there's no point in exploring any of the three compartments I already explored. Been there, done that. So I guess I'll spend more time exploring the compartments I have not been to. I'm sure there will just be exactly the same stuff in the three compartments I've already explored, and that I need not spend any time re-exploring them to check whether there's something new in them."
The assumptions behind this experimental design seem very dubious. It is not at all clear that a mouse would have any tendency to recognize compartments it had previously been in, or to regard such previously visited compartments as less worthy of exploration.
The best way to test whether such assumptions are correct is by experimentation. Without doing anything to modify a mouse's memory, you can simply test normal mice and see whether they are less likely to spend time in compartments they previously visited. Figure 2 of the paper "The free-exploratory paradigm as a model of trait anxiety in rats: Test–retest reliability" gives us a good graph testing how reliable this "free-exploratory paradigm" is, using a 10-minute observation period and a group of 30 mice.
The figure suggests that this "free-exploratory paradigm" is not a very reliable technique for judging whether mice remembered something. In the first test, there was no tendency of the mice to spend more time exploring the unexplored compartments. In the second test there was only a slightly greater tendency of the mice to explore the previously unexplored compartments. Overall the mice spent only 55 percent of their time in the previously unexplored compartments, versus 45 percent of their time in the previously explored compartments.
What is the relevance of this? It means that any neuroscience experiment that is based on this "free-exploratory paradigm" and fails to use a very large study group size is worthless. An example of a worthless study based on such a technique is one hailed this year by a press release with the headline "Boosting brain’s waste removal system improves memory in old mice." No good evidence for any such thing was produced.
The press release promotes a study called "Meningeal lymphatics-microglia axis regulates synaptic physiology," which you can read here. That study hinges entirely on an attempt to measure recall or recognition in mice, using something called a Y-maze, which consists of three compartments arranged in the shape of the letter Y. The Y-maze (not actually a maze) is an implementation of the unreliable "free-exploratory paradigm" measurement technique described above. The study used a study group size of only 17 mice. But since the "free-exploratory paradigm" requires study group sizes much larger than 17 to provide compelling evidence for anything, the study utterly fails as reliable evidence.
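To get a hedged sense of how much larger than 17 the study group would need to be, the sketch below runs a standard sample-size calculation, under the assumption (suggested by the test-retest results discussed above) that the paradigm's real signal amounts only to a 55 percent versus 50 percent difference:

```python
# Rough sample-size estimate: how many mice are needed to reliably
# distinguish a true 55% novelty preference from the 50% chance level?
# Standard normal-approximation formula for a one-sided proportion test;
# the 55% figure is an assumption taken from the test-retest results
# discussed above.
from math import sqrt, ceil

p0, p1 = 0.50, 0.55   # chance level vs. assumed true preference rate
z_alpha = 1.645       # one-sided significance level of 0.05
z_beta = 0.84         # statistical power of 0.80

n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) ** 2
     / (p1 - p0) ** 2)
print(ceil(n))  # about 616 mice -- vastly more than the 17 used
```

On those assumptions, something on the order of hundreds of mice would be needed, not 17.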
Using a binomial probability calculator, we can compute the chance of getting a false alarm with a measurement technique like the "free-exploratory paradigm." Figure 1C of the paper "Meningeal lymphatics-microglia axis regulates synaptic physiology" shows only a very slight difference in "free-exploratory paradigm" performance between the modified mice and the unmodified mice.
Given this "free-exploratory paradigm" that is something like only 55% effective in measuring recognition memory, the probability of getting results like this by chance (even if the experimental intervention has no real effect) is roughly the same as what we see in the calculation below:
[Calculation produced using an online binomial probability calculator]
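As a hedged reconstruction of that calculation, the sketch below computes the chance that, under a null hypothesis where the intervention does nothing and each mouse has a 50/50 chance of showing a spurious novel-arm preference, at least 10 of the 17 mice show the preference anyway. The threshold of 10 is my assumption; it reproduces the roughly 1-in-3 figure:

```python
# Hedged reconstruction of the false-alarm calculation (the exact
# success threshold is an assumption on my part).
# Null hypothesis: the intervention does nothing, so each mouse has a
# 50/50 chance of showing a spurious novel-arm preference.
from math import comb

n = 17    # study group size used in the paper
p = 0.5   # per-mouse chance of a spurious preference under the null
k = 10    # assumed success threshold: at least 10 of 17 mice

# P(X >= k) for X ~ Binomial(n, p)
tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(at least {k} of {n} mice by chance) = {tail:.3f}")  # about 0.31, roughly 1 in 3
```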
The chance of getting, purely by chance, a result like the one reported in the paper is roughly the 1 in 3 shown in the calculation above. When we consider publication bias and the "file drawer" effect, getting a result like the reported one means nothing. Why? Because a team would merely need to try the experiment a few times before it could report a success, even if the experimental intervention had no effectiveness whatsoever.
We should never be persuaded by results like this, because what could easily be happening is something like this:
- Team 1 at some college tries this intervention, seeing no effect. Realizing null results are hard to get published, Team 1 files its results in its file drawer.
- Team 2 at some other college tries this intervention, seeing no effect. Realizing null results are hard to get published, Team 2 files its results in its file drawer.
- Team 3 tries this intervention, seeing a "statistically significant" effect of a type you would get by chance maybe one time in three tries. Team 3 submits its positive result for publication, and gets a paper published.