Saturday, September 23, 2023

All Papers Relying on Rodent "Freezing Behavior" Estimations Are Junk Science

Normally we assume that when scientists do experiments, they want to measure things as accurately as possible. But that may not always be the case. There are some reasons why scientists may actually prefer to use a method that poorly measures something.  They include the following:

(1) Nowadays there exist these two very bad problems in science:  publication quotas and publication bias. Publication bias is the tendency of science journals to prefer to publish scientific papers that announce positive results showing some effect, rather than null results that fail to show any effect. Publication quotas are prevailing traditions in academia that every professor is supposed to have his name as an author on a certain number of papers by some particular point in his career. Often described under the name of "publish or perish," publication quotas are typically informal, but very real. An assistant professor may not be formally told that he has to have a certain number of papers on his name to become a full professor, but he will tend to know that his chance of advancement in academia will be very low if he does not have enough published papers on his resume. 

The combination of publication bias and publication quotas may create a strong preference for inaccurate and subjective measurement techniques. The more inaccurate and subjective a measurement technique, the more room there is to "see whatever you want to see," and the greater the chance that the fervently desired "positive result" can be reported. 

(2) Another very large problem in scientific research is ideological bias: the tendency of science journals to prefer papers that conform with the ideas most popular in research communities. Whenever a prevailing ideology is incorrect, the more inaccurate and subjective a measurement technique is, the greater the likelihood that the writer of a scientific paper can report some result that conforms with the ideology prevailing in his research community. 

Let us look at a case in which scientists for decades have been senselessly using a ridiculously unreliable measurement technique: the case of "freezing behavior" estimations. "Freezing behavior" estimations occur in scientific experiments involving memory. "Freezing behavior" judgments work like this:

(1) A rodent is trained to fear some particular stimulus, such as a red-colored shock plate in his cage. 

(2)  At some later time (maybe days later) the same rodent is placed in a cage that has the stimulus that previously provoked fear (such as the shock plate). 

(3) Someone (or perhaps some software) attempts to judge what percent of a certain length of time (such as 30 seconds or 60 seconds or maybe even four minutes) the rodent is immobile after being placed in the cage. Immobility of the rodent is interpreted as "freezing behavior" in which the rodent is "frozen in fear" because it remembered the fear-causing stimulus such as the shock plate. The percentage of time the rodent is immobile is interpreted as a measurement of how strongly the rodent remembers the fear stimulus. 
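To make step (3) concrete, here is a minimal sketch of how a "freezing percentage" might be calculated from video, assuming the recording has already been reduced to one motion score per frame. The function name, the motion units, and the threshold values are hypothetical illustrations, not the method of any particular paper or software package. Note how simply moving the immobility threshold changes the reported percentage for the very same recording.

```python
def freezing_percentage(motion_scores, motion_threshold):
    """Percent of sampled frames in which the animal counts as immobile.

    motion_scores: one motion value per video frame (hypothetical units,
    e.g. number of pixels that changed since the previous frame).
    A frame is scored as 'freezing' when its motion is below motion_threshold.
    """
    if not motion_scores:
        raise ValueError("need at least one frame")
    frozen = sum(1 for m in motion_scores if m < motion_threshold)
    return 100.0 * frozen / len(motion_scores)


# Ten hypothetical frames from one rodent.
scores = [2, 3, 50, 4, 1, 60, 70, 5, 0, 40]
print(freezing_percentage(scores, motion_threshold=10))  # 60.0
print(freezing_percentage(scores, motion_threshold=45))  # 70.0
```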

This is a ridiculously subjective and inaccurate way of measuring whether a rodent remembers the fear stimulus. There are numerous problems with this technique:

(1) There are two contradictory ways in which a rodent might physically respond after seeing something associated with fear: a flight response (in which the rodent attempts to escape) and a freezing response (in which the rodent freezes, not moving). It is all but impossible to disentangle which response is displayed when the rodent is presented with a fear stimulus. A rodent who remembers a fear stimulus might move around trying to escape the feared stimulus. But under the "freezing behavior" method, such movement would not be recorded as memory of the feared stimulus, even though the fear stimulus was recalled. 

(2) Rodents often have hard-to-judge movement behavior that neither seems like immobility nor fleeing behavior, and it is subjective and unreliable to judge whether such movement is or is not "freezing behavior" or immobility. 

(3) Movement of a rodent in a cage may be largely random, and not a good indication of whether the rodent is afraid and whether the rodent is recalling some fear stimulus. 

(4) Rodents encountering a fear-provoking stimulus in human homes (such as a mouse hearing a human shriek) almost never display freezing behavior, and much more commonly display fleeing behavior. I lived in a New York City apartment for many years in which I would suddenly encounter mice, maybe about 10 times a year. I never once saw a mouse freeze, but invariably saw them flee. 

(5) Freezing behavior in a rodent may last for a mere instant, as in humans. So it may be extremely fallacious to observe 30 seconds or 60 seconds or several minutes of rodent movement or non-movement, and try to judge whether fear or recall occurred by calculating a "freezing percentage" over such an interval. Almost all of that time may be random behavior having nothing to do with fear in the rodent or memory recall in the rodent. Contrary to all sensible methods, what we often see in neuroscience papers is some technique in which someone tries to judge "freezing behavior" by judging non-movement over a length of several minutes. An example is the science paper here, in which the authors senselessly judge fear recall by estimating non-movement in a rodent over the span of four minutes. 

(6) Attempts to judge "freezing behavior" typically ignore a fact of key importance: whether the rodent avoided the stimulus it was conditioned to fear. Let's imagine two cases. In case 1 a rodent put in a cage with a stimulus he was conditioned to fear (such as a shock plate) spends most of the measured interval not moving, and then goes directly to the fear stimulus, for example by stepping on the shock plate. In case 2 a rodent nervously moves around in the cage, entirely avoiding the fear stimulus. Clearly the rodent in case 2 acts like an animal who remembers the fear stimulus, and the animal in case 1 acts like an animal that does not remember the stimulus. But under the absurd method of judging fear recall by estimating "freezing behavior," the rodent in case 1 will be counted as better remembering the fear stimulus, because that rodent displayed more "freezing behavior." This example shows how absurd "freezing behavior" estimations are as a measure of whether a rodent recalled something or feared something. Obviously there's something very wrong if a technique can lead you to think that remembering rodents forgot, and that forgetting rodents remembered. 
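The two cases above can be written out as a small sketch that scores the same hypothetical sessions under both rules. The `Trial` record, its field names, and the numbers of seconds are invented for illustration; the point is only that the two scoring rules rank the two rodents in opposite order.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical test session for a single rodent."""
    label: str
    seconds_immobile: float
    seconds_observed: float
    touched_fear_stimulus: bool  # e.g. stepped on the shock plate


def freezing_score(t: Trial) -> float:
    # Freezing method: more immobility is read as stronger fear memory.
    return 100.0 * t.seconds_immobile / t.seconds_observed


def avoidance_score(t: Trial) -> bool:
    # Avoidance method: memory is inferred only if the stimulus was avoided.
    return not t.touched_fear_stimulus


case1 = Trial("case 1", seconds_immobile=50, seconds_observed=60, touched_fear_stimulus=True)
case2 = Trial("case 2", seconds_immobile=5, seconds_observed=60, touched_fear_stimulus=False)

for t in (case1, case2):
    print(t.label, round(freezing_score(t), 1), avoidance_score(t))
# case 1 83.3 False
# case 2 8.3 True
```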

[Image: fallacy of freezing behavior estimation]

How can memory and fear recall be reliably measured in rodents? There are at least three techniques. One costs a little money, and the other two can be done without spending much of anything. 

Reliable Way of Measuring Rodent Fear Recall #1: Measuring Heart Rate Spikes

It has been shown that when animals such as mice are exposed to fear-inducing stimuli, their heart rate dramatically spikes. According to a scientific paper, a simple air puff will cause a mouse's heart rate to increase from nearly 500 beats per minute to nearly 700 beats per minute. We read this: "The mean HR [heart rate] responses from the seven mice showed that HR increased significantly from the basal level of 494±27 bpm to 690±24 bpm to the first air puff (P<0.001)." The same paper tells us that similar increases in heart rate occur when mice are dropped or subjected to a simulated earthquake produced by gently shaking their cage. So rather than using the very unreliable method of trying to judge "freezing behavior" to determine how well a mouse remembered a fearful stimulus, scientists could use the reliable method of looking for sudden heart rate spikes. 
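A heart-rate criterion can be stated in advance as a single number, which is what makes it far less open to "see whatever you want to see" than freezing scores. Here is a minimal sketch, assuming heart-rate readings in beats per minute are already available from a monitor; the function names and the rise cutoff (`min_rise_pct`) are hypothetical choices an experimenter would have to fix before running the test, not a published standard.

```python
from statistics import mean


def percent_increase(baseline_bpm: float, peak_bpm: float) -> float:
    """Relative heart-rate rise over baseline, in percent."""
    return 100.0 * (peak_bpm - baseline_bpm) / baseline_bpm


def has_spike(bpm_series, baseline_samples: int, min_rise_pct: float) -> bool:
    """True if any reading after the baseline period rises at least
    min_rise_pct above the mean of the first `baseline_samples` readings.

    min_rise_pct is a hypothetical cutoff, ideally pre-registered.
    """
    baseline = mean(bpm_series[:baseline_samples])
    return any(percent_increase(baseline, b) >= min_rise_pct
               for b in bpm_series[baseline_samples:])


# Group means quoted above: 494 bpm at baseline, 690 bpm after the air puff.
print(round(percent_increase(494, 690), 1))  # 39.7

# Hypothetical readings: three baseline samples, then the stimulus.
print(has_spike([490, 495, 497, 600, 680], baseline_samples=3, min_rise_pct=20))  # True
print(has_spike([490, 495, 497, 500, 505], baseline_samples=3, min_rise_pct=20))  # False
```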

Reliable Way of Measuring Rodent Fear Recall #2: Tracking Fearful Stimulus Avoidance

The method described above has the slight drawback of requiring the purchase of rodent heart rate monitors.  But there's another method that does not have any such drawback: the method of simply recording whether a fearful stimulus was avoided. The method is shown in the diagram below. 


Using this technique, a mouse is trained to avoid a fear stimulus -- the red shock plate shown in the center of the diagram. At some later date the mouse (in a hungry state) is put into the cage. If the mouse does not remember that the shock plate will cause pain, the mouse will take the direct route to the cheese, which requires crossing over the shock plate. If the mouse does remember that the shock plate will cause pain, the mouse will take an indirect and harder route, requiring it to jump up and down a set of stairs.  This is an easy and foolproof method of testing memory recall in rodents. Here we have a nice binary result -- either the mouse touches the shock plate, or it doesn't. There's no subjective element at all. 

You could use this fear stimulus avoidance technique with a setup even simpler than the one above, a setup with no stairs. You simply put a hungry mouse in a special cage with only one route to the cheese, a route that requires walking over the shock plate. If the mouse avoids the cheese, and fails to touch the shock plate, that would count as remembering that the shock plate will shock; but any touching of the shock plate would count as forgetting that the shock plate will shock. 
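Because each mouse yields a yes-or-no outcome, comparing two groups reduces to a 2x2 table. The essay does not name a statistical test; Fisher's exact test is one standard choice for such a table, and the sketch below implements its two-sided form from scratch using only the standard library. The group outcomes are invented for illustration.

```python
from math import comb


def avoidance_rate(touched_plate):
    """Fraction of animals that never touched the shock plate.

    touched_plate: list of booleans, one per animal (True = touched).
    """
    return sum(1 for t in touched_plate if not t) / len(touched_plate)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table
        [[a, b],   # group 1: avoided, touched
         [c, d]]   # group 2: avoided, touched
    Sums the hypergeometric probabilities of all tables with the same
    row and column totals that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical outcomes: True means the mouse touched the shock plate.
control = [False] * 10              # all 10 avoided the plate
treated = [False] * 2 + [True] * 8  # only 2 of 10 avoided it

a, b = control.count(False), control.count(True)
c, d = treated.count(False), treated.count(True)
print(avoidance_rate(control), avoidance_rate(treated))  # 1.0 0.2
print(fisher_exact_two_sided(a, b, c, d))  # about 0.000714
```

No threshold, bout length, or observation window has to be chosen here: the only data are whether each mouse touched the plate.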



A Third Way of Measuring Rodent Recall: The Morris Water Maze Test

The widely used Morris Water Maze test can be a fairly reliable way of measuring recall in rats, if the test is used in a straightforward way. The water maze consists of a circular open tank rather like a child's bathing tub, deeper than a rodent's length, with a hidden platform on one side of the tank, about an inch or two below the water surface. A rodent is placed in the tub, and has to tread water to stay alive. Eventually the rodent will discover that by swimming to the hidden platform it can comfortably rest, without having to tread water. You train the rodent by exposing him to the water maze a certain number of times, until you find that the rodent immediately goes to the hidden platform. Later the rodent's memory can be tested by putting the rodent in the same Morris Water Maze tank, and seeing whether it quickly swims to the platform. The main drawback of the Morris Water Maze is that if something done to a mouse impairs its muscular abilities but not its memory, the mouse may fail the Morris Water Maze test even though there was no change in memory. See the Appendix for an important caveat about using this test. 

Why Do Neuroscientists Continue to Use Unreliable "Freezing Behavior" Estimations for Judging Rodent Recall?

The methods discussed above are obviously superior to the error-prone and subjective "freezing behavior" estimation method. So why do experimental neuroscientists continue to cling to such a "freezing behavior" estimation method, using it so often? It is entirely reasonable to suspect that many neuroscientists cling to their "freezing behavior" method for the very reason that it is unreliable and subjective, allowing neuroscientists to see whatever they want to see. By clinging to unreliable "freezing behavior" estimation, neuroscientists have a better chance of being able to report some result they can call a positive result. 

A paper describing variations in how "freezing behavior" is judged reveals that no standard is being followed. The paper is entitled "Systematic Review and Methodological Considerations for the Use of Single Prolonged Stress and Fear Extinction Retention in Rodents." The paper has the section below telling us that statistical techniques to judge "freezing behavior" in rodents are "all over the map," with no standard statistical method being used:

"For example, studies using cued fear extinction retention testing with 10 cue presentations reported a variety of statistical methods to evaluate freezing during extinction retention. Within the studies evaluated, approaches have included the evaluation of freezing in individual trials, blocks of 2–4 trials, and subsets of trials separated across early and late phases of extinction retention. For example, a repeated measures analysis of variance (RMANOVA) of baseline and all 10 individual trials was used in Chen et al. (2018), while a RMANOVA was applied on 10 individual trials, without including baseline freezing, in Harada et al. (2008). Patterns of trial blocking have also been used for cued extinction retention testing across 10 trials, including blocks of 2 and 4 trials (Keller et al., 2015a). Comparisons within and across an early and late phase of testing have also been used, reflecting the secondary extinction process that occurs during extinction retention as animals are repeatedly re-exposed to the conditioned cue across the extinction retention trials. For example, an RMANOVA on trials separated into an early phase (first 5 trials) and late phase (last 5 trials) was used in Chen et al. (2018) and Chaby et al. (2019). Similarly, trials were averaged within an early and late phase and measured with separate ANOVAs (George et al., 2015). Knox et al. (2012a,b) also averaged trials within an early and late phase and compared across phases using a two factors design.

Baseline freezing, prior to the first extinction retention cue presentation, has been analyzed separately and can be increased by SPS (George et al., 2015) or not affected (Knox et al., 2012b; Keller et al., 2015a). To account for potential individual differences in baseline freezing, researchers have calculated extinction indexes by subtracting baseline freezing from the average percent freezing across 10 cued extinction retention trials (Knox et al., 2012b). In humans, extinction retention indexes have been used to account for individual differences in the strength of the fear association acquired during cued fear conditioning (Milad et al., 2007, 2009; Rabinak et al., 2014; McLaughlin et al., 2015) and the strength of cued extinction learning (Rabinak et al., 2014).

In contrast with the cued fear conditioning studies evaluated, some studies using contextual fear conditioning used repeated days of extinction training to assess retention across multiple exposures. In these studies, freezing was averaged within each day and analyzed with a RMANOVA or two-way ANOVA across days (Yamamoto et al., 2008; Matsumoto et al., 2013; Kataoka et al., 2018). Representative values for a trial day are generated using variable methodologies: the percentage of time generated using sampling over time with categorically handscoring of freezing (Kohda et al., 2007), percentage of time yielded by a continuous automated software (Harada et al., 2008), or total seconds spent freezing (Imanaka et al., 2006; Iwamoto et al., 2007). Variability in data processing, trial blocking, and statistical analysis complicate meta-analysis efforts, such that it is challenging to effectively compare results of studies and generate effects size estimates despite similar methodologies."

As for the techniques used to judge so-called "freezing behavior" in rodents, they too are "all over the map," varying widely between researchers. The paper tells us this:

"Another source of variability is the method for the detection of behavior during the trials (detailed in Table 1). Freezing behavior is quantified as a proxy for fear using manual scoring (36% of studies; 12/33), automated software (48% of studies; 16/33), or not specified in 5 studies (15%). Operational definitions of freezing were variable and provided in only 67% of studies (22/33), but were often explained as complete immobility except for movement necessary for respiration. Variability in freezing measurements, from the same experimental conditions, can derive from differential detection methods. For example, continuous vs. time sampling measurements, variation between scoring software, the operational definition of freezing, and the use of exclusion criteria (considerations detailed in section Recommendations for Freezing Detection and Data Analysis). Overall, 33% of studies did not state whether the freezing analysis was continuous or used a time sampling approach (11/33). Of those that did specify, 55% used continuous analysis and 45% used time sampling (12/33 and 10/33, respectively). Several software packages were used across the 33 studies evaluated: Anymaze (25%), Freezescan (14%), Dr. Rat Rodent's Behavior System (7%), Packwin 2.0 (4%), Freezeframe (4%), and Video Freeze (4%). Software packages vary in the level of validation for the detection of freezing and the number and role of automated vs. user-determined thresholds to define freezing. These features result in differential relationships between software vs. manually coded freezing behavior (Haines and Chuang, 1993; Marchand et al., 2003; Anagnostaras et al., 2010). Despite the high variability that can derive from software thresholds (Luyten et al., 2014), threshold settings are only occasionally reported (for example in fear conditioning following SPS).
There are other software features that can also affect the concordance between freezing measure detected manually or using software, including whether background subtraction is used (Marchand et al., 2003) and the quality of the video recording (frames per second, lighting, background contrast, camera resolution, etc.; Pham et al., 2009), which were also rarely reported. These variables can be disseminated through published protocols, supplementary methods, or recorded in internal laboratory protocol documents to ensure consistency between experiments within a lab. Variability in software settings can determine whether or not group differences are detected (Luyten et al., 2014), and therefore it is difficult to assess the degree to which freezing quantification methods contribute to variability across SPS studies with the current level of detail in reporting. Meuth et al. (2013) tested the differences in freezing measurements across laboratories by providing laboratories with the same fear extinction videos to be evaluated under local conditions. They found that some discrepancies between laboratories in percent freezing detection reached 40% between observers, and discordance was high for both manual and automated freezing detection methods." 

It's very clear from the quotes above that once a neuroscience researcher has decided to use "freezing behavior" to judge fear, then he pretty much has a nice little "see whatever I want to see" situation. Since no standard protocol is being used in these estimations of so-called "freezing behavior," a neuroscientist can pretty much report exactly whatever he wants to see in regard to "freezing behavior," by just switching around the way in which "freezing behavior" is estimated, until the desired result appears. We should not make here the mistake of assuming that those using automated software for judging "freezing behavior" are getting objective results.  Most software has user-controlled options that a user can change to help him see whatever he wants to see. 

To help get reliable and reproducible results, neuroscientists doing experiments involving recall or fear recall in animals should use only a simple and reliable method for measuring fear or recall in rodents: either the measurement of heart rate spikes, or the Fear Stimulus Avoidance technique described above, or the Morris Water Maze test.  But alas, experimental neuroscientists seem to prefer to use an unreliable "see whatever you want to see" method, quite possibly because that vastly increases the opportunity for them to report "statistically significant" results or positive results rather than null results. 

What we must always remember is that the modern experimental neuroscientist is not primarily interested in producing accurate results, but is instead primarily interested in producing publishable results, defined as any result that will end up getting published in a scientific journal. The modern experimental neuroscientist is also extremely interested in producing "citation magnet" results, defined as any results that will end up getting more paper citations. Alas, today's neuroscientists are not judged by whether they use intelligent and accurate experimental methods. Today's neuroscientists are rather mindlessly judged by their peers on the basis of how many papers they can claim to have co-authored, and how many citations such papers have got. And so we see neuroscience papers like the one below, in which more than 100 scientists appear as the authors of a single paper, as if the main idea was just to up the paper count of as many people as possible. 

[Image: scientific paper with more than 100 authors]

A simple rule should be followed about this matter: any and all papers writing up experimental research and depending upon claims of freezing behavior by rodents should be regarded as junk science unworthy of serious attention. Trying to measure "freezing behavior" is not a reliable way of measuring memory recall or fear in rodents. Very many of the most widely reported neuroscience studies rely on this junk method, and all such studies are junk studies. Heavy reliance on "freezing behavior" estimation is only one of the glaring defects of neuroscience experimental research, where Questionable Research Practices are extremely common. Other glaring procedural defects very common in neuroscience experimental research include the use of way-too-small study group sizes, a failure to pre-register a hypothesis and the methods to be used for gathering and analyzing data, p-hacking, a failure to follow blinding protocols, and a failure to do sample size calculations to determine how large study group sizes should be. 

You should not assume that peer review prevents bad neuroscience research from getting published.  The people who peer-review neuroscience research routinely fail to exclude poorly designed experimental research.  The peer reviewers of such research are typically neuroscientists who perform the same kind of poorly designed research themselves.  Peer reviewers senselessly follow a rule of "allow papers to be published if they resemble recent previously published papers."  When some group of scientists is following bad customs (such as we see massively in theoretical physics, theoretical phylogenetics, theoretical cosmology,  and experimental neuroscience),  such a rule completely fails to block junk research from being published. 

Postscript: The paper "To freeze or not to freeze" gives us additional reasons for disbelieving that "freezing behavior" judgments are reliable ways of measuring fear or recall in rodents.  We read that "Male and female rats respond to a fearful experience in different ways, but this was not previously taken into account in research." Below are some quotes:

"Gruene, Shansky and their colleagues – Katelyn Flick and Alexis Stefano of Northeastern, and Stephen Shea of Cold Spring Harbor Laboratories – found that instead of freezing, many female rats display a brief, high-velocity movement termed darting...Gruene et al. found that female rats performed more darts per minute than males. However, not all females dart, and not all males freeze: in the experiments approximately 40% of the females engaged in darting behavior, but only about 10% of males did so....The finding that a higher proportion of female rats dart may explain why previous studies have reported less freezing in females (e.g., Maren et al., 1994; Pryce et al., 1999)."

The paper "The Difference between Male and Female Rats in Terms of Freezing and Aversive Ultrasonic Vocalization in an Active Avoidance Test" tells us this: "We found that males were more likely to experience freezing (40%) than females (3.7%)." Evidently male rats behave very differently from female rats in regard to freezing, but our neuroscientists very often fail even to specify which sex was used in the experiments they did. 

When "freezing behavior" judgments are made, there are no standards in regard to how long a length of time an animal should be observed when recording a "freezing percentage"  (a percentage of time the animal was immobile). An experimenter can choose any length of time between 30 seconds and five minutes or more (even though it is senseless to assume rodents might "freeze in fear" for as long as a minute).  Neuroscience experiments typically fail to pre-register experimental methods, leaving experimenters to make analysis choices "on the fly." So you can imagine how things work. An experimenter might judge how much movement occurred during five minutes or ten minutes after a rodent was exposed to a fear stimulus. If a desired above-average amount of immobility (or a desired below-average amount of immobility) occurred over 30 seconds, then 30 seconds would be chosen as the interval to be used for a "freezing percentage" graph. Otherwise,  if a desired above-average amount of immobility (or a desired below-average amount of immobility) occurred over 60 seconds, then 60 seconds would be chosen as the interval to be used for a "freezing percentage" graph. Otherwise,  if a desired above-average amount of immobility (or a desired below-average amount of immobility) occurred over two minutes, then two minutes would be chosen as the interval to be used for a "freezing percentage" graph. And so on and so forth, up until five minutes or ten minutes. If the researcher still has no "more freezing" effect he can report, the researcher can always do something like report on only the last minute of a larger time length, or the last two minutes, or the last three minutes, or the last four minutes. 
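The window-shopping described above is easy to demonstrate on a single recording. In this minimal sketch, a hypothetical 5-minute session is reduced to one immobile/mobile flag per second, and the same data are summarized over several candidate windows. The session, the window labels, and the function name are invented for illustration; the point is that the reported "freezing percentage" ranges from 0% to 100% depending only on which window is chosen after the fact.

```python
def freezing_by_window(immobile_per_second, windows):
    """Freezing percentage over several candidate analysis windows.

    immobile_per_second: one boolean per second of the session
    (True = scored immobile). windows: dict mapping a label to a
    (start_s, end_s) slice of the session.
    """
    result = {}
    for label, (start, end) in windows.items():
        chunk = immobile_per_second[start:end]
        result[label] = 100.0 * sum(chunk) / len(chunk)
    return result


# Hypothetical 5-minute session: immobile for the first 30 s only.
session = [True] * 30 + [False] * 270
windows = {
    "first 30 s": (0, 30),
    "first 60 s": (0, 60),
    "first 2 min": (0, 120),
    "full 5 min": (0, 300),
    "last 1 min": (240, 300),
}
pcts = freezing_by_window(session, windows)
for label, pct in pcts.items():
    print(f"{label}: {pct:.1f}%")  # 100.0, 50.0, 25.0, 10.0, 0.0
best = max(pcts, key=pcts.get)  # what an unconstrained analyst could report
print("most favorable window:", best)
```

Fixing one window in a pre-registered protocol, and stating it in the paper, removes this freedom.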

And also the researcher can arbitrarily choose what time length of immobility will be counted as some "freezing" to be added to the "freezing percentage" figure.  That time length of immobility can be 1 second or 2 seconds or any number of seconds between 1 and 10.
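The effect of the minimum-bout choice can be sketched the same way. This version counts a frame as "freezing" only if it belongs to an unbroken run of immobile frames lasting at least `min_bout_s` seconds. The frame rate, motion threshold, and frame data are hypothetical; real software packages expose similar settings under their own names, which are not reproduced here.

```python
def freezing_percentage(motion, fps, motion_threshold, min_bout_s):
    """Percent of frames inside immobility bouts lasting >= min_bout_s.

    motion: per-frame motion scores; a frame is immobile when its score
    is below motion_threshold. Shorter runs of immobility are ignored.
    """
    min_frames = int(round(min_bout_s * fps))
    frozen = 0
    run = 0  # length of the current run of immobile frames
    for score in motion:
        if score < motion_threshold:
            run += 1
            continue
        if run >= min_frames:
            frozen += run
        run = 0
    if run >= min_frames:  # a bout still in progress at the end of the window
        frozen += run
    return 100.0 * frozen / len(motion)


# Twelve hypothetical frames at 2 frames per second (6 s of video).
motion = [0, 0, 0, 5, 0, 0, 5, 5, 0, 0, 0, 0]
for min_bout in (1.0, 2.0):
    pct = freezing_percentage(motion, fps=2, motion_threshold=1, min_bout_s=min_bout)
    print(min_bout, round(pct, 1))
# 1.0 75.0
# 2.0 33.3
```

The same six seconds of video yield 75% or 33% "freezing" depending solely on the bout-length setting.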

Then there's a whole other way in which a researcher can keep trying to get the "freezing behavior difference" that he desires while trying to show some memory difference between a control group of mice and some mice that have been drugged, manipulated or modified in some way. The researcher can simply make different attempts at measuring a "freezing behavior difference," at different time intervals after the tested animals were trained to learn something. Mice trained to fear some shock plate can be tested for their recall of this fearful stimulus, using a "freezing behavior" method, one day after they were trained to fear the shock plate. If that test does not give the desired "freezing behavior" difference between those mice and a control group, the mice can be tested again at a later time, so that there is now a 2-day gap between the learning and the testing of recall. If that test does not give the desired "freezing behavior" difference between those mice and a control group, the mice can be tested again at a later time, so that there is now a 3-day gap between the learning and the testing of recall. And so it goes, with the researcher having up to 20 different days on which the mice can be tested. 

Because there are 20 or 30 or 50 or 100 different ways in which the data can be gathered and analyzed, each with about a 50% chance of success, it is almost certain that the researcher will be able to report some "higher freezing level," even if the tested interventions or manipulations had no real effect on memory. Such shenanigans drastically depart from good, honest, reliable experimental methods, and any researcher engaging in such shenanigans should be ashamed of himself. 
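The arithmetic behind this can be checked directly. The sketch below computes the chance of at least one "success" among n analyses, under the simplifying assumption that the analyses are independent. Real analysis variants on the same data are correlated, so the true inflation is smaller than this formula gives, though still large. It shows both the essay's 50% figure and the conventional 5% false-positive rate used in significance testing.

```python
def chance_of_at_least_one_hit(p_per_analysis, n_analyses):
    """P(at least one 'positive' among n independent analyses),
    each positive by chance with probability p_per_analysis.
    Independence is an assumption made for simplicity."""
    return 1.0 - (1.0 - p_per_analysis) ** n_analyses


for n in (20, 50, 100):
    print(n,
          round(chance_of_at_least_one_hit(0.5, n), 6),   # the essay's 50% figure
          round(chance_of_at_least_one_hit(0.05, n), 3))  # a 5% false-positive rate
```

Even with a 5% per-analysis false-positive rate, 20 independent tries give roughly a 64% chance of at least one "significant" result, and 100 tries give over 99%.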

I may note there is no rule that scientists have to report all of the experiments they made. And if a scientist gets a "higher freezing level" on one day and does not get such a result while testing on another day, and if he wants to report all of his experimental results, such a result can be woven into some story line contrived to match the results. We see an example of this in the paper here, where researchers get slightly more "freezing behavior" in manipulated mice at 7 days, without such an effect at 5 hours (Figure 2), with the authors trying to weave these inconsistent results into a narrative they contrive. 

In order to give a little bit of reassurance that such shenanigans are not occurring in the worst way, it is essential that every scientific paper providing a "freezing percentage" graph should always at least tell us what time interval was used when such an estimation of "freezing behavior" was made. Astonishingly, most papers providing such "freezing behavior" charts fail even to specify the time interval corresponding to such charts. So we get, again and again, charts claiming that some percentage of "freezing behavior" occurred over some time interval, without being told what the time interval was. This is experimental science at its clumsiest and most dysfunctional. Of course, by failing to specify the time interval used, a researcher makes it easier to hide his malfeasance if he arbitrarily uses different time intervals in different places in his analysis, in order to gin up more convincing "freezing behavior" charts, or if the researcher uses some arbitrary time interval (chosen to yield more pleasing results) different from the time interval most commonly used when such "freezing behavior" judgments are made. And the more researchers fail to specify the time interval used when making such "freezing behavior" judgments, the harder it is to tell that researchers are not following any research standard, but are simply using whatever time interval produces the most convincing freezing-behavior charts. 

For more than 12 years, it has been very easy for almost anyone to create hour-long videos, and upload them to www.youtube.com. It would be very easy for any scientist  claiming "different percentages of freezing behavior" in two groups (an experimental group and a control group) to document such a claim by creating an hour-long video and uploading that video to Youtube.com, so that anyone could check by looking at a Youtube link provided in the paper. Such a video would simply show how each mouse in the experimental group responded during some two-minute or 90-second period matching the displayed "freezing behavior" chart, and also how each mouse in the control group responded during some two-minute or 90-second period matching the displayed "freezing behavior" chart. Although such videos would be very easy to make and upload to Youtube.com, we never see neuroscience papers providing such links. This is probably because those producing such papers with "Freezing %" charts do not want independent observers to be able to check on their work in making such charts, which will not hold up well to scrutiny. Claims in "big boast" science papers of greater or smaller "freezing behavior" should never be trusted unless such a Youtube.com link (or an equivalent link) is provided. Similarly, you should never trust any person today who claims the ability to levitate, if the person does not even provide a video showing this claimed ability.  

It should be crystal-clear by now: no one is reliably measuring fear or recall or memory in a paper relying on "freezing behavior" judgments, and in such a paper we should trust no claims made about fear or recall or memory in rodent subjects.

[Image: freezing behavior charts in junk neuroscience]

[Image: critique of freezing behavior judgments]


Appendix on the Morris Water Maze test

The Morris water maze test (MWM) may be a reliable technique for testing memory when it is used with rats, in a straightforward way, with an adequate study group size. By "in a straightforward way," I mean something such as simply recording the time it took each rat placed in the water maze to reach the submerged platform. This time is called the "escape latency." When the test is done in a reliable way, we will see a simple bar graph comparing the escape latency of two different groups, an experimental group and a control group, with each bar showing the average escape latency of the rats in that group. The graph might look like the one below. If the study group size was large enough, this might be good evidence that the experimental group was remembering better than the control group.



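The straightforward analysis described above amounts to one number per rat and one comparison between groups. Here is a minimal Python sketch of that comparison. The latencies are invented for illustration (not taken from any study), and Welch's t statistic is simply one common way to compare two group means that I have chosen for the sketch; the text does not prescribe a particular test.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical escape latencies in seconds, one value per rat.
# Invented for illustration; not data from any study.
EXPERIMENTAL = [22, 18, 25, 20, 19, 24, 21, 23, 17, 26]
CONTROL = [30, 28, 35, 27, 33, 29, 31, 34, 26, 32]


def summarize(latencies):
    """Mean escape latency and its standard error for one group."""
    return mean(latencies), stdev(latencies) / sqrt(len(latencies))


def welch_t(a, b):
    """Welch's t statistic for the difference in mean latency (a minus b)."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(var_a + var_b)


for name, group in (("experimental", EXPERIMENTAL), ("control", CONTROL)):
    m, se = summarize(group)
    print(f"{name:>12}: mean {m:.1f} s (SE {se:.2f} s, n = {len(group)})")

print(f"Welch t = {welch_t(EXPERIMENTAL, CONTROL):.2f}")
```

The design choice worth noticing is that there is nothing to choose here: one pre-defined measure per animal, one comparison. Turning t into a p-value needs the t distribution (for example from a statistics package), which is left out to keep the sketch dependency-free.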
But many studies use the Morris water maze test in an objectionable way, analyzing the data in a way that is not straightforward and that smells like "keep torturing the data until it confesses." For example, we may see charts showing how much time rats spent in a particular quadrant of the maze, or charts plotting the exact path that particular rats traversed. When data analysis this complicated and arbitrary starts going on, the reliability of the Morris water maze as a test of memory plummets. Whenever you are allowed to analyze data in very many different ways, you will often be able to find some desired difference between a control group and an experimental group. Finding that difference will be as easy as getting a desired "heads" when you are free to flip a coin a dozen times.
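The coin analogy can be made quantitative. This minimal Python sketch assumes each analysis is an independent test at the conventional 0.05 significance level. That is a simplifying assumption: different analyses of the same maze data (latency, quadrant time, path length) are correlated, so the exact numbers would differ, but the direction of the effect is the same.

```python
def chance_of_at_least_one_hit(p_single, tries):
    """Probability that at least one of `tries` independent attempts
    succeeds, when each attempt succeeds with probability `p_single`."""
    return 1.0 - (1.0 - p_single) ** tries


# The coin analogy from the text: a dozen flips to land one "heads".
print(f"coin, 12 flips: {chance_of_at_least_one_hit(0.5, 12):.5f}")  # value printed: 0.99976

# Spurious "significant" differences when two groups truly do not differ,
# assuming each analysis is an independent test at the 0.05 level.
for k in (1, 5, 12, 20):
    print(f"{k:2d} analyses: {chance_of_at_least_one_hit(0.05, k):.2f}")
```

Under these assumptions, a single pre-specified analysis has a 5% chance of a false "significant" difference, while a dozen analyses have roughly a 46% chance of producing at least one.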

While the Morris water maze test can be a reliable test when it is done in a straightforward way with rats, it must be used with a big enough study group size, and very many neuroscience experimenters fail to do that. A paper notes the problem, stating this about the Morris Water Maze test (MWM):

"Many MWM experiments are reported with small group sizes. In our experience with the MWM and other water mazes, group sizes less than 10 can be unreliable and we use 15 to 20 animals per group, especially for mice, whose performance in learning and memory tests tends to be more variable than for rats. It is noteworthy that regulatory authorities require that safety studies have 20 or 25 animals per group. This number is for each of at least four groups (control and three dose levels) (Food and Drug Administration 2007; Gad 2009; Tyl and Marr 2012). Such group sizes are used by the US Environmental Protection Agency, the US Food and Drug Administration, the Organization for Economic Cooperation and Development, and Japanese and European Union regulatory agencies. Although the 3 Rs (reduce, refine, and replace) are worthwhile goals in the use of animals in research, it is not a justification to underpower experiments and run the risk of false positives, which, in the long run, cost more time, more animals, and more money to prove or disprove."
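To see why group sizes under 10 can be unreliable, here is a minimal Python sketch (Python 3.8+ for `statistics.NormalDist`) of approximate statistical power, meaning the chance that an experiment detects a difference that really exists. It assumes a true effect of d = 0.8, conventionally called a "large" effect, and uses the normal approximation for a two-sided comparison of two group means; both are assumptions of the sketch, not figures from the quoted paper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    Uses the normal approximation and ignores the negligible chance of a
    "significant" result in the wrong direction. Because it skips the
    small-sample t correction, it somewhat overstates power for small groups.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n_per_group / 2)  # expected z for the true effect
    return std_normal.cdf(shift - z_crit)


# Assumed effect size: d = 0.8, conventionally called a "large" effect.
for n in (5, 10, 15, 20, 25):
    print(f"n = {n:2d} per group: power ~ {approx_power(0.8, n):.2f}")
```

Under these assumptions, groups of 5 to 10 animals would detect even a large true effect less than half the time, which is consistent with the quoted authors' preference for 15 to 20 animals per group.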

A scientific paper cautions that the Morris water maze test may not work well with many strains of mice, saying this: "Neuroscientists have been warned that many strains [of mice] perform poorly on the submerged-platform water escape test task, which is better suited to rats than to mice, yet it is used widely for the study of memory in mice." Another paper gives a similar reason for thinking that the Morris water maze test (MWM) may only be suitable for rats, stating this: "Interestingly, when MWM data were analyzed in a large dataset of 1500 mice by factor analysis, the principle factors affecting MWM performance in mice were noncognitive (Lipp and Wolfer 1998).... It is important to note that this is not the case in rats, but the fact that performance factors are salient in mice provides an important cautionary note when interpreting mouse MWM data."

Referring to the scientific paper here, the Wikipedia.org article on the Morris water maze test (MWM) now states the following:

"Changes in measures of Morris water navigation task performance may not necessarily reflect specific impairments in mechanisms of spatial learning or memory. The reason for a longer time spent looking for the platform, or the lack of searching in the target quadrant, may not necessarily have to do with an effect on the rat's or mouse's spatial memory, but can be due to other factors. For example, a large study of Morris water navigation task performance in mice concluded that almost half of all variance in performance scores was due to differences in thigmotaxis, the tendency of mice to stay close to the walls of the pool. About 20% of the variability was explained by differing tendencies of mice to float passively in the water until 'rescued' by the experimenter. Differences in spatial memory were only the third factor, explaining just 13% of the variation between animals' performance.[16]"

It seems the Morris water maze test is not a reliable test of memory in mice, although it might be fairly reliable when testing rats. 
