Wednesday, August 27, 2025

A Recent Study Suggests Very Many Biologists Are "AI Cheating"

It is a huge mistake to rely on AI tools such as ChatGPT or Gemini when dealing with any controversial topic. Such tools make use of computer systems that have no real understanding of anything. The answers they give are produced through a combination of various complicated methods. The main way in which such AI tools get their "smarts" is by stealing text written by human authors.

A corporation creating such a system starts out by assembling a massive "question and answer" database consisting of hundreds of millions or billions of entries. A web crawling and book crawling system could look for text passages in any of these forms:

  • A phrase or sentence ending with a question mark, followed by some lines of text. 
  • A header beginning with the words "How" or "Why" and followed by some lines of text (for example, a header of  "How the Allies Expelled the Nazis from France" followed by an explanation). 
  • A header not beginning with the words "How" or "Why" and not ending with a question mark, but followed by some lines that can be combined with the header to make a question and answer (for example, a header of "The Death of Abraham Lincoln," along with a description, which could be stored as a question "How did Abraham Lincoln die?" and an answer).
  • A header written in the form of a request or an imperative, followed by some lines of text (for example, a header of "write a program that parses a text line and says 'you mentioned a fruit' whenever the line mentions a fruit," followed by code, could be stored with the header converted into the question "how do you write a program that parses a text line and says 'you mentioned a fruit' whenever the line mentions a fruit?" and the code stored as the answer). A sketch of this kind of conversion appears below.
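
To make the hypothesized conversion concrete, here is a minimal Python sketch of how a crawler might classify headers and turn them into stored questions and answers. The heuristics, function names and sample entries are my own illustration, not anything an AI company has published.

# Hypothetical sketch: convert a crawled header plus the text that follows it
# into a stored question-and-answer pair, using the patterns listed above.

def header_to_question(header: str) -> str:
    """Turn a crawled header into a question (crude illustrative heuristics only)."""
    h = header.strip()
    if h.endswith("?"):
        return h                                      # already a question
    if h.lower().startswith(("how ", "why ")):
        return h + "?"                                # "How the Allies..." -> add a question mark
    if h.lower().startswith(("write ", "create ", "build ")):
        return "How do you " + h[0].lower() + h[1:] + "?"   # imperative -> "How do you write..."
    return "What is the story of " + h + "?"          # plain title -> generic question

def store_pair(qa_db: dict, header: str, body_text: str) -> None:
    """Add the converted question and its answer text to an in-memory store."""
    qa_db.setdefault(header_to_question(header), []).append(body_text)

qa_db = {}
store_pair(qa_db, "The Death of Abraham Lincoln", "Lincoln was shot at Ford's Theatre in 1865...")
store_pair(qa_db, "write a program that parses a text line", "def parse(line): ...")
print(list(qa_db.keys()))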

Crawling the entire Internet and vast online libraries of books such as www.archive.org and Google Books, the corporation can create a database of hundreds of millions or possibly even billions of questions and answers. In many cases the database would have multiple answers to the same question. But there could be some algorithm that would handle such diversity. The system might give whichever type of answer was the most popular. Or it might choose one answer at random. Or it might combine multiple answers, adding text such as "Some people say..." or "It is generally believed..." Included in this question and answer database would be the answer to almost every riddle ever posed. So suppose someone asked the system a tricky riddle such as "which timepiece has the most moving parts?" The system might instantly answer "an hourglass." This would not occur by the system doing anything like thinking. The system would simply be retrieving an answer to that question it had already stored. And when you asked the system to write a program in Python that lists all prime numbers between 20,000 and 30,000, the system might simply find the closest match stored in its vast database of questions and answers, and massage the answer by doing some search and replace.
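
For the prime-number request just mentioned, the kind of stored answer such a system might serve up could look like the following Python program (shown only to illustrate what a retrieved answer might be, not how any particular AI system actually produces one):

# List all prime numbers between 20,000 and 30,000.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True

primes = [n for n in range(20000, 30001) if is_prime(n)]
print(primes)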

With such a system there is a big "plagiarism problem." A large fraction of the answers are plagiarized from materials protected by copyright. The system would presumably "cover its tracks" by refusing to provide the sources of its answers. There could also be various types of merging and search-and-replace that would make it hard to track down cases where the system was using plagiarism. There are all kinds of programmatic ways that text can be massaged to make it harder to detect that a passage was not an original composition.
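
As a toy illustration of the kind of massaging I mean, a program can swap in synonyms so that copied text no longer matches its source word for word. The dictionary and example below are my own invention, not anything taken from a real system:

# Toy illustration of "massaging" copied text so it no longer matches
# its source exactly: crude synonym substitution.

SYNONYMS = {"big": "large", "shows": "demonstrates", "uses": "employs", "helps": "assists"}

def massage(text: str) -> str:
    # Replace selected words with synonyms so the wording no longer matches the source.
    return " ".join(SYNONYMS.get(word, word) for word in text.split())

print(massage("The study uses a big dataset and shows a clear trend."))
# -> "The study employs a large dataset and demonstrates a clear trend."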

There are very many other methods that such an AI system could use to be able to quickly provide answers. The systems probably include an army of utility programs that can be used to calculate answers to various mathematical questions, programming questions and puzzle questions. Probably the systems make use of general-knowledge relational databases that have been filled up by servers traversing the billions of web pages and millions of books that are online. Data stored in a relational database can be queried very conveniently by use of the powerful SQL language.
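
To illustrate that last point, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the sample facts are purely my own guess at what such a general-knowledge store might look like, not a description of any vendor's actual database:

# Minimal sketch of a general-knowledge relational store queried with SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT, source_url TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Abraham Lincoln", "died_in", "1865", "http://example.com/page1"),
        ("Paris", "capital_of", "France", "http://example.com/page2"),
    ],
)

# Answer "What is Paris the capital of?" by querying the table.
row = conn.execute(
    "SELECT object FROM facts WHERE subject = ? AND relation = ?",
    ("Paris", "capital_of"),
).fetchone()
print(row[0])   # prints: France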


An article in The Guardian is entitled "How thousands of ‘overworked, underpaid’ humans train Google’s AI to seem smart." We read this:

"Thousands of humans lend their intelligence to teach chatbots the right responses across domains as varied as medicine, architecture and astrophysics, correcting mistakes and steering away from harmful outputs A great deal of attention has been paid to the workers who label the data that is used to train artificial intelligence. There is, however, another corps of workers, including Sawyer, working day and night to moderate the output of AI, ensuring that chatbots’ billions of users see only safe and appropriate responses....“ 'AI isn’t magic; it’s a pyramid scheme of human labor,'  said Adio Dinika, a researcher at the Distributed AI Research Institute based in Bremen, Germany. 'These raters are the middle rung: invisible, essential and expendable.' ”

People misunderstand the inputs involved in so-called artificial intelligence. One stream of inputs is human writing, which is continuously gathered by AI systems that crawl the Internet to grab content. Another stream of inputs is provided by thousands of human employees, continually working to steer the outputs of so-called artificial intelligence, so that such outputs sound intelligent. The equation is really this:

Computer programming + data processing + billions of pages of human writings + continuous output of thousands of AI-steering human workers = so-called "artificial intelligence"

What are the disadvantages of using such AI tools? For one thing, they often give answers that are dead wrong, wrong in the worst kind of way.  For example, ask ChatGPT whether DNA stores a specification for building a human body, and you will get the dead-wrong answer that DNA does  store such a thing. No such specification exists in DNA, which does not specify how to build any visible thing.  The only thing that DNA specifies is very low-level chemical information such as how to build microscopic protein molecules. DNA contains no information about visible anatomy. 

How did ChatGPT end up giving us an answer so wrong on this very important topic? The reason is that its answer was not obtained by any actual reasoning process, but by web-crawling, frequency counting and source-ranking. Every time its web-crawling came across someone attempting an answer to whether DNA stores a specification for making a body, that attempted answer was added to the system. With any arrangement like this, whenever there is a preponderance of false answers online to a particular question, the AI system will end up giving a false answer. So if 90% of the people who address online the question "are those from Madagascar bad people" answer "yes," then if the AI system is asked "are those from Madagascar bad people" it will answer "Yes," even if there is no good basis for such a claim. For a discussion of why it is that authorities started repeating false claims about what is in DNA, see my post here. The same post has a list of about 25 quotes from scientists and doctors stating that DNA is not a specification for making a human body, and is not any such thing as a blueprint, recipe or program for making a human body.
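
The mechanism I am describing is nothing more sophisticated than counting which answer appears most often in the crawled material. A minimal Python sketch of the general idea (the data here is made up purely for illustration):

# Frequency counting: the most common crawled answer wins, regardless of whether it is true.
from collections import Counter

crawled_answers = ["yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes"]
answer, count = Counter(crawled_answers).most_common(1)[0]
print(f"System's answer: {answer} (seen {count} of {len(crawled_answers)} times)")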

Because such truthful statements are apparently less common online than untruthful and groundless statements claiming DNA is a specification for making a human body,  ChatGPT gives us the wrong answer on this topic.  We must always remember that ChatGPT and other AI systems are myth amplifiers.  Whenever some erroneous idea is held by a majority of authorities, ChatGPT will tend to repeat such an erroneous idea. Ask such an AI system about the source of the human mind and the nature of human memory, and you will get many a false-as-false-can-be answer.  

A recent science news article at the Phys.Org site is entitled "Massive study detects AI fingerprints in millions of scientific papers." Referring to the LLMs (Large Language Models) that are the basis of AI tools such as ChatGPT and Gemini, we read this:

"This spike in questionable authorship has raised concerns in the academic community that AI-generated content has been quietly creeping into peer-reviewed publications. To shed light on just how widespread LLM content is in academic writing, a team of U.S. and German researchers analyzed more than 15 million biomedical abstracts on PubMed to determine if LLMs have had a detectable impact on specific word choices in journal articles. Their investigation revealed that since the emergence of LLMs there has been a corresponding increase in the frequency of certain stylist word choices within the academic literature. These data suggest that at least 13.5% of the papers published in 2024 were written with some amount of LLM processing."

Here is a quote from the scientific paper the article is referring to. The LLM acronym again refers to the Large Language Models behind these AI tools.

"Our analysis of the excess frequency of such LLM-preferred style words suggests that at least 13.5% of 2024 PubMed abstracts were processed with LLMs. With ~1.5 million papers being currently indexed in PubMed per year, this means that LLMs assist in writing at least 200,000 papers per year. This estimate is based on LLM marker words that showed large excess usage in 2024, which strongly suggests that these words are preferred by LLMs like ChatGPT that became popular by that time. This is only a lower bound: Abstracts not using any of the LLM marker words are not contributing to our estimates, so the true fraction of LLM-processed abstracts is likely higher."

What is the problem if those writing biology papers are massively using AI tools such as ChatGPT to help write their papers? There are two main problems.

(1) The false statements in abstracts problem. There is a massive problem in biology papers these days: paper abstracts very commonly make claims that are not justified by any research done by the authors of the paper. If a scientist uses some AI system to write a paper's abstract after submitting the main text of the paper to the AI system, this problem will tend to become worse. When I ask Google about the topic of "exaggeration when AI is used to summarize a scientific paper," I get this answer:

"A major concern with using AI to summarize scientific papers is the potential for exaggeration and overgeneralization of findings. 
Specifically:
  • AI summaries are more prone to overgeneralization than human summaries: Studies have shown that AI summaries are significantly more likely to overstate the scope of research findings compared to summaries written by the original authors or expert reviewers.
  • Newer AI models may be worse: Some studies suggest that newer AI models, such as ChatGPT-4o and DeepSeek, may be even more likely to produce broad generalizations than older ones.
  • Ignoring nuances and limitations: AI summaries tend to ignore or downplay uncertainties, limitations, and specific conditions mentioned in the original paper, leading to a potentially misleading presentation of the research. This can have dangerous consequences, especially in fields like medicine, where overgeneralized findings could lead to incorrect medical decisions.
  • 'Unwarranted confidence': AI models might prioritize generating fluent and confident-sounding responses, even if the underlying evidence does not fully support the strong claims they make in their summaries."

(2) The bad citation problem and legend recitation problem. Scientific papers very frequently reiterate false or groundless claims about previous scientific research. For example, in the world of neuroscience many thousands of very low-quality papers have been published, describing poorly designed experiments guilty of multiple Questionable Research Practices such as way-too-small study group sizes. What happens is that these junk science papers end up getting cited over and over again by other papers. You might call this "the afterlife of junk science."


Very often when this happens the authors of the citing paper will never even have read the body of the shoddy scientific paper they are citing. Again and again and again we have papers claiming that some grand result was established by neuroscience researchers. There follows a list citing a set of papers. But a careful examination of the papers cited will show that none of them provided any good evidence for the grand result claimed. The citation of low-quality research is extremely abundant in neuroscience papers. When the citation of low-quality research becomes common, we have a situation in which the neuroscience literature serves to propel and propagate myths and legends, groundless boasts of achievements.

But what happens when the authors of scientific papers are using AI systems such as ChatGPT to fill up much of the bodies of their papers, the parts dealing with the research of previous neuroscientists? Then there will be an increased tendency towards the propagation and perpetuation of legendary, groundless claims. Here's a "before" and "after":

Before AI: many neuroscientists would not bother to read the papers they were citing, but merely skimmed the abstracts of such papers. 

After AI: now the same neuroscientists do not even bother to read the abstracts of the papers they are citing, but merely copy and paste some answer they got from some AI system. 

AI echo chamber

We have in the research described above yet another giant reason why all statements in neuroscience papers should by default be distrusted. We cannot trust neuroscientists to write abstracts and paper titles accurately summarizing what was accomplished by the research described underneath such titles and abstracts. And we cannot trust neuroscientists to accurately describe what was demonstrated by research done by other neuroscientists. 

bad practices in neuroscience research
