Friday, November 24, 2023

Major Journal Suggests 3% of Biology Papers Look Like Paper Mill Junk

In three previous posts on this blog I discussed the issue of fraud in biology research.

A recent article in the journal Nature asks, "How big is science's fake-paper problem?" We read this:

"An unpublished analysis shared with Nature suggests that over the past two decades, more than 400,000 research articles have been published that show strong textual similarities to known studies produced by paper mills. Around 70,000 of these were published last year alone (see ‘The paper-mill problem’). The analysis estimates that 1.5–2% of all scientific papers published in 2022 closely resemble paper-mill works. Among biology and medicine papers, the rate rises to 3%."

What's so bad if a scientific paper resembles the product of a paper mill? The article gives us a bit of a clue, without explaining it very well. It says, "Paper-mill studies are produced in large batches at speed, and they often follow specific templates, with the occasional word or image swapped." The average reader will have no idea what this refers to, so let me explain.

In computer programming, a template is a body of text containing placeholders. One template can produce many different versions of a narrative, simply by replacing the placeholders with specific examples. For example, the page here gives us a template for a press release announcing some scientific research. The template starts out like this:

"Scientists today announced that they are the first to successfully demonstrate SCIENTIFIC FINDING. This has long been one of the holy grails of SCIENTIFIC FIELD. 'This finding radically alters our understanding of the field, to say the least,' says FIRST AUTHOR, a SCIENTIFIC FIELDologist from INSTITUTION who led the research. 'We were stunned when we made the discovery. For a few minutes we just didn’t believe what we were seeing,'  says FIRST AUTHOR, then SECOND AUTHOR (a student of FIRST AUTHOR) yelled "We’ve done it!" and we started dancing around the LAB/OBSERVATORY/FIELD SITE. It was very exciting.”

If you are writing a scientific press release, you could manually replace the capitalized phrases to match some new research. But templates such as these can also be fed to computer programs, which can generate countless versions of a narrative by searching for the capitalized placeholders and replacing them.

So, for example, imagine you want 10,000 different versions of the story below:

"MALE HUMAN ONE had a good life, but he knew that something was missing. He tried using dating apps to meet Miss Right, but somehow it never worked out. But one day MALE HUMAN ONE had a stroke of luck.  He was at the BUSINESS PLACE ONE where he was a regular customer. He looked to his left, and was stunned by the beauty of a female he had never met before: FEMALE HUMAN ONE. MALE HUMAN ONE felt sure that he wanted to strike up a conversation with the beautiful stranger, but he couldn't think of what to say. He thought of saying TRITE OVERUSED PICKUP LINE, but thought that would never work.  Suddenly, he had a good idea. Walking up to the stranger he said, ORIGINAL WITTY ICE-BREAKING LINE." 

It would be very easy to write a computer program that generated 10,000 different versions of this story. The program could simply run in a loop, each time replacing the phrases MALE HUMAN ONE, FEMALE HUMAN ONE and BUSINESS PLACE ONE with items randomly chosen from a list, or randomly generated. Similarly, on each pass the program could replace TRITE OVERUSED PICKUP LINE with an item randomly chosen from a list of such lines, and replace ORIGINAL WITTY ICE-BREAKING LINE with an item randomly chosen from a list of witty lines.
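The loop just described takes only a few lines of Python. What follows is a minimal sketch, not any paper mill's actual software: the shortened story, the filler lists, and the function name `generate_variants` are all hypothetical, invented for illustration.

```python
import random

# A shortened, hypothetical version of the story template above.
STORY = (
    "MALE HUMAN ONE had a good life, but he knew that something was missing. "
    "One day, at the BUSINESS PLACE ONE where he was a regular customer, he "
    "was stunned by the beauty of a stranger: FEMALE HUMAN ONE. He thought of "
    "saying TRITE OVERUSED PICKUP LINE, but instead he said "
    "ORIGINAL WITTY ICE-BREAKING LINE."
)

# Hypothetical filler lists; a real generator would use far longer lists.
CHOICES = {
    "MALE HUMAN ONE": ["Tom", "Raj", "Luis"],
    "FEMALE HUMAN ONE": ["Ana", "Mei", "Claire"],
    "BUSINESS PLACE ONE": ["coffee shop", "bookstore", "gym"],
    "TRITE OVERUSED PICKUP LINE": [
        '"Do you come here often?"',
        '"Is this seat taken?"',
    ],
    "ORIGINAL WITTY ICE-BREAKING LINE": [
        '"Settle a bet for me: is that a first edition?"',
        '"I need a second opinion on this menu."',
    ],
}


def generate_variants(template, choices, n, seed=None):
    """Return n stories, each with every placeholder replaced at random."""
    rng = random.Random(seed)  # seeded so that runs can be reproduced
    # Longest placeholders first: "FEMALE HUMAN ONE" contains "MALE HUMAN ONE",
    # so replacing the shorter one first would leave "FE" + a male name behind.
    placeholders = sorted(choices, key=len, reverse=True)
    variants = []
    for _ in range(n):
        story = template
        for placeholder in placeholders:
            # str.replace swaps every occurrence, so a name stays
            # consistent throughout one story.
            story = story.replace(placeholder, rng.choice(choices[placeholder]))
        variants.append(story)
    return variants


if __name__ == "__main__":
    stories = generate_variants(STORY, CHOICES, 10_000, seed=42)
    print(len(stories), "stories,", len(set(stories)), "distinct")
    print(stories[0])
```

Note that with these tiny lists there are only 3 × 3 × 3 × 2 × 2 = 108 possible stories, so 10,000 passes produce many repeats. Getting 10,000 genuinely different versions requires much longer lists or randomly generated fillers, as the paragraph above says.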

It seems that paper mills are doing something similar to generate phony scientific papers, which amount to phony narratives. According to the Nature article, machine-learning software is being used to look for papers that are suspected products of paper mills. One estimate is that about 3% of the biology and medicine papers published in 2022 closely resemble paper-mill products. This 3% figure is higher than for any of the other fields mentioned. We read this: "[A] June 2022 report by the Committee on Publication Ethics, based in Eastleigh, UK, said that for most journals, 2% of submitted papers are likely to have come from paper mills, and the figure could be higher than 40% for some."
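The passage quoted from Nature does not say how "strong textual similarities" are measured, and the tools it mentions use machine learning over large collections of papers. As a much simpler stand-in, not the method Nature describes, here is a sketch of one classic way to score the overlap between two texts: word-trigram ("shingle") Jaccard similarity. The function names and the example titles are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences ("shingles") in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


# Hypothetical titles, written only for this illustration.
KNOWN_MILL_TITLE = ("MicroRNA-21 promotes proliferation and invasion of "
                    "gastric cancer cells by targeting PTEN.")
SUSPECT_TITLE = ("MicroRNA-155 promotes proliferation and invasion of "
                 "lung cancer cells by targeting PTEN.")
UNRELATED_TITLE = ("We measured soil nitrogen across twelve alpine meadows "
                   "over three summers.")

if __name__ == "__main__":
    print(jaccard_similarity(KNOWN_MILL_TITLE, SUSPECT_TITLE))    # 0.375
    print(jaccard_similarity(KNOWN_MILL_TITLE, UNRELATED_TITLE))  # 0.0
```

Swapping one microRNA and one cancer type, the kind of substitution a template makes, still leaves the two titles sharing 6 of their 16 distinct trigrams, while the unrelated title shares none. Any cutoff for flagging a paper would have to be calibrated on real data; none is assumed here.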

Why would such wrongdoing occur? In a "publish or perish" culture, a scientist may be expected to author a certain number of papers each year. On top of that there is publication bias: scientific journals prefer to publish papers reporting positive results. A scientist whose recent experiments have produced only null results may resort to paying a paper mill for a manuscript reporting a result with a better chance of getting published. The paper mill companies are typically in foreign countries, and have discreet names such as Suichow Editorial Services.

A researcher named Bernhard A. Sabel has developed what he thinks is a pretty simple way to spot paper-mill papers in biology and medicine: look for papers whose author email addresses are private or hospital addresses rather than college or university addresses such as joesmith@harvard.edu. Sabel's technique is entirely different from the text-similarity analysis mentioned at the beginning of this post.
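Sabel's screening idea, as summarized above, is simple enough to sketch in Python. This is a minimal illustration under stated assumptions: the list of free email providers, the "hospital" keyword test, and the rule that academic domains contain an "edu" or "ac" label are hypothetical stand-ins, not the criteria or lists from Sabel's paper.

```python
# Illustrative lists only; these are NOT Sabel's actual criteria.
FREE_EMAIL_PROVIDERS = {"gmail.com", "yahoo.com", "hotmail.com",
                        "outlook.com", "163.com", "126.com", "qq.com"}
HOSPITAL_KEYWORDS = ("hospital",)


def email_domain(address: str) -> str:
    """Return the lower-cased part of an address after the last '@'."""
    if "@" not in address:
        raise ValueError(f"not an email address: {address!r}")
    return address.rsplit("@", 1)[1].strip().lower()


def classify_email(address: str) -> str:
    """Sort an address into 'private', 'hospital', 'academic' or 'other'."""
    domain = email_domain(address)
    labels = domain.split(".")
    if domain in FREE_EMAIL_PROVIDERS:
        return "private"
    if any(word in domain for word in HOSPITAL_KEYWORDS):
        return "hospital"
    # Assumption: academic domains contain an "edu" or "ac" label,
    # e.g. harvard.edu, tsinghua.edu.cn, ox.ac.uk.
    if "edu" in labels or "ac" in labels:
        return "academic"
    return "other"


def red_flag(address: str) -> bool:
    """True when the address is private or hospital rather than academic."""
    return classify_email(address) in {"private", "hospital"}


if __name__ == "__main__":
    for addr in ["joesmith@harvard.edu", "author123@163.com",
                 "doc@cityhospital.org", "someone@ox.ac.uk"]:
        print(addr, classify_email(addr), red_flag(addr))
```

The check is crude by design: a hospital may use an academic-looking domain, and an honest researcher may use a private address, which is why a test like this can only raise red flags, not deliver verdicts.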

The latest version of a paper by Sabel describes the paper mill industry:

"The major source of fake publications are 1,000+ 'academic support' agencies – so-called 'paper mills' – located mainly in China, India, Russia, UK, and USA (Abalkina, 2021; Else, 2021; Pérez-Neri et al., 2022). Paper mills advertise writing and editing services via the internet and charge hefty fees to produce and publish fake articles in journals listed in the Science Citation Index (SCI) (Christopher, 2021; Else, 2022). Their services include manuscript production based on fabricated data, figures, tables, and text semi-automatically generated using artificial intelligence (AI). Manuscripts are subsequently edited by an army of scientifically trained professionals and ghostwriters."

Sabel mentions a case in which a paper mill (calling itself an editorial services firm) emailed a scientific journal offering $1,000 if the journal would publish one of the papers the firm had helped to produce.

A paper by Sabel states this:

"More than 1,000 paper mills openly advertise their services on Baidu and Google to 'help prepare' academic term papers, dissertations, and articles intended for SCI publications. Most paper mills are located in China, India, UK, and USA, and some are multinational. They use sophisticated, state-of-the-art AI-supported text generation, data and statistical manipulation and fabrication technologies, image and text pirating, and gift or purchased authorships. Paper mills fully prepare – and some guarantee – publication in an SCI journal and charge hefty fees ($1,000-$25,000; in Russia: $5,000) (Chawla, 2022) depending on the specific services ordered (topic, impact factor of target journal, with/without faking data by fake 'experimentation')."

Sabel estimates that paper mills are a major business, earning revenue of about a billion dollars per year. He estimates that close to 150,000 papers carry red flags indicating possible paper-mill authorship.

It's so much easier when the "experiments" are all fake

I would imagine that experimental neuroscience papers are among the easiest types of science papers for paper mills to fake. Many experimental neuroscience papers follow a very similar approach. It's as if many experimental neuroscientists lack the imagination to think up new experimental designs, and are just borrowing the design structure of previous experiments (which often have very poor designs). With so much repetition, it is easy for a paper mill to spot a recurring design pattern and use it as a kind of cookie cutter, duplicating most of the text and giving it a veneer of novelty with search-and-replace algorithms that swap placeholder text for phrases chosen from a list.

What is described above is an example of what can be called commodification corruption. Commodification is the process by which something becomes a commodity to be bought, sold and exchanged in an economic system, and the term connotes something that should never have become a mere commodity becoming one. In the current "publish or perish" culture of academia, two of the biggest commodities may be paper counts (a number supposedly indicating how many papers a scientist has written) and citation counts (the number of times a scientist's papers have been cited). These metrics are used to judge the performance of scientists, and lots of corruption is occurring in connection with their commodification. Such commodification corruption includes the following:

(1) A large fraction of neuroscientists are producing junk science papers guilty of Questionable Research Practices. Such "quick and dirty" studies occur largely because they are the easiest path for a scientist to increase his "paper count," the number supposedly showing how many papers he has written.

(2) Many scientists who did not materially participate in producing a scientific paper are being listed as co-authors (a practice sometimes called "guest authorship"), both to increase scientists' "paper counts" and to increase the chance of a paper getting published. Such "guest authorship" gifts (given to professors in the same department as the real authors) also serve as a kind of bribe to improve the promotion prospects of those granting them.

(3) Many scientists are engaging in appalling lying by claiming they authored some particular number of papers, when they were merely one of the listed authors of most of such papers. For example, if a scientist was the sole author of 10 papers, and was merely one of the authors of 50 other papers (50 papers having an average of 7 authors each), it is very misleading for such a scientist to describe himself as the author of 60 papers (his work being equivalent to being the sole author of merely about 17 papers). 

(4) Scientists are massively citing their own papers (a practice called self-citation), and are citing the papers of their friends or associates while expecting the favor to be returned, in an "old boy network" that can be described as "I'll rub your back if you rub mine." 

(5) Some scientists are paying paper mills (described above) to produce fake papers, for the sake of increasing their "paper count," the number supposedly showing how many papers they have published.

(6) Quite a few papers are being partially or mostly "ghost written" by employees of pharmaceutical companies or biotech companies, who are paid to produce results (accurate or not) that will tend to raise the stock price of such companies (such as claiming a success for one of the company's pills). Scientists listed as co-authors of such papers (who often did little or nothing to produce them) are often investors in such stocks, and stand to gain from both an increase in their "paper counts," and an increase in the value of their investments. 
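The arithmetic in item (3) follows what bibliometricians call fractional counting, where each paper contributes 1/n of a paper to each of its n authors. Here is a minimal sketch, assuming (as a simplification of the "average of 7 authors" wording) that each of the 50 multi-author papers has exactly 7 authors; the function name is hypothetical.

```python
from fractions import Fraction


def fractional_paper_count(author_counts):
    """Sum 1/n over papers, where n is the number of authors of each paper.

    Under fractional counting a sole-authored paper counts as 1 and a
    paper with 7 authors counts as 1/7.
    """
    total = Fraction(0)
    for n in author_counts:
        if n < 1:
            raise ValueError("every paper needs at least one author")
        total += Fraction(1, n)
    return total


if __name__ == "__main__":
    # Item (3): 10 sole-authored papers plus 50 papers assumed to have
    # exactly 7 authors each.
    papers = [1] * 10 + [7] * 50
    credit = fractional_paper_count(papers)
    print(len(papers), "papers listed")          # 60 papers listed
    print(credit, "≈", round(float(credit), 1))  # 120/7 ≈ 17.1
```

If the author counts merely average 7 rather than all being 7, the sum of 1/n comes out at least as high as 50/7 (and usually higher), so the figure of about 17 papers is an approximation.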

Corruption almost inevitably follows commodification. Something else that has been commodified in the world of science is the production of exaggerated or inaccurate science news stories. These serve as clickbait that is highly profitable for parties such as the websites running ads on the pages you reach after clicking. The corruption behind that is very large, and discussed here. We are now pretty much in the territory of "you can't trust the science news headlines," largely because of all the clickbait going on.

Today on my iPad I am reading a story about a neuroscientist who reportedly received many millions of dollars in federal funding. The story suggests massive wrongdoing in his research, and suggests that other scientists knew about this for years but were reluctant to blow the whistle because they thought it might harm their careers. We can only imagine how much "turn a blind eye" behavior is going on that helps some scientists put fake paper-mill articles on their resumes, along with poorly designed junk science articles and articles they had no part in but were listed as co-authors of. It sounds like what goes on in the movie industry, where people often turn a blind eye to "casting couch" abuses, not wanting to be called "trouble makers" for blowing the whistle by complaining to the press.
