'Meticulously Commendable': AI's Fingerprints Found All Over Recent Academic Papers

Tell-tale signs of genAI are rising fast

John Leonard
4 min read

A new study has found that AI tools such as ChatGPT were likely used to assist with a significant number of research papers published in 2023, mostly without proper disclosure.

The study, ChatGPT 'contamination': estimating the prevalence of LLMs in the scholarly literature, was conducted by Andrew Gray of University College London Library Services.

Gray analyzed millions of academic articles indexed in the Dimensions database, a repository of mainly scientific research articles and published papers, looking for the distinctive language patterns and word choices that are the fingerprint of LLMs in documents published between 2019 and 2023.

In particular, Gray looked at the frequency of 12 adjectives and 12 adverbs which tend to be used more frequently by LLMs than in the literature as a whole (a rough counting sketch follows the lists below):

Adjectives: commendable; innovative; meticulous; intricate; notable; versatile; noteworthy; invaluable; pivotal; potent; fresh; ingenious

Adverbs: meticulously; reportedly; lucidly; innovatively; aptly; methodically; excellently; compellingly; impressively; undoubtedly; scholarly; strategically
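The study itself works against the Dimensions corpus, but the basic measurement is straightforward to picture: count, for each publication year, how many papers contain each indicator word. The Python sketch below is purely illustrative and not Gray's code; the `papers` structure of (year, text) tuples and the function name `count_indicator_words` are assumptions made for the example.

```python
import re
from collections import defaultdict

# Indicator words highlighted in the study
ADJECTIVES = ["commendable", "innovative", "meticulous", "intricate",
              "notable", "versatile", "noteworthy", "invaluable",
              "pivotal", "potent", "fresh", "ingenious"]
ADVERBS = ["meticulously", "reportedly", "lucidly", "innovatively",
           "aptly", "methodically", "excellently", "compellingly",
           "impressively", "undoubtedly", "scholarly", "strategically"]
INDICATORS = ADJECTIVES + ADVERBS

def count_indicator_words(papers):
    """Count, per year, how many papers mention each indicator word.

    `papers` is assumed to be an iterable of (year, text) tuples,
    e.g. publication year and abstract text.
    """
    counts = defaultdict(lambda: defaultdict(int))  # year -> word -> no. of papers
    totals = defaultdict(int)                       # year -> total papers seen
    for year, text in papers:
        totals[year] += 1
        words = set(re.findall(r"[a-z]+", text.lower()))
        for w in INDICATORS:
            if w in words:
                counts[year][w] += 1
    return counts, totals

# Toy usage with made-up abstracts
papers = [
    (2022, "We present a fresh approach to graph partitioning."),
    (2023, "This meticulously designed and commendable framework is intricate."),
]
counts, totals = count_indicator_words(papers)
print(counts[2023]["meticulously"], "of", totals[2023], "papers in 2023")
```

Comparing these per-year counts against the size of the corpus is what allows year-on-year frequency changes to be expressed as percentages.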

There are several factors that could explain LLM chatbots' tendency to overuse words like these, including biases in the training data, a lack of contextual understanding, and the fact that chatbots are usually designed to generate "positive" output.

Gray found that the use of positive adjectives like "commendable," "meticulous," and "innovative" surged by over 30% on average in papers published in 2023 compared with 2022. The increase for some terms was even higher, with "intricate" rising 117% in 2023.

The adverb "meticulously" saw a 137% increase, while "methodically" and "innovatively" were each 26% more frequent. In the years before 2023, the year ChatGPT and its peers came into wide use, these words showed no marked fluctuations.

Combining these LLM-associated words into groups made the effect even more pronounced. Papers containing at least one of the four strongest indicator words (intricate, meticulous, meticulously or commendable) showed an 87% increase in 2023, while those containing any combination of two or more of those terms rose by a massive 468%.
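To see where figures like that 87% or 468% come from, the sketch below (again illustrative Python rather than anything from the study) flags papers containing at least one, or at least two, of the four strongest indicator words and compares the flag rates between two years. The `flag_rates` and `pct_increase` helpers and the `corpus` name are hypothetical.

```python
import re
from collections import defaultdict

# The four strongest indicator words named in the study
STRONG = {"intricate", "meticulous", "meticulously", "commendable"}

def flag_rates(papers, min_hits):
    """Per year, the share of papers containing at least `min_hits`
    distinct words from STRONG. `papers` is assumed to be an iterable
    of (year, text) tuples, as in the earlier sketch."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in papers:
        totals[year] += 1
        words = set(re.findall(r"[a-z]+", text.lower()))
        if len(words & STRONG) >= min_hits:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in totals}

def pct_increase(rates, base_year, new_year):
    """Percentage change in the flag rate between two years."""
    return 100.0 * (rates[new_year] - rates[base_year]) / rates[base_year]

# Usage against a hypothetical corpus:
# pct_increase(flag_rates(corpus, min_hits=2), 2022, 2023)
```

Comparing rates rather than raw counts keeps the measure independent of how many papers were published in each year.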

In absolute terms, the number of papers showing the influence of LLMs is still small: around 60,000 to 85,000 articles published in 2023, or roughly 1%-2% of the total studied.

Nevertheless, early indications are that papers published in 2024 use the same words even more frequently than those from 2023.

The study does not prove that AI was used to write any particular paper, and AI tools may legitimately be used to correct spelling and grammar. However, Gray argues that the disproportionate use of LLM-linked adverbs and adjectives indicates that AI tools are being used for more than mere presentational improvements.

"While none of these can be easily tested...it seems reasonable to assume that the correlations found for peer reviews also hold up for papers - which would imply a substantial proportion of these papers may have more significant LLM involvement than was openly disclosed," he wrote.

Does It Matter That Academic Researchers Are Using GenAI?

Short answer: yes.

Many people use AI chatbots in their work, but original research is a special case. Publishers of academic papers, such as Wiley, say the use of AI tools should be disclosed if it goes beyond grammar and spelling assistance. However, very few of the papers showing the signs of AI-generated content featured such a disclosure, according to Gray.

Gray suggests this lack of transparency could ultimately undermine trust in academic publications and the validity of their contributions. Generative AI is prone to hallucinations, producing plausible-looking content that is not based on fact or that does not make sense in context. Given the rise in the use of these tools, this could quickly become a problem.

Secondly, academic papers are a primary resource for training LLMs themselves. AI-generated text in academic papers therefore poses a risk of "model collapse," a downward spiral in which artificially generated text starts to outweigh genuine content, degrading the models themselves.

"Authors who are using LLM-generated text must be pressured to disclose this – or to think twice about whether doing so is appropriate in the first place – as a matter of basic research integrity," Gray concludes.

"From the other side, publishers may need to be more aggressive to engage with authors to identify signs of undisclosed LLM use and push back on this or require disclosure as they feel is appropriate."

This article originally appeared on our sister site Computing.
