'Meticulously Commendable': AI's Fingerprints Found All Over Recent Academic Papers

A new study has found that AI tools such as ChatGPT were likely used to assist with a significant number of research papers published in 2023, mostly without proper disclosure.

The study, "ChatGPT 'contamination': estimating the prevalence of LLMs in the scholarly literature," was conducted by Andrew Gray of University College London Library Services.

Gray analyzed millions of academic articles indexed in the Dimensions database, a repository of mainly scientific research articles and published papers. In documents published between 2019 and 2023, he looked for the distinctive language patterns and word choices that are the fingerprint of LLMs.

In particular, Gray looked at the frequency of 12 adverbs and 12 adjectives which tend to be more frequently used by LLMs than in the literature as a whole:

Adjectives: commendable; innovative; meticulous; intricate; notable; versatile; noteworthy; invaluable; pivotal; potent; fresh; ingenious

Adverbs: meticulously; reportedly; lucidly; innovatively; aptly; methodically; excellently; compellingly; impressively; undoubtedly; scholarly; strategically
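As a rough illustration of what checking a text against these lists might look like, the sketch below reports which of the 24 marker words appear in a passage. It is not Gray's method, which relied on the Dimensions database; the function name `marker_hits` and the sample sentence are invented for this example.

```python
import re

# The 12 adjectives and 12 adverbs listed above.
ADJECTIVES = {
    "commendable", "innovative", "meticulous", "intricate", "notable",
    "versatile", "noteworthy", "invaluable", "pivotal", "potent",
    "fresh", "ingenious",
}
ADVERBS = {
    "meticulously", "reportedly", "lucidly", "innovatively", "aptly",
    "methodically", "excellently", "compellingly", "impressively",
    "undoubtedly", "scholarly", "strategically",
}
MARKERS = ADJECTIVES | ADVERBS


def marker_hits(text):
    """Return the marker words found in text as whole words, ignoring case."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & MARKERS


# Invented sample sentence, for illustration only.
sample = "This meticulous and Commendable study meticulously maps an intricate field."
print(sorted(marker_hits(sample)))
# ['commendable', 'intricate', 'meticulous', 'meticulously']
```

Matching whole words means "notably" does not count as a hit for "notable". A single hit says very little about any one text; the study's argument rests on changes in frequency across millions of papers.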

There are several factors that could lead to LLM chatbots' tendency to overuse words like these, including biases in training data, lack of contextual understanding, and the fact that chatbots are usually designed to generate "positive" output.

Gray found that the use of positive adjectives like "commendable," "meticulous," and "innovative" surged by over 30% on average in papers published in 2023 compared with 2022. The increase for some terms was even higher, with "intricate" rising 117% in 2023.

The adverb "meticulously" saw a 137% increase, and both "methodically" and "innovatively" were 26% more frequent. In the years before 2023, the year ChatGPT and similar tools came into wide use, there were no marked fluctuations.

Combining these LLM-associated words into groups, the effect was even more pronounced. Papers containing at least one of the four strongest indicator words (intricate, meticulous, meticulously or commendable) showed an 87% increase in 2023, while those containing any combination of two or more of those terms rose by a massive 468%.
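The grouped comparison can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the two tiny corpora are invented, the helper names (`strong_count`, `share`, `pct_change`) are hypothetical, and it compares the share of papers in each year, which may not match how the study normalized its counts.

```python
import re

# The four strongest indicator words named in the article.
STRONG = {"intricate", "meticulous", "meticulously", "commendable"}


def strong_count(text):
    """Number of distinct strong indicator words in text (whole words, any case)."""
    return len(set(re.findall(r"[a-z]+", text.lower())) & STRONG)


def share(papers, min_hits):
    """Fraction of papers containing at least min_hits distinct indicator words."""
    return sum(strong_count(p) >= min_hits for p in papers) / len(papers)


def pct_change(old, new):
    """Year-over-year change, expressed as a percentage of the old value."""
    return (new - old) / old * 100


# Invented toy corpora: eight short "abstracts" per year.
papers_2022 = ["a plain abstract"] * 6 + [
    "an intricate design",
    "a meticulous and commendable effort",
]
papers_2023 = ["a plain abstract"] * 4 + [
    "an intricate design",
    "a meticulous, intricate survey",
    "meticulously built and commendable",
    "an intricate, commendable result",
]

for min_hits in (1, 2):
    change = pct_change(share(papers_2022, min_hits), share(papers_2023, min_hits))
    print(f"papers with >= {min_hits} indicator word(s): {change:+.0f}%")
# papers with >= 1 indicator word(s): +100%
# papers with >= 2 indicator word(s): +200%
```

In this toy data, as in the study's figures, the group with two or more indicator words starts from a smaller base, so the same absolute rise produces a much larger percentage change.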

In absolute terms, the number of papers showing the influence of LLMs is still small: around 60,000 to 85,000 articles published in 2023, or around 1%-2% of the total number studied.

Nevertheless, early indications show that papers published in 2024 have an increased frequency of the same words over 2023.

The study does not prove that AI was used to write any particular paper, and AI tools may be used to correct spelling and grammar. However, Gray argues the disproportionate use of LLM-linked adverbs and adjectives indicates that the use of AI tools extends beyond mere presentational improvements.

"While none of these can be easily tested...it seems reasonable to assume that the correlations found for peer reviews also hold up for papers - which would imply a substantial proportion of these papers may have more significant LLM involvement than was openly disclosed," he wrote.

Does It Matter That Academic Researchers Are Using GenAI?

Short answer, yes.

Many people use AI chatbots in their work, but original research is a special case. Publishers of academic papers, such as Wiley, say the use of AI tools should be disclosed if it goes beyond grammar and spelling assistance. However, very few of the papers showing the signs of AI-generated content featured such a disclosure, according to Gray.

Gray suggests this lack of transparency could ultimately undermine trust in academic publications and the validity of their contributions. Generative AI is prone to hallucinations, producing plausible-looking content that is not based on fact or that does not make sense in context. Given the rise in the use of these tools, this could quickly become a problem.

Academic papers are also a primary resource for training LLMs themselves. AI-generated text in academic papers therefore poses a risk of "model collapse," a downward spiral where artificially generated text starts to outweigh genuine content, degrading the models themselves.

"Authors who are using LLM-generated text must be pressured to disclose this – or to think twice about whether doing so is appropriate in the first place – as a matter of basic research integrity," Gray concludes.

"From the other side, publishers may need to be more aggressive to engage with authors to identify signs of undisclosed LLM use and push back on this or require disclosure as they feel is appropriate."

This article originally appeared on our sister site Computing.