With the recent release of ChatGPT, many studies examining potential uses of the chatbot's outputs have already been uploaded to bioRxiv. One such paper compared ChatGPT-generated scientific abstracts to the original abstracts. Upon seeing the title, I immediately got my hopes up that my abstract-writing days were over. So is this the case?
The researchers began by curating 50 abstracts from 10 high-impact medical journals, all published after the latest data cutoff for ChatGPT's training. ChatGPT was then fed the following prompt: 'Please write a scientific abstract for the article [title] in the style of [journal] at [link]'. As ChatGPT cannot browse the internet, the link served as context rather than as a way for the bot to read the paper. The generated abstracts were then analysed and compared to the originals, looking specifically at cohort size, presence of plagiarism, and signs of being AI-generated (as judged by an AI detector and by human reviewers).
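Out of curiosity, here's roughly how you could script that prompting step yourself: a minimal sketch assuming the OpenAI Python client and placeholder article details (the authors used the ChatGPT web interface, so this is only an approximation, not their actual method).

```python
# Minimal sketch of the paper's prompt template, assuming the OpenAI Python
# client (openai>=1.0) and an API key in the environment. The article details
# below are placeholders, not from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_abstract(title: str, journal: str, link: str) -> str:
    """Build the study's prompt for one article and request an abstract."""
    prompt = (
        f"Please write a scientific abstract for the article {title} "
        f"in the style of {journal} at {link}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT model used at the time
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage with placeholder values:
# abstract = generate_abstract(
#     "Example Trial of Drug X in Condition Y", "NEJM", "https://example.org/paper"
# )
```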
While all the generated abstracts were written clearly, only 8% were in keeping with the journals' guidelines, despite the model likely having access to older papers from each of the 10 journals within its training data.
ChatGPT was able to roughly replicate the correct cohort size for each study, indicating that it had seen previous articles describing similar types of studies. Additionally, the ChatGPT abstracts showed far fewer (mostly zero) indications of plagiarism, whereas the original abstracts tended to be flagged as plagiarising the published article (this is expected: while ChatGPT couldn't access the papers, the plagiarism detector could). Off the back of this, I fed an older paper title into the chatbot and checked the generated abstract for plagiarism; the detector still found it to be 100% original.
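If you fancy running a similarly crude check yourself, a character-level overlap measure is a rough stand-in for a plagiarism detector. This is only a toy sketch with placeholder text; real checkers like the one used in the study compare against a whole corpus of published work, not a single document.

```python
# Crude overlap check between a generated abstract and the published one,
# using difflib from the standard library. The strings are placeholders.
from difflib import SequenceMatcher

original_abstract = "Placeholder text of the published abstract."
generated_abstract = "Placeholder text of the ChatGPT-generated abstract."

# ratio() returns a similarity score between 0 (no overlap) and 1 (identical)
overlap = SequenceMatcher(None, original_abstract, generated_abstract).ratio()
print(f"Character-level similarity: {overlap:.0%}")
```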
When it came to classifying the abstracts as original or AI-generated, a ChatGPT-specific AI detection model could comfortably tell the difference (AUROC = 0.94). Human reviewers performed less well, misclassifying 16 of the 50 generated abstracts as original. The AI detector has a clear advantage here: it is specifically trained to spot ChatGPT outputs, whereas the reviewers will have seen relatively few of them. Over time this may well change, as scientists become more familiar with AI-generated text. The reviewers also appeared to be applying different criteria from the AI detector, as the two methods did not always agree on which abstracts they labelled as generated.
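For context, AUROC is simply the probability that the detector scores a randomly chosen generated abstract higher than a randomly chosen original one. A quick toy calculation with scikit-learn (using entirely made-up scores, not the study's data) shows how such a number comes about:

```python
# Toy illustration of AUROC, with hypothetical detector scores.
# Labels: 1 = AI-generated abstract, 0 = original abstract.
from sklearn.metrics import roc_auc_score

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
detector_scores = [0.98, 0.91, 0.87, 0.60, 0.35, 0.40, 0.22, 0.15, 0.08, 0.02]

# AUROC = probability that a generated abstract outscores an original one
print(roc_auc_score(labels, detector_scores))  # 0.96 on these toy numbers
```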
Whether or not ChatGPT can produce accurate, believable, original writing, the use of AI-generated material is controversial. Many have reasonably suggested that ChatGPT could be a huge source of misinformation, not only by generating misleading information (e.g. inaccurate StackOverflow answers) or enabling cheating on essays, but also by spreading propaganda, writing code for malware or ransomware, or powering fake online identities that carry out scams. As is common with new technology, the use of such AI-generated material is not yet specifically regulated in most places, and leaders diverge on exactly how to go about it.
Overall, it seems like a bad idea to get ChatGPT to write your abstract for you, not least because you’d probably have to add it as an author on your paper.