2025 Proffered Presentations
S336: APPLICATION OF ARTIFICIAL INTELLIGENCE FOR PEER REVIEW OF SKULL BASE ARTICLES
Ali A Mohamed, MS1; Daniel Colome1; Jack Yang1; Emma Sargent1; Brandon Lucke-Wold, MD, PhD2; 1Charles E. Schmidt College of Medicine, Florida Atlantic University; 2Lillian S. Wells Department of Neurosurgery, University of Florida
Background: The use of artificial intelligence (AI) in peer review presents both opportunities and challenges. In neurosurgery, where advancements are rapid, the traditional peer review process can lead to delays in disseminating critical findings. This study evaluates the effectiveness of AI in predicting the acceptance or rejection of neurosurgical manuscripts, offering insights into its potential integration into the peer review process.
Methods: Preprints.org and medRxiv.org were queried for skull base articles on the following topics: adjuvant therapy, anterior skull base/orbit, basic science, cavernous sinus/middle fossa, clivus/craniocervical junction, meningioma, paraganglioma, pediatrics, training and education, value based care/quality of life, vestibular schwannoma, head and neck tumors-nonsinonasal, malignancy, lateral skull base/CPA/jugular foramen, pituitary adenoma, sinonasal malignancy, surgical approaches and technology, and vascular. Preprints that were later published were compared with preprints that had not been published and had been posted on the preprint servers for at least 12 months; the latter were presumed to be rejected. Each article was uploaded in an independent ChatGPT-4o query with the following prompt: “Based on the literature up to the date this article was posted, will it be accepted or rejected for publication following peer review? Please provide a yes or no answer.” The impact factor and CiteScore of the journals in which the accepted articles were published were collected at the time of publication. Two-sample t-tests were used to compare the journal metrics of accepted articles between those ChatGPT predicted would be accepted and those it predicted would be rejected. Chi-square analysis was used to compare ChatGPT’s predicted acceptance or rejection with the actual status of each preprint (published or presumed rejected).
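As a rough illustration of the querying step described above, the sketch below shows how a single preprint’s text might be submitted with the stated prompt through the OpenAI Python client. The model name "gpt-4o", the client setup, and the preprint_text variable are assumptions for illustration only; the study used independent ChatGPT-4o sessions, not necessarily the API.

```python
# Minimal sketch of one per-article query, assuming the OpenAI Python client
# (pip install openai) and an API key in the OPENAI_API_KEY environment variable.
# The model name and message layout are assumptions, not the authors' pipeline.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Based on the literature up to the date this article was posted, "
    "will it be accepted or rejected for publication following peer review? "
    "Please provide a yes or no answer."
)

def assess_preprint(preprint_text: str) -> str:
    """Return the model's yes/no verdict for a single preprint's full text."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{preprint_text}\n\n{PROMPT}"}],
    )
    return response.choices[0].message.content.strip()
```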
Results: A total of 31 preprints, 18 published and 13 presumed to be rejected, were included in the analysis. The average impact factor and CiteScore of accepted articles were 4.36 ± 2.07 and 6.38 ± 3.67, respectively. The impact factor and CiteScore of journals corresponding to accepted articles that were also accepted by ChatGPT were not significantly different from those of articles rejected by ChatGPT (p=0.932 and p=0.490, respectively). ChatGPT performed significantly poorly, correctly classifying only 66.67% of published articles as accepted and 61.54% of presumed-rejected articles as rejected (p<0.001).
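For readers interested in how the statistical comparisons described in the Methods could be coded, the sketch below runs a two-sample t-test on journal metrics and a chi-square test on a 2x2 table of ChatGPT verdicts versus actual outcomes using SciPy. All numbers are hypothetical placeholders, and the abstract does not specify the exact test construction, so this is an assumed reconstruction rather than the authors’ analysis.

```python
# Minimal sketch of the statistical comparisons, assuming SciPy is available.
# All values are hypothetical placeholders, not the study's data.
import numpy as np
from scipy import stats

# Hypothetical impact factors of journals that published accepted preprints,
# split by ChatGPT's verdict (CiteScore would be compared the same way).
impact_factor_chatgpt_accept = np.array([3.1, 5.2, 4.8, 2.9])  # placeholders
impact_factor_chatgpt_reject = np.array([4.4, 3.7, 6.0])       # placeholders

t_stat, t_p = stats.ttest_ind(impact_factor_chatgpt_accept,
                              impact_factor_chatgpt_reject)

# Hypothetical 2x2 contingency table:
# rows = actual outcome (published, presumed rejected),
# columns = ChatGPT verdict (accept, reject).
contingency = np.array([[10, 8],
                        [5, 8]])
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)

print(f"t-test p = {t_p:.3f}, chi-square p = {chi_p:.3f}")
```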
Discussion: This study’s findings indicate that while ChatGPT currently demonstrates only moderate accuracy in predicting peer review outcomes, there is significant potential for improvement. Current generative AI models are limited to publicly available data for natural language processing and model training. This presents a notable challenge in appraising the utility of AI in the peer review process, as rejected manuscripts are not publicly disclosed for confidentiality and privacy reasons. Future iterations of generative AI models, developed in collaboration with journals and with the consent of submitting authors, may draw on a more balanced pool of training data for models specifically designed to facilitate a more efficient peer review process.
Conclusion: ChatGPT shows moderate accuracy in predicting peer review outcomes, but with continued refinement, AI has the potential to assist in streamlining the peer review process.