ChatGPT-4

its performance in a final exam of the career of a medical specialist in ophthalmology at the University of Buenos Aires

Authors

  • Roberto Borrone, Chair of Ophthalmology, Faculty of Medicine, Universidad de Buenos Aires (UBA).

DOI:

https://doi.org/10.70313/2718.7446.v17.n01.286

Keywords:

artificial intelligence, ChatGPT-4, medical education, ophthalmology

Abstract

Objectives: To evaluate the performance of ChatGPT-4 on a final exam of the University Medical Specialist in Ophthalmology degree program at the University of Buenos Aires, and to compare it with the performance of the students and of ChatGPT-3.5.

Materials and methods: Observational, retrospective, and analytical study. The answers of a group of students were evaluated on a multiple-choice exam of 50 questions with 4 answer options each, administered on September 8, 2023. Their results were compared with the performance of ChatGPT versions 3.5 and 4 on the same exam.

Results: Students (n = 7) answered a median of 39 of 50 questions correctly (accuracy 78%), with a range of 33 to 45 correct answers. The average time to complete the exam was 75 minutes. ChatGPT-3.5 answered 31 of 50 questions correctly (accuracy 62%). ChatGPT-4 answered 40 of 50 questions correctly (accuracy 80%), completing the exam in 73.49 seconds.

Conclusions: ChatGPT-4 outperformed the average student while using 61 times less time, and achieved higher accuracy than ChatGPT-3.5. The grades obtained by both versions of ChatGPT would be sufficient to pass the final exam of the University Medical Specialist in Ophthalmology program at the University of Buenos Aires.
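The time ratio and accuracy figures stated above can be reproduced with a short calculation; this sketch uses only the numbers reported in the Results section (75 minutes for students, 73.49 seconds for ChatGPT-4, and the raw correct-answer counts out of 50):

```python
# Reproduce the "61 times less time" figure from the reported completion times.
student_minutes = 75        # average student completion time (minutes)
gpt4_seconds = 73.49        # ChatGPT-4 completion time (seconds)
ratio = student_minutes * 60 / gpt4_seconds
print(round(ratio))         # prints 61

# Reproduce the accuracy percentages from the correct-answer counts.
total_questions = 50
scores = [("students (median)", 39), ("ChatGPT-3.5", 31), ("ChatGPT-4", 40)]
for name, correct in scores:
    print(f"{name}: {100 * correct / total_questions:.0f}%")
# prints:
# students (median): 78%
# ChatGPT-3.5: 62%
# ChatGPT-4: 80%
```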

References

Ting DSJ, Tan TF, Ting DSW. ChatGPT in ophthalmology: the dawn of a new era? Eye (Lond) 2024; 38: 4-7.

Raimondi R, Tzoumas N, North East Trainee Research in Ophthalmology Network (NETRiON) et al. Comparative analysis of large language models in the Royal College of Ophthalmologists fellowship exams. Eye (Lond) 2023; 37: 3530-3533.

Kung TH, Cheatham M, Medenilla A et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2023; 2: e0000198.

Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med 2023; 388: 1233-1239.

Moshirfar M, Altaf AW, Stoakes IM et al. Artificial intelligence in ophthalmology: a comparative analysis of GPT-3.5, GPT-4, and human expertise in answering StatPearls questions. Cureus 2023; 15: e40822.

Takagi S, Watari T, Erabi A, Sakaguchi K. Performance of GPT-3.5 and GPT-4 on the Japanese medical licensing examination: comparison study. JMIR Med Educ 2023; 9: e48002.

Wang H, Wu W, Dou Z et al. Performance and exploration of ChatGPT in medical examination, records and education in Chinese: pave the way for medical AI. Int J Med Inform 2023; 177: 105173.

De Vito E. Inteligencia artificial y chat GPT. ¿Usted leería a un autor artificial? Medicina (B Aires) 2023; 83: 329-332.

Lüthy IA. Inteligencia artificial y aprendizaje de máquina en diagnóstico y tratamiento del cáncer. Medicina (B Aires) 2022; 82: 798-800.

Published

2024-03-27

How to Cite

[1]
Borrone, R. 2024. ChatGPT-4: its performance in a final exam of the career of a medical specialist in ophthalmology at the University of Buenos Aires. Oftalmología Clínica y Experimental. 17, 1 (Mar. 2024), e41-e45. DOI: https://doi.org/10.70313/2718.7446.v17.n01.286.

Issue

Section

Artículos Originales