GPT-4’s Performance on Bar Exam Overstated, MIT Study Reveals—Fell Short of 70th Percentile

Recent claims that OpenAI’s GPT-4 model surpassed 90% of aspiring lawyers on the bar exam appear exaggerated, according to a new MIT study. Although the result was initially heralded as a milestone, the reality is different: GPT-4 did not actually reach the 90th percentile.
The claim, made last year by OpenAI, caused a stir in both media and legal circles. However, the new research indicates that the touted 90th-percentile figure was skewed: it compared GPT-4 primarily against repeat test-takers who had previously failed the exam, a group that typically scores lower than average. The findings were published on March 30 in the journal Artificial Intelligence and Law.
The model garnered attention after it supposedly outperformed most human test-takers. On closer examination, however, it reached only the 69th percentile against all test-takers and a modest 48th percentile among first-time examinees.
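
The discrepancy between those figures and the original 90th-percentile claim is ultimately an artifact of the reference population: the same raw score earns a much higher percentile rank against a lower-scoring group. As a minimal sketch of that arithmetic, the Python snippet below uses made-up score distributions (the means, spreads, and the score itself are purely hypothetical, not the study's data) to show how one fixed score shifts by tens of percentile points depending on which pool it is ranked against.

```python
import random

random.seed(0)

# Hypothetical score distributions (illustrative only; NOT the study's data).
# Repeat test-takers tend to score lower on average than first-timers.
repeat_takers = [random.gauss(mu=255, sigma=25) for _ in range(10_000)]
first_timers  = [random.gauss(mu=280, sigma=25) for _ in range(10_000)]
all_takers    = repeat_takers + first_timers

def percentile_rank(score, population):
    """Percent of the population scoring strictly below `score`."""
    below = sum(1 for s in population if s < score)
    return 100.0 * below / len(population)

gpt4_score = 297  # hypothetical scaled score, for illustration only

for name, pop in [("repeat takers", repeat_takers),
                  ("all takers", all_takers),
                  ("first-timers", first_timers)]:
    print(f"vs {name:13s}: {percentile_rank(gpt4_score, pop):5.1f}th percentile")
```

In this toy setup, swapping the repeat-taker pool for first-time examinees drops the same score by roughly twenty percentile points, which is the crux of the study's critique of the original comparison.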
The study also found that GPT-4 performed below average on the essay-writing section, the part of the exam that most closely resembles the skills of real legal practice. While the model's advances are notable, this result points to significant gaps in its capabilities on the tasks that matter most for actual lawyering.
The study underscores the importance of thoroughly evaluating AI systems before they are deployed in legal applications, where errors can lead to serious adverse outcomes. Despite AI's potential benefits, caution is warranted to ensure its safe and effective integration into legal settings.
