
From Novice to Prodigy: A Perspective on AI’s Rapid Ascent

Imagine this scenario: a kindergartner—initially grappling with the basics of counting, addition and subtraction—manages to leapfrog several years of curriculum and advance her mathematical capabilities to a fifth-grade level within a year. Such a feat would be recognized as a significant achievement.

Now suppose the teachers at the school see these results and respond by saying, “Her ability to learn mathematics is all well and good, but ultimately it’s not that meaningful. The real world is messy, and a standardized exam doesn’t reflect that. Besides, the kid doesn’t know a thing about algebra, geometry or calculus!” Suddenly, the conversation centers on the skills the child lacks or hasn’t yet mastered rather than on her exponential improvement.

The dismissal feels shortsighted, doesn’t it?

AI Generates Wonder…Then Pushback

The previous scenario may sound like an exaggeration, but a parallel story is playing out today with AI in health care, specifically with large language models (LLMs) like GPT-4. Soon after the public release of ChatGPT in November 2022, researchers started to test it and similar tools against a range of industry benchmarks. By February 2023, a study had evaluated ChatGPT’s performance on all three Steps of the US Medical Licensing Exam (USMLE), showing the tool could perform at or near the passing threshold of 60% accuracy without any prior specialized training. In the following months, other models like Google’s Med-PaLM 2, specifically trained and fine-tuned for medical tasks, began to show even stronger performance on the USMLE and other measures.

Despite these breakthroughs, it didn’t take long for a subset of health care stakeholders to dismiss AI’s performance on these exams, suggesting that assessments like the USMLE were suddenly obsolete—mere demonstrations of rote memorization that don’t reflect the complexities of clinical decision-making. (Makes one wonder whether, prior to ChatGPT, any medical school graduates who celebrated passing the USMLE ever dismissed their passing score as a meaningless achievement.)

GPT-4 and Beyond

OpenAI’s GPT-4 was released in mid-March 2023, providing another significant boost in LLM performance. By November 2023—just one year after the public release of ChatGPT—researchers from OpenAI and Microsoft demonstrated that GPT-4 could achieve impressive results on all nine of the benchmark datasets in the MultiMedQA suite. The study showed how a generalist LLM like GPT-4 could outperform health care domain-specific models through focused prompting strategies, achieving a score of over 90% on the MedQA dataset. These industry performance evaluations are not slowing down, with several recent studies comparing AI and clinician performance on board residency exams and taking deeper dives into specialties like oncology and ophthalmology. OpenAI’s competitors are also keeping busy, with Google recently announcing its latest medical AI models, which it claims outperform GPT-4.

While researchers are busy evaluating the performance of newer LLMs, it is worth noting that AI’s history of outperforming humans on such tests goes back to 2015. According to Stanford University’s 2024 AI Index Report, AI has already surpassed human benchmarks in areas like image classification, basic reading comprehension and visual reasoning. But even in areas where AI still underperforms compared to humans, its progress has been quick. For example, AI systems went from solving only 6.9% of problems in the MATH benchmark (a dataset of competition-level math problems) in 2021 to solving 84.3% of the dataset’s problems by 2023 (human baseline performance is 90%).

Looking Ahead

To be clear, we should not blindly accept the results of every new AI study. A healthy dose of skepticism can help filter out the noise and push stakeholders to think more critically about AI, but it can also result in cyclical exercises of “moving the goalposts” that consistently lead to inaction. We’ve previously provided some strategic steps to prepare for generative AI’s role in health care, but there are additional considerations for health system leaders as the AI landscape continues to change:

  • Get used to fast-paced change: If we know anything about technology, it’s that progress isn’t always linear and can dramatically shift in unexpected ways. LLMs like GPT-4 are pushing the boundaries of what we thought was achievable with AI, and they’re doing so on a timescale of months. GPT-5 is just around the corner, and many organizations are simply not ready for it.
  • View AI through a lens of opportunity: It’s understandable, if not required, that we set a high bar for trusting emerging technology, especially in health care, where patient lives are on the line; recent protests have shown the extent of these concerns. But if health care stakeholders dismiss AI’s evolution at every turn, they run the risk of failing to leverage this technology to better serve their interests or business goals. Going back to our student analogy: sometimes, instead of sticking to mental math or pen and paper, using a calculator just makes sense.
  • Prepare to take the leap: The gap between early adopters and laggards on the technology adoption curve is going to widen faster than it has in the past. If your organization hasn’t made a serious effort to experiment with AI yet, it’s high time you did—like yesterday. Not every organization can afford to be on the bleeding edge of innovation, but the proliferation of open-source AI tools, free online education, and the ongoing incorporation of AI into core IT systems and applications will facilitate greater access to and comfort with AI tools.
