Dr. GPT will see you now

GPT-4 will hunt for trends in medical records thanks to Microsoft and Epic

Generative AI promises to streamline health care, but critics say not so fast.

Benj Edwards
An AI-generated image of a pixel art hospital with empty windows. Credit: Benj Edwards / Midjourney

On Monday, Microsoft and Epic Systems announced that they are bringing OpenAI's GPT-4 AI language model into health care, where it will draft message responses from health care workers to patients and analyze medical records for trends.

Epic Systems is one of America's largest health care software companies. Its electronic health records (EHR) software (such as MyChart) is reportedly used in over 29 percent of acute care hospitals in the United States, and over 305 million patients worldwide have an electronic record in Epic. Tangentially, Epic's history of using predictive algorithms in health care has attracted criticism in the past.

In Monday's announcement, Microsoft mentions two specific ways Epic will use its Azure OpenAI Service, which provides API access to OpenAI's large language models (LLMs), such as GPT-3 and GPT-4. In layperson's terms, it means that companies can hire Microsoft to provide generative AI services for them using Microsoft's Azure cloud platform.

The first use of GPT-4 comes in the form of allowing doctors and health care workers to automatically draft message responses to patients. The press release quotes Chero Goswami, chief information officer at UW Health in Wisconsin, as saying, "Integrating generative AI into some of our daily workflows will increase productivity for many of our providers, allowing them to focus on the clinical duties that truly require their attention."
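For context, here is a minimal sketch of what a message-drafting call through the Azure OpenAI Service could look like, using the pre-1.0 openai Python package's Azure configuration. The endpoint, deployment name, and prompt are illustrative assumptions, not Epic's actual integration; the point is that the model's output is a draft for a clinician to review, not an autonomous reply.

```python
# Illustrative only: endpoint, deployment name, and prompt are assumptions,
# not Epic's actual integration. Uses the pre-1.0 openai package's Azure mode.
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://example-resource.openai.azure.com/"  # your Azure resource
openai.api_version = "2023-05-15"
openai.api_key = os.environ["AZURE_OPENAI_KEY"]

response = openai.ChatCompletion.create(
    engine="gpt-4",  # the Azure *deployment* name, not the raw model ID
    messages=[
        {"role": "system",
         "content": "Draft a reply for the clinician to review and edit. "
                    "Do not give medical advice beyond the supplied notes."},
        {"role": "user",
         "content": "Patient asks: 'Can I take my medication with food?' "
                    "Clinician note: yes, with meals is fine."},
    ],
)
print(response.choices[0].message.content)  # a draft, pending human review
```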

The second use will bring natural language queries and "data analysis" to SlicerDicer, which is Epic's data-exploration tool that allows searches across large numbers of patients to identify trends that could be useful for making new discoveries or for financial reasons. According to Microsoft, that will help "clinical leaders explore data in a conversational and intuitive way." Imagine talking to a chatbot similar to ChatGPT and asking it questions about trends in patient medical records, and you might get the picture.
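Epic hasn't published how the SlicerDicer integration works, but one common pattern for "natural language over records" tools (purely a hedged sketch, not Epic's implementation) is to have the LLM translate a question into a constrained, structured filter, then let ordinary application code run the query and return only aggregates:

```python
# Purely illustrative pattern: the LLM produces a structured filter; the
# application (not the model) validates and executes it, returning aggregates.
# The schema and field names here are invented for the example.
import json
import pandas as pd

# Toy stand-in for a patient-record extract; Epic's real schema is not public.
records = pd.DataFrame({
    "age": [34, 61, 47, 72, 55],
    "diagnosis": ["asthma", "diabetes", "asthma", "diabetes", "asthma"],
})

# Imagine the LLM returned this JSON for: "How many asthma patients are over 40?"
llm_filter = json.loads('{"diagnosis": "asthma", "min_age": 40}')

ALLOWED_FIELDS = {"diagnosis", "min_age"}  # reject anything outside the schema
assert set(llm_filter) <= ALLOWED_FIELDS

mask = (records["diagnosis"] == llm_filter["diagnosis"]) & \
       (records["age"] >= llm_filter["min_age"])
print(int(mask.sum()))  # aggregate count only; no individual rows leave the system
```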

GPT-4 is a large language model (LLM) created by OpenAI that has been trained on millions of books, documents, and websites. It can perform compositional and translation tasks in text, and its release, along with ChatGPT, has inspired a rush to integrate LLMs into every type of business, whether appropriate or not.

Producing things that “look like facts”

Credit: Aurich Lawson | Getty Images

However, the partnership between Microsoft and Epic has raised concerns among researchers who study large language models, partly due to GPT-4's tendency to make up (confabulate) information that isn't represented in its training data.

"Language models aren't trained to produce facts. They are trained to produce things that look like facts," says Dr. Margaret Mitchell, chief ethics scientist at Hugging Face. "If you want to use LLMs to write creative stories or help with language learning—cool. These things don't rely on declarative facts. Bringing the technology from the realm of make-believe fluent language, where it shines, to the realm of fact-based conversation, is exactly the wrong thing to do."

Mitchell explains that natural language generation (NLG) in health care has been studied for years, and one of the most important things the work has shown is that it is important to provide information that is not misleading. "This can be done to some extent through the use of templates and rules," she says. "By using templates and rules, we can achieve a lot of what this [work from Microsoft] seems to be trying to do—and draw from years of work from experts in NLG like Ehud Reiter—without infusing fictional content into a system that should be fact-based."
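To make the distinction concrete, here is a minimal sketch of the template-and-rules approach Mitchell describes: every sentence is assembled from verified fields in the record, so the system cannot "fill in" facts it doesn't have. The field names and message are hypothetical, invented for illustration.

```python
# Minimal template-and-rules NLG sketch. Field names are hypothetical;
# a real system would use the EHR's schema and many more rules.
from string import Template

REQUIRED = {"patient_name", "test_name", "date", "value", "unit", "low", "high"}
REPLY = Template(
    "Hi $patient_name, your $test_name result from $date is $value $unit, "
    "which is within the reference range of $low-$high $unit."
)

def render_result(record: dict) -> str:
    # Rule: refuse to generate rather than let a model guess at a gap.
    # (A real system would also branch on whether the value is in range.)
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"cannot draft message, missing fields: {sorted(missing)}")
    return REPLY.substitute(record)

print(render_result({
    "patient_name": "Jordan", "test_name": "HbA1c", "date": "April 10",
    "value": 5.3, "unit": "%", "low": 4.0, "high": 5.6,
}))
```

Unlike free-form generation, a template can only ever state what the record contains, which is Mitchell's point about keeping a fact-based system fact-based.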

In addition, some individuals are wondering if LLMs are the right tool for the job. "Work on dialogue and natural language generation has been an active area of research and development for years," says Mitchell. "I wonder what those systems are not providing that makes Epic and Microsoft feel they're not worth investing in, and that they should instead be investing in something that's not fit for the purpose."

Another concern comes from potential bias in GPT-4 that might discriminate against certain patients based on gender, race, age, or other factors. In OpenAI's system card for GPT-4, researchers working on behalf of OpenAI wrote, "We found that GPT-4-early and GPT-4-launch exhibit many of the same limitations as earlier language models, such as producing biased and unreliable content."

Even if Epic uses a fine-tuned version of GPT-4 specifically trained for medical usage, bias could arrive in the form of subtle phrasing in automated doctor-to-patient communications, or while trying to convey conclusions about medical data, even if those conclusions are drawn from an existing external system like SlicerDicer.

“The difference between life and death”

Considering these limitations, OpenAI's own usage policies understandably state that its models cannot be used to provide instructions on how to cure or treat a health condition. In particular, "OpenAI’s models are not fine-tuned to provide medical information. You should never use our models to provide diagnostic or treatment services for serious medical conditions."

Epic and Microsoft's application of GPT-4 seems to skirt these rules by avoiding any implication that GPT-4 is communicating autonomously (a medical professional remains in the loop) and, in the case of SlicerDicer, by providing data analysis only in aggregate rather than diagnosing individual patients. We reached out to Microsoft and OpenAI for clarification and will update this story if we receive a response.

Overall, Mitchell says she is frustrated by the general misunderstanding and misapplication of generative AI in today's hype-rich environment. "Combined with the well-known problem of automation bias, where even experts will believe things that are incorrect if they're generated automatically by a system, this work will foreseeably generate false information," she says. "In the clinical setting, this can mean the difference between life and death."


Benj Edwards is Ars Technica's Senior AI Reporter and founded the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.