Microsoft research head Peter Lee. (GeekWire File Photo / Clare McGrane)

Peter Lee has spent a lot of time recently with GPT-4, the AI-powered tool that simulates human conversation, built by OpenAI with contributions from its partner Microsoft.

“I lost a couple weeks of sleep,” said Lee at a lecture Monday at the University of Washington. “It was very intense.”  

Lee, head of Microsoft Research, is tasked with assessing the implications of the tool for medicine. And he thinks it could increase efficiency and even empathy in the healthcare system, as well as boost biomedical research.

GPT-4 has “amazing capabilities,” Lee said during his lecture. And for healthcare, “it ends up being a really potentially useful tool.”

Lee and his colleagues outlined some potential use cases in an article in the New England Journal of Medicine (NEJM), released Thursday. These include supporting diagnoses, improving doctor-patient conversations, and reducing paperwork.

“The paperwork burden on doctors and nurses is just dreadful,” said Lee in a separate interview with GeekWire. Of all the applications, easing medical documentation and similar burdens is the one he thinks about most.

The GPT-4 chatbot was trained on vast amounts of open information on the internet, including medical sources. The tool answers test questions correctly from the U.S. medical licensing exam more than 90% of the time.

But it also has limitations. GPT-4 often “hallucinates” false responses to queries. The mistakes can be subtle and hard for users to identify, said Lee. In one example he showed at the UW, the tool rounded a calculation in a doctor’s note down instead of up, the standard practice. GPT-4 also projects blithe confidence.

The combination of errors and conviction can be “dangerous” in medical scenarios, said Lee and his co-authors in the NEJM article. People without medical backgrounds may be more easily fooled by GPT-4, noted an accompanying NEJM editorial.

GPT-4, however, is also capable of correcting its mistakes when asked to review its own output. And it has revealed some quirks along the way. An earlier GPT-4 version was more likely to act “emotionally attached” to its answers, said Lee.

Outside researchers say it’s hard to know what is under the hood of GPT-4. OpenAI reveals few details about its underlying algorithms and training process. But even computer scientists better acquainted with its workings, like Lee, are still trying to understand how GPT-4 thinks.

“We don’t understand how or why these capabilities have come out,” said Lee at the UW. GPT-4 is a “different beast” than GPT-3.5, which powers OpenAI’s free chatbot, he said.

Startups are also getting into the game, bolting GPT-4 onto their products to augment their capabilities. “I think it’s probably important for any startup to understand how well its existing value proposition holds up in a world with GPT-4,” Lee told GeekWire. He added: “It’s not just startups, it’s our own products within Microsoft.”

Lee is also looking further into the future. OpenAI is preparing to release a GPT-4 version capable of analyzing images. In the long term, such models may support evaluation of pathology and medical image data, said Lee at the UW. He added: “GPT-4 is not the disruption. It’s going to be the models that are coming next.”

Lee is also soon releasing a book with his colleagues, “The AI Revolution in Medicine: GPT-4 and Beyond,” exploring the implications in depth, and he was featured in a podcast Thursday.

Read on for examples of medical use cases and some hints from Lee about GPT-4’s thinking process.

Healthcare documentation

Doctors often spend hours each day writing up their encounters with patients. GPT-4 could help end that, according to Lee. The tool is capable of summarizing medical encounters in a variety of formats, with billing codes attached, said Lee. Microsoft subsidiary Nuance is already incorporating GPT-4 into a medical note-taking system trained on medical data, and will preview the application this summer.

Google, which recently released its AI-powered chatbot Bard, has similarly built a tool to summarize patients’ medical conditions.

Other potential use cases for GPT-4 include generating orders for lab tests and prescriptions and filling out text for prior authorization requests. 

Large language models like GPT-4 “are on the verge of solving some long-standing problems in medical documentation,” said Lee during his talk. A lot of companies will likely leverage GPT-4 to build tools for such purposes, he added.

The GPT-4 chatbot is also adept at suggesting language to provide patients comfort and support. “It’s able to imagine these situations of what it’s like to be in an exam room,” said Lee during his talk. “You see signs of an understanding of how the world works.”

Peter Lee, head of Microsoft Research, gives a lecture at the University of Washington this week. (UW Photo)

Medical diagnosis

GPT-4 may also help physicians make differential diagnoses, listing possible conditions that match symptoms and ranking them, said Lee. He sees physicians using the tool much as they bounce ideas off colleagues.

Data interoperability

Health data is siloed in different formats and in different systems, stymying patients and clinicians who want access to clinical records and researchers who want to study them. GPT-4 can help support format conversion, said Lee.
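As a toy illustration of the kind of conversion involved (not Microsoft’s or OpenAI’s implementation), the sketch below maps a pipe-delimited patient record, loosely modeled on an HL7 v2 PID segment, into a structured dictionary. Real HL7 segments carry far more fields; the point is that an LLM can be prompted to perform this sort of translation across many ad-hoc formats, whereas today each one typically needs a hand-written parser like this.

```python
# Hypothetical sketch: a hand-written parser for a simplified,
# pipe-delimited patient record (loosely modeled on an HL7 v2 PID
# segment). The field list below is a toy subset, not the real spec.

FIELDS = ["segment", "set_id", "patient_id", "name", "dob", "sex"]

def parse_record(line):
    """Split a pipe-delimited record into named fields."""
    values = line.split("|")
    record = dict(zip(FIELDS, values))
    # HL7-style names place '^' between family and given name components.
    family, _, given = record.get("name", "").partition("^")
    record["name"] = {"family": family, "given": given}
    return record
```

For example, `parse_record("PID|1|12345|Doe^Jane|19800101|F")` yields a dictionary with `patient_id` of `"12345"` and a nested name of `{"family": "Doe", "given": "Jane"}` — the kind of structured output that could then be re-emitted in another system’s format.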

Research papers

Scientists are beginning to use large language models to help write scientific papers. “Some of the best interactions I’ve had is when I’ve asked GPT-4 to read a medical research paper and then have a conversation about it,” said Lee in his talk.

Microsoft’s new Bing search engine is linked to GPT-4 and will provide summaries in response to scientific queries. Bing hallucinates fewer unrelated scientific references than the standalone chatbot, which is cut off from the internet, said Lee.

Consensus, a startup that provides accessible summaries of scientific research, has already added GPT-4 into its offerings. And Microsoft recently released a demo version of BioGPT, a large language model trained on research articles.

Biomedical studies

Users can instruct GPT-4 to cull a variety of existing research applications into a single AI assistant, said Lee. The assistant could tap into the data connected to the applications and standardize formatting, easing analyses and the training of new machine learning models.

Lee envisions fine-tuning GPT-4 models on specific biological datasets — and ultimately using large-scale neural transformers to predict protein structures. Microsoft is seeing protein structure prediction capabilities similar to those of AlphaFold, a lauded system built by DeepMind, Lee told GeekWire.

“I think that we’re going to see some really useful tools that will help researchers get more done,” Lee told GeekWire.

The mind of the machine

GPT-4 is both “smarter than you and dumber than you” at math, statistics and logic, said Lee in his talk.

GPT-4 has trouble solving Sudoku puzzles because they involve backtracking and re-evaluating answers, and GPT-4 is a “feed forward” tool. “It’s not like you. It’s a different kind of intelligence,” said Lee at the UW.
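Backtracking is the key word here. A Sudoku solver routinely makes a tentative guess, discovers a dead end, undoes the guess, and tries again — while a feed-forward model produces each output in a single pass, with no built-in mechanism to revisit earlier choices. The minimal sketch below (a 4x4 grid for brevity) shows that undo-and-retry loop explicitly:

```python
# Toy backtracking solver for a 4x4 Sudoku-style grid (0 = empty).
# The explicit "undo the choice and retry" step is exactly what a
# single feed-forward pass through a language model does not do.

def valid(grid, r, c, v, n=4):
    """Check whether value v can legally go at row r, column c."""
    if v in grid[r] or any(grid[i][c] == v for i in range(n)):
        return False
    b = 2  # box size for a 4x4 grid
    br, bc = (r // b) * b, (c // b) * b
    return all(grid[br + i][bc + j] != v for i in range(b) for j in range(b))

def solve(grid, n=4):
    """Fill the grid in place; return True if a solution was found."""
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                for v in range(1, n + 1):
                    if valid(grid, r, c, v, n):
                        grid[r][c] = v       # tentative choice
                        if solve(grid, n):
                            return True
                        grid[r][c] = 0       # backtrack: undo and retry
                return False                 # dead end reached
    return True
```

The recursion may wind back through many earlier cells before the grid is completed — a form of iterative self-correction, as opposed to emitting one token after another.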

Microsoft researchers have found that neural nets can solve certain mathematical problems better after they have been trained on language texts, Lee told GeekWire. “And that is a mysterious and weird thing,” he said. The findings also have implications for human intelligence. Said Lee: “Are there forms of mathematics that we’re blind to because our brains are hardwired for language?”

Training models on data like protein structures could yield algorithms and circuits that are hard for human minds to envision, Lee told GeekWire. The outcomes could reveal blind spots in human logic and insights into computer thought. “These are the sort of mysteries that computer science is struggling with,” said Lee.
