OII | Benchmarking Large Language Models for Self-Diagnosis

Full project title: Benchmarking Large Language Models for Self-Diagnosis

Overview

Our work investigates applications of large language models (LLMs) in healthcare settings, with a particular focus on interactions between LLMs and human users. It is a multi-phase project led by Dr Adam Mahdi (Principal Investigator), along with a team of AI researchers and clinical experts.

The project focuses on LLMs for medical self-diagnosis and identifies factors leading to collaboration failure. We are recruiting 1,000 individuals to better understand these human-machine interactions. This is one of the first large-scale projects to study how members of the general public attempt to use LLMs for information-seeking purposes, and it adds an important lens by focusing on the risks arising from these interactions.

Key Information

Funders:

Prolific

Clarendon Fund

Contact:
Adam Mahdi

Participants

Professor Adam Mahdi

Associate Professor

Project role: Principal Investigator

Adam Mahdi’s research focuses on digital health and the application of machine learning in the social sciences. He is the director of the UKRI-funded OxCOVID19 Project and a fellow at Wolfson College, University of Oxford.

Andrew Bean

DPhil Student

Project role: Co-Investigator

Andrew holds a B.S. in Applied Mathematics from Yale University and an MSc in Social Data Science from the OII. He is a Clarendon Scholar and was previously a Thouron Prize winner at the University of Cambridge (Pembroke College).

Dr Guy Parsons

DPhil Student

Project role: Co-Investigator

Guy is a Shirley Scholar at the OII, an Intensive Care doctor, and the Clinical Lead for a Health AI business at Deloitte. He researches AI-enabled healthcare systems.

Project News

New study warns of risks in AI chatbots giving medical advice

9 February 2026

Study finds AI chatbots less helpful than search engines for medical advice

Study identifies weaknesses in how AI systems are evaluated

4 November 2025

Largest systematic review of AI benchmarks highlights need for clearer definitions and stronger scientific standards.

Project Press Coverage

AI chatbots are no better at medical advice than a search engine

The Register, 09 February 2026

A new study led by OII researchers warns of the risks in AI chatbots giving medical advice.

Dr Adam Mahdi talks chatbots and medical advice with BBC Sounds

BBC Radio Oxford, 09 February 2026

Dr Adam Mahdi warns that AI chatbots can be unreliable for medical advice - especially in real conversations where information is incomplete.

Medical misdiagnoses with AI: ‘It’s the people who break the process’

NZZ, 10 February 2026

When medical laypeople are asked to interpret symptoms with the help of AI, they usually get it wrong, according to a recent study led by OII researchers.

Relying on AI chatbots for healthcare can be ‘dangerous’ as they give bad advice and wrong diagnoses, researchers warn

Daily Mail, 09 February 2026

New study led by OII researchers warns of the risks in AI chatbots giving medical advice.
