
Reasoning with Machines AI Lab

The challenge

Artificial intelligence (AI), and large language models (LLMs) in particular, are revolutionising industries from healthcare to customer service, but understanding their behaviour and ensuring their reliability and fairness is crucial.

At the Reasoning with Machines AI Lab we focus on benchmarking and evaluating AI systems, with a special emphasis on LLMs. We seek to answer critical questions such as: How can we accurately assess the capabilities of AI systems? How do these systems interact with humans? And how can we interpret their internal workings?

Our research

Driven by the need for transparency and trust in AI, the Reasoning with Machines AI Lab is advancing the benchmarking and evaluation of large language models to unlock their potential across various domains. Our approach combines human-computer interaction (HCI) with mechanistic interpretability. By both studying how humans engage with AI systems and exploring the inner mechanics of these models, we aim to develop a deep, nuanced understanding of their functioning and limitations. This dual focus allows us to:

  • Evaluate the reasoning capabilities of AI models, ensuring that they align with human expectations and can perform tasks effectively.
  • Investigate the practical applications of AI in domains like healthcare, where trust and transparency are paramount, and human-computer interaction, where user experience and model reliability are essential.

Our current research areas include:

  • Benchmarking AI reasoning abilities. Current benchmarks often fail to capture the complexities of real-world reasoning. We design new evaluation methods and datasets to measure AI performance across logical, mathematical and common sense reasoning tasks.
  • Understanding failure modes in LLMs. Despite their impressive capabilities, LLMs struggle with consistency, logical coherence and robustness in reasoning-heavy tasks. We analyse these failure points and develop strategies to mitigate them.
  • Human-AI reasoning interactions. AI models increasingly assist humans in decision-making, but their reasoning processes are often opaque. We study how humans interpret and rely on AI-generated reasoning, aiming to make these interactions more transparent and trustworthy.
  • Dynamic evaluation frameworks. As AI systems evolve, traditional benchmarks quickly become obsolete. We explore new methodologies, such as dynamic benchmarking and adversarial testing, to ensure AI reasoning is assessed in a rigorous and meaningful way.

We cultivate a collaborative research environment that encourages innovation and learning for postgraduate students. Through peer learning, code sharing, and hands-on training in data science and STEM fields, we accelerate their learning, helping them to acquire critical skills and engage with cutting-edge research in a dynamic, team-oriented setting.

Our impact

Our work is shaping the future of AI by ensuring that large language models are not only powerful but also trustworthy and understandable. Our research supports the development of AI systems that are capable of more accurate reasoning, fairer interactions, and greater utility in high-stakes environments such as healthcare.

At the same time, our commitment to fostering an inclusive and collaborative research environment ensures that the next generation of researchers is well-prepared to tackle the challenges of AI. By mentoring students and creating a space for knowledge-sharing and skill development, we are building a strong foundation for future AI research and innovation.

Our team

Visiting team members
