Several researchers and DPhil students from the Oxford Internet Institute (OII), University of Oxford, are set to attend the 14th annual International Conference on Learning Representations (ICLR) in Rio de Janeiro from April 23 to 27, 2026.
As one of the world’s fastest-growing AI conferences, ICLR brings together experts in deep learning, a type of AI that learns patterns from large amounts of data to make predictions or generate content. The conference covers deep learning research and its applications, from machine vision and computational biology to speech, text, gaming and robotics.
The OII delegation will contribute to these discussions through a series of presentations and workshops, highlighted below.
Featured OII research and events:
1. SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviours
Paper: SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviours
Presenter: Tiancheng Hu
Authors: Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy
Senior author: Paul Röttger
Session: Poster Session 4 on Friday, 3:15pm–5:45pm, in Pavilion 4
Summary:
This study looks at how well AI models can imitate human behaviour — a promising idea for social science, but only if the simulations are realistic. The researchers introduce SimBench, a large test to measure this, and find that even the best models still struggle. Performance improves with model size, but current training methods make models worse on questions where humans disagree, and models have particular difficulty representing some demographic groups.
More info: Read the pre-print, available online.
+++
2. LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations
Paper: LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations
Presenter: William Gitta Lugoloobi
Author: William Gitta Lugoloobi
Senior author: Chris Russell
Workshop: Latent & Implicit Thinking, Monday 27 April 2026, 09.00–17.30 (BRT), Room 101 A, Riocentro Convention Center.
Summary:
This study starts from the observation that running a large AI model on every single question is wasteful. It shows that models often “know” in advance whether they are likely to get something right, a signal that can be read from their internal activity. The researchers use this insight to automatically send each question to the model best suited to answer it, outperforming the strongest single model while cutting costs by up to 70%. They also find that the kinds of problems AI models struggle with are not the same as the ones humans find difficult.
More info: Read the pre-print, available online.
+++
3. Task-Specific Knowledge Distillation via Intermediate Probes
Paper: Task-Specific Knowledge Distillation via Intermediate Probes
Presenter: Ryan Brown
Author: Ryan Brown
Senior author: Chris Russell
Workshop: Latent & Implicit Thinking, Monday 27 April 2026, 09.00–17.30 (BRT), Room 101 A, Riocentro Convention Center.
Summary:
This study looks at how big AI models are used to train smaller ones, and why their final answers can be an unreliable teaching signal on reasoning tasks. Instead of using the large model’s outputs, the researchers train lightweight “probes” that read the model’s internal signals, and use the probes’ predictions to teach the smaller model. This approach consistently boosts accuracy on reasoning tests, especially when there isn’t much training data. It also works without changing either model and adds only a small amount of extra processing.
More info: Read the pre-print, available online.
+++
4. A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behaviour
Paper: A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behaviour
Presenter: Justin Kang, UC Berkeley
Author: Harry Mayne
Senior author: Adam Mahdi
Workshop: Trustworthy AI: Interpretability, Robustness, and Safety Across Modalities, Monday 27 April 2026, 09.00–16.05 (BRT), Room 204 A/B.
Summary:
This study asks whether we can trust AI models when they explain their own decisions. While many people assume the answer is no, the researchers find that these explanations are more reliable than expected. In general, they do give useful clues about how the model reached its answer, helping users understand its reasoning.
More info: Read the pre-print, available online.
+++
5. LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
Paper: LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
Presenters: Karolina Korgul, Ryan Kearns
Authors: Jude Khouja*, Lingyi Yang, Karolina Korgul*, Simi Hellsten, Vlad A. Neacșu, Harry Mayne*, Ryan Kearns*, Andrew Bean*, Adam Mahdi*
* denotes OII authors
Senior author: Adam Mahdi
Session: Poster P3-#1509, Pavilion 3
Summary:
Benchmarks are used to measure how well LLMs perform specific tasks. However, performance is often hard to measure because parts of a benchmark have been seen by the model during training and are therefore memorised. In this paper, the researchers introduce LingOly-TOO, a reasoning benchmark specifically designed to avoid the influence of memorised information. LingOly-TOO offers a robust way to assess frontier language model performance at reasoning.
More info: Read the pre-print, available online. Website: https://oxrml.com/lingoly-too/
Meet OII researchers at ICLR 2026:
OII researchers will be available for interviews and commentary throughout the conference. Please contact the press office to pre-arrange times to speak to them.
Anthea Milnes, Head of Communications, or Sara Spinks / Veena McCoole, Media and Communications Managers
T: +44 (0)1865 280527
M: +44 (0)7551 345493
E: press@oii.ox.ac.uk