Researchers from the Oxford Internet Institute at the University of Oxford will be at NeurIPS 2025 in San Diego from 1–7 December 2025. They will take part in presentations, poster sessions and workshops looking at how AI systems are measured and compared, how fairness can be improved in generative models, and how multilingual and community-centred perspectives can strengthen the design and oversight of AI.

Featured OII research and events:
1. Measuring What Matters
Paper: Measuring What Matters: Construct Validity in Large Language Model Benchmarks
Presenters: Ryan Kearns, Adam Mahdi, Franziska Sofia Hafner
Lead authors: Andrew M. Bean, Ryan Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne
Senior authors: Adam Mahdi, Luc Rocher
Session: Thu 4 Dec 11am – 2pm PST, Exhibit Hall C, D, E #107
Summary:
This study examines the scientific robustness of 445 AI benchmarks, the standardised evaluations used to compare and rank AI systems. The researchers found that many of these benchmarks are built on unclear definitions or weak analytical methods, making it difficult to draw reliable conclusions about AI progress, capabilities or safety. The paper makes recommendations for better benchmarking.
More info: See our previously distributed press release: Study identifies weaknesses in how AI systems are evaluated
2. FairImagen: Bias Mitigation in Text-to-Image Models
Paper: FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models
Authors: Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin Delaney, Chris Russell
Poster session 4: Thu 4 Dec 4:30 – 7:30pm PST, Exhibit Hall C, D, E #1208
Summary:
FairImagen introduces a post-processing technique that reduces demographic bias in image generation systems without requiring model retraining. It improves fairness across gender and race while maintaining realism and contextual accuracy. Read the preprint.
3. Evaluating LLM-as-a-Judge under Multilingual, Multimodal Settings
Paper: Evaluating LLM-as-a-Judge under Multilingual, Multimodal Settings
Authors: Shreyansh Padarha, Elizaveta Semenova, Bertie Vidgen, Adam Mahdi, Scott A. Hale
Workshop: Evaluating the Evolving LLM Lifecycle, 8am – 5pm, 7 December, Upper Level Room 2
Summary:
This paper introduces PolyVis, a new benchmark to assess “judge” models — large language models used to evaluate the performance of other models — across 12 languages and multimodal tasks combining vision and language. The findings reveal that LLM-judge performance depends not just on scale or data but on complex interactions between language, modality, and task type. The authors argue for tailored, context-aware evaluation frameworks to capture where models succeed or fail.
Find out more and read the paper.
4. Queer in AI Poster Session and Workshop
Poster session: 6pm – 9pm, 2 December, Hall C
Workshop: 9am – 5pm, 4 December, Upper Level Room 32AB
Convenor: Franziska Sofia Hafner
Summary:
The Queer in AI workshop is dedicated to advancing equitable AI practices and contributing to a future where technology serves the needs of all communities. It advocates for a more ethically grounded approach to AI development, ensuring that diverse voices are heard and valued in conversations about the future of technology. Find out more about the workshop.
Meet OII researchers at NeurIPS
OII researchers will be available for interviews and commentary throughout the conference. Please contact the press office to pre-arrange times to speak to them.
Anthea Milnes, Head of Communications, or Sara Spinks / Veena McCoole, Media and Communications Manager
T: +44 (0)1865 280527
M: +44 (0)7551 345493
E: press@oii.ox.ac.uk