Hannah is a fourth-year DPhil student in Social Data Science at the OII and a Research Scientist at the UK AI Security Institute.
Her research explores how to align AI systems with the values and preferences of diverse human populations, as well as the threats to human autonomy from the social and psychological capabilities of frontier AI.
Her published work spans computational linguistics, computer vision, ethics and sociology, addressing issues such as AI safety and security, alignment, bias, fairness, and hate speech from a multidisciplinary perspective. In the past year, her contributions earned a Best Paper Award at NeurIPS 2024 for research on human preference alignment and an Outstanding Paper Award at ACL 2024 for co-authored work on political bias in AI systems.
Hannah holds degrees from the University of Oxford (MSc, Distinction), the University of Cambridge (BA, Double First Class Honours) and Peking University (MA).
Alongside academia, she collaborates on industry projects with Google, OpenAI and Meta AI, and previously worked as a Data Scientist in the online safety team at The Alan Turing Institute.
Artificial Intelligence; Machine Learning; NLP; Active Learning; Adversarial Learning; Online Harms; Hate Speech
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. Published in Advances in Neural Information Processing Systems (NeurIPS 2024). Received a Best Paper Award.
20 June 2025
OII researchers are set to attend the Association for Computing Machinery (ACM) Conference on Fairness, Accountability and Transparency (FAccT) 2025.
6 December 2024
Several researchers and DPhil students from the Oxford Internet Institute, University of Oxford, will head to Vancouver for the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS) from 10-15 December 2024.
23 April 2024
Personalisation has the potential to democratise who decides how LLMs behave, but comes with risks for individuals and society, say Oxford researchers.
12 March 2024
Oxford Internet Institute researchers are studying how to make Large Language Models (LLMs) more diverse by broadening the human feedback that informs them.
Washington Post, 31 May 2025
Tactics used to make AI tools more engaging might be reinforcing harmful ideas.
Sky News, 03 December 2021
Spotify removes nearly 150 hours of content that it said violated its hate policy after Sky News reported it.
Sky News, 13 August 2021
Harmful posts can end up being missed altogether while acceptable posts are mislabelled as offensive, according to the Oxford Internet Institute.