About
Harry is a second-year DPhil student at the OII. His research investigates the limitations of large language model alignment algorithms. Using techniques from mechanistic interpretability and representation engineering, Harry explores how alignment algorithms steer model behaviour and the extent to which they fundamentally alter model capabilities. His work is motivated by the challenge of creating AI systems that are customisable, explainable, and safe.
In addition to his primary research, Harry is interested in mapping the broader capabilities of language models and has recently worked on developing benchmarks for LLM reasoning.
Prior to his DPhil, Harry completed an MSc in Social Data Science at the OII, where he was awarded the prize for best overall thesis. He also holds a BA in Economics from the University of Cambridge.
Research Interests
Artificial Intelligence; Machine Learning; NLP; Mechanistic Interpretability; Representation Engineering; LLM Alignment; LLM Evaluations and Benchmarks.