Reasoning with Machines AI Lab

The challenge

Artificial intelligence (AI), and large language models (LLMs) in particular, are revolutionising industries ranging from healthcare to customer service, but understanding their behaviour and ensuring their reliability and fairness are crucial.

At the Reasoning with Machines AI Lab we focus on benchmarking and evaluating AI systems, with a special emphasis on LLMs. We seek to answer critical questions such as: How can we accurately assess the capabilities of AI systems? How do these systems interact with humans? And how can we interpret their internal workings?

Our research

Driven by the need for transparency and trust in AI, the Reasoning with Machines AI Lab is advancing the benchmarking and evaluation of large language models to unlock their potential across various domains. Our approach combines human-computer interaction (HCI) with mechanistic interpretability. By both studying how humans engage with AI systems and exploring the inner mechanics of these models, we aim to develop a deep, nuanced understanding of their functioning and limitations. This dual focus allows us to:

  • Evaluate the reasoning capabilities of AI models, ensuring that they align with human expectations and can perform tasks effectively.
  • Investigate the practical applications of AI in domains like healthcare, where trust and transparency are paramount, and human-computer interaction, where user experience and model reliability are essential.

Our current research areas include:

  • Benchmarking AI reasoning abilities. Current benchmarks often fail to capture the complexities of real-world reasoning. We design new evaluation methods and datasets to measure AI performance across logical, mathematical and common sense reasoning tasks.
  • Understanding failure modes in LLMs. Despite their impressive capabilities, LLMs struggle with consistency, logical coherence and robustness in reasoning-heavy tasks. We analyse these failure points and develop strategies to mitigate them.
  • Human-AI reasoning interactions. AI models increasingly assist humans in decision-making, but their reasoning processes are often opaque. We study how humans interpret and rely on AI-generated reasoning, aiming to make these interactions more transparent and trustworthy.
  • Dynamic evaluation frameworks. As AI systems evolve, traditional benchmarks quickly become obsolete. We explore new methodologies, such as dynamic benchmarking and adversarial testing, to ensure AI reasoning is assessed in a rigorous and meaningful way.

We cultivate a collaborative research environment that encourages innovation and learning for postgraduate students. Through peer learning, code sharing, and hands-on training in data science and STEM fields, we help them acquire critical skills more quickly and engage with cutting-edge research in a dynamic, team-oriented setting.

Our impact

Our work is shaping the future of AI by ensuring that large language models are not only powerful but also trustworthy and understandable. Our research supports the development of AI systems that are capable of more accurate reasoning, fairer interactions, and greater utility in high-stakes environments such as healthcare.

At the same time, our commitment to fostering an inclusive and collaborative research environment ensures that the next generation of researchers is well-prepared to tackle the challenges of AI. By mentoring students and creating a space for knowledge-sharing and skill development, we are building a strong foundation for future AI research and innovation.

Our team

Visiting team members

Sebastian Petric

Visiting Policy Fellow
