
How we can better align Large Language Models with diverse humans


Published on 12 Mar 2024
Written by Hannah Rose Kirk and Scott A. Hale
Oxford Internet Institute researchers are studying how to make Large Language Models (LLMs) more diverse by broadening the human feedback that informs them.


LLMs are now widely used: they support online search engines, are used by retailers for customer service, and feature in education and the workplace. With this use set to accelerate even further, it is important that LLMs are not biased towards representing one group’s worldview at the expense of others.

The researchers reviewed 95 academic papers and found that human feedback used to tailor the behaviour of LLMs traditionally comes from small groups of people who don’t necessarily represent larger populations.

That can mean, for example, that if you ask a language model to help plan a wedding, you’re more likely to get information about a stereotypical Western wedding — big white dress, roses and the like (that finding was part of a 2021 study from OpenAI).

“Imagine a model that could learn a bit more from an individual or sociocultural context and adapt its assistance to helping you plan your wedding, and at least to know that weddings look different for different people, and it should ask a follow-up question to work out which path to go down,” says Kirk.

To diversify who decides how LLMs behave, DPhil student Hannah Rose Kirk and supervisors Dr Scott A. Hale and Dr Bertie Vidgen launched the PRISM alignment project, a new resource for the AI community to understand how humans perceive and interact with LLMs. As part of this project, the researchers surveyed 1,500 people from 75 countries about how often they use generative language models and what behaviours from those models, such as reflecting their values or being factual and honest, are important to them. The same participants then had a series of live conversations with over 20 different models and rated their outputs, giving feedback on aspects of the dialogue they did and did not like. Overall, the researchers collected over 8,000 conversations, amassing more than 68,000 AI outputs scored by diverse humans from around the world.

Figure: flow chart showing participants rating LLMs

Using this dataset, the researchers find evidence that it matters which humans are asked for feedback on LLM behaviour. For example, different LLMs rank differently when only participants from the US are included versus only participants from Europe. The content of the conversation also matters: some LLMs perform comparatively worse when questioned about controversial or value-centric topics than when prompted with more neutral or professional tasks.
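
As a rough illustration of this kind of subgroup comparison (a minimal sketch, not the exact method used in the paper), the snippet below ranks models by their mean rating within two groups of participants. The table and its column names (`model`, `participant_region`, `score`) are hypothetical stand-ins rather than the actual PRISM schema.

```python
import pandas as pd

# Hypothetical ratings table: one row per model response scored by a participant.
# Column names are illustrative, not the real PRISM schema.
ratings = pd.DataFrame({
    "model": ["model_a", "model_a", "model_b", "model_b", "model_c", "model_c"],
    "participant_region": ["US", "Europe", "US", "Europe", "US", "Europe"],
    "score": [78, 62, 55, 81, 70, 64],
})

def rank_models(df: pd.DataFrame, region: str) -> pd.Series:
    """Rank models by mean score among participants from one region."""
    subset = df[df["participant_region"] == region]
    return (subset.groupby("model")["score"]
                  .mean()
                  .rank(ascending=False, method="min")
                  .sort_values())

print("US ranking:\n", rank_models(ratings, "US"))
print("Europe ranking:\n", rank_models(ratings, "Europe"))
```

Comparing the two rankings side by side is one simple way to see whether the choice of feedback population changes which model appears "best".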

Commenting on the results, DPhil researcher and lead author Hannah Rose Kirk, Oxford Internet Institute, says: “Our findings suggest it matters who and how many people are in the seat of power when collecting alignment datasets like ours. As LLMs scale across more diverse populations of users, we hope that PRISM can promote a science of human feedback learning which is robust, transparent and inclusive.”

Dr Bertie Vidgen adds: “PRISM shows that you shouldn’t ask ‘Is your model aligned?’ but instead ‘Who has it been aligned to?’ and ‘How has it been aligned?’ This exciting work has huge implications for how we think about the values, safety, and design of models.”

The results from these conversations are now being used to fine-tune language models to make them more diverse and representative, with support from Microsoft’s Accelerating Foundation Models Research (AFMR) programme. The researchers have made the PRISM Alignment dataset available to others studying this area and hope the work encourages the development of more inclusive language models.

“People don’t want to feel like this technology was made for others and not for me,” Hale says. “In terms of the utility that someone can receive from it — if they can use it as a conversational agent to reason through something or help make a decision, to look something up or access content — that benefit should be equally distributed across society.”

“Having more people at the table, having more perspectives represented, ultimately leads to better technology, and it leads to technology that will raise everyone up,” he says.

A preprint describing the dataset and findings, “The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models”, is available at https://arxiv.org/abs/2404.16019.

The dataset can be downloaded at: https://doi.org/10.57967/hf/2113
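
For readers who want to explore the data, the sketch below shows one way to load it with the Hugging Face `datasets` library. The repository id `HannahRoseKirk/prism-alignment` is an assumption based on the DOI above and should be checked against the dataset card, as should the names of the available subsets.

```python
from datasets import get_dataset_config_names, load_dataset

# Repository id assumed from the DOI landing page; verify against the dataset card.
REPO_ID = "HannahRoseKirk/prism-alignment"

# PRISM ships several tables (e.g. the survey responses and the rated conversations),
# so list the available configurations rather than hard-coding one.
configs = get_dataset_config_names(REPO_ID)
print("Available subsets:", configs)

# Load the first subset and inspect its splits and columns.
dataset = load_dataset(REPO_ID, configs[0])
print(dataset)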

PRISM was supported by a variety of funders, including Microsoft’s Accelerating Foundation Models Research (AFMR) grant program, which supports a range of AI-related projects from astronomy to education, and Meta AI’s Dynabench Grants for optimising human-and-model-in-the-loop feedback. A full acknowledgement and disclosure of funding statement can be found in the paper.

Find out more about the PRISM project at the Oxford Internet Institute.

Find out more about the work of Oxford researchers, DPhil student Hannah Rose Kirk and Dr Scott A. Hale, Associate Professor and Senior Research Fellow, Oxford Internet Institute.

Find out more about Microsoft’s Accelerating Foundation Models Research (AFMR) grant program, which supports a range of AI-related projects from astronomy to education.
