
How we can better align Large Language Models with diverse humans


Published on 12 Mar 2024
Written by Hannah Rose Kirk and Scott A. Hale

Oxford Internet Institute researchers are studying how to make Large Language Models (LLMs) more diverse by broadening the human feedback that informs them.

LLMs are now widely used: they underpin online search engines, power retailers' customer service, and feature in education and the workplace. With this use set to accelerate even further, it is important that LLMs are not biased towards one group's worldview at the expense of others.

The researchers reviewed 95 academic papers and found that human feedback used to tailor the behaviour of LLMs traditionally comes from small groups of people who don’t necessarily represent larger populations.

As a result, says Oxford Internet Institute researcher Hannah Rose Kirk, LLMs become tailored to the preferences and values of “an incredibly narrow subset of the population that ends up using those models.”

That can mean, for example, that if you ask a language model to help plan a wedding, you’re more likely to get information about a stereotypical Western wedding: a big white dress, roses and the like (a finding reported in a 2021 OpenAI study).

“Imagine a model that could learn a bit more from an individual or sociocultural context and adapt its assistance to helping you plan your wedding, and at least to know that weddings look different for different people, and it should ask a follow-up question to work out which path to go down,” says Kirk.

As part of the PRISM alignment project, Kirk and Dr Scott A. Hale, Associate Professor and Senior Research Fellow, Oxford Internet Institute, surveyed 1,500 people from 75 countries about how often they use generative language models and what behaviours from those models — such as reflecting their values or being factual and honest — are important to them. The same respondents then had a series of live conversations with the models and rated their responses, giving feedback on aspects of the dialogue they did and did not like.
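
To make the kind of data this produces concrete, below is a minimal, purely illustrative sketch in Python of what a record in a cross-cultural human-feedback dataset might look like, and how ratings could be compared across rater groups. The field names, example rows and rating scale are assumptions for illustration only; they are not the actual PRISM schema.

```python
# Hypothetical sketch of a human-feedback record and a toy aggregation.
# Field names and the 1-5 rating scale are illustrative assumptions,
# not the PRISM dataset's real structure.
from dataclasses import dataclass
from statistics import mean
from collections import defaultdict

@dataclass
class FeedbackRecord:
    prompt: str
    model_response: str
    rating: int                   # hypothetical 1 (poor) to 5 (excellent) scale
    rater_country: str
    valued_behaviours: list[str]  # e.g. ["factuality", "reflects my values"]

records = [
    FeedbackRecord(
        prompt="Help me plan a wedding.",
        model_response="Start by booking a church and choosing a big white dress.",
        rating=2,
        rater_country="IN",
        valued_behaviours=["reflects my values"],
    ),
    FeedbackRecord(
        prompt="Help me plan a wedding.",
        model_response="Weddings vary a lot between cultures. What traditions matter to you?",
        rating=5,
        rater_country="IN",
        valued_behaviours=["reflects my values"],
    ),
]

# Toy aggregation: average rating per rater country, to surface whether the
# model serves some groups of raters better than others.
by_country = defaultdict(list)
for record in records:
    by_country[record.rater_country].append(record.rating)

for country, ratings in sorted(by_country.items()):
    print(f"{country}: mean rating {mean(ratings):.1f}")
```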

The findings from these conversations are now being used to fine-tune language models to make them more diverse and representative, with support from Microsoft’s Accelerating Foundation Models Research (AFMR) program. The researchers will make the PRISM Alignment dataset available to others studying this area and hope the work will encourage more inclusive language models.

“People don’t want to feel like this technology was made for others and not for me,” Hale says. “In terms of the utility that someone can receive from it — if they can use it as a conversational agent to reason through something or help make a decision, to look something up or access content — that benefit should be equally distributed across society.

“Having more people at the table, having more perspectives represented, ultimately leads to better technology, and it leads to technology that will raise everyone up,” he says.

Find out more about the PRISM project at the Oxford Internet Institute.

Find out more about the work of the Oxford researchers behind the project: DPhil student Hannah Rose Kirk and Dr Scott A. Hale, Associate Professor and Senior Research Fellow at the Oxford Internet Institute.

Find out more about Microsoft’s Accelerating Foundation Models Research (AFMR) grant program, which supports a range of AI-related projects from astronomy to education.
