Do language models have an issue with gender?

Published on 9 Jun 2025
Written by Franziska Sofia Hafner
The Oxford Internet Institute’s Franziska Sofia Hafner explores whether language models are perpetuating gender stereotypes.

Language models are trained on billions of sentences, with data sourced from human-generated content including feminist blogs, corporate DEI statements, gossip sites and men’s rights Reddit threads. So, what does this mean for how gender is handled by AI?

The Oxford Internet Institute’s Franziska Sofia Hafner, along with her co-authors Dr Ana Valdivia, Departmental Research Lecturer in Artificial Intelligence, Government, and Policy, and Dr Luc Rocher, UKRI Future Leaders Fellow and senior research fellow, explores whether language models are perpetuating stereotypes. 

‘What is a woman?’ Early language models answered such questions with a range of misogynistic stereotypes. Modern language models refuse to give any answer at all. While this shift suggests progress, it raises the question: If computer scientists remove the worst associations, so that women are not ‘dumb’, ‘too emotional’, or ‘so dramatic’, is the issue of gender in language models fixed?

This is the question my co-authors, Dr Ana Valdivia and Dr Luc Rocher, and I asked ourselves in our recent study.

Language models are trained on billions of sentences, such as ‘women are the future’ from a feminist blog, ‘women are more likely to experience chronic pain’ from a health website, or ‘women are underrepresented in leadership’ from a corporate diversity statement. However, language models’ training data also contains text from men’s-rights Reddit threads, Andrew Tate’s YouTube comment sections, and tabloids sharing the latest celebrity gossip.

From all this data, language models can learn that a sentence beginning ‘women are…’ is likely to continue with sexist stereotypes. This is not a computer bug; it is part of the core mechanism through which language models learn to generate text.

AI developers have compelling reasons to build models which do not spew out awful stereotypes. Most importantly, AI-generated texts full of harmful stereotypes might be offensive to chatbot users or reinforce their pre-existing biases. Developers also have a pragmatic interest in attempting to fix their model’s bias problem, as instances of such text sparking outrage online can seriously harm their company’s reputation.

To stop the worst associations from surfacing in generated text, researchers have developed many smart techniques to debias, align, or steer these models. While the models still learn that ‘women are manipulative’ is a statistically solid prediction, these techniques can teach models not to say the quiet part out loud. Fundamentally, their internal representations of gender are still based on some of the worst stereotypes the internet has to offer, but at first glance these remain invisible in users’ everyday interactions.

In our recent work, we looked beyond the most overt sexist stereotypes to understand what concept of gender remains in language models. We ran experiments on 16 language models, including versions of GPT-2, Llama, and Mistral, and found that the concept of gender they learn is troubling. We found that all tested models learn a binary and essentialist concept of gender, and that these concepts become more ingrained as models get larger.

‘The person that has testosterone is…’, according to language models, most definitely ‘a man’. But as social scientists and biologists have long explained, the association between biological sex and gender is much more complex. A cis woman with polycystic ovary syndrome, a transgender woman, and an intersex person might all have elevated levels of testosterone, without this making them men. These complexities and nuances are not accounted for by language models.

‘The person that has testosterone is…’, in reality, might just as well be ‘nonbinary’, ‘genderqueer’, or ‘genderfluid’. Language models such as Mistral and Llama, however, are frequently less likely to autocomplete a sentence with these terms than with completely random words such as ‘windshield’ or ‘pepperoni’.
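For readers curious how such completion likelihoods can be measured, the sketch below uses the open-source Hugging Face transformers library to score candidate completions with the publicly available GPT-2 model. The scoring function, prompt, and word list are illustrative assumptions for this post, not the exact stimuli or code from the study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: score how probable a causal language model finds a
# continuation after a prompt by summing the log-probabilities of the
# continuation's tokens. Illustrative only; not the study's actual setup.
model_name = "gpt2"  # any causal language model on the Hugging Face Hub would work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`.

    Assumes the tokenisation of `prompt` is a prefix of the tokenisation of
    `prompt + continuation`, which holds here because the continuation starts
    with a space.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits          # shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The model's output at position pos - 1 is its prediction for the token at pos.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# Compare gender-diverse terms against unrelated control words, in the spirit
# of the comparison described above. Leading spaces keep the BPE tokenisation
# of each word aligned with the prompt.
prompt = "The person that has testosterone is"
for word in [" a man", " nonbinary", " genderqueer", " genderfluid", " windshield", " pepperoni"]:
    print(f"{word!r}: {continuation_logprob(prompt, word):.2f}")
```

Comparing the scores of ‘nonbinary’ or ‘genderfluid’ against control words like ‘windshield’ is the kind of relative measurement that can reveal how weakly a model associates a prompt with gender-diverse terms.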

When models do form associations with transgender and gender diverse identities, these can be concerning. We found that they consistently pathologize such identities by associating them with mental illnesses.

GPT-2, for example, is more likely to complete the sentence ‘the person who is genderqueer has…’ with ‘post-traumatic stress’ than the sentence ‘the person who is a man has…’. In contrast, it is more likely to complete the sentence ‘the person who is a man has…’ with ‘coronavirus’ than the sentence ‘the person who is genderqueer has…’.
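Reusing the continuation_logprob function from the sketch above, a comparison of this shape might look like the following; the prompts and condition terms are again illustrative rather than the paper’s exact stimuli.

```python
# Cross-prompt comparison: does the same condition score higher after an
# identity prompt than after another? Requires the definitions from the
# earlier sketch. Leading spaces keep BPE tokenisation aligned with the prompt.
prompts = ["The person who is genderqueer has", "The person who is a man has"]
conditions = [" post-traumatic stress", " coronavirus"]

for p in prompts:
    for c in conditions:
        print(f"{p!r} + {c!r}: {continuation_logprob(p, c):.2f}")
```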

We found such patterns, associating transgender and gender diverse identity terms with mental rather than physical conditions, across 110 illness-related terms and 16 language models. In an age where many switch from Dr. Google to Dr. Chat Bot to enquire about their ailments, this risks spreading misleading health information to users who might already face barriers to accessing appropriate care.

While modern language models might have been successfully ‘fixed’ to not blatantly blurt out sexist responses, our work shows that these fixes remain surface-level. The underlying concept of gender is still a binary and essentialist one that pathologizes deviations from the norm. In a world where questions such as ‘what is a woman?’ become increasingly politicized, we must advocate for models which encode a nuanced and inclusive vision of gender.

Read ‘Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory’ in full here. This research will be presented at the ACM Conference on Fairness, Accountability, and Transparency, taking place in Athens from June 23-26, 2025. 
