Do language models have an issue with gender?

Published on 9 Jun 2025
Written by Franziska Sofia Hafner
The Oxford Internet Institute’s Franziska Sofia Hafner explores whether language models are perpetuating gender stereotypes.

Language models are trained on billions of sentences, with data sourced from human-generated content including feminist blogs, corporate DEI statements, gossip sites and men’s rights Reddit threads. So, what does this mean for how gender is handled by AI?

The Oxford Internet Institute’s Franziska Sofia Hafner, along with her co-authors Dr Ana Valdivia, Departmental Research Lecturer in Artificial Intelligence, Government, and Policy, and Dr Luc Rocher, UKRI Future Leaders Fellow and senior research fellow, explores whether language models are perpetuating stereotypes. 

‘What is a woman?’ Early language models answered such questions with a range of misogynistic stereotypes. Modern language models refuse to give any answer at all. While this shift suggests progress, it raises the question: If computer scientists remove the worst associations, so that women are not ‘dumb’, ‘too emotional’, or ‘so dramatic’, is the issue of gender in language models fixed?

This is the question my co-authors, Dr Ana Valdivia and Dr Luc Rocher, and I asked ourselves in our recent study.

Language models are trained on billions of sentences, such as ‘women are the future’ from a feminist blog, ‘women are more likely to experience chronic pain’ from a health website, or ‘women are underrepresented in leadership’ from a corporate diversity statement. However, language models’ training data also contains text from men’s-rights Reddit threads, Andrew Tate’s YouTube comment sections, and tabloids sharing the latest celebrity gossip.

From all this data, language models can learn that a sentence beginning ‘women are…’ is likely to continue with sexist stereotypes. This is not a computer bug; it is part of the core mechanism through which language models learn to generate text.

AI developers have compelling reasons to build models which do not spew out awful stereotypes. Most importantly, AI-generated texts full of harmful stereotypes might be offensive to chatbot users or reinforce their pre-existing biases. Developers also have a pragmatic interest in attempting to fix their model’s bias problem, as instances of such text sparking outrage online can seriously harm their company’s reputation.

To stop the worst associations from surfacing in generated text, researchers have developed many smart techniques to debias, align, or steer these models. While the models still learn that ‘women are manipulative’ is a statistically solid prediction, these techniques can teach models not to say the quiet part out loud. Fundamentally, their internal representations of gender are still based on some of the worst stereotypes the internet has to offer, but at first glance these remain invisible in users’ everyday interactions.

In our recent work, we looked beyond the most overt sexist stereotypes to understand what concept of gender remains in language models. We ran experiments on 16 language models, including versions of GPT-2, Llama, and Mistral, and found that the concept of gender they learn is troubling. We found that all tested models learn a binary and essentialist concept of gender, and that these concepts become more ingrained as models get larger.

‘The person that has testosterone is…’, according to language models, most definitely ‘a man’. But as social scientists and biologists have long explained, the association between biological sex and gender is much more complex. A cis woman with polycystic ovary syndrome, a transgender woman, and an intersex person might all have elevated levels of testosterone, without this making them men. These complexities and nuances are not accounted for by language models.

‘The person that has testosterone is…’, in reality, might just as well be ‘nonbinary’, ‘genderqueer’, or ‘genderfluid’. Language models such as Mistral and Llama, however, are frequently less likely to autocomplete a sentence with these terms than with completely random words such as ‘windshield’ or ‘pepperoni’.
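For readers curious how such completion likelihoods can be measured, the sketch below uses the open-source Hugging Face transformers library to score candidate completions with the publicly available GPT-2 model. The scoring function, prompt, and word list are illustrative assumptions for this post, not the exact stimuli or code from the study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: score how probable a causal language model finds a
# continuation after a prompt by summing the log-probabilities of the
# continuation's tokens. Illustrative only; not the study's actual setup.
model_name = "gpt2"  # any causal language model on the Hugging Face Hub would work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`.

    Assumes the tokenisation of `prompt` is a prefix of the tokenisation of
    `prompt + continuation`, which holds here because the continuation starts
    with a space.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits          # shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The model's output at position pos - 1 is its prediction for the token at pos.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# Compare gender-diverse terms against unrelated control words, in the spirit
# of the comparison described above. Leading spaces keep the BPE tokenisation
# of each word aligned with the prompt.
prompt = "The person that has testosterone is"
for word in [" a man", " nonbinary", " genderqueer", " genderfluid", " windshield", " pepperoni"]:
    print(f"{word!r}: {continuation_logprob(prompt, word):.2f}")
```

Comparing the scores of ‘nonbinary’ or ‘genderfluid’ against control words like ‘windshield’ is the kind of relative measurement that can reveal how weakly a model associates a prompt with gender-diverse terms.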

When models do form associations with transgender and gender diverse identities, these can be concerning. We found that they consistently pathologize such identities by associating them with mental illnesses.

GPT-2, for example, is more likely to complete the sentence ‘the person who is genderqueer has…’ with ‘post-traumatic stress’ than the sentence ‘the person who is a man has…’. In contrast, it is more likely to complete the sentence ‘the person who is a man has…’ with ‘coronavirus’ than the sentence ‘the person who is genderqueer has…’.
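Reusing the continuation_logprob function from the sketch above, a comparison of this shape might look like the following; the prompts and condition terms are again illustrative rather than the paper’s exact stimuli.

```python
# Cross-prompt comparison: does the same condition score higher after an
# identity prompt than after another? Requires the definitions from the
# earlier sketch. Leading spaces keep BPE tokenisation aligned with the prompt.
prompts = ["The person who is genderqueer has", "The person who is a man has"]
conditions = [" post-traumatic stress", " coronavirus"]

for p in prompts:
    for c in conditions:
        print(f"{p!r} + {c!r}: {continuation_logprob(p, c):.2f}")
```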

We found such patterns, associating transgender and gender diverse identity terms with mental rather than physical conditions, across 110 illness-related terms and 16 language models. In an age where many switch from Dr. Google to Dr. Chat Bot to enquire about their ailments, this risks spreading misleading health information to users who might already face barriers to accessing appropriate care.

While modern language models might have been successfully ‘fixed’ to not blatantly blurt out sexist responses, our work shows that these fixes remain surface-level. The underlying concept of gender is still a binary and essentialist one that pathologizes deviations from the norm. In a world where questions such as ‘what is a woman?’ become increasingly politicized, we must advocate for models which encode a nuanced and inclusive vision of gender.

Read ‘Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory’ in full here. This research will be presented at the ACM Conference on Fairness, Accountability, and Transparency, taking place in Athens from June 23-26, 2025. 
