21 May 2025

Oxford researchers reveal how AI language models encode a flawed and binary understanding of gender, posing significant risks for transgender, nonbinary, and even cisgender individuals.

AI language models are developing a flawed understanding of gender, leading to stereotypical associations that could result in harmful discrimination, finds research from the Oxford Internet Institute at the University of Oxford.

The researchers warn that in healthcare, where AI is increasingly integrated into health technologies, these flawed assumptions, which are often based on a model’s conflation of gender and biological sex characteristics, could lead to inaccurate advice and misdiagnoses.

For example, an AI model that learns a rigid association between ‘woman’ and biological markers like ‘uterus’ or ‘estrogen’ could provide irrelevant or even harmful advice to a transgender woman. This narrow view could also misinterpret the needs of cisgender women whose health profiles differ from typical reproductive assumptions, such as those who are postmenopausal or have undergone a hysterectomy, say the researchers.

The study is the first to develop a robust framework for examining how language models construct gender, and it applies that framework to 16 AI language models. It reveals their fundamental limitations in understanding gender, often defaulting to a restrictive, biologically tied, and binary view. These limitations have broad implications for both cisgender heterosexual people and the LGBTQIA+ community.

The research has been accepted for publication at the ACM Conference on Fairness, Accountability, and Transparency (FAccT).

Key findings:

Language models make problematic gender–illness connections: Across 110 illnesses evaluated, models tend to create problematic associations when given different gender identity labels. For example, many models systematically associate physical illnesses with men and mental illnesses with trans and gender-diverse identities, and to a lesser degree with women too. Some models treat physical illnesses, such as ‘coronavirus’ or ‘parasitic worm infections,’ as unlikely for trans and gender-diverse identities. This raises concerns about ‘diagnostic overshadowing,’ where models might incorrectly flag physical health issues as mental health concerns for these individuals.

Language models encode a binary, biologically tied view of gender: Language models predominantly define gender in rigid male/female terms and directly link it to biological sex characteristics. This reflects stereotypes prevalent in internet training data, rather than the diversity of lived human experiences.

Trans and nonbinary identities are often erased or misrecognised: Language models rarely choose terms like ‘nonbinary’ or ‘transgender’ when predicting gender – they mostly choose ‘man’ or ‘woman’. Some models treat terms like ‘nonbinary’ or ‘genderqueer’ as less likely than non-human objects like ‘windscreen’, suggesting a fundamental failure to recognise these as valid human identities.

Model size amplifies bias: Contrary to some expectations, the study found that larger, more powerful models often learn stronger and more rigid associations between gender and sex characteristics.

Lead author, Franziska Sofia Hafner, Researcher at the Oxford Internet Institute, said: “If language models are going to be used in healthcare, either built into diagnostics to help doctors make decisions or as self-help tools for individuals, their limited and biased understanding of gender could introduce significant discriminatory harm.”

In their study, the researchers evaluated associations between gendered and sexed words, as well as associations between gendered words and physical or mental illnesses. They tested 16 language models based on GPT, RoBERTa, T5, Llama, and Mistral.
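To make the idea of a word association concrete, the sketch below shows one generic way such a comparison could be set up. It is an illustration only and is not the researchers’ published method: the function `association_scores`, the `log_prob` scorer, and the placeholder names `label_a`, `label_b`, `term_x`, and `term_y` are all assumptions introduced here, and the probabilities are arbitrary toy values, not results from the study. In practice the scorer would be backed by a language model’s own word probabilities.

```python
import math
from typing import Callable, Dict, List


def association_scores(
    labels: List[str],
    targets: List[str],
    log_prob: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """For each target word, report how much more (positive) or less
    (negative) likely it is under each label than on average across
    all labels, measured in natural-log units."""
    scores: Dict[str, Dict[str, float]] = {}
    for target in targets:
        per_label = {label: log_prob(label, target) for label in labels}
        mean = sum(per_label.values()) / len(per_label)
        scores[target] = {label: lp - mean for label, lp in per_label.items()}
    return scores


# Hypothetical stand-in for a language model: arbitrary toy probabilities
# chosen only to exercise the function. They are NOT findings of the study.
TOY_PROBABILITIES = {
    ("label_a", "term_x"): 0.20,
    ("label_b", "term_x"): 0.10,
    ("label_a", "term_y"): 0.05,
    ("label_b", "term_y"): 0.05,
}


def toy_log_prob(label: str, target: str) -> float:
    """Return the log of the toy probability for a (label, target) pair."""
    return math.log(TOY_PROBABILITIES[(label, target)])


scores = association_scores(
    labels=["label_a", "label_b"],
    targets=["term_x", "term_y"],
    log_prob=toy_log_prob,
)

for target, per_label in scores.items():
    for label, score in per_label.items():
        print(f"{target} given {label}: {score:+.3f}")
```

A score near zero means the word is about equally likely under every label, while a large positive or negative score indicates a skewed association of the kind an audit would flag for closer inspection.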

Language models are known to perpetuate stereotypes present in their training data, and developers typically respond by auditing for bias and applying filters. This study highlights deeper issues in how models internalise and reproduce social norms and stereotypes based on language.

Co-author, Dr Ana Valdivia, Lecturer in AI, Government and Policy at the Oxford Internet Institute, said: “Our academic community is aware of the social biases reproduced by algorithmic models. With the emergence of a new generation of AI systems, such as language models, these biases have not been mitigated; rather, they continue to amplify stereotypical representations. We advocate for stronger accountability mechanisms.”

Co-author, Dr Luc Rocher, Senior Research Fellow at the Oxford Internet Institute, said: “Our findings reveal a troubling trend where larger models, despite performing better on many benchmarks, actually encode a more rigid and essentialising view of gender. This challenges the notion that simply scaling up AI will lead to more nuanced or fair outcomes. Instead, these fundamental biases risk becoming more deeply ingrained.

“Current AI models are largely learning gender from the Internet, and the results are predictably problematic. Fixing AI’s gender problem is not just about tweaking algorithms. We need a concerted approach, from curating better training datasets to building standards and robust public oversight, to ensure these new tools stop amplifying old prejudices.”

The study, ‘Gender trouble in language models: an empirical audit guided by gender performativity theory’ by Franziska Sofia Hafner, Ana Valdivia, and Luc Rocher of the Oxford Internet Institute, will be available as a postprint on arXiv from 21 May. It will be formally published in the peer-reviewed proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT). The conference will be held from 23 to 26 June in Athens, Greece.

