
Detecting East Asian Prejudice on Social Media


Published on 28 May 2020. Written by Bertram Vidgen.

Researchers from the Oxford Internet Institute have contributed to research led by The Alan Turing Institute’s Hate Speech: Measures & Counter-measures project to create a tool that uses deep learning to detect East Asian prejudice on social media [1]. The tool is available open source, along with the training dataset and annotation codebook. It can be used immediately for research into the prevalence, causes and dynamics of East Asian prejudice online and could help with moderating such content. You can find the paper describing the methodology and results on arXiv.

COVID-19 has not only inflicted massive health costs; it has also amplified myriad social hazards, from online grooming to gambling addiction and domestic abuse. In particular, the United Nations High Commissioner for Human Rights has warned that the pandemic may drive more discrimination, calling on nations to combat all forms of prejudice. Anecdotal evidence points to numerous instances of physical attacks against East Asians and other forms of abuse. Researchers at Cardiff University’s HateLab have already identified three types of online hate that are rising during COVID-19: anti-Asian prejudice, anti-Semitism and Islamophobia. Worryingly, social media platforms may not have the right processes and infrastructure in place to adequately safeguard against the increased risk of online harm, potentially making online spaces deeply unpleasant and even dangerous.

Initial evidence suggests that there has been an uptick in the amount of prejudice against East Asia. Research from the iDrama lab shows a substantial increase in Sinophobic language on niche social media platforms, such as 4chan. Moonshot CVE analysed more than 600 million tweets and found that 200,000 contained either Sinophobic hate speech or conspiracy theories, and identified a 300% increase in hashtags that support or encourage violence against China during a single week in March 2020. East Asian prejudice has also been linked to the spread of COVID-19 health-related misinformation. In March 2020, the polling company YouGov found that 1 in 5 Brits believed the conspiracy theory that the coronavirus was developed in a Chinese lab.

To fully understand the spread of online harms and to develop appropriate counter-measures, we need robust and scalable ways of measuring them. We have developed a tool that uses machine learning to distinguish between content that expresses Hostility against East Asia, Criticism of East Asia, Discussion of East Asian Prejudice, or none of these (Neutral). The classifier achieves an F1 score of 0.83 across all four classes and can be deployed immediately. For a full overview of how it was created, and to access the final model and training dataset, please see our pre-print and follow the links to the data repository. Austin Botelho, a Master’s student on the OII’s Social Data Science course, implemented the machine learning used for the tool, which is based on a state-of-the-art contextual word embeddings model.
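The headline number above is an F1 score averaged across the four classes. As a minimal sketch of what such a figure means (assuming macro averaging, which is a common choice for multi-class hate-speech evaluation; the pre-print gives the exact definition used), the per-class F1 can be computed in plain Python. The label names and toy data below are illustrative only, not the project’s actual dataset:

```python
# Illustrative macro-averaged F1 for a four-class classifier.
# Label names mirror the categories described in the post; the
# example predictions are made up for demonstration purposes.

LABELS = ["hostility", "criticism", "discussion", "neutral"]

def macro_f1(y_true, y_pred):
    """Compute the unweighted mean of per-class F1 scores (macro F1)."""
    per_class_f1 = []
    for label in LABELS:
        # Count true positives, false positives and false negatives
        # for this one class, treating it as a binary problem.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        per_class_f1.append(f1)
    # Macro averaging weights every class equally, so rare classes
    # (e.g. explicit hostility) count as much as the neutral majority.
    return sum(per_class_f1) / len(LABELS)

# Toy example: one tweet per class, with one misclassification.
gold = ["hostility", "criticism", "discussion", "neutral"]
pred = ["hostility", "hostility", "discussion", "neutral"]
print(round(macro_f1(gold, pred), 3))
```

Macro averaging matters here because prejudiced content is typically a small fraction of all tweets; a micro-averaged or accuracy-style figure would be dominated by the Neutral class.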

Social media is one of the most important battlegrounds in the fight against social hazards during COVID-19. As life moves increasingly online, it is crucial that social media platforms and other online spaces remain safe, accessible and free from abuse—and that people’s fears and distress during this time are not exploited and social tensions stirred up. We hope this new tool and dataset enable better understanding of East Asian prejudice and support the creation of effective counter-measures.

[1] This work was funded through the Hate Speech: Measures & Counter-Measures project in the Criminal Justice Theme of The Alan Turing Institute under Wave 1 of The UKRI Strategic Priorities Fund, EPSRC Grant EP/T001569/1.

It was a collaboration between researchers from The Alan Turing Institute, the Oxford Internet Institute, the George Washington University, the University of Surrey and the University of Sheffield. The full list of authors is provided in the paper.

If you have any questions or comments, please contact the lead author and OII visiting researcher, Dr Bertie Vidgen.

Authors: Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, Scott Hale
