Researchers from the Oxford Internet Institute have contributed to research led by The Alan Turing Institute’s Hate Speech: Measures & Counter-measures project to create a tool that uses deep learning to detect East Asian prejudice on social media [1]. The tool is available open source, along with the training dataset and annotation codebook. It can be used immediately for research into the prevalence, causes and dynamics of East Asian prejudice online and could help with moderating such content. You can find the paper describing the methodology and results on arXiv.

COVID-19 has not only inflicted massive health costs but has also amplified myriad social hazards, from online grooming to gambling addiction and domestic abuse. In particular, the United Nations High Commissioner for Human Rights has warned that the pandemic may drive more discrimination, calling for nations to combat all forms of prejudice. Anecdotal evidence shows numerous instances of physical attacks against East Asians and other forms of abuse. Researchers at Cardiff University’s HateLab have already identified three types of online hate that are rising during COVID-19: anti-Asian prejudice, anti-Semitism and Islamophobia. Worryingly, social media platforms may not have the right processes and infrastructure in place to adequately safeguard against the increased risk of online harm, potentially making online spaces deeply unpleasant and even dangerous.

Initial evidence suggests that there has been an uptick in the amount of prejudice against East Asia. Research from the iDrama lab shows a substantial increase in Sinophobic language on niche social media platforms, such as 4chan. Moonshot CVE analysed more than 600 million tweets and found that 200,000 contained either Sinophobic hate speech or conspiracy theories, and identified a 300% increase in hashtags that support or encourage violence against China during a single week in March 2020. East Asian prejudice has also been linked to the spread of COVID-19 health-related misinformation. In March 2020, the polling company YouGov found that 1 in 5 Brits believed the conspiracy theory that the coronavirus was developed in a Chinese lab.

To fully understand the spread of online harms and to develop appropriate counter-measures, we need robust and scalable ways of measuring them. We have developed a tool that uses machine learning to distinguish between content that expresses Hostility against East Asia, Criticism of East Asia, Discussion of East Asian Prejudice or none of these (Neutral). The classifier achieves an F1 score of 0.83 across all four classes and can be deployed immediately. For a full overview of how it was created and to access the final model and training dataset, please see our pre-print and follow the links to the data repository. Austin Botelho, a Master's student on the OII’s Social Data Science course, implemented the machine learning used for the tool, which is based on a state-of-the-art contextual word embeddings model.
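For readers less familiar with the metric, the short Python sketch below shows one common way to compute a single F1 score over four classes. It is an illustration only: it assumes macro-averaging (the unweighted mean of the per-class F1 scores), which this post does not specify, and the shortened label names, the function names and the eight example annotations are all made up for the example. None of it is taken from the tool or its dataset.

```python
# Hypothetical, shortened label names standing in for the four categories.
LABELS = ["Hostility", "Criticism", "Discussion", "Neutral"]


def per_class_f1(gold, predicted, label):
    """F1 for one class, treating that class as 'positive' and all others as 'negative'."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No correct predictions for this class: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    return sum(per_class_f1(gold, predicted, label) for label in labels) / len(labels)


# Invented annotations, for illustration only.
gold = ["Hostility", "Hostility", "Criticism", "Criticism",
        "Discussion", "Discussion", "Neutral", "Neutral"]
predicted = ["Hostility", "Criticism", "Criticism", "Criticism",
             "Discussion", "Neutral", "Neutral", "Neutral"]

if __name__ == "__main__":
    print(f"Macro F1: {macro_f1(gold, predicted):.2f}")
```

Macro-averaging gives each class equal weight regardless of how often it occurs, which matters when hostile content is much rarer than neutral content. Other averaging schemes would give different numbers on the same predictions.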

Social media is one of the most important battlegrounds in the fight against social hazards during COVID-19. As life moves increasingly online, it is crucial that social media platforms and other online spaces remain safe, accessible and free from abuse—and that people’s fears and distress during this time are not exploited and social tensions stirred up. We hope this new tool and dataset enable better understanding of East Asian prejudice and support the creation of effective counter-measures.

[1] This work was funded through the Hate Speech: Measures & Counter-Measures project in the Criminal Justice Theme of The Alan Turing Institute under Wave 1 of The UKRI Strategic Priorities Fund, EPSRC Grant EP/T001569/1.

It was a collaboration between researchers from The Alan Turing Institute, the Oxford Internet Institute, the George Washington University, the University of Surrey and the University of Sheffield. The full list of authors is provided in the paper.

If you have any questions or comments, please contact the lead author and OII visiting researcher, Dr. Bertie Vidgen.

Authors: Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, Scott Hale