Computer scientists from Oxford University, MIT, the University of Michigan, and Meedan have joined forces to develop an algorithm to help fight misinformation being shared through private messaging apps like WhatsApp.

The algorithm is the first of its kind to be able to group social media messages making similar fact-checkable claims across a range of languages. By doing so, the algorithm can help fact-checkers reach a wider audience and help users understand whether a message they have received via their messaging app has been fact-checked. For example, the algorithm can find messages with similar claims, as shown in the table below.

The algorithm is already being used by fact-checking organisations in India, where it has been found to be highly effective at identifying similar messages. This is helping fact-checkers understand the true prevalence of claims on social media as well as discover new variations of claims they have already fact-checked. Unlike a simple text search, the algorithm looks at the semantic meaning of content and can group items with the same meaning, even when the words used are very different or the items are in different languages.
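To make the idea of grouping by meaning rather than by wording more concrete, here is a minimal sketch in Python. It is not the method from the paper: it assumes each message has already been turned into a numeric vector (an "embedding") by some multilingual model, and the three-dimensional vectors, the message labels, the `group_messages` function and the 0.9 threshold are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_messages(embeddings, threshold=0.9):
    """Greedy grouping (an illustrative choice, not the paper's method).

    Each message joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    """
    groups = []
    for msg_id, vec in embeddings.items():
        for group in groups:
            if cosine_similarity(embeddings[group[0]], vec) >= threshold:
                group.append(msg_id)
                break
        else:
            groups.append([msg_id])
    return groups


# Hypothetical embeddings: the first two stand for the same claim written
# in two different languages, the third for an unrelated claim.
toy_embeddings = {
    "en_claim_a": [0.90, 0.10, 0.05],
    "hi_claim_a": [0.88, 0.12, 0.07],
    "en_claim_b": [0.05, 0.95, 0.10],
}

groups = group_messages(toy_embeddings)
print(groups)
```

Because the comparison is made on the vectors and not on the words, two messages that share no vocabulary at all can still land in the same group, provided the embedding model places them close together.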

Currently, private messaging apps like WhatsApp have very limited interventions for labelling and identifying misinformation, with only the most shared information carrying a “forwarded many times” label. This new algorithm paves the way for fact-checking tools or apps that can enable people to check the validity of messages received on WhatsApp and other platforms.

The approach, developed by computer scientists from academia and industry and outlined in the paper ‘Claim Matching Beyond English to Scale Global Fact-Checking’, is being used to help fact-checkers in India run misinformation tiplines on WhatsApp. Tiplines are accounts operated by fact-checkers on WhatsApp that enable users to forward potentially misleading content to them. When someone submits a message, the algorithm is used to help find any existing matches. If a match exists, the user immediately receives a response. If a match does not exist, the message is enqueued into Meedan’s open-source fact-checking workflow software, Check. In that case, the algorithm is used to help fact-checkers understand how popular any message is by grouping messages with similar claims. The team of academics responsible for developing this new approach have worked with Meedan to extend the approach to other languages, and fact-checking organisations beyond India are set to start using it from June 2021 onwards.
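The tipline flow described above (look for a match, reply if one exists, otherwise queue the message for fact-checkers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `handle_submission`, `word_overlap`, the 0.5 threshold and the sample data are hypothetical, and none of it reflects the actual code or interface of Check. The word-overlap measure is a deliberately simple placeholder and stands in for the semantic, multilingual matching the article describes.

```python
def word_overlap(a, b):
    """Placeholder similarity: Jaccard overlap of lowercased words.

    Assumption for illustration only; it does not capture meaning or
    work across languages, which the real algorithm is said to do.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def handle_submission(message, fact_checks, review_queue,
                      similarity=word_overlap, threshold=0.5):
    """Return an existing fact-check if one matches, else enqueue.

    `fact_checks` maps a previously checked claim to its published reply.
    `review_queue` is a list of messages awaiting human fact-checkers.
    """
    best_claim, best_score = None, 0.0
    for claim in fact_checks:
        score = similarity(message, claim)
        if score > best_score:
            best_claim, best_score = claim, score

    if best_claim is not None and best_score >= threshold:
        return fact_checks[best_claim]  # immediate response to the user

    review_queue.append(message)  # no match: hand over to fact-checkers
    return None


# Made-up sample data.
fact_checks = {"the moon is made of cheese": "Fact-check: false."}
queue = []

reply = handle_submission("the moon is made of green cheese", fact_checks, queue)
no_reply = handle_submission("a bridge collapsed yesterday", fact_checks, queue)
print(reply, no_reply, queue)
```

Passing the similarity measure in as a parameter keeps the workflow separate from the matching itself, so the placeholder could be swapped for a semantic measure without changing the reply-or-enqueue logic.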

Co-author of the paper, Dr Scott A. Hale, Associate Professor and Senior Research Fellow, Oxford Internet Institute, said: “While researchers often want to focus on the fully-automated solutions to replace human fact-checkers, it is very rewarding to develop new approaches that can empower fact-checkers and ensure their work is more discoverable – especially at this critical time as the world wrestles with COVID-19 misinformation.”

The study has been peer-reviewed and accepted for publication at the Annual Meeting of the Association for Computational Linguistics (ACL-IJCNLP 2021). A preprint of the article is openly available at https://arxiv.org/pdf/2106.00853.pdf