OII | New machine learning algorithm can predict age and gender from just your Twitter profile

Published on
16 May 2019

A new “demographic inference” tool developed by academics can make predictions based solely on the information in a person’s social media profile (i.e. screen name, biography, profile photo, and name)

The tool, which works in 32 languages, could pave the way for views expressed on social media to be factored into popular survey methods.

Researchers at the University of Oxford, University of Michigan, University of Massachusetts, GESIS – Leibniz Institute for the Social Sciences, the Max Planck Institute, and Stanford University have developed a method to infer information about a social media account owner from the information disclosed in their Twitter profile.

A new machine learning system, unveiled at the Web Conference in San Francisco this week, learned the patterns that distinguish different ages and genders, and organizations from individuals, from a dataset of over four million Twitter accounts in 32 languages. These predictions were then combined with estimated locations and re-weighted against census data to produce more accurate population estimates for 1,101 statistical regions across the EU.
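
To illustrate the re-weighting step in general terms, the following is a minimal sketch of simple post-stratification, in which each demographic group in an online sample is weighted by the ratio of its census share to its sample share. It is not the researchers' published method; the function name, the age bands, and all counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def poststratification_weights(sample_cells, census_counts):
    """Return one weight per demographic cell: census share / sample share.

    Hypothetical helper for illustration; not taken from the study's library.
    """
    sample_counts = Counter(sample_cells)
    n_sample = sum(sample_counts.values())
    n_census = sum(census_counts.values())
    weights = {}
    for cell, count in sample_counts.items():
        if cell not in census_counts:
            continue  # no census figure for this cell, so it gets no weight
        sample_share = count / n_sample
        census_share = census_counts[cell] / n_census
        weights[cell] = census_share / sample_share
    return weights


# Hypothetical inferred (gender, age band) cells for ten accounts in one region.
sample = (
    [("male", "19-29")] * 6
    + [("female", "19-29")] * 2
    + [("female", "40+")] * 2
)

# Hypothetical census counts for the same region.
census = {
    ("male", "19-29"): 250,
    ("female", "19-29"): 250,
    ("female", "40+"): 500,
}

weights = poststratification_weights(sample, census)
for cell, weight in sorted(weights.items()):
    print(cell, round(weight, 3))
```

In this made-up sample, young men are over-represented relative to the census and so receive a weight below 1, while women aged 40 and over are under-represented and receive a weight above 1.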

This could pave the way for a more representative understanding of people’s views on key societal issues, based on what they post on social media and attributed to specific geographical locations and demographic groups.

Dr Scott Hale, Senior Research Fellow, Oxford Internet Institute, University of Oxford said: “Despite providing lots of data points, social media has long been an unreliable tool for understanding what issues are most important to a wider population given how people self-select into using any one platform.

“This first study of its kind performs demographic predictions about a social media account’s owner based purely on the account’s profile information in 32 languages and then re-weights the online sample to be more similar to an offline population.

“We see this as a significant step towards using social media to get a more accurate picture on the issues and topics that most interest the public and understanding which groups’ views are over- or under-represented.”

The information and data underpinning this research have been made available in an open source library, and the Twitter profile inference tool can be tested at http://www.euagendas.org/m3demo.

For more information or to request an interview, please contact Mark Malbas on 01865 287220 or email mark.malbas@oii.ox.ac.uk

Notes for editors

Publication to be released at WWW conference:

Zijian Wang, Scott Hale, David Ifeoluwa Adelani, Przemyslaw Grabowicz, Timo Hartmann, Fabian Flöck, and David Jurgens. 2019. Demographic Inference and Representative Population Estimates from Multilingual Social Media Data. In The World Wide Web Conference (WWW ’19), Ling Liu and Ryen White (Eds.). ACM, New York, NY, USA, 2056-2067. DOI: https://doi.org/10.1145/3308558.3313684.

Related People

Dr Scott A. Hale

Associate Professor, Senior Research Fellow

Dr Scott A. Hale is an Associate Professor, Senior Research Fellow, and Turing Fellow. He develops and applies computer science techniques to the social sciences focusing on increasing equitable access to quality information.

Przemyslaw Grabowicz

Max Planck Institute for Software Systems

Dr Fabian Flöck

GESIS, Leibniz Institute for the Social Sciences

David Jurgens

University of Michigan
