Researchers at The Alan Turing Institute (including members of the Oxford Internet Institute) have conducted the first systematic study of Urban Dictionary (UD), the informal, crowd-sourced online dictionary best known for slang and niche definitions.
In a paper published today in the Royal Society journal Open Science, the Turing’s Dong Nguyen, Barbara McGillivray and Taha Yasseri attempt to characterise UD’s content, including how opinionated and offensive its entries are. They study a complete snapshot of the website from its inception as a parody of Dictionary.com in 1999, through to 2016.
The promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. Yet the decentralized and often unmonitored environment of such projects leaves them susceptible to low-quality content, edit wars and destructive interactions between users. UD involves a community that up- and down-votes entries based on whether the voter thinks the entry is offensive, informative or funny, and whether the voter agrees or disagrees with the expressed view.
At a time when ‘facts’ are hotly contested on the internet, UD is an unapologetic affront to highly referenced and cross-examined material. Most dictionaries strive towards objective content. For example, Wiktionary states ‘Avoid bias. Entries should be written from a neutral point of view, representing all usages fairly and sympathetically’. In contrast, the entries in UD do not always describe the meaning of a word; some instead contain an opinion (e.g. beer ‘Possibly the best thing ever to be invented ever. I MEAN IT.’ or Bush ‘A disgrace to America’).
But what can UD teach us about the reality of our language, our biases, and how we actually speak day-to-day? This latest analysis uses natural language processing to shed light on the overall features of UD in terms of growth, coverage and types of content.
Language is constantly evolving. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new meanings. The authors conclude that UD captures many infrequent, informal words and also contains offensive content, but that highly offensive definitions tend to be ranked lower by the voting system.
Taha Yasseri is a Fellow at The Alan Turing Institute and Senior Research Fellow in Computational Social Science at the Oxford Internet Institute, University of Oxford.
Dong Nguyen is a Research Fellow at The Alan Turing Institute and is also affiliated with Edinburgh University.
Barbara McGillivray is a Research Fellow at The Alan Turing Institute and the University of Cambridge.