
Wikipedia coverage by language

Published on
17 Oct 2011
Written by
Scott A. Hale
Update (November 2014): I’ve recently published a related paper examining how many users edit multiple language editions of Wikipedia and how these multilingual users connect the editions together. Please see Multilinguals and Wikipedia Editing for further information and a free, open-access copy of the article.

My absence from blogging for a few months has been personal (I got married in July) but also work-related: I have a number of great project outputs that have just been released. These include a draft paper on social influence and collective action, a presentation at the Oxford Martin School, and a publication of Internet-related maps, which also led to an online visualization gallery.

I’ve put my new mapping skills to work on the latest Wikipedia dumps from 30 September 2011 to uncover some patterns in geotagged articles. My methods are not perfect, and not all language editions of the encyclopedia have the same level of geotagging; nevertheless, I think the patterns revealed are quite telling:

The map above shows which language edition out of German, Portuguese, and Spanish has the most geotagged articles in each country. There are a few ties, but for the most part a clear pattern emerges: countries in the Spanish-speaking world have more Spanish articles, German-speaking regions more German articles, etc. I will parse more dumps and add these (in particular I’d like to add Arabic, French, and English), but I think this pattern will hold across these and other languages.
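The per-country comparison behind a map like this can be sketched in a few lines of Python. This is only a minimal illustration, assuming each geotagged article has already been reduced to a (country code, language edition) pair; the function name `leading_editions`, the record format, and the sample values are all hypothetical and are not the code or data used for the map.

```python
from collections import Counter, defaultdict


def leading_editions(records):
    """Map each country to the edition(s) with the most geotagged articles.

    `records` is an iterable of (country_code, edition) pairs, one per
    geotagged article. Ties are kept: the value is a sorted list of all
    editions sharing the highest count for that country.
    """
    counts = defaultdict(Counter)
    for country, edition in records:
        counts[country][edition] += 1

    leaders = {}
    for country, per_edition in counts.items():
        top = max(per_edition.values())
        leaders[country] = sorted(
            edition for edition, n in per_edition.items() if n == top
        )
    return leaders


if __name__ == "__main__":
    # Made-up placeholder records for illustration only; not real counts.
    sample = [
        ("AR", "es"), ("AR", "es"), ("AR", "de"),
        ("AT", "de"), ("AT", "de"), ("AT", "pt"),
        ("XX", "es"), ("XX", "pt"),  # hypothetical country with a tie
    ]
    for country, editions in leading_editions(sample).items():
        print(country, editions)
```

Returning a list rather than a single edition keeps ties visible, so a map built on top of it can shade tied countries differently instead of picking a winner arbitrarily.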

I’ve been challenged in my language-related research about what specific benefits multilingual contributors might bring, and I think one answer lies in breadth of content. Better coverage for a particular language edition of Wikipedia might not lie in energizing those in the home regions of a language, but rather in mobilizing the diaspora and language learners. As Brian Hecht’s 2010 article shows, there is little overlap in content and articles between different editions of Wikipedia, and thus the possibility of greater coverage exists for all language editions.
