Wikipedia coverage by language
My absence from blogging for the past few months has been partly personal (I got married in July) but also work-related: I have a number of great project outputs that have just been released. These include a draft paper on social influence and collective action, a presentation at the Oxford Martin School, and a publication of Internet-related maps, which has also resulted in an online visualization gallery.
I’ve put my new mapping skills to work on the latest Wikipedia dumps, from 30 September 2011, to uncover some patterns in geotagged articles. My methods are not perfect, and not all language editions of the encyclopedia have the same level of geotagging; nevertheless, I think the patterns revealed are quite telling:
The map above shows which language edition out of German, Portuguese, and Spanish has the most geotagged articles in each country. There are a few ties, but for the most part a clear pattern emerges: countries in the Spanish-speaking world have more Spanish articles, German-speaking regions more German articles, etc. I will parse more dumps and add these (in particular I’d like to add Arabic, French, and English), but I think this pattern will hold across these and other languages.
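For anyone who wants to try a similar comparison, here is a minimal sketch of the counting step in Python. It is not the code behind the map: it assumes each geotagged article has already been reduced to a (country, language) pair, and the function name `leading_editions` and the sample records are made up for illustration. Extracting coordinates from the dumps and assigning them to countries is not shown.

```python
from collections import Counter, defaultdict


def leading_editions(records):
    """Return, per country, the language edition(s) with the most geotagged articles.

    `records` is an iterable of (country, language) pairs, one pair per
    geotagged article. This input format is an assumption for this sketch.
    Ties are kept: the value is a sorted list of every tied language.
    """
    counts = defaultdict(Counter)
    for country, language in records:
        counts[country][language] += 1

    leaders = {}
    for country, per_language in counts.items():
        top = max(per_language.values())
        leaders[country] = sorted(
            lang for lang, n in per_language.items() if n == top
        )
    return leaders


# Hypothetical sample data, not real counts from the dumps.
sample = [
    ("ES", "es"), ("ES", "es"), ("ES", "de"),
    ("AT", "de"),
    ("BR", "pt"), ("BR", "es"),
]
result = leading_editions(sample)
```

Returning a list rather than a single language keeps the ties visible, so they can be drawn in their own colour on the map instead of being silently assigned to one edition.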
My language-related research has drawn some questions about what specific benefits multilingual contributors might bring, and I think one answer lies in breadth of content. Better coverage for a particular language edition of Wikipedia might not lie in energizing those in the home regions of a language, but rather in mobilizing the diaspora and language learners. As Brian Hecht’s 2010 article shows, there is little overlap in content and articles between different editions of Wikipedia, and thus the possibility of greater coverage exists for all language editions.