Using Wikipedia to explore the participation gap between those who have their say, and those whose voices are pushed to the side, in representations of the Arab world online.

There are obvious gaps in access to the Internet, particularly the participation gap between those who have their say, and those whose voices are pushed to the sidelines. Despite the rapid increase in Internet access, there are indications that people in the Middle East and North Africa (MENA) region remain largely absent from websites and services that represent the region to the larger world.

We explore this phenomenon through one of the MENA region’s most visible and most accessed sources of content: Wikipedia. It currently contains over 9 million articles in 272 languages, far surpassing any other publicly available information repository. It is widely considered the first point of contact for most general topics, thus making it an effective site for framing any subsequent representations. Content from Wikipedia has also begun to form a central part of services offered elsewhere on the Internet.

Wikipedia is therefore an important platform from which we can learn whether the Internet facilitates increased open participation across cultures, or reinforces existing global hierarchies and entrenched power dynamics. Because the underlying political, geographic and social structures of Wikipedia are hidden from users, and because there have not been any large-scale studies of the geography of these structures and their relationship to online participation, groups of people may be marginalized without their knowledge.

Map of geotagged Wikipedia articles in the Middle East and North Africa region

This relative lack of MENA voice and representation means that the tone and content of this globally useful resource is, in many cases, being determined by outsiders who may misunderstand the significance of local events, sites of interest and historical figures. Furthermore, in an area that has seen substantial social conflict, participation from local actors enables people to ensure balance in content about contentious issues. Unfortunately, most research on MENA’s Internet presence has been drawn from anecdotal evidence, and no comprehensive studies currently exist.

This project will therefore employ a range of (primarily quantitative) methods to assess the connection between access and representation, using MENA as the first step in an assessment of the inequalities in the global system.

Research Objectives

Our key academic objective is to discern the visibility of the MENA region, and residents of the MENA region, in the production of online knowledge. To do this, we outline a number of more specific objectives:

  • To examine whether there are disproportionately fewer articles on the MENA region than on the rest of the world and, among those articles, whether authors from MENA make up a disproportionately small share of the contributors.
  • To determine if the centralized political structure of Wikipedia undervalues new contributors from MENA. In particular, we explore whether authors from MENA have their contributions undermined because of: competitive practices such as content deletion; indifference to content created by authors from MENA; and marginalization through bullying or dismissal.

Our key practical objective is to find the appropriate social mirror that will effectively represent Wikipedia’s presentation of MENA content and MENA contributors in such a way as to facilitate more content, more accurate content and more effective knowledge transfer between MENA and other global regions.

We intend to do this through both community outreach workshops and a website resource that enables individuals to compare the breadth and quality of articles on areas of similar population size across MENA.

  • Per capita, Arabic is the most under-represented major world language on Wikipedia, which is why it was a particular focus for us. Additionally, we were interested in Sub-Saharan Africa, as it is woefully under-represented per capita in all major languages.
  • Of the barriers we discovered in general, open government data was the one that surprised us until we delved deep into the logic of Wikipedia. As a compendium of secondary data, Wikipedia depends on good sourcing. Without official statistics, lists of towns or sub-national level facts, it is hard to ensure articles stay inside Wikipedia. Similarly, there is a great deal of unfamiliarity with legitimate but small newspapers and books in foreign languages.
  • Arabic authors in particular face a number of hurdles. State actors in the Gulf have been known to meddle in Wikipedia and use it for surveillance. The Arabic language itself is presented in Wikipedia with a poor typeface. As Arabic speakers go online they are leapfrogging laptops in favour of mobile devices that are less amenable to content creation. Civil discourse between adherents of Islam and non-Islamic people suffers from disagreement about the veracity of religious texts. There is also a perception that Wikipedia is a Western enterprise that does not meet Arabic needs or that is trying to co-opt Arabic participation. There is a difference of opinion within the Arabic Wikipedia community that has led to the splintering of the site into Arabic and Egyptian Arabic.
  • We did both big data analysis (primarily GIS and spatial analysis) on data dumps from Wikipedia and qualitative analysis through focus groups and interviews with key Wikipedians. By combining these two we were able to come up with a comprehensive picture that shows both the scale of variation in representation and many of the micro-level processes that either caused or reinforced this variation.
  • From a global analysis, we discovered that one of the most significant barriers to geographic representation on Wikipedia is broadband Internet. It is stronger than population, education and GDP. Connectivity is critical.

We have now published several papers from this project, but most notably: Graham, M., Hogan, B., Straumann, R.K., and Medhat, A. (2014) Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers.

Support

This project is supported by the International Development Research Centre (IDRC).

Articles

Chapters

  • Allagui, I., Graham, M., and Hogan, B. (2014) Wikipedia Arabe et la Construction Collective du Savoir [Wikipedia Arabic and the Collective Construction of Knowledge]. In L. Barbe and L. Merzeau (eds) Wikipédia, objet scientifique non identifié. Paris: Presses Universitaires de Paris Ouest.
  • Graham, M. (2012) Die Welt in Der Wikipedia Als Politik der Exklusion: Palimpseste des Ortes und selective Darstellung. In S. Lampe, and P. Bäumer (eds) Wikipedia. Bundeszentrale für politische Bildung/bpb, Bonn.
  • Graham, M. (2014) Internet Geographies: Data Shadows and Digital Divisions of Labour. In M.Graham and W.H.Dutton (eds) Society and the Internet: How Networks of Information and Communication are Changing our Lives. Oxford: Oxford University Press, pp. 99-116.
  • Graham, M. (2014) The Knowledge Based Economy and Digital Divisions of Labour. In V.Desai, and R.Potter (eds) Companion to Development Studies, 3rd edn. Hodder, pp. 189-195.
  • Yasseri, T., Spoerri, A., Graham, M. and Kertész, J. (2014) The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: P.Fichman and N.Hara (eds) Global Wikipedia: International and cross-cultural issues in online collaboration. Rowman & Littlefield, pp. 25-48.

Reports

  • The Digital Language Divide

    Date Published: 29 May 2015

    Source: The Guardian

    A Digital Guardian article which explores in depth the effects of language on internet use draws heavily on work done by OII researchers.

  • The world wide SPREAD: Map reveals the extent of internet use around the globe – and the countries that are still not online

    Date Published: 22 September 2014

    Source: Daily Mail

    The map of global use of websites created by Mark Graham and Stefano De Sabbata is reported in the Daily Mail. The data visualisation shows each country sized according to its internet-enabled population.

  • Why global contributions to Wikipedia are so unequal

    Date Published: 8 September 2014

    Source: The Conversation

    Mark Graham authors an article explaining why the unequal global representation in Wikipedia matters and why it impedes Wikipedia's aim to be the 'sum of all human knowledge'.

  • Geotagging reveals Wikipedia is not quite so equal after all

    Date Published: 18 August 2014

    Source: New Statesman

    Rather than being an equaliser, Wikipedia may be reproducing an established world view. Mark Graham writes about his work on inequalities in Wikipedia. For example, he says, the Middle East is massively underrepresented.

  • What was the last book you read? Wikipedia wants to know

    Date Published: 13 August 2014

    Source: The National Opinion

    The interactive map of Wikipedia created by Mark Graham and colleagues is used to demonstrate inequalities in representation on Wikipedia.

  • Wikipedians most likely to war over ‘Israel,’ ‘God’

    Date Published: 3 June 2013

    Source: The Times of Israel

    Reporting Taha Yasseri’s work, the Times of Israel notes that in Hebrew Wikipedia the greatest divisions are mainly about religious sects and armed conflicts, but across the languages ‘Israel’ and ‘Hitler’ are the most contested subjects.

  • Chile, el tema más controvertido de Wikipedia en español

    Date Published: 3 June 2013

    Source: BBC Mundo

    The most controversial topics in Spanish Wikipedia, identified by Taha Yasseri and Mark Graham, are highlighted on the BBC’s Spanish-language website.

  • Wikipedia ‘Edit Wars’: The most hotly contested topics

    Date Published: 31 May 2013

    Source: Live Science

    Taha Yasseri says Wikipedia suffers from traditional features of human societies. People argue most on Wikipedia about religion and politics with variations on non-English language sites. Romanians for example argue most about musicians and art.

  • The Most Controversial Article in all of English Wikipedia is George Bush’s

    Date Published: 31 May 2013

    Source: The Huffington Post

    The Huffington Post says that the study of controversial topics in Wikipedia by Taha Yasseri and Mark Graham contains some ‘incredible graphics’ several of which are displayed.

  • The Controversial Topics of Wikipedia

    Date Published: 30 May 2013

    Source: Wired Science Blog

    A Wired magazine article sets out some of the findings of Taha Yasseri, Mark Graham and colleagues’ work on contested subjects in Wikipedia. The table of the most controversial articles in each language edition is featured.

  • Wikipedia is not free

    Date Published: 21 May 2013

    Source: Caijing.com.cn

    An article on the challenge for Wikipedia of expanding beyond the English-speaking world is published in the independent Beijing-based Chinese-language magazine. Mark Graham’s research is referenced and DPhil student Heather Ford is quoted.

  • Free for all? Lifting the lid on a Wikipedia crisis

    Date Published: 17 April 2013

    Source: New Scientist

    In an in-depth analysis of the challenges facing Wikipedia in expanding participation beyond the English speaking world, Mark Graham’s research on Wikipedia is referenced and DPhil student Heather Ford is quoted.

  • Who Writes the Wikipedia Entries About Where You Live?

    Date Published: 26 March 2013

    Source: The Atlantic

    Mark Graham tackles the issue of where our information comes from, and how this should influence the way we interpret it.

  • Big data and the death of the theorist

    Date Published: 25 January 2013

    Source: Wired

    Mark Graham is skeptical about the death of scientific theory at the hands of big data analysis: "when talking about 'big data' and the humanities, there will always be things that are left unsaid, things that haven't been measured or codified".

  • Wikipedia world: an interactive guide to every language. Infographic map

    Date Published: 4 April 2012

    Source: The Guardian

    In 'Show and Tell' on the Guardian Data Store, Simon Rogers, winner of the OII award for best internet journalist in 2011, highlights the Mapping Wikipedia project which shows millions of articles worldwide in a variety of languages.

  • Wikipedia Language Maps Created By Oxford Internet Institute’s Mark Graham

    Date Published: 13 November 2011

    Source: Huffington Post

    "Mark Graham led a team of researchers who broke down Wikipedia's geotagged articles by language and examined the global scope of the encyclopedia. They plotted these data onto maps of the world to show the spread of languages within the encyclopedia."

  • This Map Shows the World of Wikipedia Broken Down by Languages

    Date Published: 11 November 2011

    Source: Gizmodo US

    "Ever wondered if anyone outside your redneck little town writes about it on Wikipedia? Or if anyone has ever written about Australia in Arabic? Guess no longer, because someone's worked it out for you."

  • The world of Wikipedia’s languages mapped

    Date Published: 11 November 2011

    Source: Guardian Datablog

    What happens if you map every geotagged Wikipedia article - and then analyse it for language use? A team of Oxford University researchers has found out.