How much Wikipedia could tell us about elections
IMPORTANT NOTE: this post does not aim to predict the results of any election. It is just a report on some publicly available data and does not draw any conclusions from it.
In a few hours, voting starts in the Iranian presidential election, 2013, and within a few days (maybe one or two) the next president of Iran for the coming four years will be officially announced. This is not only an important event for all Iranians; given Iran’s complicated internal and international political situation, it could also significantly affect the short- or even long-term history of the region and even the world. Clearly, that discussion is outside my expertise and interests and is not the goal of this post.
One of the main differences between Iranian elections and those of many other countries is that, most of the time, the candidates are not known until very close to the election date. The process of self-nomination (registration), approval and pre-selection of candidates by the Guardian Council, and official announcement of the campaigning candidates is rather complicated and unpredictable. In short, almost no one knows the candidates until about a month before election day.
Because the campaign period is rather short, informing voters about the candidates’ programmes and plans, as well as their political backgrounds, becomes very important. Of course, online material and social networking could play an important role in bridging the gap between candidates and voters. Among other sources, Wikipedia is one that citizens turn to in order to gather at least some basic information about the candidates.
This time, 8 candidates were officially announced by the Ministry of Interior, of whom 2 later withdrew. I did a simple count of the number of edits, the number of unique editors, and the number of page views for the Persian Wikipedia pages of those 8 candidates from May 7th (start of registration) up to now. The results are presented in the following chart. To my surprise, there hasn’t been massive editorial work on the pages within this period (180 edits at most). However, page view numbers are relatively large, with a maximum of 180,000 hits during the same period; the candidate with the most page views is also the one with the most edits and the most unique editors. If I were a candidate, I’d have put more effort into completing and grooming my Wikipedia page, as it’s quite visible!
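For readers who want to try a similar count themselves, here is a minimal sketch in Python. It is not the script behind the chart; it assumes you have already obtained, for one page, a list of (date, editor) pairs from the revision history and a mapping from date to daily view count. The function name `summarise_page`, the input layout, and the toy numbers are all hypothetical.

```python
from datetime import date


def summarise_page(revisions, daily_views, start, end):
    """Count edits, unique editors and page views within [start, end].

    revisions   -- iterable of (date, editor_name) pairs (assumed layout)
    daily_views -- dict mapping date -> number of views on that day
    """
    # Keep only the revisions that fall inside the observation window.
    in_window = [(day, editor) for day, editor in revisions if start <= day <= end]
    return {
        "edits": len(in_window),
        "unique_editors": len({editor for _, editor in in_window}),
        "page_views": sum(
            views for day, views in daily_views.items() if start <= day <= end
        ),
    }


# Toy data, made up purely for illustration.
toy_revisions = [
    (date(2013, 5, 1), "EditorA"),   # before the window, ignored
    (date(2013, 5, 8), "EditorA"),
    (date(2013, 5, 9), "EditorB"),
    (date(2013, 5, 21), "EditorA"),
]
toy_views = {
    date(2013, 5, 6): 50,            # before the window, ignored
    date(2013, 5, 7): 100,
    date(2013, 5, 21): 900,
}

summary = summarise_page(toy_revisions, toy_views, date(2013, 5, 7), date(2013, 6, 13))
print(summary)
```

Running the same function over each candidate's page and the same date window gives the three numbers per candidate that a chart like the one above needs.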
More interestingly, those candidates with higher page view statistics are commonly known to have higher chances of success according to official and unofficial polls during the last few weeks (I don’t believe in any kind of survey-based opinion mining, by the way!).
Another interesting aspect of page view statistics is, of course, their evolution over time. In the next diagram I show the number of daily views for the top 4 candidates (by total number of page views, excluding Aref, who has withdrawn).
Among those 4 candidates, Jalili was the least expected and least known; he registered on the last day of registration, which produced the first peak in his page views. On May 21st, the final list of 8 candidates was announced, which explains the second peak in all 4 lines. It is even higher for Jalili because his acceptance as a candidate was something of a surprise and people apparently started looking him up. The later bumps in the candidates’ page view numbers are mainly due to their appearances in live TV debates or at campaign meetings. Finally, the most interesting and relevant jump is Rouhani’s, just 2-3 days ago.
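The peaks described above were read off the diagram by eye. If you wanted to flag such jumps automatically, one simple option is to compare each day's views with the median of the preceding days. The sketch below is only an illustration under that assumption; the function name `find_jumps`, the 7-day window and the factor of 2 are arbitrary choices, and the numbers are made up.

```python
from statistics import median


def find_jumps(daily_views, window=7, factor=2.0):
    """Return the indices of days whose view count exceeds
    `factor` times the median of the previous `window` days."""
    jumps = []
    for i in range(window, len(daily_views)):
        baseline = median(daily_views[i - window:i])
        # Skip days with a zero baseline to avoid flagging noise.
        if baseline > 0 and daily_views[i] > factor * baseline:
            jumps.append(i)
    return jumps


# Made-up series: a flat week followed by a one-day spike.
toy_series = [100, 100, 100, 100, 100, 100, 100, 500, 120, 110]
jump_days = find_jumps(toy_series)
print(jump_days)
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline, so a second jump a few days later can still be detected.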
The only significant event during those few days was the withdrawal of Aref, which could be seen as a move in support of Rouhani (although this was never stated explicitly).
I’d like to emphasise that I’m not trying to make any predictions based on this low-dimensional, sparse data. If you are interested in predictions, however, see our soon-to-be-published paper on Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data, or read about it in the Guardian.