Predicting elections with Wikipedia data: new article in EPJ Data Science
Taha Yasseri and I have a new article out in EPJ Data Science which looks at electoral prediction using page view data from Wikipedia. Forecasting electoral results with some form of novel internet data is a real growth area in the literature at the moment, with a large number of research teams trying out different approaches. However, I think our paper nevertheless makes a novel contribution in a couple of respects. First, our model is theory-driven rather than taking a machine learning approach, by which I mean that we try to theorise the mechanism generating Wikipedia page view data and how it relates to electoral outcomes, rather than simply looking at a range of indicators to see if any of them offers predictive power. Second, we test the model on a reasonably large set of electoral results: a group of around 60 parties in the European Parliament elections in 2014, whereas many other studies look at prediction only in the case of one election.
We found a number of things. First, we are able to show that the majority of online information seeking happens in the couple of days before the election (left-hand panel in the figure). Second, page views do seem to offer indicators of a number of things happening in the election, such as turnout levels (right-hand panel in the figure) and overall electoral results. Wikipedia was particularly good at predicting the emergence of small parties which were shooting to prominence (something which has become a feature of European politics in the last decade), even if it did tend to overstate their final result.
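To make the first finding concrete, here is a minimal sketch of how one might measure the share of page views that falls in the last couple of days before an election. It is not the method from the paper: the function name `share_in_final_days`, the two-day window, the choice to exclude election day itself, and the daily counts are all assumptions made up for illustration.

```python
from datetime import date, timedelta


def share_in_final_days(daily_views, election_day, window_days=2):
    """Return the fraction of all views that fall in the `window_days`
    days immediately before `election_day` (election day itself excluded).

    `daily_views` maps a datetime.date to a page view count.
    """
    total = sum(daily_views.values())
    if total == 0:
        return 0.0
    window_start = election_day - timedelta(days=window_days)
    late_views = sum(
        count
        for day, count in daily_views.items()
        if window_start <= day < election_day
    )
    return late_views / total


# Placeholder election date and invented counts for the seven days before it.
election_day = date(2014, 5, 25)
invented_counts = [100, 100, 100, 100, 100, 900, 1600]
views = {
    election_day - timedelta(days=days_before): count
    for days_before, count in zip(range(7, 0, -1), invented_counts)
}

print(f"Share of views in the final two days: {share_in_final_days(views, election_day):.2f}")
```

With real data, the daily counts would come from whatever page view source is being used, and the window length is something to vary rather than fix in advance.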
In future work, we intend to extend the analysis to more countries and more types of information seeking.