8 May 2015
GE2015 turned out to be a bad night for some. Beyond the parties that obviously lost out, the reputation of polling firms took a big hit: while the exit poll was more or less in the right ballpark, none of the pre-election polls came anywhere near the result. This, combined with the advance of the SNP, UKIP and Greens, lent the whole election a real “earthquake” feel, with people like David Dimbleby questioning whether politicians would ever take polling seriously again.
Considering the weaknesses of conventional polling, could social media have filled a gap by forecasting the earthquake that was to come? Were people on Twitter ahead of the opinion polls?
The data we produced last night paints a mixed picture. We were able to show that the Liberal Democrats were much weaker than the Tories and Labour on Twitter, whilst the SNP were much stronger; we also showed more Wikipedia interest in the Tories than in Labour. Both findings chime with the overall results. But a simple summing of mention counts per constituency produces a highly inaccurate picture, to say the least (reproduced below): it generally understates large parties and overstates small ones. And it’s certainly striking that the clearly greater levels of effort Labour were putting into Twitter did not translate into electoral success: a warning for campaigns which focus solely on the “online” element.
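As a rough sketch of what “simple summing” means here (this is an illustration, not the code we actually ran), the naive approach just adds up mentions per party in each constituency and calls the most-mentioned party the winner. The function name, the constituency labels and all the counts below are made up for the example.

```python
from collections import Counter, defaultdict


def naive_winner_by_constituency(mentions):
    """Predict a 'winner' per constituency as the party with most mentions.

    `mentions` is an iterable of (constituency, party, count) tuples.
    Ties go to whichever tied party appeared first in the input.
    """
    totals = defaultdict(Counter)
    for constituency, party, count in mentions:
        totals[constituency][party] += count
    return {
        constituency: counts.most_common(1)[0][0]
        for constituency, counts in totals.items()
    }


# Made-up placeholder rows, NOT real election or Twitter data.
SAMPLE_MENTIONS = [
    ("Constituency A", "Labour", 300),
    ("Constituency A", "Conservative", 250),
    ("Constituency A", "Labour", 50),
    ("Constituency B", "Green", 400),
    ("Constituency B", "Conservative", 350),
]

if __name__ == "__main__":
    for seat, party in naive_winner_by_constituency(SAMPLE_MENTIONS).items():
        print(f"{seat}: {party}")
```

Nothing in this calculation knows how many people in a constituency actually vote, or how likely each party’s supporters are to tweet, which is one reason raw counts like these can flatter a small party with a very active online following.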
In terms of prediction, the problem here, of course, is that there are many potential statistics which could be produced from social media, and many potential metrics to predict (vote shares, swings, turnouts etc.). Some of them are bound to be “right” after the fact. In response to this, Taha Yasseri and I have recently written a draft paper trying to produce social election predictions more systematically using Wikipedia data. The main premise is that we need a theory-informed model to drive social media predictions: one which is based on an understanding of how the data is generated and hence enables us to correct for certain biases.
How could we apply this reasoning to our Twitter data? Well, one of the suggestions we made last night was that, even though we were sure the Green Party wasn’t going to win the 46 constituencies shown on our Twitter map, perhaps these areas were nevertheless places where the Green vote was going to spike upwards disproportionately (they might, for instance, indicate a highly organised local party machine capable of delivering extra votes). In order to check this, I took results data for the Green Party and UKIP from 50 constituencies in England and Wales (good data tables for the election results still haven’t been released, so I’m limited to the amount I could quickly collect by hand). The graph below plots the number of percentage points by which each party’s vote share increased against the number of Twitter mentions its candidates received in the run-up to the election in each constituency.
On the graph there is little apparent correlation for UKIP candidates; Green Party candidates, by contrast, show a rough though by no means perfect positive correlation. In other words, for the Green Party the Twitter mentions appear to have a little predictive power, whereas for UKIP they appear to have none at all. What is more striking is that the points on the graph group clearly into two sections: UKIP’s vote increased by more than their mentions would suggest, whilst the reverse is true for the Greens. This highlights one of the major difficulties in making predictions from social media: voters of different parties make different uses of social media, and a predictive model would need to take these differences into account.
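For anyone who wants to put a number on this kind of comparison, here is a minimal sketch of computing a Pearson correlation between mentions and vote-share change separately for each party. It is not the analysis behind the graph, which was only read by eye; the function names are my own for this example and the rows are made-up placeholders rather than the 50 constituencies I collected.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(ss_x * ss_y)


def correlation_by_party(records):
    """Group (party, mentions, change_in_points) rows and correlate per party."""
    grouped = defaultdict(lambda: ([], []))
    for party, mentions, change in records:
        grouped[party][0].append(mentions)
        grouped[party][1].append(change)
    return {party: pearson(xs, ys) for party, (xs, ys) in grouped.items()}


# Made-up placeholder rows, NOT the real constituency data.
SAMPLE = [
    ("Green", 120, 2.0),
    ("Green", 480, 3.5),
    ("Green", 900, 6.0),
    ("UKIP", 40, 9.0),
    ("UKIP", 150, 12.0),
    ("UKIP", 300, 8.0),
]

if __name__ == "__main__":
    for party, r in correlation_by_party(SAMPLE).items():
        print(f"{party}: r = {r:+.2f}")
```

Calculating the coefficient per party rather than over the pooled data matters here: because the two parties sit in separate clusters on the graph, a single pooled correlation would mostly reflect the gap between the clusters rather than any relationship within either party.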
Once the results are announced in full, we will be looking into this in more detail over the next few weeks, for all parties and across a wider range of metrics.