Estimating local commuting patterns from geolocated Twitter data
Over the last decade or so there has been an explosion of research interest in measuring (and forecasting) traffic and commuting patterns. Part of this is driven by ever-increasing human mobility: in 2016 alone, people in the UK travelled a collective 800 billion kilometres, more than 60% of which was by car, and congestion on these networks costs billions of pounds a year. But the research agenda is also being driven by the emergence of a wide variety of new forms of data, which build on and supplement more traditional magnetic loop technologies: data re-purposed from mobile phone records, collected through IoT-enabled smart sensors, or emerging from traces freely contributed to social media platforms. These data sources offer huge potential to improve on existing methods of data collection, such as the hated transport census (see picture).
As part of a research project entitled NEXUS: Real Time Data Fusion and Network Analysis for Urban Systems (funded by InnovateUK), a team of researchers at the OII and I have been looking into some of these possibilities. Our first paper on the subject, entitled “Estimating Local Commuting Patterns from Geolocated Twitter Data”, has just been published in EPJ Data Science. The paper addresses the extent to which we can make use of geolocated Twitter data to estimate commuting flows between local authorities (you can have a play with some of the underlying data using the map below, which shows census commuting figures and Twitter-based estimates for local authorities around Britain).
We draw two main conclusions from the paper. First we show that, making use of heuristics for mapping individuals making geolocated tweets to home and work areas, we can use Twitter to produce accurate representations of the overall structure of commuting in mainland Great Britain; estimates which improve considerably on other ‘low information’ methods of estimating commuting flows (we compared estimates in particular to the popular radiation model). Second, and probably most importantly, we show that these results are not particularly sensitive to demographic characteristics. When looking at commuting flows broken down by gender, age group and social class, we found that Twitter still offered reasonable estimations for all of these sub-categories. We think this is important because a key concern about using social media data for this type of proxy estimation is the extent to which the ‘demographic bias’ in social media users (who are often younger, better educated and wealthier than the population average) might also result in biased predictions (for example, better prediction of the travel patterns of younger people). We show that, at least in our context, this is not the case.
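To give a feel for what a home/work heuristic of this kind might look like, here is a minimal Python sketch. It is an illustration only and not the procedure used in the paper: the time windows (WORK_HOURS, HOME_HOURS), the function names and the "most frequent area" rule are all assumptions made for this example. Each tweet is taken to be a (user_id, timestamp, area_code) tuple, where area_code stands in for a local authority.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical time windows chosen for illustration; not taken from the paper.
WORK_HOURS = set(range(9, 17))                        # 09:00-16:59, weekdays only
HOME_HOURS = set(range(0, 6)) | set(range(20, 24))    # late evening and night


def infer_home_work(tweets):
    """Guess a (home_area, work_area) pair for each user.

    tweets: iterable of (user_id, datetime, area_code) tuples.
    Users lacking tweets in either time window are left out.
    """
    home_counts = defaultdict(Counter)
    work_counts = defaultdict(Counter)
    for user, ts, area in tweets:
        if ts.weekday() < 5 and ts.hour in WORK_HOURS:
            work_counts[user][area] += 1
        elif ts.hour in HOME_HOURS:
            home_counts[user][area] += 1

    result = {}
    for user in home_counts.keys() & work_counts.keys():
        # Assumed rule: the most frequently observed area in each window.
        home = home_counts[user].most_common(1)[0][0]
        work = work_counts[user].most_common(1)[0][0]
        result[user] = (home, work)
    return result


def commuting_flows(home_work):
    """Count inferred commuters for each (home_area, work_area) pair."""
    flows = Counter()
    for home, work in home_work.values():
        flows[(home, work)] += 1
    return flows


if __name__ == "__main__":
    sample = [
        ("a", datetime(2017, 1, 2, 10), "Oxford"),    # Monday morning
        ("a", datetime(2017, 1, 2, 22), "Cherwell"),  # Monday night
        ("b", datetime(2017, 1, 2, 23), "Oxford"),    # no work-time tweets
    ]
    print(commuting_flows(infer_home_work(sample)))
```

Counts produced this way are raw numbers of Twitter users, so in practice they would need to be scaled or otherwise calibrated before being compared with census commuting figures.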
What’s next? There is plenty more to explore in this research area: looking at whether predictions can be made more granular, or perhaps whether sentiment from social media can be worked in, or whether other platforms can also contribute. We will also start to work on some other data sources, making use of some of the exciting datasets being made available by places like the ADRN and CDRC.
Graham McNeill, Jonathan Bright and Scott A Hale (2017) Estimating local commuting patterns from geolocated Twitter data, EPJ Data Science 2017, 6:24.