Mining human mobility and migration patterns from social media and industry data sources as well as visualizing geo-temporal network data interactively with HTML5.

This project will develop new techniques for estimating population levels and population movement through social media and other types of data (e.g., WiFi hot spot data) and will also pioneer new techniques for visualizing such geospatial and temporal network data interactively using HTML5. Within the visualization work, we employ online experiments to compare various visualization alternatives. The visualization tools produced will be open-source, and we aim to produce integrations with other popular network visualization software along the lines of our previous work with sigma.js.

Estimating population levels and population movement

Population movement is a major contemporary challenge for policymakers in local government. At the macro level, demographic data created by censuses (and other survey instruments) quickly becomes out of date in the contemporary social context of high internal and external migration, meaning that local policymakers often have only a loose grasp of exactly who is living in different districts and areas. At a smaller scale, the day-to-day movements of local populations are also notoriously difficult to measure, which makes it highly challenging to craft accurate policy around issues such as commuting, public transport, or tourism.

This strand of the project aims to drive forward research into novel solutions to this problem by creating and validating proxy measurements for both macro- and micro-level population flows from a range of new data sources, especially social media data and business data. These data sources are well placed to offer major new insight into the way people move at both levels, and thus to provide a step change in the extent to which policymakers can plan population-related policy.

This part of the project will pursue research into two levels of proxy measurement. First, we will look at ways of measuring aggregate population levels from social media data (in particular geotagged data from Twitter) and data from BT (such as WiFi hotspot use data). We do not expect initial estimates to be accurate straight away: indeed we expect significant biases. The project will build upon existing work to look at a range of sociodemographic variables, such as average levels of income and education, which could be used to correct these biases or, at the very least, inform individuals already using social media data for policy about the nature of the biases.
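As a rough illustration of the kind of bias correction described above, the sketch below computes how many social media users are observed per census resident in each district, fits a straight line relating that coverage to a single sociodemographic covariate, and uses the fitted line to rescale a raw count. It is a minimal sketch under stated assumptions, not the project's method: the function names, the choice of a one-covariate linear model, and all numbers are hypothetical.

```python
from statistics import mean


def coverage_ratios(observed_users, census_population):
    """Observed social media users per census resident, by district.

    Districts with a missing or zero census population are skipped.
    """
    return {
        district: observed_users[district] / census_population[district]
        for district in observed_users
        if census_population.get(district)
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def corrected_estimate(observed, covariate, intercept, slope):
    """Rescale a raw user count by the coverage predicted from the covariate."""
    predicted_coverage = intercept + slope * covariate
    if predicted_coverage <= 0:
        raise ValueError("predicted coverage must be positive")
    return observed / predicted_coverage


# Hypothetical calibration data: three districts with known census counts.
observed = {"A": 100, "B": 300, "C": 500}
census = {"A": 10000, "B": 20000, "C": 25000}
income = {"A": 20.0, "B": 30.0, "C": 40.0}  # illustrative units only

ratios = coverage_ratios(observed, census)
districts = sorted(ratios)
intercept, slope = fit_line(
    [income[d] for d in districts], [ratios[d] for d in districts]
)

# Estimate the population of an unseen district from its raw count and covariate.
print(corrected_estimate(150, 30.0, intercept, slope))
```

A real correction would likely need several covariates and a validation step against held-out census areas; the single-covariate line is used here only to keep the idea visible.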

Second, we will look at ways of measuring daily population flows. We will again make use of geotagged tweets, and also take into account industry data. Here the aim will be to explore how individual users change geographical location during the day, and how populations flow in and out of urban areas, thus providing ways of measuring commuting flows. The data will also permit analysis of weekday/weekend differences as well as seasonality. Our raw data will be benchmarked against existing sources of traffic data, such as the Department for Transport’s Traffic Counts dataset, which provides street-level data for thousands of streets around the country. Again, we expect biases to be present, and part of the research will be about analysing and understanding these biases.
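To make the idea of daily flows concrete, the following sketch turns a list of geotagged posts into origin-destination counts. It snaps each post to a square grid cell, orders each user's posts by time, and counts a flow whenever two consecutive posts on the same calendar day fall in different cells. The record layout, the grid size, and the function names are assumptions made for illustration, not part of any project dataset or tool.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime


def grid_cell(lat, lon, cell_size=0.01):
    """Snap a coordinate to integer grid indices (cell_size is in degrees)."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def count_flows(posts, cell_size=0.01):
    """Count same-day moves between grid cells.

    posts: iterable of (user_id, timestamp, lat, lon) tuples (hypothetical layout).
    Returns a Counter keyed by (origin_cell, destination_cell, arrival_hour).
    """
    trails = defaultdict(list)
    for user_id, timestamp, lat, lon in posts:
        trails[user_id].append((timestamp, grid_cell(lat, lon, cell_size)))

    flows = Counter()
    for trail in trails.values():
        trail.sort(key=lambda item: item[0])
        for (t_prev, origin), (t_next, destination) in zip(trail, trail[1:]):
            same_day = t_prev.date() == t_next.date()
            if same_day and origin != destination:
                flows[(origin, destination, t_next.hour)] += 1
    return flows


# Hypothetical posts: two users travelling between two locations on one day.
home = (51.752, -1.258)
work = (51.507, -0.128)
posts = [
    ("u1", datetime(2015, 3, 2, 8, 0), *home),
    ("u1", datetime(2015, 3, 2, 9, 30), *work),
    ("u1", datetime(2015, 3, 2, 18, 10), *home),
    ("u2", datetime(2015, 3, 2, 9, 5), *work),
    ("u2", datetime(2015, 3, 2, 8, 15), *home),
]

for (origin, destination, hour), n in sorted(count_flows(posts).items()):
    print(origin, "->", destination, "arriving hour", hour, "count", n)
```

Because the counts are keyed by arrival hour, summing them by weekday or month would give the weekday/weekend and seasonal comparisons mentioned above. Counts like these reflect only users who post with location enabled, so they inherit the biases the project sets out to study.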

Interactive, HTML5 visualizations of geospatial and temporal network data

Network diagrams show relationship data between sets of nodes or actors. Such relationships are often complex and difficult to visualize for a non-expert. Interactive visualization allows for individuals to explore the data in a more intuitive way and ultimately gain a better understanding.

Our previous work produced an HTML5 network visualization framework used by over 10,000 websites including Elsevier, Harvard, the Wikimedia Foundation, and MastodonC. That visualization framework, however, does not handle geospatial network data (e.g., nodes with geographic locations) or any time dimension of network data. In addition, while the software allows users to zoom, search, and filter a network visualization, it does not allow users to manipulate network diagrams by resizing, repositioning, or recoloring nodes.

This strand of the project develops an HTML and JavaScript framework to interactively visualize geospatial and temporal network data. The framework will be released as open-source software along with integrations for popular desktop network visualization programs such as Gephi. The framework will allow for the easy creation of geospatial and temporal network visualizations and for the sharing of these visualizations in standard web browsers. The software developed will allow for the visualization of flow data across a geographic region (such as data from the population movement strand of the project).
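One way to think about geospatial and temporal network data is as nodes with coordinates plus edges that are only active during a time interval. The sketch below models that, selects the edges active at a given moment, and exports them as GeoJSON LineString features that a web map could draw. It is an illustrative data model only: the class and function names are hypothetical and do not describe the framework's actual format or API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    lat: float
    lon: float


@dataclass(frozen=True)
class TimedEdge:
    source: str
    target: str
    start: float  # interval start, inclusive (any consistent time unit)
    end: float    # interval end, exclusive
    weight: float = 1.0


def snapshot(edges, t):
    """Return the edges that are active at time t."""
    return [e for e in edges if e.start <= t < e.end]


def to_geojson(nodes, edges):
    """Export edges as a GeoJSON FeatureCollection of LineStrings.

    GeoJSON positions are ordered [longitude, latitude].
    """
    by_id = {n.node_id: n for n in nodes}
    features = []
    for e in edges:
        src, dst = by_id[e.source], by_id[e.target]
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[src.lon, src.lat], [dst.lon, dst.lat]],
            },
            "properties": {
                "source": e.source,
                "target": e.target,
                "weight": e.weight,
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical flow network: morning and evening flows between two places.
nodes = [Node("oxford", 51.752, -1.258), Node("london", 51.507, -0.128)]
edges = [
    TimedEdge("oxford", "london", start=7, end=10, weight=120),
    TimedEdge("london", "oxford", start=17, end=20, weight=110),
]

print(json.dumps(to_geojson(nodes, snapshot(edges, 8)), indent=2))
```

Stepping `t` through the day and redrawing each snapshot is one simple way to animate flows over time; a browser-based tool would do the equivalent filtering in JavaScript.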

The visualization framework will be developed in a user-centred, iterative, and agile manner. We will conduct user tests throughout the development process to understand how users perceive geospatial and temporal network data and how different visualization options affect their understanding of the data.

Support

The project is funded by InnovateUK and conducted in partnership with BT.

