Dr Fabian Stephany
Departmental Research Lecturer
Fabian is a Departmental Research Lecturer in AI & Work at the Oxford Internet Institute.
Climate change and global migration are at the centre of policymakers’ efforts. The World Bank has dedicated an entire report to the topic of water, climate change, and development. Together with colleagues at the World Bank, we have assembled a new dataset to make climate-induced migration visible using explainable machine learning techniques. We showcase how social data science can help policymakers make more informed decisions on some of the largest challenges of our time.
Severe storms, rising sea levels, and prolonged droughts: the impact of climate change is real, and it increasingly affects our lives. In the face of devastating climate shocks, some people see no option other than to leave their catastrophe-struck homes, hoping to find a better life in other parts of the world. Contrary to popular belief, climate-induced migration tends to happen within, not between, countries, and overwhelmingly from rural to urban areas. Yet climate-induced internal migration patterns have so far largely remained undetected. As a result, questions that are key to understanding and managing climate-induced migration remain unanswered: Which regions are most strongly affected by internal migration due to climate change? Do droughts create stronger migration pressure than floods? Are women more likely than men to migrate after a climate shock?
Together with the World Bank and academic experts on climate change and migration, we have assembled the world’s largest dataset on internal migration. The dataset makes it possible to study the relationship between climate events and migration. It includes information on almost 500 million people from 189 censuses in 64 countries, which allows us to address several of the most urgent questions about climate shocks and migration. To make sense of this huge dataset, we applied explainable machine learning*.
Explainable (or “causal”) machine learning is a type of statistical analysis that uses machine learning methods rather than conventional (parametric) statistical tools. Machine learning classification techniques such as random forests, for example, scale to very large samples and can reveal highly non-linear relationships in large datasets. In explainable machine learning, models are designed to systematically compare different model settings or the influence of specific model characteristics.
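The post does not spell out which importance measure the random forests used. One common, model-agnostic way to ask how much a fitted model relies on a variable is permutation importance: shuffle one input column, re-score the model, and record how much its accuracy drops. The minimal Python sketch below illustrates that idea under stated assumptions: `toy_model` is a hypothetical stand-in for a trained random forest, and the data are synthetic, not census records.

```python
import random
from typing import Callable

Model = Callable[[list[float]], int]


def accuracy(model: Model, X: list[list[float]], y: list[int]) -> float:
    """Share of rows that the model classifies correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: list[list[float]], y: list[int],
                           feature: int, n_repeats: int = 10, seed: int = 0) -> float:
    """Mean drop in accuracy when one feature column is randomly shuffled.

    Shuffling keeps the feature's distribution but breaks its link to the
    outcome, so a large drop means the model relies heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [list(row) for row in X]  # copy so the original data stay intact
        for row, value in zip(X_perm, column):
            row[feature] = value
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


# Hypothetical synthetic data: [age, years_of_education, rainfall_shock (0/1)]
X = [[20 + i % 40, i % 16, i % 2] for i in range(200)]
y = [row[2] for row in X]  # toy outcome: 1 = migrated (not real data)


def toy_model(row: list[float]) -> int:
    # Hypothetical stand-in for a fitted random forest: predicts migration
    # whenever the home region experienced a rainfall shock.
    return int(row[2] == 1)


for name, idx in [("age", 0), ("education", 1), ("rainfall_shock", 2)]:
    print(name, round(permutation_importance(toy_model, X, y, idx), 3))
```

In practice, a library routine such as scikit-learn's `sklearn.inspection.permutation_importance` would typically be applied to a real fitted `RandomForestClassifier`; the hand-written version only makes the mechanics visible.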
The figure below illustrates the results of several random forest models that use explainable machine learning to identify factors related to internal migration in the 64 countries in our study. Each model considers individual characteristics, such as age, gender, or education, plus the occurrence of a rainfall shock in the respondent’s home region. This approach allows us to compare the importance of the climate shock for migration decisions with the influence of other factors known to be relevant for migration behaviour, such as education. A value of more than 100 indicates that, in a given country, the climate shock is more relevant for explaining migration patterns than education.
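Read literally, the index in the figure compares two importance scores and rescales them so that 100 means parity with education. A minimal sketch, assuming the index is simply 100 × (importance of the shock) / (importance of education); the exact normalisation used in the study may differ, and the scores below are made up for illustration.

```python
def relative_importance(importance: dict[str, float], factor: str,
                        reference: str = "education") -> float:
    """Importance of `factor` relative to `reference`, scaled so that 100 = equal.

    Values above 100 mean `factor` explains migration better than `reference`.
    """
    ref = importance[reference]
    if ref <= 0:
        raise ValueError("reference importance must be positive")
    return 100 * importance[factor] / ref


# Hypothetical importance scores for one country (made-up, not study results)
scores = {"age": 0.040, "education": 0.020, "rainfall_shock": 0.030}
print(round(relative_importance(scores, "rainfall_shock"), 1))  # 150.0
```

Expressing every factor against one fixed reference keeps the index comparable across countries even when the raw importance scores differ in scale.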
The model estimates the influence of climate shocks relative to other personal characteristics. The results show that in some regions climate shocks mattered more than education in migration decisions, while in other parts of the world they were less relevant for internal migration. This machine learning analysis served as an important pre-selection step for the case studies subsequently presented in the World Bank report.
Climate change and migration are two of the biggest challenges of our time, and they are a focal point of policy action. In our study, we showcase how novel datasets and robust machine learning models help the World Bank reveal previously undetected patterns of internal migration due to climate shocks. This is one of many fields in which big data and novel statistical techniques can help policymakers deal with complex real-world problems. Social data science can help policymakers make more informed decisions on some of the largest challenges of our time.
For further information, see the World Bank’s 2021 report Ebb and Flow, Volume 1: Water, Migration, and Development and its background paper on interpretable machine learning.
Note: The report was funded in part by the Global Water Security & Sanitation Partnership, a Multi-Donor Trust Fund based at the World Bank’s Water Global Practice.
*With roughly half a billion observations, our dataset is too big for conventional statistics: at very large sample sizes, the usefulness of regression models is limited because significance levels lose their expressiveness for samples larger than 10,000 observations.
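The footnote's point, following Lin, Lucas and Shmueli (2013), can be illustrated with a small worked example: hold a practically negligible effect fixed and watch the p-value collapse as the sample grows. The sketch below is illustrative only; it assumes a hypothetical 0.1 percentage-point gap in migration rates between shock-hit and unaffected people and uses a standard two-sample z-test for proportions, not the study's models.

```python
import math


def two_proportion_p_value(p1: float, p2: float, n_per_group: int) -> float:
    """Two-sided p-value of a z-test for equal proportions (equal group sizes)."""
    pooled = (p1 + p2) / 2  # pooled rate; valid because both groups have size n
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical migration rates: 10.0% without a shock vs 10.1% with one
for n in (5_000, 500_000, 250_000_000):
    p = two_proportion_p_value(0.100, 0.101, n)
    print(f"n per group = {n:>11,}: p = {p:.3g}")
```

With 250 million people per group, roughly half a billion in total, even this negligible gap yields a vanishingly small p-value, so significance alone says little about how relevant an effect is in data of this size.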
Zaveri, E., Russ, J., Khan, A., Damania, R., Borgomeo, E., & Jägerskog, A. (2021). Ebb and Flow, Volume 1: Water, Migration, and Development. Washington, DC: World Bank. © World Bank. https://openknowledge.worldbank.org/handle/10986/36089 License: CC BY 3.0 IGO.
Lin, M., Lucas, H. C., & Shmueli, G. (2013). Research Commentary—Too Big to Fail: Large Samples and the p-Value Problem. Information Systems Research, 24(4), 906–917. https://doi.org/10.1287/isre.2013.0480
Abel, G. J., Muttarak, R., & Stephany, F. (2022). Climatic Shocks and Internal Migration: Evidence from 442 Million Personal Records in 64 Countries. Washington, DC: World Bank. © World Bank. https://openknowledge.worldbank.org/handle/10986/36886 License: CC BY 3.0 IGO.
Ginsburg, C., Bocquier, P., Béguy, D., Afolabi, S., Augusto, O., Derra, K., et al. (2016). Human capital on the move: Education as a determinant of internal migration in selected INDEPTH surveillance populations in Africa. Demographic Research, 34(30), 845–884. https://doi.org/10.4054/DemRes.2016.34.30