
Computational Methods for the Social Sciences

Key Information

Course details
Methods Option course for MSc, Hilary Term
Written submission
Reading list
Dr Fabian Stephany, Dr Luc Rocher


Social science research increasingly relies on computational methods to study online culture, communities, and human behaviour, drawing on ever-larger amounts of data available online. The uptake of platforms (such as Twitter, Facebook and Google) and public websites around the world, together with the relatively open and structured nature of the data they produce, now allows researchers to address many new social research questions.

However, the way these messy data are structured, disseminated, and made available creates challenges for the social sciences. This course teaches how to collect and wrangle structured and unstructured data from social websites and platforms. In particular, it focuses on using Python to access data from diverse sources on the social web (from Twitter to Reddit, from Wikipedia to the front page of the New York Times) and on transforming this material into datasets amenable to traditional social science analysis. Once the data have been collected, the course introduces students to a range of approaches for processing and preparing them for analysis.
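As a rough illustration of what transforming platform data into a dataset can involve, the sketch below flattens nested JSON records, of the kind many web APIs return, into a rectangular table and writes it out as CSV. The payload, its field names (`id`, `text`, `user`) and the helper functions `flatten`, `to_rows` and `to_csv` are hypothetical and not tied to any particular platform's API; only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical API payload: posts with a nested "user" object.
# Field names are illustrative, not any real platform's schema.
RAW = """
[
  {"id": 1, "text": "Hello world", "user": {"name": "alice", "followers": 120}},
  {"id": 2, "text": "Data, data, data", "user": {"name": "bob"}}
]
"""


def flatten(record, parent_key="", sep="."):
    """Recursively flatten nested dicts into a single-level dict.

    Nested keys are joined with `sep`, so {"user": {"name": "x"}}
    becomes {"user.name": "x"}. Lists are left as they are.
    """
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def to_rows(records):
    """Flatten every record and return (columns, rows) over a shared column set."""
    flat_records = [flatten(r) for r in records]
    # Union of all keys in first-seen order; absent fields become empty cells.
    columns = []
    for rec in flat_records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    rows = [[rec.get(col, "") for col in columns] for rec in flat_records]
    return columns, rows


def to_csv(columns, rows):
    """Serialise the rectangular table to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    columns, rows = to_rows(json.loads(RAW))
    print(to_csv(columns, rows))
```

Missing fields (here, the second user's `followers`) become empty cells rather than errors. In practice, `pandas.json_normalize` performs a similar flattening directly into a data frame, and CSV output like this can be read in R with `read.csv()`.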

Learning objectives

  • Have a grasp of the key methodological issues involved in collecting and processing these types of data, and of the main challenges in using online social data to answer social research questions
  • Have a solid grounding in the use of the Python programming language to access and pre-process social data
  • Be able to parse text files in a manner suitable for natural language processing (NLP)
  • Be able to reshape JSON data, such as tweets and other API-based data, into rectangular structures amenable to analysis using data frames
  • Understand how to apply regular expressions to text strings
  • Understand and correctly handle Unicode text, including symbols such as © and ™
  • Be able to transfer data between R and Python
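Parsing text files for natural language processing often starts with reading a file line by line and splitting each line into normalized tokens. The sketch below, with hypothetical function names `tokenize` and `parse_corpus`, assumes a deliberately simple regex tokenizer (`\w+`) and a caller-supplied stopword set; an in-memory `io.StringIO` stands in for an open file.

```python
import io
import re
from collections import Counter

# Deliberately simple tokenizer: runs of letters, digits or underscores.
TOKEN = re.compile(r"\w+")


def tokenize(line):
    """Lower-case a line of text and split it into word tokens."""
    return TOKEN.findall(line.lower())


def parse_corpus(handle, stopwords=frozenset()):
    """Read a text file handle line by line.

    Returns one token list per non-empty line, with stopwords removed.
    """
    documents = []
    for line in handle:
        tokens = [tok for tok in tokenize(line) if tok not in stopwords]
        if tokens:  # skip blank lines and lines made only of stopwords
            documents.append(tokens)
    return documents


if __name__ == "__main__":
    # io.StringIO stands in for a hypothetical open("corpus.txt", encoding="utf-8").
    sample = io.StringIO("The cat sat.\n\nThe dog barked at the cat!\n")
    docs = parse_corpus(sample, stopwords={"the", "at"})
    print(docs)  # [['cat', 'sat'], ['dog', 'barked', 'cat']]
    counts = Counter(tok for doc in docs for tok in doc)
    print(counts.most_common(1))  # [('cat', 2)]
```

A tokenizer this simple splits contractions such as "don't" into `don` and `t`; libraries such as NLTK and spaCy provide more careful tokenizers.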
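The regular-expression objective can be illustrated with a short sketch that pulls hashtags, @-mentions and URLs out of a post using Python's built-in `re` module. The patterns and the function names `extract_entities` and `strip_entities` are illustrative assumptions: they do not reproduce any platform's official rules, and `@(\w+)` would, for example, also match the part after the `@` in an e-mail address.

```python
import re

# Illustrative patterns only; platforms define their own, stricter rules.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the hashtags, mentions and URLs found in a post."""
    return {
        "hashtags": HASHTAG.findall(text),  # captured group: tag without '#'
        "mentions": MENTION.findall(text),  # captured group: handle without '@'
        "urls": URL.findall(text),          # no group: the whole match
    }


def strip_entities(text):
    """Remove URLs and mentions, then collapse repeated whitespace."""
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    post = "Great talk by @alice on #CSS and #Python https://example.org/slides"
    print(extract_entities(post))
    print(strip_entities(post))  # Great talk by on #CSS and #Python
```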
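For the Unicode objective, a minimal sketch using the standard-library `unicodedata` module shows three common tasks: inspecting code points, normalizing equivalent forms, and repairing one classic kind of mojibake. The helper names `describe`, `normalize` and `fix_latin1_mojibake` are hypothetical, and the repair assumes text that was UTF-8 but was wrongly decoded as Latin-1; other mix-ups need other fixes.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def normalize(text, form="NFC"):
    """Normalize text to one of the forms NFC, NFD, NFKC or NFKD."""
    return unicodedata.normalize(form, text)


def fix_latin1_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1 ('Â©' -> '©').

    Assumes every character is in the Latin-1 range; otherwise encode() raises.
    """
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(describe("\u00a9\u2122"))
    # [('©', 'U+00A9', 'COPYRIGHT SIGN'), ('™', 'U+2122', 'TRADE MARK SIGN')]

    composed, decomposed = "\u00e9", "e\u0301"  # 'é' as one or as two code points
    print(composed == decomposed)                        # False
    print(normalize(composed) == normalize(decomposed))  # True

    print(normalize("\u00a9 \u2122 \ufb01", "NFKC"))  # © TM fi

    garbled = "\u00a9".encode("utf-8").decode("latin-1")
    print(garbled, "->", fix_latin1_mojibake(garbled))  # Â© -> ©
```

NFKC folds compatibility characters (`™` becomes `TM`, the ligature `ﬁ` becomes `fi`) and is therefore lossy; NFC only merges canonically equivalent sequences such as `e` plus a combining accent, so it is usually the safer choice when symbols must be preserved.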