Big Data tends to be dirty data. Researchers rarely have privileged access to a data set that is already in the correct form for analysis, and cleaning and shaping data remains one of the most significant hurdles in moving from idea to execution. This methods option course for the OII MSc in “Social Science of the Internet” will familiarize students with a variety of techniques for cleaning and shaping data. We will move through the acts of parsing text files, cleaning JSON, aggregating data in a database, and reading from that database. The course will not directly explore substantive questions with the data and should therefore be paired with a substantive course of interest to the student. Instead, we will focus almost exclusively on the skills required to manage data across contexts. Similarly, this course will not provide students with the tools to analyse or visualize data. While this might appear to be a significant oversight, visualization and analysis are often comparatively straightforward once the data have been cleaned. This course therefore focuses on the difficult and often overlooked chasm between data access and data analysis.

