OII | Data Analytics at Scale

Key Information

Course details
Compulsory intensive course for MSc, Michaelmas Term

Assessment
Coursework submission

Tutor
Dr Scott A. Hale

About

The course teaches computational complexity, how to profile Python code, and how to increase its computational efficiency. It also covers parallel and distributed computing approaches, including issues such as race conditions, and discusses data storage and retrieval techniques (including SQL and NoSQL). In the lab sessions, students gain hands-on experience handling large-scale, heterogeneous data on a server, and learn to reduce, transform, and otherwise manipulate the data in order to answer a social science question.
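As an illustration of what profiling and computational complexity look like in practice, the sketch below compares two ways of counting shared items between two collections and runs each under Python's built-in cProfile module. The function names (count_common_slow, count_common_fast, profile_call) and the sample data are hypothetical and are not taken from the course materials.

```python
import cProfile
import io
import pstats


def count_common_slow(a, b):
    """Roughly O(len(a) * len(b)): each `in` test scans the list b."""
    return sum(1 for x in a if x in b)


def count_common_fast(a, b):
    """Roughly O(len(a) + len(b)): set lookups are O(1) on average."""
    b_set = set(b)
    return sum(1 for x in a if x in b_set)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Illustrative data only: even numbers versus multiples of three.
    a = list(range(0, 4000, 2))
    b = list(range(0, 4000, 3))
    for func in (count_common_slow, count_common_fast):
        result, report = profile_call(func, a, b)
        print(func.__name__, "->", result)
        print(report)
```

Both functions return the same count; the profiler report is what reveals that the list-based version spends far more time on membership tests as the inputs grow.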

Key Themes

Unix terminal basics (SSH, error handling, cron, logging)
Computational limits, computational complexity, Big-O notation, and profiling code
Parallelization, race conditions, and distributed computing
MapReduce and PySpark
Data storage techniques
Cython
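To make the race-condition theme concrete, here is a minimal sketch using Python's standard threading module. An unprotected `value += 1` is a read-modify-write sequence, so two threads can interleave and lose updates; guarding it with a lock avoids that. The class and function names (SafeCounter, run_workers) are hypothetical and chosen only for illustration.

```python
import threading


class SafeCounter:
    """A counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, the read, add, and write steps of `+= 1`
        # could interleave across threads and drop increments.
        with self._lock:
            self.value += 1


def run_workers(counter, n_threads=4, n_increments=10_000):
    """Start n_threads threads that each increment the counter."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    total = run_workers(SafeCounter())
    print("final count:", total)
```

Because every increment happens while holding the lock, the final count equals the number of threads multiplied by the increments per thread.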

Learning Objectives

At the end of this course students will:

Design and execute a long-running process to capture data, implementing appropriate logging, error handling, and scheduling
Understand common limits on wrangling data (memory, disk, CPU), be able to devise an analysis plan that takes these limits into account, and be able to profile code to identify inefficiencies
Understand the MapReduce approach and be able to develop an analysis plan using multiple MapReduce cycles to answer a basic research question
Be able to implement and execute the filtering, aggregation, and extraction of data from a large-scale, heterogeneous dataset using Hadoop, Spark, or another appropriate tool
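The MapReduce approach named in the objectives above can be sketched in plain Python as a single map, shuffle, and reduce cycle that counts words. This is only a single-machine illustration of the idea, not Hadoop or Spark code, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run one full MapReduce cycle over the records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = ["big data at scale", "data at rest", "Big Data"]
    print(word_count(sample))
```

In a distributed framework the map and reduce steps would run on many machines and the shuffle would move data between them; an analysis with multiple cycles would feed the output of one reduce step into the next map step.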

Course Tutor

Dr Scott A. Hale

Associate Professor, Senior Research Fellow

Dr Scott A. Hale is an Associate Professor, Senior Research Fellow, and Turing Fellow. He develops and applies computer science techniques to the social sciences, focusing on increasing equitable access to quality information.
