
Data Analytics at Scale

Key Information

Course details
Compulsory intensive course for MSc, Michaelmas Term
Assessment
Coursework submission
Reading list
View now
Tutor
Dr Scott A. Hale

About

The course teaches computational complexity, how to profile Python code, and how to increase its computational efficiency. It also covers parallel and distributed computing approaches, including issues such as race conditions, and discusses data storage and retrieval techniques (including SQL and NoSQL). In lab sessions, students gain hands-on experience handling large-scale, heterogeneous data on a server, and learn to reduce, transform, and otherwise manipulate the data in order to answer a social science question.
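As an illustration of the race conditions mentioned above, the following is a minimal sketch and not course material. The `Counter` class and the `run` helper are hypothetical names chosen for this example. Several threads increment a shared counter, once with an unprotected read-modify-write and once with a `threading.Lock`.

```python
import threading


class Counter:
    """Hypothetical shared counter used only for this illustration."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write without a lock: two threads can read the same
        # value, and one of the two updates may then be lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def run(method_name, n_threads=4, n_increments=10_000):
    """Run the named increment method from several threads; return the total."""
    counter = Counter()
    method = getattr(counter, method_name)

    def worker():
        for _ in range(n_increments):
            method()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


safe_total = run("safe_increment")
unsafe_total = run("unsafe_increment")
print("with lock:   ", safe_total)
print("without lock:", unsafe_total)
```

With the lock, the total always equals the number of threads times the number of increments. Without it, the total can fall short, although whether updates are actually lost on a given run depends on the interpreter and on thread scheduling.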

Key Themes

  • Unix terminal basics (SSH, error handling, cron, logging)
  • Computational limits, computational complexity, Big-O notation, and profiling code
  • Parallelization, race conditions, and distributed computing
  • MapReduce and PySpark
  • Data storage techniques
  • Cython
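To make the complexity and profiling theme concrete, here is a small sketch under stated assumptions. The function names and data sizes are invented for illustration. It times the same membership count done against a list (a linear scan per lookup) and against a set (a hash lookup per query, constant time on average) using the standard-library `timeit` module.

```python
import timeit


def count_hits_list(haystack, needles):
    # `in` on a list scans element by element: O(len(haystack)) per lookup.
    return sum(1 for n in needles if n in haystack)


def count_hits_set(haystack, needles):
    # Building the set costs O(len(haystack)) once; each lookup is then
    # O(1) on average.
    lookup = set(haystack)
    return sum(1 for n in needles if n in lookup)


# Illustrative sizes only; real datasets would be far larger.
haystack = list(range(2_000))
needles = list(range(0, 4_000, 2))  # 2,000 even numbers, half of them present

list_time = timeit.timeit(lambda: count_hits_list(haystack, needles), number=3)
set_time = timeit.timeit(lambda: count_hits_set(haystack, needles), number=3)

print(f"list lookups: {list_time:.4f} s")
print(f"set lookups:  {set_time:.4f} s")
```

Both functions return the same count; only the cost of each lookup differs, which is the kind of inefficiency that profiling is meant to surface. For whole programs, the standard-library `cProfile` module gives a per-function breakdown.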

Learning Objectives

At the end of this course students will:

  • Design and execute a long-running process to capture data, implementing appropriate logging, error handling, and scheduling
  • Understand common limits on wrangling data (memory, disk, CPU), be able to devise an analysis plan that takes these limits into account, and be able to profile code to identify inefficiencies
  • Understand the MapReduce approach and be able to develop an analysis plan using multiple MapReduce cycles to answer a basic research question
  • Be able to implement and execute the filtering, aggregation, and extraction of data from a large-scale, heterogeneous dataset using Hadoop, Spark, or another appropriate tool
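
The MapReduce approach named in these objectives can be sketched in plain Python. This is an illustrative single-machine version, not Hadoop or Spark code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are assumptions made for the example. It shows one map, shuffle, and reduce cycle that counts words.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    # Map: emit one (key, value) pair per word in the record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, reduce(lambda a, b: a + b, values)


def word_count(records):
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up sample records for illustration.
records = ["big data big plans", "data at scale"]
counts = word_count(records)
print(counts)
```

In a distributed framework the map and reduce steps run on many machines and the shuffle moves data between them; an analysis with several cycles feeds the output of one reduce step into the next map step.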

 
