This course is a four-week intensive primer that brings students up to speed on programming in Python for data science. Python is not the only programming language you will encounter in this course, let alone in this degree programme, but it is a great place to start. In week 4 we will compare Python with R, another very popular language in data science. The goal of the course is to get students acquainted with writing clean, reusable, documented code. Machine learning and big data tools are secondary to this goal and are covered in later modules.

The course will consist primarily of lab work. Some of this will be done in groups, although all summative assessments will be individual.

Key Themes

  • Data science as a technique for abstraction
  • Python as a means to manage large data sets
  • The web as unstructured data that must be transformed into structured data
  • Multiple languages and packages are required for data science
  • Importance of data wrangling choices to research validity

Learning Outcomes

  • Beginner-level Python, including an understanding of the file system and packages
  • Parsing web pages and text documents for analysis
  • Merging data from multiple sources for analysis
  • Presenting data in attractive and minimal information graphics
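
As a rough illustration of the parsing and merging outcomes, the sketch below turns a small HTML table into structured records and joins them to a second source. It is only a minimal example using Python's standard library, not course material: the page content, the place names, the numbers, and the helper names (TableParser, parse_and_merge) are all invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a downloaded web page (invented values).
PAGE = """
<table>
  <tr><td>Alpha</td><td>100</td></tr>
  <tr><td>Beta</td><td>250</td></tr>
</table>
"""

# Hypothetical second data source, keyed by the same place names.
REGIONS = {"Alpha": "North", "Beta": "South"}


class TableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a table cell.
        if self._in_cell and data.strip():
            self.rows[-1].append(data.strip())


def parse_and_merge(html, regions):
    """Parse two-column table rows and attach the region for each place."""
    parser = TableParser()
    parser.feed(html)
    merged = []
    for place, count in parser.rows:
        merged.append(
            {"place": place, "count": int(count), "region": regions.get(place)}
        )
    return merged


records = parse_and_merge(PAGE, REGIONS)
for record in records:
    print(record)
```

A place missing from the second source gets a region of None rather than being dropped. That is one example of a data wrangling choice that can affect what the later analysis shows.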

Topics

  1. Skills and installation
  2. Introduction to Python
  3. Wrangling and accessing data with Python
  4. Visualisation and analysis of data in Python and R

This page was last modified on 8 October 2018