Machine learning algorithms can discover patterns and hidden structure in data and use them to predict future data. This course covers the fundamentals of both supervised and unsupervised learning. Machine learning has many applications in the social sciences and is considered a key data science method. Applications include clustering documents and latent attribute inference (inferring demographics, personality traits, or other attributes of a person from behavioural data).

Key Themes

  • Supervised and unsupervised machine learning
  • Regression and classification
  • Overfitting and regularization
  • Support vector machines and tree-based methods (random forests, boosting)
  • Gaussian Processes and Expectation-Maximization
  • Variational inference and optimization
  • Neural networks and deep learning

Learning Objectives

At the end of this course students will…

  • …understand what is meant by ‘machine learning’
  • …understand differences between supervised and unsupervised learning
  • …understand overfitting and regularization
  • …compare various machine learning methods and understand the benefits and limitations of each in reference to a given problem
  • …implement key algorithms in Python and run them on test sets of data
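As a flavour of the final objective, a lab exercise might resemble the following sketch, which fits a regularized regression with scikit-learn (the dataset here is synthetic and the library choice is an assumption; the course's actual exercises may differ):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge regression: least squares plus an L2 penalty (alpha) that
# shrinks coefficients to curb overfitting
model = Ridge(alpha=1.0).fit(X_train, y_train)
r2 = model.score(X_test, y_test)
```

Evaluating on a held-out test set, as here, is exactly how overfitting is detected: a model that memorizes the training data scores well in-sample but poorly out-of-sample.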

Assessment

There will be two small projects (one in each week of the course), and the summative assessment will be a report on these projects, including the computer code written by the student. In both projects, students will be supplied with a dataset and tasks to accomplish. The first project will involve clustering and/or regression, and the second will involve Gaussian Processes or neural networks.

Formative Assessment

Students will complete afternoon lab sessions and receive immediate feedback from TAs (in addition to checking whether their output matches the supplied reference output).

Topics

  1. Introduction to machine learning
  2. Regression and regularization
  3. Classification
  4. Unsupervised learning for cluster analysis (PCA, k-means)
  5. Tree-based methods, random forests, bagging and boosting
  6. Support vector machines and kernel methods
  7. Gaussian Processes and the Expectation-Maximization algorithm
  8. Variational inference and optimization
  9. Neural networks and deep learning
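To illustrate topic 4, a minimal k-means exercise might look like this sketch (synthetic two-cluster data; scikit-learn is an assumed choice of library):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic clusters in 2-D
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# k-means partitions the points into k groups by iteratively
# minimizing within-cluster variance
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

Because k-means is unsupervised, the cluster labels are arbitrary: what matters is that points generated together end up grouped together.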
This page was last modified on 8 October 2018