Introduction to Speech and Language Processing

Key Information

Course details
Option course for MSc, Hilary Term
Professor John Coleman

Please note this course is not available in the 2022-23 academic year


Most textbooks and research literature on speech and language processing are too technical for students in the humanities and social sciences, because these subjects are usually taught in computer science or electronic engineering. Yet many of the techniques are not inherently difficult. This course is tailored to the background and training needs of humanists and social scientists working on speech and language.

Speech and Language Processing introduces a range of computational techniques for the analysis of speech and language. These techniques were developed in speech and language technology, but the focus here is on their use in scientific research. The course introduces the necessary technical concepts as gently as possible and has a strongly practical focus. It covers the elements of signal processing, automata theory and parsing.

Learning Objectives

At the end of this course students will…

  • Have an appreciation of, and a practical ability to carry out, various kinds of time-series analysis on signals and other vectors.
  • Be able to use MATLAB, GNU Octave, Prolog, or general scripting to analyse speech and language data, with a particular focus on on-line speech corpora.


  1. Digital signals and on-line audio corpora
  2. Measurement of acoustic parameters
  3. Speech recognition and forced alignment
  4. Functional data analysis of acoustic data
  5. Finite state machines
  6. Probabilistic finite-state models
  7. Parsing
  8. Using probabilistic grammars

Teaching Arrangements

  • Class Style: Traditional lecture/lab work
  • Course delivery: Partly asynchronous: Lectures are pre-recorded and released each week, followed by a live Q&A session
  • Convenor attendance: Remotely via video link
  • Norms of interaction: Only volunteers need contribute in class-wide discussion
  • TA support: None