COMPGI19 - Statistical Natural Language Processing

This database contains the 2017-18 versions of syllabuses.

Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).

Code: COMPGI19 (also taught as COMPM083)

Prerequisites:

  1. Be able to write code in Python.
  2. Understand basic probability theory (e.g. Bayes' rule) and linear algebra.
  3. Be able to install libraries on a computer.

Taught By: Sebastian Riedel (100%)

The course introduces the basics of statistical natural language processing (NLP), including both linguistic concepts, such as morphology and syntax, and machine learning techniques relevant to NLP.

Learning Outcomes

Students successfully completing the module should understand:

  • relevant linguistic concepts
  • relevant ML techniques, in particular structured prediction
  • what makes NLP challenging (and exciting)
  • how to write programs that process language
  • how to rigorously formulate NLP tasks as learning and inference tasks, and address the computational challenges involved.


NLP is a domain-centred field, as opposed to a technique-centred field such as ML, and as such there is no "theory of NLP" that can be taught in a cumulative, technique-centred way. Instead, this course will focus on one or two end-to-end NLP "pipelines" (such as Machine Translation and Machine Reading). Through these applications participants will learn about language itself, relevant linguistic concepts, and machine learning techniques. For the latter the emphasis will be on structured prediction, a branch of ML that is particularly relevant to NLP.

Topics will include (but are not restricted to) machine translation, sequence tagging, constituency and dependency parsing, information extraction, and semantics. The course has a strong applied character, with programming coursework and lab classes that teach students to write software that processes language.

NLP Tasks

  • Language Models
  • Machine Translation
  • Text Classification
  • Sequence Tagging
  • Constituency Parsing
  • Dependency Parsing
  • Information Extraction
  • Machine Comprehension
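As a flavour of the first task in the list above, a language model assigns probabilities to word sequences. A minimal bigram model can be estimated from corpus counts; the toy corpus below is purely illustrative and not taken from the course materials:

```python
from collections import Counter

# Toy corpus for illustration only; the course would use real text.
corpus = "the cat sat on the mat . the dog sat on the log .".split()

# Count unigrams and adjacent word pairs (bigrams).
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    """Maximum-likelihood estimate p(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("the", "cat"))  # "the" occurs 4 times, followed by "cat" once: 0.25
```

Maximum-likelihood estimates like this assign zero probability to unseen bigrams, which is what motivates the smoothing techniques listed under methods below.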

NLP and ML methods

  • Structured Prediction
  • Generative Learning
  • Smoothing
  • EM Algorithm
  • Discriminative Learning
  • Deep and Representation Learning
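To illustrate the smoothing item above: add-one (Laplace) smoothing reserves probability mass for unseen events. The sketch below applies it to unigram probabilities under the simplifying assumption that the vocabulary is just the set of observed word types; corpus and names are hypothetical:

```python
from collections import Counter

# Hypothetical tiny corpus; vocabulary is assumed to be the observed types only.
corpus = "the cat sat on the mat".split()
counts = Counter(corpus)
vocab_size = len(counts)

def laplace_prob(word, alpha=1.0):
    """Add-alpha smoothed unigram probability: unseen words get non-zero mass."""
    return (counts[word] + alpha) / (len(corpus) + alpha * vocab_size)

print(laplace_prob("dog"))  # unseen word: (0 + 1) / (6 + 5) = 1/11
print(laplace_prob("the"))  # seen twice:  (2 + 1) / (6 + 5) = 3/11
```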

Method of Instruction

The module is delivered in lectures, with occasional guest lectures by leading researchers in NLP. Coursework problems will focus on basic components of an NLP pipeline, such as a document classifier, a part-of-speech tagger and a syntactic parser.
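A document classifier of the kind mentioned above could, for example, be a Naive Bayes model, which also illustrates the generative learning and smoothing methods listed earlier. The following is a minimal sketch with hypothetical toy data, not the actual coursework:

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled training documents.
train = [
    ("sports", "the match was a great game"),
    ("sports", "the team won the game"),
    ("politics", "the election results were announced"),
    ("politics", "the minister gave a speech"),
]

class NaiveBayes:
    def fit(self, docs):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        vocab = set()
        for label, text in docs:
            words = text.split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            vocab.update(words)
        self.vocab_size = len(vocab)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label in self.label_counts:
            # Log prior plus add-one smoothed log likelihood of each word.
            score = math.log(self.label_counts[label] / total_docs)
            n = sum(self.word_counts[label].values())
            for word in text.split():
                score += math.log(
                    (self.word_counts[label][word] + 1) / (n + self.vocab_size)
                )
            scores[label] = score
        return max(scores, key=scores.get)

clf = NaiveBayes().fit(train)
print(clf.predict("the team played a game"))  # "sports": more word overlap
```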

The module is based on our own online book of lecture notes, exercises and interactive slides, built on Python and Jupyter Notebooks.


Assessment

The course has a single assessment component:

  • Coursework (100%)

To pass this module students must:

  • Obtain an overall pass mark of 50% for all sections combined


Reading list available via the UCL Library catalogue.