COMP0087 Statistical Natural Language Processing

This database contains the 2018-19 versions of syllabuses.

Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).

Academic session

2018-19

Module

Statistical Natural Language Processing

Code

COMP0087

Module delivery

1819/A7U/T2/COMP0087 Masters (MEng)

Related deliveries

1819/A7P/T2/COMP0087 Postgraduate

Prior deliveries

COMPM083

Level

Masters (MEng)

FHEQ Level

L7

FHEQ credits

15

Term/s

Term 2

Module leader

Riedel, Sebastian

Contributors

Riedel, Sebastian

Rocktäschel, Tim

Module administrator

Ball, Louisa

Aims

The module introduces the basics of statistical natural language processing (NLP), including both linguistic concepts, such as morphology and syntax, and machine learning techniques relevant to NLP.

Learning outcomes

On successful completion of the module, a student will be able to:

  • understand relevant linguistic concepts and relevant ML techniques, in particular structured prediction and deep learning;
  • appreciate what makes NLP challenging (and exciting);
  • write programs that process language;
  • rigorously formulate NLP tasks as learning and inference tasks, and address the computational challenges involved.

Availability and prerequisites

This module delivery is available for selection on the below-listed programmes. The relevant programme structure will specify whether the module is core, optional, or elective.

In order to be eligible to select this module as optional or elective, where available, students must meet all prerequisite conditions to the satisfaction of the module leader. Places for students taking the module as optional or elective are limited and will be allocated according to the department’s module selection policy.

Programmes on which available:

  • MEng Computer Science (International Programme) (year 4)
  • MEng Computer Science (year 4)
  • MEng Mathematical Computation (International Programme) (year 4)
  • MEng Mathematical Computation (year 4)

Prerequisites:

In order to be eligible to select this module, students must have:

  • an understanding of Basic Probability Theory (e.g. Bayes' rule), Linear Algebra and Multivariable Calculus; and
  • proficiency in coding in Python; and
  • the ability to install libraries on a computer.

Content

NLP is a domain-centred field, as opposed to a technique-centred field such as ML, and as such there is no "theory of NLP" that can be taught in a cumulative, technique-centred way. Instead this course will focus on one or two end-to-end NLP "pipelines" (such as Machine Translation and Machine Reading). Through these applications the participants will learn about language itself, relevant linguistic concepts, and Machine Learning techniques. For the latter the emphasis will be on structured prediction, a branch of ML that is particularly relevant to NLP, and on deep learning.

Topics will include (but are not restricted to) machine translation, sequence tagging, constituent and dependency parsing, information extraction, and semantics. The course has a strong applied character, with coursework to be programmed and lectures that mix practical aspects with theory and background; a small illustrative sketch of one of the methods appears after the topic lists below.

NLP Tasks

  • Language Models
  • Machine Translation
  • Text Classification
  • Sequence Tagging
  • Constituency Parsing
  • Dependency Parsing
  • Information Extraction
  • Machine Comprehension

NLP and ML methods

  • Structured Prediction
  • Generative Learning
  • Smoothing
  • EM Algorithm
  • Discriminative Learning
  • Deep and Representation Learning
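
To make the "Language Models" and "Smoothing" entries above concrete, the following is a purely illustrative sketch (plain Python, not taken from the course materials) of a bigram language model with add-one (Laplace) smoothing:

    from collections import Counter

    def train_bigram_lm(sentences):
        """Count unigrams and bigrams over tokenised sentences."""
        unigrams, bigrams = Counter(), Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            unigrams.update(padded)
            bigrams.update(zip(padded, padded[1:]))
        return unigrams, bigrams

    def bigram_prob(word, prev, unigrams, bigrams, vocab_size):
        """Add-one (Laplace) smoothed estimate of P(word | prev)."""
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    # Toy corpus; real coursework data would be far larger.
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    uni, bi = train_bigram_lm(corpus)
    vocab_size = len(set(uni))
    print(bigram_prob("cat", "the", uni, bi, vocab_size))  # observed bigram
    print(bigram_prob("dog", "cat", uni, bi, vocab_size))  # unseen bigram, still non-zero

Smoothing is what keeps the unseen bigram from receiving probability zero, which matters whenever the model is evaluated on text it was not trained on.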

An indicative reading list is available via http://readinglists.ucl.ac.uk/departments/comps_eng.html.

Delivery

The module is delivered through a combination of lectures, with occasional guest lectures by leading researchers in NLP, and self-directed learning. Coursework problems will focus on basic components in an NLP pipeline, such as a document classifier, part-of-speech tagger and syntactic parser.
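
As an illustration of the first of those components, the following minimal sketch (plain Python with add-one smoothing, not the actual coursework) shows a bag-of-words Naive Bayes document classifier:

    import math
    from collections import Counter, defaultdict

    def train_nb(docs, labels):
        """Fit a bag-of-words Naive Bayes model: per-class word counts and priors."""
        word_counts = defaultdict(Counter)
        class_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            word_counts[label].update(tokens)
        vocab = {w for counts in word_counts.values() for w in counts}
        return word_counts, class_counts, vocab

    def predict_nb(tokens, word_counts, class_counts, vocab):
        """Return the class with the highest smoothed log-probability."""
        total_docs = sum(class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in class_counts.items():
            score = math.log(n_docs / total_docs)
            denom = sum(word_counts[label].values()) + len(vocab)
            for w in tokens:
                score += math.log((word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    # Tiny toy example
    docs = [["great", "film"], ["boring", "plot"], ["great", "plot"]]
    labels = ["pos", "neg", "pos"]
    model = train_nb(docs, labels)
    print(predict_nb(["great", "film"], *model))  # -> "pos"

Here, too, add-one smoothing prevents zero probabilities for words that were never observed with a given class.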

The module is based on the lecture notes, exercises and slides at github.com/uclmr/stat-nlp-book, our own interactive online book built on Python and Jupyter Notebooks.

Assessment

This module delivery is assessed as below:

#   Title          Weight (%)   Notes
1   Coursework 1   30
2   Coursework 2   40
3   Coursework 3   30

In order to pass this module delivery, students must:

  • achieve an overall weighted module mark of at least 50%; and
  • achieve a mark of at least 40% in any component of assessment weighted at ≥ 30% of the module.

Where a component comprises multiple assessments, the minimum mark applies to the overall component.