COMPM055 - Supervised Learning
Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).
- Code: COMPM055 (Also taught as: COMPGI01)
- Year: 4
- Prerequisites: Basic mathematics, calculus, probability and statistics, linear algebra
- Term: 1
- Taught By: Mark Herbster (50%), Massi Pontil (50%)
Aims:
This module covers supervised approaches to machine learning. It begins by reviewing the fundamentals of statistical decision theory and probabilistic pattern recognition, followed by an in-depth introduction to supervised learning algorithms such as the perceptron, the backpropagation algorithm, decision trees, instance-based learning, and support vector machines. It also covers algorithm-independent principles such as inductive bias, side information, and approximation and estimation errors; assessment of algorithms by jackknife and bootstrap error estimation; and improvement of algorithms by voting methods such as boosting. Finally, it introduces statistical learning theory: hypothesis classes, the PAC learning model, VC-dimension, growth functions, empirical risk minimization, and structural risk minimization.
Learning Outcomes:
Students will gain in-depth familiarity with a range of classical and contemporary supervised learning algorithms; understand the limitations and principles that govern learning algorithms, together with ways of assessing and improving their performance; and understand the fundamentals of statistical learning theory, including the complexity of learning and its relationship to generalization ability.
Content:
- Overview and Introduction to Bayes Decision Theory
  Machine Intelligence and Applications
  Pattern Recognition concepts
  Classification, Regression, Feature Selection
  Supervised Learning
  Class-conditional probability distributions
  Examples of classifiers
  Bayes optimal classifier and error (see the note below)
  Learning classification approaches
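  As a brief note on the Bayes optimal classifier named above (standard textbook form; the notation here is assumed, not quoted from the lecture notes): given priors P(y) and class-conditional densities p(x | y), the optimal rule and its error are

    h^*(x) = \arg\max_y P(y \mid x) = \arg\max_y p(x \mid y)\, P(y),
    \qquad R^* = \mathbb{E}_x\!\left[\, 1 - \max_y P(y \mid x) \,\right].

  No classifier can achieve risk below R^*, which is why it serves as the benchmark throughout this block.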
- Linear machines
  General and Linear Discriminants
  Decision regions
  Single-layer neural network
  Linear separability, general position, number of dichotomies
  General gradient descent
  Perceptron learning algorithm (sketched below)
  Mean square criterion and Widrow-Hoff learning algorithm
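  An illustrative sketch of the perceptron learning algorithm in Python (a minimal version for intuition, not the module's code; the interface and names are assumptions):

    import numpy as np

    def perceptron(X, y, max_epochs=100):
        """Perceptron learning: cycle through the sample, updating the
        weights on every mistake. Assumes numpy arrays and labels +1/-1."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(max_epochs):
            mistakes = 0
            for i in range(n):
                if y[i] * (X[i] @ w + b) <= 0:   # misclassified (or on the boundary)
                    w += y[i] * X[i]             # move the hyperplane towards the example
                    b += y[i]
                    mistakes += 1
            if mistakes == 0:                    # converged: sample linearly separated
                break
        return w, b

  If the sample is linearly separable, the perceptron convergence theorem guarantees this loop terminates after finitely many mistakes.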
- Multi-Layer Perceptrons
  Introduction to Neural Networks, Two Layers
  Universal approximators
  Backpropagation learning, on-line and off-line (sketched below)
  Error surface, important parameters
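  A minimal sketch of off-line (batch) backpropagation for a two-layer sigmoid network under squared-error loss (an illustration only; the architecture, loss, and hyperparameters are assumptions, not the module's prescription):

    import numpy as np

    def train_two_layer(X, y, hidden=8, lr=0.1, epochs=1000):
        """One hidden sigmoid layer, sigmoid output, squared-error loss,
        plain batch gradient descent. y is assumed to take values in {0, 1}."""
        rng = np.random.default_rng(0)
        n, d = X.shape
        W1 = rng.normal(scale=0.1, size=(d, hidden))
        W2 = rng.normal(scale=0.1, size=(hidden, 1))
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        for _ in range(epochs):
            h = sigmoid(X @ W1)                   # forward pass: hidden activations
            out = sigmoid(h @ W2)                 # network output in (0, 1)
            err = out - y.reshape(-1, 1)          # derivative of squared error w.r.t. out
            d_out = err * out * (1 - out)         # backprop through the output sigmoid
            d_h = (d_out @ W2.T) * h * (1 - h)    # backprop through the hidden layer
            W2 -= lr * (h.T @ d_out) / n          # batch gradient steps
            W1 -= lr * (X.T @ d_h) / n
        return W1, W2

  The on-line variant covered in this block updates the weights after every example instead of once per pass over the sample.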
- Learning decision trees
  Inference model, general domains, symbolic
  Decision trees, consistency
  Learning trees from training examples
  Entropy, mutual information
  ID3 algorithm criterion (sketched below)
  C4.5 algorithm
  Continuous test nodes, confidence
  Pruning
  Learning with incomplete data
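  An illustrative sketch of the ID3 splitting criterion, built from the entropy and mutual information listed above (a standard construction; the function names are assumed for illustration):

    import numpy as np

    def entropy(labels):
        """Shannon entropy of a label sample, in bits."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def information_gain(labels, feature_values):
        """Mutual information between a discrete feature and the labels:
        the quantity ID3 maximizes when choosing the next test node."""
        n = len(labels)
        remainder = sum(
            (feature_values == v).sum() / n * entropy(labels[feature_values == v])
            for v in np.unique(feature_values)
        )
        return entropy(labels) - remainder

  ID3 grows the tree greedily, splitting each node on the feature with the highest information gain; C4.5 refines this with gain ratios, continuous tests, and pruning.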
- Instance-based Learning
  Nearest neighbor classification
  k-Nearest neighbor (sketched below)
  Nearest neighbor error probability, with proof
  Simplification, editing
  Example: document retrieval
  Case-based reasoning
  Example: learning graphical structures
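  An illustrative k-nearest-neighbor classifier (a minimal sketch; the Euclidean metric and the names are assumptions):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        """Classify x by majority vote among the k training points
        closest to it in Euclidean distance. Assumes numpy arrays."""
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = np.argsort(dists)[:k]               # indices of the k closest
        return Counter(y_train[nearest]).most_common(1)[0][0]

  For k = 1 this is the nearest-neighbor rule whose asymptotic error probability this block relates to the Bayes error.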
- Machine learning concepts and limitations
  Fundamental algorithm-independent concepts
  Hypothesis class, target class
  Inductive bias, Occam's razor
  Empirical risk
  Limitations of inference machines
  Approximation and estimation errors
  The tradeoff between approximation and estimation errors
- Machine learning assessment and improvement
  Statistical Model Selection
  Structural Risk Minimization
  Practical methods for risk assessment based on resampling: jackknife, bootstrap (sketched below)
  Improving the accuracy of general algorithms: bagging, boosting
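  An illustrative leave-one-out bootstrap estimate of prediction error (one common variant of bootstrap error estimation; the fit/predict interface is an assumption made for the sketch):

    import numpy as np

    def bootstrap_error(X, y, fit, predict, B=200, seed=0):
        """For each of B bootstrap resamples, train on the resample and
        measure error on the points it left out, then average.
        Hypothetical interface: fit(X, y) -> model, predict(model, X) -> labels."""
        rng = np.random.default_rng(seed)
        n, errs = len(y), []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)          # draw n indices with replacement
            out = np.setdiff1d(np.arange(n), idx)     # points left out of the resample
            if len(out) == 0:
                continue                              # rare: every point was drawn
            model = fit(X[idx], y[idx])
            errs.append(np.mean(predict(model, X[out]) != y[out]))
        return float(np.mean(errs))

  The jackknife instead removes one observation at a time; bagging reuses the same resampling idea to average many classifiers rather than to estimate error.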
- Learning Theory
  Formal model of the learnable
  Sample complexity
  Learning in the zero-Bayes and realizable case
  Growth function, VC-dimension
  VC-dimension of a vector space of functions, with proof
  Empirical risk minimization over finite classes: sample complexity, with proof
  Empirical risk minimization over infinite classes: risk upper bound, with proof (one standard form below)
  Lower bound on sample complexity
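  For orientation, one standard form of the risk upper bound for empirical risk minimization over an infinite class (constants differ between textbooks; this version is not quoted from the lecture notes): if H has VC-dimension d, then with probability at least 1 - \delta over a sample of size m, for all h \in H,

    R(h) \le \hat{R}(h) + \sqrt{\frac{8}{m}\left( d \ln\frac{2em}{d} + \ln\frac{4}{\delta} \right)}.

  The growth function enters through Sauer's lemma, which bounds the number of behaviours of H on a sample of size 2m by (2em/d)^d.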
- Support Vector Machines
  Margin of a classifier
  Dual Perceptron algorithm (sketched below)
  Learning non-linear hypotheses with the perceptron
  Kernel functions, implicit non-linear feature space
  Theory: zero-Bayes, realizable infinite hypothesis class, finite covering, margin-based bounds on risk
  Maximal Margin classifier
  Learning support vector machines as a dual optimization problem
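  An illustrative dual (kernel) perceptron, showing how a kernel yields non-linear hypotheses (a minimal sketch; the kernel choice and names are assumptions):

    import numpy as np

    def kernel_perceptron(X, y, kernel, epochs=10):
        """Dual perceptron: the hypothesis is a kernel expansion over the
        training points, so only per-example mistake counts are stored.
        Assumes numpy arrays and labels +1/-1."""
        n = len(y)
        alpha = np.zeros(n)                                    # mistake counts (dual variables)
        K = np.array([[kernel(a, b) for b in X] for a in X])   # Gram matrix
        for _ in range(epochs):
            for i in range(n):
                if y[i] * np.sum(alpha * y * K[:, i]) <= 0:    # mistake on example i
                    alpha[i] += 1
        return alpha

    # an illustrative kernel: inhomogeneous polynomial of degree 3
    poly3 = lambda a, b: (1.0 + a @ b) ** 3

  The support vector machine replaces this mistake-driven update with a convex dual optimization that maximizes the margin; the training points with non-zero dual variables are the support vectors.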
Method of Instruction:
Lecture presentations with associated class problems
Assessment:
The course has the following assessment components:
- Written Examination (2.5 hours, 75%)
- Coursework Section (2 pieces, equally weighted, 25%)
To pass this course, students must:
- Obtain an overall pass mark of 50% for all sections combined
The examination rubric is:
Answer three questions out of five. All questions carry equal marks.
Resources:
Text Book 1: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, T., Tibshirani, R., and Friedman, J., Springer (2001)
Reference Book 1: Pattern Classification, Duda, R.O., Hart, P.E., and Stork, D.G., John Wiley and Sons (2001)
Reference Book 2: Pattern Recognition and Machine Learning, Bishop, C.M., Springer (2006)
Reference Book 3: An Introduction to Support Vector Machines, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2000)
Reference Book 4: Kernel Methods for Pattern Analysis, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2004)
Lecture notes and module information

