COMPGI01 - Supervised Learning

Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).

Code
COMPGI01 (Also taught as: COMPM055 Supervised Learning)
Year
MSc
Prerequisites
Basic mathematics, calculus, probability and statistics, linear algebra
Term
1
Taught By
Mark Herbster (50%)
Massi Pontil (50%)
Aims
This module covers supervised approaches to machine learning. It begins by reviewing the fundamentals of statistical decision theory and probabilistic pattern recognition, followed by an in-depth introduction to supervised learning algorithms such as the perceptron, the backpropagation algorithm, decision trees, instance-based learning, and support vector machines. It then covers algorithm-independent principles such as inductive bias, side information, and approximation and estimation errors; the assessment of algorithms by jackknife and bootstrap error estimation; and the improvement of algorithms by voting methods such as boosting. The module closes with an introduction to statistical learning theory: hypothesis classes, the PAC learning model, VC-dimension, growth functions, empirical risk minimization, and structural risk minimization.
Learning Outcomes
Students will gain in-depth familiarity with a range of classical and contemporary supervised learning algorithms; understand the principles and limitations that govern learning algorithms, together with ways of assessing and improving their performance; and understand the fundamentals of statistical learning theory, in particular the complexity of learning and its relationship to generalization ability.

Content:

Overview and Introduction to Bayes Decision Theory
Machine Intelligence and Applications
Pattern Recognition concepts
Classification, Regression, Feature Selection
Supervised Learning
Class conditional probability distributions
Examples of classifiers
Bayes optimal classifier and error
Learning classification approaches
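
The Bayes optimal classifier above can be made concrete with a small amount of code. The following is a minimal Python sketch, assuming a made-up two-class problem with Gaussian class-conditional densities and illustrative priors (the class names and all parameters are inventions for this example, not course material):

    import math

    def gaussian_pdf(x, mean, std):
        """Class-conditional density p(x | class) under a 1-D Gaussian model."""
        return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

    # Illustrative priors and class-conditional parameters (made-up numbers).
    PRIORS = {"A": 0.6, "B": 0.4}
    PARAMS = {"A": (0.0, 1.0), "B": (2.0, 1.0)}  # (mean, std) per class

    def bayes_classify(x):
        """Bayes optimal rule: choose the class maximising prior * likelihood,
        which is proportional to the posterior P(class | x)."""
        return max(PRIORS, key=lambda c: PRIORS[c] * gaussian_pdf(x, *PARAMS[c]))

    print(bayes_classify(0.2))  # -> 'A'
    print(bayes_classify(1.8))  # -> 'B'

The Bayes error is then the probability mass each class places on the other class's decision region; no classifier can do better when the model is correct.
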
Linear machines
General and Linear Discriminants
Decision regions
Single layer neural network
Linear separability, general position, number of dichotomies
General gradient descent
Perceptron learning algorithm
Mean square criterion and Widrow-Hoff learning algorithm
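
The perceptron learning algorithm in this block admits a very short implementation. Below is a pure-Python sketch of the classical mistake-driven update w <- w + y x (the toy dataset and the epoch cap are illustrative assumptions):

    def perceptron(data, epochs=100):
        """Mistake-driven perceptron. `data` is a list of (x, y) pairs with
        x a feature tuple and y in {-1, +1}; returns (weights, bias)."""
        dim = len(data[0][0])
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            mistakes = 0
            for x, y in data:
                activation = sum(wi * xi for wi, xi in zip(w, x)) + b
                if y * activation <= 0:       # misclassified (or on the boundary)
                    w = [wi + y * xi for wi, xi in zip(w, x)]
                    b += y
                    mistakes += 1
            if mistakes == 0:                 # converged: the data are separated
                break
        return w, b

    # Toy AND-style data, linearly separable (invented for illustration).
    data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
    print(perceptron(data))

On linearly separable data the loop provably terminates (the perceptron convergence theorem); the Widrow-Hoff rule differs in updating towards a real-valued target under the mean square criterion rather than only on mistakes.
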
Multi-Layer Perceptrons
Introduction to Neural Networks, Two-Layers
Universal approximators
Backpropagation learning, on-line, off-line
Error surface, important parameters
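
To make the backpropagation items above concrete, here is a minimal two-layer network trained by on-line gradient descent on XOR, the standard example of a problem no single-layer machine can solve. The architecture, learning rate, and random seed are illustrative choices, not prescriptions from the course:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR inputs and targets.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([0.0, 1.0, 1.0, 0.0])

    # Two-layer network: 2 inputs -> 3 hidden units -> 1 output.
    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
    lr = 0.5

    for _ in range(5000):                      # may need more epochs on other seeds
        for x, y in zip(X, Y):                 # on-line (per-example) updates
            h = sigmoid(x @ W1 + b1)           # forward pass: hidden layer
            out = sigmoid(h @ W2 + b2)[0]      # forward pass: output
            # Backward pass: gradients of the squared error 0.5 * (out - y)^2.
            d_out = (out - y) * out * (1 - out)
            d_h = d_out * W2[:, 0] * h * (1 - h)
            W2[:, 0] -= lr * d_out * h
            b2 -= lr * d_out
            W1 -= lr * np.outer(x, d_h)
            b1 -= lr * d_h

    # After successful training the outputs approach [0, 1, 1, 0].
    print(np.round([sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)[0] for x in X], 2))
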
Learning decision trees
Inference model, general domains, symbolic
Decision trees, consistency
Learning trees from training examples
Entropy, mutual information
ID3 algorithm criterion
C4.5 algorithm
Continuous test nodes, confidence
Pruning
Learning with incomplete data
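
The ID3 split criterion in this block reduces to a small amount of arithmetic: the entropy of the labels, and the information gain (mutual information) of a candidate attribute. A plain-Python sketch on a made-up toy table:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """H(Y) = -sum_c p(c) log2 p(c) over the empirical label distribution."""
        n = len(labels)
        return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

    def information_gain(rows, attr):
        """ID3 criterion: H(Y) minus the weighted entropy of each branch.
        `rows` is a list of (feature_dict, label) pairs."""
        labels = [y for _, y in rows]
        remainder = 0.0
        for value in {f[attr] for f, _ in rows}:
            branch = [y for f, y in rows if f[attr] == value]
            remainder += len(branch) / len(rows) * entropy(branch)
        return entropy(labels) - remainder

    # Made-up data in which 'windy' perfectly predicts the label.
    rows = [({"windy": True}, "no"), ({"windy": True}, "no"),
            ({"windy": False}, "yes"), ({"windy": False}, "yes")]
    print(information_gain(rows, "windy"))   # -> 1.0 (one full bit)

ID3 greedily splits on the attribute with the largest gain; C4.5 refines this with continuous test nodes, confidence-based pruning, and handling of incomplete data.
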
Instance-based Learning
Nearest neighbor classification
k-Nearest neighbor
Nearest neighbor error probability, proof
Simplification, Editing
Example: Document retrieval
Case-based reasoning
Example: learning graphical structures
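
The k-nearest-neighbor rule from this block can likewise be stated in a few lines. The sketch below uses Euclidean distance and an unweighted majority vote; both the metric and the tie-breaking behaviour of Counter are illustrative choices:

    from collections import Counter
    from math import dist   # Euclidean distance (Python 3.8+)

    def knn_classify(train, query, k=3):
        """Classify `query` by majority vote among the k nearest training
        points; `train` is a list of (point_tuple, label) pairs."""
        neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    # Toy 2-D data, invented for illustration.
    train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
    print(knn_classify(train, (1, 1)))   # -> 'a'
    print(knn_classify(train, (5, 6)))   # -> 'b'
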
Machine learning concepts and limitations
Fundamental algorithm-independent concepts
Hypothesis class, Target class
Inductive bias, Occam's razor
Empirical risk
Limitations of inference machines
Approximation and estimation errors
Tradeoff
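
The approximation/estimation tradeoff above is often summarised as a two-term decomposition of the excess risk. In standard notation (with R* the Bayes risk and H the hypothesis class; the notation is assumed here, as the syllabus fixes none):

    % Excess risk of a learned hypothesis \hat{h} over the Bayes risk R^*:
    R(\hat{h}) - R^*
      = \underbrace{\Bigl(R(\hat{h}) - \inf_{h \in \mathcal{H}} R(h)\Bigr)}_{\text{estimation error}}
      + \underbrace{\Bigl(\inf_{h \in \mathcal{H}} R(h) - R^*\Bigr)}_{\text{approximation error}}
    % Enlarging \mathcal{H} shrinks the approximation term but typically
    % inflates the estimation term: this is the tradeoff named above.
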
Machine learning assessment and Improvement
Statistical Model Selection
Structural Risk Minimization
Practical methods for risk assessment based on resampling, Jackknife, Bootstrap
Improving accuracy of general algorithms, Bagging, Boosting
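
As a sketch of the resampling-based risk assessment above, the function below computes a simple bootstrap estimate of error by repeatedly training on samples drawn with replacement and testing on the left-out ("out-of-bag") points. The trainer interface, round count, and toy data are assumptions made for illustration:

    import random

    def bootstrap_error(data, train_fn, rounds=200, seed=0):
        """Average out-of-bag error: each round resamples len(data) points
        with replacement, trains on them, and tests on the unused points.
        `train_fn(sample)` must return a predictor f(x) -> label."""
        rng = random.Random(seed)
        errors = []
        for _ in range(rounds):
            sample = [rng.choice(data) for _ in data]
            held_out = [p for p in data if p not in sample]
            if not held_out:                   # rare: every point was resampled
                continue
            predict = train_fn(sample)
            wrong = sum(predict(x) != y for x, y in held_out)
            errors.append(wrong / len(held_out))
        return sum(errors) / len(errors)

    # Hypothetical trainer: always predicts the majority label of its sample.
    def majority_trainer(sample):
        labels = [y for _, y in sample]
        top = max(set(labels), key=labels.count)
        return lambda x: top

    data = [((i,), "a" if i < 5 else "b") for i in range(10)]
    print(round(bootstrap_error(data, majority_trainer), 3))

Bagging and boosting reuse the same resampling idea for improvement rather than assessment: bagging votes over predictors trained on bootstrap samples, while boosting reweights examples adaptively.
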
Learning Theory
Formal model of the learnable
Sample complexity
Learning in the zero-Bayes and realizable case
Growth function, VC-dimension
VC-dimension of a vector space of functions, proof
Empirical Risk Minimization over finite classes, sample complexity, proof
Empirical Risk Minimization over infinite classes, risk upper bound, proof
Lower bound on sample complexity
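
The finite-class results above have a compact standard statement. In the realizable (zero-Bayes) case, for a finite hypothesis class, target accuracy epsilon and confidence 1 - delta, with notation as in standard PAC treatments:

    % Probability that ERM over a finite class \mathcal{H} returns a bad
    % hypothesis (risk above \varepsilon) in the realizable case:
    \Pr\bigl[ R(\hat{h}_{\mathrm{ERM}}) > \varepsilon \bigr]
      \;\le\; |\mathcal{H}| \, (1 - \varepsilon)^m
      \;\le\; |\mathcal{H}| \, e^{-\varepsilon m}
    % Setting the right-hand side to \delta gives the sample complexity
    %   m \;\ge\; \tfrac{1}{\varepsilon} \bigl( \ln|\mathcal{H}| + \ln\tfrac{1}{\delta} \bigr).

Infinite classes replace ln|H| with capacity measures built from the growth function and VC-dimension, which is where the risk upper bound and the lower bound listed above come in.
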
Support Vector Machines
Margin of a classifier
Dual Perceptron algorithm
Learning non-linear hypotheses with perceptron
Kernel functions, implicit non-linear feature space
Theory: zero-Bayes, realizable infinite hypothesis class, finite covering, margin-based bounds on risk
Maximal Margin classifier
Learning support vector machines as a dual optimization problem
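
Tying together the dual perceptron and kernel items in this block: in dual form the weight vector never appears, only one coefficient per training example and kernel evaluations, so non-linear hypotheses come for free through the implicit feature space. A pure-Python sketch (the Gaussian kernel, its width, and the XOR data are illustrative assumptions):

    from math import exp

    def rbf(u, v, gamma=1.0):
        """Gaussian (RBF) kernel: an implicit non-linear feature space."""
        return exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

    def dual_perceptron(data, kernel=rbf, epochs=50):
        """Dual perceptron: alpha[i] counts mistakes on example i, and
        predictions use only sum_i alpha[i] * y_i * K(x_i, x)."""
        alpha = [0] * len(data)

        def predict(x):
            s = sum(a * y * kernel(xi, x) for a, (xi, y) in zip(alpha, data))
            return 1 if s > 0 else -1

        for _ in range(epochs):
            mistakes = 0
            for i, (xi, yi) in enumerate(data):
                if predict(xi) != yi:
                    alpha[i] += 1
                    mistakes += 1
            if mistakes == 0:                 # separated in feature space
                break
        return predict

    # XOR is not linearly separable in input space, but is in the RBF feature space.
    data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
    f = dual_perceptron(data)
    print([f(x) for x, _ in data])   # -> [-1, 1, 1, -1]

A support vector machine replaces the mistake-driven update with an optimization that maximises the margin, which yields the dual quadratic program listed above.
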

Method of Instruction:

Lecture presentations with associated class problems

Assessment:

The course has the following assessment components:

  • Written Examination (2.5 hours, 75%)
  • Coursework Section (25%)

For full details of the coursework, see the course web page.

To pass this course, students must:

  • Obtain an overall pass mark of 50% for all sections combined

The examination rubric is:
Answer THREE questions out of FIVE. All questions carry equal marks.

Resources:

Text Book 1: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, T., Tibshirani, R., and Friedman, J., Springer (2001)

Reference Book 1: Pattern Classification, Duda, R. O., Hart, P. E., and Stork, D. G., John Wiley and Sons (2001)

Reference Book 2: Pattern Recognition and Machine Learning, Bishop, C. M., Springer (2006)

Reference Book 3: An Introduction to Support Vector Machines, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2000)

Reference Book 4: Kernel Methods for Pattern Analysis, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2004)