Pattern Recognition Practical - PCA and Classification

Worth 5%. To be completed by 20/11/06; to be marked on 11/12/06.

1. Objectives

The aim of this practical is to enhance your understanding of the lectures on principal components analysis and two-category classification through a number of experiments on face expression data. The data come from a series of pre-processed face images on which 19 landmark points have been identified.

Your demonstrator for this practical is Vasileios Zografos. His predecessor, Ben Dias, maintained a web-page complementary to this one, where you can find additional instructions, hints and tips for many of the PR&MV practicals.

2. Procedure

Carry out each of the steps listed below and write a brief report on your results. Include any code you write yourself (suitably commented) in an appendix, together with any scripts.

3. Exercises

3.1 The Face Data

Excel files containing the face image landmark points may be copied from: . In each file the landmarks are stored as a list of 19 pairs of x and y pixel co-ordinates for each face. If you want to view the images from which these landmark points were obtained, they may be found, stored in the same order, in: . The images may be viewed using the Matlab Image Processing Toolbox, xv, Microsoft Photo Editor, or any other convenient tool.

3.2 Principal Components Analysis

The 19 landmarks form a 38 dimensional feature vector characterising the shape of each of the face images. Construct the scatter matrix, ST, for the 40 face images that have been classified as happy and sad and carry out a principal components analysis on it. Plot a graph of the eigenvalues, λ(k), obtained, comment on its form and determine the fraction of the total variation in these labelled datasets that is explained by the first three principal components. Almost all the analysis in the remainder of this assignment is performed in the low-dimensional feature space spanned by these first three principal components.
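A minimal sketch of this step in Python/NumPy, assuming the 40 landmark vectors have already been loaded into a 40 × 38 array (the random data below is only a stand-in for the Excel files):

```python
import numpy as np

# Placeholder for the 40 x 38 landmark matrix (40 faces, 19 (x, y) pairs).
# Replace with your own loading code, e.g. pandas.read_excel or Matlab xlsread.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 38))

mean = X.mean(axis=0)
D = X - mean                       # deviations from the mean shape
S_T = D.T @ D                      # total scatter matrix (38 x 38)

# PCA: eigen-decomposition of the symmetric scatter matrix
eigvals, eigvecs = np.linalg.eigh(S_T)
order = np.argsort(eigvals)[::-1]  # sort eigenvalues into decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variation explained by the first three components
frac3 = eigvals[:3].sum() / eigvals.sum()
```

Since the total variation is the trace of ST, which equals the sum of the eigenvalues, `frac3` is exactly the quantity asked for; a plot of `eigvals` against k gives the eigenvalue graph.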

Determine the vectors b(i) describing the weight of each of these first three principal components for each of these face images, i=1, 2...40, and plot them in the low-dimensional feature space spanned by these principal components. Calculate the mean, m, of all 40 classified faces in this 3-dimensional feature space and comment on the distribution of the 40 data points. Label the happy and sad faces distinctively or plot them separately, calculate the mean of the happy faces, mH, and of the sad faces, mS, and add them to your plots. Comment on the distribution of the data now that points for each class can be identified. Use two-dimensional plots of b1(i) against b2(i), etc., if it helps to visualise the results.
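The projection and class means can be computed along these lines; the data and the assumption that the first 20 files are the happy faces are placeholders you should replace with the actual files and labelling:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 38))      # placeholder for the 40 landmark vectors

mean = X.mean(axis=0)
D = X - mean
# rows of Vt are the principal components (equivalent to eig of the scatter)
_, _, Vt = np.linalg.svd(D, full_matrices=False)
U3 = Vt[:3].T                      # first three principal components (38 x 3)

b = D @ U3                         # weight vectors b(i), one row per face

# Hypothetical labelling: assume the first 20 files are 'happy', the rest 'sad'
b_happy, b_sad = b[:20], b[20:]
m   = b.mean(axis=0)               # mean of all 40 faces in the 3-D space
m_H = b_happy.mean(axis=0)
m_S = b_sad.mean(axis=0)
```

Note that because the weights are computed from mean-centred data, m is the zero vector (up to rounding), and with equal class sizes it is the average of mH and mS.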

3.3 Scatter Matrices

The labelled face data have been identified as representative of "happy" and "sad" expressions for this subject. Determine, in the three-dimensional feature space spanned by these first three principal components, the scatter (or "covariance") matrices for: the happy faces SH, the sad faces SS, the within-class scatter SW, and the between-class scatter SB, and calculate the total variation, σH², σS², σW², and σB², of each (i.e. the trace of each matrix). Construct the corresponding total scatter matrix ST in this three-dimensional space from the matrix ST calculated in 3.2 above and verify that the matrix you construct is given by the sum ST = SB + SW.
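One way to sketch the construction and the check ST = SB + SW, again using placeholder 3-D weight vectors and the assumed first-20/last-20 class split:

```python
import numpy as np

rng = np.random.default_rng(2)
b = rng.normal(size=(40, 3))            # placeholder 3-D weight vectors b(i)
b_H, b_S = b[:20], b[20:]               # assumed happy / sad split

def scatter(Z, mu):
    """Scatter matrix of the rows of Z about the mean mu."""
    D = Z - mu
    return D.T @ D

m   = b.mean(axis=0)
m_H = b_H.mean(axis=0)
m_S = b_S.mean(axis=0)

S_H = scatter(b_H, m_H)                 # happy-class scatter
S_S = scatter(b_S, m_S)                 # sad-class scatter
S_W = S_H + S_S                         # within-class scatter
# between-class scatter: class sizes times outer products of the mean offsets
S_B = 20 * np.outer(m_H - m, m_H - m) + 20 * np.outer(m_S - m, m_S - m)
S_T = scatter(b, m)                     # total scatter in the 3-D space

# total variation of each matrix is its trace
var_W, var_B = np.trace(S_W), np.trace(S_B)
```

The identity ST = SW + SB holds exactly with these definitions, and taking traces gives the corresponding decomposition of the total variation.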

Carry out a principal components analysis (PCA) of the five scatter matrices SH, SS, SW, SB and ST = SB + SW constructed above and comment on the results obtained.

3.4 Classification of Labelled Data

Construct five classifiers in this 3-dimensional feature space according to the following criteria:

Describe the form of the decision surfaces defined by each of the above classifiers and discuss their relationship, if any, with each other and with Fisher's linear discriminant and linear discriminant analysis.
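As one plausible instance of the family of classifiers described (the criterion echoed in section 3.7 via the quantity (b − mH)² − (b − mS)²), a nearest-class-mean rule can be sketched as follows; the class means here are illustrative placeholders, not values from the face data:

```python
import numpy as np

# Illustrative class means in the 3-D feature space (placeholders)
m_H = np.array([1.0, 0.0, 0.0])
m_S = np.array([-1.0, 0.0, 0.0])

def classify_nearest_mean(b):
    """Assign b to the class whose mean is nearer in Euclidean distance."""
    d_H = np.sum((b - m_H) ** 2)
    d_S = np.sum((b - m_S) ** 2)
    return 'happy' if d_H < d_S else 'sad'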

3.5 Performance Evaluation

We are interested in classifying the happy faces. Use the number of happy faces correctly classified as happy to compute the true positive rate, TP, for each of the above classifiers, and the number of sad faces incorrectly classified as happy to compute the false positive rate, FP. Also calculate the total error rate which, since the numbers of happy and of sad face examples are identical, is given by ((1 − TP) + FP)/2.
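These rates can be computed as follows; the predicted labels below are hypothetical, standing in for the output of one of your classifiers:

```python
import numpy as np

# Hypothetical labels for 20 happy then 20 sad faces (True = 'happy')
true_happy = np.array([True] * 20 + [False] * 20)
pred_happy = true_happy.copy()
pred_happy[[3, 25, 26]] = ~pred_happy[[3, 25, 26]]  # flip a few for illustration

TP = np.mean(pred_happy[true_happy])     # true positive rate
FP = np.mean(pred_happy[~true_happy])    # false positive rate
# with equal class sizes the total error rate averages the two error types
error = ((1 - TP) + FP) / 2
```

A quick sanity check is that `error` agrees with the overall fraction of misclassified faces, `np.mean(pred_happy != true_happy)`.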

Use the results obtained to comment on the performance of each classifier on the 40 faces supplied. In particular, which of the five classifiers gives: (a) the highest true positive rate, (b) the lowest false positive rate, and (c) the lowest error rate; and which of the five classifiers would you describe as the "best"?

3.6 Training and Generalisation

Repeat the experiments described in sections 3.4 and 3.5, but using only half of the happy faces (chosen at random) and half of the sad faces (again chosen at random) to construct the classifiers. Evaluate the performance of the resulting classifiers on the training data and on the remaining test data not used in their construction. Comment on the results obtained on these two data sets and also in comparison to the performance of the classifiers constructed in section 3.4: (a) on the training data, (b) on the test data, (c) on all the data.
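A stratified random split of this kind can be sketched as below, again assuming the first 20 faces are the happy ones:

```python
import numpy as np

rng = np.random.default_rng(3)
b = rng.normal(size=(40, 3))            # placeholder 3-D feature vectors
happy_idx = np.arange(20)               # assumed: first 20 faces are happy
sad_idx = np.arange(20, 40)

# pick half of each class at random for training; the rest form the test set
train_H = rng.choice(happy_idx, size=10, replace=False)
train_S = rng.choice(sad_idx, size=10, replace=False)
train = np.concatenate([train_H, train_S])
test = np.setdiff1d(np.arange(40), train)
```

Sampling each class separately keeps the training and test sets balanced, so the error-rate formula from section 3.5 still applies on both.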

Given the small size of the training sets, the classifiers constructed in this section are not expected to perform well. Explain why this is expected, comment on whether this expectation is borne out by your experiments and describe a procedure that you would expect to lead to better results. [NB. It is not required that you implement the procedure you suggest.]

3.7 Introducing a Threshold

Use the difference of the squares of each of the four distance functions defined in section 3.4 to introduce a threshold, T, such that positive values of T bias the classifiers in favour of classifying examples as 'happy' (i.e. make the classifiers more likely to classify examples as 'happy'; for example, for the first criterion, by considering whether (b − mH)² − (b − mS)² > T or not). Describe the classification systems you implement and construct a receiver operating characteristic (ROC) curve for each. Describe how the extreme values of the threshold T may be chosen so as to ensure (TP = 0, FP = 0) and (TP = 1, FP = 1) and comment on the resulting ROC curves obtained. Which of the classifiers for which you have constructed an ROC curve would you describe as the "best"?
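The ROC construction amounts to sweeping T over the range of the score and recording (FP, TP) at each value. A sketch with hypothetical scores (standing in for the values of (b − mH)² − (b − mS)², which are low for happy-looking faces):

```python
import numpy as np

rng = np.random.default_rng(4)
labels = np.array([True] * 20 + [False] * 20)   # True = happy (assumed order)
# hypothetical scores s = |b - mH|^2 - |b - mS|^2, low for happy faces
scores = np.where(labels, rng.normal(-1.0, 1.0, 40), rng.normal(1.0, 1.0, 40))

# sweep the threshold T; classify as 'happy' when s < T, so increasing T
# biases the classifier towards 'happy'
thresholds = np.linspace(scores.min() - 1, scores.max() + 1, 200)
TPs = [np.mean(scores[labels] < T) for T in thresholds]
FPs = [np.mean(scores[~labels] < T) for T in thresholds]
# plotting FPs against TPs traces out the ROC curve
```

Taking T below the minimum score classifies everything as 'sad' and gives (TP = 0, FP = 0); taking it above the maximum score gives (TP = 1, FP = 1), which are the two extreme points the question asks about.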

Discuss what changes you would have to make in your implementations and what you would expect to happen if (a) the difference of each of the distance functions were used, rather than the squares of the distances, (b) if ratios of the distance functions were used.

3.8 Unlabelled Data

Landmark points for 80 other face examples not classified as happy or sad expressions are contained in /cs/research/vision/images3/starship/mbdias/pat_rec/uclassified/unclassified.xls. Plot these as points in the 3-dimensional feature space defined in 3.2 and comment on their distribution. In particular, using only the landmark data, say whether, from these plots in the 3-dimensional feature space, you think: (a) these are examples of faces showing the same expression, or a number of different expressions and, if the latter, how many different types of expression; (b) whether any of them are likely to be from happy or from sad faces. In each case give reasons for your opinions and check your conclusions by looking at a selection of the example images. You may use the classifiers developed in previous sections to help you decide whether these other faces are likely to be 'happy' or 'sad'.
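The key point when projecting the unlabelled faces is to reuse the mean and principal components computed from the labelled data in 3.2, rather than recomputing a PCA on the new set. A sketch, with random placeholders for both data sets:

```python
import numpy as np

rng = np.random.default_rng(5)
X_lab = rng.normal(size=(40, 38))   # placeholder labelled landmark vectors
X_new = rng.normal(size=(80, 38))   # placeholder for the unclassified.xls data

# mean and components from the LABELLED data, as in section 3.2
mean = X_lab.mean(axis=0)
_, _, Vt = np.linalg.svd(X_lab - mean, full_matrices=False)
U3 = Vt[:3].T                       # first three principal components

b_new = (X_new - mean) @ U3         # the 80 faces in the same 3-D space
```

With the new points expressed in the same feature space as the labelled ones, they can be overlaid on the earlier plots and fed directly into the classifiers from sections 3.4 to 3.7.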

4. Reporting

Produce a summary report on paper (in the format indicated in section 2 above) which includes a critical appraisal of the performance of the classifiers you have implemented.
 

Bernard Buxton
2006