Maths Methods, Algorithms and Implementation: Practical - Multiprocessor Systems

Revision exercise, non-compulsory and not for assessment.

1.     Objective

In the lectures, you have heard about different types of multiprocessor computer systems and their use in real-time image processing and machine vision applications. The aim of this practical is to work through the way different multiprocessor architectures may be applied to a number of simple image processing tasks in order to illustrate their advantages and disadvantages.

2.     Procedure

This is an "essay" type coursework, but your report should be structured according to the sections below. Thus, by analogy with, say, the Machine Vision practicals, you should carry out the steps listed below and write a brief report containing: Any mathematical excursions you consider too lengthy to include in the main text may be included as appendices.

3.     Exercises

3.1     Complexity and Timing of Image Processing Operations

Estimate the computational complexity and latency of the following simple image processing operations carried out on an M x N, 8-bit image, when implemented using a single processor in such a way as to keep pace with the incoming data rate:

(i)    differentiation d/dx followed by thresholding on the derivative magnitude,

(ii)   smoothing in the x-direction with an m-tap filter and in the y-direction with an n-tap filter,

(iii)  smoothing and differentiation d/dx as above in (i) and (ii) followed by thresholding on the magnitude of the smoothed derivative (ie. a simple vertical edge detector),

(iv)  template matching by cross-correlation with a fixed P x Q template (ie a convolution), detection and location of the maximum of the cross-correlation and thresholding of the maximum.

For the complexity, express your answers as numbers of multiply, addition and comparison operations in single length arithmetic, as functions of the image size and algorithm parameters. For the latency, give estimates in terms of the times for each of the above operations and the image data rate, expressed in terms of frame, line and pixel times. Comment on computational effects that may affect the accuracy of your estimates.

Describe how to make the smoothing in (ii) in the x-direction independent of the filter size for (a) a uniform, top-hat filter and (b) a triangular filter. Explain why this property is important for the implementation of a real-time image processing system, and describe what dependency on the filter size still remains. Discuss to what extent the smoothing in the y-direction can be made to work in the same kind of way.

For the template matching in (iv) consider a two-dimensional template of P x Q pixels to be matched against every possible location in each image and consider carefully how best to detect and locate the maximum of the cross-correlation.

3.2    Pipelined image processing systems

Design a simple pipelined system with several identical processors, one for each stage of the pipe, for carrying out the differentiation, smoothing and thresholding operations on an image described in 3.1, (i), (ii) and (iii), above. Analyse trade-offs in expected performance (throughput and latency) as a function of the number of stages used in the pipeline for the following implementations:

(i)    using a pipeline with two stages, one stage for the combined smoothing and differentiatuion in the x-direction, and one for smoothing in the y-direction plus the thresholding,

(ii)    a similar pipeline but using, where possible, IIR filters to implement uniform smoothing filters,

(iii)    varying the number of stages in the pieplines so as to optimise the pipelines (i) and (ii) above,

(iv)    using a multistage pipeline in which the filtering is implemented by means of repeated convolution with small FIR kernels of size 1 by 2 or 1 by 3 pixels, followed by the differentiation and thresholding.

Use theoretical estimates of the number of arithmetic operations to be carried out in each stage of the pipelines to estimate the throughput and latency of the various systems above and to compare the alternatives.  Include consideration of any data buffering required and comment on whether the pipelines are balanced.

Discuss the feasibility of using a similar approach to implement the template matching described in 3.1 (iv).

3.3    Parallel systems for Image Processing

Explain why parallel systems are often used in real-time computing and describe how the common different parallel architectures may be classified in terms of the characteristics of their instruction (I) and data (D) flows. Indicate how a conventional, serial, von Neumann machine and the pielined systems described in 3.2 fit into this *I*D classification.

Discuss how the architectures you have described may be used in a real-time image processing system in which each image is read into a frame buffer and then divided into a number of similar sized strips of height, h, each occupying a contiguous area of the buffer memory. Describe the parallel architecture you would use to implement the tasks listed in 3.1 above, explaining, in each case any problems you would expect to encounter, in particular, relating to access to the image data and writing results, possible resource contentions and what steps you would take to overcome the difficulties you have noted. You should consider implementations in which, for example: (i) you make multiple copies of the intial image data or of parts of it so that, for example, the strips of height, h, overlap, (ii) individual (non-overlapping) image strips are assigned to a particular processor and only that processor may access the data (and may therefore have to send it to other processors), and (iii) the whole of the image data is shared.

3.4     Combined systems

Describe how parallel and pipelined systems may be combined. Explain the potential advantages and disadvantages of such systems and discuss briefly whether it would be useful to use a combined system for the tasks listed in 3.1, and studied in 3.2 and 3.3 above.

4. Reporting

As usual, beware of labouring long and hard to produce a pretty report. There is a rapidly diminishing rate of return for such efforts. Evidence that the exercise has been completed and content of the report is much more important than presentation.

Bernard Buxton,
17 January 2001.