(i) differentiation d/dx followed by thresholding on the derivative magnitude,
(ii) smoothing in the x-direction with an m-tap filter and in the y-direction with an n-tap filter,
(iii) smoothing and differentiation d/dx as above in (i) and (ii) followed by thresholding on the magnitude of the smoothed derivative (ie. a simple vertical edge detector),
(iv) template matching by cross-correlation with a fixed P x Q template (ie a convolution), detection and location of the maximum of the cross-correlation and thresholding of the maximum.
For the complexity, express your answers as numbers of multiply, addition and comparison operations in single length arithmetic, as functions of the image size and algorithm parameters. For the latency, give estimates in terms of the times for each of the above operations and the image data rate, expressed in terms of frame, line and pixel times. Comment on computational effects that may affect the accuracy of your estimates.
Describe how to make the smoothing in (ii) in the x-direction independent of the filter size for (a) a uniform, top-hat filter and (b) a triangular filter. Explain why this property is important for the implementation of a real-time image processing system, and describe what dependency on the filter size still remains. Discuss to what extent the smoothing in the y-direction can be made to work in the same kind of way.
For the template matching in (iv) consider a two-dimensional template of P x Q pixels to be matched against every possible location in each image and consider carefully how best to detect and locate the maximum of the cross-correlation.
(i) using a pipeline with two stages, one stage for the combined smoothing and differentiatuion in the x-direction, and one for smoothing in the y-direction plus the thresholding,
(ii) a similar pipeline but using, where possible, IIR filters to implement uniform smoothing filters,
(iii) varying the number of stages in the pieplines so as to optimise the pipelines (i) and (ii) above,
(iv) using a multistage pipeline in which the filtering is implemented by means of repeated convolution with small FIR kernels of size 1 by 2 or 1 by 3 pixels, followed by the differentiation and thresholding.
Use theoretical estimates of the number of arithmetic operations to be carried out in each stage of the pipelines to estimate the throughput and latency of the various systems above and to compare the alternatives. Include consideration of any data buffering required and comment on whether the pipelines are balanced.
Discuss the feasibility of using a similar approach to implement the template matching described in 3.1 (iv).
Discuss how the architectures you have described may be used in a real-time image processing system in which each image is read into a frame buffer and then divided into a number of similar sized strips of height, h, each occupying a contiguous area of the buffer memory. Describe the parallel architecture you would use to implement the tasks listed in 3.1 above, explaining, in each case any problems you would expect to encounter, in particular, relating to access to the image data and writing results, possible resource contentions and what steps you would take to overcome the difficulties you have noted. You should consider implementations in which, for example: (i) you make multiple copies of the intial image data or of parts of it so that, for example, the strips of height, h, overlap, (ii) individual (non-overlapping) image strips are assigned to a particular processor and only that processor may access the data (and may therefore have to send it to other processors), and (iii) the whole of the image data is shared.
Bernard Buxton,
17 January 2001.