COMPGI10 - Bioinformatics

This database contains the 2016-17 versions of the syllabuses.

Code COMPGI10 (Also taught as: COMPM058 Bioinformatics)
Year MSc
Prerequisites Students are expected to be familiar already with the principles of techniques such as neural networks, Support Vector Machines and Hidden Markov Models from earlier parts of the course.
Term 2
Taught By David Jones (66%), Kevin Bryson (33%)
Aims The overall aim of this course is to introduce students to the new field of bioinformatics (computational biology) and to show how machine learning techniques can be employed in this area. The course assumes no previous knowledge of biology, so Part 1 gives a basic introduction to molecular biology as a background for bioinformatics. Part 2 concentrates on modern bioinformatics applications, particularly those which make good use of pattern recognition and machine learning methods.
Learning Outcomes To have a basic knowledge of modern molecular biology and genomics. To understand the advantages and disadvantages of different machine learning techniques in bioinformatics and how the relative merits of different approaches can be evaluated by correct benchmarking techniques. To understand how theoretical approaches can be used to model and analyse complex biological systems.


Part 1: Basic molecular biology 
Introduction to Basic Cell Chemistry: Cell chemistry and macromolecules. Biochemical pathways, e.g. glycolysis. Protein structure and functions.
Cell Structure and Function: Cell components. Different types of cell. Chromosome structure and organisation. Cell division.
The Hereditary Material: DNA structure, replication and protein synthesis. Structure and roles of RNA. Genetic code. Mechanism of protein synthesis: transcription and translation. Mutation.
Recombinant DNA Technology: Restriction enzymes. Hybridisation techniques. Gene cloning. Polymerase chain reaction.
Genomics and Structural Genomics: Genes, genomes, mapping and DNA sequencing.

Part 2: Bioinformatics Applications 
Biological Databases: Overview of the use and maintenance of different databases in common use in biology. Case study: the CATH database of protein structure.
Gene Prediction: Methods for analysing genomic DNA to identify genes. Techniques: neural networks and HMMs.
Detecting Distant Homology: Methods for inferring remote relationships between genes and proteins. Techniques: dynamic programming, HMMs, hierarchical clustering.
Protein Structure Prediction: Methods for predicting the secondary and tertiary structure of proteins. Techniques: neural networks, SVMs, genetic algorithms and stochastic global optimization.
Transcriptomics: Methods for analysing gene expression and microarray data. Techniques: hypothesis testing, clustering, SVMs.
Agent-based Genome Analysis: Automation of genome analysis using intelligent software agents.
Systems Biology: Mathematical modelling of biological systems.
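
As a flavour of the dynamic programming technique listed under Detecting Distant Homology above, the following is a minimal sketch of a global alignment score in the style of Needleman-Wunsch. It is an illustration only and is not taken from the course materials: the function name and the scoring values (match, mismatch and a linear gap penalty) are assumptions chosen for the example, and practical homology searches would normally use substitution matrices and affine gap penalties instead.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions: +1 for a match,
    -1 for a mismatch and -2 for each gapped position (linear gaps).
    """
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the score table are kept in memory, which is enough when the score alone is needed; recovering the alignment itself would require storing the full table or traceback pointers.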

Method of Instruction:

Lecture presentations with associated class problems and group presentation/discussion of key research papers.


The course has the following assessment components:

  • Written Examination (2.5 hours, 85%)
  • Coursework Section (1 individual mini-project, 15%)


To pass this module, students must:


  • Obtain an overall pass mark of 50% for all components combined.
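
The combined mark can be illustrated with a short calculation. This sketch assumes that the overall mark is the weighted average of the two component percentages using the weights stated above (85% examination, 15% coursework), with no rounding and no per-component minimum; the syllabus states only the weights and the 50% overall pass mark, and the function names here are hypothetical.

```python
EXAM_WEIGHT = 85        # percent, from the assessment components above
COURSEWORK_WEIGHT = 15  # percent, from the assessment components above
PASS_MARK = 50.0        # overall pass mark in percent


def overall_mark(exam, coursework):
    """Weighted average of the two component marks (each out of 100).

    Assumption: components are combined by simple weighted averaging.
    """
    return (EXAM_WEIGHT * exam + COURSEWORK_WEIGHT * coursework) / 100


def passes(exam, coursework):
    """True if the combined mark reaches the overall pass mark."""
    return overall_mark(exam, coursework) >= PASS_MARK


if __name__ == "__main__":
    # An exam mark below 50 can still give an overall pass
    # under the weighted-average assumption.
    print(overall_mark(48, 70), passes(48, 70))
```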



Biochemistry, L. Stryer, W.H. Freeman and Co.

Post-genome Informatics, M. Kanehisa, Oxford University Press.

Bioinformatics: Genes, Proteins and Computers, C.A. Orengo, D.T. Jones and J.M. Thornton, BIOS Scientific Publishers, 2003.

Mathematical Biology, J.D. Murray, Springer, 1993.

Other references (including research papers) to be confirmed.