COMP0084 Information Retrieval and Data Mining

This database contains the 2018-19 versions of syllabuses.

Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).

Academic session



Information Retrieval and Data Mining



Module delivery

1819/A7P/T2/COMP0084 Postgraduate

Related deliveries

1819/A7U/T2/COMP0084 Masters (MEng)

Prior deliveries




FHEQ Level


FHEQ credits



Term 2

Module leader

Yilmaz, Emine


Yilmaz, Emine

Cox, Ingemar

Module administrator

Abbaro, Besheer


The module provides an entry-level study of information retrieval and data mining techniques. It covers how to find relevant information and then extract meaningful patterns from it. While the basic theories and mathematical models of information retrieval and data mining are covered, the module focuses primarily on practical algorithms for textual document indexing, relevance ranking, web usage mining, and text analytics, as well as their performance evaluation.

Learning outcomes

On successful completion of the module, a student will master both the theoretical and practical aspects of information retrieval and data mining, and will be able to understand:

  1. the common algorithms and techniques for information retrieval (document indexing and retrieval, query processing, etc.).
  2. the quantitative evaluation methods for IR systems and data mining techniques.
  3. the popular probabilistic retrieval methods and ranking principles.
  4. the techniques and algorithms used in practical retrieval and data mining systems, such as web search engines and recommender systems, including the recently popular topic of deep learning.
  5. basic algorithms that can be used to make predictions from data.

Availability and prerequisites

This module delivery is available for selection on the below-listed programmes. The relevant programme structure will specify whether the module is core, optional, or elective.

In order to be eligible to select this module as optional or elective, where available, students must meet all prerequisite conditions to the satisfaction of the module leader. Places for students taking the module as optional or elective are limited and will be allocated according to the department’s module selection policy.

Programmes on which available:

  • MSc Business Analytics (with specialisation in Computer Science)
  • MSc Computational Statistics and Machine Learning
  • MSc Data Science (International Programme)
  • MRes Computational Statistics and Machine Learning
  • MRes Financial Computing
  • MRes Web Science and Big Data Analytics
  • MSc Spatio-Temporal Analytics & Big Data Mining (and PGDip and Cert)
  • MSc Data Science for Research in Health and Biomedicine (and PGDip/Cert)
  • MSc Scientific Computing
  • MSc Data Science


In order to be eligible to select this module, students must have:

  • an understanding of probability and statistics; and
  • proficiency in Java programming (as demonstrated by at least one programming project in the past)


Overview of the fields

Study some basic concepts of information retrieval and data mining, such as the concept of relevance, association rules, and knowledge discovery. Understand the conceptual models of an information retrieval and knowledge discovery system.

Indexing and Text Processing

Introduce various indexing and text-processing techniques for textual information items, such as inverted indices, tokenization, stemming, and stop-word removal. Text compression techniques, such as the Lempel-Ziv algorithm and Huffman coding, will also be covered.
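As an illustration of the indexing topics above, the following is a minimal sketch of tokenization, stop-word removal, and inverted-index construction. It is not taken from the module materials: the function names, the tiny stop-word list, and the sample documents are all hypothetical, and stemming is omitted for brevity.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}

def tokenize(text):
    """Lowercase the text, split on non-alphanumeric characters, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample collection.
docs = {
    1: "The retrieval of relevant documents",
    2: "Mining patterns in documents",
    3: "Relevance and ranking",
}
index = build_inverted_index(docs)
print(index["documents"])  # ids of the documents that contain "documents"
```

Because no stemmer is applied, "relevant" and "relevance" are indexed as separate terms here; a stemming step would typically be added between tokenization and indexing.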

Retrieval Methods

Study popular retrieval models: 1. Boolean, 2. Vector space, 3. Binary independence, 4. Language modelling. The probability ranking principle is also covered, along with other commonly used techniques such as relevance feedback, pseudo-relevance feedback, and query expansion.
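To make the vector space model concrete, here is a small sketch that ranks documents by the cosine similarity between TF-IDF vectors. The weighting scheme (raw term frequency times log(N/df)), the function names, and the sample data are assumptions chosen for illustration; many other weighting variants exist, and this is not necessarily the one taught in the module.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_tokens, tokenized_docs):
    """Return (doc position, score) pairs sorted by decreasing cosine score."""
    n = len(tokenized_docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}

    def vec(tokens):
        # TF-IDF weights; terms unseen in the collection are ignored.
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    q = vec(query_tokens)
    scores = [(i, cosine(q, vec(doc))) for i, doc in enumerate(tokenized_docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical, already tokenized collection.
tokenized_docs = [
    ["information", "retrieval", "ranking"],
    ["data", "mining", "patterns"],
    ["information", "mining"],
]
ranked = rank(["retrieval", "ranking"], tokenized_docs)
print(ranked[0])  # best-matching document and its score
```

Only the first document shares terms with the query, so it is the only one with a non-zero score.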


Online and offline evaluation techniques for assessing retrieval quality. Commonly used evaluation metrics such as average precision and NDCG. The "Cranfield paradigm" and the TREC conferences, as well as some recently popular techniques such as interleaving, will be discussed.
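The two metrics named above can be sketched as follows. This is an illustrative implementation under stated assumptions: average precision is normalised by the number of relevant documents found in the ranked list (that is, it assumes every relevant document was retrieved), and NDCG uses the gain / log2(rank + 1) form of discounting, which is one of several common variants. The function names are hypothetical.

```python
import math

def average_precision(relevance):
    """relevance: 0/1 judgments in ranked order.

    Assumes all relevant documents appear somewhere in the list.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / hits if hits else 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending-gain) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2
print(ndcg([3, 2, 1]))                  # already ideally ordered
```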

Data Mining

Study basic techniques, algorithms, and systems of data mining and analytics, including frequent pattern, correlation, and association analysis, and basic machine learning algorithms such as linear regression and logistic regression. Basic personalisation and usage mining techniques will also be discussed.
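As a small example of association analysis, the sketch below computes the support of an itemset and the confidence of a rule over a list of transactions. The function names and the shopping-basket data are hypothetical and chosen only for illustration; a full frequent-pattern miner such as Apriori would build on these two quantities.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as a ratio of supports."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_a

# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions
```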

Emerging Areas

Study emerging areas such as learning to rank, deep learning, word embeddings, and topic modelling.

An indicative reading list is available via


The module is delivered through a combination of lectures, tutorials, seminars, and project work.


This module delivery is assessed as below:



Weight (%)



Report (5 pages)




Coursework 1



In order to pass this module delivery, students must achieve an overall weighted module mark of 50%.