COMP0084 Information Retrieval and Data Mining

This database contains the 2018-19 versions of syllabuses.

Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).

Academic session: 2018-19

Module: Information Retrieval and Data Mining

Code: COMP0084

Module delivery: 1819/A7U/T2/COMP0084 Masters (MEng)

Related deliveries: 1819/A7P/T2/COMP0084 Postgraduate

Prior deliveries: COMPM052

Level: Masters (MEng)

FHEQ Level: L7

FHEQ credits: 15

Term/s: Term 2

Module leader: Yilmaz, Emine

Contributors: Yilmaz, Emine; Cox, Ingemar

Module administrator: Ball, Louisa

Aims

The module offers an entry-level study of information retrieval and data mining techniques: how to find relevant information and then extract meaningful patterns from it. Although the basic theories and mathematical models of information retrieval and data mining are covered, the module focuses primarily on practical algorithms for textual document indexing, relevance ranking, web usage mining and text analytics, and on evaluating their performance.

Learning outcomes

On successful completion of the module, a student will have mastered both the theoretical and practical aspects of information retrieval and data mining, and will be able to understand:

  1. the common algorithms and techniques for information retrieval (document indexing and retrieval, query processing, etc.).
  2. the quantitative methods for evaluating IR systems and data mining techniques.
  3. the popular probabilistic retrieval methods and ranking principles.
  4. the techniques and algorithms used in practical retrieval and data mining systems, such as web search engines and recommender systems, including the recently popular topic of deep learning.
  5. basic algorithms that can be used to make predictions from data.

Availability and prerequisites

This module delivery is available for selection on the below-listed programmes. The relevant programme structure will specify whether the module is core, optional, or elective.

In order to be eligible to select this module as optional or elective, where available, students must meet all prerequisite conditions to the satisfaction of the module leader. Places for students taking the module as optional or elective are limited and will be allocated according to the department’s module selection policy.

Programmes on which available:

  • MEng Computer Science (International Programme) (Year 4)
  • MEng Computer Science (Year 4)
  • MEng Mathematical Computation (International Programme) (Year 4)
  • MEng Mathematical Computation (Year 4)

Prerequisites:

In order to be eligible to select this module, students must have:

  • an understanding of probability and statistics; and
  • proficiency in Java programming.

Content

Overview of the fields

Study some basic concepts of information retrieval and data mining, such as the concept of relevance, association rules, and knowledge discovery. Understand the conceptual models of an information retrieval and knowledge discovery system.

Indexing and Text Processing

Introduce techniques for indexing textual information items, such as inverted indices, together with the text processing steps they rely on, such as tokenization, stemming and stop-word removal. Text compression techniques, such as the Lempel-Ziv algorithm and Huffman coding, will also be covered.
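As an illustration of how tokenization, stop-word removal and stemming feed an inverted index, here is a minimal Python sketch. It is not taken from the module materials: the stop-word list is a small assumed sample, and `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer such as Porter's.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Hypothetical toy suffix stripper, NOT the Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_inverted_index(docs):
    """Map each stemmed term to a sorted list of (doc_id, term_frequency) postings."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token in STOP_WORDS:
                continue  # stop words are not indexed
            counts[crude_stem(token)][doc_id] += 1
    return {term: sorted(postings.items()) for term, postings in counts.items()}

docs = {
    1: "The indexing of documents",
    2: "Ranking documents and indexing queries",
}
index = build_inverted_index(docs)
print(index["index"])     # [(1, 1), (2, 1)]
print(index["document"])  # [(1, 1), (2, 1)]
```

Storing term frequencies in the postings, rather than bare document ids, keeps the index usable by ranking models that weight terms by frequency.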

Retrieval Methods

Study popular retrieval models: (1) Boolean, (2) vector space, (3) binary independence and (4) language modelling, together with the probability ranking principle. Other commonly used techniques, such as relevance feedback, pseudo-relevance feedback and query expansion, will also be covered.
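To make the vector space model concrete, the following Python sketch ranks tokenised documents by the cosine similarity between tf-idf vectors. It assumes one common weighting (raw term frequency times idf = log(N / df)); the module may use other tf-idf variants, and the function names are illustrative only.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a sparse tf-idf vector (term -> weight) for each tokenised document."""
    n = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [(i, cosine(q_vec, d)) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = [
    ["information", "retrieval", "models"],
    ["data", "mining", "models"],
    ["retrieval", "evaluation"],
]
print(rank(["retrieval", "models"], docs))
```

Document 0 contains both query terms and ranks first; documents 1 and 2 each share one term, and document 2 ranks higher because its vector is shorter. Note that with this idf, a term occurring in every document gets weight zero.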

Measurements

Online and offline evaluation techniques for measuring retrieval quality. Commonly used evaluation metrics, such as average precision and NDCG. The "Cranfield paradigm" and the TREC conferences, as well as recently popular techniques such as interleaving, will be discussed.
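The two named metrics can be computed in a few lines. This Python sketch assumes binary relevance for average precision and graded relevance with the common (2^g - 1) / log2(rank + 1) gain for NDCG; as a simplifying assumption, the ideal ranking is built from the returned list only, whereas evaluation campaigns such as TREC normally use all judged documents.

```python
import math

def average_precision(ranked_rels, num_relevant):
    """AP for one query: sum of precision@k at each relevant rank k, divided by R.

    ranked_rels: binary relevance (1 or 0) of the returned documents, in rank order.
    num_relevant: total number of relevant documents for the query (R).
    """
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at rank k
    return total / num_relevant if num_relevant else 0.0

def dcg(gains, k):
    """Discounted cumulative gain over the top k ranks."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(gains, k):
    """DCG normalised by the DCG of the same gains sorted into ideal order."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

print(average_precision([1, 0, 1, 0, 0], num_relevant=3))  # (1/1 + 2/3) / 3
print(ndcg([3, 2, 3, 0, 1], k=5))
```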

Data Mining

Study basic techniques, algorithms and systems of data mining and analytics, including frequent pattern mining, correlation and association analysis, and basic machine learning algorithms such as linear regression and logistic regression. Basic personalisation and usage mining techniques will also be discussed.
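For frequent pattern and association analysis, the core quantities are support, confidence and lift. The Python sketch below computes them over a small, made-up set of transactions; `frequent_itemsets` is a hypothetical helper that uses brute-force enumeration rather than Apriori's candidate pruning, so it only suits tiny examples.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_xy = support(transactions, set(antecedent) | set(consequent))
    s_x = support(transactions, antecedent)
    s_y = support(transactions, consequent)
    confidence = s_xy / s_x if s_x else 0.0
    lift = confidence / s_y if s_y else 0.0  # lift < 1 suggests negative correlation
    return s_xy, confidence, lift

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force enumeration of itemsets up to max_size (not Apriori)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(transactions, combo)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
print(rule_metrics(transactions, {"milk"}, {"bread"}))
print(frequent_itemsets(transactions, min_support=0.5))
```

Here {milk} -> {bread} has confidence 0.75 but lift just under 1, showing why confidence alone can overstate how strongly two items are associated.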

Emerging Areas

Study emerging areas such as learning to rank, deep learning, word embeddings and topic modelling.

An indicative reading list is available via http://readinglists.ucl.ac.uk/departments/comps_eng.html.

Delivery

The module is delivered through a combination of lectures, tutorials, seminars, practical exercises, project work and self-directed learning.

Assessment

This module delivery is assessed as below:

  1. Report (5 pages): 50%
  2. Coursework 1: 50%

In order to pass this module delivery, students must:

  • achieve an overall weighted module mark of at least 50%; and
  • achieve a mark of at least 40% in every component of assessment weighted at 30% or more of the module.

Where a component comprises multiple assessments, the minimum mark applies to the overall component.
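The two pass conditions can be checked with simple arithmetic. The Python sketch below is an unofficial illustration, not the department's marking procedure; `passes_module` and its parameters are hypothetical, and marks are assumed to be percentages per component. Because both components here are weighted at 50%, each one must reach 40%.

```python
def passes_module(marks, weights, min_overall=50.0, min_component=40.0,
                  heavy_weight=30.0):
    """Hypothetical check of the two pass conditions for one student.

    marks: component marks in percent, e.g. [report, coursework].
    weights: component weights in percent, in the same order.
    """
    overall = sum(m * w for m, w in zip(marks, weights)) / sum(weights)
    # Only components weighted at heavy_weight% or more carry the minimum mark.
    heavy_ok = all(m >= min_component
                   for m, w in zip(marks, weights) if w >= heavy_weight)
    return overall >= min_overall and heavy_ok

weights = [50, 50]  # Report (5 pages), Coursework 1
print(passes_module([75, 35], weights))  # overall 55%, but one component is below 40%
print(passes_module([60, 45], weights))  # overall 52.5%, both components at least 40%
```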