COMP0084 Information Retrieval and Data Mining
This database contains the 2018-19 versions of syllabuses.
Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).
Information Retrieval and Data Mining
The module provides an entry-level study of information retrieval and data mining techniques. It covers how to find relevant information and then extract meaningful patterns from it. While the basic theories and mathematical models of information retrieval and data mining are covered, the module focuses primarily on practical algorithms for textual document indexing, relevance ranking, web usage mining and text analytics, as well as their performance evaluation.
On successful completion of the module, a student will master both the theoretical and practical aspects of information retrieval and data mining, and will be able to understand:
- the common algorithms and techniques for information retrieval (document indexing and retrieval, query processing, etc.).
- the quantitative evaluation methods for IR systems and data mining techniques.
- the popular probabilistic retrieval methods and ranking principles.
- the techniques and algorithms existing in practical retrieval and data mining systems such as those in web search engines and recommender systems, including the recently popular topic of deep learning.
- basic algorithms that can be used to make predictions from data.
Availability and prerequisites
This module delivery is available for selection on the below-listed programmes. The relevant programme structure will specify whether the module is core, optional, or elective.
In order to be eligible to select this module as optional or elective, where available, students must meet all prerequisite conditions to the satisfaction of the module leader. Places for students taking the module as optional or elective are limited and will be allocated according to the department’s module selection policy.
Programmes on which available:
In order to be eligible to select this module, students must have:
Overview of the fields
Study some basic concepts of information retrieval and data mining, such as the concept of relevance, association rules, and knowledge discovery. Understand the conceptual models of an information retrieval and knowledge discovery system.
Indexing and Text Processing
Introduce various indexing and text processing techniques for textual information items, such as inverted indices, tokenization, stemming and stop words. Text compression techniques, such as the Lempel-Ziv algorithm and Huffman coding, will also be covered.
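As an illustration of the inverted index and stop-word removal named above, a minimal Python sketch follows. The stop-word list, the sample documents and the function names are assumptions made for this example and are not taken from the module materials; stemming and compression are left out.

```python
import re
from collections import defaultdict

# Illustrative stop-word list only; real systems use much longer lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each non-stop-word term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term not in STOP_WORDS:
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical three-document collection.
docs = {
    1: "The cat sat in the hat",
    2: "A dog and a cat",
    3: "The hat of the dog",
}
index = build_inverted_index(docs)
```

Looking up `index["cat"]` then returns the ids of the documents that mention the term, which is the basic operation behind query processing.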
Study popular retrieval models: 1. Boolean, 2. Vector space, 3. Binary independence, 4. Language modelling. The probability ranking principle and other commonly used techniques such as relevance feedback, pseudo-relevance feedback, and query expansion will also be covered.
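To make the vector space model concrete, the sketch below ranks documents by cosine similarity between TF-IDF vectors. It assumes one common weighting (raw term frequency times log(N/df)); other variants exist, and the function name and toy collection are hypothetical.

```python
import math
from collections import Counter

def rank_by_cosine(query, docs):
    """Rank tokenised docs against a tokenised query by TF-IDF cosine similarity."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}

    def vectorise(tokens):
        # Sparse vector; terms unseen in the collection are ignored.
        return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
            sum(w * w for w in v.values())
        )
        return dot / norm if norm else 0.0

    q = vectorise(query)
    scores = [(doc_id, cosine(q, vectorise(doc))) for doc_id, doc in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy collection, already tokenised.
docs = [
    ["information", "retrieval", "models"],
    ["data", "mining", "models"],
    ["information", "retrieval", "evaluation"],
]
ranked = rank_by_cosine(["data", "mining"], docs)
```

Here only the second document shares terms with the query, so it is ranked first and the other two score zero.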
Online and offline evaluation techniques for assessing retrieval quality. Commonly used evaluation metrics such as average precision and NDCG. The "Cranfield paradigm" and the TREC conferences, as well as some recently popular techniques such as interleaving, will be discussed.
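The two metrics named above can be sketched as follows. This is a simplified illustration under stated assumptions: average precision is normalised by the number of relevant documents found in the ranked list (that is, it assumes every relevant document was retrieved), and NDCG uses linear gains with a log2 discount, which is one of several formulations in use.

```python
import math

def average_precision(relevance):
    """relevance: binary judgements (1 = relevant) in ranked order."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0

def dcg(gains):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0

# Hypothetical judgements for a four-item ranking.
ap = average_precision([1, 0, 1, 0])   # (1/1 + 2/3) / 2
score = ndcg([3, 2, 0, 1])             # slightly below 1: last two items are swapped
```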
Study basic techniques, algorithms, and systems of data mining and analytics, including frequent pattern, correlation, and association analysis, and basic machine learning algorithms such as linear regression and logistic regression. Basic personalisation and usage mining techniques are also discussed.
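For association analysis, the two standard quantities are the support of an itemset and the confidence of a rule. A minimal sketch, using a hypothetical basket dataset and function names chosen for this example:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(transactions, {"bread", "milk"})         # 2 of 4 baskets
c = confidence(transactions, {"bread"}, {"milk"})    # 2 of the 3 bread baskets
```

Frequent pattern mining algorithms search for itemsets whose support exceeds a threshold, then keep the rules whose confidence is high enough.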
Study emerging areas such as learning to rank, deep learning, word embeddings and topic modelling.
An indicative reading list is available via http://readinglists.ucl.ac.uk/departments/comps_eng.html.
The module is delivered through a combination of lectures, tutorials, seminars, and project work.
This module delivery is assessed as below:
Report (5 pages)
In order to pass this module delivery, students must achieve an overall weighted module mark of 50%.