COMPGA16 - Malware

This database contains the 2017-18 versions of syllabuses.

Note: Whilst every effort is made to keep the syllabus and assessment records correct, the precise details must be checked with the lecturer(s).

Code: COMPGA16 (also taught as COMPM066)
Year: MSc
Prerequisites: Undergraduate courses in logic and discrete mathematics, assembly, and imperative programming.
Term: 1
Taught By

David Clark (60%) [Module Leader]

Jens Krinke (20%)

Earl Barr (20%)

Hector Menendez (90%) [Lab support]

Aims

To provide students with:

  1. Specialist understanding of the issues and techniques in malware detection and classification.
  2. Broad understanding of the human, social, economic and historical context in which malware occurs.

Learning Outcomes

Successful completion of this course will provide students with a specialist understanding of the nature of malware, its capabilities, and how it is combatted through detection and classification. Students will understand the underlying scientific and logical limitations on society’s ability to combat malware. Furthermore, students should gain a broad understanding of the social, economic and historical context in which malware occurs.

Content

Laboratory work (24% of assessment): nine 2-hour labs

Topics: Introduction (malware analysis, tools list). Lab 1: architecture; Labs 2 and 3: 8086 instructions; Lab 4: from C to assembly; Labs 5 and 6: radare2; Lab 7: static analysis; Lab 8: dynamic analysis (Wireshark, Pin); Lab 9: packing/unpacking (YARA, PEiD).

Introduction

  1. The taxonomy of malware and its capabilities: viruses, Trojan horses, rootkits, backdoors, worms, targeted malware
  2. History of malware

The social and economic context for malware

  1. crime, anti-malware companies, legal issues, the growing proliferation of malware

Basic Analysis

  1. Signature generation and detection
  2. clone detection methods

Static analysis theory

  1. program semantics
  2. abstract interpretation framework

Static Analysis

  1. System calls: dependency analysis issues in assembly languages; semantic invariance of system call sequences
  2. abstract interpretation as a formal framework for detection
  3. taint-based analyses
  4. semantic clones

Dynamic Analysis

  1. virtualization: semantic gap
  2. reverse engineering
  3. hybridisation with static analysis

Similarity metrics

  1. Kolmogorov Complexity
  2. association metrics
  3. other entropy-based metrics
  4. NLP-based approaches

Problems in large scale classification

  1. scalability
  2. triage methods
  3. required false positive (FP) rate

Hiding

  1. Polymorphism
    1. compression
    2. encryption
    3. virtualization
  2. Metamorphism
    1. high level code obfuscation engines
    2. on-board metamorphic engines
    3. semantics-preserving rewritings
  3. Frankenstein

The theory of malware

  1. Rice’s theorem and the undecidability of semantic equivalence
  2. Adleman’s proof of the undecidability of the presence of a virus
  3. Cohen’s experiments on detectability and self-obfuscation

Method of instruction

Lectures, class-room based exercises and labs

Assessment

The module has the following assessments:

  • Examination (2.5 hours) (70%)
  • Coursework (30%)

To pass this module, students must:

  • Obtain an overall pass mark of 50% for all components combined.

Reading

Reading list available via the UCL Library catalogue.