COMPM066 - Malware

This database contains 2016-17 versions of the syllabuses. For current versions please see here.

Code: COMPM066 (also taught as COMPGA16)
Year: 4
Prerequisites: Undergraduate courses in logic and discrete mathematics, assembly, and imperative programming.
Term: 1
Taught by: David Clark (Module Leader), Jens Krinke, Earl Barr, Paul Gill, Gianluca Stringhini, Hector Menendez, Sukriti Bhattacharya.
Aims: To provide students with (1) a specialist understanding of the issues and techniques in malware detection and classification, and (2) a broad understanding of the human, social, economic and historical context in which malware occurs.
Learning outcomes: Successful completion of this course will give students a specialist understanding of the nature of malware, its capabilities, and how it is combatted through detection and classification. Students will understand the underlying scientific and logical limitations on society’s ability to combat malware. They should also gain a broad understanding of the social, economic and historical context in which malware occurs.

Content

Laboratory work (24% of assessment): nine 2-hour labs

Topics: Introduction (malware analysis, tools list). Lab 1: architecture; Labs 2 and 3: 8086 instructions; Lab 4: from C to assembly; Labs 5 and 6: radare2; Lab 7: static analysis; Lab 8: dynamic analysis (Wireshark, Pin); Lab 9: packing/unpacking (YARA, PEiD)

Introduction: a. The taxonomy of malware and its capabilities: viruses, Trojan horses, rootkits, backdoors, worms, targeted malware; b. History of malware

The social and economic context for malware: crime, anti-malware companies, legal issues, the growing proliferation of malware

Basic Analysis: a. signature generation and detection; b. clone detection methods
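As a rough illustration of signature-based detection, the sketch below scans a byte string for known byte patterns. The signature names, the patterns and the `scan` helper are all hypothetical and chosen only for this example; they are not taken from the module materials or from any real signature database.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern.
# These are made-up patterns for illustration, not real malware signatures.
SIGNATURES: Dict[str, bytes] = {
    "demo.dropper": bytes.fromhex("deadbeef4d5a"),
    "demo.marker": b"EVIL_MARKER",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


sample = b"\x00\x01" + bytes.fromhex("deadbeef4d5a") + b"padding"
clean = b"an ordinary file with no known patterns"

print(scan(sample))  # ['demo.dropper']
print(scan(clean))   # []
```

Exact byte matching of this kind is easily defeated by the hiding techniques listed later in the syllabus, which is one reason the semantic and statistical approaches are studied as well.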

Static analysis theory: a. program semantics; b. abstract interpretation framework

Static Analysis: a. System calls: dependency analysis issues in assembly languages; semantic invariance of system call sequences; b. abstract interpretation as a formal framework for detection; c. taint-based analyses; d. semantic clones

Dynamic Analysis: a. virtualization: semantic gap; b. reverse engineering; c. hybridisation with static analysis

Similarity metrics: a. Kolmogorov Complexity; b. association metrics; c. other entropy-based metrics; d. NLP-based approaches.
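Kolmogorov complexity is not computable, so in practice it is usually approximated with a real compressor. The sketch below shows one common approximation, the normalized compression distance (NCD), together with a simple Shannon byte-entropy measure. The use of zlib as the compressor and the function names are assumptions made for this example, not the module's prescribed method.

```python
import math
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: lower values suggest more shared structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```

With these definitions, `ncd(a, a)` for a repetitive input should come out well below `ncd(a, b)` for unrelated inputs, and high `byte_entropy` values are often read as a hint that a section is compressed or encrypted. Both are heuristics whose behaviour depends on the compressor and on input length.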

Problems in large-scale classification: a. scalability; b. triage methods; c. required false-positive (FP) rate

Hiding: a. Polymorphism: i. compression ii. encryption iii. virtualization; b. Metamorphism: i. high level code obfuscation engines ii. on-board metamorphic engines iii. semantics-preserving rewritings; c. Frankenstein

The theory of malware: a. Rice’s theorem and the undecidability of semantic equivalence; b. Adleman’s proof of the undecidability of the presence of a virus; c. Cohen’s experiments on detectability and self-obfuscation

Method of instruction

Lectures, classroom-based exercises and occasional labs

Assessment

The module has the following assessment components:

  • Examination (2.5 hours, 70%)
  • Coursework (30%)

To pass this module, students must:

  • Obtain an overall pass mark of 40% for each component;
  • Obtain a minimum mark of 40% in each component worth ≥ 30% of the module as a whole.

Reading

Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software, by Michael Sikorski and Andrew Honig (for lab work).

Other texts and papers as advised.