# COMPG011 - Data Analytics

This database contains 2016-17 versions of the syllabuses. For current versions please see here.

• Code: COMPG011
• Year: MSc
• Prerequisites: Good knowledge of basic mathematics and statistics.
• Term: 2
• Taught by: Tomaso Aste (100%)

Aims: The course introduces data analytics and provides some basic data-science tools. Students will be introduced to statistical tools for identifying regularities and discovering patterns and laws in complex datasets, together with instruments to analyse, characterize, validate, parameterize and model complex data. Practical issues in business data analysis and statistics will be covered through specific case studies, some in collaboration with industrial partners. Students will become able to analyse the main statistical features of complex datasets.

Learning outcomes: On successful completion of the course, a student should have a good understanding of:

1) how to analyse and empirically characterize complex data;
2) how to compute relevant statistical quantities and quantify their confidence intervals;
3) how to build sensible models and how to parameterize and validate these models;
4) how to quantify the inter-dependency/causality structure between different variables;
5) how to use the outcome of data analytics to develop better tools for forecasting.

Applications: There is a great need to increase data-analytics capability in the business community, and data scientists are in great and increasing demand. The instruments and tools provided by this course are essential to understand, model and make practical use of the very large quantity of data that most businesses are currently collecting.

Further information and material are available to students on the course Moodle page.

# Content

Empirical investigation of complex data
Essential practical familiarization with complex and big data. Typical challenges with real business data. Basics of data acquisition, manipulation, cleaning, filtering, representation and plotting.
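As an illustration of the cleaning and filtering step, the following is a minimal sketch and not course material. It assumes a single numeric column with missing entries, and it uses a median-based "modified z-score" with the commonly quoted cut-off of 3.5; the function name `clean`, the threshold and the toy readings are all assumptions made for this example.

```python
from statistics import median

def clean(values, threshold=3.5):
    """Drop missing entries, then drop points with a large robust z-score.

    Assumes at least one non-missing value is present.
    """
    # Remove None and NaN (NaN is the only float that is not equal to itself).
    present = [float(v) for v in values if v is not None and v == v]
    med = median(present)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return present  # no spread to scale by, so keep everything
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [v for v in present if abs(0.6745 * (v - med) / mad) <= threshold]

# Toy readings, invented for illustration only.
raw = [10.1, 9.8, None, 10.3, float("nan"), 250.0, 9.9, 10.0]
cleaned = clean(raw)
print(cleaned)
```

The median and the median absolute deviation are used here instead of the mean and standard deviation because a single extreme value, such as 250.0 above, would otherwise inflate the scale it is being compared against.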

Univariate and multivariate statistics
Marginal probability, joint probability and conditional probability. Empirical estimation of probability distributions. Measures of dependency. Cause and effect. Granger causality, mutual information, transfer entropy. Spurious correlations and regularization. Forecasting and regressions. Calibration, validation, hypothesis testing.
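To make the link between empirical probability estimates and dependency measures concrete, here is a minimal sketch of a plug-in estimate of mutual information for discrete data, $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$, with all probabilities replaced by observed frequencies. The function name and the toy samples are assumptions made for this example, and the plug-in estimator is only one of several possible estimators (it is biased upwards for small samples).

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    px = Counter(xs)              # empirical marginal counts of X
    py = Counter(ys)              # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# Toy samples, invented for illustration only.
x = [0, 0, 1, 1, 0, 0, 1, 1]
y_copy = list(x)                    # fully determined by x
y_indep = [0, 1, 0, 1, 0, 1, 0, 1]  # empirically independent of x

print(mutual_information(x, y_copy))
print(mutual_information(x, y_indep))
```

In this toy case the copied variable carries one full bit of information about `x`, while the second variable carries none, because every joint frequency equals the product of the marginal frequencies.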

Modelling and filtering through networks
Basics of complex networks: definitions and properties. Construction of networks of interactions from correlation and causality measures. Information filtering through networks.
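One common way to build and filter a network from correlations is to map each correlation $\rho_{ij}$ to a distance $d_{ij} = \sqrt{2(1 - \rho_{ij})}$ and keep only the minimum spanning tree. The sketch below illustrates that idea under stated assumptions: the syllabus does not specify which filtering method is taught, and the function names, the use of Kruskal's algorithm and the toy series are all choices made for this example.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)

def correlation_mst(series):
    """Return minimum-spanning-tree edges as (name_i, name_j, rho) tuples."""
    names = list(series)
    edges = []
    for i, j in combinations(names, 2):
        rho = pearson(series[i], series[j])
        # Strong positive correlation maps to a short distance;
        # max() guards against tiny negative values from rounding.
        dist = sqrt(2.0 * max(0.0, 1.0 - rho))
        edges.append((dist, i, j, rho))
    edges.sort(key=lambda e: e[0])

    # Kruskal's algorithm with a simple union-find structure.
    parent = {name: name for name in names}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u

    tree = []
    for dist, i, j, rho in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j, rho))
    return tree

# Toy series, invented for illustration only.
series = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 11],
    "C": [5, 3, 4, 1, 2],
    "D": [1, 3, 2, 5, 4],
}
tree = correlation_mst(series)
for i, j, rho in tree:
    print(i, j, round(rho, 3))
```

For $N$ variables the tree keeps $N - 1$ of the $N(N-1)/2$ pairwise correlations, which is the sense in which the network acts as an information filter.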

Applications and case studies
The material and methods studied will be applied to practical cases and real data through case studies developed in collaboration with industrial partners.

# Method of Instruction

3 hours of lectures per week, practical exercises, case studies

# Assessment

The course has the following assessment components:

• Coursework (100%)

To pass this course, students must:

• Obtain an overall pass mark of 50% for all sections combined

# Resources

Dunlop, Dorothy D., and Ajit C. Tamhane. Statistics and data analysis: from elementary to intermediate. Prentice Hall, 2000.
Casella, George, and Roger L. Berger. Statistical inference. Thomson Learning, 2002.
Newman, Mark. Networks: an introduction. Oxford University Press, 2010.

Further reading
Mayer-Schönberger, Viktor, and Kenneth Cukier. Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt, 2013.
Silver, Nate. The signal and the noise: Why so many predictions fail-but some don't. Penguin, 2012.
Ohlhorst, Frank J. Big data analytics: turning big data into big money. John Wiley & Sons, 2012.