COMPG011 - Data Analytics
This database contains the 2016-17 versions of the syllabuses.
Good knowledge of basic mathematics and statistics.
Taught By: Tomaso Aste (100%)
The course introduces data analytics and provides some basic data-science tools. Students will be introduced to statistical tools for identifying regularities and discovering patterns and laws in complex datasets, together with instruments to analyse, characterize, validate, parameterize and model complex data. Practical issues in business data analysis and statistics will be covered through specific case studies, some of them in collaboration with industrial partners.
Students will learn to analyse the main statistical features of complex datasets. On successful completion of the course, a student should have a good understanding of: 1) how to analyse and empirically characterize complex data; 2) how to compute relevant statistical quantities and quantify their confidence intervals; 3) how to build sensible models and how to parameterize and validate these models; 4) how to quantify the inter-dependency/causality structure between different variables; 5) how to use the outcome of data analytics to develop better tools for forecasting.
Empirical investigation of complex data
Essential practical familiarization with complex and big data. Typical challenges with real business data. Basics of data acquisition, manipulation, cleaning, filtering, representation and plotting.
Univariate and multivariate statistics
Marginal probability, joint probability and conditional probability. Empirical estimation of probability distributions. Measures of dependency. Cause and effect. Granger causality, mutual information, transfer entropy. Spurious correlations and regularization. Forecasting and regression. Calibration, validation and hypothesis testing.
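As an informal illustration of the first topics in this unit (not taken from the course material), the sketch below estimates joint, marginal and conditional probabilities from discrete observations by simple frequency counting, and then computes the mutual information in bits. The function names and the toy weather/punctuality data are hypothetical, and the plug-in estimator used here is only one of several possible choices; it is known to be biased for small samples.

```python
from collections import Counter
from math import log2


def empirical_distributions(pairs):
    """Estimate joint and marginal probabilities from (x, y) observations."""
    n = len(pairs)
    joint = {xy: c / n for xy, c in Counter(pairs).items()}
    px = {x: c / n for x, c in Counter(x for x, _ in pairs).items()}
    py = {y: c / n for y, c in Counter(y for _, y in pairs).items()}
    return joint, px, py


def conditional(joint, px, y, x):
    """Conditional probability P(Y = y | X = x) = P(x, y) / P(x)."""
    return joint.get((x, y), 0.0) / px[x]


def mutual_information(joint, px, py):
    """Plug-in estimate of I(X;Y) = sum p(x,y) log2[p(x,y) / (p(x) p(y))]."""
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items())


# Hypothetical toy observations: (weather, punctuality)
observations = (
    [("rain", "late")] * 3
    + [("rain", "on_time")]
    + [("dry", "on_time")] * 3
    + [("dry", "late")]
)

joint, px, py = empirical_distributions(observations)
print("P(late | rain) =", conditional(joint, px, "late", "rain"))
print("I(X;Y) in bits =", mutual_information(joint, px, py))
```

Mutual information is zero when the two variables are empirically independent and reaches one bit for two perfectly dependent, equally likely binary variables, which makes it a convenient dependency measure alongside linear correlation.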
Modelling and filtering through networks
Basics of complex networks: definitions and properties. Construction of networks of interactions from correlation and causality measures. Information filtering through networks.
Applications and case studies
The material and methods studied will be applied to practical cases and real data within the course, through case studies developed in collaboration with industrial partners.
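As an informal illustration of building and filtering a network from correlations (not taken from the course material), the sketch below computes pairwise Pearson correlations between a few series, converts them to distances with d = sqrt(2(1 - rho)), and keeps only the edges of the minimum spanning tree. The minimum spanning tree is assumed here as one simple filtering choice; the function names and the toy series are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_mst(series):
    """Return minimum-spanning-tree edges (u, v, rho) of the correlation network."""
    edges = []
    for u, v in combinations(sorted(series), 2):
        rho = pearson(series[u], series[v])
        # Strong positive correlation -> short distance; clamp rounding errors.
        dist = sqrt(max(0.0, 2.0 * (1.0 - rho)))
        edges.append((dist, u, v, rho))
    edges.sort()

    # Kruskal's algorithm with a small union-find structure.
    parent = {name: name for name in series}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, u, v, rho in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, rho))
    return tree


# Hypothetical toy series
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [1.1, 2.0, 3.2, 3.9, 5.1],
    "C": [5, 3, 4, 1, 2],
    "D": [2, 1, 4, 3, 6],
}

for u, v, rho in correlation_mst(data):
    print(f"{u} - {v}: rho = {rho:.3f}")
```

For N series the full correlation network has N(N-1)/2 edges, while the filtered tree retains only N-1 of them, starting from the most strongly correlated pairs.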
Method of Instruction
3 hours of lectures per week, practical exercises, case studies.
The course has the following assessment components:
• Coursework (100%)
To pass this course, students must:
• Obtain an overall pass mark of 50% for all sections combined
Dunlop, Dorothy D., and Ajit C. Tamhane. Statistics and data analysis: from elementary to intermediate. Prentice Hall, 2000.
Casella, George, and Roger L. Berger. Statistical inference. Thomson Learning, 2002.
Newman, Mark. Networks: an introduction. Oxford University Press, 2010.
Mayer-Schönberger, Viktor, and Kenneth Cukier. Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt, 2013.
Silver, Nate. The signal and the noise: Why so many predictions fail-but some don't. Penguin, 2012.
Ohlhorst, Frank J. Big data analytics: turning big data into big money. John Wiley & Sons, 2012.