Class notes and slides available on Moodle platform.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning (Second Edition). New York: Springer.
Available online: https://hastie.su.domains/ISLR2/ISLRv2_website.pdf
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (Second Edition). Springer Series in Statistics. New York: Springer.
Available online: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12_toc.pdf
Learning Objectives
The course introduces students to statistical methods and statistical learning for studying multivariate and high-dimensional data. In particular, it provides the fundamental tools needed to understand and apply the recent scientific literature on the statistical aspects of machine learning and on multivariate models.
Labs in the R language complement the lectures to facilitate the understanding, interpretation, and use of the proposed methodologies. At the end of the course, students will understand multivariate statistics and the statistical aspects of machine learning, and will be able to choose and apply appropriate methods and algorithms in specific contexts. They will be able to critically examine the output of an algorithm or model and to visualize and present it. They will have the instruments to understand new techniques and compare them with existing ones.
For students in SDS, preliminary requirements: Statistical Inference; Probability and Mathematics for Statistics.
Teaching Methods
Lectures, labs, flipped classes, group contests.
Further information
Attendance is strongly recommended.
Type of Assessment
The exam consists of two parts:
(1) Homework, to be uploaded to Moodle by students attending classes and briefly presented in class. For students who do not complete 75% of the homework, a brief oral test (25% of the final score) will be added to part (2).
(2) Seminar-style presentation of two projects aimed at demonstrating personal mastery of the course topics.
For attending students, the first project can be prepared in a group and presented in a contest between groups (30% of the final grade). The topics of the projects are chosen by the students from among the topics covered in the course and extensions thereof.
Before the presentation, slides and code must be uploaded to the Moodle platform.
The following skills will be evaluated: comprehension of the research topic, application of theoretical and computational tools, rigor in using the selected methodologies, and the capacity to defend the conclusions obtained.
Course program
(1) Introduction to statistical learning. Definition of statistical learning and differences between machine learning and statistical models. Supervised and unsupervised learning. Regression and classification. Accuracy measures. Bias-variance trade-off.
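As a flavour of the lab material, the bias-variance trade-off in point (1) can be illustrated with a small Monte Carlo experiment. This is a minimal sketch in Python (the course labs use R); the data-generating process, sample size, and polynomial degrees are illustrative assumptions, not course material.

```python
import numpy as np

rng = np.random.default_rng(0)

def bias_variance(degree, n_rep=200, n=30, sigma=0.3, x0=0.25):
    """Estimate squared bias and variance of a degree-`degree`
    polynomial fit at the point x0, for the illustrative
    data-generating process y = sin(2*pi*x) + eps."""
    preds = []
    for _ in range(n_rep):
        x = rng.uniform(0, 1, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, n)
        coef = np.polyfit(x, y, degree)          # least-squares fit
        preds.append(np.polyval(coef, x0))       # prediction at x0
    preds = np.array(preds)
    truth = np.sin(2 * np.pi * x0)
    return (preds.mean() - truth) ** 2, preds.var()

b1, v1 = bias_variance(degree=1)   # rigid model: high bias, low variance
b7, v7 = bias_variance(degree=7)   # flexible model: low bias, high variance
print(f"degree 1: bias^2 = {b1:.4f}, variance = {v1:.4f}")
print(f"degree 7: bias^2 = {b7:.4f}, variance = {v7:.4f}")
```

Repeating the fit over many simulated datasets makes the trade-off visible: increasing flexibility shrinks the squared bias while inflating the variance of the prediction.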
(2) Data Generating Process, Monte Carlo simulations, Resampling and cross-validation methods.
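The cross-validation idea in point (2) can be sketched in a few lines. This is an illustrative Python sketch (the course labs use R); the quadratic data-generating process and the candidate degrees are assumptions chosen only to make the example concrete.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Estimate the test MSE of a degree-`degree` polynomial fit
    by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))         # shuffle, then split into k folds
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((y[test] - np.polyval(coef, x[test])) ** 2))
    return float(np.mean(errs))

# Illustrative data: quadratic truth plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 100)
y = 1 + 2 * x - x ** 2 + rng.normal(0, 0.5, 100)

for d in (1, 2, 8):
    print(f"degree {d}: CV MSE = {kfold_mse(x, y, d):.3f}")
```

Because each fold is held out exactly once, the averaged error approximates out-of-sample performance and favours the correctly specified degree over the underfitting one.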
(3) Introduction to nonparametric regression, piecewise constant and polynomial regression, splines, kernel regression.
(4) Linear model selection and subset selection. Regularized estimators: ridge, lasso, elastic net, adaptive lasso.
(5) Tree-based algorithms: CART, conditional trees, oblique trees.
Tree-based ensembles: bagging, boosting, AdaBoost, gradient boosting (also in its non-tree-based version), random forests, BART.
(6) Dimension reduction methods: PCA and SVD and their relationship
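The PCA-SVD relationship in point (6) can be verified numerically: for a column-centred data matrix X with SVD X = U D V', the columns of V are the principal-component loadings and the eigenvalues of the sample covariance matrix equal d^2/(n-1). A minimal Python sketch on simulated data (the course labs use R):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)                      # centre the columns

# PCA via the eigendecomposition of the sample covariance matrix
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]             # sort in decreasing order
eigval, eigvec = eigval[order], eigvec[:, order]

# PCA via the SVD of the centred data matrix
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

# Same variances: covariance eigenvalues equal d^2 / (n - 1)
print(np.allclose(eigval, d ** 2 / (Xc.shape[0] - 1)))   # True
# Same loadings, up to the sign of each component
print(np.allclose(np.abs(eigvec), np.abs(Vt.T)))         # True
```

Working through the SVD route is also numerically preferable in practice, since it avoids forming the covariance matrix explicitly.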
(7) Clustering: hierarchical and non-hierarchical algorithms and their characterization; probabilistic algorithms (Gaussian mixtures)
(8) Ensemble of strong classifiers: Super Learner
(9) Support vector machines (SVM) and kernel SVM
(10) Introduction to graphical models: graphs and conditional independence properties. Undirected graphs (networks / Markov random fields): Markov properties and factorization; Gaussian graphical models; log-linear graphical models. Directed graphs (Bayesian networks / DAGs): Markov properties and factorization; learning. Basics of chain graphs: Markov properties and factorization.