Summary of High Dimensional Data Analysis

Course Syllabus


Learning objectives

This is an advanced statistics course whose main subject is the analysis of high-dimensional data. The aim of the course is to present modern data analysis techniques and the underlying statistical theory, combining theoretical, practical and computational aspects.

Contents

The course covers regression and classification methods that can be used with high-dimensional data.

Detailed program

Linear regression, bias/variance trade-off
Penalized regression, ridge regression and lasso
Model selection, cross-validation methods
Nonparametric regression. Nearest neighbors. Kernel smoothing. Regression splines, smoothing splines, local regression

Prerequisites

Knowledge of probability and statistical inference, linear algebra and programming is required.

Teaching methods

All lessons take place in the computer lab, integrating theoretical aspects with computational ones through the use of R.

During the Covid-19 emergency period, lessons will be delivered remotely in asynchronous mode, with synchronous videoconferencing events.

Assessment methods

Presentation of group work on a project agreed with the lecturer, and an individual written exam. The final grade is a weighted average of the written exam (50%), the project work (30%), and the oral presentation and discussion of the project (20%).

Each project may be based on an article or a book chapter on a specific topic covered in the course. The report must include a description of the methodology used, a critical discussion of it, and an implementation of the described method in R on a suitably chosen data set. Groups may consist of at most three students.

Type of exam:

- Individual written exam: open questions and project work

- Individual oral exam: discussion of the written exam, the project work and topics covered in class

During the Covid-19 emergency period, individual oral exams will be held online only. They will be carried out on the WebEx platform, and a public link will be posted on the course e-learning page so that virtual spectators can attend the exam.

Textbooks and reading materials

Teaching material provided by the lecturer
Azzalini, Scarpa (2012) Data Analysis and Data Mining: An Introduction. New York: Oxford University Press
James, Witten, Hastie, Tibshirani (2014) An Introduction to Statistical Learning, with Applications in R. Springer
Hastie, Tibshirani, Friedman (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer
Hastie, Tibshirani, Wainwright (2015) Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press

Teaching period

First semester

Teaching language

Italian

Learning objectives

This is an advanced course focusing on the analysis of high-dimensional data. The goal is to study modern methods and their underlying theory, drawing together theory, data, computation and recent research.

Contents

This course covers regression and classification methods that can be applied to high-dimensional data.

Detailed program

Linear regression, bias/variance trade-off
Regularization, ridge and lasso regression
Model selection, cross-validation
Nonparametric regression. Nearest neighbors. Kernel smoothing. Regression splines, smoothing splines, local regression
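
As an informal illustration of two program topics, ridge regression and cross-validation, the following is a minimal sketch and not course material. It is written in Python for self-containment, although the course itself uses R. The function names `ridge_slope` and `cv_mse`, the simulated data and the grid of penalty values are all assumptions made for this example. It fits a one-predictor ridge regression without intercept and picks the penalty with the lowest 5-fold cross-validated error.

```python
import random

def ridge_slope(xs, ys, lam):
    """Ridge estimate for one predictor, no intercept: sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def cv_mse(xs, ys, lam, k=5, seed=0):
    """Estimate the mean squared prediction error by k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training part only, then score on the held-out part.
        beta = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        sq = [(ys[i] - beta * xs[i]) ** 2 for i in held_out]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k

# Simulated data (hypothetical, for illustration only): y = 2x + noise
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(30)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Hypothetical grid of penalty values; choose the one with the smallest CV error.
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print("CV error by lambda:", scores)
print("selected lambda:", best_lam)
```

A larger penalty shrinks the estimated slope toward zero, which lowers variance at the cost of bias. Cross-validation is one common way to choose the penalty that balances the two.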

Prerequisites

Basic knowledge of statistics and probability, linear algebra and computer programming.

Teaching methods

Theoretical lessons and computer applications, held in the lab using R.

During the Covid-19 emergency period, lessons are held remotely in asynchronous mode, with synchronous videoconferencing events.

Assessment methods

Assessment consists of a group project agreed with the lecturer, including its presentation, and an individual written exam on the theoretical part. The final grade is a weighted average of the written exam (50%), the group project (30%), and the project presentation and discussion (20%).

Each project is typically based on a paper or a book chapter on a specific topic covered in the course. You are expected to understand the proposed method(s), implement them and evaluate them on data sets. You are encouraged to work on the project in teams of at most three people.
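
The weighting above can be sketched as a small calculation. This is only an illustration: the component scores are hypothetical, the 30-point scale is an assumption, and the syllabus does not say how the result is rounded. The names `WEIGHTS` and `final_grade` are made up for this example.

```python
# Weights taken from the syllabus: written exam 50%, project 30%, presentation 20%.
WEIGHTS = {"written_exam": 0.50, "project": 0.30, "presentation": 0.20}

def final_grade(scores):
    """Weighted average of the three assessment components."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Hypothetical component scores on an assumed 30-point scale.
example = {"written_exam": 26, "project": 30, "presentation": 28}
print(round(final_grade(example), 2))
```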

Type of exam:

- Written individual exam with open questions and project work

- Oral individual exam to assess the theoretical knowledge of the student on the topics presented during the course and project work presentation

During the Covid-19 emergency period, oral exams will be held online only. They will be carried out on the WebEx platform, and a public link will be posted on the course e-learning page so that virtual spectators can attend the exam.

Textbooks and Reading Materials

Lecture notes provided by the instructor
Azzalini, Scarpa (2012) Data Analysis and Data Mining: An Introduction. New York: Oxford University Press
James, Witten, Hastie, Tibshirani (2014) An Introduction to Statistical Learning, with Applications in R. Springer
Hastie, Tibshirani, Friedman (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer
Hastie, Tibshirani, Wainwright (2015) Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press

Semester

First semester

Teaching language

Italian

Field of research

SECS-S/03

ECTS

Term

First semester

Activity type

Mandatory to be chosen

Course Length (Hours)

Degree Course Type

2-year Master's Degree

Teacher

Gianna Serafina Monti
