Course information | Streaming Data Management and Time Series Analysis

Course Syllabus




Aims

The course illustrates methods and applications for managing, analyzing, and forecasting time series, including series generated in real time (streaming data).

Besides techniques for managing real-time data, the lessons cover linear (ARIMA, state-space/Kalman filter) and nonparametric (machine learning) methods.

Students who successfully complete the course will be able to manage streaming data; select, identify, and implement the time series model best suited to the data and to the problem under analysis; and produce decompositions and forecasts of the series under study.
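As a rough illustration of what managing streaming data can involve, the Python sketch below updates summary statistics one observation at a time, so each new value is processed in constant time and memory without storing the whole series. It is not course material (the course uses R in class): the class name `StreamingStats` and its `window` parameter are hypothetical, and it assumes Welford's online algorithm for the running mean and variance plus a fixed-length sliding window for a rolling mean.

```python
import math
from collections import deque


class StreamingStats:
    """Summary statistics of a data stream, updated one observation at a time.

    Running mean and sample variance use Welford's online algorithm; a
    fixed-length window (hypothetical `window` parameter) gives a rolling mean.
    """

    def __init__(self, window: int = 10):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0                        # sum of squared deviations from the running mean
        self._buffer = deque(maxlen=window)   # oldest value is dropped automatically
        self._buffer_sum = 0.0

    def update(self, x: float) -> None:
        # Welford step: O(1) time and memory per new observation
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        # Sliding window: remove the value that is about to be evicted
        if len(self._buffer) == self._buffer.maxlen:
            self._buffer_sum -= self._buffer[0]
        self._buffer.append(x)
        self._buffer_sum += x

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); not defined for fewer than 2 points
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

    @property
    def rolling_mean(self) -> float:
        return self._buffer_sum / len(self._buffer) if self._buffer else math.nan


if __name__ == "__main__":
    stats = StreamingStats(window=3)
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # values arriving one by one
        stats.update(value)
    print(stats.n, stats.mean, stats.variance, stats.rolling_mean)
```

Welford's update is used here instead of accumulating a raw sum of squares because it avoids the numerical cancellation that the naive variance formula suffers on long streams.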

The course is part of the statistics learning area of the master's degree program in Data Science.

Contents

Streaming data management, linear-filter-based models (ARIMA, VAR), unobserved component models (state-space form/Kalman filter), and nonparametric methods (nonparametric regression, tree-based methods, neural networks, support vector machines, nearest neighbors, etc.).
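To make the idea of a linear-filter-based model concrete, here is a minimal Python sketch, assuming a stationary AR(1) process with intercept, x_t = c + phi * x_{t-1} + e_t, estimated by ordinary least squares. The helpers `fit_ar1` and `forecast_ar1` are hypothetical names chosen for illustration; the course works in R, and a full ARIMA fit in Python would normally rely on a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`). Forecasts are obtained by iterating the fitted equation with future shocks set to their zero mean, which gives the conditional-mean forecast.

```python
def fit_ar1(x: list[float]) -> tuple[float, float]:
    """OLS estimates (c, phi) of the AR(1) model x_t = c + phi * x_{t-1} + e_t."""
    if len(x) < 3:
        raise ValueError("need at least 3 observations")
    lagged, current = x[:-1], x[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    if sxx == 0:
        raise ValueError("constant series: phi is not identified")
    phi = sxy / sxx
    c = mean_cur - phi * mean_lag
    return c, phi


def forecast_ar1(last: float, c: float, phi: float, horizon: int) -> list[float]:
    """h-step-ahead forecasts obtained by iterating the fitted equation
    with all future shocks replaced by their expected value, zero."""
    forecasts = []
    for _ in range(horizon):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


if __name__ == "__main__":
    # Noise-free toy series generated by x_t = 1 + 0.5 * x_{t-1}, starting at 0
    series = [0.0]
    for _ in range(7):
        series.append(1.0 + 0.5 * series[-1])
    c, phi = fit_ar1(series)
    print(c, phi, forecast_ar1(series[-1], c, phi, horizon=3))
```

For |phi| < 1 the forecasts converge to the long-run mean c / (1 - phi) as the horizon grows; on the noise-free toy series above that limit is 2.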

Detailed program

First part

Theory of statistical prediction (best predictor and best linear predictor)
Stationary and integrated processes
ARIMA models
VAR models and cointegration (basic concepts)
Unobserved Component Models (UCM)
State-space form
Kalman filter and maximum likelihood estimation of models in state-space form
State and disturbance smoothing (component extraction and anomaly detection)
Applications to real data using R (or Python)
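The first part of the program links unobserved component models, the state-space form, the Kalman filter and maximum likelihood. A minimal sketch of how these fit together, assuming the simplest UCM, the local level model (random walk plus noise), with known variances and an approximately diffuse initial state obtained by a large initial variance `p1`. The function name `local_level_filter` and its defaults are hypothetical; in R, `StructTS(x, type = "level")` from the stats package fits this model by maximum likelihood. The filter returns the Gaussian log-likelihood through the prediction error decomposition; maximum likelihood estimation would maximize that value over `sigma2_eps` and `sigma2_eta` with a numerical optimizer.

```python
import math


def local_level_filter(y, sigma2_eps, sigma2_eta, a1=0.0, p1=1e7):
    """Kalman filter for the local level model (random walk plus noise):

        y_t      = mu_t + eps_t,   eps_t ~ N(0, sigma2_eps)
        mu_{t+1} = mu_t + eta_t,   eta_t ~ N(0, sigma2_eta)

    Returns the one-step-ahead predictions a_t = E[mu_t | y_1, ..., y_{t-1}],
    their variances p_t, and the Gaussian log-likelihood computed by the
    prediction error decomposition. The large p1 approximates a diffuse prior.
    """
    a, p = a1, p1
    preds, pvars, loglik = [], [], 0.0
    for t, obs in enumerate(y):
        preds.append(a)
        pvars.append(p)
        v = obs - a            # one-step-ahead prediction error
        f = p + sigma2_eps     # its variance
        k = p / f              # Kalman gain
        if t > 0:  # drop the first term, dominated by the (approximately) diffuse prior
            loglik -= 0.5 * (math.log(2 * math.pi) + math.log(f) + v * v / f)
        # Update with y_t, then predict mu_{t+1}: for a random walk the mean
        # carries over and the variance grows by sigma2_eta.
        a = a + k * v
        p = p * sigma2_eps / f + sigma2_eta   # equals p * (1 - k) + sigma2_eta
    return preds, pvars, loglik


if __name__ == "__main__":
    preds, pvars, loglik = local_level_filter(
        [3.0, 3.0, 3.0, 3.0], sigma2_eps=1.0, sigma2_eta=0.0)
    print(preds, pvars, loglik)
```

The variance update is written as p * sigma2_eps / f rather than p * (1 - k) because, with a very large initial variance, 1 - k suffers from cancellation; the two expressions are algebraically identical.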

Second part

Main time series mining tasks
Similarity and Clustering
Classification, regression, and forecasting
Nonparametric approaches based on statistical methods
Nonparametric approaches based on machine learning
Artificial Neural Networks
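Among the nonparametric approaches, nearest neighbors is perhaps the simplest to sketch, and it also illustrates the role of similarity between pieces of a series. The Python function below is a hedged illustration, not the course's method: the name `knn_forecast` is hypothetical, and the Euclidean distance and plain average are assumptions (other distance measures and weighting schemes are possible). It embeds the series in overlapping windows of length `m`, finds the `k` past windows most similar to the latest one, and averages the values that followed them.

```python
import math


def knn_forecast(x: list[float], m: int, k: int) -> float:
    """One-step-ahead k-nearest-neighbour forecast.

    Every past window x[i:i+m] that has a successor x[i+m] is a candidate;
    the k candidates closest (Euclidean distance) to the latest window
    x[-m:] are selected and their successors are averaged.
    """
    n_candidates = len(x) - m
    if m < 1 or k < 1 or n_candidates < k:
        raise ValueError("series too short for the chosen m and k")
    query = x[-m:]
    scored = [(math.dist(x[i:i + m], query), x[i + m]) for i in range(n_candidates)]
    scored.sort(key=lambda pair: pair[0])   # stable sort: ties keep time order
    return sum(successor for _, successor in scored[:k]) / k


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0] * 3 + [1.0, 2.0]   # toy periodic series
    print(knn_forecast(series, m=2, k=3))
```

The window length `m` and the number of neighbours `k` play the role of tuning parameters, which in practice would be chosen by validation on held-out data.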

Prerequisites

Students should be familiar with statistical inference, matrix algebra, and R or Python (R is used in class, but you may use Python if you prefer).

Teaching form

Theoretical lessons and computer applications in the lab.

All lectures take place in the laboratory, so there is no division between theory and practice: whenever a new theoretical concept or tool is explained, a practical application shows how it is implemented. Theory serves practice, and practice helps in understanding theory.

Textbooks and teaching resources

Hyndman R.J. and Athanasopoulos G. (2018) Forecasting: Principles and Practice, 2nd ed. OTexts. https://otexts.com/fpp2/

Pelagatti M. (2015) Time Series Modelling with Unobserved Component Models. Chapman and Hall/CRC (the book can be downloaded free of charge from within the Bicocca IP network).

Ghatak A. (2019) Deep Learning with R. Springer.

Further material will be made available on the e-learning platform.

Semester

First semester

Assessment method

The examination has two parts. First, by the exam date, each student must prepare and send to the lecturer a report in which one or more time series, agreed upon with the lecturer, are analyzed and forecast using linear methods (ARIMA, UCM) and nonlinear machine learning methods (RNN, SVM, etc.). The student presents the report in about 15 minutes during the oral examination, and the lecturer may ask questions about its content. Second, on the same day as the oral exam, there is a one-hour written test in which students answer five items, drawn from open theoretical questions and exercises on ARIMA and UCM models.

To pass the exam, both parts must receive a passing grade; the final grade is the simple arithmetic mean of the grades of the two parts.

The theoretical (written) part is evaluated on the correctness and completeness of the answers (all answers carry equal weight). The forecasting report is assessed on the quality of the models built, with particular attention to feature engineering and to the selection of the final models.

Office hours

Pelagatti: by appointment (matteo.pelagatti@unimib.it).


Field of research

SECS-S/03

ECTS

Term

First semester

Activity type

Compulsory elective (to be chosen from a list of courses)

Course Length (Hours)

Degree Course Type

Two-year master's degree

Language

English

Teacher

Matteo Maria Pelagatti
