Course information | Big Data Analytics

Course Syllabus

Aims

The course aims to provide the theoretical and practical skills needed to understand methods and tools for analysing large volumes of data, with particular attention to data wrangling and machine learning techniques using tools such as OpenRefine and KNIME. It also aims to develop the ability to interpret analysis results critically, in order to support decision-making processes effectively. A significant part of the course is devoted to communicating data and analytical results through data visualisation techniques, with the goal of making information easy to use and understand. Finally, the course introduces the fundamental principles of generative artificial intelligence, with particular reference to its use for the automatic synthesis and description of data.

Contents

Introduction to Big Data
Data wrangling, machine learning and text mining
Data visualisation

Detailed program

Introduction to Big Data: fundamental concepts, challenges in managing large volumes of data
Data wrangling with OpenRefine: data cleaning and transformation
Introduction to machine learning and text mining
Machine learning and text mining: supervised learning models and an introduction to text processing
Basic ML workflow with KNIME: building machine learning pipelines, with a focus on NLP
Data visualisation: techniques for visual representation of data and storytelling
Generative AI: automatic synthesis and description of data using generative models

Prerequisites

None

Teaching form

Lectures (DE) held in person
Workshops (DI) using software tools
Up to 30% of hours may be delivered online synchronously
24h of delivered teaching (lectures)
18h of interactive teaching (workshops)
21h of exercises and lab work

Teaching material

Lectures supported by slides, lab sessions and case studies. Reference scientific articles will be provided by the lecturer. The software used will be open source.

Course period

March - April

Assessment method

Assessment is based on a written test and a possible oral exam.

The written test is taken on a computer and consists of open-ended, closed, and multiple-choice questions on all course topics.

The evaluation considers the student's ability to answer specific questions with reference to the theoretical and practical aspects (through examples) of the topic in question.

The written test is the same for attending and non-attending students.

The oral exam is aimed at assessing the student's theoretical knowledge of the course topics. It therefore evaluates the ability to reason about and explore in depth the topics raised during the exam, and the methodological rigour with which they are developed.

Office hours

By appointment

Aims

The course aims to provide both theoretical and practical skills for understanding methods and tools for analysing large volumes of data, with a particular focus on data wrangling and machine learning techniques using tools such as OpenRefine and KNIME. It also aims to develop the ability to interpret analytical results critically, thereby supporting effective decision-making processes. A significant part of the course is dedicated to communicating data and analytical outcomes through data visualisation techniques, making information easily accessible and understandable. Finally, the course introduces the fundamental principles of generative artificial intelligence, with a focus on its application for the automatic synthesis and description of data.

Contents

Introduction to Big Data
Data wrangling, machine learning and text mining
Data visualisation

Detailed program

Introduction to Big Data: fundamental concepts, challenges in managing large volumes of data
Data wrangling with OpenRefine: data cleaning and transformation
Introduction to machine learning and text mining: supervised learning models and an introduction to text processing
Basic ML workflow with KNIME: building machine learning pipelines, with a focus on NLP
Data visualisation: techniques for visual representation of data and storytelling
Generative AI: automatic synthesis and description of data using generative models

Prerequisites

None

Teaching form

Lectures (DE) held in person
Workshops (DI) using software tools
Possibility of delivering up to 30% of hours online synchronously
24h of lectures (theoretical content)
18h of interactive workshops
21h dedicated to exercises and lab work

Textbooks and teaching resources

Lectures supported by slides, lab sessions and real-life case studies. Scientific papers and books will be indicated by the lecturer. The software used is available either as open source or through an academic license.

Semester

Third cycle

Assessment method

Assessment is based on a written test; an oral examination is available on request.

The written test is taken on a computer and consists of open-ended questions and closed, multiple-choice questions covering all course topics.

The evaluation focuses on the student's ability to answer specific questions by referring to both the theoretical and practical aspects (through examples) of the topic in question.

The written test is the same for attending and non-attending students.

The oral exam is aimed at assessing the student's theoretical knowledge of the course topics. The ability to reason about and explore in depth the issues raised during the examination, and the methodological rigour with which they are developed, will be evaluated.

Office hours

By appointment

Field of research

ING-INF/05

ECTS

Term

Second semester

Activity type

Mandatory

Course Length (Hours)

Degree Course Type

Bachelor Degree

Language

Italian

Teacher

Lorenzo Malandri
Fabio Mercorio
Andrea Seveso
