Course information | Data Semantics

Course Syllabus

Aims

Knowledge and understanding (DdD 1)

By the end of the course, students will have acquired knowledge of:

the fundamental principles of data semantics and their role in data science applications;
the two main semantic paradigms: declarative semantics, based on formal logic-based models and knowledge graphs; and distributional semantics, based on learning representations from textual data and on large language models (LLMs);
neuro-symbolic methodologies that integrate symbolic and neural approaches for semantic representation and processing;
the main Semantic Web technologies (RDF, RDFS, OWL, SPARQL) and the foundations of ontologies, taxonomies, and automated reasoning;
word embeddings, pre-trained language models, large language models, and their applications;
techniques for building, enriching, and making knowledge bases usable, such as reconciliation of heterogeneous data, entity and relation extraction, and question answering over data and documents.
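Word embeddings such as word2vec are learned from word co-occurrences inside a context window. As a minimal sketch, assuming whitespace tokenization and a fixed symmetric window (the function name `skipgram_pairs` is hypothetical), the (target, context) pairs that the skip-gram variant is trained to predict can be generated as follows; the neural training step itself is not shown.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Return a (target, context) pair for every token and each neighbour within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        # Symmetric window around position i, clipped at the sequence boundaries
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


sentence = "knowledge graphs store facts".split()
print(skipgram_pairs(sentence, window=1))
# [('knowledge', 'graphs'), ('graphs', 'knowledge'), ('graphs', 'store'),
#  ('store', 'graphs'), ('store', 'facts'), ('facts', 'store')]
```

The actual word2vec implementation also samples the effective window size and subsamples frequent words; those details are omitted here.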

Applying knowledge and understanding (DdD 2)

Students will be able to:

model and query knowledge graphs using Semantic Web languages and tools;
design and use ontologies to support the semantic integration of data in application scenarios;
apply distributional models and LLMs to interpret texts and extract knowledge from unstructured data;
build solutions for semantic reconciliation, entity extraction, and natural language querying of information;
tackle real-world semantic interoperability problems, selecting appropriate methods and tools for the context;
develop small application projects that combine knowledge graphs, NLP, and semantic representation techniques for analysis, exploration, and content generation.
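Neural entity matchers such as Ditto, which appears in the semantic reconciliation part of the program, treat matching as classification of a serialized pair of records. The sketch below reproduces the `[COL] attribute [VAL] value` serialization style described in the Ditto paper; the helper names and the example records are hypothetical, and the fine-tuned transformer that would classify the resulting string as match or non-match is not shown.

```python
def serialize(record: dict[str, str]) -> str:
    """Flatten a record into '[COL] attr [VAL] value' segments, in attribute order."""
    return " ".join(f"[COL] {attr} [VAL] {value}" for attr, value in record.items())


def serialize_pair(left: dict[str, str], right: dict[str, str]) -> str:
    """Build a sequence-pair input for a classifier: [CLS] left [SEP] right [SEP]."""
    return f"[CLS] {serialize(left)} [SEP] {serialize(right)} [SEP]"


a = {"title": "iPhone 13 128GB", "brand": "Apple"}
b = {"title": "Apple iPhone 13 (128 GB)", "brand": "Apple Inc."}
print(serialize_pair(a, b))
# [CLS] [COL] title [VAL] iPhone 13 128GB [COL] brand [VAL] Apple [SEP] [COL] title [VAL] Apple iPhone 13 (128 GB) [COL] brand [VAL] Apple Inc. [SEP]
```

With BERT-style tokenizers, the special `[CLS]` and `[SEP]` tokens are usually inserted by the tokenizer itself when it is given a text pair, so in practice only the two serialized records would be passed to it.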

Other Dublin Descriptors (DdD 3, 4, 5)

Making judgments: students will develop the ability to critically evaluate different semantic solutions and to choose appropriate methodologies for new problems, including through project work and seminar discussions.
Communication skills: through oral project presentations and interaction during exercise sessions, students will be able to communicate complex concepts and results effectively, including in multidisciplinary settings.
Learning skills: the course provides theoretical and practical tools that allow students to continue studying semantic technologies independently, including by reading scientific papers and advanced materials.

Contents

The course presents computational tools to represent, harmonize, and reconstruct the semantics of data used in data science applications, with particular attention to:

models and languages developed in the Semantic Web to support the integration of heterogeneous data (knowledge graphs, ontologies, RDF, RDFS, OWL);
models that learn semantics from data, especially textual data (word embeddings, large language models (LLMs));
techniques for integrating knowledge graphs and LLMs;
neural techniques for data reconciliation;
natural language processing techniques to extract structured information from text and to answer questions using data and documents.
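Querying an RDF knowledge graph with SPARQL amounts, at its core, to matching a basic graph pattern (a set of triple patterns with variables) against the graph's triples. The sketch below is a deliberately naive illustration under simplifying assumptions: triples are tuples of prefixed-name strings, variables are strings starting with `?`, and patterns are joined by backtracking over all triples. Real SPARQL engines, and libraries such as rdflib, use indexes and support far more of the language; the names `match_pattern` and `evaluate_bgp` and the `ex:` data are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None on a conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check its existing binding
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:  # constant: must match exactly
            return None
    return extended


def evaluate_bgp(patterns: list[Triple], graph: list[Triple],
                 binding: Optional[Binding] = None) -> list[Binding]:
    """Return every binding that satisfies all triple patterns (naive backtracking join)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results: list[Binding] = []
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            results.extend(evaluate_bgp(rest, graph, extended))
    return results


graph = [
    ("ex:Milan", "ex:locatedIn", "ex:Italy"),
    ("ex:Zurich", "ex:locatedIn", "ex:Switzerland"),
    ("ex:Italy", "ex:memberOf", "ex:EU"),
]
# Roughly: SELECT ?city ?country WHERE { ?city ex:locatedIn ?country . ?country ex:memberOf ex:EU }
query = [("?city", "ex:locatedIn", "?country"), ("?country", "ex:memberOf", "ex:EU")]
print(evaluate_bgp(query, graph))  # [{'?city': 'ex:Milan', '?country': 'ex:Italy'}]
```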

Detailed program

Data semantics: data semantics and data analytics applications (big data, web sources, heterogeneous formats, information integration and semantic enrichment, data linking, knowledge graphs).
Knowledge graphs and the Semantic Web: representing and querying data in the Semantic Web (RDF, SPARQL, semantic technologies and architectures, industrial representations with graph databases). Exercise on querying public knowledge graphs with SPARQL; definition of shared vocabularies with ontologies and formal logic-based languages (from shared vocabularies to ontologies, taxonomies, lexical ontologies, axiomatic ontologies, automated reasoning and semantics, RDFS, OWL). Exercise on ontology modeling with RDFS and OWL.
Distributional semantics and language models: introduction to distributional semantics and to learning distributed representations; models for learning distributed representations from text corpora (word embeddings and word2vec, contextual word embeddings, and large language models (LLMs)). Exercise on LLMs and attention. Seminar: models for comparing different distributed representations in computational social science and cultural analysis applications (alignment between word embeddings, diachronic analyses, word-embedding-based studies with WEAT and SWEAT).
Semantic reconciliation: neural-network-based entity matching algorithms (DeepMatcher, Ditto, BERT-based matching, matching with large language models).
Elements of NLP - information extraction techniques: introduction to and presentation of selected approaches for extracting structured information from text and other semi-structured data (named entity recognition, entity linking, relation extraction, semantic table interpretation). Exercise on named entity recognition and named entity linking.
Semantics-mediated information access: semantic techniques for information exploration (faceted search, retrieval-augmented generation).
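The exercise on LLMs and attention centers on the attention mechanism of Transformer models. Below is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, for a single head, without masking and without learned projections; the function names are illustrative, and real models compute the same quantity with batched tensor operations.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product attention over lists of row vectors (one output row per query)."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value vectors
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]  # identical keys, so the attention weights are uniform
V = [[2.0, 0.0], [0.0, 4.0]]
print(attention(Q, K, V))  # [[1.0, 2.0]]
```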

Prerequisites

Mathematical and computer science knowledge taught in the mandatory first-semester courses.

Teaching form

Lectures and exercise sessions on students' own computers. Use of the Moodle platform. Seminars by industry experts on applications of semantic technologies to real-world problems.

Lecture-based teaching: ~32 h (lectures)
Interactive teaching: ~12 h (guided exercises)

Taught in English

Textbooks and teaching resources

Mayank Kejriwal, Craig A. Knoblock, and Pedro Szekely. 2021. Knowledge Graphs: Fundamentals, Techniques, and Applications. MIT Press.
Aidan Hogan. 2020. The Web of Data. Springer. pp. 1-680.
Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released January 12, 2025. https://web.stanford.edu/~jurafsky/slp3

Additional material, in the form of slides and scientific papers, will be provided to cover recent topics not covered by the textbooks.

Semester

Second semester

Assessment method

The final grade is obtained by aggregating the scores of two independent assessments.

The first assessment is based on an exam project, carried out individually or in a group, that explores in depth a specific topic covered in the course or related to topics covered in the course. The project is discussed in an oral presentation of about 20 minutes supported by slides, which may include a short demo of the project. The evaluation is based on: the relevance of the project to the topics covered in the course; methodological rigor (within the limits of what can reasonably be asked of an exam project); and the mastery of the chosen topic demonstrated during the oral presentation.
The second assessment checks knowledge of the topics covered in the course through the evaluation of exercises (assignments), to be completed individually, and an oral discussion. Assignments are evaluated and discussed at the exam, after the project discussion.

Office hours

On demand

Sustainable Development Goals

QUALITY EDUCATION | INDUSTRY, INNOVATION AND INFRASTRUCTURE

Aims

Knowledge and understanding (DdD 1)

By the end of the course, students will have acquired knowledge of:

the fundamental principles of data semantics and their role in data science applications;
the two main semantic paradigms: declarative semantics, based on logic-based models and knowledge graphs; and distributional semantics, based on representation learning from textual data and large language models (LLMs);
neuro-symbolic methodologies that combine symbolic and neural approaches for semantic representation and processing;
core Semantic Web technologies (RDF, RDFS, OWL, SPARQL) and the foundations of ontologies, taxonomies, and automated reasoning;
word embeddings, pre-trained language models, large language models, and their applications;
techniques for building, enriching, and making knowledge bases usable, including heterogeneous data reconciliation, entity and relation extraction, and question answering over data and documents.
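Automated reasoning with RDFS can be illustrated by forward chaining until no new triples are derived. The sketch below implements only two of the RDFS entailment rules, rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (propagating `rdf:type` along `rdfs:subClassOf`), over triples written as plain prefixed-name strings; the function name and the `ex:` resources are hypothetical, and a complete RDFS or OWL reasoner covers many more rules.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"
Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Saturate `triples` under rules rdfs11 and rdfs9 and return the enlarged set."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: x type A, A subClassOf B  =>  x type B
        new |= {(x, TYPE, b) for x, p, a in inferred if p == TYPE for a2, b in sub if a == a2}
        if new <= inferred:  # fixpoint reached: nothing new was derived
            return inferred
        inferred |= new


kb = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}
for triple in sorted(rdfs_closure(kb) - kb):  # print only the inferred triples
    print(triple)
```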

Applying knowledge and understanding (DdD 2)

Students will be able to:

model and query knowledge graphs using Semantic Web languages and tools;
design and use ontologies to support semantic integration of data in application scenarios;
apply distributional models and LLMs to text interpretation and knowledge extraction from unstructured data;
implement solutions for semantic reconciliation, entity extraction, and natural language question answering;
address real-world semantic interoperability problems by selecting appropriate methods and tools for the specific context;
develop small-scale applications for data analysis, exploration, and content generation combining knowledge graphs, NLP, and LLMs.
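Entity extraction is often framed as token classification with BIO labels (B- begins a mention, I- continues it, O is outside any mention). Assuming a model has already produced one tag per token, the hypothetical helper below decodes the tags into typed entity mentions, which an entity linking step could then map to knowledge graph identifiers. The decoding is lenient by assumption: an I- tag that does not continue a mention of the same type starts a new one.

```python
from typing import Optional


def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (mention_text, entity_type) spans."""
    spans: list[tuple[str, str]] = []
    current: list[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")  # "B-PER" -> ("B", "-", "PER"); "O" -> ("O", "", "")
        if prefix == "I" and current_type == etype:
            current.append(token)  # continue the open mention
            continue
        if current:  # close the open mention before starting a new one (or an O)
            spans.append((" ".join(current), current_type))
        current, current_type = ([token], etype) if prefix in ("B", "I") else ([], None)
    if current:
        spans.append((" ".join(current), current_type))
    return spans


tokens = "Ada Lovelace was born in London".split()
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))  # [('Ada Lovelace', 'PER'), ('London', 'LOC')]
```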

Other Dublin Descriptors (DdD 3, 4, 5)

Making judgments: students will develop the ability to critically assess different semantic solutions and to select suitable methodologies for novel problems, also through project work and seminar discussions.
Communication skills: through oral presentations of projects and interactive exercises, students will be able to clearly communicate complex concepts and results, including in interdisciplinary contexts.
Learning skills: the course provides theoretical and practical tools that enable students to continue studying semantic technologies independently, including through reading scientific literature and advanced materials.

Contents

The course presents computational methods to represent, harmonize, and interpret the semantics of data used in data science applications, with a particular focus on:

models and languages developed within the Semantic Web to support the integration of heterogeneous data (knowledge graphs, ontologies, RDF, RDFS, OWL);
models to learn (semantic) representations from data, especially from text corpora (word embeddings, Large Language Models);
techniques for the integration of knowledge graphs and LLMs;
neural techniques for data matching;
information extraction techniques, with particular emphasis on entity extraction;
question answering techniques and retrieval-augmented generation.
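The retrieval step of retrieval-augmented generation can be sketched with a toy lexical retriever. This is a simplification under stated assumptions: it swaps the dense embeddings typically used in RAG for term-frequency cosine similarity, tokenizes on whitespace, and stops at prompt assembly, leaving out the call to the LLM. All function names and the prompt template are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages whose term-frequency vectors are most similar to the query's."""
    q = Counter(query.lower().split())
    return sorted(passages, key=lambda p: cosine(q, Counter(p.lower().split())), reverse=True)[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Place the retrieved passages in the prompt as grounding context for the generator."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


docs = [
    "SPARQL is the query language for RDF graphs",
    "word2vec learns word embeddings from text",
    "OWL extends RDFS with richer modeling constructs",
]
print(build_prompt("What is the query language for RDF", docs, k=1))
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the lexical scorer for an embedding-based one without touching the rest of the pipeline.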

Detailed program

Data semantics: the role of semantics in data analytics (big data, web sources, heterogeneous formats, information integration, semantic enrichment, data linking, knowledge graphs).
Knowledge graphs and the Semantic Web: representing and querying data in the Semantic Web (RDF, SPARQL, semantic technologies and architectures, corporate knowledge graphs with graph databases). Exercise on querying RDF knowledge graphs with SPARQL; definition of shared vocabularies with ontologies and logic-based languages (from shared vocabularies to ontologies, taxonomies, lexical ontologies, axiomatic ontologies, automated reasoning and semantics, RDFS, OWL). Exercises on ontology modeling with RDFS and OWL.
Distributional semantics and language models: introduction to distributional semantics and distributed representations; models for learning distributed representations from textual corpora (word embeddings and word2vec, large language models (LLMs)). Exercises on LLMs and attention. Seminar: models to compare different distributed representations (alignment between word embeddings, diachronic language studies, word-embedding-based studies with WEAT and SWEAT).
Semantic reconciliation: neural-network-based entity matching algorithms (DeepMatcher, Ditto, BERT-based matching).
Introduction to NLP - information extraction: presentation of selected approaches to the extraction of structured information from texts and other semi-structured data (named entity recognition, entity linking, relation extraction, semantic table interpretation). Exercise on named entity recognition and named entity linking.
Information and knowledge exploration: semantic techniques for the exploration of information (semantic search, retrieval augmented generation).
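WEAT, discussed in the seminar on comparing distributed representations, measures how strongly two sets of target words (X, Y) associate with two sets of attribute words (A, B) in an embedding space. The sketch below computes the WEAT effect size following the formulation of Caliskan et al. (2017), on made-up 2-dimensional vectors rather than real embeddings. It assumes the sample standard deviation (`statistics.stdev`), a choice that varies across implementations, and it omits the permutation test used to obtain p-values; SWEAT is not covered.

```python
import math
from statistics import mean, stdev

Vector = list[float]


def cos(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vector, A: list[Vector], B: list[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus its mean similarity to B."""
    return mean(cos(w, a) for a in A) - mean(cos(w, b) for b in B)


def weat_effect_size(X: list[Vector], Y: list[Vector], A: list[Vector], B: list[Vector]) -> float:
    """Difference of the mean associations of X and Y, divided by the sample
    standard deviation of the associations computed over X and Y together."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Toy 2-d "embeddings": X words lean towards attribute A, Y words towards B
A, B = [[1.0, 0.0]], [[0.0, 1.0]]
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
print(weat_effect_size(X, Y, A, B))  # positive: X is more associated with A than Y is
```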

Assessment method

The final grade is obtained by aggregating the scores of two independent assessments.

The first assessment is based on an exam project, carried out individually or in groups, aimed at giving the student in-depth knowledge and/or hands-on experience of a specific topic covered in the course or linked to topics covered in the course. The project is discussed in an oral presentation of about 20 minutes supported by slides, which may include a short demo of the project. The evaluation is based on: the relevance of the project to the topics covered in the course; methodological soundness (within the limits of what can reasonably be asked of an exam project); and mastery of the chosen topic demonstrated during the oral presentation.
The second assessment evaluates the student's knowledge of the topics covered in the course through the discussion of assignments that students complete individually as homework. Assignments are evaluated and discussed during the oral exam, after the project presentation.

Office hours

On demand

Sustainable Development Goals

QUALITY EDUCATION | INDUSTRY, INNOVATION AND INFRASTRUCTURE

Field of research

INF/01

ECTS

Term

Second semester

Course Length (Hours)

Degree Course Type

2-year Master Degree

Language

English

Teacher

Matteo Luigi Palmonari
Riccardo Pozzi
