Course information | Text Mining and Search

Course Syllabus

Aims

The course introduces the fundamental concepts of Text Representation and Text Mining. It also presents several Text Mining applications: Text Classification and Clustering, Topic Modelling, and Text Summarization. An introduction to Search Engines and Recommender Systems is also provided.

Contents

The course first defines Text Mining and points out the basic differences between Data Mining and Text Mining.
It then addresses text pre-processing and analysis, as well as text indexing and formal representation.
Next, it introduces the main Text Mining tasks: Text Classification, Text Clustering, Topic Modelling, and Text Summarization.
Finally, some open source software for Text Mining is introduced and used in practice.

Detailed program

Definition of Text Mining and basic differences between Data Mining and Text Mining
Introduction to some tasks related to Text Mining
Text pre-processing, indexing and formal representation (BoW, Word Embedding, Introduction to Contextual Word Embedding techniques)
Text Classification and Clustering
Topic Modelling
Text Summarization
Introduction to Text-Based Search Engines and to Recommender Systems
Open Source software for Text Mining and Search
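
As a rough illustration of the "formal representation" and "search" items in the program above, the following minimal sketch builds Bag-of-Words (BoW) vectors and ranks documents against a query by cosine similarity. It is not taken from the course material: the function names, the toy corpus, and the regex-based tokenizer are all assumptions made for this example, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately simple pre-processing step)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Map each token to the number of times it occurs in the text."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus and query, for illustration only.
documents = [
    "Text mining extracts patterns from text.",
    "Search engines rank documents for a query.",
    "Clustering groups similar documents together.",
]
query = "rank documents"

doc_vectors = [bag_of_words(d) for d in documents]
query_vector = bag_of_words(query)
scores = [cosine_similarity(query_vector, v) for v in doc_vectors]

# Index of the document most similar to the query.
best = max(range(len(documents)), key=lambda i: scores[i])
print(documents[best])
```

Raw term counts are used here only to keep the sketch short; weighting schemes and embedding-based representations, which the program also lists, would replace `bag_of_words` while leaving the similarity step largely unchanged.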

Prerequisites

Basic knowledge of statistics and of programming languages.

Teaching form

The course is taught in English and consists of lectures introducing the main topics and laboratory sessions in which open source tools are explained and used.
Seminars by national and international experts may also be part of the course.

Textbooks and teaching resources

Berry, M. W., & Kogan, J. (Eds.). (2010). Text mining: applications and theory. John Wiley & Sons.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.
Chowdhary, K. R. (2020). Natural language processing. In Fundamentals of artificial intelligence (pp. 603-649). Springer.

Other specific books and articles on text mining that are accessible online will be recommended during the course.

Semester

First semester

Assessment method

Written exam and a laboratory project (project work), which may be carried out in groups of up to three students. The written exam assesses the level of understanding of the basic aspects taught during the course and consists of a set of open-ended questions. The goal of the project is to use open source software to develop technological solutions to problems addressed in the course. In particular, real application areas are considered that require building systems whose foundations were presented in the lectures. The written exam is graded out of thirty. Between 0 and 4 points, awarded for the project, are added to this grade.

Office hours

By appointment with the teachers

Field of research

INF/01

ECTS

Term

First semester

Activity type

Mandatory

Course Length (Hours)

Degree Course Type

2-year Master Degree

Language

English

Teacher

Luca Celotti
Gabriella Pasi
Marco Viviani
