Course information | Information Retrieval

Course Syllabus

Objectives

The course aims to introduce the fundamental concepts, formal models, and techniques for building systems that automatically retrieve documents in digital form (Information Retrieval systems, called Search Engines, or Web Search Engines when the documents to be retrieved are Web pages) and systems that recommend information (Recommender Systems). In this context, the main problem to address is assessing the relevance of documents with respect to the user's information needs. By the end of the course, students will be able to design techniques for retrieving, processing, and indexing semi-structured texts, and to use open source software to build Information Retrieval applications. The laboratory will focus on building an application.

Summary of contents

The course will introduce a set of techniques for designing and building search engines, and for defining systems that recommend information (Information Filtering).
In particular, it will present techniques for processing, analyzing, and indexing texts, with a brief look at the indexing of multimedia documents; it will also present several models for estimating the relevance (degree or probability) of a document with respect to the user's information needs. Some techniques for personalized search will also be presented.
The course will introduce the problem of collecting, analyzing, and retrieving user-generated content on Social Media (for example Twitter, Facebook, etc.).

Detailed program

Definition of Text Mining and of the main differences between Text Mining and Data Mining.
Introduction to some applications related to Text Mining.
Pre-processing, indexing, and formal representation of texts.
Information Retrieval system models: basic models (Boolean, Vector Space, probabilistic models). Advanced models (for example, neural models). A brief look at search engines for multimedia documents.
Web search engines: crawling, link analysis, and other factors for estimating the relevance of Web pages.
Evaluation of search engines.
Advanced topics.
Introduction to open source software for building search engines.

Prerequisites

Basic knowledge of Statistics and Linear Algebra.

Teaching method

The course will be taught in English and consists of lectures and laboratory exercises. Seminars given by international experts will be organized.

Teaching material

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
John Scott, Social Network Analysis (Third Ed.), SAGE, 2013.

Teaching period

First semester

Assessment and evaluation methods

Individual written exam consisting of exercises and open questions on the course contents. Completion of a laboratory project, which may be carried out in groups (of up to three students).
The written exam aims to assess the level of understanding of the basic theoretical and technical aspects of the course.
The goal of the group project is to develop, using open source software, technological solutions to problems addressed in the lectures. In particular, it considers real application domains that require defining systems whose foundations were presented in the lectures.

Office hours

By appointment with the lecturer.

Aims

This course introduces the basic concepts, formal models, and main techniques for defining and designing Information Retrieval systems (also called Search Engines, or Web Search Engines when they retrieve Web pages) and Information Filtering (IF) systems. In this context, the main problem is assessing the relevance of documents with respect to the information needs expressed in a user's query. Students will learn to understand and define algorithms for document indexing and retrieval, and to use open source software to implement ad hoc search engines. They will also develop a search engine application using open source software.

Contents

The course covers techniques for designing Information Retrieval and Information Filtering (IF) systems. In particular, it presents various techniques for the analysis and indexing of texts, including a basic introduction to the indexing of multimedia documents. It also addresses the problem of estimating the relevance of documents to a query: several models that compute a numeric estimate (degree or probability) of the relevance of a document to a query will be explained. The main approaches to personalized search will be presented. The course will also introduce additional applications related to text analysis and mining, such as the crawling, analysis, and retrieval of user-generated content on Social Media (e.g. Twitter, Facebook, etc.).
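As an illustration of what a numeric estimate of relevance can look like, here is a minimal sketch of ranking with TF-IDF weights and cosine similarity, one common instance of the Vector Space model. The function names, the whitespace tokenizer, and the toy documents are assumptions made for this example and are not taken from the course material.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace (assumption: a real system would
    # also handle punctuation, stop words, and stemming).
    return text.lower().split()


def build_idf(docs_tokens):
    # Inverse document frequency: log(N / df) for each term in the collection.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector: raw term frequency times IDF; unknown terms are skipped.
    tf = Counter(tokens)
    return {t: freq * idf[t] for t, freq in tf.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Return (score, document index) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    q = tfidf_vector(tokenize(query), idf)
    scores = [(cosine(q, tfidf_vector(t, idf)), i)
              for i, t in enumerate(docs_tokens)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        "web search engines crawl pages",
        "linear algebra and statistics",
        "search engines rank web pages",
    ]
    for score, index in rank("rank pages", docs):
        print(f"{score:.3f}  {docs[index]}")
```

The score here is a degree of similarity rather than a probability; probabilistic models estimate relevance differently.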

Detailed program

Definition of Text Mining and basic differences between Data Mining and Text Mining.
Introduction to some tasks related to Text Mining.
Text pre-processing, indexing, and formal representation.
Information Retrieval models: basic models (Boolean model, Vector Space model, probabilistic models). Advanced models (e.g. neural models). Introduction to multimedia information retrieval.
Web Search Engines: crawling, link analysis, and other factors for estimating the relevance of Web pages.
The evaluation of Search Engines.
Advanced topics.
Introduction to open source software for the development of search engines.
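
To make the link analysis item more concrete, the sketch below computes PageRank by power iteration on a tiny hand-made link graph. PageRank is used here only as one well-known link analysis algorithm; the program above does not say which algorithms the course covers. The function name, the damping value of 0.85, and the example graph are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    # links maps each page to the list of pages it links to.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                share = damping * rank[u] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda p: -p[1]):
        print(page, round(score, 4))
```

A fixed number of iterations keeps the sketch short; an implementation meant for real graphs would usually stop when the scores change by less than a tolerance.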

Prerequisites

Basic knowledge of statistics and of linear algebra.

Teaching form

The course is taught in English and consists of lectures that introduce the main topics and laboratory sessions in which students learn and practice using open source software for implementing search engines. Seminars given by international experts will be organized.

Textbook and teaching resource

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
John Scott, Social Network Analysis (Third Ed.), SAGE, 2013.

Semester

First semester

Assessment method

Individual written examination consisting of exercises and open questions on the course content. Completion of a laboratory project, which may also be carried out in groups (of up to three students).
The written examination assesses the level of understanding of the basic theoretical and technical aspects taught during the course.
The goal of the group project is to use open source software to develop technological solutions to the problems addressed in the course. In particular, real application areas will be considered that require the definition of systems whose foundations were presented during the course.

Office hours

To be agreed with the teacher.

Field of research

INF/01

ECTS

Term

First semester

Activity type

Mandatory to be chosen

Course Length (Hours)

Degree Course Type

2-year Master Degree

Language

ita, eng

Teacher

Pranav Kasela
Gabriella Pasi
Marco Viviani
