Course information | Information Retrieval

Course Syllabus

Aims

The aim of the course is to provide an introduction to the fundamental concepts, formal models, and techniques for building systems for the automatic retrieval of textual documents in digital form ("Information Retrieval" systems, called Search Engines, or Web Search Engines when the documents to be retrieved are Web pages). After an introduction to the automatic analysis of texts and their representation (basic NLP techniques), the course defines the problem of automatic information retrieval, which requires estimating the relevance of documents to the user's information needs as expressed in a query. At the end of the course, students will be able to understand techniques for the analysis and representation of natural language texts, as well as how a search engine works. Students will also be able to use open source software to build Information Retrieval applications. The laboratory is devoted to building an application.

Contents

The course first introduces the problem of representing texts for automatic processing, and the techniques used in many NLP applications for text pre-processing and formal text representation (including neural techniques). It then presents the main techniques for designing and building search engines, with a brief look at techniques for building recommender systems.
The main models for estimating the relevance of a document to the user's information needs are introduced, from the vector space model to the most recent neural models.
Some techniques for personalizing search are also presented.
The course also introduces the problem of collecting, analyzing, and retrieving user-generated content on Social Media, and the topic of evaluating the effectiveness of search engines.

Detailed program

Introduction to text processing and Natural Language Processing (NLP)
Introduction to Text Mining and to some applications related to NLP.
Pre-processing, indexing, and formal representation of texts (bag of words, word embeddings, statistical language models, neural language models, large language models)
Information Retrieval models: basic models (Boolean, Vector Space, probabilistic models). Advanced models (neural models). Brief overview of search engines for multimedia documents.
Web search engines: crawling, link analysis, and other factors for estimating the relevance of Web pages.
The evaluation of search engines.
Advanced topics
Introduction to open source software for building search engines

Prerequisites

Basic knowledge of Statistics and Linear Algebra.

Teaching form

The course is taught in English and consists of lectures and laboratory sessions. Seminars given by international experts will be organized.

Teaching materials

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Teaching period

First semester

Assessment method

Individual written exam consisting of exercises and open questions on the course content. Development of a laboratory project, which may be carried out in a group (up to three students).
The written exam assesses the level of understanding of the basic theoretical and technical aspects of the course.
The goal of the group project is to develop, using open source software, technological solutions to problems addressed in the lectures. In particular, it considers real application areas that require building systems whose foundations were presented in the lectures.

Office hours

By appointment with the instructor.

Aims

The objective of the course is to provide an introduction to the fundamental concepts, formal models, and techniques for implementing systems for the automatic retrieval of textual documents in digital form ("Information Retrieval" systems, also known as Search Engines, or Web Search Engines when the documents to be retrieved are Web pages). After an introduction to the automatic analysis of texts and their representation (basic NLP techniques), the course defines the information retrieval problem, which requires estimating the relevance of documents to the user's information needs as expressed in a query. At the end of the course, students will be able to understand the basic techniques for the analysis and representation of natural language texts, as well as how a search engine works. Students will also be able to use open source software to build Information Retrieval applications. The lab is aimed at implementing an application.

Contents

The course will first introduce the problem of text representation for automatic text processing, and the techniques used in many NLP applications for text pre-processing and formal text representation (including neural techniques). The main techniques for the design and implementation of search engines will then be presented, with a brief look at techniques for building recommender systems.
The main IR models for estimating the relevance of a document to the user's information needs, as expressed in a query, will then be introduced, from the vector space model to the more recent neural models.
Some techniques for personalizing search will also be presented.
The course will also introduce the problem of gathering and retrieving user-generated content on Social Media, and the issue of evaluating the effectiveness of search engines.
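
As an illustration of relevance estimation in the vector space model mentioned above, the following is a minimal sketch, not course material. It assumes whitespace tokenization, raw term frequency, and a plain logarithmic IDF weight; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real pre-processing.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / freq) for term, freq in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is all zeros.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Return (score, doc_index) pairs, highest score first.
    vectors, idf = tfidf_vectors(docs)
    query_vec = {
        term: tf * idf.get(term, 0.0)
        for term, tf in Counter(tokenize(query)).items()
    }
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))
```

Calling `rank("neural models", docs)` on a small list of strings returns the documents ordered by estimated relevance to the query; documents sharing no weighted term with the query score 0.0.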

Detailed program

Introduction to Text Processing and Natural Language Processing (NLP)
Introduction to Text Mining and to some tasks related to NLP
Text pre-processing, indexing and formal representation of texts (bag of words, word embeddings, statistical language models, neural language models - large language models)
Information Retrieval models: basic models (Boolean model, Vector Space model, probabilistic models). Advanced models (neural models). Introduction to multimedia information retrieval.
Web Search Engines: crawling, link analysis and other factors for estimating relevance of Web pages.
The evaluation of Search Engines.
Advanced topics
Introduction to open source software for the development of search engines.
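
Two of the topics listed above, the Boolean model and the evaluation of search engines, can be sketched in a few lines. This is an illustrative example under stated assumptions, not the software used in the course: documents are plain strings, terms are whitespace-separated tokens, and the names `build_index`, `boolean_and`, and `precision_recall` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    # Inverted index: term -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    # Boolean AND query: documents that contain every query term.
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved documents that are relevant.
    # Recall: fraction of relevant documents that were retrieved.
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Given a set of document ids judged relevant for a query, `precision_recall(boolean_and(index, terms), relevant)` yields the two basic effectiveness measures for that query.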

Prerequisites

Basic knowledge of statistics and of linear algebra.

Teaching form

The course will be taught in English. It consists of lectures that introduce the main topics and laboratory sessions in which open source software for implementing search engines is explained and used in practice. Seminars given by international experts will be organised.

Textbook and teaching resource

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Semester

First semester

Assessment method

Individual written examination consisting of exercises and open questions on the course content. Development of a laboratory project, which may also be carried out in groups (up to three students).
The written examination is aimed at assessing the level of understanding of the basic theoretical and technical aspects taught during the course.
The goal of the group project is to use open source software to develop technological solutions to the problems addressed in the course. In particular, real application areas will be considered that require building systems whose foundations were presented during the course.

Office hours

To be agreed with the teacher.

Field of research

INF/01

ECTS

Term

First semester

Activity type

Mandatory to be chosen

Course Length (Hours)

Degree Course Type

2-year Master Degree

Language

English

Teacher

Gabriella Pasi

Tutor

Georgios Peikos
