Summary of Natural Language Processing

Course Syllabus

Aims

The course introduces the foundational elements of natural language processing and the most recent advanced computational models in the field. By the end of the course, students will have acquired knowledge and skills in algorithms, tools, and models for processing and analyzing natural language, enabling them to make use of the most recent state-of-the-art processing systems.

Contents

Foundations of natural language representation
Semantics of words
Large Language Models
NLP applications

Detailed program

Fundamentals
- Rationalist and Empiricist Approaches to Language
- The Ambiguity of Language: Why NLP Is Difficult
- Linguistic Essentials
  - Words, Tokens, Lemmas, Stems
  - Parts of Speech and Morphology
  - Phrase Structure
- Dirty Hands-on Text
  - Lexical resources
  - Word counts
  - Zipf’s laws
  - Collocations
  - Concordances
Vector Semantics
- Frequentist Representation of Text (TF, TF-IDF, etc.)
- Word Embeddings
  - Word2Vec
  - FastText
  - GloVe
- Visualization of Embeddings
  - Principal Component Analysis (PCA)
  - t-distributed Stochastic Neighbor Embedding (t-SNE)
  - Uniform Manifold Approximation and Projection (UMAP)
Transformers and Large Language Models
- Attention Mechanisms: Self-Attention and Multi-Head Attention
- Positional Embeddings
- Transformers as Language Models
- Pretraining Large Language Models
- Prompting and Instruction Tuning
- Interpretability and Explainability of Language Models
NLP Applications
- Text and Token Classification
- Chatbots and Dialog Systems
- Word Sense Disambiguation
- Topic Modeling
- Machine Translation

Prerequisites

Useful, but not required: machine learning, Python programming

Teaching method

Lectures and classroom exercises.
The course will be given in English.

Textbooks and teaching resources

Christopher MANNING and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. MIT Press.
Dan JURAFSKY and James H. MARTIN. Speech and Language Processing. Prentice Hall.

Semester

Second semester

Assessment method

Project and oral exam. There are no midterm tests.
The project consists of developing a natural language processing tool based on methods and models presented during the course. It is graded on a scale of 0 to 24 points.
The oral exam consists of 4 theory questions on the topics covered in the course and listed in the detailed program. Each question is scored -2 points for an incorrect or missing answer and +2 points for a correct answer.
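
As a rough illustration of the oral exam arithmetic described above, the sketch below computes the oral score from the four answers. The helper name `oral_score` and the constants are hypothetical, introduced only for this example. How the oral score is combined with the project mark is not stated here, so the sketch makes no assumption about that.

```python
from typing import Sequence

POINTS_CORRECT = 2      # +2 points for a correct answer
POINTS_INCORRECT = -2   # -2 points for an incorrect or missing answer
NUM_QUESTIONS = 4       # the oral exam has 4 theory questions


def oral_score(answers: Sequence[bool]) -> int:
    """Return the oral exam score (hypothetical helper).

    `answers` holds one boolean per question: True for a correct answer,
    False for an incorrect or missing one.
    """
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    return sum(POINTS_CORRECT if ok else POINTS_INCORRECT for ok in answers)


if __name__ == "__main__":
    # Three correct answers and one incorrect: 3 * 2 - 2 = 4
    print(oral_score([True, True, True, False]))
```

Under this scheme the oral score ranges from -8 (no correct answers) to +8 (all four correct).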

Office hours

By appointment, to be arranged with the teacher via email.

Field of research

INF/01

ECTS

Term

Second semester

Activity type

Mandatory to be chosen

Course Length (Hours)

Degree Course Type

2-year Master Degree

Language

English

Teacher

Elisabetta Fersini
Alessandro Raganato
