Course full name
Text Mining and Natural Language Processing
Course ID number
2324-2-E311PV011

Course Syllabus


Aims

The course introduces the fundamental linguistic concepts underlying human languages and the core techniques of Natural Language Processing (NLP). It also presents two NLP applications: Information Retrieval and Machine Translation.

After successfully completing the course, students will be able to:

- describe the basic linguistic aspects of human languages;
- explain the common computational vector space models for words used in language technology;
- describe the challenges related to word vector models;
- identify the main neural language models and apply them to different applications.

Contents

The course first covers the morphological and syntactic structure of human languages, knowledge that helps in building more linguistically aware NLP systems.

The course then introduces a selection of NLP tasks and text representation techniques. Moving from statistical methods to modern neural approaches, it presents and practices fundamental techniques such as the n-gram model, Word2Vec, the encoder-decoder paradigm, and neural language models. Open-source software for NLP will be introduced and used throughout the lab sessions.
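As an illustration of the statistical end of that range, the following is a minimal sketch of a bigram language model with add-one (Laplace) smoothing. It is not taken from the course material: the function names, the toy corpus, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Each sentence is padded with <s> and </s> boundary markers.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy corpus (hypothetical, for illustration only).
corpus = ["the cat sat", "the cat ran", "the dog sat"]
unigrams, bigrams = train_bigram(corpus)

p_cat = bigram_prob("the", "cat", unigrams, bigrams)
p_dog = bigram_prob("the", "dog", unigrams, bigrams)
print(f"P(cat | the) = {p_cat:.2f}, P(dog | the) = {p_dog:.2f}")
```

Smoothing keeps unseen bigrams from receiving zero probability, which is one of the practical issues that motivates the dense and neural models listed later in the program.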

Detailed program

Introduction to levels of linguistic analysis and typological differences

Morphology/morphophonology

Morphosyntax/syntax

Parts of speech

Heads, arguments, adjuncts

Argument types and grammatical functions

Mismatches between syntactic position and semantic roles

Resources

Introduction to some NLP tasks

Data pre-processing (e.g. tokenization, NER)

Text representation (e.g. tf-idf)

Statistical language models (e.g. n-gram model)

Dense vector representations (e.g. Word2Vec, FastText)

Deep neural approaches for NLP (e.g. encoder-decoder, neural language models)

Applications of NLP:

Information Retrieval

Machine Translation
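To make the link between the text representation and Information Retrieval topics above concrete, here is a minimal sketch that ranks documents against a query using tf-idf vectors and cosine similarity. It is an assumption-laden illustration, not the course's lab code: the helper names, the toy documents, the plain `log(N / df)` weighting, and the whitespace tokenization (no stemming or stop-word removal) are all choices made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) and the idf table for a document list."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, docs):
    """Return document indices sorted by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    query_tf = Counter(query.lower().split())
    # Query terms that never occur in the collection are ignored.
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scores = [cosine(query_vec, vec) for vec in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy collection (hypothetical, for illustration only).
docs = [
    "neural machine translation with encoder decoder models",
    "information retrieval ranks documents for a query",
    "word embeddings represent words as dense vectors",
]
ranking = search("query retrieval", docs)
print(docs[ranking[0]])
```

Because matching is on exact tokens, a query word such as "document" would not match "documents"; this kind of limitation is what pre-processing and dense representations are meant to address.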

Prerequisites

Basic knowledge of statistics, programming languages, and machine learning.

Teaching form

The course is taught in English. It consists of lectures that introduce the main topics and laboratory sessions in which open-source tools are explained and used. Seminars held by national and international experts may be part of the course.

Textbooks and teaching resources

Emily M. Bender, "Linguistic Fundamentals for Natural Language Processing", Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 2013.

Daniel Jurafsky and James Martin, "Speech and Language Processing, 2nd Edition", Prentice Hall, 2008.

Yoav Goldberg, "Neural Network Methods for Natural Language Processing", Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 2017.

Semester

Second Semester

Assessment method

Assessment consists of an individual written examination, an optional individual oral examination, and a laboratory project that may also be developed in groups of up to three students.

The written examination assesses the level of understanding of the basic aspects taught during the course; it consists of a set of open questions.

The laboratory project requires students to use open-source software to develop technological solutions to the problems addressed in the course. Projects target real application areas that call for building the kinds of systems presented during the course.

Office hours

By appointment with the teachers.



Key information

Field of research: INF/01
ECTS: 6
Term: Second semester
Activity type: Mandatory
Course length (hours): 56
Degree course type: Degree Course
Language: English

Staff

Teachers

  • Maria Teresa Guasti
  • Gabriella Pasi
  • Alessandro Raganato

© 2025 Università degli Studi di Milano-Bicocca