The course aims to provide PhD students with a solid and practical understanding of modern biological databases and their role in biomolecular research.
The course focuses on how biological information is organized, curated, identified, and integrated across heterogeneous resources, with particular emphasis on gene and protein identifiers, data versioning, cross-database mapping, and genome-centric data interpretation.
By the end of the course, students will be able to critically navigate biological databases, translate datasets across identifier systems, and correctly interpret genomic and functional information in a research-oriented context.
Contents
- Overview of biological databases: primary, secondary, and integrative resources
- Gene and protein identifiers: semantics, versioning, and stability
- Cross-referencing and identifier translation across databases
- Genome browsers as integrative platforms for biological data
- Interpretation of genomic annotations and experimental tracks
Detailed program
Introduction to biological databases
Classification of biological databases (primary, secondary, integrative). Major international resources (NCBI, Ensembl, UniProt, PDB, pathway and functional databases). Strengths, limitations, and typical use cases.
Gene and protein identifiers
Concept of biological identifiers. Differences between stable and unstable identifiers. Versioning systems. Causes of identifier changes over time. One-to-many and many-to-one mappings. Common pitfalls in dataset interpretation.
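As a small illustration of versioning, the sketch below separates a stable identifier from its version suffix, assuming the common "ID.version" convention used by Ensembl and RefSeq. The helper names (`split_version`, `same_entity`) are hypothetical, and the version numbers in the examples are illustrative rather than current.

```python
from typing import NamedTuple, Optional


class VersionedId(NamedTuple):
    stable_id: str
    version: Optional[int]


def split_version(identifier: str) -> VersionedId:
    """Split 'ID.version' into its stable part and an integer version.

    Identifiers without a numeric suffix are returned with version None.
    """
    identifier = identifier.strip()
    base, sep, suffix = identifier.rpartition(".")
    if sep and base and suffix.isdigit():
        return VersionedId(base, int(suffix))
    return VersionedId(identifier, None)


def same_entity(a: str, b: str) -> bool:
    """True when two identifiers share a stable ID, whatever their versions."""
    return split_version(a).stable_id == split_version(b).stable_id


print(split_version("ENSG00000139618.17"))
print(same_entity("NM_000546.5", "NM_000546.6"))
```

Comparing only the stable part is convenient when merging datasets, but it hides the fact that the underlying sequence or annotation may differ between versions, so the version is worth keeping in a separate column.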
Cross-database identifier mapping
Translation of gene and protein identifiers across databases. Use of integrated tools (e.g. BioMart). Handling ambiguous, deprecated, or unmapped identifiers. Best practices for dataset harmonization.
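The handling of ambiguous, deprecated, and unmapped identifiers can be sketched as a simple classification step applied after a mapping table has been exported from a tool of choice. Everything below is a toy example: the identifiers (`G1`, `P1a`, and so on), the `build_index` and `classify` helpers, and the list of retired identifiers are hypothetical and do not come from any real database.

```python
from collections import defaultdict


def build_index(pairs):
    """Turn (source_id, target_id) rows into a source -> set of targets index."""
    index = defaultdict(set)
    for source_id, target_id in pairs:
        index[source_id].add(target_id)
    return index


def classify(ids, index, retired=frozenset()):
    """Sort input identifiers into unique, ambiguous, deprecated, and unmapped."""
    report = {"unique": {}, "ambiguous": {}, "deprecated": [], "unmapped": []}
    for identifier in ids:
        targets = index.get(identifier, set())
        if identifier in retired:
            report["deprecated"].append(identifier)
        elif len(targets) == 1:
            report["unique"][identifier] = next(iter(targets))
        elif len(targets) > 1:
            report["ambiguous"][identifier] = sorted(targets)
        else:
            report["unmapped"].append(identifier)
    return report


# Toy mapping table: G2 maps to two targets (one-to-many).
pairs = [("G1", "P1a"), ("G2", "P2a"), ("G2", "P2b")]
report = classify(["G1", "G2", "G3", "G4"], build_index(pairs), retired={"G4"})
print(report)
```

Reporting the four categories separately, instead of silently dropping rows that fail to map, makes it possible to state how many identifiers were lost or duplicated during harmonization.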
Genome browsers and genomic context
Conceptual foundations of genome browsers. Assemblies, coordinates, and gene models. Comparison of major genome browsers. Visualization of genes, transcripts, and regulatory elements.
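A frequent source of off-by-one errors when working with coordinates is the difference between file conventions: BED intervals are 0-based and half-open, while GFF/GTF intervals are 1-based and closed. The following minimal sketch converts between the two; the function names are hypothetical, and it assumes both intervals refer to the same assembly, since coordinates are not comparable across assemblies.

```python
def bed_to_one_based(start: int, end: int) -> tuple:
    """Convert a 0-based half-open interval to a 1-based closed interval."""
    return start + 1, end


def one_based_to_bed(start: int, end: int) -> tuple:
    """Convert a 1-based closed interval to a 0-based half-open interval."""
    return start - 1, end


def bed_length(start: int, end: int) -> int:
    """Length of a 0-based half-open interval."""
    return end - start


def one_based_length(start: int, end: int) -> int:
    """Length of a 1-based closed interval."""
    return end - start + 1


# The first 100 bases of a chromosome in both conventions.
print(bed_to_one_based(0, 100))
print(one_based_to_bed(1, 100))
```

In both conventions the interval above spans 100 bases; only the numbering of the start position changes.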
User-provided data and custom tracks
Integration of experimental data into genome browsers. Interpretation of user tracks. Linking genomic localization to functional and biological interpretation.
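Linking a user track to annotated features often reduces to an interval overlap test. The sketch below assumes 0-based half-open coordinates (as in BED) on a single assembly; the `Interval` type, the `annotate` helper, and all feature names and positions are hypothetical toy values.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(features, genes):
    """Map each user feature name to the names of the genes it overlaps."""
    return {f.name: [g.name for g in genes if overlaps(f, g)] for f in features}


genes = [Interval("chr1", 100, 200, "geneA"), Interval("chr1", 300, 400, "geneB")]
peaks = [
    Interval("chr1", 150, 160, "peak1"),  # inside geneA
    Interval("chr1", 200, 300, "peak2"),  # touches both genes, overlaps neither
    Interval("chr2", 150, 160, "peak3"),  # different chromosome
]
hits = annotate(peaks, genes)
print(hits)
```

The second peak shows why the coordinate convention matters: with half-open intervals, features that merely abut a gene boundary do not overlap it.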
Throughout the course, theoretical concepts are continuously reinforced through hands-on exercises performed directly in the computer classroom.
Prerequisites
Basic knowledge of molecular biology (genes, transcripts, proteins) is required.
No advanced computational skills are assumed; however, familiarity with standard bioinformatics terminology is recommended.
Teaching form
The course is taught entirely in a computer classroom and combines short lectures with guided hands-on practical sessions.
Students actively interact with databases and tools during the lessons, applying concepts to realistic biological examples and datasets.
Assessment method
Assessment is based on a practical assignment completed in class during the course.
Students are required to analyze and interpret a small biological dataset by navigating databases, translating identifiers, and integrating genomic information.
The assessment is evaluated on a pass/fail basis.