Nikolaj Blom - PowerPoint PPT Presentation

Resources of Biomolecular Data: Sequences, Structures and Functionality PhD course #27803 Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU – PowerPoint PPT presentation

Slides: 62
Provided by: LarsJ152
Transcript and Presenter's Notes

1
Resources of Biomolecular Data: Sequences, Structures and Functionality
PhD course #27803
Nikolaj Blom
Center for Biological Sequence Analysis, BioCentrum-DTU
Technical University of Denmark
nikob_at_cbs.dtu.dk
2
Outline
  • Magnitudes and Scales
  • Resources: Data Sources and Tools
  • Primary DNA sources
  • Sequence Repositories
  • Structure Repositories
  • Functional Categorization
  • Integration of Databases
  • The Human Genome
  • Genome Browsers
  • Prediction Tools
  • Evaluation of Prediction Servers
  • Starting points
  • Link collections

3
Learning Objectives
  • The student should be able to
  • Describe differences between sequence
    repositories and curated databases
  • Describe the challenges of maintaining
    genome-wide biological databases
  • List two entry points for getting an overview of
    my gene of interest
  • Describe how prediction servers may be evaluated

4
Resources: Sources and Tools
  • There are A LOT OF biomolecular databases/sources
  • A LOT OF overlap of information/redundancy
  • A LOT OF TOOLS
  • Personal picks/preferences
  • User-friendliness
  • Update intervals
  • Curation efforts / error correction
  • Linkage to other DBs

5
Faster than Moore's law...
6
Faster than Moore's law...
7
Human Genome Published
  • HUGO: Nature, 15 Feb 2001
  • Celera: Science, 16 Feb 2001
8
Magnitudes and Scales
  • Human genome: 3,200,000,000 bp
  • Single basepair → full genome is 9 orders of
    magnitude
  • Genome = football field: 3 billion leaves of
    grass
  • Single base A, T, G, C (or SNP) = 1 leaf of grass
  • Genome browsing
  • Zooming from whole stadium to single leaf
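
The scale comparison above can be checked with a little arithmetic. The following is a minimal sketch that uses only the genome size quoted on the slide; the variable names are illustrative.

```python
import math

GENOME_BP = 3_200_000_000  # human genome size quoted on the slide

# Orders of magnitude separating one base pair from the whole genome.
orders = math.log10(GENOME_BP)
print(f"{orders:.1f} orders of magnitude")

# Number of tenfold zoom steps needed to go from the whole genome to one base.
zoom_steps = math.ceil(orders)
print(zoom_steps, "tenfold zoom steps")
```

The exact figure is a little above nine, which the slide rounds to 9 orders of magnitude.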

9
How we got the sequence
  • Sanger chain termination method

10
Primary DNA sources
  • Trace files repositories
  • Single read 500-1000 bp (golf ball size / jigsaw
    puzzle)
  • Variable quality
  • WashU-Merck Human EST Project / Trace files
  • Base-calling non-trivial

G, C or nothing?
11
Assembly is Non-trivial!
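
To give a feel for the jigsaw-puzzle problem, here is a toy sketch that joins two reads by their longest exact suffix/prefix overlap. It is an illustration only, not the method used by any real assembler; the function names, the `min_len` threshold and the example reads are all made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their overlap; return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


# Two made-up reads that share the three bases "CGT".
contig = merge("ATGCGT", "CGTTAA")
print(contig)
```

Real reads contain base-calling errors, so exact matching fails, and repeated sequence makes several overlaps equally plausible. Both effects are reasons why assembly is harder than this sketch suggests.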

12
Sequence repositories - GenBank et al.
  • GenBank / EMBL / DDBJ
  • Highly redundant (many versions of same gene)
  • Cross-updated daily
  • Version history is recorded
  • Previous sequence records can be retrieved
  • Contigs/HTGS (100-200 kb) finishing at different
    stages
  • Draft → Finished
  • Includes genomic DNA, cDNA, ESTs, translated
    peptides
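
Version tracking in the repositories shows up in the record identifiers, which take the form ACCESSION.VERSION. The sketch below splits such an identifier and picks the newest of several versions. The identifiers are made up for illustration and the helper name is hypothetical.

```python
from typing import Optional, Tuple


def split_accession(identifier: str) -> Tuple[str, Optional[int]]:
    """Split 'ACCESSION.VERSION' into its parts; version is None if absent."""
    accession, sep, version = identifier.partition(".")
    return accession, int(version) if sep else None


# Made-up identifiers for three versions of the same record.
records = ["AB123456.1", "AB123456.3", "AB123456.2"]

# The highest version number is the most recent sequence record.
latest = max(records, key=lambda r: split_accession(r)[1] or 0)
print(latest)
```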

13
Non-redundant and Curated databases
  • Non-redundant
  • Manual or automatic curation
  • DNA
  • RefSeq (NCBI semi-automated)
  • Ensembl gene index (automated)
  • Protein
  • RefSeq (NCBI semi-automated)
  • TrEMBL (EMBL automated)

14
Curated database UniProt/SwissProt
  • SIB - Swiss Institute of Bioinformatics
  • Protein Knowledgebase / Sequence Database
  • Highly curated
  • Experimental evidence evaluated (e.g.
    modifications)
  • All 80,000 entries checked by Amos Bairoch
    himself :-)
  • ExPASy - Expert Protein Analysis System
  • Proteomics tools, links, local servers

15
Structure databases / Protein Data Bank (PDB)
  • X-ray, NMR biomolecular structures
  • Protein Data Bank (PDB)
  • http://www.rcsb.org/pdb/

16
Structure databases / Protein Data Bank (PDB)
17
Functional Categorization
  • Gene Ontology (GO)
  • Hierarchical
  • Controlled vocabulary

18
Functional Categorization
  • Gene Ontology (GO): http://www.geneontology.org/
  • Molecular Function - the tasks performed by
    individual gene products; examples are
    transcription factor and DNA helicase
  • Biological Process - broad biological goals, such
    as mitosis or purine metabolism, that are
    accomplished by ordered assemblies of molecular
    functions
  • Cellular Component - subcellular structures,
    locations, and macromolecular complexes; examples
    include nucleus, telomere, and origin recognition
    complex
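
A hierarchical controlled vocabulary can be pictured as terms linked to more general parent terms. The sketch below builds a toy hierarchy from the examples above and walks upward from a term to its category. The parent links are simplified for illustration and are not copied from the actual GO files; the names `IS_A` and `ancestors` are hypothetical.

```python
# Toy "is_a" links: each term points to its more general parent terms.
# Simplified for illustration; not taken from the real ontology files.
IS_A = {
    "DNA helicase": ["helicase"],
    "helicase": ["molecular function"],
    "transcription factor": ["molecular function"],
    "mitosis": ["cell cycle process"],
    "cell cycle process": ["biological process"],
    "nucleus": ["cellular component"],
    "telomere": ["chromosome part"],
    "chromosome part": ["cellular component"],
}


def ancestors(term: str) -> set:
    """Return every term reachable by following is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("DNA helicase")))
```

Because every gene product is annotated with terms from the same vocabulary, a query for a general term can also retrieve everything annotated with its more specific descendants.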

19
Integration of databases - Webs of web-sites
  • Links, links, links...
  • SRS: Sequence Retrieval System
  • Powerful, complex query language
  • BioDAS: Distributed Annotation System

http://srs.ebi.ac.uk/
20
For my gene, how do I
  • Get an overview of the sequence information
    known? (GeneCards, OMIM)
  • Examine the Genome Neighbourhood? (Genome
    Browsers)
  • Predict protein post-translational modifications
    (PTMs)? (Prediction servers)
  • (Evaluate the value of predicted features)

21
GeneCards: http://nciarray.nci.nih.gov/cards/
22
GeneCards-II
23
GeneCards-III
24
GeneCards-IV
25
GeneCards-V
26
Genetic/Medical Information
  • OMIM, Online Mendelian Inheritance in Man (NCBI)
  • The OMIM database is a catalog of human genes and
    genetic disorders
  • >16,000 entries (April 2006)
  • Examples: cystic fibrosis, prions, amyloid
    precursor protein
  • Condensed, highly curated descriptions of
    genetics/disease/animal models/references

27
OMIM-I (http://www3.ncbi.nlm.nih.gov/Omim/)
28
OMIM-II
29
OMIM-III
30
For my gene, how do I
  • Get an overview of the sequence information
    known? (GeneCards, OMIM)
  • Examine the Genome Neighbourhood? (Genome
    Browsers)
  • Predict protein post-translational modifications
    (PTMs)? (Prediction servers)
  • (Evaluate the value of predicted features)

31
Genome Browsing
  • Three public
  • Open access
  • Use same genome build/assembly
  • NCBI (U.S.)
  • UCSC (Santa Cruz, U.S.)
  • EnsEmbl (EBI, EU)
  • (One private)
  • (Restricted, commercial; closed 2005)

32
Celera Discovery System Database
33
Genome Browsers - Portals to the Genomic World
  • UCSC: Univ. of California, Santa Cruz (U.S.)
  • http://genome.ucsc.edu/
  • NCBI: National Center for Biotechnology
    Information (U.S.)
  • http://www.ncbi.nlm.nih.gov/Genomes/index.html
  • EnsEmbl: European Molecular Biology Laboratory
    (E.U.)
  • http://www.ensembl.org/

34
UCSC Genome Browser
35
UCSC Genome Browser II
36
NCBI
37
NCBI
39
EnsEmbl Genome Browser
40
EnsEmbl Genome Browser
41
EnsEmbl Genome Browser
42
EnsEmbl Genome Browser
43
EnsEmbl Genome Browser
44
EnsEmbl Genome Browser
45
For my gene, how do I
  • Get an overview of the sequence information
    known? (GeneCards)
  • Examine the Genome Neighbourhood? (Genome
    Browsers)
  • Predict protein post-translational modifications
    (PTMs) or Gene Structure? (Prediction servers)
  • ...and evaluate the reliability of prediction
    methods

46
CBS Services/Toolbox: http://www.cbs.dtu.dk/services/
49
NetPhos: a prediction server
http://www.cbs.dtu.dk/services/NetPhos/
50
NetPhos: a prediction server
51
Evaluating Prediction Servers
  • Performance on independent/cross-validated data
    presented?
  • Published in peer-reviewed journal?
  • Cited by others?
  • Science Citation Index
  • Linked to from credible web sites?
  • Google Page-rank
  • link:URL search
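
For the first criterion above, performance on independent data is usually reported from counts of correct and incorrect predictions. The sketch below computes three commonly used measures. The slides do not name specific measures, so the choice of sensitivity, specificity and the Matthews correlation coefficient is an assumption, and the counts are made up.

```python
import math


def evaluate(tp: int, fp: int, tn: int, fn: int):
    """Return (sensitivity, specificity, Matthews correlation coefficient)."""
    sensitivity = tp / (tp + fn)  # fraction of true sites that were predicted
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Made-up counts from a hypothetical independent test set.
sens, spec, mcc = evaluate(tp=80, fp=20, tn=880, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

A single summary value such as the correlation coefficient is useful when negatives far outnumber positives, because a server that predicts nothing at all would still score a high specificity.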

52
Evaluating Prediction Servers
53
2can Bioinformatics Education
  • At EBI: European Bioinformatics Institute
  • http://www.ebi.ac.uk/2can/index.html
  • Tutorials, resource links, etc.

54
EnsEMBL Bioinformatics Education
55
Starting Points
  • General Bioinformatics
  • NCBI, National Center for Biotechnology
    Information, U.S.
  • EBI, European Bioinformatics Institute
  • Prediction Tools
  • CBS, DK
  • Expasy (Protein analysis), Switzerland

56
Dynamic Resources
  • Pros
  • Includes most recent developments
  • Updated regularly
  • User interface improves (usually)
  • Cons
  • Difficult to keep pace
  • Tutorials and lectures hard to recycle :-(
  • Difficult to use at irregular intervals

57
Genome Browsers - Portals to the Genomic World
  • Three main entry points
  • NCBI, UCSC, EnsEmbl
  • Essentially contain same information
  • High degree of linking to secondary databases
  • Advisable to become familiar with only one genome
    browser
  • Learn to navigate and make queries
  • GeneCards and OMIM
  • well suited for getting a quick overview of a
    gene of interest

58
Prediction Servers
  • Evaluate scientific soundness
  • Look for indications of quality (citations, etc.)
  • Remember that prediction servers provide...well,
    predictions!

59
Learning Objectives
  • The student should be able to
  • Describe differences between sequence
    repositories and curated databases
  • Describe the challenges of maintaining
    genome-wide biological databases
  • List two entry points for getting an overview of
    my gene of interest
  • Describe how prediction servers may be evaluated

60
Immediate Feedback
Title: Resources of Biomolecular Data:
Sequences, Structures and Functionality
  • Did the lecture live up to your expectations?
  • Did you expect to learn about resources that were
    not covered during this lecture?
  • NB! You can also provide input at the general
    course evaluation

61
The End
25,000?