1
Protein and Proteome Annotation
  • David Wishart
  • University of Alberta
  • Edmonton, AB
  • david.wishart_at_ualberta.ca

2
Annotating 2D Gels
Trypsin Gel punch
p53
Trx
G6PDH
3
Is This Annotated?
p53
Information 1) pI 2) MW 3) name (abbr.) 4)
accession number 5) relative amount
Trx
G6PDH
4
How About This?
Information 1) name (abbr.) 2) accession number 3)
relative amount 4) coexpressors
5
Is This Annotated?
>P12345 Sequence 1
GATTACAGATTACAGATTACAGATTACAGAT
TACAG ATTACAGATTACAGATTACAGATTACAGATTACAGA TTACAGA
TTACAGATTACAGATTACAGATTACAGAT TACAGATTAGAGATTACAGA
TTACAGATTACAGATT ACAGATTACAGATTACAGATTACAGATTACAGA
TTA CAGATTACAGATTACAGATTACAGATTACAGATTAC AGATTACAG
ATTACAGATTACAGATTACAGATTACA GATTACAGATTACAGATTACAG
ATTACAGATTACAG ATTACAGATTACAGATTACAGATTACAGATTACAG
A TTACAGATTACAGATTACAGATTACAGATTACAGAT
6
Protein Annotation
  • Objective - identify and describe all the
    physico-chemical, functional and structural
    properties of a protein including its sequence,
    accession number, mass, pI, absorptivity, solubility,
    active sites, binding sites, reactions,
    substrates, homologues, function, name(s),
    abundance, location, 2° structure, 3D structure,
    domains, pathways, interacting partners

7
Protein vs. Proteome Annotation
  • Protein annotation is concerned with one or a
    small number (<50) of proteins from one or several
    types of organisms
  • Proteome annotation is concerned with entire
    proteomes (>2000 proteins) from a specific
    organism (or for all organisms) - need for speed

8
Different Levels of Annotation
  • Sparse: typical of many gel or microarray
    annotations, usually just includes name and
    accession number
  • Moderate: typical of many sequence databases or
    of experiments aimed at identifying protein
    complexes or ligands
  • Detailed: not typical (occasionally found in
    organism-specific databases)

9
Different Levels of Database Annotation
  • GenBank (large number of sequences, minimal
    annotation)
  • PIR (large number of sequences, slightly better
    annotation)
  • SwissProt (small number of sequences, even better
    annotation)
  • Organism-specific DB (very small number of sequences,
    best annotation)

10
GenBank Annotation
11
PIR Annotation
12
Swiss-Prot Annotation
13
CCDB Annotation
14
CCDB Annotation
15
Ultimate Goal...
  • To achieve the same level of protein/proteome
    annotation as found in CCDB for all
    genes/proteins from 2D GE data, from microarray
    data or for sequence databases in general

How?
16
Annotation Methods
  • Annotation by homology (BLAST)
  • requires a large, well annotated database of
    protein sequences
  • Annotation by sequence composition
  • simple statistical/mathematical methods
  • Annotation by sequence features, profiles or
    motifs
  • requires sophisticated sequence analysis tools

17
Annotation by Homology
  • Statistically significant sequence matches
    identified by BLAST searches against GenBank
    (nr), SWISS-PROT, PIR, ProDom, BLOCKS, KEGG, WIT,
    Brenda, BIND
  • Properties or annotation inferred by name,
    keywords, features, comments
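
The idea of inferring annotation from significant matches can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the `transfer_annotation` function, the example hits and the 1e-5 E-value cutoff are all assumptions made for the sketch, not something taken from the slides or from any BLAST tool.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One database match for a query (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    description: str

def transfer_annotation(hits, max_evalue=1e-5):
    """Map each query to the description of its most significant hit.

    Hits with an E-value above max_evalue are ignored, so queries
    without a significant match stay unannotated.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {query: hit.description for query, hit in best.items()}

# Made-up hits, for illustration only.
hits = [
    Hit("orf1", "dbA", 3e-20, "thioredoxin"),
    Hit("orf1", "dbB", 1e-8, "glutaredoxin"),
    Hit("orf2", "dbC", 0.5, "hypothetical protein"),
]
print(transfer_annotation(hits))
```

Taking only the single best hit is the simplest possible policy; a real pipeline would more likely weigh several hits and their keywords.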

Databases Are Key
18
Sequence Databases
  • GenBank
  • www.ncbi.nlm.nih.gov/
  • EMBL/trEMBL
  • www.ebi.ac.uk/trembl/
  • DDBJ
  • www.nig.ac.jp/
  • PIR
  • http://pir.georgetown.edu/
  • SwissProt
  • www.expasy.ch/sprot/
  • UniProt
  • http://www.pir.uniprot.org/

19
Structure Databases
  • RCSB-PDB
  • http://www.rcsb.org/pdb/
  • MSD
  • http://www.ebi.ac.uk/msd/index.html
  • CATH
  • http://www.biochem.ucl.ac.uk/bsm/cath/
  • SCOP
  • http://scop.mrc-lmb.cam.ac.uk/scop/

20
Expression Databases
  • Swiss 2D Page
  • http://ca.expasy.org/ch2d/
  • SMD
  • http://genome-www5.stanford.edu/MicroArray/SMD/
  • ArrayExpress
  • http://www.ebi.ac.uk/arrayexpress/
  • Gene Expr. Omnibus
  • http://www.ncbi.nlm.nih.gov/geo/

21
Metabolism Databases
  • KEGG
  • http://www.genome.ad.jp/kegg/metabolism.html
  • Roche/Boehringer
  • http://www.expasy.org/cgi-bin/search-biochem-index
  • EcoCyc
  • www.ecocyc.org/
  • MetaCyc
  • http://metacyc.org/

22
Interaction Databases
  • BIND
  • http://www.blueprint.org/bind/bind.php
  • DIP
  • http://dip.doe-mbi.ucla.edu/
  • MINT
  • http://mint.bio.uniroma2.it/mint/
  • IntAct
  • http://www.ebi.ac.uk/intact/index.html

23
Bibliographic Databases
  • PubMed Medline
  • http://www.ncbi.nlm.nih.gov/PubMed/
  • Science Citation Index
  • http://isi4.isiknowledge.com/portal.cgi
  • Your Local eLibrary
  • www.XXXX.ca
  • Current Contents
  • http://www.isinet.com/isi/

24
Annotation by Homology: An Example
  • 76 residue protein from Methanobacterium
    thermoautotrophicum (newly sequenced)
  • What does it do?
  • MMKIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTA
    LPGLAVDGELKIMGRVASKEEIKKILS

25
PSI-BLAST
Select Database
26
PSI-BLAST
27
PSI-BLAST
28
PSI-BLAST
29
Conclusions
  • Protein is a thioredoxin or glutaredoxin
    (function, family)
  • Protein has thioredoxin fold (2° and 3D
    structure)
  • Active site is from residues 11-14 (active site
    location)
  • Protein is soluble, cytoplasmic (cellular
    location)

30
Annotation Methods
  • Annotation by homology (BLAST)
  • requires a large, well annotated database of
    protein sequences
  • Annotation by sequence composition
  • simple statistical/mathematical methods
  • Annotation by sequence features, profiles or
    motifs
  • requires sophisticated sequence analysis tools

31
Annotation by Composition
  • Molecular Weight
  • Isoelectric Point
  • UV Absorptivity
  • Hydrophobicity
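
As an example of how simple composition-based annotation can be, molecular weight follows from summing residue masses and adding one water for the free termini. The sketch below uses commonly tabulated average residue masses; the function name is an assumption, and the result ignores post-translational modifications and disulfide bonds.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH

def molecular_weight(seq):
    """Average molecular weight of an unmodified peptide chain, in Da."""
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

print(round(molecular_weight("GG"), 2))  # glycylglycine
```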

32
Where To Go
33
Isoelectric Point
  • The pH at which a protein has a net charge = 0
  • $Q = \sum_i N_i / (1 + 10^{\mathrm{pH} - \mathrm{p}K_i})$
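
A rough numerical sketch of the pI calculation follows. The slide formula is read here as the positive-charge contribution of the basic groups; the sketch adds the complementary term for acidic groups and finds the pH where the two cancel by bisection. The pKa values are illustrative assumptions (published sets differ noticeably), and the function names are made up for this example.

```python
# Illustrative pKa values; treat them as assumptions, not reference data.
POSITIVE_PK = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 8.0}
NEGATIVE_PK = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 3.1}

def net_charge(seq, ph):
    """Net charge of the chain at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pk))
                   for g, pk in POSITIVE_PK.items())
    negative = sum(counts[g] / (1 + 10 ** (pk - ph))
                   for g, pk in NEGATIVE_PK.items())
    return positive - negative

def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH in [0, 14]; net charge falls monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2

seq = ("MMKIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTA"
       "LPGLAVDGELKIMGRVASKEEIKKILS")
print(round(isoelectric_point(seq), 2))
```

The predicted value shifts with the pKa set chosen, so results from different tools can disagree by a few tenths of a pH unit.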

34
UV Absorptivity
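  • $\mathrm{OD}_{280} = (5690 \times \#W + 1280 \times \#Y) / \mathrm{MW} \times \mathrm{Conc.}$
  • $\mathrm{Conc.} = \mathrm{OD}_{280} \times \mathrm{MW} / (5690 \times \#W + 1280 \times \#Y)$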
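
The concentration formula can be applied as below. The sketch assumes W and Y in the formula stand for the numbers of tryptophan and tyrosine residues, a 1 cm path length, and a molecular weight supplied by the caller; the function names are hypothetical.

```python
def extinction_coefficient(seq):
    """Approximate molar absorptivity at 280 nm (M^-1 cm^-1) from Trp and Tyr counts."""
    return 5690 * seq.count("W") + 1280 * seq.count("Y")

def concentration_mg_per_ml(od280, mol_weight, seq):
    """Protein concentration in mg/mL from OD280, assuming a 1 cm path length."""
    epsilon = extinction_coefficient(seq)
    if epsilon == 0:
        raise ValueError("sequence has no Trp or Tyr; OD280 cannot be used")
    return od280 * mol_weight / epsilon

# A made-up protein with one Trp and one Tyr and a molecular weight of 10 kDa.
print(concentration_mg_per_ml(0.697, 10000.0, "WY"))
```

Proteins without tryptophan or tyrosine absorb very little at 280 nm, which is why the sketch refuses to divide by zero rather than return a meaningless number.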

35
Hydrophobicity
  • Indicates Solubility
  • Indicates Stability
  • Indicates Location (membrane or cytoplasm)
  • Indicates Globularity or tendency to form
    spherical structure
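
Hydrophobicity is usually computed from a per-residue scale. The slides do not name one, so the sketch below assumes the Kyte-Doolittle hydropathy scale and shows two common summaries: the sequence-wide average (often called GRAVY) and a sliding-window profile, where long high-scoring stretches hint at membrane-spanning segments. Function names and the default window size are illustrative choices.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathy over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def window_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    return [gravy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]

seq = ("MMKIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTA"
       "LPGLAVDGELKIMGRVASKEEIKKILS")
print(round(gravy(seq), 3))
print(round(max(window_profile(seq)), 3))
```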

36
Annotation Methods
  • Annotation by homology (BLAST)
  • requires a large, well annotated database of
    protein sequences
  • Annotation by sequence composition
  • simple statistical/mathematical methods
  • Annotation by sequence features, profiles or
    motifs
  • requires sophisticated sequence analysis tools

37
Where To Go
38
Sequence Feature Databases
  • PROSITE - http://www.expasy.ch/
  • BLOCKS - http://blocks.fhcrc.org/
  • DOMO - http://www.infobiogen.fr/services/domo/
  • PFAM - http://pfam.wustl.edu
  • PRINTS - http://www.biochem.ucl.ac.uk/bsm/dbrowser/PRINTS
  • SEQSITE - PepTool
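
Motif databases such as PROSITE store short patterns that can be scanned against a sequence. As a hedged sketch, the function below converts the basic PROSITE pattern syntax into a Python regular expression and scans for the N-glycosylation pattern `N-{P}-[ST]-{P}`. The converter name is made up, and it covers only the common syntax elements (`x`, `[..]`, `{..}`, repeat counts, and `<`/`>` anchors outside brackets).

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means "any residue except A, B or C".
        element = element.replace("{", "[^").replace("}", "]")
        # x(2) or x(2,4) are repeat counts.
        element = element.replace("(", "{").replace(")", "}")
        # x is any residue; < and > anchor to the sequence ends.
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)

def scan(pattern, seq):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(seq)]

print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(scan("N-{P}-[ST]-{P}.", "AANGSAAANPSA"))
```

Short patterns like this match by chance quite often, so a hit is a hint that needs supporting evidence rather than an annotation on its own.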

39
What Can Be Predicted?
  • O-Glycosylation Sites
  • Phosphorylation Sites
  • Protease Cut Sites
  • Nuclear Targeting Sites
  • Mitochondrial Targ Sites
  • Chloroplast Targ Sites
  • Signal Sequences
  • Signal Sequence Cleav.
  • Peroxisome Targ Sites
  • ER Targeting Sites
  • Transmembrane Sites
  • Tyrosine Sulfation Sites
  • GPInositol Anchor Sites
  • PEST sites
  • Coiled-Coil Sites
  • T-Cell/MHC Epitopes
  • Protein Lifetime
  • A whole lot more.

40
Cutting Edge Sequence Feature Servers
  • Membrane Helix Prediction
  • http://www.cbs.dtu.dk/services/TMHMM-2.0/
  • T-Cell Epitope Prediction
  • http://syfpeithi.bmi-heidelberg.com/scripts/MHCServer.dll/home.htm
  • O-Glycosylation Prediction
  • http://www.cbs.dtu.dk/services/NetOGlyc/
  • Phosphorylation Prediction
  • http://www.cbs.dtu.dk/services/NetPhos/
  • Protein Localization Prediction
  • http://psort.nibb.ac.jp/

41
Subcellular Localization
42
Subcellular Localization
http://www.cs.ualberta.ca/bioinfo/PA/Sub/
43
Proteome Analyst (SubCell)
44
2° Structure Prediction
  • PredictProtein-PHD (72%)
  • http://cubic.bioc.columbia.edu/predictprotein/
  • Jpred (73-75%)
  • http://www.compbio.dundee.ac.uk/www-jpred/submit.html
  • SAM-T02 (75%)
  • http://www.cse.ucsc.edu/research/compbio/HMM-apps/T02-query.html
  • PSIpred (77%)
  • http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

45
Putting It All Together
Seq Motifs
Composition
Homology
46
Putting It All Together
  • PEDANT
  • http://pedant.gsf.de/
  • GeneQuiz
  • http://jura.ebi.ac.uk:8765/ext-genequiz/
  • Magpie
  • http://magpie.ucalgary.ca/
  • Proteome Analyst
  • http://www.cs.ualberta.ca/bioinfo/PA/

47
(No Transcript)
48
Programs Used By Pedant
  • HMMER
  • PSORT
  • PREDATOR
  • COILS
  • FGENESH
  • pI
  • PROSEARCH
  • TargetP
  • SAPS
  • NCBI-BLAST
  • SEG
  • InterProScan
  • SignalP
  • TMHMM
  • tRNAscan-SE
  • GENSCAN

49
Databases Used By Pedant
  • EMBL
  • PIR-PSD
  • SWISS-PROT
  • Functional Cat
  • PROSITE
  • TrEMBL
  • Blocks
  • PDB
  • SCOP
  • COGs
  • Pfam
  • STRIDE

50
(No Transcript)
51
http://jura.ebi.ac.uk:8765/gqsrv/submit
52
GeneQuiz Functions
  • Amino acid biosynthesis
  • Biosynthesis of cofactors, prosthetic
    groups, carriers
  • Cell envelope
  • Cellular processes
  • Central intermediary metabolism
  • Energy metabolism
  • Fatty acid and phospholipid metabolism
  • Other categories
  • Purines, pyrimidines, nucleosides, and
    nucleotides
  • Regulatory functions
  • Replication
  • Transcription
  • Translation
  • Transport and binding proteins
  • Unknown

53
(No Transcript)
54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
Home Page
58
Proteome Analyst
  • Uses PSI-BLAST, PSI-PRED and motif analysis tools
  • Extracts keyword information from homologues and
    uses Naïve Bayes classifiers to infer function
  • Combines sequence motif and sequence profile
    information to complete functional classification
  • Supports custom classifier/ontology
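
To make the keyword-based classification step concrete, here is a toy multinomial Naive Bayes classifier with add-one smoothing. It is only a sketch of the general technique, not Proteome Analyst's actual implementation; the class name, the training keywords and the category labels are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

class KeywordNaiveBayes:
    """Multinomial Naive Bayes over keyword lists, with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (keywords, label) pairs."""
        for keywords, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(keywords)
            self.vocab.update(keywords)
        return self

    def predict(self, keywords):
        """Return the label with the highest log posterior score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in keywords:
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical keywords pulled from homologues, with made-up labels.
training = [
    (["redox", "disulfide", "electron"], "Energy metabolism"),
    (["redox", "thioredoxin"], "Energy metabolism"),
    (["ribosome", "rRNA", "elongation"], "Translation"),
    (["tRNA", "ribosome"], "Translation"),
]
model = KeywordNaiveBayes().fit(training)
print(model.predict(["thioredoxin", "redox"]))
```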

59
BacMap
  • Picking up where we left off with the CCDB
    (Google bacmap)
  • Idea is to generate a visual atlas of all (not
    just Escherichia coli) bacterial chromosomes and
    plasmids but with links to extensive genome
    annotation
  • Attempt to re-use annotation and graphing tools
    originally developed for the CCDB

60
BacMap
http://wishart.biology.ualberta.ca/BacMap/
61
BacMap
62
Text Search Tools
63
Sequence Search Tools
64
Bacterial Biography Card
65
Genome Statistics
66
Proteome Statistics
67
BacMap
  • Each genome has a short description of the
    organism and sequence data
  • Supports zoomable, hyperlinked, clickable map
    views of the genome
  • Supports text search of gene names, protein names
    and synonyms
  • Supports BLAST search and supplies genome-wide
    stats
  • Currently going through major update

Stothard P, et al. BacMap: an interactive picture
atlas of annotated bacterial genomes. Nucleic
Acids Res. 2005 Jan 1;33(Database issue):D317-20.
68
What if Your Organism or Genome isn't in BacMap?
http://wishart.biology.ualberta.ca/basys/
69
BASys
  • Bacterial Annotation System
  • A publicly available web server that performs
    automated annotation of bacterial genomes given
    only the gene sequence of a chromosome or plasmid
  • Takes about 24 hrs for an average genome (4
    megabases)
  • Output includes images and annotation text (about
    70 fields for each gene)

70
Typical BASys Result
71
Conclusion
  • Genome annotation is the same as proteome
    annotation - required after any gene sequencing
    and gene ID effort
  • Can be done either manually or automatically
  • Need for high throughput, automated pipelines
    to keep up with the volume of genome sequence
    data
  • Area of active research and development with
    about ½ of all bioinformaticians working on some
    aspect of this process