The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO - PowerPoint PPT Presentation

Slides: 38
Provided by: rodrig7
Learn more at: http://geneontology.org
Transcript and Presenter's Notes

1
The Gene Ontology Annotation (GOA) Database and
enhancement of GO annotations through InterPro2GO
  • Nicky Mulder
  • mulder_at_ebi.ac.uk

2
Contents
  • Introduction to GOA
  • Manual GOA annotation
  • Electronic annotation
  • InterPro2GO
  • GOA data flow
  • Uses of GOA
  • Future plans

3
What is GO annotation?
  • An annotation is a statement that a gene product
  • has a particular molecular function
  • is involved in a particular biological process
  • is located within a certain cellular component
  • as determined by a particular method
  • as described in a particular reference.

Figure labels: GO Term ID, Evidence Code, Reference
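
The parts of an annotation listed on this slide can be pictured as one small record. The sketch below is only an illustration: the class name, field names, accession and reference are hypothetical placeholders, not the GOA schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One statement about a gene product (field names are illustrative)."""
    gene_product: str   # placeholder accession, not a real entry
    go_id: str          # GO term ID
    aspect: str         # "F" function, "P" process, "C" component
    evidence_code: str  # how the statement was determined
    reference: str      # where the statement is described

    def describe(self) -> str:
        phrases = {
            "F": "has molecular function",
            "P": "is involved in biological process",
            "C": "is located in cellular component",
        }
        return (f"{self.gene_product} {phrases[self.aspect]} {self.go_id} "
                f"(evidence {self.evidence_code}, reference {self.reference})")


# Placeholder values chosen only to show the shape of a record.
example = GOAnnotation("P00000", "GO:0005524", "F", "IDA", "PMID:0000000")
print(example.describe())
```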
4
Gene Ontology Annotation (GOA) Database
  • GOA's priority is to annotate the human, mouse
    and rat proteomes
  • Largest open-source contributor of annotations to
    GO
  • Provides 10 million annotations for more than
    111,000 species
  • Shares and integrates GO annotations

5
How do we annotate GO terms?
  • Manual annotation
  • Electronic annotation
  • All annotations must
  • be attributed to a source
  • indicate what evidence was found to support the
    GO term-gene/protein association

6
Manual annotation
  • High quality
  • Specific gene or gene product associations made
    using
  • Peer reviewed papers
  • Evidence codes
  • BUT
  • Time-consuming
  • Requires trained biologists

7
Manual GO annotation
8
Protein2GO tool Online
9
Information captured by GOA
Source, GO ID, Term, Evidence, Ref DB, Ref ID, With DB, With ID, Qualifier
10
How successful is manual-GOA?
Source             No. of annotations   No. of distinct proteins
Proteome Inc.      22054                6568
UniProt            67910                13697
IntAct             22002                11013
MGI                124919               29837
SGD                21761                5076
FlyBase            52386                8775
RGD                8036                 3369
HGNC               3699                 798
GeneDB             5502                 1384
TAIR/TIGR          3367                 1895
ZFIN               1012                 334
Roslin Institute   14                   6
AgBase             889                  173
Reactome           15                   12
WormBase           893                  443
TIGR               139                  79
Gramene            139                  2812
GDB                165                  103
TOTAL MANUAL       336237               70728

111740 taxa
July 2006
11
Electronic Annotation
  • Large-scale assignment of GO terms to UniProtKB
    entries using existing information within
    database entries and manual mappings
  • Annotations receive the IEA (Inferred from
    Electronic Annotation) evidence code
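
A minimal sketch of this kind of mapping-driven assignment follows. It assumes a mapping from external identifiers to GO terms and a list of the identifiers found in each protein entry. The function name and every identifier are hypothetical placeholders, and this is not the GOA pipeline itself.

```python
def electronic_annotations(protein_matches, mapping, source):
    """Turn external identifiers found in entries into IEA annotations.

    protein_matches: {protein: [external IDs present in the entry]}
    mapping:         {external ID: [GO IDs]} (a manually curated mapping)
    source:          name of the mapping, recorded with each annotation
    """
    annotations = []
    for protein, external_ids in sorted(protein_matches.items()):
        go_ids = set()
        for external_id in external_ids:
            # Identifiers without a mapping contribute nothing.
            go_ids.update(mapping.get(external_id, ()))
        for go_id in sorted(go_ids):
            annotations.append((protein, go_id, "IEA", source))
    return annotations


# Placeholder identifiers, for illustration only.
toy_mapping = {"IPR_A": ["GO:A", "GO:B"], "IPR_B": ["GO:B"]}
toy_matches = {"PROT1": ["IPR_A", "IPR_B"], "PROT2": ["IPR_C"]}
iea = electronic_annotations(toy_matches, toy_mapping, "InterPro2GO")
for row in iea:
    print(row)
```

Collecting GO IDs in a set per protein keeps a term from being emitted twice when two identifiers map to the same term.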

12
www.uniprot.org/
13
Mappings of external concepts to GO
http://www.geneontology.org/GO.indices.shtml
14
InterPro2GO mapping
  • InterPro is a resource that integrates protein
    signatures databases, e.g. Pfam, Prints, Prosite,
    ProDom, SMART, TIGRFAMs etc.
  • It provides a means of classifying proteins into
    families and identifying domains.
  • Each InterPro entry groups proteins belonging to
    the same family and potentially having the same
    function

15
InterPro2GO mapping
  • Done manually, but using tools
  • Look at InterPro and protein annotation
  • For all Swiss-Prot proteins that are true
    matches to the entry
  • Get stats on DE lines, keywords, comments
  • Check how conserved the common annotation is
  • Find the appropriate GO term at the most specific
    level that applies to all proteins (not
    necessarily domains)
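
The last step above, choosing the most specific term that still applies to every matching protein, can be sketched as a search over shared ancestors. The hierarchy below is a simplified toy, and the function names are hypothetical. On the slides this decision is made manually by curators, so treat this as an illustration of the idea rather than the actual procedure.

```python
# Toy child -> parents hierarchy (simplified for illustration).
TOY_PARENTS = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "kinase activity": ["catalytic activity"],
    "protein kinase activity": ["kinase activity"],
    "lipid kinase activity": ["kinase activity"],
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def depth(term, parents):
    """Length of the longest path from the term up to a root."""
    direct = parents.get(term, [])
    return 0 if not direct else 1 + max(depth(p, parents) for p in direct)


def most_specific_common_terms(protein_terms, parents):
    """Deepest term(s) that cover the annotation of every protein."""
    common = None
    for terms in protein_terms.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term, parents)
        common = covered if common is None else common & covered
    if not common:
        return []
    best = max(depth(t, parents) for t in common)
    return sorted(t for t in common if depth(t, parents) == best)


toy_proteins = {
    "P1": ["protein kinase activity"],
    "P2": ["lipid kinase activity"],
    "P3": ["kinase activity"],
}
print(most_specific_common_terms(toy_proteins, TOY_PARENTS))
```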

16
Tools used: SQUID
Statistics options: keyword, description, gene
name, organism, comments, etc.
17
SQUID statistics output
18
SQUID statistics output
19
InterPro2GO mapping in entry
20
InterProScan output with GO terms
21
InterPro2GO sanity checks
  • Run weekly
  • Reports
  • Obsolete GO terms
  • Obsolete (deleted) IPRs
  • Secondary IPRs
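
The three report categories above could be produced by a check along these lines. This is a hedged sketch: the function name, the input structures and all identifiers are assumptions made for illustration, not the checks GOA actually runs.

```python
def sanity_report(mapping, obsolete_go, deleted_iprs, secondary_iprs):
    """Flag mapping rows that point at stale InterPro or GO identifiers.

    mapping:        {IPR accession: [GO IDs]}
    obsolete_go:    set of GO IDs that have been made obsolete
    deleted_iprs:   set of IPR accessions that no longer exist
    secondary_iprs: {secondary IPR accession: primary accession}
    """
    report = {"obsolete_go_terms": [], "deleted_iprs": [], "secondary_iprs": []}
    for ipr, go_ids in sorted(mapping.items()):
        if ipr in deleted_iprs:
            report["deleted_iprs"].append(ipr)
        if ipr in secondary_iprs:
            report["secondary_iprs"].append((ipr, secondary_iprs[ipr]))
        for go_id in go_ids:
            if go_id in obsolete_go:
                report["obsolete_go_terms"].append((ipr, go_id))
    return report


# Placeholder identifiers, for illustration only.
toy_report = sanity_report(
    mapping={"IPR_A": ["GO:A"], "IPR_B": ["GO:B"], "IPR_C": ["GO:C"]},
    obsolete_go={"GO:B"},
    deleted_iprs={"IPR_C"},
    secondary_iprs={"IPR_A": "IPR_Z"},
)
print(toy_report)
```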

22
Quality of GO mapping
  • BioCreAtIvE test set: 635 GO annotations through
    InterPro2GO
  • Manually checked 44 proteins, 107 predictions
  • 97 correct (90%)
  • 40 exact
  • 57 same lineage
  • 10 new lineage (unknown)
  • 0 incorrect

Camon et al., 2005, BMC Bioinformatics
23
InterPro2GO mapping statistics
Total no. of IPRs mapped to GO: 7126
% of IPRs mapped to at least 1 GO term: 54%
No. of IPRs mapped to molecular function: 5741
No. of IPRs mapped to biological process: 5543
No. of IPRs mapped to cellular component: 3426
No. of GO terms mapped: 2811
No. of UniProt proteins mapped through InterPro2GO: 2006489 (61%)
% of UniProt covered by InterPro: 77.6%
24
How successful is IEA-GOA in general?
  • Provides large coverage
  • High Quality
  • However these annotations often use high-level
    GO terms and provide little detail.

IEA Method      No. of annotations   No. of distinct proteins
InterPro2GO     6281916              2006489
HAMAP2GO        199904               85814
SP Keyword2GO   3613883              1287830
EC2GO           207540               202657
TOTAL           10303243             2167001
Manual ones     336237               70728

Jun 2006
25
Total GO statistics
Total no. of GO annotations: 10639480
GO associations, manual: 3.16%
GO associations, electronic: 96.84%
GO associations, InterPro2GO: 59%
Total no. of proteins annotated to GO: 2168717
UniProt GO annotated in total: 68.2%
UniProt GO annotated manually: 2.2%
UniProt GO annotated electronically: 66%
UniProt GO annotated through InterPro2GO: 61%
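
The annotation shares on this slide follow from the counts on the previous slide (manual, total electronic and InterPro2GO annotations). A short arithmetic check, with variable names chosen only for this sketch:

```python
# Annotation counts as given on the IEA-GOA slide.
manual = 336237
electronic = 10303243
interpro2go = 6281916

total = manual + electronic


def pct(part, whole):
    """Share of `whole` made up by `part`, in percent, to 2 decimals."""
    return round(100 * part / whole, 2)


print("total annotations:", total)
print("manual share:", pct(manual, total))
print("electronic share:", pct(electronic, total))
print("InterPro2GO share:", pct(interpro2go, total))
```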
26
GOA data flow
Gene association files
27
Gene Association file format
http://www.geneontology.org/GO.annotation.shtml
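
As a rough illustration of the format, gene association files are tab-delimited with 15 columns in the original layout, and lines starting with "!" are comments. The sketch below assumes that layout. The column keys are names chosen for this example, and the sample line is made up rather than taken from a real GOA file.

```python
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "synonym", "db_object_type", "taxon", "date",
    "assigned_by",
]


def parse_gaf_line(line):
    """Parse one gene association line into a dict, or None for comments."""
    if not line.strip() or line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(GAF_COLUMNS):
        raise ValueError(f"expected at least {len(GAF_COLUMNS)} columns, "
                         f"got {len(fields)}")
    # Any extra trailing columns are ignored by zip.
    return dict(zip(GAF_COLUMNS, fields))


# Made-up line for illustration; taxon 9913 is cattle.
sample = "\t".join([
    "UniProt", "P00000", "EXAMPLE", "", "GO:0005524", "GOA:interpro",
    "IEA", "InterPro:IPR000000", "F", "Example protein", "", "protein",
    "taxon:9913", "20060701", "UniProt",
])
record = parse_gaf_line(sample)
print(record["db_object_id"], record["go_id"], record["evidence_code"])
```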
28
Example GOA cow file
29
Output from the GOA database
New
Non-Redundant based on IPI
GOA Cow
Redundant
GA slim for UniProt GO slims
Data also available in SRS, UniProt, QuickGO,
MODs, Ensembl etc.
30
GA Files for Non-redundant species
  • Non-redundant complete protein set for each
    proteome is identified (>25% GO coverage)
  • Includes UniProt, IPI and MOD-specific IDs, e.g.
    mouse (MGI), rat (RGD), zebrafish (ZFIN) etc.
  • Xref files available with identifiers from
    UniProt, IPI, RefSeq, Ensembl, UniGene etc.

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa
ftp://ftp.ebi.ac.uk/pub/databases/integr8
31
Uses of GOA data
  • Access protein functional information
  • Look at relationships between proteins, e.g.
    IntAct
  • Connect biological information to gene expression
    data
  • Determine functional composition of a proteome
    using GO slim
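
The last use above can be sketched as rolling each protein's terms up to a small set of high-level slim terms and counting proteins per slim term. The hierarchy, the slim set and the function names below are simplified assumptions for illustration, not a real GO slim.

```python
from collections import Counter

# Toy child -> parents hierarchy (simplified for illustration).
TOY_HIERARCHY = {
    "catalytic activity": [],
    "kinase activity": ["catalytic activity"],
    "protein kinase activity": ["kinase activity"],
    "binding": [],
    "DNA binding": ["binding"],
    "ATP binding": ["binding"],
}
TOY_SLIM = {"catalytic activity", "binding"}


def with_ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def slim_composition(protein_terms, parents, slim_terms):
    """Count proteins per slim term; each protein counts once per term."""
    counts = Counter()
    for terms in protein_terms.values():
        hits = set()
        for term in terms:
            hits |= with_ancestors(term, parents) & slim_terms
        counts.update(hits)
    return counts


toy_proteome = {
    "P1": ["protein kinase activity", "ATP binding"],
    "P2": ["DNA binding"],
    "P3": ["ATP binding", "DNA binding"],
}
composition = slim_composition(toy_proteome, TOY_HIERARCHY, TOY_SLIM)
print(dict(composition))
```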

32
Uses of GOA
Find functional information on proteins
http://www.ebi.ac.uk/ego
33
Uses of GOA
Find functional information on interacting
proteins (IntAct)
http://www.ebi.ac.uk/intact
34
Uses of GOA
Overview proteome with GO Slim
http://www.ebi.ac.uk/integr8
35
Uses of GOA
Analysis of high-throughput data according to GO
Microarray data analysis
Proteomics data analysis
GO classification
GO classification
Larkin JE et al, Physiol Genomics, 2004
Kislinger T et al, Mol Cell Proteomics, 2003
Cunliffe HE et al, Cancer Res, 2003
36
Future plans
  • Continue deep level annotation of human, mouse
    and rat
  • Manually annotate splice variants
  • Outreach and inclusion of new datasets, e.g. grape
  • New electronic mappings, e.g. unipathway2go
  • Ortholog prediction for electronic GO annotation
  • Develop tools for annotation training

37
Acknowledgements
Rolf Apweiler: Head of sequence database group
Evelyn Camon: GOA Coordinator
Daniel Barrell: GOA Programmer
Emily Dimmer: GOA Curator
Rachael Huntley: GOA Curator
David Binns, John Maslen: QuickGO, GOA tools
All EBI UniProtKB Curators, HAMAP (SIB), IntAct, GO
Editorial Office at EBI
All GO Consortium associate members