Title: TAGGO: a Tool for Automatic Grouping of Gene Ontology Annotations
- George Papachristoudis
- Sophia Kossida
Meeting on Bioinformatics and Medical Informatics
Athens, 4-5 October 2006
Organization of the presentation
- Introduction
- Why was the development of TAGGO a necessity?
- Description of the tool
- Results
- Comparison
- Conclusions
- Acknowledgements
Introduction
What is TAGGO?
Is it Tango?
No! But they are both appealing, each in its own domain.
TAGGO is a tool which tries to derive the main functions of
proteins automatically.
What are the main questions to answer in order to fully
characterize a protein's main activities?
- Where is it found to be active? (e.g. nucleus)
- In what processes is it involved? (e.g. metabolism)
- What are its main functions? (e.g. protein binding)
Why was the development of TAGGO a necessity?
Three main reasons justify it:
1) Biologists currently waste a lot of time and effort
searching manually for the characteristics of each protein.
2) To save time, the information provided by large amounts
of data has to be refined.
Each dataset contains 1000 - 3000 proteins, while
each protein is annotated to 5 - 12 GO terms.
3) The process is hampered further by the wide variations in
terminology (nomenclature) that each group of biologists adopts.
The Gene Ontology (GO)
- The GO project provides controlled, well-structured
  vocabularies and a widely accepted nomenclature
- The Gene Ontology Dictionary:
  http://www.geneontology.org/doc/GODict.DAT
Description of the tool: 1st Part
What is the Gene Ontology Consortium?
The GO Consortium is the set of model organism and protein
databases and biological research communities actively involved
in the development and application of the Gene Ontology project.
What is the Gene Ontology project?
The Gene Ontology (GO) project is a collaborative effort to
address the need for consistent descriptions of gene products in
different databases. The project began in 1998 as a collaboration
between three model organism databases: FlyBase (Drosophila),
the Saccharomyces Genome Database (SGD) and the Mouse Genome
Database (MGD). Since then it has grown to include several
plant, animal and microbial databases.
Source: An Introduction to the Gene Ontology, The GO Consortium
GOA Databases
What is the Gene Ontology Annotation (GOA) project?
The GOA project provides high-quality Gene Ontology (GO)
annotations to proteins in the UniProt Knowledgebase
(SWISS-PROT / TrEMBL / PIR-PSD) and InterPro databases.
Genomes of seven (7) species supplied: Human, Mouse, Rat,
Arabidopsis, Zebrafish, Chicken, Cow
Source: Gene Ontology Annotation (GOA) Database
GOA file format
The associations between gene products and GO terms are stored
in tab-delimited files. Each record comprises 15 fields (not all
mandatory).
Most important fields:
- gene / gene product ID (e.g. Q5T0T3)
- gene / gene product symbol (e.g. TRIO_HUMAN)
- gene / gene product synonym (IPI) (e.g. IPI00396431)
- GO term ID (e.g. GO:0007582)
- GO term aspect (P, F or C)
- Evidence Code (EC) (e.g. TAS, IEA)
The EC is an index of the record's reliability. There are 13
different EC types: 12 manual, 1 electronic (IEA).
Source: Annotation File Fields
Source: Guide to GO Evidence Codes
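As an illustration of the file format described above, here is a minimal Python sketch of reading such records and discarding those backed by the electronic evidence code. It assumes the standard 15-column layout of GO annotation files, with comment lines starting with "!". The function name, the dictionary keys and the sample rows are made up for this example and are not taken from TAGGO itself; the sample merely reuses the example values listed on the slide.

```python
# 0-based column positions in the 15-field tab-delimited annotation format.
# The key names are illustrative labels chosen for this sketch.
FIELD_INDEX = {
    "object_id": 1,   # gene / gene product ID, e.g. Q5T0T3
    "symbol": 2,      # gene / gene product symbol, e.g. TRIO_HUMAN
    "go_id": 4,       # GO term ID
    "evidence": 6,    # evidence code, e.g. TAS or IEA
    "aspect": 8,      # P, F or C
    "synonym": 10,    # synonym(s), e.g. an IPI identifier
}


def parse_goa(text, drop_evidence=("IEA",)):
    """Return one dict per annotation line, skipping unwanted evidence codes."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("!"):
            continue  # blank line or comment
        cols = line.split("\t")
        if len(cols) < 15:
            continue  # malformed record
        rec = {name: cols[i] for name, i in FIELD_INDEX.items()}
        if rec["evidence"] not in drop_evidence:
            records.append(rec)
    return records


def _row(go_id, evidence, aspect):
    """Build one synthetic 15-field record (not real annotation data)."""
    cols = ["UniProt", "Q5T0T3", "TRIO_HUMAN", "", go_id, "REF:example",
            evidence, "", aspect, "", "IPI00396431", "protein",
            "taxon:9606", "20060101", "UniProt"]
    return "\t".join(cols)


SAMPLE = "\n".join([
    "!synthetic sample for illustration only",
    _row("GO:0007582", "TAS", "P"),
    _row("GO:0005515", "IEA", "F"),
])

kept = parse_goa(SAMPLE)
print([(r["symbol"], r["go_id"], r["evidence"]) for r in kept])
```

Passing `drop_evidence=()` keeps every record, which makes it easy to compare results with and without electronically inferred annotations.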
Input proteins: BACH_HUMAN, O00115, IPI00748153
BACH_HUMAN -> GO:0000062, GO:0004759
O00115 -> GO:0005764
IPI00748153 -> GO:0005622
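The slide above shows input proteins given by symbol, by accession or by IPI synonym. One way to support all three is to index every annotation record under each of its identifiers. The sketch below is an assumption about how such a lookup could be built, not a description of TAGGO's internals; the record layout, the function name and all identifiers in the sample are hypothetical. Synonyms are split on "|" because annotation files separate multiple synonyms that way.

```python
from collections import defaultdict


def build_index(records):
    """Map each accession, symbol and synonym to the GO IDs annotated to it."""
    index = defaultdict(set)
    for rec in records:
        keys = [rec["object_id"], rec["symbol"]] + rec["synonym"].split("|")
        for key in keys:
            if key:  # ignore empty synonym fields
                index[key].add(rec["go_id"])
    return index


# Hypothetical records, shaped like parsed annotation rows.
records = [
    {"object_id": "X00001", "symbol": "ABC_HUMAN",
     "synonym": "IPI00000001", "go_id": "GO:0005622"},
    {"object_id": "X00001", "symbol": "ABC_HUMAN",
     "synonym": "IPI00000001|IPI00000002", "go_id": "GO:0005764"},
]

index = build_index(records)
print(sorted(index["ABC_HUMAN"]))
print(sorted(index["IPI00000002"]))
```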
Description of the tool: 2nd Part
What is the Gene Ontology (GO)?
The Gene Ontology is in fact an umbrella under which reside
three structured, controlled and orthogonal vocabularies
(ontologies) that describe gene products in terms of their
associated biological processes, cellular components and
molecular functions in a species-independent manner.
Gene Ontology
- Biological Process Ontology
- Molecular Function Ontology
- Cellular Component Ontology
Structure of each ontology
- Each ontology is totally independent of the others
- It is a Directed Acyclic Graph (DAG) structure
- It allows multiple parents per term
- It is composed of IS_A and PART_OF links
- IS_A: each term inherits the attributes of its parent(s)
- PART_OF: each term is a component of its parent(s)
- IS_A relations outnumber PART_OF relations
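To make the DAG structure concrete, the following sketch stores each term with its typed parent links and enumerates every path from a term up to the root. The five-term graph is a hand-made toy, not real GO content, and the names `PARENTS` and `paths_to_root` are chosen for this example only.

```python
# Each term maps to its parents as (relation, parent) pairs.
# Toy graph for illustration; "C" has two parents, which a tree would forbid.
PARENTS = {
    "root": [],
    "A": [("is_a", "root")],
    "B": [("is_a", "root")],
    "C": [("is_a", "A"), ("part_of", "B")],
    "D": [("is_a", "C")],
}


def paths_to_root(term, parents=PARENTS):
    """Return every path from term up to a root, as lists of term names."""
    links = parents[term]
    if not links:
        return [[term]]  # a root has no parents
    paths = []
    for _relation, parent in links:
        for tail in paths_to_root(parent, parents):
            paths.append([term] + tail)
    return paths


for path in paths_to_root("D"):
    print(" -> ".join(path))
```

Because "C" has two parents, "D" reaches the root along two distinct paths, which is exactly the situation a tree cannot represent.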
Example of BP Ontology
Example of CC Ontology
Example of MF Ontology
We now have (gene product, GO term) pairs:
O00115  endonuclease activity
O00115  DNA metabolism
O00151  zinc ion binding
O00168  integral to membrane
O00170  transcription factor binding
O00217  iron ion binding
O00264  integral to plasma membrane
O00487  ubiquitin-dependent protein catabolism
and we want to group them into general categories:
- catalytic activity
- membrane
- protein binding
- ion binding
- metabolism
How can we achieve that? Here comes the Gene Ontology.
- The terms high in the hierarchy (shallow terms) provide a
  sufficient degree of abstraction (generality)
- Thus, we could consider them as categories which hold the
  basic features of their children
- We could parse all the multiple paths leading from the term
  to the root of the ontology
- Find all the parents of this term
- Sort the parents in order of descending generality
- Find the most general parent and regard it as the term's
  category
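The steps listed above can be sketched in a few lines of Python. This is a simplified reading of the procedure under stated assumptions, not the tool's actual code: the graph and the information content values are invented, the root is excluded through a hypothetical `skip` parameter (otherwise it would always win), and up to `k` candidates are returned, with `k=10` chosen to mirror the ten most general categories mentioned in the results.

```python
def ancestors(term, parents):
    """Collect every term reachable through parent links, over all paths."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def most_general_parents(term, parents, ic, k=10, skip=()):
    """Return up to k ancestors of term, most general (lowest IC) first."""
    candidates = [a for a in ancestors(term, parents) if a not in skip]
    # Ties in IC are broken alphabetically so the result is deterministic.
    return sorted(candidates, key=lambda a: (ic[a], a))[:k]


# Toy graph and made-up information content values, for illustration only.
PARENTS = {"D": ["C"], "C": ["A", "B"], "A": ["root"], "B": ["root"], "root": []}
IC = {"root": 0.0, "A": 0.5, "B": 1.2, "C": 2.0, "D": 3.0}

ranked = most_general_parents("D", PARENTS, IC, skip={"root"})
category = ranked[0]
print(ranked, category)
```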
Defining generality
The Information Content (IC) of a term is an indication of its
generality: the less informative a term c is, the more general
it is.
$IC(c) = -\log_b(\mathrm{Prob}(c))$
In our case b = e (natural logarithm).
In other words, $\mathrm{Prob}(c) = n_c / n_r$, where
n_c: number of the term's children, including the term itself
n_r: number of the root's children, including the root itself
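A small sketch of how the information content defined above could be computed from the graph alone. It assumes that "children" means all terms below a term (descendants), counted once each even when they are reachable through several parents; that interpretation, the toy graph and the function names are assumptions of this example.

```python
import math


def descendant_counts(children, root):
    """For every term, count the distinct terms below it plus the term itself."""
    memo = {}

    def below(term):
        if term not in memo:
            found = {term}
            for child in children.get(term, ()):
                found |= below(child)
            memo[term] = found
        return memo[term]

    below(root)
    return {term: len(terms) for term, terms in memo.items()}


def information_content(children, root):
    """IC(c) = -ln(n_c / n_r), written as ln(n_r / n_c)."""
    counts = descendant_counts(children, root)
    n_root = counts[root]
    return {term: math.log(n_root / n) for term, n in counts.items()}


# Toy graph: "C" is a child of both "A" and "B".
CHILDREN = {"root": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D"], "D": []}
ic = information_content(CHILDREN, "root")
for term in sorted(ic, key=ic.get):
    print(term, round(ic[term], 3))
```

The root always gets IC 0, and the value grows as terms become more specific, which matches the statement that less informative terms are more general.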
Parents found:
- development
- response to stimulus
- morphogenesis
- response to stress
- tissue development
- response to external stimulus
- growth
- developmental growth
- wound healing
Parents sorted (slide label: descending IC):
- response to stimulus
- development
- growth
- developmental growth
- morphogenesis
- response to stress
- tissue development
- response to external stimulus
- wound healing
tissue regeneration -> development
Parents found:
- cell
- cell part
- intracellular
- intracellular part
- organelle
Parents sorted (slide label: descending IC):
- cell
- organelle
- cell part
- intracellular
- intracellular part
A subtle change in the CC ontology:
intracellular organelle -> intracellular
Description of the tool: 3rd Part
Creating pie charts
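A pie chart of categories only needs, for each category, the number of occurrences and its share of the total. The sketch below shows that counting step; the plotting itself is left out. The list of assigned categories is invented for illustration and the function name is an assumption of this example.

```python
from collections import Counter


def category_shares(categories):
    """Return (category, count, percentage) tuples, largest slice first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(name, n, 100.0 * n / total) for name, n in counts.most_common()]


# Made-up category assignments, one entry per (protein, GO term) pair.
assigned = ["metabolism", "transport", "metabolism", "development", "metabolism"]

shares = category_shares(assigned)
for name, n, pct in shares:
    print(f"{name}: {n} occurrences ({pct:.1f}%)")
```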
Results
- We used a human brain protein set as input: 1401 proteins
- Each protein is annotated to approximately 8 terms:
  11429 annotated (protein, GO term) pairs
- Huge amount of collected data
- Finding the ten most general categories of each term
- Determination of the categories of each protein
- Finding 15 CC categories, 18 BP categories, 20 MF categories
Comparison
CC Ontology
The pie charts closely resemble each other. Only in
"intracellular" is there a big difference.
- Slight differences; the most remarkable is in "intracellular"
- Manual: 1783 occurrences; Automatic: 1961 occurrences
MF Ontology
The pie charts closely resemble each other. Only in
"catalytic activity" is there a big difference.
- Slight differences; the most remarkable is in
  "catalytic activity"
- Manual: 2078 occurrences; Automatic: 2858 occurrences
BP Ontology
The pie charts closely resemble each other.
- Slight differences; the most remarkable is in "metabolism"
- Manual: 2167 occurrences; Automatic: 2353 occurrences
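To put the three reported gaps on a common scale, the manual and automatic counts quoted in the comparison slides can be turned into relative differences. The counts below are copied from those slides; the helper name and the choice of the manual count as the baseline are assumptions of this sketch.

```python
# (manual, automatic) occurrence counts as reported on the comparison slides.
COUNTS = {
    "intracellular (CC)": (1783, 1961),
    "catalytic activity (MF)": (2078, 2858),
    "metabolism (BP)": (2167, 2353),
}


def relative_increase(manual, automatic):
    """Percentage by which the automatic count exceeds the manual one."""
    return 100.0 * (automatic - manual) / manual


for name, (manual, automatic) in COUNTS.items():
    diff = automatic - manual
    pct = relative_increase(manual, automatic)
    print(f"{name}: {diff:+d} occurrences ({pct:.1f}%)")
```

On these numbers the automatic grouping finds roughly 10% more occurrences for "intracellular", about 38% more for "catalytic activity" and about 9% more for "metabolism".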
Conclusions
What are the goodies of this tool?
- Potential of discarding annotations that are supported by
  less reliable ECs
- Extremely fast process
- All the (protein, GO term) pairs are considered
- Convenience of searching the ten most general categories of
  each term
- Usage of one of the most reliable biological ontologies
  (Gene Ontology) for the results extraction (well-structured
  ontologies based on biological evidence, widely accepted
  nomenclature)
On the other hand
- Sometimes, terms are assigned to very abstract categories
  (low information content)
- Categories with tiny percentages often do not merge into a
  broader one
Acknowledgements
- My project supervisor, Assoc. Professor Sophia Kossida, for
  the constant feedback and support, as well as for our
  constructive discussions throughout our collaboration
- Special thanks go to Karin Soderman for supplying the data
  and contributing to the results comparison and evaluation
- I also want to sincerely thank Fotis E. Psomopoulos (PhD
  student of ISSEL) for generously and promptly offering his
  help whenever I asked for his advice
- The Director of the Intelligent Systems and Software
  Engineering Lab (ISSEL), Professor Pericles A. Mitkas, and
  my diploma thesis supervisor, Sotiris T. Diplaris (PhD
  student of ISSEL), for introducing me to such topics
Questions