TAGGO: a Tool for Automatic Grouping of Gene Ontology Annotations

1
TAGGO: a Tool for Automatic Grouping of Gene Ontology Annotations
  • George Papachristoudis
  • Sophia Kossida

Meeting on Bioinformatics and Medical Informatics
Athens, 4-5 October 2006
2
Organization of the presentation
  • Introduction
  • Why was the development of TAGGO a necessity?
  • Description of the tool
  • Results
  • Comparison
  • Conclusions
  • Acknowledgements

3
Introduction
What is TAGGO?
4
Introduction
Is it Tango?
No!
But they are both appealing, each in its own domain.
5
Introduction
TAGGO is a tool that tries to derive a protein's main functions automatically.
6
Introduction
What are the main questions to fully characterize a protein's main activities?
  • Where is it found to be active? (e.g. nucleus)
  • In what processes is it involved? (e.g. metabolism)
  • What are its main functions? (e.g. protein binding)
8
Why was the development of TAGGO a necessity?
Three main reasons justify this:
9
Why was the development of TAGGO a necessity?
1) Biologists currently waste a lot of time and effort searching manually for the characteristics of each protein.
10
Why was the development of TAGGO a necessity?
2) Besides saving time, the information provided by large amounts of data gets refined.
Each dataset contains 1000-3000 proteins, while each protein is annotated to 5-12 GO terms.
11
Why was the development of TAGGO a necessity?
3) The process is further hampered by the wide variation in terminology (nomenclature) that each group of biologists adopts.
The Gene Ontology (GO)
  • The GO project provides controlled, well-structured vocabularies and a widely accepted nomenclature
  • The Gene Ontology Dictionary: http://www.geneontology.org/doc/GODict.DAT

13
Description of the tool: 1st Part
What is the Gene Ontology Consortium?
The GO Consortium is the set of model organism and protein databases and biological research communities actively involved in the development and application of the Gene Ontology project.
What is the Gene Ontology project?
The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). Since then it has grown to include several plant, animal and microbial databases.
Source: An Introduction to the Gene Ontology, The GO Consortium
14
Description of the tool: 1st Part
GOA Databases
15
Description of the tool: 1st Part
Genomes of seven (7) species supplied: Human, Mouse, Rat, Arabidopsis, Zebrafish, Chicken, Cow
What is the Gene Ontology Annotation (GOA) project?
The GOA project provides high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (SWISS-PROT / TrEMBL / PIR-PSD) and InterPro databases.
Source: Gene Ontology Annotation (GOA) Database
17
Description of the tool: 1st Part
  • GOA file format: the associations between gene products and GO terms are stored in tab-delimited files. Each record comprises 15 fields (not all mandatory).
  • Most important fields:
  • gene / gene product ID (e.g. Q5T0T3)
  • gene / gene product symbol (e.g. TRIO_HUMAN)
  • gene / gene product synonym (IPI) (e.g. IPI00396431)
  • GO term ID (e.g. GO:0007582)
  • GO term aspect (P, F or C)
  • Evidence Code (EC) (e.g. TAS, IEA)

The EC is an indicator of the record's reliability. There are 13 different EC types: 12 manual, 1 electronic (IEA).
Source: Annotation File Fields
Source: Guide to GO Evidence Codes
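The record layout described on this slide can be sketched in Python. This is a minimal sketch, not TAGGO's own code: it assumes the 15-column layout of the GO annotation file format (ID in column 2, symbol in 3, GO ID in 5, evidence code in 7, aspect in 9, synonyms in column 11 separated by "|"). The names `Annotation`, `parse_goa_line` and `load_annotations` are hypothetical, and the sample record is made up from the example values on the slide.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    object_id: str             # gene / gene product ID, e.g. Q5T0T3
    symbol: str                # gene / gene product symbol, e.g. TRIO_HUMAN
    synonyms: Tuple[str, ...]  # synonyms such as IPI identifiers
    go_id: str                 # GO term ID, e.g. GO:0007582
    aspect: str                # P, F or C
    evidence: str              # evidence code, e.g. TAS or IEA


def parse_goa_line(line: str) -> Optional[Annotation]:
    """Parse one tab-delimited record; return None for comments or short lines."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:  # assumption: every record carries all 15 columns
        return None
    synonyms = tuple(s for s in cols[10].split("|") if s)
    return Annotation(cols[1], cols[2], synonyms, cols[4], cols[8], cols[6])


def load_annotations(lines: Iterable[str],
                     excluded_ecs: Iterable[str] = ()) -> List[Annotation]:
    """Keep every parsed record whose evidence code is not excluded."""
    excluded = set(excluded_ecs)
    records = []
    for line in lines:
        record = parse_goa_line(line)
        if record is not None and record.evidence not in excluded:
            records.append(record)
    return records


# Made-up sample record (illustration only, not real annotation data).
sample = "\t".join([
    "UniProt", "Q5T0T3", "TRIO_HUMAN", "", "GO:0007582", "PMID:0000000",
    "TAS", "", "P", "", "IPI00396431", "protein", "taxon:9606",
    "20060101", "UniProt",
])

kept = load_annotations([sample, "!a comment line"])
only_non_tas = load_annotations([sample], excluded_ecs={"TAS"})
```

Passing a set such as `{"IEA"}` as `excluded_ecs` would drop the electronically inferred annotations and keep only the manual ones.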
18
Description of the tool: 1st Part
Input proteins: BACH_HUMAN, O00115, IPI00748153
Resulting (gene product, GO term) pairs:
  • BACH_HUMAN → GO:0000062
  • BACH_HUMAN → GO:0004759
  • O00115 → GO:0005764
  • IPI00748153 → GO:0005622
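The slide shows input proteins given as a symbol, an ID and an IPI synonym, each matched to its GO terms. One way to sketch such a lookup is to index every record under all three identifiers. This is an illustration under assumptions, not the tool's implementation: `build_index` and `lookup` are hypothetical names, and identifiers such as `ID_A` or `SYMBOL_B` are placeholders for values the slide does not give.

```python
from collections import defaultdict

# (ID, symbol, synonym, GO term ID); placeholder values stand in for
# identifiers that do not appear on the slide.
records = [
    ("ID_A", "BACH_HUMAN", "SYN_A", "GO:0000062"),
    ("ID_A", "BACH_HUMAN", "SYN_A", "GO:0004759"),
    ("O00115", "SYMBOL_B", "SYN_B", "GO:0005764"),
    ("ID_C", "SYMBOL_C", "IPI00748153", "GO:0005622"),
]


def build_index(rows):
    """Map every identifier (ID, symbol or synonym) to its set of GO terms."""
    index = defaultdict(set)
    for object_id, symbol, synonym, go_id in rows:
        for key in (object_id, symbol, synonym):
            index[key].add(go_id)
    return index


def lookup(index, identifiers):
    """Return the sorted GO terms for each input identifier (empty if unknown)."""
    return {name: sorted(index.get(name, ())) for name in identifiers}


index = build_index(records)
pairs = lookup(index, ["BACH_HUMAN", "O00115", "IPI00748153", "UNKNOWN"])
```

Indexing by all identifier types up front means the user does not have to say which kind of identifier each input line contains.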
20
Description of the tool: 2nd Part
What is the Gene Ontology (GO)?
The Gene Ontology is in fact an umbrella under which reside three structured, controlled and orthogonal vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.
Gene Ontology
  • Biological Process Ontology
  • Molecular Function Ontology
  • Cellular Component Ontology
21
Description of the tool: 2nd Part
Structure of each ontology
  • Each ontology is totally independent of the others
  • It is a Directed Acyclic Graph (DAG) structure
  • It allows a term to have multiple parents
  • It is composed of IS_A and PART_OF links
  • IS_A: each term inherits the attributes of its parent(s)
  • PART_OF: each term is a component of its parent(s)
  • IS_A relations outnumber PART_OF relations
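A structure of this kind can be sketched as a mapping from each term to its typed parent links. The example below is a minimal illustration: the term names are borrowed from the presentation, but the edges are made up and are not taken from the Gene Ontology, and `parents` and `is_acyclic` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# term -> list of (relation, parent); the edges here are illustrative only.
Ontology = Dict[str, List[Tuple[str, str]]]

toy: Ontology = {
    "cell": [],
    "cell part": [("PART_OF", "cell")],
    "intracellular": [("IS_A", "cell part")],
    "organelle": [("IS_A", "cell part")],
    # A term may have several parents, linked by different relation types.
    "intracellular organelle": [("IS_A", "organelle"),
                                ("PART_OF", "intracellular")],
}


def parents(ontology: Ontology, term: str) -> List[str]:
    """Direct parents of a term, regardless of relation type."""
    return [parent for _, parent in ontology[term]]


def is_acyclic(ontology: Ontology) -> bool:
    """Depth-first check that following parent links never loops back."""
    state: Dict[str, str] = {}  # term -> "visiting" or "done"

    def visit(term: str) -> bool:
        if state.get(term) == "done":
            return True
        if state.get(term) == "visiting":
            return False  # reached a term that is still on the current path
        state[term] = "visiting"
        ok = all(visit(parent) for parent in parents(ontology, term))
        state[term] = "done"
        return ok

    return all(visit(term) for term in ontology)


cyclic: Ontology = {"a": [("IS_A", "b")], "b": [("IS_A", "a")]}
```

Storing the relation type with each link keeps IS_A and PART_OF distinguishable while still letting traversal code treat both simply as "parent" edges.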

22
Example of BP Ontology
23
Example of CC Ontology
24
Example of MF Ontology
25
Description of the tool: 2nd Part
We now have (gene product, GO term) pairs:
  • O00115: endonuclease activity
  • O00115: DNA metabolism
  • O00151: zinc ion binding
  • O00168: integral to membrane
  • O00170: transcription factor binding
  • O00217: iron ion binding
  • O00264: integral to plasma membrane
  • O00487: ubiquitin-dependent protein catabolism
and we want to group them into general categories:
  • catalytic activity
  • membrane
  • protein binding
  • ion binding
  • metabolism
26
Description of the tool: 2nd Part
  • How can we achieve that? Here comes the Gene Ontology.
  • The terms high in the hierarchy (shallow terms) provide a sufficient degree of abstraction (generality)
  • Thus, we can consider them as categories that hold the basic features of their children
  • We parse all the paths leading from the term to the root of the ontology
  • Find all the parents of the term
  • Sort the parents in order of descending generality
  • Find the most general parent and regard it as the term's category
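The steps listed on this slide can be sketched as follows. This is a sketch under stated assumptions rather than TAGGO's actual code: generality is approximated by the number of descendants of a term, the root itself is excluded from the candidates, ties are broken alphabetically, and the toy graph uses term names from the presentation with made-up edges. All function names are hypothetical.

```python
def ancestors(parents, term):
    """All terms reachable via parent links, i.e. along every path to the root."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def descendant_counts(parents):
    """Number of descendants of each term, counting the term itself."""
    counts = {term: 1 for term in parents}
    for term in parents:
        for ancestor in ancestors(parents, term):
            counts[ancestor] = counts.get(ancestor, 1) + 1
    return counts


def categories(parents, term, root, top=10):
    """Ancestors of `term` (root excluded), most general first."""
    counts = descendant_counts(parents)
    candidates = ancestors(parents, term) - {root}
    ranked = sorted(candidates, key=lambda t: (-counts[t], t))
    return ranked[:top]


# Toy graph: term -> direct parents (illustrative edges, not real GO data).
toy = {
    "root": [],
    "development": ["root"],
    "response to stimulus": ["root"],
    "morphogenesis": ["development"],
    "developmental growth": ["development"],
    "tissue development": ["development"],
    "response to stress": ["response to stimulus"],
    "wound healing": ["response to stress"],
    "tissue regeneration": ["tissue development", "wound healing"],
}

ranked = categories(toy, "tissue regeneration", root="root")
category = ranked[0]
```

Using a set while walking upward means a term reached through several paths is still counted once, which matters in a graph that allows multiple parents.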

27
Description of the tool: 2nd Part
Defining generality: the Information Content (IC) of a term is an indication of its generality. The less informative a term c is, the more general it is.
$IC(c) = -\log_b(\mathrm{Prob}(c))$, where in our case $b = e$ (natural logarithm).
In other words, $\mathrm{Prob}(c) = n_c / n_r$, where $n_c$ is the number of the term's children including the term itself and $n_r$ is the number of the root's children including the root itself.
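A small numeric sketch of this definition follows. It assumes that Prob(c) is the ratio of the term's child count to the root's child count (both counted inclusively), which is how the slide's definitions read, and it uses made-up counts. `information_content` is a hypothetical helper name.

```python
import math


def information_content(n_c: int, n_r: int) -> float:
    """IC(c) = -ln(n_c / n_r), written as ln(n_r / n_c) so the root gives 0.0."""
    if not 0 < n_c <= n_r:
        raise ValueError("expected 0 < n_c <= n_r")
    return math.log(n_r / n_c)


# Made-up counts for illustration: term -> children including the term itself.
counts = {"root": 9, "development": 5, "tissue development": 2,
          "tissue regeneration": 1}
n_root = counts["root"]

ic = {term: information_content(n, n_root) for term, n in counts.items()}

# Lowest IC first, i.e. most general term first.
by_generality = sorted(ic, key=lambda term: ic[term])
```

The root has probability 1 and therefore an IC of zero, while a leaf term has the highest IC in its ontology; sorting by ascending IC is the same as sorting by descending generality.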
28
Description of the tool: 2nd Part
Parents of the example term:
  • development
  • response to stimulus
  • morphogenesis
  • response to stress
  • tissue development
  • response to external stimulus
  • growth
  • developmental growth
  • wound healing

The same parents, sorted (slide label: descending IC):
  • response to stimulus
  • development
  • growth
  • developmental growth
  • morphogenesis
  • response to stress
  • tissue development
  • response to external stimulus
  • wound healing
Result: tissue regeneration → development
29
Description of the tool: 2nd Part
Parents of the example term:
  • cell
  • cell part
  • intracellular
  • intracellular part
  • organelle

The same parents, sorted (slide label: descending IC):
  • cell
  • organelle
  • cell part
  • intracellular
  • intracellular part
Note on the slide: a subtle change in the CC ontology
Result: intracellular organelle → intracellular
31
Description of the tool: 3rd Part
Creating pie charts
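The numbers behind such a pie chart are the shares of each category among all (protein, category) assignments. The sketch below only computes those shares and leaves the drawing to any plotting library. The assignments are made up for illustration, and `category_shares` is a hypothetical name rather than part of the tool.

```python
from collections import Counter


def category_shares(assignments):
    """Percentage of each category among the distinct (protein, category) pairs."""
    unique = set(assignments)  # count each protein at most once per category
    counts = Counter(category for _, category in unique)
    total = sum(counts.values())
    if total == 0:
        return {}
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {category: 100.0 * n / total for category, n in ordered}


# Made-up assignments (illustration only).
assignments = [
    ("P1", "metabolism"),
    ("P1", "protein binding"),
    ("P2", "metabolism"),
    ("P2", "metabolism"),   # duplicate pair, counted once
    ("P3", "ion binding"),
]

shares = category_shares(assignments)
```

Each value in `shares` corresponds to one slice of the chart, and the values add up to 100.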
33
Results
  • we used a human brain protein set as input: 1401 proteins
  • each protein is annotated to approximately 8 terms: 11429 annotated (protein, GO term) pairs
  • a huge amount of collected data
  • finding the ten most general categories of each term
  • determining the categories of each protein
  • result: 15 CC categories, 18 BP categories and 20 MF categories

35
Comparison
CC Ontology
The pie charts closely resemble each other. Only in intracellular is there a big difference.
36
Comparison
CC Ontology
  • Slight differences; the most remarkable is in intracellular
  • Manual: 1783 occurrences; Automatic: 1961 occurrences

37
Comparison
MF Ontology
The pie charts closely resemble each other. Only in catalytic activity is there a big difference.
38
Comparison
MF Ontology
  • Slight differences; the most remarkable is in catalytic activity
  • Manual: 2078 occurrences; Automatic: 2858 occurrences

39
Comparison
BP Ontology
The pie charts closely resemble each other.
40
Comparison
BP Ontology
  • Slight differences; the most remarkable is in metabolism
  • Manual: 2167 occurrences; Automatic: 2353 occurrences
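To put the manual and automatic counts from the three comparisons side by side, the relative increase of the automatic count over the manual one can be computed directly. The counts are the ones quoted on the comparison slides; `relative_increase` is a hypothetical helper name.

```python
# category -> (manual occurrences, automatic occurrences), from the slides.
manual_vs_automatic = {
    "intracellular (CC)": (1783, 1961),
    "catalytic activity (MF)": (2078, 2858),
    "metabolism (BP)": (2167, 2353),
}


def relative_increase(manual: int, automatic: int) -> float:
    """Increase of the automatic count over the manual count, in percent."""
    return 100.0 * (automatic - manual) / manual


increases = {
    category: relative_increase(manual, automatic)
    for category, (manual, automatic) in manual_vs_automatic.items()
}

for category, value in increases.items():
    print(f"{category}: +{value:.1f}%")
```

The automatic grouping yields roughly 10% more occurrences for intracellular, about 37.5% more for catalytic activity and about 8.6% more for metabolism, which makes catalytic activity the largest of the three gaps.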

42
Conclusions
  • What are the benefits of this tool?
  • Option to discard annotations supported by less reliable ECs
  • Extremely fast process
  • All the (protein, GO term) pairs are considered
  • Convenient lookup of the ten most general categories of each term
  • Results are extracted using one of the most reliable biological ontologies, the Gene Ontology (well-structured ontologies based on biological evidence, widely accepted nomenclature)
  • On the other hand:
  • Sometimes terms are assigned to very abstract categories (low information content)
  • Categories with tiny percentages often do not merge into a broader one

45
Acknowledgements
  • My project supervisor, Assoc. Professor Sophia Kossida, for the constant feedback and support, as well as for our constructive discussions throughout our collaboration
  • Special thanks go to Karin Soderman for supplying the data and contributing to the comparison and evaluation of the results

46
Acknowledgements
  • I also want to sincerely thank Fotis E. Psomopoulos (PhD student of ISSEL) for generously and promptly offering his help whenever I asked for his advice
  • The Director of the Intelligent Systems and Software Engineering Lab (ISSEL), Professor Pericles A. Mitkas, and my diploma thesis supervisor, Sotiris T. Diplaris (PhD student of ISSEL), for introducing me to such topics

47
Questions
The end