Genomic signatures of an aquatic lifestyle: rate variation of orthologous genes from arthropods living in water versus those living on land - PowerPoint PPT Presentation

Slides: 38
Provided by: shara88
Transcript and Presenter's Notes

Title: Genomic signatures of an aquatic lifestyle: rate variation of orthologous genes from arthropods living in water versus those living on land


1
Genomic signatures of an aquatic lifestyle: rate
variation of orthologous genes from arthropods
living in water versus those living on land
Prediction of genes involved in influencing
terrestrial and aquatic lifestyles in arthropods
A Bioinformatics PipeLine
Mortha Sharat Kumar, Sumit Middha, John K. Colbourne
May 17th, 2007
2
Overview
  • Problem Statement
  • Background
  • Data
  • Methods
  • Results
  • Future Work
  • References
  • Acknowledgements
3
Problem Statement
  • Based on the knowledge of:
    • Organismal lineages of arthropods
    • Morphology
    • Habitat diversity
    • Gene sequence data
  • Consider arthropods with aquatic and terrestrial
    lifestyles, using only the techniques and tools of
    Comparative Genomics to predict rate variations
    in Orthologs.
  • Can we predict the genes which might have a key
    role in supporting aquatic or terrestrial
    lifestyles in arthropods?

4
Problem Statement
  • A Bioinformatics Pipeline
    • Have a structured methodology and steps (a Pipeline)
      for future projects.
    • Spend less time and effort thinking about the
      correct steps to follow: have a fixed methodology.
    • Learn from mistakes.
    • Spend minimal time tweaking the code.
    • Spend more time playing with the data and
      analyses than writing code for future projects.
  • Is it a Tool?
    • No.
    • You won't get all the results at the click of a
      button. Too many things are involved.
    • The programs need some tweaking based on the
      data and the number of organisms.

5
Background
  • Homologs, Orthologs and Paralogs
  • Homologs:
    • A homolog is a gene related to a second gene by
      descent from a common ancestral DNA sequence.
    • Homologs are the superset of Orthologs and Paralogs.
  • Orthologs:
    • Orthologs are genes in different species that
      evolved from a common ancestral gene.
    • They are the result of a speciation event.
    • Normally, orthologs retain the same function in
      the course of evolution.
  • Paralogs:
    • Paralogs are genes related by duplication within
      a genome.
    • Paralogs tend to evolve new functions, even if
      these are related to the original one.

6
Background
Homologs, Orthologs and Paralogs
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
What we are interested in are Orthologs.
7
Background
Evolutionary Rates
  • The rate at which genes evolve in a particular lineage.
  • Measured by the number of amino acid substitutions
    under some underlying model (the Substitution Model).
Molecular Clock Hypothesis
  • Postulates that the rate of evolution, measured by
    amino acid substitutions, is roughly constant over
    time and across different lineages.
  • However, the evolutionary rates of some genes are
    higher or lower across certain lineage groups.
What does correlation in Evolutionary Rates mean?
  • Selective forces acting on these genes have been
    similar between the lineages.
What does this mean?
  • Lifestyles / Environment of the organisms.
8
Species Introduced
Anopheles gambiae
Tribolium castaneum
Drosophila melanogaster
Daphnia pulex
Daphnia magna
Apis mellifera
Caenorhabditis elegans
9
Fruit Fly: Drosophila melanogaster. Lifestyle - Terrestrial
Mosquito: Anopheles gambiae. Lifestyle - Aquatic / Terrestrial
Beetle: Tribolium castaneum. Lifestyle - Terrestrial
Honey Bee: Apis mellifera. Lifestyle - Terrestrial
Water Flea: Daphnia magna. Lifestyle - Aquatic
Water Flea: Daphnia pulex. Lifestyle - Aquatic
Nematode worm: Caenorhabditis elegans. Lifestyle - Terrestrial
10
Phylogeny
11
Phylogeny of the Species
12
Aquatic Genes
[Figure: phylogeny with ortholog cluster Ox marked on three lineages]
Genes in ortholog cluster Ox have similar substitution
rates, hence similar evolutionary rates. Are similar
selective forces acting on them? More closely
related to other species. Could they play a role in
supporting an aquatic lifestyle?
13
Terrestrial Genes
[Figure: phylogeny with ortholog cluster Oy marked on three lineages]
What about ortholog cluster Oy? Could they play a
role in supporting a terrestrial lifestyle?
14
Data
  • About the Data:
    • Varied sources.
    • The number of sequences varies between organisms.
    • Ranges from annotated amino acid sequences to EST contigs.
    • Lengths of the sequences differ greatly.

15
Data - Sequence Lengths in base pairs
16
Methods
Detect Orthologs
All-against-All Criteria
Alignments
Cleaning of Alignments
Evolutionary rate tests
Analysis
17
Methods - The PipeLine
  • Detect Orthologs: RBBH
  • All-against-All Criteria: Scripts
  • Alignments: TCoffee
  • Cleaning of Alignments: Scripts
  • Evolutionary rate tests: RRTree
  • Results
  • Analysis
18
Methods - RBBH
RBBH: Reciprocal Best Blast Hits. What is it?
Proteins from different organisms that are
each other's top Blast hit.
Ax, By
Gene x from A and gene y from B are orthologs.
What if:
Ax, By, Cz
Can x, y and z be considered an ortholog cluster?
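
The RBBH definition can be sketched in a few lines. This is only an illustration, not the project's scripts: the two best-hit dictionaries and the gene names are hypothetical, and in practice they would be built from BLAST output.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (x, y) pairs where y is x's top hit in B
    and x is y's top hit in A."""
    pairs = []
    for x, y in best_a_to_b.items():
        # Keep the pair only if the best hit is returned in the other direction.
        if best_b_to_a.get(y) == x:
            pairs.append((x, y))
    return sorted(pairs)


# Hypothetical top-hit tables for organisms A and B.
a_to_b = {"A1": "B1", "A2": "B3", "A3": "B2"}
b_to_a = {"B1": "A1", "B2": "A3", "B3": "A9"}

rbbh = reciprocal_best_hits(a_to_b, b_to_a)
print(rbbh)
```

Here A2 is dropped because its top hit B3 points back to a different gene (A9).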
19
Methods - All-Against-All Criteria
  • One protein sequence from each organism is
    accepted into an ortholog cluster only if each
    protein has an RBBH in every other organism.
  • All-against-All is very stringent, giving very high
    confidence in the inferred orthology.

For 5 organisms we have:
[Figure: organisms A, B, C, D and E, each linked to every other]
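
A minimal sketch of the All-Against-All check, assuming RBBH pairs have already been computed. Genes are written as hypothetical (organism, gene_id) tuples, and the function name is illustrative rather than taken from the project's scripts.

```python
from itertools import combinations


def all_against_all_clusters(organisms, rbbh_pairs):
    """Keep only clusters with one gene per organism in which
    every pair of members is a reciprocal best hit."""
    # Store each RBBH pair in both directions, keyed by (gene, target organism).
    partner = {}
    for g1, g2 in rbbh_pairs:
        partner[(g1, g2[0])] = g2
        partner[(g2, g1[0])] = g1

    first, others = organisms[0], organisms[1:]
    seeds = sorted({g for pair in rbbh_pairs for g in pair if g[0] == first})
    clusters = []
    for seed in seeds:
        members = [seed] + [partner.get((seed, org)) for org in others]
        if None in members:
            continue  # the seed has no RBBH in at least one organism
        if all(partner.get((g1, g2[0])) == g2
               for g1, g2 in combinations(members, 2)):
            clusters.append(members)
    return clusters


pairs = [
    (("A", "x"), ("B", "y")), (("B", "y"), ("C", "z")), (("A", "x"), ("C", "z")),
    (("A", "p"), ("B", "q")), (("A", "p"), ("C", "r")),  # q and r are not RBBH
]
clusters = all_against_all_clusters(["A", "B", "C"], pairs)
print(clusters)
```

Genes x, y and z form a cluster because all three pairs are reciprocal best hits; p, q and r are rejected because the q-r link is missing.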
20
Methods - All-Against-All Criteria
No. of Organisms   No. of Blasts
2                  2
3                  6
4                  12
5                  20
6                  30
7                  42
After checking the All-Against-All Criteria we
are left with high confidence ortholog clusters.
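
The Blast counts listed for 2 to 7 organisms are consistent with n(n - 1), that is, one Blast run per ordered pair of organisms. The formula is inferred from the numbers rather than stated on the slide; a quick check:

```python
def blast_runs(n_organisms):
    """Number of directed organism-vs-organism Blast runs: n * (n - 1)."""
    return n_organisms * (n_organisms - 1)


table = {n: blast_runs(n) for n in range(2, 8)}
for n, runs in table.items():
    print(n, runs)
```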
21
Methods - Alignments and Cleaning
  • Alignments were carried out using TCoffee.
  • The leading and the trailing gaps:
    • Do not correspond to indels.
    • Do not have information associated with them.
  • What if the leading and trailing gaps are not clipped?
    • Inaccurate substitution rates result.
  • The leading and the trailing gaps have to be
    clipped:
    • Clipped from the start and the end of an
      alignment until a highly conserved block is
      encountered.
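
A simplified sketch of the clipping step. It assumes, for illustration only, that a "highly conserved block" can be approximated by the first and last alignment columns that contain no gap in any sequence; the project's scripts may have used a stricter criterion.

```python
def clip_terminal_gaps(alignment, gap="-"):
    """Trim columns from both ends of an alignment until a column
    with no gaps in any sequence is reached."""
    length = len(alignment[0])
    gap_free = [all(seq[i] != gap for seq in alignment) for i in range(length)]
    if not any(gap_free):
        return ["" for _ in alignment]  # nothing usable remains
    start = gap_free.index(True)
    end = length - gap_free[::-1].index(True)
    return [seq[start:end] for seq in alignment]


aln = [
    "--MKTAYI-",
    "MAMKTAYIA",
    "-AMKT-YI-",
]
clipped = clip_terminal_gaps(aln)
print(clipped)
```

The internal gap in the third sequence is kept, since it may correspond to a real indel; only the uninformative ends are removed.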

22
23
Methods - Alignments and Cleaning
Black - Before trimming, Red - After trimming
24
Relative Rate Tests - RRTree
What exactly is a Relative Rate Test? It compares
the rate of amino acid/nucleotide substitution
across lineages with respect to an outgroup.
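
The idea behind a relative rate test can be illustrated with a toy calculation: measure how far each of two lineages is from the same outgroup and compare the two distances. The sketch below uses an uncorrected distance and made-up sequences. It is not RRTree, which also assesses whether the difference is statistically significant.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites, ignoring columns with gaps."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    return sum(a != b for a, b in pairs) / len(pairs)


def relative_rate_difference(seq_a, seq_b, outgroup):
    """Distance(A, outgroup) minus distance(B, outgroup).
    A value near zero suggests similar rates in both lineages."""
    return p_distance(seq_a, outgroup) - p_distance(seq_b, outgroup)


# Hypothetical aligned protein fragments.
outgroup = "MKTAYIAKQR"
lineage_a = "MKTAYIAKQK"  # 1 difference from the outgroup
lineage_b = "MRTGYIAKNR"  # 3 differences from the outgroup

diff = relative_rate_difference(lineage_a, lineage_b, outgroup)
print(round(diff, 3))
```

A negative value means lineage B has accumulated more substitutions than lineage A since they diverged.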
25
Relative Rate Tests - Models
  • Kimura 2 Parameter
  • Jukes Cantor
  • Uncorrected Distance

Substitution Matrix
Base Frequencies?
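
For nucleotide data, the three distances can be computed with the standard textbook formulas. The sketch below is an illustration with made-up sequences, not the RRTree implementation; P and Q denote the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def distances(seq1, seq2, gap="-"):
    """Return (uncorrected, Jukes-Cantor, Kimura 2-parameter) distances."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    n = len(pairs)
    transitions = sum(
        1 for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    P, Q = transitions / n, transversions / n
    p = P + Q                                   # uncorrected distance
    jc = -0.75 * math.log(1 - 4 * p / 3)        # Jukes-Cantor
    k2p = -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))  # Kimura 2P
    return p, jc, k2p


# Hypothetical sequences: one transition (A/G) and one transversion (C/A).
p, jc, k2p = distances("ACGTACGTAC", "GCGTACGTAA")
print(round(p, 4), round(jc, 4), round(k2p, 4))
```

Both corrected distances are larger than the uncorrected one because they account for multiple substitutions at the same site.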
26
Results
Pairwise Ortholog Distribution between species
27
Ortholog Detection Tools - each has its own
underlying algorithm:
  • COGs - Clusters of Orthologous Groups
  • OrthoMCL
  • InParanoid
  • KOG - euKaryotic Orthologous Groups
The paper by Tim Hulsen, Martijn A Huynen, Jacob de
Vlieg and Peter MA Groenen, "Benchmarking ortholog
identification methods using functional genomics data",
rated InParanoid as the best ortholog detection tool.
InParanoid is also one of the most widely used tools.
28
  • Why not just use a published tool like
    InParanoid for ortholog detection?
  • In the benchmarking paper, InParanoid gave the
    largest number of false positives.
  • False positives here are paralogs.
  • Paralogs are undesirable in our study: we are
    interested in genes with the same function.
  • RBBH gave the fewest false positives.
  • How did our RBBH method fare when compared to
    InParanoid?

29
Results
Ortholog clusters present in all:
  • Drosophila melanogaster
  • Anopheles gambiae
  • Tribolium castaneum
  • Apis mellifera
  • Daphnia pulex
  • Daphnia magna
  • Caenorhabditis elegans
The 5 species with at least Daphnia magna or Daphnia pulex
69
932 - 380 = 552 (59% met All-Against-All)
1052 - 360 = 692 (65% met All-Against-All)
Total genes to work with: 1244
30
Results
When considering all seven species, 6
of the genes had high similarity in evolutionary
rates in Anopheles gambiae and the Daphnia (both
Daphnia pulex and Daphnia magna).

Aquatic Lifestyle?
31
Results
  • Now What? - We have Gene IDs
  • See if the genes belong to some gene families?
  • Statistical Tests.
  • GO !
  • What is Gene Ontology?
  • The Gene Ontology project provides a controlled
    vocabulary to describe gene and gene product
    attributes in any organism.

32
Results
33
Future Work/Project
  • Prediction of genes involved in influencing
    social behavior in insects.
  • Use the same methodology, the PipeLine.
  • The approach would be exactly the same: instead
    of arthropod species with aquatic and terrestrial
    lifestyles, the study will use insect species
    with known social and non-social
    behavioral traits.

[Figure: lineages labelled social, non-social, social]
34
References
  • Zdobnov EM, von Mering C, et al. Comparative
    genome and protein analysis of Anopheles gambiae
    and Drosophila melanogaster.
  • Dirk Steinke, Walter Salzburger, Ingo Braasch and
    Axel Meyer. Many genes in fish have species specific
    asymmetric rates of molecular evolution.
  • J. W. Kijas, M. Menzies and A. Ingham. Sequence
    diversity and rates of molecular evolution
    between sheep and cattle genes.
  • Swofford, Olsen, Waddell, and Hillis. Phylogenetic
    Inference, in Molecular Systematics, 2nd ed.,
    Sinauer Ass., Inc., 1996, Ch. 11.
  • F. Tajima and M. Nei, Mol. Biol. Evol. 1984, 1, 269.
  • M. Kimura, J. Mol. Evol. 1980, 16, 111.
  • K. Tamura, Mol. Biol. Evol. 1992, 9, 678.
  • L. Jin and M. Nei, Mol. Biol. Evol. 1990, 7, 82.
  • M. Kimura, The Neutral Theory of Molecular
    Evolution, Camb. Uni. Press, Camb., 1983.
  • Insights into social insects from the genome of
    the honeybee Apis mellifera. Nature 443,
    931-949 (26 October 2006).
35

References
  • Alexandre Hassanin (2006). Phylogeny of
    Arthropoda inferred from mitochondrial sequences:
    Strategies for limiting the misleading effects of
    multiple changes in pattern and rates of
    substitution. Molecular Phylogenetics and
    Evolution 38, 100-116.
  • Joel Savard, Diethard Tautz and Martin J
    Lercher. Genome-wide acceleration of protein
    evolution in flies (Diptera). BMC Evolutionary
    Biology 2006.
  • Cedric Notredame, Desmond Higgins and Jaap
    Heringa. T-Coffee: A Novel Method for Fast and
    Accurate Multiple Sequence Alignment. JMB 2000.
  • Robinson-Rechavi M, Huchon D. RRTree:
    Relative-rate tests between groups of sequences
    on a phylogenetic tree. Bioinformatics 2000, 16,
    296-297.
  • Tim Hulsen, Martijn A Huynen, Jacob de Vlieg and
    Peter MA Groenen. Benchmarking ortholog
    identification methods using functional genomics
    data. Genome Biology 2006.
  • Jukes TH, Cantor CR (1969). Evolution of protein
    molecules. In Munro HN (Ed.), Mammalian protein
    metabolism. Academic Press, New York 132178-2189.
36
Acknowledgements
This would not have been possible without the
aid, support, guidance and patience of:
  • John K. Colbourne
  • Sumit Middha
  • The CGB Staff: The Bioinformatics Group, The Genomics Group
Thanks to Memo Dalkilic and Haixu Tang for their
valuable feedback on the project.
Computing Facilities: CGB
Special thanks to my family and friends and
Professor Edward L Robertson.
37
Thank You