Computational Analyses of Eukaryotic Gene Evolution

1
Computational Analyses of Eukaryotic Gene Evolution
  • Sourav Chatterji
  • souravc_at_cs.berkeley.edu
  • May 18, 2006

2
  • White Blood Cells From Cancer-resistant Mice Cure
    Cancers In Ordinary Mice
    (Science Daily News. May 9, 2006)

3
  • The cancer-resistant mice all stem from a single
    mouse discovered in 1999.
  • The original studies showed that the resistance
    is inherited.
  • About half of the progeny inherit the resistance.
  • Cui and his colleagues believe that the
    resistance results from a mutation in a single
    gene and are attempting to find it, but that has
    proved frustrating.
  • Mouse genome published in 2002.

4
Understanding the biology of genes
  • Draft Human Genome (2001).
  • 30,000-40,000 genes
  • Finished Human Genome (2004).
  • 20,000-25,000 genes
  • A Third Approach to Gene Prediction Suggests
    Thousands of Additional Human Transcribed
    Regions (Glusman et al. 2006).

5
Understanding the biology of genes
  • Why is it a hard problem?
  • Pseudogenes.
  • Long Introns.
  • Conserved Non Coding Sequences (CNSes).
  • Determination of Gene Boundaries.
  • Alternative Splicing.

6
Understanding the biology of genes
  • What's needed is knowledge about why genes have
    the characteristics they do (Pennisi, 2003).
  • Studying gene evolution should help.

Illustration: Terry E. Smith
7
State of Sequencing Projects
  • 25 Mammalian Genomes.
  • Sequencing around H. sapiens.
  • 12 Fly Genomes.
  • Sequencing around D. melanogaster.
  • 5 Worm Genomes.
  • Sequencing around C. elegans.

Source: NHGRI
8
(No Transcript)
9
(No Transcript)
10
Outline
  1. Reference Based Annotation.
  2. The GeneMapper Algorithm.
  3. Annotation of Fruitfly Genomes.
  4. Evolution of Gene Structure.

11
Outline
  1. Reference Based Annotation.
  2. The GeneMapper Algorithm.
  3. Annotation of Fruitfly Genomes.
  4. Evolution of Gene Structure.

12
Source: http://rana.lbl.gov/drosophila/
13
Species             ESTs      mRNAs    RefSeq
D. melanogaster     383407    19931    19967
D. simulans         5013      80       None
D. yakuba           11015     808      None
D. erecta           None      6        None
D. ananassae        None      11       None
D. pseudoobscura    35042     40       None
D. mojavensis       361       2        None
D. virilis          663       41       None
D. grimshawi        None      None     None
Source: UCSC browser
14
Reference Based Annotation
  • How to accurately annotate the newly sequenced
    genomes?
  • Transfer annotations from a well-studied
    reference genome.
  • Implicitly creates a data set for studying the
    evolution of genes.

15
Protein Alignment Approach
Reference Protein
Genomic Sequence
  • Procrustes (Gelfand et al. 1996)
  • GeneWise (Birney et al. 2004)
  • Integral part of the ENSEMBL gene annotation
    pipeline.
  • Not aware of exon/intron boundaries.
  • Accuracy decreases when sequence identity is low.

16
Similarity Based Approach
Reference Gene
Target Sequence
  • Projector (Meyer and Durbin 2004)
  • Predicts the global gene structure using a pair
    HMM.
  • Uses heuristics to decrease the search space.
  • GeneMapper
  • Uses a bottom up algorithm for predicting the
    gene structure.
  • Not suitable if the exon/intron structure of the
    gene has diverged a lot.

17
Outline
  1. Reference Based Annotation.
  2. The GeneMapper Algorithm.
  3. Annotation of Fruitfly Genomes.
  4. Evolution of Gene Structure.

18
The GeneMapper Algorithm
  • Bottom Up Algorithm
  • Predict the ortholog of each reference exon in
    the target sequence.
  • Join exon predictions together to predict gene
    structure.
  • Multiple Species GeneMapper
  • Uses all available information if the gene has to
    be mapped into multiple target species.
  • Uses a profile-based approach to get more
    accurate annotations in evolutionarily distant
    species.

19
(No Transcript)
20
Mapping Exons Accurately
  • Predicting the ortholog of a reference exon in
    the target sequence
  • Accurately model the evolution of exons.
  • Use StrataSplice to model splice sites.

21
Mapping Exons Accurately
  • Use a version of the Smith-Waterman algorithm.
  • Exact optimization is feasible.
  • Green edges model the evolution of codons.
  • Uses 64 × 64 COD distance matrices.
  • Red edges to allow for frameshifts.

22
Multiple Species GeneMapper
  • Generates a gene profile of orthologous genes.
  • A more complete characterization than a single
    gene.
  • Each column contains an alignment of
    orthologous codons.
  • Special columns of width 1 are allowed to
    account for frameshifts and sequencing errors.

23
Multiple Species GeneMapper
  • Iteratively builds a gene profile to capture the
    characteristics of the gene.
  • The profile helps us map exons more accurately to
    evolutionarily distant species.
  • ExonAligner is modified to align the profile with
    the target sequence.
  • Uses different models for conserved and
    non-conserved codons.

24
Exploiting Phylogeny: Species Hopping
25
Exploiting Phylogeny: Species Hopping
Map gene into closest species
26
Exploiting Phylogeny: Species Hopping
Map gene into closest species
27
Exploiting Phylogeny: Species Hopping
Add the prediction to the profile
28
Exploiting Phylogeny: Species Hopping
Use profile to map gene into the next closest
species
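
The hopping steps above can be sketched as a loop over target species ordered by evolutionary distance from the reference. This is a rough illustration only: `species_hop`, `toy_mapper`, the `distance` table and the mismatch threshold are all hypothetical, and the toy mapper (best ungapped window by Hamming distance) merely stands in for the profile alignment that GeneMapper performs.

```python
def toy_mapper(profile, sequence, max_mismatch=1):
    """Hypothetical mapper: best ungapped window within max_mismatch of any profile member."""
    k = len(profile[0])
    best = None
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        d = min(sum(a != b for a, b in zip(window, member)) for member in profile)
        if d <= max_mismatch and (best is None or d < best[0]):
            best = (d, window)
    return None if best is None else best[1]

def species_hop(reference_gene, targets, distance, map_profile=toy_mapper):
    """Map a gene into each target, closest species first, growing the profile as we go.

    targets:  dict of species name -> genomic sequence
    distance: dict of species name -> distance from the reference species
    """
    profile = [reference_gene]
    predictions = {}
    for species in sorted(targets, key=distance.get):
        hit = map_profile(profile, targets[species])
        predictions[species] = hit
        if hit is not None:
            profile.append(hit)  # the new prediction helps with the next hop
    return predictions, profile
```

In this toy setting a distant ortholog that differs too much from the reference alone can still be recovered, because it is close to the prediction already made in the intermediate species.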
29
GeneMapper Performance
             GeneWise    Projector    GeneMapper
Gene Sn.     61.3        59.9         81.7
Gene Sp.     60.8        59.5         81.7
Exon Sn.     92.8        94.2         97.2
Exon Sp.     93.4        90.5         97.8
Nucl Sn.     99.86       99.78        99.88
Nucl Sp.     99.91       99.70        99.94
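
The slide does not define Sn. and Sp. The sketch below assumes the convention common in gene-prediction benchmarks, where sensitivity is the fraction of annotated features that are predicted exactly and specificity is the fraction of predicted features that are correct. The function name and the representation of exons as (start, end) pairs are illustrative choices.

```python
def exon_accuracy(annotated, predicted):
    """Exon-level sensitivity and specificity, counting only exact boundary matches.

    Assumed definitions: Sn = TP / (TP + FN), Sp = TP / (TP + FP).
    """
    annotated, predicted = set(annotated), set(predicted)
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

Under these definitions a prediction with one shifted boundary counts as both a missed exon and a wrong exon, which is why gene-level figures are much lower than nucleotide-level ones.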
30
GeneMapper Performance
             GeneWise    Projector    GeneMapper
Gene Sn.     61.3        59.9         81.7
Gene Sp.     60.8        59.5         81.7
Exon Sn.     92.8        94.2         97.2
Exon Sp.     93.4        90.5         97.8
Nucl Sn.     99.86       99.78        99.88
Nucl Sp.     99.91       99.70        99.94
31
Sources of Errors
  • Highly Divergent Exons
  • Exon Splitting
  • Improved handling in latest version.
  • Assembly and Sequencing Errors

32
Outline
  1. Reference Based Annotation.
  2. The GeneMapper Algorithm.
  3. Annotation of Fruitfly Genomes.
  4. Evolution of Gene Structure.

33
Annotating the Fruitfly Genomes
34
The Fruitfly Genomes
35
(No Transcript)
36
The Annotation pipeline
  • Construct whole-genome homology maps of nine
    fruitfly genomes using Mercator (Dewey and
    Pachter, unpublished).
  • Extend the map using extrapolation.
  • Use the map as a guide to transfer D.
    melanogaster RefSeq annotations by using
    GeneMapper.

37
Generating Homology Maps
Waterston et al., 2002
38
Issues With Homology Maps
  • Gene duplications
  • Distinguishing between paralogs and orthologs.
  • Incomplete coverage.
  • Low sequence identity.
  • Insufficient Anchor Coverage
  • Micro-rearrangements.

39
(No Transcript)
40
Fruitfly Annotations
41
Fruitfly Annotations
  • Annotations of the 11 fruitfly genomes
  • http://bio.math.berkeley.edu/genemapper/CAF1_genes_v0.2
  • Browsable on the UCSC browser
  • http://bio.math.berkeley.edu/genemapper/fruitfly.html
  • Gene Alignments
  • http://bio.math.berkeley.edu/genemapper/CAF1_aln/

42
Annotation Statistics
Species             Transcripts    Unique    Complete
D. melanogaster     19697          13488     N/A
D. simulans         18274          12353     17074
D. yakuba           18551          12594     17614
D. erecta           18700          12682     18203
D. ananassae        17398          11561     15858
D. pseudoobscura    16651          10867     14595
D. mojavensis       15908          10214     13192
D. virilis          16032          10305     13451
D. grimshawi        15700          10063     13107
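
To read the table as proportions, the snippet below divides the Complete column by the Transcripts column for each target species. The interpretation of "Complete" as fully mapped transcripts is an assumption; the slide does not define the columns.

```python
# (transcripts, complete) per target species, copied from the table above.
ANNOTATION_STATS = {
    "D. simulans": (18274, 17074),
    "D. yakuba": (18551, 17614),
    "D. erecta": (18700, 18203),
    "D. ananassae": (17398, 15858),
    "D. pseudoobscura": (16651, 14595),
    "D. mojavensis": (15908, 13192),
    "D. virilis": (16032, 13451),
    "D. grimshawi": (15700, 13107),
}

def complete_fraction(stats):
    """Fraction of mapped transcripts flagged Complete, per species."""
    return {species: complete / transcripts
            for species, (transcripts, complete) in stats.items()}

if __name__ == "__main__":
    for species, frac in complete_fraction(ANNOTATION_STATS).items():
        print(f"{species:<18}{frac:.3f}")
```

The fraction is highest for the species closest to D. melanogaster and lower for the more distant ones, consistent with mapping becoming harder as sequences diverge.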
43
Outline
  1. Reference Based Annotation.
  2. The GeneMapper Algorithm.
  3. Annotation of Fruitfly Genomes.
  4. Evolution of Gene Structure.

44
Exon-Intron Gene Structure
  • Phenomena due to exon-intron structure
  • Alternative Splicing.
  • Regulation through UTRs.
  • Nonsense Mediated Decay.
  • Diversity in Gene Structure
  • Prokaryotic genes are intronless.
  • Tremendous Diversity within Eukaryotes.
  • Might have been responsible for the formation of
    the nucleus (Martin and Koonin, 2006).

45
Evolution of Gene Structure
  • Introns-early Theory
  • Introns lost in prokaryotic evolution.
  • Introns-late Theory
  • Spliceosomal introns invented during eukaryotic
    evolution.
  • Reality is probably in the middle.
  • Most introns are fairly new.
  • Eukaryotic ancestor had spliceosomal mechanisms.

46
Evolution of Gene Structure
  • Conservation of intron positions among various
    eukaryotic clades (Rogozin et al. 2003)
  • Both loss and gain of introns in various
    eukaryotic lineages (Rogozin et al. 2003)
  • Introns preferentially lost near 3' ends of genes
    (Roy and Gilbert, 2005)
  • Excess of 5' introns in eukaryotic genomes (Lin
    and Zhang, 2005)

47
Mechanisms of Intron Gain
  • Duplication Theory (Tarrio et al. 1998)
  • New introns are formed by duplication of existing
    introns.
  • Transposon Theory (Crick 1979)
  • Novel introns arise by insertion of transposons.

48
Mechanisms of Intron Loss
  • Recombination Theory (Bernstein et al., 1983)
  • Recombination of a reverse-transcribed mRNA
    transcript with the genome results in the loss of
    introns.
  • Predicts intron loss at the 3' end.
  • Deletion Theory (Kent and Zahler, 2000)
  • Intron loss by genomic deletion.
  • Predicts inexact intron loss.

49
Finding Intron Gain/Loss
  • Two criteria for detecting credible intron gain
    (Logsdon et al. 1998)
  • Strong Phylogenetic Support.
  • Source of Intron DNA should be identifiable.
  • This criterion is hard to satisfy, as there are
    few constraints on intron evolution.

50
Finding Intron Gain/Loss
The Melanogaster Group
51
Finding Intron Gain/Loss
52
Finding Intron Gain/Loss
53
Finding Intron Gain/Loss
(Figure labels: Present, Absent, Present)
54
Finding Intron Gain/Loss
(Figure labels: Absent, Absent, Present)
55
Finding Intron Gain/Loss
  • Determination of Intron Gain
  • GeneMapper allows for a single inserted intron in
    an exon.
  • Determination of Intron Loss
  • Minimum intron length for maintaining the splicing
    reaction.
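
As a rough illustration of the single-inserted-intron idea, the toy check below asks whether a target sequence equals a reference exon with one extra GT...AG segment of at least a minimum length. It is a simplified stand-in, not GeneMapper's procedure: it demands an exact match outside the insertion, whereas a real mapper works on alignments, and the `min_intron_len` default of 40 is an arbitrary placeholder rather than a value from the talk.

```python
def find_inserted_intron(ref_exon, target, min_intron_len=40):
    """Return (start, end) of one GT...AG segment whose removal from target
    yields ref_exon exactly, or None if no such segment exists."""
    extra = len(target) - len(ref_exon)
    if extra < min_intron_len:
        return None  # too short to be a spliceable intron under this assumption
    for start in range(len(ref_exon) + 1):
        end = start + extra
        segment = target[start:end]
        flanks_match = target[:start] + target[end:] == ref_exon
        if flanks_match and segment.startswith("GT") and segment.endswith("AG"):
            return start, end
    return None
```

The same length threshold works in the other direction: a gap between two mapped exons that is shorter than any plausible intron suggests the intron has been lost in the target.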

56
Finding Intron Gain/Loss
  • Used GeneMapper to search for lost/inserted
    introns in all FlyBase genes.
  • Found 231 inserted introns.
  • Found 105 deleted introns.

57
Lengths of Inserted Introns
58
Phases of Inserted Introns
59
Mechanisms of Intron Loss
  • Recombination Hypothesis
  • Predicts that intron loss should occur at the 3'
    end.
  • Testing the hypothesis
  • 53 of the 105 lost introns are either the last or
    penultimate intron at the 3' end.
  • 82 of 105 lost introns are in the 3' third.
  • 94 of 105 lost introns are in the 3' half.
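
One way to gauge how strong this 3' bias is: compare the counts with a null model in which lost introns fall uniformly along the gene, so that one third are expected in the 3' third and one half in the 3' half. The uniform null is an assumption made here for illustration (real intron positions are not uniform), and the exact binomial tail below is not a test reported in the talk.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

LOST = 105
observations = {
    "3' third": (82, 1 / 3),  # (observed count, expected fraction under the null)
    "3' half": (94, 1 / 2),
}

if __name__ == "__main__":
    for region, (count, expected) in observations.items():
        tail = binom_tail(count, LOST, expected)
        print(f"{region}: observed {count / LOST:.3f}, "
              f"expected {expected:.3f}, tail probability {tail:.2e}")
```

Both observed fractions (about 0.78 and 0.90) are far above what the uniform null predicts, which is the direction the recombination hypothesis expects.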

60
Mechanisms of Intron Loss
  • Deletion Hypothesis
  • Predicts inexact intron loss.
  • Testing the hypothesis
  • Look at gene alignments around fusion events.
  • There do not appear to be many intron losses
    through inexact deletions.

61
Mechanisms of Intron Gain
  • Duplication Hypothesis
  • Introns formed by duplication of existing
    introns.
  • Testing the hypothesis
  • Use BLAST to search for matches with existing
    D. melanogaster introns.
  • No duplications found.

62
Mechanisms of Intron Gain
  • Transposon Hypothesis
  • Testing the hypothesis
  • Searched for TEs using RepeatMasker.
  • 3 of the 231 new introns were found to be
    transposons.
  • It seems that even though some introns are formed
    by insertion of TEs into genes, this mechanism is
    used very rarely.

63
Conclusions
  • Large scale sequencing projects provide an
    unprecedented opportunity to study genome
    evolution.
  • We have developed computational tools to study
    genome evolution.
  • These tools can be used to prove or disprove
    theories about gene evolution.