Identification of Transposable Elements Using Multiple Alignments of Related Genomes

1
Identification of Transposable Elements Using
Multiple Alignments of Related Genomes
  • I690 Project Presentation
  • By Yin Wu

2
Transposable Elements
  • Transposable Elements (TEs) are the chief cause of
    gapped regions in up to 10% of currently
    sequenced genomes.
  • TEs cause repeated alignment gaps in multiple
    genome alignments.

3
Multiple Alignments Between Related Genomes
Consider a speciation event causing the recent
divergence of genomes S1 and S2. We expect to see
some gaps in their alignment due to small
insertions and deletions. Gaps that are long and
repeated, however, are likely to be TEs. We call
these gaps Repeated Insertion Regions (RIRs).
Because RIRs are the traces of TEs, aligning the
genomes of related species makes it possible to
identify TEs.
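
The idea of reading insertions off an alignment can be sketched as follows. This is a minimal illustration, not the method from the slides: it assumes a pairwise alignment given as two equal-length strings with "-" as the gap character, and the function name and the min_len threshold are hypothetical.

```python
def find_insertion_regions(aligned_s1, aligned_s2, min_len=5):
    """Return (start, end) intervals in ungapped S2 coordinates
    (0-based, end-exclusive) where S1 has a gap run of at least
    min_len columns, i.e. candidate insertions in S2.
    min_len is an illustrative threshold, not a value from the slides."""
    if len(aligned_s1) != len(aligned_s2):
        raise ValueError("aligned sequences must have equal length")
    regions = []
    pos2 = 0          # current position in ungapped S2
    run_start = None  # start of the current gap run, in S2 coordinates
    for a, b in zip(aligned_s1, aligned_s2):
        if a == "-" and b != "-":
            if run_start is None:
                run_start = pos2
        elif run_start is not None:
            # The gap run just ended; keep it only if it is long enough.
            if pos2 - run_start >= min_len:
                regions.append((run_start, pos2))
            run_start = None
        if b != "-":
            pos2 += 1
    if run_start is not None and pos2 - run_start >= min_len:
        regions.append((run_start, pos2))
    return regions


s1 = "ACGT" + "-" * 6 + "ACGT" + "-" + "A"
s2 = "ACGT" + "T" * 6 + "ACGT" + "C" + "A"
print(find_insertion_regions(s1, s2))  # the 1-column gap is filtered out
```

Short gaps are dropped here because small insertions and deletions are expected between any two related genomes; only the long runs are kept as candidates.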
4
Previous Work
  • Anat Caspi and Lior Pachter [1] compared the
    genomes of four fruit fly species.
  • They located most of the (currently annotated)
    TEs in the RIRs.
  • They identified new instances of TEs for
    known and unknown TE families.

[1] Anat Caspi and Lior Pachter, Identification of
transposable elements using multiple alignments
of related genomes.
5
Previous Work (contd)
[Figure: alignment diagram with labels Conserved
Region, Insertion Region (gap), and Annotated TEs]
6
Previous Work (method)
  • Multiple alignment of homologous regions of
    related genomes to find Insertion Regions (IRs).
  • Local alignment of each set of IRs to find
    Repeated Insertion Regions (RIRs).
  • Filter and assemble RIRs.
  • Compare the RIRs against the BDGP [1] natural TE
    annotation set.

[1] http://www.fruitfly.org/p_disrupt/TE.html
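
The local-alignment step can be illustrated with a small Smith-Waterman scorer. This is a sketch under stated assumptions: the scoring values, the min_score cutoff, and both function names are hypothetical, and the slides do not say which local aligner or thresholds were actually used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).
    The scoring parameters are illustrative defaults."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def repeated_insertion_regions(irs, min_score=20):
    """irs maps an IR name to its sequence. Return the names of IRs
    that have a local hit scoring at least min_score against another IR."""
    names = sorted(irs)
    repeated = set()
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if smith_waterman_score(irs[x], irs[y]) >= min_score:
                repeated.update((x, y))
    return repeated


shared = "ACGTTGCAAGCT"  # a 12 bp element present in two IRs
irs = {
    "ir1": "TTTT" + shared + "GGGG",
    "ir2": "CCCC" + shared + "AAAA",
    "ir3": "GA" * 8,
}
print(sorted(repeated_insertion_regions(irs)))
```

The all-against-all loop is quadratic in the number of IRs, which is fine for a toy example; a real pipeline would likely use an indexed search tool instead.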
7
Previous Work (limitation)
  • A TE may be partially aligned to random sequence
    fragments by traditional multiple alignment
    methods.
  • Multiple alignment methods are less tolerant of
    long insertion events than a Hidden Markov Model
    (HMM).
  • Therefore, a pairwise HMM may report more
    complete RIRs than multiple alignment does.

8
This Project vs. Previous Work
  • This Project
      • Align homologous regions of each pair of
        genomes using a pairwise HMM [1].
      • Compare the pairwise alignments to find
        consensus gaps (IRs).
      • Local alignment of each set of IRs to find
        Repeated Insertion Regions (RIRs).
      • Filter and assemble RIRs.
      • Compare the RIRs against the BDGP natural TE
        annotation set.
  • Previous Work
      • Multiple alignment of homologous regions of
        related genomes to find Insertion Regions (IRs).
      • Local alignment of each set of IRs to find
        Repeated Insertion Regions (RIRs).
      • Filter and assemble RIRs.
      • Compare the RIRs against the BDGP natural TE
        annotation set.

[1] Provided by Dr. Haixu Tang
9
Input and Output
  • Input
      • Aligned syntenic regions of the genomes of
        four Drosophila species.
      • BDGP natural TE annotation set.
  • Output
      • RIRs of the genomes.
  • Data Analysis
      • BDGP TE coverage of the RIR set (i.e., the
        percentage of BDGP TEs covered by the RIR
        set).
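
One way to compute the coverage figure is per base, as in the sketch below. It assumes TEs and RIRs are given as half-open (start, end) intervals on the same sequence; the slides do not say whether coverage was measured per base or per element, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def te_coverage_percent(te_intervals, rir_intervals):
    """Percentage of annotated TE bases that fall inside any RIR."""
    tes = merge_intervals(te_intervals)
    rirs = merge_intervals(rir_intervals)
    total = sum(end - start for start, end in tes)
    if total == 0:
        return 0.0
    covered = 0
    for t_start, t_end in tes:
        for r_start, r_end in rirs:
            # Length of the overlap between this TE and this RIR, if any.
            covered += max(0, min(t_end, r_end) - max(t_start, r_start))
    return 100.0 * covered / total


tes = [(100, 200), (500, 600)]
rirs = [(150, 250), (500, 550), (540, 560)]
print(te_coverage_percent(tes, rirs))  # 55.0
```

Merging both interval sets first prevents overlapping annotations or overlapping RIRs from being counted twice.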

10
Method
  • Align homologous regions of each pair of genomes
    using a pairwise HMM.
  • Compare the pairwise alignments to find
    consensus gaps (IRs).
  • Local alignment of each set of IRs to find
    Repeated Insertion Regions (RIRs).
  • Filter and assemble RIRs.
  • Compare the RIRs against the BDGP natural TE
    annotation set.
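
The consensus-gap step could look like the sketch below. It assumes "consensus" means the part of a gap that appears in every pairwise alignment of one genome against the others; the slides do not define the term, so this interval intersection is one possible reading. Function names and the min_len parameter are hypothetical.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def consensus_gaps(gap_lists, min_len=1):
    """gap_lists holds one interval list per pairwise alignment, all in
    the coordinates of the same genome. Return intervals present in all."""
    if not gap_lists:
        return []
    consensus = reduce(intersect_two, gap_lists)
    return [(s, e) for s, e in consensus if e - s >= min_len]


# Insertions in one genome as seen in its alignments to three others.
vs_b = [(100, 400), (900, 950)]
vs_c = [(120, 410)]
vs_d = [(90, 380), (900, 1000)]
print(consensus_gaps([vs_b, vs_c, vs_d], min_len=50))  # [(120, 380)]
```

A stricter or looser rule, such as requiring support from only two of the three alignments, would change which insertions survive this step.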

11
Filter and assemble RIRs (some details)
[Figure: four repeat patterns labeled Micro-satellite
(not a TE), Tandem Repeats, Nested Repeats, and
Concatenated Repeats]
12
Filter and assemble RIRs (contd)
  • Micro-satellite regions: short (<20 bp) repeats
    with close and sequential hits to self.
  • Tandem repeats: long (>30 bp) repeats which
    sequentially align both to self and to
    subcomponents in other IRs.
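
A rough way to flag micro-satellite regions is to test whether an IR is mostly tandem copies of a unit shorter than 20 bp. This is only a sketch: the self-comparison heuristic, the min_identity cutoff, and the function name are assumptions, and the slides describe the filter in terms of alignment hits to self rather than this shortcut.

```python
def is_microsatellite(seq, max_unit=19, min_identity=0.9):
    """Return True if seq looks like tandem copies of a short unit.
    For each candidate unit length k (< 20 bp), compare every base with
    the base k positions later; a high match fraction means the sequence
    repeats with period k. min_identity is an illustrative cutoff."""
    n = len(seq)
    for k in range(1, min(max_unit, n // 2) + 1):
        matches = sum(seq[i] == seq[i + k] for i in range(n - k))
        if matches / (n - k) >= min_identity:
            return True
    return False


print(is_microsatellite("CA" * 15))               # dinucleotide repeat
print(is_microsatellite("ACGTTGCAAGCTGATCCGTA"))  # no short period
```

Regions flagged this way would be filtered out, since the slides mark micro-satellites as not being TEs.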

13
Filter and assemble RIRs (contd)
  • Nested repeats: long (>30 bp) non-overlapping
    subcomponents that sequentially align to other
    IRs, where there is no intersection between the
    sets of IRs to which each subcomponent aligned.
  • Concatenated repeats: IRs within a certain
    genomic distance (<700 bp) that align
    sequentially to other insertion regions.
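
The distance rule for concatenated repeats can be sketched as a grouping pass over sorted IR coordinates. This covers only the <700 bp criterion; the second condition, that the grouped IRs align sequentially to other insertion regions, is not checked here. The function name is hypothetical.

```python
def group_nearby_irs(irs, max_distance=700):
    """Group half-open (start, end) IR intervals whose separation from
    the previous IR in the group is less than max_distance bp."""
    groups = []
    for start, end in sorted(irs):
        if groups and start - groups[-1][-1][1] < max_distance:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return groups


irs = [(5000, 5400), (1000, 1500), (1900, 2300)]
for group in group_nearby_irs(irs):
    print(group)
# [(1000, 1500), (1900, 2300)]
# [(5000, 5400)]
```

Each multi-member group is a candidate for assembly into a single RIR, subject to the alignment check described on the slide.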

14
Thanks to
  • Dr. Haixu Tang