Title: Identification of Transposable Elements Using Multiple Alignments of Related Genomes
1Identification of Transposable Elements Using
Multiple Alignments of Related Genomes
- I690 Project Presentation
- By Yin Wu
2Transposable Elements
- Transposable elements (TEs) are the chief cause of
gapped regions in up to 10% of currently
sequenced genomes.
- TEs cause repeated alignment gaps in multiple
genome alignments.
3Multiple Alignments Between Related Genomes
Consider a speciation event that caused the recent
divergence of genomes S1 and S2. We expect to see
some gaps in their alignment due to small
insertions and deletions. Gaps that are long and
repeated, however, are likely to be TEs. We call
these gaps Repeated Insertion Regions (RIRs). In
other words, RIRs are the traces of TEs, so
aligning the genomes of related species makes it
possible to identify TEs.
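As a rough illustration of the idea above, the sketch below scans one pairwise alignment of S1 and S2 and reports the long runs where S1 shows only gap characters, i.e. sequence present only in S2. The function name `insertion_regions`, the `-` gap character, and the `min_len` threshold are assumptions made for this example; they are not taken from the project.

```python
def insertion_regions(aligned_s1, aligned_s2, min_len=30):
    """Return (start, end) coordinates in S2 (0-based, end-exclusive) of runs
    where S1 has gap characters, i.e. sequence present only in S2.

    min_len is an illustrative threshold for "long" gaps (an assumption).
    """
    if len(aligned_s1) != len(aligned_s2):
        raise ValueError("aligned rows must have equal length")
    regions = []
    pos2 = 0          # current ungapped coordinate in S2
    run_start = None  # S2 coordinate where the current gap run began
    for c1, c2 in zip(aligned_s1, aligned_s2):
        in_gap = (c1 == "-" and c2 != "-")
        if in_gap and run_start is None:
            run_start = pos2
        elif not in_gap and run_start is not None:
            # The run just ended; keep it only if it is long enough.
            if pos2 - run_start >= min_len:
                regions.append((run_start, pos2))
            run_start = None
        if c2 != "-":
            pos2 += 1
    # Handle a run that reaches the end of the alignment.
    if run_start is not None and pos2 - run_start >= min_len:
        regions.append((run_start, pos2))
    return regions


s1 = "ACGT" + "-" * 5 + "ACGT" + "-" + "AC"
s2 = "ACGT" + "TTTTT" + "ACGT" + "G" + "AC"
print(insertion_regions(s1, s2, min_len=3))
```

With `min_len=3` the one-base gap is discarded as a small indel and only the five-base run is kept. Deciding whether a long gap is also repeated requires comparing it with other gaps, which this sketch does not do.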
4Previous Work
- Anat Caspi and Lior Pachter [1] compared the genomes
of four fruit fly species.
- They located most of the (currently annotated)
TEs in the RIRs.
- They identified new instances of TEs for
known and unknown TE families.
[1] Anat Caspi and Lior Pachter, Identification of
transposable elements using multiple alignments
of related genomes.
5Previous Work (contd)
[Figure legend: Conserved Region; Insertion Region (gap); Annotated TEs]
6Previous Work (method)
- Multiple alignment of homologous regions of
related genomes to find Insertion Regions (IRs).
- Local alignment of each set of IRs to find
Repeated Insertion Regions (RIRs).
- Filter and assemble RIRs.
- Compare the RIRs against the BDGP [1] natural TE
annotation set.
[1] http://www.fruitfly.org/p_disrupt/TE.html
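The local-alignment step above can be illustrated with a minimal Smith-Waterman score computation between two IR sequences. This is a generic textbook sketch, not the aligner used in the previous work; the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    Scoring values are illustrative assumptions, not the project's settings.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


ir_1 = "AAAACGTACGTCCC"
ir_2 = "GGGACGTACGTGGG"
print(local_alignment_score(ir_1, ir_2))  # shared core ACGTACGT scores 16
```

Two IRs whose score exceeds some chosen cutoff would be candidates for the same repeat. The cutoff itself is a tuning decision that the slides do not specify.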
7Previous Work (limitation)
- TEs may be partially aligned to random sequence
fragments by traditional multiple alignment
methods.
- Multiple alignment methods are less tolerant of
long insertion events than a Hidden Markov Model
(HMM).
- Therefore, a pairwise HMM may report more complete
RIRs than multiple alignment does.
8This Project vs. Previous Work
- This Project
  - Align homologous regions of each pair of genomes
    using a pair-wise HMM [1].
  - Compare the pair-wise alignments to find
    consensus gaps (IRs).
  - Local alignment of each set of IRs to find
    Repeated Insertion Regions (RIRs).
  - Filter and assemble RIRs.
  - Compare the RIRs against the BDGP natural TE
    annotation set.
- Previous Work
  - Multiple alignment of homologous regions of
    related genomes to find Insertion Regions (IRs).
  - Local alignment of each set of IRs to find
    Repeated Insertion Regions (RIRs).
  - Filter and assemble RIRs.
  - Compare the RIRs against the BDGP natural TE
    annotation set.
[1] Provided by Dr. Haixu Tang
9Input and Output
- Input
  - Aligned syntenic regions of the genomes of four
    species of Drosophila.
  - BDGP natural TE annotation set.
- Output
  - RIRs of the genomes.
- Data Analysis
  - BDGP TE coverage of the RIR set (i.e., what
    percentage of the BDGP TEs is covered by the RIR
    set).
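One way to compute the coverage figure described above is at the base level: the share of annotated TE bases that fall inside at least one RIR. The sketch below assumes this base-level reading and represents both sets as `(start, end)` intervals on a single sequence. The function names and the interval convention (0-based, end-exclusive) are assumptions for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_coverage_percent(te_annotations, rirs):
    """Percentage of annotated TE bases that lie inside at least one RIR.

    Intervals are (start, end), 0-based and end-exclusive (an assumption).
    """
    tes = merge_intervals(te_annotations)
    rir = merge_intervals(rirs)
    total = sum(end - start for start, end in tes)
    if total == 0:
        return 0.0
    covered = 0
    for t_start, t_end in tes:
        for r_start, r_end in rir:
            # Length of the overlap, or 0 when the intervals are disjoint.
            covered += max(0, min(t_end, r_end) - max(t_start, r_start))
    return 100.0 * covered / total


tes = [(100, 200), (500, 600)]
rirs = [(150, 250), (90, 120), (550, 560)]
print(te_coverage_percent(tes, rirs))  # 80 of 200 TE bases covered: 40.0
```

Merging both sets first ensures that overlapping annotations or overlapping RIRs are not counted twice. A count-based measure (the fraction of TEs touched by any RIR) would be an equally plausible reading of the slide and would give a different number.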
10Method
- Align homologous regions of each pair of genomes
using a pair-wise HMM.
- Compare the pair-wise alignments to find
consensus gaps (IRs).
- Local alignment of each set of IRs to find
Repeated Insertion Regions (RIRs).
- Filter and assemble RIRs.
- Compare the RIRs against the BDGP natural TE
annotation set.
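The consensus-gap step above can be sketched as an interval intersection. Assume that, for one genome, each pairwise alignment yields a list of regions that are gapped in the partner genome; a consensus gap is then a region gapped in every partner. Requiring all partners to agree, the `min_len` threshold, and the function names are assumptions made for this example and may differ from the project's actual rule.

```python
def intersect(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def consensus_gaps(gaps_per_alignment, min_len=30):
    """Regions gapped in every pairwise alignment (hypothetical criterion).

    Each inner list is assumed to hold non-overlapping intervals.
    """
    if not gaps_per_alignment:
        return []
    result = sorted(gaps_per_alignment[0])
    for gaps in gaps_per_alignment[1:]:
        result = intersect(result, sorted(gaps))
    return [(s, e) for s, e in result if e - s >= min_len]


vs_s2 = [(100, 400), (900, 1000)]
vs_s3 = [(120, 380), (950, 960)]
vs_s4 = [(90, 390)]
print(consensus_gaps([vs_s2, vs_s3, vs_s4]))  # [(120, 380)]
```

A looser criterion, such as agreement in a majority of the pairwise alignments, would recover insertions shared by closely related species at the cost of more false positives.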
11Filter and assemble RIRs (some details)
- Micro-satellite (not a TE)
- Tandem Repeats
- Nested Repeats
- Concatenated Repeats
12Filter and assemble RIRs (contd)
- Micro-satellite regions: Short (<20 bp) repeats
with close and sequential hits to self.
- Tandem repeats: Long (>30 bp) repeats which
sequentially align to both self and to
subcomponents in other IRs.
13Filter and assemble RIRs (contd)
- Nested repeats: Long, non-overlapping (>30 bp)
subcomponents that sequentially align to other
IRs, where there is no intersection between the
sets of IRs to which each subcomponent aligned.
- Concatenated repeats: IRs within a certain
genomic distance (<700 bp) that align
sequentially to other insertion regions.
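The distance criterion for concatenated repeats can be sketched as a simple chaining pass over sorted IR coordinates. The code below only groups neighbouring IRs separated by less than 700 bp; it does not check the second condition, that the grouped IRs align sequentially to other insertion regions. The function name and the interval representation are assumptions for illustration.

```python
def chain_nearby(irs, max_gap=700):
    """Group IRs whose separation from the previous IR is below max_gap.

    irs: non-overlapping (start, end) intervals on one sequence (assumed).
    Returns a list of chains, each a list of intervals in genomic order.
    """
    chains = []
    for start, end in sorted(irs):
        # Distance from the end of the last IR in the current chain.
        if chains and start - chains[-1][-1][1] < max_gap:
            chains[-1].append((start, end))
        else:
            chains.append([(start, end)])
    return chains


irs = [(0, 100), (500, 650), (2000, 2100)]
print(chain_nearby(irs))
```

Here the first two IRs are 400 bp apart and form one chain, while the third starts 1350 bp after the second and begins a new chain. Each multi-IR chain is only a candidate; the alignment evidence would still have to confirm it as a concatenated repeat.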
14Thanks to