Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population - PowerPoint PPT Presentation

About This Presentation
Slides: 18
Provided by: ywu5

Transcript and Presenter's Notes



1
Inferring Local Tree Topologies for SNP Sequences
Under Recombination in a Population
  • Yufeng Wu
  • Dept. of Computer Science and Engineering
  • University of Connecticut, USA

2
Genetic Variations
DNA sequences (sites are columns):
  AATGTAGCCGA
  AATATAACCTA
  AATGTAGCCGT
  AATGTAACCTA
  CATATAGCCGT
Each SNP induces a split.
  • Single-nucleotide polymorphism (SNP): a site
    (genomic location) where two types of nucleotides
    occur frequently in the population.
  • Haplotype: a binary vector of SNPs (encoded as
    0/1).
  • Haplotypes offer hints on genealogy.
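The encoding from DNA sequences to 0/1 haplotypes can be sketched on the five sequences above. This is a minimal illustration, not code from the talk. It assumes that 0 marks the allele carried by the first sequence, since the slide does not say which allele is 0. The helper names `snp_columns`, `to_haplotypes` and `split_of` are hypothetical.

```python
def snp_columns(seqs):
    """Indices of columns where exactly two nucleotides occur."""
    return [j for j in range(len(seqs[0])) if len({s[j] for s in seqs}) == 2]


def to_haplotypes(seqs):
    """Encode each sequence as a 0/1 string over its SNP columns.

    Assumed convention: 0 is the allele carried by the first sequence.
    """
    cols = snp_columns(seqs)
    return ["".join("0" if s[j] == seqs[0][j] else "1" for j in cols) for s in seqs]


def split_of(haplotypes, k):
    """Split induced by SNP k: indices of the sequences carrying allele 1."""
    return frozenset(i for i, h in enumerate(haplotypes) if h[k] == "1")


seqs = ["AATGTAGCCGA", "AATATAACCTA", "AATGTAGCCGT", "AATGTAACCTA", "CATATAGCCGT"]
haps = to_haplotypes(seqs)
print(haps)                        # ['00000', '01110', '00001', '00110', '11001']
print(sorted(split_of(haps, 1)))   # [1, 4]
```

Each SNP column separates the sequences into the carriers of allele 1 and the rest. That bipartition is the split the slide refers to.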

3
Genealogy: Evolutionary History of Genomic Sequences
  • Tells how individuals in a population are
    related
  • Helps to explain diseases: disease mutations
    occur on branches, and all descendants carry the
    mutations
  • Problem: how to determine the genealogy for
    unrelated individuals?
  • Complicated by recombination

(Figure: genealogy of individuals in the current population, marked as diseased (case) or healthy (control).)
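The claim that all descendants of a mutated branch carry the mutation can be made concrete with a small tree. The genealogy below is hypothetical, with five individuals A to E, and the function names are illustrative only. A tree is stored as a parent-to-children dictionary, and a mutation placed on the branch above a node gives allele 1 to exactly the leaves below that node.

```python
def leaves_below(children, node):
    """All leaves in the subtree rooted at `node` (nodes absent from `children` are leaves)."""
    if node not in children:
        return {node}
    leaves = set()
    for child in children[node]:
        leaves |= leaves_below(children, child)
    return leaves


def mutate_branch(children, root, node):
    """Allele (0/1) of every leaf after one mutation on the branch above `node`."""
    carriers = leaves_below(children, node)
    return {leaf: int(leaf in carriers) for leaf in sorted(leaves_below(children, root))}


# Hypothetical genealogy of five individuals A..E.
genealogy = {"root": ["x", "y"], "x": ["A", "B"], "y": ["C", "z"], "z": ["D", "E"]}
print(mutate_branch(genealogy, "root", "y"))
# {'A': 0, 'B': 0, 'C': 1, 'D': 1, 'E': 1}
```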
4
Recombination
  • One of the principal genetic forces shaping
    sequence variation within species
  • Two equal-length sequences generate a third, new
    equal-length sequence in the genealogy
  • Spatial order is important: different parts of
    the genome are inherited from different ancestors.

(Figure: example sequences 110001111111001 and 000110000001111.)
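A single recombination event can be sketched as a crossover. The new sequence takes a prefix from one parent and the matching suffix from the other, so the two parts of its genome are inherited from different ancestors. The breakpoint 7 used here is a hypothetical choice, because the slide does not say where the crossover falls.

```python
def recombine(left_parent, right_parent, breakpoint):
    """Single-crossover recombinant: sites [0, breakpoint) come from left_parent,
    sites [breakpoint, end) come from right_parent."""
    if len(left_parent) != len(right_parent):
        raise ValueError("parents must have equal length")
    return left_parent[:breakpoint] + right_parent[breakpoint:]


child = recombine("110001111111001", "000110000001111", 7)
print(child)  # 110001100001111
```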
5
Ancestral Recombination Graph (ARG)
(Figure: an ARG with mutation and recombination events; node labels 10, 01, 00 and 11.)
First example: S1 = 00, S2 = 01, S3 = 10, S4 = 11
Assumption: at most one mutation per site.
Second example: S1 = 00, S2 = 01, S3 = 10, S4 = 10
6
Local Trees
  • An ARG represents a set of local trees.
  • Each local tree covers a contiguous genomic
    region.
  • No recombination between two sites implies the
    same local tree for the two sites.
  • Local tree topologies are informative and useful.

(Figure: an ARG and its local trees near sites 1 and 2, near site 2, and to the right of site 3.)
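The idea that an ARG partitions the genome into local trees can be sketched as an interval lookup. This assumes the recombination breakpoints are known and sorted, with local tree i covering the interval between breakpoint i-1 and breakpoint i. The single breakpoint at 2.5 is hypothetical. Two sites share a local tree whenever no breakpoint separates them.

```python
from bisect import bisect_right


def local_tree_index(breakpoints, position):
    """Index of the local tree covering `position`.

    `breakpoints` is a sorted list of recombination breakpoints; local tree i
    covers [breakpoints[i-1], breakpoints[i]), open-ended at both sides.
    """
    return bisect_right(breakpoints, position)


def same_local_tree(breakpoints, p, q):
    """True when no breakpoint separates positions p and q."""
    return local_tree_index(breakpoints, p) == local_tree_index(breakpoints, q)


breakpoints = [2.5]  # hypothetical: one recombination between sites 2 and 3
print([local_tree_index(breakpoints, site) for site in (1, 2, 3)])  # [0, 0, 1]
```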
7
Inference of Local Tree Topologies
  • Question: given SNP haplotypes, infer local tree
    topologies (one tree for each SNP site, ignoring
    branch lengths)
  • Hein (1990, 1993): enumerate all possible tree
    topologies at each site
  • Song and Hein (2003, 2005): parsimony-based
  • Local tree reconstruction can be formulated as
    inference on a hidden Markov model.
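The slide does not spell out the hidden Markov model. The sketch below only illustrates the general shape of such a formulation: hidden states are candidate local trees, one per site. Transitions either keep the current tree or switch to another one, which plays the role of a recombination. Viterbi decoding then returns the most likely tree for each site. The states "T1" and "T2", the emission values and the switch probability are all hypothetical toy numbers, and this is not the method of Hein or of Song and Hein.

```python
import math


def viterbi(emit, states, switch_prob):
    """Most likely state path. emit[i][s] is the log-likelihood of site i
    under hidden state s. Transitions keep the state with probability
    1 - switch_prob or move uniformly to another state (needs >= 2 states)."""
    log_stay = math.log(1 - switch_prob)
    log_move = math.log(switch_prob / (len(states) - 1))

    def trans(p, s):
        return log_stay if p == s else log_move

    score = {s: emit[0][s] for s in states}  # best log-score of a path ending in s
    back = []                                # back[i-1][s]: best predecessor of s at site i
    for i in range(1, len(emit)):
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[prev] + trans(prev, s) + emit[i][s]
            ptr[s] = prev
        score = new_score
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


L = math.log
favor_t1 = {"T1": L(0.9), "T2": L(0.1)}
favor_t2 = {"T1": L(0.1), "T2": L(0.9)}
path = viterbi([favor_t1, favor_t1, favor_t2, favor_t2], ["T1", "T2"], 0.1)
print(path)  # ['T1', 'T1', 'T2', 'T2']
```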

8
Local Tree Topologies
  • Key technical difficulty: brute-force enumeration
    of local tree topologies is not feasible when the
    number of sequences is > 9, so we cannot
    enumerate all tree topologies.
  • Trivial solution: create a tree for a SNP
    containing only the single split induced by the SNP.
  • Always correct (assuming one mutation per site)
  • But not very informative: we need more refined
    trees!

Example SNP: A = 0, B = 0, C = 1, D = 0, E = 1, F = 0, G = 1, H = 0 (split {C, E, G} | {A, B, D, F, H})
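Two points on this slide can be checked numerically. The first is why enumeration blows up. The number of rooted binary topologies on n labeled leaves is (2n-3)!!, which exceeds two million at n = 9. The slide does not say whether rooted or unrooted trees were enumerated, so rooted is assumed here. The second is what the trivial tree for the example SNP looks like, written here in Newick format. Both function names are illustrative.

```python
def num_rooted_topologies(n):
    """Number of rooted binary labeled tree topologies on n leaves: (2n-3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


def trivial_tree(alleles):
    """Newick string for the tree containing only the split induced by one SNP:
    the allele-1 leaves form one clade, everything else hangs off the root."""
    ones = [x for x, a in alleles.items() if a == 1]
    zeros = [x for x, a in alleles.items() if a == 0]
    clade = ones[0] if len(ones) == 1 else "(" + ",".join(ones) + ")"
    return "(" + ",".join([clade] + zeros) + ");"


print(num_rooted_topologies(9), num_rooted_topologies(10))  # 2027025 34459425
snp = {"A": 0, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}
print(trivial_tree(snp))  # ((C,E,G),A,B,D,F,H);
```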
9
How to do better? Neighboring Local Trees are
Similar!
  • Nearby SNP sites provide hints!
  • Nearby local trees are often topologically
    similar
  • Recombination often alters only small parts of
    the trees
  • Key idea: reconstruct local trees by combining
    information from multiple nearby SNPs

10
RENT: REfining Neighboring Trees
  • Maintain for each SNP site a (possibly
    non-binary) tree topology
  • Initialize it to a tree containing the split
    induced by the SNP
  • Gradually refine the trees by adding new splits
  • Splits are found by a set of rules (described
    later)
  • Splits added early may be more reliable
  • Stop when the trees are binary or enough
    information has been recovered
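The refinement loop described above can be sketched as follows. This is not the author's RENT software. Trees are represented as sets of splits, where each split is the set of leaf indices on one side, oriented to exclude leaf 0. A proposed split is accepted only if it is compatible with every split already in the tree. Rules listed earlier are applied first, matching the note that early splits may be more reliable. The names `refine` and `neighbor_rule` and the round limit are hypothetical. The toy rule, which offers each split to the two adjacent sites, stands in for the real rules on the following slides.

```python
def canon(split, n):
    """Orient a split (a set of leaf indices) as the side that excludes leaf 0."""
    return frozenset(range(n)) - split if 0 in split else frozenset(split)


def compatible(a, b, n):
    """Splits are compatible unless all four gametes 00, 01, 10, 11 occur."""
    return not (a & b) or a <= b or b <= a or (a | b) == frozenset(range(n))


def is_binary(tree, n):
    """An unrooted binary tree on n leaves has n - 3 nontrivial splits."""
    return sum(1 for a in tree if 2 <= len(a) <= n - 2) == n - 3


def refine(site_splits, n, rules, max_rounds=100):
    """Start from one split per site; add compatible proposed splits until stable."""
    trees = [{canon(s, n)} for s in site_splits]
    for _ in range(max_rounds):
        changed = False
        for rule in rules:  # earlier rules get their splits in first
            for i, split in list(rule(trees)):  # snapshot before modifying trees
                split = canon(split, n)
                if split not in trees[i] and all(compatible(split, t, n) for t in trees[i]):
                    trees[i].add(split)
                    changed = True
        if not changed or all(is_binary(t, n) for t in trees):
            break
    return trees


def neighbor_rule(trees):
    """Hypothetical toy rule: propose every split to the two adjacent sites."""
    for i, tree in enumerate(trees):
        for split in tree:
            for j in (i - 1, i + 1):
                if 0 <= j < len(trees):
                    yield j, split


# Toy input: splits induced by five sites on 7 sequences (numbered 0..6).
site_splits = [{1, 3}, {4, 5}, {2, 3, 4, 5, 6}, {0, 1}, {5, 6}]
for tree in refine(site_splits, 7, [neighbor_rule]):
    print(sorted(sorted(s) for s in tree))
# [[1, 3], [4, 5]]
# [[1, 3], [4, 5]]
# [[2, 3, 4, 5, 6], [4, 5]]
# [[2, 3, 4, 5, 6], [5, 6]]
# [[2, 3, 4, 5, 6], [5, 6]]
```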

11
A Little Background: Compatibility
Matrix M (rows a to g are sequences, columns 1 to 5 are sites):
     1 2 3 4 5
  a  0 0 0 1 0
  b  1 0 0 1 0
  c  0 0 1 0 0
  d  1 0 1 0 0
  e  0 1 1 0 0
  f  0 1 1 0 1
  g  0 0 1 0 1
Sites 1 and 2 are compatible, but sites 1 and 3 are
incompatible.
  • Two sites (columns) p and q are incompatible if
    columns p and q contain all four ordered pairs
    (gametes) 00, 01, 10, 11. Otherwise, p and q are
    compatible.
  • Easily extended to splits.
  • A split s is incompatible with tree T if s is
    incompatible with any one split in T. Two trees
    are compatible if their splits are pairwise
    compatible.
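The compatibility definitions can be checked directly on matrix M. In this sketch each site becomes a split, meaning the set of rows carrying a 1. The four gametes then correspond to set relations: 11 means the two sets intersect, 10 and 01 mean neither set contains the other, and 00 means their union misses some row. The helper names are illustrative.

```python
M = {
    "a": "00010",
    "b": "10010",
    "c": "00100",
    "d": "10100",
    "e": "01100",
    "f": "01101",
    "g": "00101",
}
LEAVES = frozenset(M)


def site_split(j):
    """Split induced by site j (0-based): the rows that carry a 1."""
    return frozenset(row for row, bits in M.items() if bits[j] == "1")


def splits_compatible(a, b, leaves=LEAVES):
    """Incompatible exactly when all four gametes appear:
    11 (a & b), 10 (a - b), 01 (b - a) and 00 (rows outside a | b)."""
    return not (a & b) or a <= b or b <= a or (a | b) == leaves


def split_compatible_with_tree(s, tree):
    """A split fits a tree only if it is compatible with every split in it."""
    return all(splits_compatible(s, t) for t in tree)


def trees_compatible(t1, t2):
    """Two trees are compatible if their splits are pairwise compatible."""
    return all(split_compatible_with_tree(s, t2) for s in t1)


print(splits_compatible(site_split(0), site_split(1)))  # True  (sites 1 and 2)
print(splits_compatible(site_split(0), site_split(2)))  # False (sites 1 and 3)
```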

12
Fully-Compatible Region: Simple Case
  • A region of consecutive SNP sites in which the
    SNPs are pairwise compatible.
  • May indicate that no topology-altering
    recombination occurred within the region
  • Rule: for a site s in such a region, add the
    splits induced by the region's SNPs to the tree
    at s.
  • Compatibility is a very strong property and is
    unlikely to arise by chance.
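One way to apply this rule is sketched below, using the columns of matrix M. The slide does not say how the region around a site is chosen. As an assumption, this sketch grows it greedily, first to the right and then to the left, while every pair of sites in the region stays compatible. The splits of all SNPs in the region are then offered to the tree at that site. The function names are hypothetical.

```python
ROWS = ["00010", "10010", "00100", "10100", "01100", "01101", "00101"]  # matrix M
COLS = list(zip(*ROWS))  # one tuple of '0'/'1' characters per site


def compatible(x, y):
    """Four-gamete test on two site columns."""
    return len(set(zip(x, y))) < 4


def compatible_region(cols, s):
    """Greedily grow [lo, hi] around site s (right first, then left),
    keeping every pair of sites in the region compatible."""
    lo = hi = s

    def fits(j):
        return all(compatible(cols[j], cols[k]) for k in range(lo, hi + 1))

    while hi + 1 < len(cols) and fits(hi + 1):
        hi += 1
    while lo > 0 and fits(lo - 1):
        lo -= 1
    return lo, hi


def region_rule(cols, s):
    """Splits (sets of row indices carrying a 1) to add to the tree at site s."""
    lo, hi = compatible_region(cols, s)
    return [frozenset(i for i, b in enumerate(cols[j]) if b == "1") for j in range(lo, hi + 1)]


print([compatible_region(COLS, s) for s in range(len(COLS))])
# [(0, 1), (1, 3), (2, 4), (2, 4), (2, 4)]
```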

13
Split Propagation: A More General Rule
  • Three consecutive sites 1, 2 and 3. Sites 1 and 2
    are incompatible. Does site 3 matter for the tree
    at site 1?
  • The trees at sites 1 and 2 are different.
  • Suppose site 3 is compatible with sites 1 and 2.
    Then?
  • Site 3 may indicate a subtree shared by the trees
    at sites 1 and 2.
  • Rule: a split propagates in both directions until
    it reaches an incompatible tree.
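The propagation rule can be sketched on the three-site scenario above. Trees are sets of splits, where a split is a set of leaf indices. The concrete splits are hypothetical and were chosen so that sites 1 and 2 are incompatible while site 3 is compatible with both. The split of site 3 therefore reaches both other trees. A split from site 1, by contrast, stops as soon as it meets the incompatible tree at site 2.

```python
def compatible(a, b, n):
    """Splits on leaves 0..n-1 are compatible unless all four gametes occur."""
    return not (a & b) or a <= b or b <= a or (a | b) == frozenset(range(n))


def propagate(trees, s, split, n):
    """Copy `split` from the tree at site s to neighboring trees, moving left
    and right, and stop in each direction at the first incompatible tree."""
    for step in (-1, 1):
        i = s + step
        while 0 <= i < len(trees) and all(compatible(split, t, n) for t in trees[i]):
            trees[i].add(split)
            i += step


# Hypothetical splits on 7 leaves for sites 1, 2 and 3.
s1, s2, s3 = frozenset({1, 3}), frozenset({2, 3, 4, 5, 6}), frozenset({4, 5})
trees = [{s1}, {s2}, {s3}]
propagate(trees, 2, s3, 7)  # site 3's split reaches both other trees
propagate(trees, 0, s1, 7)  # blocked immediately by the tree at site 2
for t in trees:
    print(sorted(sorted(s) for s in t))
# [[1, 3], [4, 5]]
# [[2, 3, 4, 5, 6], [4, 5]]
# [[4, 5]]
```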

14
Unique Refinement
  • Consider the subtree with leaves 1, 2 and 3.
  • Which refinement is more likely?
  • Add the split grouping 1 and 2: the only such
    split that is compatible with the neighboring
    tree T2.
  • Rule: refine a non-binary node by the split that
    is the only one compatible with the neighboring
    trees
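The unique-refinement rule can be sketched for a non-binary node. Each candidate refinement groups two or more, but not all, of the node's children into a new clade. If exactly one candidate is compatible with the neighboring tree, it is added; otherwise nothing is added. The neighboring tree T2 here is hypothetical: it contains only the split {1, 2, 4} on leaves 0 to 5, which rules out grouping 1 with 3 and grouping 2 with 3.

```python
from itertools import combinations


def compatible(a, b, n):
    """Splits on leaves 0..n-1 are compatible unless all four gametes occur."""
    return not (a & b) or a <= b or b <= a or (a | b) == frozenset(range(n))


def unique_refinement(children, neighbor_tree, n):
    """children: leaf sets below a non-binary node with k children.
    Candidates merge 2..k-1 children into a new clade. Return the candidate
    if it is the only one compatible with every neighboring split, else None."""
    k = len(children)
    candidates = [
        frozenset().union(*group)
        for size in range(2, k)
        for group in combinations(children, size)
    ]
    ok = [c for c in candidates if all(compatible(c, t, n) for t in neighbor_tree)]
    return ok[0] if len(ok) == 1 else None


children = [{1}, {2}, {3}]
t2 = [frozenset({1, 2, 4})]  # hypothetical neighboring tree T2
print(sorted(unique_refinement(children, t2, 6)))  # [1, 2]
print(unique_refinement(children, [], 6))          # None: no evidence to choose
```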

15
One Subtree-Prune-Regraft (SPR) Event
  • Recombination is simulated by SPR.
  • The rest of the two trees (without the pruned
    subtree) remains the same.
  • Rule: find an identical subtree Ts in neighboring
    trees T1 and T2 such that the rest of T1 and T2
    (with Ts removed) are compatible. Then jointly
    refine T1 - Ts and T2 - Ts before adding Ts back.

(Figure: the subtree to prune.)
More complex rules are possible.
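The core of the SPR rule can be sketched with trees stored as split sets. The sketch first checks that Ts is a clade of both trees. It then deletes Ts's leaves from every split and tests whether the remaining trees are compatible; if they are, the union of their splits is returned as the joint refinement. Regrafting Ts afterward is not shown. The trees are hypothetical 6-leaf examples in which Ts = {4, 5} sits in different places, so T1 and T2 conflict as a whole. Removing Ts resolves the conflict, and T2 - Ts picks up the split {2, 3} from T1.

```python
def compatible(a, b, universe):
    """Splits over `universe` are compatible unless all four gametes occur."""
    return not (a & b) or a <= b or b <= a or (a | b) == universe


def restrict(tree, ts, universe):
    """Splits of `tree` after deleting the leaves of ts, oriented to exclude the
    smallest remaining leaf; splits with fewer than 2 leaves on a side are dropped."""
    rest = universe - ts
    anchor = min(rest)
    out = set()
    for split in tree:
        side = split & rest
        if anchor in side:
            side = rest - side
        if 2 <= len(side) <= len(rest) - 2:
            out.add(side)
    return out


def has_clade(tree, ts, universe):
    return ts in tree or (universe - ts) in tree


def spr_rule(t1, t2, ts, universe):
    """Joint refinement of T1 - Ts and T2 - Ts, or None if the rule does not apply."""
    if not (has_clade(t1, ts, universe) and has_clade(t2, ts, universe)):
        return None
    r1, r2 = restrict(t1, ts, universe), restrict(t2, ts, universe)
    if all(compatible(a, b, universe - ts) for a in r1 for b in r2):
        return r1 | r2
    return None


U = frozenset(range(6))
TS = frozenset({4, 5})
T1 = {frozenset({2, 3, 4, 5}), frozenset({2, 3}), frozenset({4, 5})}
T2 = {frozenset({1, 4, 5}), frozenset({4, 5})}
print(compatible(frozenset({2, 3, 4, 5}), frozenset({1, 4, 5}), U))  # False
print(spr_rule(T1, T2, TS, U))  # {frozenset({2, 3})}
```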
16
Simulation
  • Hudson's program MS (with known coalescent local
    tree topologies): 100 datasets for each setting.
  • Handles much larger data than Song and Hein's
    method, and performs better or similarly on small
    data.
  • Test: local tree topology recovery, scored by Song
    and Hein's shared-split measure

(Figure: simulation results for two settings, labeled 15 and 50.)
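The slides do not define Song and Hein's shared-split measure. As an explicitly assumed stand-in, this sketch scores an inferred local tree by the fraction of the true tree's nontrivial splits that it recovers. Splits are oriented to exclude leaf 0 before they are compared, so the same bipartition written from either side counts as a match. The function names are hypothetical.

```python
def canon(split, n):
    """Orient a split as the side that excludes leaf 0."""
    return frozenset(range(n)) - split if 0 in split else frozenset(split)


def nontrivial(tree, n):
    """Canonically oriented splits with at least two leaves on each side."""
    return {canon(s, n) for s in tree if 2 <= len(s) <= n - 2}


def shared_split_score(true_tree, inferred_tree, n):
    """Assumed score: fraction of the true tree's nontrivial splits recovered."""
    truth = nontrivial(true_tree, n)
    if not truth:
        return 1.0
    return len(truth & nontrivial(inferred_tree, n)) / len(truth)


true_tree = [{1, 2}, {3, 4}]
inferred = [{0, 3, 4}]  # the same split as {1, 2}, written from the other side
print(shared_split_score(true_tree, inferred, 5))  # 0.5
```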
17
Acknowledgement
  • Software available upon request.
  • More information available at
    http://www.engr.uconn.edu/ywu
  • I want to thank Yun S. Song and Dan Gusfield.