Title: Inference of Complex Genealogical Histories In Populations and Application in Mapping Complex Traits
1 Inference of Complex Genealogical Histories In Populations and Application in Mapping Complex Traits
Dept. of Computer Science and Engineering
University of Connecticut
2 Genealogy: Evolutionary History of Genomic Sequences
Tells how sequences in a population are related
Helps to explain diseases: disease mutations occur on branches, and all descendants carry the mutations
Genealogy unknown. Only have SNP haplotypes (binary sequences).
Problem: inference of genealogy for unrelated haplotypes
Not easy, partly due to recombination
[Figure: genealogy whose leaves are the sequences in the current population, labeled diseased (case) or healthy (control)]
3 Recombination
One of the principal genetic forces shaping sequence variation within species
Two equal-length sequences generate a third, new equal-length sequence in the genealogy
Spatial order is important: different parts of the genome are inherited from different ancestors.
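A minimal sketch of a single-crossover recombination, assuming haplotypes are strings of 0/1 characters. The function name `recombine` and the toy parent sequences are illustrative and not taken from the slides.

```python
def recombine(left_parent: str, right_parent: str, breakpoint: int) -> str:
    """Return a recombinant haplotype.

    Sites before `breakpoint` are copied from `left_parent`; sites from
    `breakpoint` onward are copied from `right_parent`. Both parents must
    have the same length, so the child has that length too.
    """
    if len(left_parent) != len(right_parent):
        raise ValueError("parents must have equal length")
    if not 0 <= breakpoint <= len(left_parent):
        raise ValueError("breakpoint out of range")
    return left_parent[:breakpoint] + right_parent[breakpoint:]


# Toy parents (hypothetical): the child inherits its first three sites
# from one ancestor and the remaining sites from the other.
child = recombine("00000000", "11111111", 3)
print(child)  # 00011111
```

Because the prefix and suffix come from different parents, the sites on either side of the breakpoint can have different genealogies, which is why spatial order matters.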
[Figure: recombination example; sequence labels 110001111111001, 1100, 00000001111, 000110000001111]
4 Ancestral Recombination Graph (ARG)
[Figure: ARG with mutation and recombination events; nodes labeled with two-site sequences; leaves S1 00, S2 01, S3 10, S4 11]
Assumption: at most one mutation per site
[Figure: leaves S1 00, S2 01, S3 10, S4 10]
5 What is the Use of an ARG?
May look at the ARG directly. But for noisy data there is another way of using ARGs: an ARG represents a set of local trees!
[Figure: ARG for the data 0000, 0101, 0110, 1110, 1010, with node labels, and the local trees it represents]
Local trees: evolutionary history for different genomic regions between recombination breakpoints.
6 At which Local Tree Did Disease Mutations Occur?
Clear separation of cases/controls not expected for complex diseases
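Since cases and controls are not expected to separate cleanly, each branch of a local tree can only be scored for how unevenly cases fall beneath it. The sketch below uses a plain Pearson chi-square on a 2x2 table as one simple scoring choice. This is an assumption made for illustration and is not necessarily the statistic TMARG computes; the names `chi_square_2x2` and `score_clade` are hypothetical.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association signal
    return n * (a * d - b * c) ** 2 / denom


def score_clade(clade: set, cases: set, samples: set) -> float:
    """Score the branch above `clade` by case/control imbalance.

    `clade` holds the leaves below the branch, i.e. the sequences that
    would carry a mutation placed on that branch.
    """
    rest = samples - clade
    return chi_square_2x2(
        len(clade & cases),  # cases below the branch
        len(clade - cases),  # controls below the branch
        len(rest & cases),   # cases elsewhere in the tree
        len(rest - cases),   # controls elsewhere in the tree
    )


samples = {"h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"}
cases = {"h1", "h2", "h3", "h4"}
print(score_clade({"h1", "h2", "h3", "h5"}, cases, samples))  # 2.0 (partial separation)
print(score_clade({"h1", "h2", "h3", "h4"}, cases, samples))  # 8.0 (perfect separation)
```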
[Figure: local trees with leaves marked case or control]
7 How to Infer ARGs?
But we do not know the true ARG!
Goal: infer ARGs from haplotypes
First practical ARG association mapping method (Minichiello and Durbin 2006)
Uses plausible ARGs (a heuristic)
Less complex disease model: implicitly assumes one disease mutation with major effects.
My results (Wu, RECOMB 2007)
Generates ARGs with a provable property and works on a well-defined complex disease model
Focus on parsimonious history
8 Simulation Results (Wu 2007)
TMARG/MARGARITA: sample ARGs, decompose them into local trees, and look for association signals.
LATAG: infers local trees at focal points.
Average mapping error for 50 simulated datasets from Zollner and Pritchard
Comparison: TMARG (minARGs), TMARG (near-minARGs), LATAG (Z. P.), MARGARITA (M. D.). TMARG (my program) and MARGARITA are much faster than LATAG.
9 Preliminary Results: GAW16 Data
GAW16 data from the North American Rheumatoid Arthritis Consortium (NARAC): 868 cases and 1194 controls. Chromosome 1: 40929 SNPs.
Running TMARG on large-scale data
Break into non-overlapping windows
Run fastPHASE (Scheet and Stephens 2006) to obtain haplotypes
Run TMARG in chi-square mode
Caution: more investigation needed.
10 A Related Problem: Inference of Local Tree Topologies Directly (Wu 2008, Submitted)
11 Inference of Local Tree Topologies
Recall: an ARG represents a set of local trees.
Question: given SNP haplotypes, infer local tree topologies (one tree for each SNP site; ignore branch lengths)
Hein (1990, 1993)
Song and Hein (2003, 2005): enumerate all possible tree topologies at each site
12 Local Tree Topologies
Key technical difficulty: enumerating all tree topologies
Brute-force enumeration of local tree topologies is not feasible when the number of sequences is > 9
Trivial solution: create a tree for a SNP containing the single split induced by the SNP.
Always correct (assuming one mutation per site)
But not very informative: need more refined trees!
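Two small sketches make the trade-off concrete. The first counts rooted, leaf-labeled binary tree topologies, (2n-3)!! for n leaves, to show why enumeration blows up; whether the enumeration methods cited count exactly this class of trees is an assumption here. The second builds the trivial one-split tree for a SNP as a Newick-style string, assuming allele 0 is ancestral. Both function names are hypothetical.

```python
def count_rooted_binary_topologies(n_leaves: int) -> int:
    """Number of rooted, leaf-labeled binary trees: (2n-3)!! for n >= 2."""
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):  # product 3 * 5 * ... * (2n-3)
        count *= k
    return count


def single_split_tree(alleles: dict) -> str:
    """Newick-style tree containing only the split induced by one SNP.

    Carriers of allele 1 form a single unresolved clade; all other leaves
    hang off the root. Assumes allele 0 is the ancestral state.
    """
    derived = sorted(name for name, allele in alleles.items() if allele == 1)
    ancestral = sorted(name for name, allele in alleles.items() if allele == 0)
    if len(derived) < 2 or not ancestral:
        # The split is trivial, so the tree is an unresolved star.
        return "(" + ",".join(sorted(alleles)) + ");"
    return "(" + ",".join(ancestral + ["(" + ",".join(derived) + ")"]) + ");"


print(count_rooted_binary_topologies(10))  # 34459425
snp = {"A": 0, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}
print(single_split_tree(snp))  # (A,B,D,F,H,(C,E,G));
```

The one-split tree is consistent with the SNP but leaves both the carrier clade and the remaining leaves completely unresolved, which is the sense in which it is not very informative.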
Example SNP: A 0, B 0, C 1, D 0, E 1, F 0, G 1, H 0
13 How to Do Better? Neighboring Local Trees are Similar!
Nearby SNP sites provide hints!
Nearby local trees are often topologically similar
Recombination often alters only small parts of the trees
Key idea: reconstruct local trees by combining information from multiple nearby SNPs
14 RENT: REfining Neighboring Trees
Maintain for each SNP site a (possibly non-binary) tree topology
Initialize to a tree containing the split induced by the SNP
Gradually refine trees by adding new splits to the trees
Splits found by a set of rules (later)
Splits added early may be more reliable
Stop when the trees are binary or enough information is recovered
15 A Little Background: Compatibility
Matrix M (rows are sequences a to e, columns are sites 1 to 3):
     1 2 3
  a  0 0 0
  b  1 0 0
  c  0 0 1
  d  1 0 1
  e  0 1 1
Sites 1 and 2 are compatible, but 1 and 3 are incompatible.
Two sites (columns) p and q are incompatible if columns p and q contain all four ordered pairs (gametes) 00, 01, 10, 11. Otherwise p and q are compatible.
Easily extended to splits.
A split s is incompatible with tree T if s is incompatible with any one split in T. Two trees are compatible if their splits are pairwise compatible.
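The compatibility definitions above can be sketched as follows, using the matrix M from this slide with 0-based site indices. A split is represented by one of its two sides (here, the set of sequences carrying allele 1), and a tree by the set of its splits; these representations and all function names are assumptions made for illustration.

```python
from typing import FrozenSet, List, Set


def sites_compatible(haplotypes: List[str], p: int, q: int) -> bool:
    """Four-gamete test: sites p and q are incompatible iff 00, 01, 10, 11 all occur."""
    gametes = {(h[p], h[q]) for h in haplotypes}
    return not {("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")} <= gametes


def splits_compatible(a: FrozenSet[str], b: FrozenSet[str], taxa: FrozenSet[str]) -> bool:
    """Two splits are compatible iff at least one of the four side intersections is empty."""
    return any(len(part) == 0 for part in (a & b, a - b, b - a, taxa - (a | b)))


def split_fits_tree(split: FrozenSet[str], tree: Set[FrozenSet[str]], taxa: FrozenSet[str]) -> bool:
    """A split is compatible with a tree iff it is compatible with every split in the tree."""
    return all(splits_compatible(split, other, taxa) for other in tree)


# Matrix M from the slide: rows a..e, columns are sites 1..3.
M = ["000", "100", "001", "101", "011"]
print(sites_compatible(M, 0, 1))  # True  (sites 1 and 2)
print(sites_compatible(M, 0, 2))  # False (sites 1 and 3)

taxa = frozenset("abcde")
site1, site3 = frozenset("bd"), frozenset("cde")  # carriers of allele 1
print(splits_compatible(site1, site3, taxa))  # False
```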
16 Fully-Compatible Region: Simple Case
A region of consecutive SNP sites where these SNPs are pairwise compatible.
May indicate that no topology-altering recombination occurred within the region
Rule: for a site s in such a region, add the split of any other site in the region to the tree at s.
Compatibility is a very strong property and is unlikely to arise by chance.
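A minimal sketch of the fully-compatible-region rule. A maximal region around a site is not unique in general, so this sketch makes one arbitrary choice: it extends the region to the right as far as pairwise compatibility allows, then to the left. That growth order, the function names, and the use of carrier sets as splits are assumptions, not details given on the slides.

```python
from typing import FrozenSet, List, Tuple


def sites_compatible(haplotypes: List[str], p: int, q: int) -> bool:
    """Four-gamete test on two columns of a 0/1 haplotype matrix."""
    gametes = {(h[p], h[q]) for h in haplotypes}
    return len(gametes) < 4  # binary data: four distinct pairs means incompatible


def region_around(haplotypes: List[str], s: int) -> Tuple[int, int]:
    """Return (lo, hi), an inclusive range of pairwise compatible sites containing s."""
    n_sites = len(haplotypes[0])
    lo = hi = s
    # Grow right while the next site is compatible with every site already inside.
    while hi + 1 < n_sites and all(
        sites_compatible(haplotypes, hi + 1, q) for q in range(lo, hi + 1)
    ):
        hi += 1
    # Then grow left under the same condition.
    while lo - 1 >= 0 and all(
        sites_compatible(haplotypes, lo - 1, q) for q in range(lo, hi + 1)
    ):
        lo -= 1
    return lo, hi


def region_splits(haplotypes: List[str], names: List[str], s: int) -> List[FrozenSet[str]]:
    """Splits (as carrier sets of allele 1) that the rule would add to the tree at s."""
    lo, hi = region_around(haplotypes, s)
    return [
        frozenset(name for name, h in zip(names, haplotypes) if h[site] == "1")
        for site in range(lo, hi + 1)
    ]


M = ["000", "100", "001", "101", "011"]  # matrix M from the compatibility slide
print(region_around(M, 0))  # (0, 1)
print(region_around(M, 2))  # (1, 2)
```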
17 Split Propagation: More General Rule
Three consecutive sites 1, 2 and 3. Sites 1 and 2 are incompatible. Does site 3 matter for the tree at site 1?
Trees at sites 1 and 2 are different.
Suppose site 3 is compatible with sites 1 and 2. Then
Site 3 may indicate a shared subtree in both trees at sites 1 and 2.
Rule: a split propagates in both directions until it reaches an incompatible tree.
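The propagation rule can be sketched with each local tree stored as a set of splits and each split stored as one of its sides. This representation, the function names, and the six-leaf toy data are assumptions for illustration; the actual RENT implementation may differ.

```python
from typing import FrozenSet, List, Set

Split = FrozenSet[str]


def splits_compatible(a: Split, b: Split, taxa: Split) -> bool:
    """Compatible iff one of the four intersections of the sides is empty."""
    return any(len(part) == 0 for part in (a & b, a - b, b - a, taxa - (a | b)))


def propagate_split(trees: List[Set[Split]], origin: int, split: Split, taxa: Split) -> None:
    """Add `split` to the tree at `origin` and to neighboring trees in both
    directions, stopping in each direction at the first incompatible tree."""
    trees[origin].add(split)
    for step in (-1, 1):
        i = origin + step
        while 0 <= i < len(trees) and all(
            splits_compatible(split, other, taxa) for other in trees[i]
        ):
            trees[i].add(split)
            i += step


# Toy version of the three-site situation: sites 1 and 2 are incompatible,
# and site 3 is compatible with both.
taxa = frozenset("abcdef")
s1, s2, s3 = frozenset("ab"), frozenset("bc"), frozenset("ef")
trees = [{s1}, {s2}, {s3}]

propagate_split(trees, 2, s3, taxa)  # reaches the trees at sites 2 and 1
propagate_split(trees, 0, s1, taxa)  # blocked at once by the tree at site 2
print([sorted(sorted(s) for s in tree) for tree in trees])
```

In the toy run the split from site 3 ends up in all three trees, while the split from site 1 stays confined to its own tree.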
18 One Subtree-Prune-Regraft (SPR) Event
Recombination simulated by SPR.
The rest of the two trees (without the pruned subtrees) remains the same
Rule: find a compatible subtree Ts in neighboring trees T1 and T2 such that the rest of T1 and T2 (with Ts removed) are compatible. Then jointly refine T1 - Ts and T2 - Ts before adding back Ts.
More complex rules are possible.
19 Simulation
Hudson's program ms (with known coalescent local tree topologies): 100 datasets for each setting.
Handles much larger data than Song and Hein's method, and performs better or similarly on small data.
Test: local tree topology recovery, scored by Song and Hein's shared-split measure
[Figure: simulation results; surviving labels: 15, 50]
20 Acknowledgement
More information available at http://www.engr.uconn.edu/ywu
I want to thank
Yun S. Song
And National Science Foundation and UConn Research Foundation