CSE 291: Advanced Topics in Computational Biology L3: population substructure admixture mapping

Provided by: vineet50
Learn more at: https://cseweb.ucsd.edu
1
CSE 291: Advanced Topics in Computational Biology
L3: population sub-structure/admixture mapping
  • Vineet Bafna/Pavel Pevzner

www.cse.ucsd.edu/classes/sp05/cse291-a
2
Ancestral Recombination Graph
3
Review: Coalescent theory applications
  • Coalescent simulations allow us to test various
    hypotheses. The coalescent/ARG is usually not
    inferred, unlike in phylogenies.

4
Coalescent theory: example
  • Ex: 1400 bp at the Sod locus in Drosophila.
  • 10 taxa
  • 5 were identical. The other 5 had 55 mutations.
  • Q: Is this a chance event, or is there selection
    for this haplotype?

5
Coalescent application
  • 10000 coalescent simulations were performed on 10
    taxa.
  • 55 mutations were placed on the coalescent branches.
  • Count the number of times 5 lineages are
    identical.
  • The event happened in 1.1% of the cases.
  • Conclusion: selection, or some other mechanism,
    explains this data.
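
The simulation test above can be sketched roughly as follows. This is a minimal illustration, not the procedure used for the Sod data: it assumes a standard Kingman coalescent with constant population size, an infinite-sites model, a fixed number of mutations dropped on branches in proportion to branch length, and it reads "5 lineages are identical" as "at least 5 sampled sequences share one haplotype". All function names are hypothetical.

```python
import random
from collections import Counter

def simulate_branches(n, rng):
    """One Kingman coalescent tree: list of (leaf_set, branch_length)."""
    active = [(frozenset([i]), 0.0) for i in range(n)]
    branches = []
    while len(active) > 1:
        k = len(active)
        # Waiting time to the next coalescence with k lineages.
        t = rng.expovariate(k * (k - 1) / 2)
        active = [(leaves, length + t) for leaves, length in active]
        i, j = rng.sample(range(k), 2)
        a, b = active[i], active[j]
        branches.extend([a, b])  # both branches end at this coalescence
        active = [x for idx, x in enumerate(active) if idx not in (i, j)]
        active.append((a[0] | b[0], 0.0))
    return branches

def largest_identical_group(n, n_mut, rng):
    """Size of the largest set of identical sequences after placing mutations."""
    branches = simulate_branches(n, rng)
    weights = [length for _, length in branches]
    # Each mutation lands on a branch with probability proportional to its length.
    hits = rng.choices(range(len(branches)), weights=weights, k=n_mut)
    haplotypes = [set() for _ in range(n)]
    for mutation_id, b in enumerate(hits):
        for leaf in branches[b][0]:
            haplotypes[leaf].add(mutation_id)
    counts = Counter(frozenset(h) for h in haplotypes)
    return max(counts.values())

def fraction_with_identical(n=10, n_mut=55, group=5, reps=2000, seed=0):
    """Fraction of simulated trees in which >= `group` sequences are identical."""
    rng = random.Random(seed)
    hits = sum(largest_identical_group(n, n_mut, rng) >= group
               for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    print(fraction_with_identical(reps=10000))
```

A small fraction under this neutral model would point toward selection or some other mechanism, which is the logic of the slide; the exact value depends on the modeling assumptions listed above.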

6
Coalescent example: Out of Africa hypothesis
  • Looking at lineage specific mutations might help
    discard the candelabra model. How?
  • How do we decide between the multi-regional and
    Out-of-Africa model? How do we decide if the
    ancestor was African?

7
Coalescent simulation
  • Example 2
  • Sample from a region. What is the recombination
    rate in this region?
  • Gabriel et al. Science 2002.
  • 3 populations were sampled at multiple regions
    spanning the genome
  • 54 regions (average size 250 Kb)
  • SNP density: 1 SNP per 2 Kb
  • 90 Individuals from Nigeria (Yoruban)
  • 93 Europeans
  • 42 Asian
  • 50 African American

8
Population specific recombination
  • D' was used as the LD measure between SNP pairs.
  • Each SNP pair fell into one of the following categories:
  • Strong LD
  • Strong evidence for recombination
  • Others (13% of cases)
  • Can this be used to test the Out-of-Africa hypothesis?

Gabriel et al., Science 2002
9
Haplotype Blocks
  • A haplotype block is a region of low
    recombination.
  • Define a region as a block if fewer than 5% of the
    pairs show strong recombination.
  • Much of the genome is in blocks.
  • The distribution of block sizes varies across
    populations.
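
The block definition above can be written as a one-line check. This sketch assumes each SNP pair in the region has already been classified into one of the three categories from the previous slide; the label strings and the function name are hypothetical, and the fraction is taken over all pairs as stated on the slide.

```python
from collections import Counter

STRONG_LD = "strong_ld"
RECOMBINATION = "recombination"
OTHER = "other"

def is_block(pair_labels, max_recomb_fraction=0.05):
    """True if fewer than 5% of the SNP pairs show strong recombination."""
    if not pair_labels:
        raise ValueError("need at least one SNP pair")
    counts = Counter(pair_labels)
    return counts[RECOMBINATION] / len(pair_labels) < max_recomb_fraction

if __name__ == "__main__":
    region = [STRONG_LD] * 96 + [RECOMBINATION] * 4
    print(is_block(region))
```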

10
Testing Out-of-Africa
  • Generate simulations with and without migration.
  • Check size of haplotype blocks.
  • Does it vary when migrations are allowed?
  • When the new population has a bottleneck?
  • If there were a bottleneck that created European
    and Asian populations, can we say anything about
    frequency of alleles that are African specific?
  • Should they be high frequency, or low frequency
    in African populations?

11
Haplotype Block implications
  • The genome is mostly partitioned into haplotype
    blocks.
  • Within a block, there is extensive LD.
  • Is this good, or bad, for association mapping?

12
Population Sub-structure
13
Population sub-structure can increase LD
  • Consider two populations that were isolated and
    evolving independently.
  • They might have different allele frequencies in
    some regions.
  • Pick two regions that are far apart (LD is very
    low, close to 0)

14
Recent admixing of populations
  • If the populations came together recently (Ex:
    African and European populations), artificial LD
    might be created.
  • |D| = 0.15 (instead of 0.01), a more than 10-fold increase
  • This spurious LD might lead to false associations.
  • Other genetic events can cause LD to arise, and
    one needs to be careful.

Pop. A: 0..1 0..1 0..0 1..1 0..1 0..1 0..1 0..1 0..1
Pop. B: 1..0 1..0 0..0 1..1 1..0 1..0 1..0 1..0 1..0
Pop. A+B: $p_1 = 0.5$, $q_1 = 0.5$, $P_{11} = 0.1$, $D = P_{11} - p_1 q_1 = 0.1 - 0.25 = -0.15$
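
The spurious LD can be checked numerically. The sketch below assumes ten haplotypes per population (eight of the dominant type plus one 0..0 and one 1..1), which reproduces the numbers quoted on the slide exactly; `ld_D` is a hypothetical helper computing $D = P_{11} - p_1 q_1$.

```python
def ld_D(haplotypes):
    """D = P11 - p1*q1 for two biallelic markers given as (a, b) tuples."""
    n = len(haplotypes)
    p1 = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at marker 1
    q1 = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at marker 2
    p11 = sum(1 for h in haplotypes if h == (1, 1)) / n
    return p11 - p1 * q1

# Assumed toy data: ten haplotypes per population.
pop_a = [(0, 1)] * 8 + [(0, 0), (1, 1)]
pop_b = [(1, 0)] * 8 + [(0, 0), (1, 1)]

if __name__ == "__main__":
    print("D within A:", ld_D(pop_a))
    print("D within B:", ld_D(pop_b))
    print("D pooled  :", ld_D(pop_a + pop_b))
```

Within each population D is about 0.01, while pooling the two gives a magnitude of about 0.15, even though the markers are unlinked.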
15
Determining population sub-structure
  • Given a mix of people, can you sub-divide them
    into ethnic populations?
  • Turn the problem of spurious LD into a clue.
  • Find markers that are too far apart to show LD
  • If they do show LD (correlation), that shows the
    existence of multiple populations.
  • Sub-divide them into populations so that LD
    disappears.

16
Determining Population sub-structure
  • Same example as before
  • The two markers are too far apart to show any LD,
    yet they do show LD.
  • However, if you split them so that all 0..1 are
    in one population and all 1..0 are in another, LD
    disappears.

17
Iterative algorithm for population sub-structure
  • Define
  • $N$: number of individuals (each has a single
    chromosome)
  • $k$: number of sub-populations
  • $Z \in \{1..k\}^N$ is a vector giving the
    sub-population of each individual.
  • $Z_i = k \Rightarrow$ individual $i$ is assigned to
    population $k$
  • $X_{i,j}$: allelic value for individual $i$ in position
    $j$
  • $P_{k,j,l}$: frequency of allele $l$ at position $j$ in
    population $k$

18
Example
  • Ex: consider the following assignment
  • $P_{1,1,0} = 0.9$
  • $P_{2,1,0} = 0.1$

Individuals 1-10:  0..1 0..1 0..0 1..1 0..1 0..1 0..1 0..1 0..1 0..1
Z (1-10):          1 1 1 1 1 1 1 1 1 1
Individuals 11-20: 1..0 1..0 0..0 1..1 1..0 1..0 1..0 1..0 1..0 1..0
Z (11-20):         2 2 2 2 2 2 2 2 2 2
19
Goal
  • $X$ is known.
  • $P$, $Z$ are unknown.
  • The goal is to estimate $\Pr(P,Z \mid X)$.
  • Various learning techniques can be employed:
  • $\max_{P,Z} \Pr(X \mid P,Z)$ (max likelihood estimate)
  • $\max_{P,Z} \Pr(X \mid P,Z)\,\Pr(P,Z)$ (MAP)
  • Sample $P,Z$ from $\Pr(P,Z \mid X)$
  • Here a Bayesian (MCMC) scheme is employed to
    sample from $\Pr(P,Z \mid X)$. We will only consider a
    simplified version.
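
For reference, the likelihood in these objectives can be written out explicitly. The form below is a sketch that assumes individuals are independent and that positions within a sub-population are in linkage equilibrium, which is the assumption the sampling slides rely on; it uses only the symbols $X$, $Z$, $P$ defined earlier.

```latex
\Pr(X \mid P, Z) \;=\; \prod_{i=1}^{N} \prod_{j} P_{Z_i,\,j,\,X_{i,j}},
\qquad
\Pr(P, Z \mid X) \;\propto\; \Pr(X \mid P, Z)\,\Pr(P, Z)
```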

20
Algorithm: Structure
  • Iteratively estimate
  • $(Z^{(0)},P^{(0)}), (Z^{(1)},P^{(1)}), \ldots, (Z^{(m)},P^{(m)})$
  • After convergence, $Z^{(m)}$ is the answer.
  • Iteration
  • Guess $Z^{(0)}$
  • For $m = 1, 2, \ldots$
  • Sample $P^{(m)}$ from $\Pr(P \mid X, Z^{(m-1)})$
  • Sample $Z^{(m)}$ from $\Pr(Z \mid X, P^{(m)})$
  • How is this sampling done?

21
Example
  • Choose $Z$ at random, so each individual is
    assigned to be in one of 2 populations. See
    example.
  • Now, we need to sample $P^{(1)}$ from $\Pr(P \mid X, Z^{(0)})$
  • Simply count
  • $N_{k,j,l}$: number of people in population $k$ which
    have allele $l$ in position $j$
  • $p_{k,j,l} = N_{k,j,l} / N_{k,j,\cdot}$, where $N_{k,j,\cdot} = \sum_l N_{k,j,l}$

Individuals 1-10:  0..1 0..1 0..0 1..1 0..1 0..1 0..1 0..1 0..1 0..1
Z (1-10):          1 2 2 1 1 2 1 2 1 2
Individuals 11-20: 1..0 1..0 0..0 1..1 1..0 1..0 1..0 1..0 1..0 1..0
Z (11-20):         1 2 2 1 1 2 1 2 2 1
22
Example
  • $N_{k,j,l}$: number of people in population $k$ which
    have allele $l$ in position $j$
  • $p_{k,j,l} = N_{k,j,l} / N_{k,j,\cdot}$
  • $N_{1,1,0} = 4$
  • $N_{1,1,1} = 6$
  • $p_{1,1,0} = 4/10$
  • $p_{1,2,0} = 4/10$
  • Thus, we can sample $P^{(m)}$

Individuals 1-10:  0..1 0..1 0..0 1..1 0..1 0..1 0..1 0..1 0..1 0..1
Z (1-10):          1 2 2 1 1 2 1 2 1 2
Individuals 11-20: 1..0 1..0 0..0 1..1 1..0 1..0 1..0 1..0 1..0 1..0
Z (11-20):         1 2 2 1 1 2 1 2 2 1
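
The counting step can be reproduced in a few lines. This sketch encodes the 20 two-marker haplotypes and the random assignment from this example, assuming the first ten entries of the assignment vector belong to the 0..1-type row and the last ten to the 1..0-type row. `allele_frequencies` is a hypothetical helper; keys are `(k, j, l)` with 1-based position `j`, and allele counts of zero are simply absent from the result.

```python
from collections import defaultdict

def allele_frequencies(X, Z):
    """Return p[(k, j, l)] = N[k,j,l] / N[k,j,.] by simple counting."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for x_i, z_i in zip(X, Z):
        for j, allele in enumerate(x_i, start=1):
            counts[(z_i, j, allele)] += 1
            totals[(z_i, j)] += 1
    return {key: n / totals[key[:2]] for key, n in counts.items()}

# Haplotypes (marker 1, marker 2) for individuals 1..20.
X = ([(0, 1)] * 2 + [(0, 0), (1, 1)] + [(0, 1)] * 6 +
     [(1, 0)] * 2 + [(0, 0), (1, 1)] + [(1, 0)] * 6)
# Random initial assignment from the example.
Z = [1, 2, 2, 1, 1, 2, 1, 2, 1, 2,
     1, 2, 2, 1, 1, 2, 1, 2, 2, 1]

if __name__ == "__main__":
    p = allele_frequencies(X, Z)
    print(p[(1, 1, 0)], p[(1, 2, 0)])
```

In the full Bayesian scheme P would be drawn from a posterior distribution rather than set to the raw counts; the counting version is the simplification used in these slides.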
23
Sampling Z
  • $\Pr[Z_1 = 1] = \Pr[\text{"01" belongs to population 1}]$?
  • We know that each position should be in linkage
    equilibrium and independent.
  • $\Pr[\text{"01"} \mid \text{Population 1}] = p_{1,1,0}\, p_{1,2,1} = (4/10)(6/10) = 0.24$
  • $\Pr[\text{"01"} \mid \text{Population 2}] = p_{2,1,0}\, p_{2,2,1} = (6/10)(4/10) = 0.24$
  • $\Pr[Z_1 = 1] = 0.24/(0.24 + 0.24) = 0.5$

Assuming HWE and LE
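
The same calculation as a small function. This is a sketch that assumes equal prior weight on each population and independence across positions (LE); `assignment_posterior` is a hypothetical name, and the frequency table holds the values from this example.

```python
def assignment_posterior(haplotype, p, populations=(1, 2)):
    """Pr[Z_i = k | X_i, P] for each k, assuming LE and a uniform prior on k."""
    likelihood = {}
    for k in populations:
        prob = 1.0
        for j, allele in enumerate(haplotype, start=1):
            prob *= p.get((k, j, allele), 0.0)
        likelihood[k] = prob
    total = sum(likelihood.values())
    if total == 0:
        raise ValueError("haplotype has zero likelihood in every population")
    return {k: v / total for k, v in likelihood.items()}

# Frequencies p[(k, j, l)] from the example.
p = {
    (1, 1, 0): 0.4, (1, 1, 1): 0.6, (1, 2, 0): 0.4, (1, 2, 1): 0.6,
    (2, 1, 0): 0.6, (2, 1, 1): 0.4, (2, 2, 0): 0.6, (2, 2, 1): 0.4,
}

if __name__ == "__main__":
    print(assignment_posterior((0, 1), p))
```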
24
Sampling
  • Suppose, during the iteration, there is a bias.
  • Then, in the next step of sampling Z, we will do
    the right thing
  • $\Pr[\text{"01"} \mid \text{pop. 1}] = p_{1,1,0}\, p_{1,2,1} = 0.7 \times 0.7 = 0.49$
  • $\Pr[\text{"01"} \mid \text{pop. 2}] = p_{2,1,0}\, p_{2,2,1} = 0.3 \times 0.3 = 0.09$
  • $\Pr[Z_1 = 1] = 0.49/(0.49 + 0.09) \approx 0.84$
  • $\Pr[Z_6 = 1] = 0.49/(0.49 + 0.09) \approx 0.84$
  • Eventually all "01" will become one population, and
    all "10" will become a second population.

Individuals 1-10:  0..1 0..1 0..0 1..1 0..1 0..1 0..1 0..1 0..1 0..1
Z (1-10):          1 1 1 2 1 2 1 2 1 1
Individuals 11-20: 1..0 1..0 0..0 1..1 1..0 1..0 1..0 1..0 1..0 1..0
Z (11-20):         2 2 2 1 2 2 1 2 2 1
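
Putting the two steps together gives a rough sketch of the simplified iteration, started from the biased assignment of this slide. It is not the full Bayesian sampler: P is set by counting rather than sampled, all individuals are resampled at once, and the function names, iteration count, and seed are illustrative assumptions.

```python
import random

X = ([(0, 1)] * 2 + [(0, 0), (1, 1)] + [(0, 1)] * 6 +
     [(1, 0)] * 2 + [(0, 0), (1, 1)] + [(1, 0)] * 6)
Z_BIASED = [1, 1, 1, 2, 1, 2, 1, 2, 1, 1,
            2, 2, 2, 1, 2, 2, 1, 2, 2, 1]

def estimate_p(X, Z, pops):
    """p[(k, j, l)]: fraction of population k carrying allele l at position j."""
    p = {}
    for k in pops:
        members = [x for x, z in zip(X, Z) if z == k]
        for j in range(1, len(X[0]) + 1):
            for allele in (0, 1):
                n = sum(1 for x in members if x[j - 1] == allele)
                p[(k, j, allele)] = n / len(members) if members else 0.0
    return p

def prob_pop(x, p, k, pops):
    """Pr[Z_i = k | x, P], assuming LE and equal prior weight per population."""
    def likelihood(q):
        out = 1.0
        for j, allele in enumerate(x, start=1):
            out *= p[(q, j, allele)]
        return out
    return likelihood(k) / sum(likelihood(q) for q in pops)

def run(X, Z, pops=(1, 2), iterations=50, seed=0):
    """Alternate the counting step for P and the sampling step for Z."""
    rng = random.Random(seed)
    Z = list(Z)
    for _ in range(iterations):
        p = estimate_p(X, Z, pops)
        Z = [rng.choices(pops, weights=[prob_pop(x, p, k, pops) for k in pops])[0]
             for x in X]
    return Z

if __name__ == "__main__":
    p = estimate_p(X, Z_BIASED, (1, 2))
    print(prob_pop((0, 1), p, 1, (1, 2)))
    print(run(X, Z_BIASED))
```

The first printed value is the probability computed on this slide; the final assignment depends on the random seed, and the slide's expectation is that the "01" and "10" haplotypes end up in separate populations.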
25
Allowing for admixture
  • Define $q_{i,k}$ as the fraction of individual $i$ that
    originated from population $k$.
  • Iteration
  • Guess $Z^{(0)}$
  • For $m = 1, 2, \ldots$
  • Sample $P^{(m)}, Q^{(m)}$ from $\Pr(P,Q \mid X, Z^{(m-1)})$
  • Sample $Z^{(m)}$ from $\Pr(Z \mid X, P^{(m)}, Q^{(m)})$

26
Estimating Z (admixture case)
  • Instead of estimating $\Pr(Z_i = k \mid X,P,Q)$ (origin
    of individual $i$ is $k$), we estimate
    $\Pr(Z_{(i,j,l)} = k \mid X,P,Q)$

[Figure: labels (i,1), (i,2), and j]
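
One way to picture the per-allele update is sketched below. The slide does not spell out the formula, so this assumes the usual admixture-model form in which the origin of a single allele copy is weighted by the individual's admixture fraction $q_{i,k}$ times the population allele frequency; the function name and the example values of `q_i` are hypothetical.

```python
def allele_origin_posterior(allele, j, q_i, p):
    """Pr[origin of one allele copy at position j is k], proportional to
    q_i[k] * p[(k, j, allele)] (assumed form, see lead-in)."""
    weights = {k: q_ik * p[(k, j, allele)] for k, q_ik in q_i.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("allele has zero probability under every population")
    return {k: w / total for k, w in weights.items()}

# Allele-0 frequencies at position 1, as in the earlier example slide.
p = {(1, 1, 0): 0.9, (1, 1, 1): 0.1, (2, 1, 0): 0.1, (2, 1, 1): 0.9}

if __name__ == "__main__":
    # Hypothetical individual with 20% ancestry from population 1.
    print(allele_origin_posterior(0, 1, {1: 0.2, 2: 0.8}, p))
```

With equal admixture fractions this reduces to the non-admixed calculation applied to a single position.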
27
Results on admixture prediction
28
Results: Thrush data
  • For each individual, q(i) is plotted as the
    distance to the opposite side of the triangle.
  • The assignment is reliable, and there is evidence
    of admixture.

29
Population Structure
  • 377 locations (loci) were sampled in 1000 people
    from 52 populations.
  • 6 genetic clusters were obtained, which
    corresponded to 5 geographic regions (Rosenberg
    et al. Science 2003)

[Figure: clusters labeled Oceania, Eurasia, East Asia, America, Africa]
30
Population sub-structure: research problem
  • Systematically explore the effect of admixture.
    Can admixture be predicted for a locus, or for an
    individual?
  • The sampling approach may or may not be
    appropriate. Formulate as an optimization/learning
    problem.
  • (w/out admixture) Assign individuals to
    sub-populations so as to maximize linkage
    equilibrium and Hardy-Weinberg equilibrium in
    each of the sub-populations.
  • (w/ admixture) Assign (individuals, loci) to
    sub-populations.

31
Admixture mapping