Haplotype Blocks

1
Haplotype Blocks
  • An Overview
  • A. Polanski
  • Department of Statistics
  • Rice University

2
Key Papers
  1. N. Patil et al., (2001), Blocks of Limited
    Haplotype Diversity Revealed by High-Resolution
    Scanning of Human Chromosome 21, Science, vol.
    294, pp. 1719-1723
  2. N. Wang et al., (2002), Distribution of
    Recombination Crossovers and the Origin of
    Haplotype Blocks: The Interplay of Population
    History, Recombination and Mutation, Am. J. Hum.
    Genet., vol. 71, pp. 1227-1234
  3. K. Zhang et al., (2002), A Dynamic Programming
    Algorithm for Haplotype Block Partitioning, PNAS,
    vol. 99, pp. 7335-7339

3
Supplementary Papers
  1. R. Hudson, N. Kaplan, (1985), Statistical
    Properties of the Number of Recombination Events
    in The History of a Sample of DNA sequences,
    Genetics, vol. 111, pp. 147-164
  2. R. Hudson, (2002), Generating Samples under a
    Wright-Fisher Neutral Model of Genetic Variation,
    Bioinformatics, vol. 18, pp. 337-338
  3. D. Reich et al., (2001), Linkage Disequilibrium
    in the Human Genome, Nature, vol. 411, pp. 199-204

4
What are Haplotype Blocks?
  • Haplotype block: a sequence of contiguous
    markers on DNA, homogeneous according to some
    criterion
  • Markers: Single Nucleotide Polymorphisms (SNPs)

5
Data (Patil et al. 2001)
  • Chromosome 21
  • Physically separated the two copies of chromosome
    21 using a rodent-human somatic cell hybrid
    technique
  • Sample of 20 copies of chromosome 21 (32397439
    bases)
  • Found 35989 SNPs

6
Fig. 2 from (Patil et al. 2001)
7
[Figure: the data as a binary matrix. Rows: the 20 chromosome copies; columns: SNP no. i, i = 1, 2, ..., 35989]
8
Problems
9
How do we determine boundaries between blocks?
  1. Average value of the standardized coefficient of
    linkage disequilibrium is greater than some
    threshold (Wang et al. 2002, Reich et al. 2001)
  2. Infer sites in the sample of DNA sequences where
    recombination events happened in the past history
    (Wang et al. 2002, Hudson, 2002)
  3. Chromosome coverage: minimum number of SNPs to
    account for the majority of haplotypes (Patil et al.
    2001, Zhang et al. 2002)
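
As an illustration of criterion 1, here is a minimal sketch of a standardized linkage disequilibrium coefficient for one pair of SNPs. It assumes the coefficient meant is Lewontin's |D'| and that haplotypes are given as strings of '0'/'1' alleles; the function name `d_prime` is hypothetical and not taken from the cited papers.

```python
def d_prime(haplotypes, i, j):
    """Lewontin's |D'| between SNP columns i and j (assumed definition).

    haplotypes: list of equal-length strings over {'0', '1'}.
    Returns None if either SNP is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_i = sum(h[i] == '1' for h in haplotypes) / n   # frequency of allele 1 at SNP i
    p_j = sum(h[j] == '1' for h in haplotypes) / n   # frequency of allele 1 at SNP j
    p_ij = sum(h[i] == '1' and h[j] == '1' for h in haplotypes) / n
    if p_i in (0.0, 1.0) or p_j in (0.0, 1.0):
        return None
    d = p_ij - p_i * p_j                              # raw disequilibrium D
    if d >= 0:
        d_max = min(p_i * (1 - p_j), (1 - p_i) * p_j)
    else:
        d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
    return abs(d) / d_max                             # standardized to [0, 1]


# Two perfectly associated SNPs, then two independent ones
print(d_prime(["00", "00", "11", "11"], 0, 1))
print(d_prime(["00", "01", "10", "11"], 0, 1))
```

Averaging such values over the SNP pairs in a candidate region and comparing with a threshold would give one possible block criterion; the threshold itself is a free choice.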

10
What evolutionary forces are responsible for
haplotype block formation?
  • Mutation
  • Genetic drift
  • Recombination
  • Recombination hot spots

11
Methods
12
Method 1 (Wang et al. 2002)
Infer sites in the sample of DNA sequences where
recombination events happened in the past
history
13
Three gamete condition
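  • Consider a pair of SNPs, SNP1 and SNP2. If there
    was no recombination between SNP1 and SNP2, they
    must satisfy the three-gamete condition: at most
    three of the four possible gametes are present

[Diagram: genealogy without recombination. The ancestral gamete is AC; a mutation A→G at SNP1 gives GC, and a later mutation C→T at SNP2 gives GT, so only the three gametes AC, GC and GT occur]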
14
Four gamete test (Hudson and Kaplan, 1985)
  • If we see all four gametes at SNP1 and SNP2

SNP1  SNP2
A     C
G     C
G     T
A     T

then (assuming each SNP arose by a single mutation)
there must have been a recombination event
between these sites in their past history.
This is the four-gamete test (4GT)
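
A minimal sketch of the test for one pair of SNPs, assuming haplotypes are stored as strings with one character per SNP; the function name `four_gamete_test` is hypothetical.

```python
def four_gamete_test(haplotypes, i, j):
    """Return True if all four gametes are observed at SNP columns i and j.

    Assumes both SNPs are biallelic, so at most four distinct
    (allele at i, allele at j) pairs can occur.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


# Three gametes: compatible with a history without recombination
print(four_gamete_test(["AC", "GC", "GT"], 0, 1))        # False
# All four gametes: the test signals a recombination event
print(four_gamete_test(["AC", "GC", "GT", "AT"], 0, 1))  # True
```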
15
Array of pairwise 4GT test results
  • Hudson and Kaplan, 1985

D = [dij], where
dij = 0, if there are fewer than 4 gametes at SNPs i and j
dij = 1, if there are 4 gametes at SNPs i and j
What is the minimal number of recombinations that
could explain the observed data? Statistic FR
(Hudson and Kaplan, 1985)
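
The array of pairwise results, and a lower bound on the number of recombinations derived from it, can be sketched as follows. This is an illustration in the spirit of Hudson and Kaplan (1985), not their code: it counts the largest set of SNP intervals that each show four gametes and do not overlap, since each such interval needs its own recombination event. The function names are hypothetical.

```python
def pairwise_4gt(haplotypes):
    """Symmetric 0/1 array: d[i][j] = 1 if SNPs i and j show all four gametes."""
    m = len(haplotypes[0])
    d = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if len({(h[i], h[j]) for h in haplotypes}) == 4:
                d[i][j] = d[j][i] = 1
    return d


def min_recombinations(d):
    """Maximum number of non-overlapping intervals (i, j) with d[i][j] = 1."""
    m = len(d)
    # Store intervals as (right, left) so that sorting orders them by right end
    intervals = sorted((j, i) for i in range(m) for j in range(i + 1, m) if d[i][j])
    count, last_right = 0, 0
    for right, left in intervals:
        if left >= last_right:      # shares at most an endpoint with the last pick
            count += 1
            last_right = right
    return count


haps = ["000", "011", "101", "110"]
d = pairwise_4gt(haps)
print(d)
print(min_recombinations(d))
```

Picking intervals in order of their right endpoint is the standard greedy solution for the largest set of disjoint intervals.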
16
Fig. 1 from Wang et al., 2002
[Figure: the array D, with Block 1, Block 2 and Block 3 labeled]
17
Wang et al., 2002 - Study
  • R. Hudson's program for simulating genealogies
    with mutation, drift and recombination under
    various demographic scenarios
  • Study of dependence of average lengths of blocks
    on different factors
  • Comparison of simulation results to data from
    Patil et al., 2001

18
Dependence of average lengths of blocks on
recombination frequency
19
... on sample size
20
... on mutation intensity
21
Comparison to data from Patil et al. 2001
  • Compute distribution of haplotype block lengths
    in the data from Patil et al. 2001
  • Try to tune the parameters θ (mutation) and R
    (recombination) to obtain a similar
    distribution in the simulations

22
Failed
23
Try a mixture of two different recombination
frequencies - better
24
Method 2 (Patil et al. 2001)
Chromosome coverage: minimum number of SNPs to
account for the majority of haplotypes
25
Fig. 2 from (Patil et al. 2001)
26
Problem formulation
  • Define block boundaries to minimize the number of
    SNPs that distinguish at least α percent of the
    haplotypes in each block

27
Common haplotypes
  • Those represented more than once in the block

28
Condition
  • Common haplotypes must constitute at least α = 80
    percent of all haplotypes in the block
  • Blocks that do not satisfy this are not allowed
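
The condition can be checked directly on a sample. Below is a minimal sketch, assuming haplotypes are strings of '0'/'1' alleles without missing data and a block is the half-open column range `start..end-1`; the function names and the parameter name `alpha` are hypothetical.

```python
from collections import Counter


def common_haplotype_coverage(haplotypes, start, end):
    """Fraction of chromosomes whose haplotype in the block occurs more than once."""
    counts = Counter(h[start:end] for h in haplotypes)
    covered = sum(c for c in counts.values() if c > 1)
    return covered / len(haplotypes)


def is_allowed_block(haplotypes, start, end, alpha=0.80):
    """True if common haplotypes make up at least a fraction alpha of the block."""
    return common_haplotype_coverage(haplotypes, start, end) >= alpha


haps = ["0101"] * 4 + ["1010"] * 4 + ["1111", "0000"]
print(common_haplotype_coverage(haps, 0, 4))  # 8 of 10 chromosomes are covered
print(is_allowed_block(haps, 0, 4))
```

Extending a block can only split groups of identical haplotypes, so the coverage never increases as a block grows.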

29
Fragment of Fig. 2 from Patil et al., 2001
30
Notation
  • B: block, defined by the numbers of its SNPs,
  • e.g., B = {45, 46, ..., 50}, or B = {i, i+1, ..., j}
  • L(B): length of the block (number of SNPs)
  • f(B): minimum number of SNPs required to
    distinguish common haplotypes
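
For small blocks, f(B) can be found by exhaustive search over subsets of SNPs. The sketch below is only an illustration: its running time grows exponentially with block length, the function name is hypothetical, and returning 0 when a block has fewer than two common haplotypes is an assumed convention.

```python
from collections import Counter
from itertools import combinations


def min_distinguishing_snps(haplotypes, start, end):
    """f(B) for the block of columns start..end-1, by brute force."""
    counts = Counter(h[start:end] for h in haplotypes)
    common = [hap for hap, c in counts.items() if c > 1]
    if len(common) <= 1:
        return 0                      # assumed convention: nothing to distinguish
    width = end - start               # this is L(B)
    for k in range(1, width + 1):
        for cols in combinations(range(width), k):
            # Project every common haplotype onto the chosen SNPs
            projected = {tuple(hap[c] for c in cols) for hap in common}
            if len(projected) == len(common):
                return k              # all common haplotypes remain distinct
    return width                      # not reached: all SNPs always suffice


haps = ["000", "000", "011", "011", "101", "101"]
print(min_distinguishing_snps(haps, 0, 3))  # three common haplotypes need 2 SNPs
```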

31
Greedy Solution
[Figure: the binary data matrix, with the Start and End positions of the current block marked]
0. Fix Start = End
1. Increment End
2. Compute ratio L(B)/f(B)
3. Stop where the ratio reaches its maximum
4. Go to 0
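
The loop on this slide can be sketched as below. This follows the steps as listed here, not necessarily every detail of Patil et al. (2001). Assumptions, all hypothetical: haplotypes are '0'/'1' strings without missing data, f(B) is computed by brute force and counted as at least 1, a single-SNP block is always allowed so the scan makes progress, and ties in the ratio go to the longer block.

```python
from collections import Counter
from itertools import combinations


def common_haps(haplotypes, start, end):
    """Common haplotypes of block start..end-1 and the number of chromosomes they cover."""
    counts = Counter(h[start:end] for h in haplotypes)
    common = [hap for hap, c in counts.items() if c > 1]
    return common, sum(c for c in counts.values() if c > 1)


def f(haplotypes, start, end):
    """Minimum number of SNPs distinguishing the common haplotypes (at least 1)."""
    common, _ = common_haps(haplotypes, start, end)
    for k in range(1, end - start + 1):
        for cols in combinations(range(end - start), k):
            if len({tuple(h[c] for c in cols) for h in common}) == len(common):
                return k
    return 1


def greedy_partition(haplotypes, alpha=0.80):
    n, m = len(haplotypes), len(haplotypes[0])
    blocks, start = [], 0
    while start < m:                                  # step 0: new block at Start
        best_end, best_ratio = start + 1, 0.0
        for end in range(start + 1, m + 1):           # step 1: increment End
            _, covered = common_haps(haplotypes, start, end)
            if covered / n < alpha:
                break                                 # block no longer allowed
            ratio = (end - start) / f(haplotypes, start, end)  # step 2: L(B)/f(B)
            if ratio >= best_ratio:                   # step 3: keep the maximum
                best_end, best_ratio = end, ratio
        blocks.append((start, best_end))
        start = best_end                              # step 4: go to 0
    return blocks


haps = ["000", "000", "001", "010", "110", "110"]
print(greedy_partition(haps))  # blocks as half-open column ranges
```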
32
Results
  • 4563 representative SNPs (13%)
  • 4135 blocks

33
Method 3 (Zhang et al. 2002)
  • Solves the same problem of 80% chromosome
    coverage, but with dynamic programming, which
    yields an optimal rather than a greedy partition

34
Dynamic programming solution
[Figure: the binary data matrix up to SNP i, partitioned into blocks B1(i), B2(i), B3(i)]

Optimal partition of SNPs 1, 2, ..., i
Assume that for all i = 1, 2, ..., j-1 we know the
optimal block partition, B1(i), B2(i), ..., Bk(i),
that minimizes the total number of representative
SNPs, f(B1(i)) + f(B2(i)) + ... + f(Bk(i))
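
A minimal sketch of this idea, assuming the quantity minimized is the total of f(B) over the blocks of the partition. It is not the implementation of Zhang et al. (2002): f(B) is computed by brute force and counted as at least 1, single-SNP blocks are always allowed, ties are not resolved by any secondary criterion, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_haps(haplotypes, start, end):
    """Common haplotypes of block start..end-1 and the number of chromosomes they cover."""
    counts = Counter(h[start:end] for h in haplotypes)
    common = [hap for hap, c in counts.items() if c > 1]
    return common, sum(c for c in counts.values() if c > 1)


def f(haplotypes, start, end):
    """Minimum number of SNPs distinguishing the common haplotypes (at least 1)."""
    common, _ = common_haps(haplotypes, start, end)
    for k in range(1, end - start + 1):
        for cols in combinations(range(end - start), k):
            if len({tuple(h[c] for c in cols) for h in common}) == len(common):
                return k
    return 1


def dp_partition(haplotypes, alpha=0.80):
    n, m = len(haplotypes), len(haplotypes[0])
    best = [0] + [float("inf")] * m   # best[j]: minimal total f for SNPs 0..j-1
    prev = [0] * (m + 1)              # prev[j]: start of the last block for best[j]
    for j in range(1, m + 1):
        for i in range(j - 1, -1, -1):            # last block = columns i..j-1
            _, covered = common_haps(haplotypes, i, j)
            if i < j - 1 and covered / n < alpha:
                break                 # coverage cannot recover as the block grows
            cost = best[i] + f(haplotypes, i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    blocks, j = [], m
    while j > 0:                      # trace the optimal partition backwards
        blocks.append((prev[j], j))
        j = prev[j]
    return best[m], blocks[::-1]


haps = ["000", "000", "001", "010", "110", "110"]
print(dp_partition(haps))  # (total representative SNPs, blocks)
```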
35
Bellman's equation
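
The equation itself is not preserved in this transcript. A recurrence consistent with the notation for f(B) introduced earlier might read as follows, where S_j is an assumed symbol for the minimum total number of representative SNPs over allowed partitions of SNPs 1, 2, ..., j; the form printed on the slide or in Zhang et al. (2002) may differ.

```latex
S_0 = 0, \qquad
S_j = \min_{\substack{1 \le i \le j \\ \{i,\, i+1,\, \dots,\, j\}\ \text{allowed}}}
      \Bigl[\, S_{i-1} + f\bigl(\{i,\, i+1,\, \dots,\, j\}\bigr) \Bigr],
\qquad j = 1, 2, \dots, 35989
```

Here "allowed" means that the common haplotypes cover the required share of the block.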
36
Results
  • 3582 representative SNPs (compared to 4563 from
    greedy algorithm)
  • 2575 blocks (compared to 4135 blocks from greedy
    algorithm)

37
Conclusions
  • Studying haplotype block partitions is very
    important for
  • 1. Constructing haplotype maps for genetic
    traits
  • 2. Understanding recombination in the human
    genome

38
To expect
  • A lot of papers in this area appearing in
    scientific journals