Title: A new multipoint method for genome-wide association studies by imputation of genotypes
1A new multipoint method for genome-wide
association studies by imputation of genotypes
- Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P.
- Nature Genetics 39, 906-913 (2007)
- Presented by Yixuan Chen
- 10/12/2007
2Contents
- Introduction
- Methods
- Results
3Association Study
4Association Test
5Disease Mapping
[Figure: matrix of alleles at markers SNP1, S2, ... for a set of samples, shown alongside the disease gene and disease status; alleles are coded 1 and 2, and ? marks missing values]
6Haplotype
7Enhancement by Imputation
- Dense genotypes by commercial genotyping chips
- Unlikely to include the true causal variant
- Based on the observed data, impute the missing data
- These in silico genotypes can then be used as if the SNPs involved were directly genotyped
- Multipoint instead of single-SNP approaches
8Methods
- $H = \{H_1, \ldots, H_N\}$ denotes a set of N known haplotypes
- $H_i = \{H_{i1}, \ldots, H_{iL}\}$ is a single haplotype
- H is set to be the 120 CEU haplotypes in the HapMap project
- $G = \{G_1, \ldots, G_K\}$ denotes the genotype data on the K individuals in a new study
- $G_i = \{G_{i1}, \ldots, G_{iL}\}$
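The notation above maps naturally onto two arrays. The following is a minimal sketch with toy sizes; the 0/1 allele coding, the 0/1/2 genotype coding, and the use of -1 for an untyped SNP are assumptions made for illustration, not conventions stated in the slides.

```python
import numpy as np

# Hypothetical toy sizes; the slides use N = 120 CEU haplotypes.
N, K, L = 4, 3, 5
rng = np.random.default_rng(0)

# H[i, l] is the allele (coded 0 or 1) of known haplotype i at SNP l.
H = rng.integers(0, 2, size=(N, L))

# G[i, l] is the genotype of study individual i at SNP l:
# 0, 1 or 2 copies of the coded allele, or -1 when the SNP is untyped.
G = rng.integers(0, 3, size=(K, L))
G[:, 2] = -1  # SNP 2 is not on the study chip, so it would need imputing

print(H.shape, G.shape)
```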
9Hidden Markov Model for each genotype
- The model for $\Pr(G_i \mid H)$ is an HMM
- The hidden states are a sequence of pairs of the N known haplotypes in the set H
11The HMM Model
- $\Pr(Z_i^{(1)}, Z_i^{(2)} \mid H)$ defines the prior probability on how the sequences of hidden states, $Z^{(1)}$ and $Z^{(2)}$, change along the sequence.
- The switching rates depend upon an estimate of the fine-scale recombination map based upon the HapMap Phase II Data.
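As a rough illustration of how the probability of one individual's genotypes given H could be evaluated, the sketch below runs a forward pass over pairs of copied haplotypes. It assumes a Li and Stephens style copying model: between adjacent SNPs each of the two chains keeps its current haplotype with probability exp(-rho_l / N) and otherwise jumps to a uniformly chosen haplotype, and each copied allele is flipped with a small probability `lam`. The function name, the `rho` and `lam` inputs, and the genotype coding (0/1/2, with -1 for missing) are assumptions of this sketch rather than details given in the slides.

```python
import numpy as np

def genotype_likelihood(g, H, rho, lam):
    """Forward algorithm for Pr(g | H) with hidden states = pairs of haplotypes.

    g   : length-L genotypes coded 0/1/2, or -1 for missing (assumed coding)
    H   : (N, L) array of known haplotypes coded 0/1
    rho : length L-1 switching rates between adjacent SNPs (assumed input)
    lam : per-allele mutation probability (assumed input)
    """
    N, L = H.shape
    # q[l, j]: probability that copying haplotype j emits allele 1 at SNP l
    q = np.where(H.T == 1, 1.0 - lam, lam)

    def emit(l):
        if g[l] < 0:                      # missing genotype carries no information
            return np.ones((N, N))
        q1, q2 = q[l][:, None], q[l][None, :]
        probs = [(1 - q1) * (1 - q2),             # genotype 0
                 q1 * (1 - q2) + (1 - q1) * q2,   # genotype 1
                 q1 * q2]                         # genotype 2
        return probs[g[l]]

    alpha = emit(0) / N**2                # uniform prior over the initial pair
    for l in range(1, L):
        stay = np.exp(-rho[l - 1] / N)
        jump = (1.0 - stay) / N
        # Apply the same single-chain transition to each member of the pair.
        a1 = stay * alpha + jump * alpha.sum(axis=0, keepdims=True)
        a2 = stay * a1 + jump * a1.sum(axis=1, keepdims=True)
        alpha = a2 * emit(l)
    return alpha.sum()

H_toy = np.array([[0, 1], [1, 1], [0, 0]])
print(genotype_likelihood(np.array([1, 2]), H_toy, np.array([0.5]), 0.05))
```

Because the two chains move independently, the pair transition factorises, which keeps each step at O(N^2) work instead of O(N^4).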
12The HMM Model (c1)
The effective population size
13The HMM Model (c2)
14Imputation of Missing SNPs
- Association test at an unobserved SNP
- Simulate M realizations of this SNP in H
- Treat the simulated SNP as known in H, so that it can be conditioned upon to simulate the unknown genotypes in G
15Imputation of Missing SNPs (c1)
- Suppose the jth site is missing in every haplotype in the set H
- A product of approximate conditionals (PAC) model is used: given an ordering $H^{(1)}, \ldots, H^{(N)}$, the missing alleles $H^{(1)}_j, \ldots, H^{(N)}_j$ are simulated sequentially.
16Imputation of Missing SNPs (c2)
- Probabilistically choosing a haplotype to which to apply the first mutation
- $p_r$ denotes the probability of the first mutation occurring on the (r+1)th haplotype in the ordering
17Imputation of Missing SNPs (c3)
- If the mth haplotype carries the first mutation
- The alleles are simulated using higher order
approximate conditionals
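The sequential simulation described on these slides can be sketched in a deliberately simplified form. The transcript gives neither the formula for the first-mutation probabilities nor the higher order conditionals, so the sketch takes the probabilities as a hypothetical input `p_first` and substitutes uniform copying from earlier haplotypes for the real conditionals. It also assumes that haplotypes ordered before the first mutation carry the ancestral allele, coded 0. None of this should be read as the authors' exact procedure.

```python
import numpy as np

def simulate_missing_site(p_first, rng):
    """Simulate alleles at one missing site for N ordered haplotypes.

    p_first[r] is the (assumed, user-supplied) probability that the first
    mutation falls on haplotype r of the ordering (0-based here).
    """
    p_first = np.asarray(p_first, dtype=float)
    N = len(p_first)
    m = rng.choice(N, p=p_first / p_first.sum())  # haplotype with the first mutation
    alleles = np.zeros(N, dtype=int)              # haplotypes before m: allele 0
    alleles[m] = 1
    for k in range(m + 1, N):
        # Stand-in for the higher order conditionals: copy the allele of a
        # uniformly chosen earlier haplotype.
        source = rng.integers(0, k)
        alleles[k] = alleles[source]
    return alleles, m

rng = np.random.default_rng(1)
M = 5  # number of realizations of the site
draws = [simulate_missing_site([0.4, 0.3, 0.2, 0.1], rng)[0] for _ in range(M)]
print(np.array(draws))
```

Each row of the printed array is one realization of the missing site across the ordered haplotypes.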
18Results
- The imputation at a particular SNP can theoretically combine information from all typed SNPs on the same chromosome.
- The influence of these SNPs decreases with increasing genetic distance from the locus of interest.
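One way to see why the influence of a typed SNP fades with genetic distance is to look at the probability that a copying chain has not switched haplotype over that distance. The sketch below assumes the common parameterization rho = 4 * N_e * r with a no-switch probability of exp(-rho / N); the value N_e = 10,000 is an illustrative assumption, not a figure from the slides.

```python
import math

def prob_no_switch(r_morgans, n_e=10_000, n_haps=120):
    """Probability that a copying chain keeps the same haplotype across a
    genetic distance of r_morgans (assumed Li and Stephens style form)."""
    rho = 4.0 * n_e * r_morgans
    return math.exp(-rho / n_haps)

for r in (0.0, 0.0001, 0.001, 0.01):
    print(f"{r:.4f} Morgans -> {prob_no_switch(r):.4f}")
```

The probability falls off exponentially, so under these assumptions SNPs more than a fraction of a centimorgan away contribute little to the imputation at a given locus.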
19Datasets
- CEU HapMap samples
- Known haplotypes
- Individuals from the 1958 British Birth Cohort that were typed at 500,000 SNPs on the Affymetrix 500K chip
- Evaluation imputation
- Separately typed 15,000 SNPs on a custom Illumina chip
- Imputation
22Summary
- Combine data from studies that use different genotyping chips to facilitate these meta-analytic approaches
- The imputation of other types of genetic variants that may show substantial association with phenotypes
- Imputation of genotypes for different study designs such as trio designs for association mapping and mapping by admixture LD (MALD) studies
- Development of statistical information measures for untyped variants