A new multipoint method for genomewide association studies by imputation of genotypes

1
A new multipoint method for genome-wide
association studies by imputation of genotypes

Marchini, J., Howie, B., Myers, S., McVean, G. &
Donnelly, P.
Nature Genetics 39, 906-913 (2007)
Presented by Yixuan Chen
10/12/2007

2
Contents

Introduction
Methods
Results

3
Association Study
4
Association Test
5
Disease Mapping
(Figure: grid of genotypes coded 1/2 at markers SNP1, S2, ..., with missing calls shown as "?"; labels: Disease Gene, Disease status)

6
Haplotype
7
Enhancement by Imputation

Dense genotypes by commercial genotyping chips
Unlikely to include the true causal variant
Based on the observed data, impute the missing
data
These in silico genotypes can then be used as
if the SNPs involved were directly genotyped
Multipoint instead of single-SNP approaches

8
Methods

H = {H_1, ..., H_N} denotes a set of N known haplotypes
H_i = {H_i1, ..., H_iL} is a single haplotype
H is set to be the 120 CEU haplotypes in the
HapMap project
G = {G_1, ..., G_K} denotes the genotype data on the K
individuals in a new study
G_i = {G_i1, ..., G_iL}
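
As a rough illustration of this notation, the reference panel H can be stored as an N x L array of alleles and the study data G as a K x L array of allele counts. The 0/1 and 0/1/2 coding, the -1 missing marker, the toy sizes and the name `typed_fraction` are all assumptions made for this sketch; none of them come from the slides.

```python
import numpy as np

MISSING = -1  # assumed code for an untyped or failed genotype

# Reference panel H: N haplotypes by L SNPs, alleles coded 0/1.
# (The talk uses N = 120 CEU haplotypes; N = 4 here is a toy size.)
H = np.array([
    [0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
])

# Study genotypes G: K individuals by L SNPs, coded as allele counts 0/1/2.
G = np.array([
    [0, 1, MISSING, 2, 1],
    [2, MISSING, 0, 0, 1],
])

N, L = H.shape
K = G.shape[0]

# Per-SNP share of individuals with an observed genotype call.
typed_fraction = (G != MISSING).mean(axis=0)
```

Keeping missing calls inside the same array, rather than in a separate mask, makes it easy to skip them when evaluating the model for one individual.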

9
Hidden Markov Model for each genotype

The model for Pr(G_i | H) is an HMM
The hidden states are a sequence of pairs of the
N known haplotypes in the set H
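
A minimal sketch of how such a model could be evaluated, using the standard forward algorithm over the N x N pairs of reference haplotypes. The constant per-interval switch probability, the per-allele miscopy rate `eps`, the 0/1/2 genotype coding with -1 for missing, and the names `emission`, `genotype_likelihood` and `H_demo` are assumptions for illustration; the slides do not give the emission or transition formulas.

```python
import numpy as np

def emission(g, x, y, eps=0.01):
    """P(genotype g | copied alleles x, y); each allele is miscopied w.p. eps."""
    px = np.where(x == 1, 1 - eps, eps)  # P(first copy emits allele 1)
    py = np.where(y == 1, 1 - eps, eps)  # P(second copy emits allele 1)
    if g < 0:                            # missing genotype: no information
        return np.ones_like(px * py)
    if g == 0:
        return (1 - px) * (1 - py)
    if g == 2:
        return px * py
    return px * (1 - py) + (1 - px) * py  # heterozygote

def genotype_likelihood(g, H, switch=0.1, eps=0.01):
    """Forward algorithm: sum over all paths of haplotype pairs (a, b)."""
    N, L = H.shape
    # Each copying chain stays put w.p. (1 - switch), else jumps uniformly.
    T = (1 - switch) * np.eye(N) + switch / N
    alpha = np.full((N, N), 1.0 / N**2)   # uniform prior over pairs
    for l in range(L):
        if l > 0:
            alpha = T.T @ alpha @ T       # move both chains one marker along
        x = H[:, l][:, None]              # allele on first copied haplotype
        y = H[:, l][None, :]              # allele on second copied haplotype
        alpha = alpha * emission(g[l], x, y, eps)
    return alpha.sum()

H_demo = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1]])
lik = genotype_likelihood(np.array([1, 2, -1]), H_demo)
```

The state space grows as N squared, so with 120 reference haplotypes each marker involves 14,400 hidden states; the matrix form of the update above keeps that step cheap.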

10
(No Transcript)
11
The HMM Model

Pr(Z_i^(1), Z_i^(2) | H) defines the prior probability
on how the sequences of hidden states, Z_i^(1) and
Z_i^(2), change along the sequence.
The switching rates depend upon an estimate of
the fine-scale recombination map based upon the
HapMap Phase II Data.
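
One common way to turn a recombination map into switching rates is the Li and Stephens style form sketched here, in which a copying chain avoids recombination between adjacent markers with probability exp(-rho / N), where rho = 4 * N_e * r and r is the genetic distance in Morgans. That specific formula, the function name `transition_probs` and the example value of N_e are assumptions for illustration; the transcript does not show the equations from the slides.

```python
import math

def transition_probs(r_morgans, n_haplotypes, effective_pop_size):
    """Return (P(stay on same haplotype), P(move to one specific other haplotype)).

    Assumed form: no recombination w.p. exp(-rho / N) with rho = 4 * Ne * r;
    after a recombination the chain picks any of the N haplotypes uniformly.
    """
    rho = 4.0 * effective_pop_size * r_morgans
    no_recomb = math.exp(-rho / n_haplotypes)
    p_other = (1.0 - no_recomb) / n_haplotypes
    p_stay = no_recomb + p_other
    return p_stay, p_other

# Example with an arbitrary Ne of 10,000 and 120 reference haplotypes.
near = transition_probs(1e-6, 120, 10_000)  # tightly linked markers
far = transition_probs(1e-3, 120, 10_000)   # markers 0.1 cM apart
```

Under this form, markers separated by a recombination hotspot have a larger r and therefore a higher chance of switching to a different reference haplotype.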

12
The HMM Model (c1)
The effective population size
13
The HMM Model (c2)
14
Imputation of Missing SNPs

Association test at an unobserved SNP
Simulate M realizations of this SNP in H
Each realization is treated as known in H and
conditioned upon to simulate the unknown genotypes in G

15
Imputation of Missing SNPs (c1)

Suppose the jth sites in the set H are all missing

A product of approximate conditionals (PAC)
model is used. Given an ordering H_(1), ..., H_(N), the
missing alleles H_(1)j, ..., H_(N)j are simulated
sequentially.
16
Imputation of Missing SNPs (c2)

Probabilistically choosing a haplotype to which
to apply the first mutation
p_r denotes the probability of the first mutation
occurring on the (r+1)th haplotype in the ordering

17
Imputation of Missing SNPs (c3)

If the mth haplotype carries the first mutation
The alleles are simulated using higher-order
approximate conditionals

18
Results

The imputation at a particular SNP can
theoretically combine information from all typed
SNPs on the same chromosome.
The influence of these SNPs decreases with
increasing genetic distance from the locus of
interest.

19
Datasets

CEU HapMap samples
Known haplotypes
Individuals from the 1958 British Birth Cohort
that were typed at 500,000 SNPs on the
Affymetrix 500K chip
Evaluation imputation
Separately typed 15,000 SNPs on a custom Illumina
chip
Imputation

20
(No Transcript)
21
(No Transcript)
22
Summary

Combine data from studies that use different
genotyping chips to facilitate these
meta-analytic approaches
The imputation of other types of genetic variants
that may show substantial association with
phenotypes
Imputation of genotypes for different study
designs such as trio designs for association
mapping and mapping by admixture LD (MALD)
studies
Development of statistical information measures
for untyped variants