A new multipoint method for genomewide association studies by imputation of genotypes - PowerPoint PPT Presentation

1
A new multipoint method for genome-wide
association studies by imputation of genotypes
  • Marchini, J., Howie, B., Myers, S., McVean, G. &
    Donnelly, P.
  • Nature Genetics 906-913 (2007)
  • Presented by Yixuan Chen
  • 10/12/2007

2
Contents
  • Introduction
  • Methods
  • Results

3
Association Study
4
Association Test
5
Disease Mapping
[Figure: Disease Gene; Disease status; alleles (coded 1/2) at SNP1 ... for affected (a) and control (c) individuals, with untyped genotypes shown as ?]
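The slides do not say which single-SNP test is applied to genotype data like this. As a minimal sketch, assuming a standard allelic chi-square test on a 2 x 2 table of allele counts (cases `a` versus controls `c`), with genotypes coded as the number of copies of allele 2 and untyped genotypes (`?`) passed as `None` and skipped (the function name `allelic_chi_square` is hypothetical):

```python
import math

def allelic_chi_square(genotypes, status):
    """Allelic chi-square test (1 df) at one SNP.

    genotypes: copies of allele 2 per individual (0, 1 or 2), or None if untyped
    status: 'a' for affected (case) or 'c' for control
    Returns (statistic, p-value).
    """
    # counts[group] = [copies of allele 1, copies of allele 2]
    counts = {"a": [0, 0], "c": [0, 0]}
    for g, s in zip(genotypes, status):
        if g is None:  # untyped genotype ('?'): contributes nothing
            continue
        counts[s][0] += 2 - g
        counts[s][1] += g
    table = [counts["a"], counts["c"]]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][k] + table[1][k] for k in range(2)]
    n = sum(row_tot)
    if 0 in row_tot or 0 in col_tot:
        return 0.0, 1.0  # degenerate table (e.g. monomorphic SNP): no evidence
    stat = 0.0
    for i in range(2):
        for k in range(2):
            expected = row_tot[i] * col_tot[k] / n
            stat += (table[i][k] - expected) ** 2 / expected
    # upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

For example, `allelic_chi_square([2, 2, 1, None, 0, 0, 1, 0], list("aaaacccc"))` gives a statistic of about 7.02 (p < 0.01); the untyped fourth individual is simply dropped.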

6
Haplotype
7
Enhancement by Imputation
  • Commercial genotyping chips assay dense sets of SNPs
  • These chips are unlikely to include the true causal variant
  • Based on the observed data, impute the missing
    data
  • These in silico genotypes can then be used as
    if the SNPs involved were directly genotyped
  • Multipoint instead of single-SNP approaches
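A common way to use such in silico genotypes (an assumption here; the slide does not specify it) is to summarise each imputed genotype's probabilities (P(g=0), P(g=1), P(g=2)) either as an expected allele count (dosage) or as a best-guess call that is kept only above a confidence threshold. The function names and the 0.9 threshold below are illustrative:

```python
def expected_dosage(probs):
    """Expected count of allele 2 from (P(g=0), P(g=1), P(g=2))."""
    p0, p1, p2 = probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2 * p2

def best_guess(probs, threshold=0.9):
    """Most probable genotype, or None if its probability is below threshold."""
    g = max(range(3), key=lambda k: probs[k])
    return g if probs[g] >= threshold else None
```

The dosage keeps all of the uncertainty in a single number, whereas the thresholded call discards uncertain genotypes so that the remainder can be analysed exactly like directly typed data.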

8
Methods
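  • $H = \{H_1, \ldots, H_N\}$ denotes a set of $N$ known haplotypes
  • $H_i = \{H_{i1}, \ldots, H_{iL}\}$ is a single haplotype at $L$ SNPs
  • $H$ is set to be the 120 CEU haplotypes in the
    HapMap project
  • $G = \{G_1, \ldots, G_K\}$ denotes the genotype data on the $K$
    individuals in a new study
  • $G_i = \{G_{i1}, \ldots, G_{iL}\}$ is the genotype data of individual $i$
    at the $L$ SNPs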

9
Hidden Markov Model for each genotype
  • The model for $\Pr(G_i \mid H)$ is an HMM
  • The hidden states are a sequence of pairs of the
    N known haplotypes in the set H
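To make the state space concrete, here is a minimal sketch assuming haplotypes stored as lists of 0/1 alleles and genotypes as 0/1/2 allele counts (`None` if missing). Each hidden state is an ordered pair (j, k) of reference haplotypes, giving N^2 states per site. The emission model, with a small genotype-error rate `eps`, is a hypothetical stand-in; the slides do not give the paper's exact emission probabilities, and the function names are illustrative:

```python
from itertools import product

def hidden_states(n_haps):
    """All ordered pairs (j, k) of reference haplotypes: N * N states per site."""
    return list(product(range(n_haps), repeat=2))

def emission_prob(g, pair, H, site, eps=0.01):
    """Pr(observed genotype g | copied haplotype pair) at one site.

    Hypothetical error model: the pair implies genotype H[j][site] + H[k][site];
    the observation equals it with probability 1 - eps, and each of the other
    two genotypes has probability eps / 2. A missing genotype (None) is
    uninformative.
    """
    if g is None:
        return 1.0
    j, k = pair
    implied = H[j][site] + H[k][site]
    return 1.0 - eps if g == implied else eps / 2

# toy example: N = 3 reference haplotypes at L = 2 SNPs
H = [[0, 1], [1, 1], [0, 0]]
states = hidden_states(len(H))
```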

10
(No Transcript)
11
The HMM Model
  • $\Pr(Z_i^{(1)}, Z_i^{(2)} \mid H)$ defines the prior probability
    on how the sequences of hidden states, $Z_i^{(1)}$ and
    $Z_i^{(2)}$, change along the sequence of SNPs.
  • The switching rates depend upon an estimate of
    the fine-scale recombination map based upon the
    HapMap Phase II Data.
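The slides give the switching rates only in words. The sketch below assumes a Li and Stephens-style copying model: between adjacent sites separated by genetic distance d (in Morgans), each of the two copying processes independently undergoes no recombination event with probability exp(-4 N_e d / N) and otherwise jumps to a uniformly chosen haplotype, where N_e is the effective population size. It then evaluates log Pr(G_i | H) with the forward algorithm over ordered pairs. The parameterisation, the genotype-error model and the names `switch_params` and `log_likelihood` are assumptions for illustration:

```python
import math

def switch_params(d_morgans, n_e, n_haps):
    """Transition for one copying process between adjacent sites (assumed form).

    rho = 4 * N_e * d. With probability a = exp(-rho / N) no recombination
    event occurs; otherwise the copy jumps to a uniformly chosen haplotype
    (possibly the same one). Returns (a, c) with T[x][y] = a * (x == y) + c.
    """
    rho = 4.0 * n_e * d_morgans
    a = math.exp(-rho / n_haps)
    return a, (1.0 - a) / n_haps

def log_likelihood(g, H, dists, n_e, eps=0.01):
    """log Pr(G_i | H) by the forward algorithm over ordered haplotype pairs.

    g: genotypes (0/1/2 or None) at L sites; H: N haplotypes of 0/1 alleles;
    dists: L - 1 genetic distances (Morgans) between adjacent sites.
    """
    n = len(H)

    def emission(site):
        # hypothetical error model: 1 - eps on a match, eps / 2 otherwise
        if g[site] is None:
            return [[1.0] * n for _ in range(n)]
        return [[1.0 - eps if H[j][site] + H[k][site] == g[site] else eps / 2
                 for k in range(n)] for j in range(n)]

    e = emission(0)
    # uniform prior over the N * N ordered pairs at the first site
    alpha = [[e[j][k] / (n * n) for k in range(n)] for j in range(n)]
    loglik = 0.0
    for site in range(len(g)):
        if site > 0:
            a, c = switch_params(dists[site - 1], n_e, n)
            # propagate the second copy, then the first: alpha <- T^T alpha T
            rows = [sum(r) for r in alpha]
            b = [[a * alpha[j][k] + c * rows[j] for k in range(n)] for j in range(n)]
            cols = [sum(b[j][k] for j in range(n)) for k in range(n)]
            e = emission(site)
            alpha = [[e[j][k] * (a * b[j][k] + c * cols[k]) for k in range(n)]
                     for j in range(n)]
        # rescale to avoid underflow, keeping the log of the scale factor
        s = sum(map(sum, alpha))
        alpha = [[x / s for x in r] for r in alpha]
        loglik += math.log(s)
    return loglik
```

Because the two copies switch independently, the transition over pairs factorises, so each site costs O(N^2) rather than O(N^4). With all genotypes missing the function returns 0, since every path is equally compatible with the data.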

12
The HMM Model (c1)
The effective population size
13
The HMM Model (c2)
14
Imputation of Missing SNPs
  • Association test at an unobserved SNP
  • Simulate M realizations of this SNP in H
  • Treat the simulated SNP as known in H, so that it can
    be conditioned upon to simulate the unknown genotypes in G
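One way to read these two steps (a sketch under assumptions, not the paper's code): given the posterior probability of each copied pair (j, k) at the untyped site, for example from a forward-backward pass that is not shown, each of the M simulated allele vectors for H implies a genotype H_j + H_k, and averaging over the M realizations gives genotype probabilities for the individual. This averages rather than samples, and `imputed_genotype_probs` is a hypothetical name:

```python
def imputed_genotype_probs(pair_post, realizations):
    """Genotype probabilities at an untyped site for one individual.

    pair_post: {(j, k): posterior probability of copying pair (j, k) at the site},
               assumed to be computed elsewhere (e.g. forward-backward).
    realizations: M simulated allele vectors for the site in H (lists of 0/1,
                  indexed by haplotype).
    """
    probs = [0.0, 0.0, 0.0]
    m = len(realizations)
    for alleles in realizations:
        for (j, k), w in pair_post.items():
            # the copied pair implies genotype alleles[j] + alleles[k]
            probs[alleles[j] + alleles[k]] += w / m
    return probs
```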

15
Imputation of Missing SNPs (c1)
  • Suppose the jth site is missing in all haplotypes in H

A product of approximate conditionals (PAC) model: given an ordering
$H_{(1)}, \ldots, H_{(N)}$ of the haplotypes, the missing alleles
$H_{(1)j}, \ldots, H_{(N)j}$ are simulated sequentially.
16
Imputation of Missing SNPs (c2)
  • Probabilistically choose the haplotype to which the
    first mutation is applied
  • $p_r$ denotes the probability of the first mutation
    occurring on the $(r+1)$th haplotype in the ordering

17
Imputation of Missing SNPs (c3)
  • Suppose the mth haplotype carries the first mutation
  • The remaining alleles are then simulated using
    higher-order approximate conditionals
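The first steps of this PAC simulation can be sketched as follows, assuming 0 denotes the ancestral and 1 the mutant allele, and that the haplotypes preceding the one that carries the first mutation in the ordering therefore carry the ancestral allele. The higher-order approximate conditional is not specified on the slides, so it is passed in as a hypothetical function `conditional`; `simulate_missing_site` is likewise an illustrative name:

```python
import random

def simulate_missing_site(order, p, conditional, rng):
    """Simulate alleles at a site missing in every haplotype of H (sketch).

    order: haplotype indices in the chosen ordering H_(1), ..., H_(N)
    p: p[r] = probability that the first mutation falls on the (r+1)th haplotype
    conditional: hypothetical stand-in for the higher-order approximate
                 conditional; conditional(h, assigned) returns Pr(allele of h = 1)
                 given the alleles assigned so far.
    Returns alleles indexed by haplotype (order must be a permutation of 0..N-1).
    """
    # choose, with probabilities p, the haplotype carrying the first mutation
    r = rng.choices(range(len(p)), weights=p)[0]
    alleles = {}
    for h in order[:r]:
        alleles[h] = 0              # earlier haplotypes keep the ancestral allele
    alleles[order[r]] = 1           # the (r+1)th haplotype carries the mutation
    for h in order[r + 1:]:
        # remaining haplotypes are simulated sequentially, one conditional at a time
        alleles[h] = 1 if rng.random() < conditional(h, alleles) else 0
    return [alleles[h] for h in sorted(alleles)]
```

With `p = [0, 0, 1.0, 0]` the first mutation always lands on the third haplotype in the ordering, which makes the control flow easy to check.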

18
Results
  • The imputation at a particular SNP can
    theoretically combine information from all typed
    SNPs on the same chromosome.
  • The influence of these SNPs decreases with
    increasing genetic distance from the locus of
    interest.

19
Datasets
  • CEU HapMap samples
  • Known haplotypes
  • Individuals from the 1958 British Birth Cohort
    that were typed at 500,000 SNPs on the
    Affymetrix 500K chip
  • Evaluation imputation
  • Separately typed 15,000 SNPs on a custom Illumina
    chip
  • Imputation
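The slides do not state how imputation accuracy was scored against the separately typed Illumina SNPs. A common choice, assumed here, is to call the most likely imputed genotype only when its probability reaches a threshold and then report the call rate and the concordance of the called genotypes with the typed ones (`concordance` and the 0.9 threshold are illustrative):

```python
def concordance(imputed_probs, typed, threshold=0.9):
    """Compare best-guess imputed genotypes with directly typed genotypes.

    imputed_probs: list of (P(g=0), P(g=1), P(g=2)) per individual
    typed: observed genotypes (0/1/2, or None if missing)
    Returns (call rate among typed genotypes, concordance among called ones).
    """
    called = agree = comparable = 0
    for probs, g in zip(imputed_probs, typed):
        if g is None:
            continue  # nothing to compare against
        comparable += 1
        best = max(range(3), key=lambda k: probs[k])
        if probs[best] < threshold:
            continue  # too uncertain: left uncalled
        called += 1
        agree += best == g
    call_rate = called / comparable if comparable else 0.0
    conc = agree / called if called else float("nan")
    return call_rate, conc
```

Raising the threshold usually trades a lower call rate for higher concordance, so the two numbers are best reported together.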

20
(No Transcript)
21
(No Transcript)
22
Summary
  • Combine data from studies that use different
    genotyping chips to facilitate meta-analysis
  • The imputation of other types of genetic variants
    that may show substantial association with
    phenotypes
  • Imputation of genotypes for different study
    designs such as trio designs for association
    mapping and mapping by admixture LD (MALD)
    studies
  • Development of statistical information measures
    for untyped variants
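As one example of an information measure for an untyped variant (an existing heuristic, not necessarily what the authors developed), the variance of the imputed dosages can be compared with the variance expected under Hardy-Weinberg equilibrium, 2p(1 - p), where p is the estimated allele frequency. The name `dosage_r2` is hypothetical:

```python
def dosage_r2(probs_list):
    """Variance-ratio information measure for one untyped SNP (sketch).

    probs_list: per-individual genotype probabilities (P(g=0), P(g=1), P(g=2)).
    Returns var(dosage) / (2 p (1 - p)); values near 1 suggest confident
    imputation, values near 0 suggest little information.
    """
    dosages = [p1 + 2 * p2 for _, p1, p2 in probs_list]
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2  # estimated frequency of allele 2
    if p <= 0 or p >= 1:
        return float("nan")  # monomorphic estimate: measure undefined
    var = sum((d - mean) ** 2 for d in dosages) / n
    return var / (2 * p * (1 - p))
```

When every individual receives the same uncertain probabilities, the dosages do not vary and the measure is 0, reflecting that the imputation cannot distinguish individuals at this site.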