Realistic Simulation of Genotypes for an Association Mapping Bakeoff

1
Realistic Simulation of Genotypes for an
Association Mapping Bakeoff
Fred Wright, Kirk Wilhelmsen, Xiaojun Guan, Kevin Gamiel, William Barry
  • In order to evaluate the efficacy of various
    statistical methods for association mapping, we
    need an approach to generate realistic datasets
  • The population genetics of humans is complicated
    and difficult to simulate
  • Moreover, how can we make results applicable to
    real SNP platforms for genome-wide scans?
  • We thus chose to sample from true (HapMap) data,
    for which most SNPs on major platforms are
    represented
  • The procedure is more general, however, and
    requires only a pool of high-density haplotypes

2
The HAP-SAMPLE Simulation Approach
[Figure: a chromosome pool with disease SNP allele values (0/1); case chromosomes and control chromosomes are drawn from the pool, each traced back to its origin]
Case genotypes are sampled according to specified probabilities. The simulation is appropriate for ancient mutations not under strong selection.
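The resampling step in the figure can be sketched in Python. This is a minimal illustration and not the HAP-SAMPLE code. It assumes a toy pool in which each chromosome is a tuple of 0/1 alleles, and the function name `sample_case_chromosomes` is hypothetical. Given a target disease SNP genotype (0, 1 or 2 copies of allele 1), it draws one chromosome carrying each required allele from the matching part of the pool. Sampling with replacement is a simplifying assumption.

```python
import random


def sample_case_chromosomes(pool, disease_idx, genotype, rng):
    """Draw two chromosomes whose disease-SNP alleles sum to `genotype` (hypothetical helper).

    pool:        list of haplotypes, each a tuple of 0/1 alleles
    disease_idx: position of the disease SNP within each haplotype
    genotype:    copies (0, 1 or 2) of allele 1 at the disease SNP
    """
    # Split the pool according to the allele carried at the disease SNP
    by_allele = {0: [], 1: []}
    for hap in pool:
        by_allele[hap[disease_idx]].append(hap)
    # Alleles required on the two chromosomes, e.g. genotype 1 -> [1, 0]
    alleles = [1] * genotype + [0] * (2 - genotype)
    # Draw each chromosome (with replacement) from the matching sub-pool;
    # rng.choice raises IndexError if that sub-pool is empty
    return [rng.choice(by_allele[a]) for a in alleles]


# Toy pool of 4 haplotypes over 3 SNPs; the disease SNP is the last one
pool = [(0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
pair = sample_case_chromosomes(pool, disease_idx=2, genotype=1, rng=random.Random(0))
```

Whole chromosomes are copied from the pool, so the typed SNPs around the disease locus keep their real linkage disequilibrium with it.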
3
HAP-SAMPLE Input/Output
Input: disease model; list of typed SNPs
Simulation
Output: genotypes or phased haplotypes for case-control or affected-child trio designs
4
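How are case disease SNP genotypes simulated?
g = joint genotype of L disease loci (there are 3^L of these); g_0 = referent joint genotype; D = 0 if control, 1 if disease/case; RR_g = relative risk of genotype g compared to g_0

$$P(g \mid D=1) = \frac{P(g)\,RR_g}{\sum_{g'} P(g')\,RR_{g'}}$$

where $P(g)$ is obtained from the pool and $RR_g$ is specified.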
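By Bayes' rule, P(g | D=1) = P(D=1 | g) P(g) / P(D=1). If P(D=1 | g) = RR_g · P(D=1 | g_0), then the unknown baseline risk cancels once the result is normalised over all joint genotypes. Case probabilities are therefore proportional to P(g)·RR_g.

A minimal Python sketch follows. The helper name `case_genotype_probs` is hypothetical. The single-locus example uses Hardy-Weinberg frequencies for MAF 0.4 together with the L1 relative risks of Model 1; the Hardy-Weinberg frequencies are an assumption for illustration, since HAP-SAMPLE takes P(g) from the pool.

```python
def case_genotype_probs(pop_probs, rel_risk):
    """Return P(g | D=1) for each joint genotype g (hypothetical helper).

    pop_probs: dict mapping g -> P(g), e.g. estimated from the chromosome pool
    rel_risk:  dict mapping g -> RR_g, relative to the referent genotype g0
    """
    # Unnormalised weights P(g) * RR_g; the baseline risk P(D=1 | g0) cancels
    weights = {g: pop_probs[g] * rel_risk[g] for g in pop_probs}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Single-locus example: Hardy-Weinberg frequencies for MAF 0.4 (assumed), RRs 1, 1.1, 1.5
pop = {0: 0.36, 1: 0.48, 2: 0.16}
rr = {0: 1.0, 1: 1.1, 2: 1.5}
cases = case_genotype_probs(pop, rr)
```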
5
  • The preceding slide assumes that controls can
    be thought of as random samples from the
    population
  • But for non-rare diseases, this is not realistic
  • Thus we need to simulate controls in a manner
    similar to cases
  • True controls should show slightly reduced
    genotype frequencies of causative alleles
    compared to the general population
  • Controls simulated in this manner are called
    anti-cases
  • Even for fairly common diseases (e.g.
    prevalence > 5%), anti-case genotype probabilities
    (at disease loci) are similar to those of the general
    population
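The anti-case idea can be made concrete with one assumed parameterisation; the slides do not confirm this is HAP-SAMPLE's exact internal form. With disease prevalence K, the referent risk is f_0 = K / Σ_g P(g)·RR_g, and genotype g has risk f_0·RR_g. Bayes' rule then gives P(g | D=0) = P(g)(1 - f_0·RR_g) / (1 - K).

The Python sketch below implements this with a hypothetical helper `anticase_genotype_probs`. It uses illustrative single-locus Hardy-Weinberg frequencies (MAF 0.4).

```python
def anticase_genotype_probs(pop_probs, rel_risk, prevalence):
    """Return P(g | D=0) for each joint genotype g (hypothetical helper)."""
    # Referent risk f0 chosen so that sum_g P(g) * f0 * RR_g equals the prevalence
    f0 = prevalence / sum(pop_probs[g] * rel_risk[g] for g in pop_probs)
    probs = {}
    for g, p in pop_probs.items():
        risk = f0 * rel_risk[g]  # P(D=1 | g)
        if risk > 1:
            raise ValueError("P(D=1 | g) exceeds 1; prevalence or RR too large")
        probs[g] = p * (1 - risk) / (1 - prevalence)
    return probs


# Single-locus example: Hardy-Weinberg frequencies for MAF 0.4, RRs 1, 1.1, 1.5, prevalence 5%
pop = {0: 0.36, 1: 0.48, 2: 0.16}
anti = anticase_genotype_probs(pop, {0: 1.0, 1: 1.1, 2: 1.5}, prevalence=0.05)
```

Here the anti-case probabilities come out at roughly 0.362, 0.481 and 0.157, against 0.36, 0.48 and 0.16 in the population. The high-risk genotype is slightly depleted but stays close to its general-population frequency.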

6
Examples from the HAP-SAMPLE paper show that
simulated SNPs are discoverable
[Figure: genome scan results, with the genome scan threshold and the true causative SNP marked]
7
The Bakeoff Data
  • 5 models
  • 300,000 SNPs from the Illumina 300K platform
    simulated using HAP-SAMPLE
  • 5000 cases, 5000 anti-cases for each model
  • How many true disease SNPs per model?
  • A: 4 or 5, depending on the model
  • This implies 3^4 = 81 or 3^5 = 243 joint genotypes
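The joint-genotype counts follow from each disease SNP having three possible genotypes. A short Python check follows. It assumes genotypes are coded as the number of copies (0, 1, 2) of one allele, and `joint_genotypes` is a hypothetical helper.

```python
from itertools import product


def joint_genotypes(n_loci):
    """All joint genotypes across n_loci SNPs, each coded 0, 1 or 2."""
    return list(product((0, 1, 2), repeat=n_loci))


# 4 disease SNPs -> 3**4 = 81 joint genotypes; 5 disease SNPs -> 3**5 = 243
print(len(joint_genotypes(4)), len(joint_genotypes(5)))  # prints: 81 243
```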

8
Model 1: Multiplicative, disease prevalence 5%
Locus              L1    L2    L3    L4    L5
MAF                0.4   0.2   0.1   0.1   0.2
RR contributions
  geno0            1     1     1.4   1.5   1
  geno1            1.1   1.2   1.25  1     1.4
  geno2            1.5   1.7   1     1     1.8

Model 2: Additive, disease prevalence 5%
Locus              L1    L2    L3    L4    L5
MAF                0.4   0.2   0.1   0.1   0.2
RR contributions
  geno0            1     1     1.4   1.5   1
  geno1            1.1   1.2   1.3   1     1.4
  geno2            1.5   1.7   1     1     1.8
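The slides do not spell out how the per-locus "RR contributions" combine into RR_g for a joint genotype. One plausible reading is stated here as an assumption. Under the multiplicative model the contributions multiply. Under the additive model the excess risks (contribution - 1) add.

The Python sketch below uses the Model 1 values. The function names are hypothetical.

```python
from math import prod

# Per-locus RR contributions for Model 1, indexed [locus][genotype], genotype = geno0, geno1, geno2
MODEL1 = [
    (1.0, 1.1, 1.5),   # L1
    (1.0, 1.2, 1.7),   # L2
    (1.4, 1.25, 1.0),  # L3
    (1.5, 1.0, 1.0),   # L4
    (1.0, 1.4, 1.8),   # L5
]


def joint_rr_multiplicative(contrib, g):
    """Assumed multiplicative model: per-locus contributions multiply."""
    return prod(contrib[locus][gl] for locus, gl in enumerate(g))


def joint_rr_additive(contrib, g):
    """Assumed additive model: per-locus excess risks (contribution - 1) add."""
    return 1 + sum(contrib[locus][gl] - 1 for locus, gl in enumerate(g))


g = (1, 1, 0, 0, 1)  # geno1 at L1, L2, L5; geno0 at L3, L4
print(joint_rr_multiplicative(MODEL1, g), joint_rr_additive(MODEL1, g))
```

Models 1 and 2 use almost the same contribution table; they differ only at L3, geno1. The main difference between them is therefore how the contributions are combined.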
9
Model 3: Additive, disease prevalence 5%
Locus              L1    L2    L3    L4    L5
MAF                0.4   0.2   0.1   0.1   0.2
RR contributions
  geno0            1     1     1.15  1.4   1
  geno1            1.1   1.1   1.15  1     1.2
  geno2            1.1   1.2   1     1     1.4

Model 4: Additive, disease prevalence 0.01
Locus              L1    L2    L3    L4    L5
MAF                0.4   0.2   0.1   0.1   0.2
RR contributions
  geno0            1     1     1.4   1.4   1
  geno1            1.1   1.1   1.2   1     1.2
  geno2            1.1   1.2   1     1     1.4
10
Model 5: two independent effects, each involving 2 epistatic loci; disease prevalence 10%
[Table: MAF values]
11
  • How do we know if we are simulating correctly?
  • There are 243 joint disease genotypes (only 81
    for model 5)
  • For each model, each of these has an expected vs.
    observed frequency in cases and in anti-cases
  • We can compare observed vs expected for the
    simulated data
  • Some chance variation is expected
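Below is a minimal sketch of such an observed-vs-expected check, in Python with only the standard library. It computes Pearson's X^2 over the joint-genotype cells. Many of the 243 cells have small expected counts, so it calibrates the statistic by simulating multinomial samples instead of relying on the asymptotic chi-square distribution. The Monte Carlo calibration and the names `pearson_x2` and `monte_carlo_pvalue` are assumptions, not necessarily what was used for the bakeoff.

```python
import random


def pearson_x2(observed, expected):
    """Pearson X^2 over cells; cells with zero expected count are skipped."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def monte_carlo_pvalue(observed, probs, n_sims=2000, seed=1):
    """Simulated p-value for observed cell counts vs. multinomial(n, probs)."""
    n = sum(observed)
    expected = [n * p for p in probs]
    x2_obs = pearson_x2(observed, expected)
    rng = random.Random(seed)
    cells = range(len(probs))
    exceed = 0
    for _ in range(n_sims):
        # Draw one simulated dataset of n individuals from the expected cell probabilities
        counts = [0] * len(probs)
        for c in rng.choices(cells, weights=probs, k=n):
            counts[c] += 1
        if pearson_x2(counts, expected) >= x2_obs:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return x2_obs, (exceed + 1) / (n_sims + 1)
```

In the bakeoff setting, `probs` would hold the expected joint-genotype probabilities for cases (or anti-cases), and `observed` the matching counts among the 5000 simulated individuals. At that size, a vectorised multinomial draw would be much faster than this pure-Python loop.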

12
Comparing observed vs. expected cell counts, to
see whether the differences are consistent with chance variation
[Figure: observed vs. expected cell counts for each of the 3^5 = 243 joint genotypes]
13
[Figure: expected cell counts for each of the 3^5 = 243 joint genotypes]
14
  • Misc thoughts
  • Because the true disease SNPs are themselves on
    the typing platform, haplotype-based approaches
    should give no extra power
  • The interaction models are a bit complicated.
    But it might be useful to order the genes by
    effect size.
  • We defined a marginal effect size for each
    disease SNP using the expected chi-square
    statistic for the 3 (genotype) X 2 (case status)
    contingency table

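$$\chi^2_{\mathrm{exp}} = \sum_{i=0}^{2}\sum_{d=0}^{1}\frac{\left(E^{\mathrm{model}}_{id}-E^{0}_{id}\right)^2}{E^{0}_{id}}$$

where $E^{\mathrm{model}}_{id}$, the count expected under the true disease model, is plugged in for the observed count of genotype $i$ with case status $D=d$. $E^{0}_{id}$ is the count expected under the null hypothesis of no gene effect.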
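The marginal effect size can be sketched in Python for a single disease SNP. The steps are:

1. Get the SNP's marginal genotype probabilities in cases and anti-cases, for example by summing joint-genotype probabilities over the other loci.
2. Build the 3 x 2 table of counts expected under the true model.
3. Compute the counts expected under no gene effect from that table's margins.
4. Sum (E_true - E_null)^2 / E_null over the six cells.

This corresponds to the usual noncentrality-parameter calculation for a chi-square test. The function name and inputs are assumptions for illustration.

```python
def expected_chisq(case_probs, anticase_probs, n_cases, n_anticases):
    """Expected Pearson chi-square for one SNP's 3 (genotype) x 2 (status) table.

    case_probs, anticase_probs: genotype probabilities (0, 1, 2 copies) at the SNP
    under the true disease model; n_cases, n_anticases: sample sizes.
    """
    n = n_cases + n_anticases
    total = 0.0
    for pc, pa in zip(case_probs, anticase_probs):
        # Cell counts expected under the true disease model
        e_case, e_anti = n_cases * pc, n_anticases * pa
        row = e_case + e_anti
        if row == 0:
            continue  # genotype never occurs; contributes nothing
        # Null of no gene effect: the row total is split in proportion to sample sizes
        for e_true, n_col in ((e_case, n_cases), (e_anti, n_anticases)):
            e_null = row * n_col / n
            total += (e_true - e_null) ** 2 / e_null
    return total
```

For the bakeoff design, `n_cases` and `n_anticases` would both be 5000. Ranking disease SNPs by this value gives one way to order them by effect size.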
15
  • The results look reasonable at the disease loci
  • Let the fun begin!