Developing Odds Ratio Estimation under a Multistage Design - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Developing Odds Ratio Estimation under a Multistage Design


1
Developing Odds Ratio Estimation under a
Multistage Design
  • Judy Zhong

2
Outline
  • Background
  • Multistage Design
  • Estimation under Multistage Design
  • Simulation Results

3
Background, readiness, potential
  • Genome-wide association studies are now underway,
    enabled by
  • Rapidly decreasing genotyping costs
  • Genotyping costs in the vicinity of $0.01/SNP.
    May continue to fall?
  • Massively multiplexed genotyping technologies
  • Large-scale SNP discovery
  • For example, David Cox and Perlegen Sciences have
    identified 250K tagging SNPs having estimated
    minor allele frequencies of 10% or larger across
    the genome. (Perlegen now uses a 360K tag SNP set,
    informed by HapMap2).
  • Could be used to provide insights into disease
    processes and mechanisms, to examine genotype
    interactions with preventive interventions, and to
    identify susceptibles for targeted disease
    screening efforts.
  • From Dr. Ross Prentice

4
Association study sample size
  • For some diseases preceding linkage study
    results may suggest absence of strong
    associations
  • Hence need large sample sizes, e.g., OR = 1.5 for
    presence of minor SNP allele (genotype risk
    ratio); n = number of cases (= number of
    controls)

  • Number of cases n required, by test size (α) and test power:
                                   α = 0.05   α = 0.01
    minor allele frequency 0.1
      power 0.80                      763       1211
      power 0.95                     1301       1875
    minor allele frequency 0.5
      power 0.80                      325        515
      power 0.95                      553        797
  • (from Breslow and Day, 1987, V2)
  • From Dr. Ross Prentice

5
Association study genotyping costs and Approaches
to reducing genotyping costs
  • 1000 cases, 1000 controls, 250K SNPs at $0.01/SNP
    gives genotyping costs of $5 million.
  • Also, conventional testing, even at α = 0.01,
    gives an expected 2500 false positives under the
    global null hypothesis, implying the need for a
    larger sample size.
  • Approaches to reducing genotyping costs
  • Reduced costs/SNP through further technology
    developments
  • Reduce number of SNPs, perhaps restricted to SNPs
    in coding or regulatory regions of known genes
  • Multistage design, perhaps testing at more
    extreme significance levels at later stages
  • From Dr. Ross Prentice
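The single-stage figures quoted on this slide can be checked with a few lines of arithmetic. The function and parameter names below are illustrative, not from the presentation.

```python
def single_stage(n_cases, n_controls, n_snps, cost_per_snp, alpha):
    """Genotyping cost and expected false positives under the global null."""
    # Every subject is genotyped at every SNP.
    cost = (n_cases + n_controls) * n_snps * cost_per_snp
    # Under the global null, each SNP is a false positive with probability alpha.
    expected_false_positives = n_snps * alpha
    return cost, expected_false_positives


cost, false_pos = single_stage(1000, 1000, 250_000, 0.01, 0.01)
print(f"cost = ${cost:,.0f}, expected false positives = {false_pos:,.0f}")
```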

6
2-Stage Design
[Flow diagram: randomly select a proportion of cases and controls → genotype SNPs in 1st stage → SNPs that meet test criteria → genotype those SNPs in the remaining cases and controls in 2nd stage → SNPs that meet test criteria → significant SNPs]
  • e.g., 500 cases and 500 controls at first stage
    with α = 0.05 on 250K SNPs,
  • followed by 1000 cases and 1000 controls at a
    second stage using the approximately 12,500 SNPs
    significant at first stage (and α = 0.001).
  • Under global null hypothesis: 12.5 expected false
    positives, and genotyping costs of $2.625 million.
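A minimal sketch of the two-stage arithmetic follows. It reproduces the 12,500 carried SNPs and the 12.5 expected false positives directly. The 2.625 million cost figure is reproduced under one assumption that the slide does not state: that the second stage genotypes 500 additional cases and 500 additional controls, so that 1000 of each is the cumulative total. Function and parameter names are illustrative.

```python
def two_stage(n1_per_group, n2_new_per_group, n_snps, cost_per_snp, alpha1, alpha2):
    """Expected carried SNPs, false positives, and cost under the global null.

    n2_new_per_group is the number of cases (= controls) newly genotyped
    at stage 2; this reading of the slide is an assumption.
    """
    carried = n_snps * alpha1                  # SNPs expected to pass stage 1
    false_positives = carried * alpha2         # SNPs expected to pass both stages
    cost_stage1 = 2 * n1_per_group * n_snps * cost_per_snp
    cost_stage2 = 2 * n2_new_per_group * carried * cost_per_snp
    return carried, false_positives, cost_stage1 + cost_stage2


carried, false_pos, cost = two_stage(500, 500, 250_000, 0.01, 0.05, 0.001)
print(f"carried = {carried:,.0f}, false positives = {false_pos:.1f}, cost = ${cost:,.0f}")
```

If instead 1000 cases and 1000 controls were newly genotyped at the second stage, the same formula would give 2.75 million.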

7
Association Testing Under a Multistage Design
  • Individual-level data: observe x = 0, 1, 2
    according to the number of minor alleles present
    for a SNP
  • Logistic regression of case vs control status on
    x (and potential confounding factors)
  • LR comparison of distribution of x between cases
    and controls
  • Testing at stage i can be based either on an
    inverse variance weighted log-odds ratio test
    statistic using the data from stages 1, 2, ..., i,
    or on separate testing at each design stage.
  • From Dr. Ross Prentice
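The inverse variance weighted combination mentioned above can be sketched as follows, assuming each stage supplies a log-odds ratio estimate and its estimated variance. This is the standard fixed-effect weighting; the function name and the example numbers are hypothetical, not taken from the presentation.

```python
import math


def combine_log_or(estimates, variances):
    """Inverse-variance weighted log-OR, its standard error, and a z statistic."""
    weights = [1.0 / v for v in variances]     # weight = 1 / variance
    total = sum(weights)
    combined = sum(w * b for w, b in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)                # variance of the weighted mean
    return combined, se, combined / se


# Hypothetical stage-1 and stage-2 log-OR estimates with equal variances.
combined, se, z = combine_log_or([0.20, 0.10], [0.004, 0.004])
print(f"combined log-OR = {combined:.3f}, se = {se:.4f}, z = {z:.2f}")
```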

8
Combined odds-ratio testing in a two-stage design
9
Summary From Prentice et al (2006)
  • Hence, in summary, we are able to recommend a
    multistage design for high-dimensional SNP
    association studies.
  • Testing from such a multistage design can take
    place with good power by considering log-odds
    ratio test statistics in an inverse variance
    weighted fashion from each design stage.

11
Odds-ratio estimation under a two stage design
  • 1. Use final stage estimator
  • 2. Use combined-stage estimator

12
Correction Essential
[Diagram: selection tree. H1 in 1st stage: proceed if H1 > C1 or H1 < -C1. H2 in 2nd stage: significant if H2 > C2 or H2 < -C2.]
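The selection rule in this diagram can be written as a short check. The sketch treats H1 and H2 as standardized test statistics and C1 and C2 as two-sided normal critical values at the stage-wise significance levels; that correspondence is an assumption, and the function names are illustrative.

```python
from statistics import NormalDist


def two_sided_threshold(alpha):
    """Critical value c such that P(|Z| > c) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def passes_two_stage(h1, h2, c1, c2):
    """True when the SNP exceeds the threshold in absolute value at both stages."""
    return abs(h1) > c1 and abs(h2) > c2


c1 = two_sided_threshold(0.05)     # stage-1 level used in the earlier example
c2 = two_sided_threshold(0.001)    # stage-2 level used in the earlier example
print(passes_two_stage(2.5, 3.5, c1, c2))
print(passes_two_stage(1.0, 5.0, c1, c2))
```

Only SNPs that pass this rule are reported, which is why an estimate computed from the selected SNPs needs a correction.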
13
Correction Essential
14
Correction
15
Correction
16
Correction Essential
[Diagram: selection tree. H1 in 1st stage: proceed if H1 > C1 or H1 < -C1. H2 in 2nd stage: significant if H2 > C2 or H2 < -C2.]
17
Correction
18
Confidence Interval
  • Use a bootstrap method to get the confidence
    interval for the uncorrected combined log-OR
    estimator
  • Resample N = n0 + n1 patients (with replacement)
    from the original sample and compute a new combined
    log-OR estimate (resamples that fail to pass
    through the two stages are ignored)
  • Repeat the above procedure B times, and get the
    empirical distribution of the resampled combined
    log-OR estimates
  • Therefore, we get
  • Using the correction equation
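The resampling steps above can be sketched generically. The slide does not say which bootstrap interval is used, so the percentile interval here is an assumption, and `estimator` is a hypothetical callback that returns the combined log-OR for a resample or `None` when the resample fails to pass the two stages. The toy estimator at the end only stands in for the real two-stage analysis.

```python
import math
import random


def bootstrap_percentile_ci(sample, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, skipping resamples the estimator rejects."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        value = estimator(resample)
        if value is not None:          # None marks a resample that failed selection
            estimates.append(value)
    if not estimates:
        raise ValueError("no resample passed the selection step")
    estimates.sort()
    tail = (1.0 - level) / 2.0
    last = len(estimates) - 1
    lower = estimates[math.floor(tail * last)]
    upper = estimates[math.ceil((1.0 - tail) * last)]
    return lower, upper, len(estimates)


def toy_estimator(resample):
    """Stand-in for the two-stage analysis: keep the mean only if it exceeds 0.2."""
    mean = sum(resample) / len(resample)
    return mean if mean > 0.2 else None


data = [0.3, 0.1, 0.4, 0.2, 0.5, 0.1, 0.3, 0.2]
print(bootstrap_percentile_ci(data, toy_estimator, n_boot=500))
```

In this sketch `n_boot` counts attempted resamples, so the number of retained estimates can be smaller; whether B on the slide counts attempts or retained resamples is not stated.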

19
Simulation
  • Simulate a SNP having two alleles N and D, with
    minor allele D frequency p = 10%, where D is
    associated with a higher risk of getting disease
  • X = number of minor alleles, X = 0, 1, or 2
  • Assuming Hardy-Weinberg equilibrium, thus in the
    control group P(X = 0, 1, 2) = (1 - p)^2,
    2p(1 - p), p^2
  • Assume D's genetic effect is additive (on
    log-scale)
  • Assume risk ratio ≈ odds ratio for this rare
    disease
  • log OR associated with X is (1/2) log(1.35) = 0.15
  • In the case group
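The genotype distributions implied by these assumptions can be computed directly. The slide's own formulas did not survive in the transcript, so this sketch relies on two standard facts: Hardy-Weinberg proportions in controls, and, under a log-additive logistic model with controls representing the disease-free population, case genotype probabilities proportional to the control probabilities times exp(beta * x). The function name is illustrative.

```python
import math


def genotype_probs(p, log_or_per_allele):
    """P(X = 0, 1, 2) in controls (Hardy-Weinberg) and in cases (tilted by the OR)."""
    controls = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    # Each extra copy of allele D multiplies the odds of disease by exp(beta).
    tilted = [q * math.exp(log_or_per_allele * x) for x, q in enumerate(controls)]
    total = sum(tilted)
    cases = [t / total for t in tilted]
    return controls, cases


beta = 0.5 * math.log(1.35)            # per-allele log OR from the slide
controls, cases = genotype_probs(0.10, beta)
print("beta     =", round(beta, 3))
print("controls =", [round(q, 4) for q in controls])
print("cases    =", [round(q, 4) for q in cases])
```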

20
Simulation
  • The control group and the case group each have 2000
    patients
  • Two stage design
  • Randomly select 1000 cases and the matching controls
    for the 1st stage, and use the remaining 1000 samples
    for the 2nd stage.
  • At each stage, observe x = 0, 1, 2 according to
    the number of minor alleles present for a SNP.
    Logistic regression of case vs control status on
    x (and potential confounding factors)
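One way to simulate a single stage of this design is sketched below, using only the standard library. It draws genotype counts for 1000 cases and 1000 controls and fits the logistic regression of case status on x by Newton-Raphson on the grouped counts. Confounding factors are omitted, the case genotype probabilities assume the log-additive model with per-allele log OR (1/2) log(1.35), and all function names are hypothetical.

```python
import math
import random


def draw_counts(probs, n, rng):
    """Draw n genotypes with the given P(X = 0, 1, 2) and return the three counts."""
    draws = rng.choices([0, 1, 2], weights=probs, k=n)
    return [draws.count(x) for x in (0, 1, 2)]


def fit_log_or(case_counts, control_counts, n_iter=25):
    """Newton-Raphson fit of logit P(case | x) = a + b*x; returns (b, se of b)."""
    a = b = 0.0
    for _ in range(n_iter):
        s_a = s_b = i_aa = i_ab = i_bb = 0.0   # score and information terms
        for x, (d, c) in enumerate(zip(case_counts, control_counts)):
            n = d + c
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * mu * (1.0 - mu)
            s_a += d - n * mu
            s_b += x * (d - n * mu)
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        a += (i_bb * s_a - i_ab * s_b) / det   # Newton step: inverse information times score
        b += (i_aa * s_b - i_ab * s_a) / det
    # Information from the last iteration; at convergence the final step is negligible.
    return b, math.sqrt(i_aa / det)


rng = random.Random(1)
p, beta = 0.10, 0.5 * math.log(1.35)
control_probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
tilted = [q * math.exp(beta * x) for x, q in enumerate(control_probs)]
case_probs = [t / sum(tilted) for t in tilted]

cases = draw_counts(case_probs, 1000, rng)
controls = draw_counts(control_probs, 1000, rng)
b1, se1 = fit_log_or(cases, controls)
print(f"stage-1 log-OR estimate = {b1:.3f} (se {se1:.3f}), true value = {beta:.3f}")
```

Repeating the draw for the second-stage subjects gives the second log-OR estimate and its standard error, which can then be combined with the first by inverse variance weighting.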

21
Bias
22
Correction curve and its sensitivity to sigma
23
Correction
24
95% Confidence interval
25
Simulation Results based on 1000 95% Confidence
Intervals
26
Future Work
  • Is it appropriate to use the bootstrap method in
    the presence of selection? Will it cause bias?
    May solve the problem of the CI length and
    conservative coverage rate.
  • Asymptotic distribution of the corrected log-OR
  • Generalize the correction method to various
    multistage designs, including those that use
    pooling at the first stage
  • Apply the correction method to a genome-wide scan
    in the Women's Health Initiative
  • Develop correction methods for multistage
    biomarker studies or other settings of multistage
    designs