Developing Odds Ratio Estimation under a Multistage Design

1
Developing Odds Ratio Estimation under a
Multistage Design

Judy Zhong

2
Outline

Background
Multistage Design
Estimation under Multistage Design
Simulation Results

3
Background, readiness, potential

Genome-wide association studies are now underway, enabled by:
Rapidly decreasing genotyping costs
Genotyping costs in the vicinity of $0.01/SNP; may continue to fall?
Massively multiplexed genotyping technologies
Large-scale SNP discovery
For example, David Cox and Perlegen Sciences have identified 250K tagging SNPs with estimated minor allele frequencies of 10% or larger across the genome. (Perlegen now uses a 360K tag SNP set, informed by HapMap2.)
Could be used to provide insights into disease processes and mechanisms, to examine genotype interactions with preventive interventions, and to identify susceptibles for targeted disease screening efforts.
From Dr. Ross Prentice

4
Association study sample size

For some diseases, preceding linkage study results may suggest an absence of strong associations.
Hence the need for large sample sizes, e.g., for OR = 1.5 for presence of the minor SNP allele (genotype risk ratio), with n = number of cases (= number of controls):

                                  Test size
Test power                      0.05      0.01
Minor allele frequency = 0.1
  Power = 0.80                   763      1211
  Power = 0.95                  1301      1875
Minor allele frequency = 0.5
  Power = 0.80                   325       515
  Power = 0.95                   553       797

(from Breslow and Day, 1987, V2)
From Dr. Ross Prentice

5
Association study genotyping costs and Approaches
to reducing genotyping costs

1000 cases, 1000 controls, 250K SNPs at $0.01/SNP gives genotyping costs of $5 million.
Also, conventional testing, even at α = 0.01, gives an expected 2500 false positives under the global null hypothesis, implying the need for a larger sample size.
Approaches to reducing genotyping costs
Reduced costs/SNP through further technology
developments
Reduce number of SNPs, perhaps restricted to SNPs
in coding or regulatory regions of known genes
Multistage design, perhaps testing at more
extreme significance levels at later stages
From Dr. Ross Prentice

6
2-Stage Design
[Flow diagram]
Randomly select a proportion of cases and controls -> genotyping SNPs in 1st stage
SNPs that meet test criteria -> genotyping SNPs in 2nd stage, in the remaining cases and controls
SNPs that meet test criteria -> significant SNPs

e.g., 500 cases and 500 controls at the first stage with α = 0.05 on 250K SNPs, followed by 1000 cases and 1000 controls at a second stage using the approximately 12,500 SNPs significant at the first stage and α = 0.001.
Under the global null hypothesis: 12.5 false positives, and genotyping costs of $2.625 million.
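
The cost and false-positive figures quoted for the single-stage and two-stage designs can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical. It assumes that the second stage genotypes only the 500 cases and 500 controls not used in the first stage, which is the reading that reproduces the $2.625 million figure, and it works in cents to avoid floating-point rounding.

```python
# Back-of-envelope check of genotyping cost and expected false positives.
# Figures are taken from the slides; function names are hypothetical.

COST_CENTS_PER_SNP = 1      # $0.01 per SNP genotype, kept in cents
N_SNPS = 250_000


def genotyping_cost_dollars(n_subjects, n_snps):
    """Cost of genotyping n_snps SNPs on n_subjects subjects."""
    return n_subjects * n_snps * COST_CENTS_PER_SNP / 100


def expected_false_positives(n_snps, alpha):
    """Expected number of rejections under the global null hypothesis."""
    return n_snps * alpha


# Single stage: 1000 cases + 1000 controls, all SNPs, alpha = 0.01.
single_cost = genotyping_cost_dollars(2000, N_SNPS)
single_fp = expected_false_positives(N_SNPS, 0.01)

# Two stage, first stage: 500 cases + 500 controls, all SNPs, alpha = 0.05.
stage1_cost = genotyping_cost_dollars(1000, N_SNPS)
snps_to_stage2 = round(expected_false_positives(N_SNPS, 0.05))

# ASSUMPTION: only the 500 + 500 subjects not used in stage 1 are genotyped
# on the SNPs carried forward; the final test uses alpha = 0.001.
stage2_cost = genotyping_cost_dollars(1000, snps_to_stage2)
two_stage_cost = stage1_cost + stage2_cost
two_stage_fp = expected_false_positives(snps_to_stage2, 0.001)

print(f"single stage: ${single_cost:,.0f}, {single_fp:.1f} false positives")
print(f"two stage:    ${two_stage_cost:,.0f}, {two_stage_fp:.1f} false positives")
```

Under these assumptions the two-stage design roughly halves the genotyping cost while cutting the expected number of false positives from 2500 to 12.5.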

7
Association Testing Under a Multistage Design

Individual-level data: observe x = 0, 1, 2 according to the number of minor alleles present for a SNP.
Logistic regression of case vs control status on x (and potential confounding factors).
LR comparison of the distribution of x between cases and controls.
Testing at stage i can be based either on an inverse variance weighted log-odds ratio test statistic used on the data from stages 1, 2, ..., i, or on separate testing at each design stage.
From Dr. Ross Prentice
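
As a minimal sketch of the inverse variance weighted combination mentioned above: given a log-odds ratio estimate and its standard error from each stage, each stage is weighted by the reciprocal of its variance. The function name and the example numbers are hypothetical, and the stage estimates are treated as independent, which is an assumption of this sketch.

```python
import math


def combine_inverse_variance(estimates, std_errors):
    """Inverse variance weighted mean of per-stage log-odds ratios.

    Returns (combined estimate, its standard error, z statistic).
    ASSUMPTION: the per-stage estimates are independent.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    combined = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    return combined, combined_se, combined / combined_se


# Hypothetical stage-1 and stage-2 log-OR estimates with equal precision.
log_or, se, z = combine_inverse_variance([0.20, 0.10], [0.10, 0.10])
print(f"combined log-OR = {log_or:.3f}, SE = {se:.4f}, z = {z:.2f}")
```

With equal standard errors the combined estimate is the plain average of the two stages; a more precise stage would pull the combined value toward its own estimate.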

8
Combined odds-ratio testing in a two-stage design
9
Summary From Prentice et al (2006)

Hence, in summary, we are able to recommend a
multistage design for high-dimensional SNP
association studies.
Testing from such a multistage design can take
place with good power by considering log-odds
ratio test statistics in an inverse variance
weighted fashion from each design stage.

11
Odds-ratio estimation under a two stage design

1. Use final stage estimator
2. Use combined-stage estimator

12
Correction Essential
[Decision tree]
H1 in 1st stage
  H1 > C1: H2 in 2nd stage
    H2 > C2: Significant
    H2 < -C2: Significant
  H1 < -C1: H2 in 2nd stage
    H2 > C2: Significant
    H2 < -C2: Significant
13
Correction Essential
14
Correction
15
Correction
16
Correction Essential
17
Correction
18
Confidence Interval

Use a bootstrap method to get the confidence interval for the uncorrected combined log-OR estimator:
Resample N = n0 + n1 patients (with replacement) from the original sample and compute a new combined log-OR estimator. (Sometimes the new sample fails to go through the two stages; ignore such samples.)
Repeat the above procedure B times, and get the empirical distribution of the resampled combined log-OR estimators.
Therefore, we get
Using the correction equation
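
The resampling loop described on this slide can be sketched as follows. This is not the presenter's code: the percentile rule used to read an interval off the empirical distribution is an assumption, and `toy_selected_log_or` is a hypothetical stand-in for the real two-stage estimator. It returns `None` when a resample fails selection, and such resamples are ignored. The correction equation is not applied here.

```python
import math
import random


def bootstrap_percentile_ci(sample, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, ignoring resamples that fail selection.

    sample    : list of (is_case, x) records, N = n0 + n1 in total
    estimator : callable returning a log-OR estimate, or None if the
                resample does not pass the two-stage selection
    """
    rng = random.Random(seed)
    n = len(sample)
    kept = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimate = estimator(resample)
        if estimate is not None:
            kept.append(estimate)
    if not kept:
        raise ValueError("no bootstrap resample passed the selection")
    kept.sort()
    tail = (1.0 - level) / 2.0
    lower = kept[math.floor(tail * (len(kept) - 1))]
    upper = kept[math.ceil((1.0 - tail) * (len(kept) - 1))]
    return lower, upper, len(kept)


def toy_selected_log_or(sample):
    """Hypothetical stand-in estimator: carrier-vs-non-carrier log-OR from a
    2x2 table (0.5 added to each cell), 'selected' only when positive."""
    a = b = c = d = 0.5
    for is_case, x in sample:
        if is_case and x > 0:
            a += 1
        elif is_case:
            b += 1
        elif x > 0:
            c += 1
        else:
            d += 1
    estimate = math.log(a * d / (b * c))
    return estimate if estimate > 0 else None


# Hypothetical data: 100 cases (30 carriers) and 100 controls (20 carriers).
sample = [(1, 1)] * 30 + [(1, 0)] * 70 + [(0, 1)] * 20 + [(0, 0)] * 80
lo, hi, n_kept = bootstrap_percentile_ci(sample, toy_selected_log_or, n_boot=500)
print(f"95% interval: ({lo:.3f}, {hi:.3f}) from {n_kept} retained resamples")
```

Because discarded resamples are exactly those that fail selection, the retained empirical distribution is itself conditioned on selection, which is the concern raised under Future Work.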

19
Simulation

Simulate a SNP having two alleles, N and D, where the minor allele D has frequency p = 10% and is associated with a higher risk of getting the disease.
X = number of minor alleles, X = 0, 1, or 2
Assuming Hardy-Weinberg equilibrium, thus in the control group: P(X = 0) = (1 - p)^2, P(X = 1) = 2p(1 - p), P(X = 2) = p^2
Assume D's genetic effect is additive (on log-scale)
Assume Risk ratio ≈ Odds ratio for this rare disease
log OR associated with X is (1/2) log(1.35) = 0.15
In the case group
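
The genotype distributions implied by these assumptions can be written down directly. In the sketch below the control distribution follows Hardy-Weinberg proportions. The case distribution is an assumption of this sketch, not a formula recovered from the slide: with a rare disease and an additive effect on the log scale, P(X = x | case) is taken to be proportional to P(X = x | control) * exp(beta * x). The function names are hypothetical.

```python
import math

P_MINOR = 0.10                   # minor allele (D) frequency
BETA = 0.5 * math.log(1.35)      # per-allele log-OR, about 0.15


def control_genotype_probs(p):
    """Hardy-Weinberg probabilities of X = 0, 1, 2 minor alleles."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def case_genotype_probs(p, beta):
    """Genotype probabilities in cases.

    ASSUMPTION: rare disease and additive log-scale effect, so the control
    probabilities are tilted by exp(beta * x) and renormalised.
    """
    tilted = [prob * math.exp(beta * x)
              for x, prob in enumerate(control_genotype_probs(p))]
    total = sum(tilted)
    return [t / total for t in tilted]


controls = control_genotype_probs(P_MINOR)
cases = case_genotype_probs(P_MINOR, BETA)
print("controls:", [round(v, 4) for v in controls])
print("cases:   ", [round(v, 4) for v in cases])
```

By construction, the odds ratio comparing X = 1 with X = 0 between cases and controls equals exp(BETA), and the odds ratio comparing X = 2 with X = 0 equals 1.35.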

20
Simulation

Control group and case group each have 2000 patients.
Two-stage design:
Randomly select 1000 cases and the matching controls for the 1st stage, and the remaining 1000 samples for the 2nd stage.
At each stage, observe x = 0, 1, 2 according to the number of minor alleles present for a SNP.
Logistic regression of case vs control status on x (and potential confounding factors)
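
One replicate of this simulation might look like the sketch below, written in plain Python so that it is self-contained. Several parts are assumptions rather than details from the slides: the critical values (1.96 at stage 1 and 3.29 for the combined test, i.e. two-sided α = 0.05 and 0.001), the use of the inverse variance weighted combined statistic at the second stage, the tilted case genotype distribution, and the absence of confounders. All function names are hypothetical.

```python
import math
import random


def fit_logistic(xs, ys, n_iter=25):
    """Newton-Raphson fit of logit P(y = 1) = a + b * x.
    Returns (b_hat, standard error of b_hat)."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g0 += y - mu                 # score for the intercept
            g1 += (y - mu) * x           # score for the slope
            h00 += w                     # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b, math.sqrt(h00 / det)


def genotype_probs(p, beta=0.0):
    """HWE probabilities for X = 0, 1, 2, tilted by exp(beta * x) for cases
    (ASSUMPTION: rare disease, additive log-scale effect)."""
    base = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    tilted = [pr * math.exp(beta * x) for x, pr in enumerate(base)]
    total = sum(tilted)
    return [t / total for t in tilted]


def two_stage_log_or(case_x, control_x, n_stage1, c1=1.96, c2=3.29):
    """Combined log-OR if the SNP passes both stages, otherwise None."""
    def stage(cx, kx):
        return fit_logistic(cx + kx, [1] * len(cx) + [0] * len(kx))

    b1, se1 = stage(case_x[:n_stage1], control_x[:n_stage1])
    if abs(b1 / se1) <= c1:
        return None                      # fails the first-stage test
    b2, se2 = stage(case_x[n_stage1:], control_x[n_stage1:])
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    combined = (w1 * b1 + w2 * b2) / (w1 + w2)
    combined_se = math.sqrt(1.0 / (w1 + w2))
    if abs(combined / combined_se) <= c2:
        return None                      # fails the combined test
    return combined


rng = random.Random(1)
beta = 0.5 * math.log(1.35)
# Genotypes are drawn independently, so taking the first 1000 of each group
# for stage 1 is equivalent to a random selection.
case_x = rng.choices([0, 1, 2], weights=genotype_probs(0.10, beta), k=2000)
control_x = rng.choices([0, 1, 2], weights=genotype_probs(0.10), k=2000)
result = two_stage_log_or(case_x, control_x, n_stage1=1000)
print("SNP not selected" if result is None else f"combined log-OR = {result:.3f}")
```

Repeating this over many seeds and averaging the non-`None` results would show how far the selected estimates sit from the true value of 0.15.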

21
Bias
22
Correction curve and its sensitivity to sigma
23
Correction
24
95% Confidence interval
25
Simulation Results based on 1000 95% Confidence Intervals
26
Future Work

Is it appropriate to use the bootstrap method in the presence of selection? Will it cause bias?
May solve the problem of the CI length and conservative coverage rate.
Asymptotic distribution of the corrected log-OR
Generalize the correction method to various multistage designs, including those that use pooling at the first stage
Apply the correction method to the genome-wide scan in the Women's Health Initiative
Develop correction methods for multistage biomarker studies or other settings of multistage designs