Title: How much information is lost by haplotype uncertainty in SNP case-control analysis?


1
How much information is lost by haplotype
uncertainty in SNP case-control analysis?
  • Hae-Won Uh
  • Jeanine Houwing
  • Hein Putter
  • Hans van Houwelingen
  • Department of Medical Statistics, LUMC
  • Statistical Core GenomEUtwin

2
Starting point
  • Case-control association study
  • Haplotypes in linkage disequilibrium with causal
    mutation
  • Limited number of SNPs within candidate gene
  • Interest in Haplotype Relative Risk
  • Assume Hardy-Weinberg Equilibrium

3
  • Problems
  • - unknown phase of the haplotypes
  • - loss of information
  • - desire of researchers to set the haplotypes
  • Issues
  • - how much information is lost due to unknown haplotypes?
  • - does it pay off to collect additional
    information on parents/sibs?

4
Side-step: Disease model
  • Notation
  • - haplotype information (ordered pairs)
  • - (Additive) disease model
  • Leads to

5
  • Consequences
  • - if HWE holds in controls, then
  • - HWE also holds in cases
  • (known as haplotype relative risk)
  • (Remember: ln(posterior odds) = ln(prior odds) +
    ln(LR))
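
The formulas on this slide were lost in extraction. As a hedged illustration of the stated consequence, the sketch below assumes that each haplotype h carries a multiplicative factor r_h, so that the case diplotype probability is proportional to the control probability times r_h1 * r_h2. The function names and the numbers are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def case_diplotype_probs(control_freqs, rel_risks):
    """Ordered-pair diplotype distribution in cases.

    Assumes HWE in controls and a multiplicative effect r_a * r_b
    for the ordered haplotype pair (a, b).
    """
    k = len(control_freqs)
    weights = {
        (a, b): control_freqs[a] * control_freqs[b] * rel_risks[a] * rel_risks[b]
        for a, b in product(range(k), repeat=2)
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def case_haplotype_freqs(control_freqs, rel_risks):
    """Case haplotype frequencies: control frequency times r_h, renormalised."""
    raw = [p * r for p, r in zip(control_freqs, rel_risks)]
    total = sum(raw)
    return [x / total for x in raw]


control = [0.5, 0.3, 0.2]   # illustrative control haplotype frequencies
risks = [1.0, 2.0, 0.5]     # illustrative haplotype relative risks
dip = case_diplotype_probs(control, risks)
case = case_haplotype_freqs(control, risks)

# Largest deviation of the case diplotype distribution from HWE
max_gap = max(abs(dip[(a, b)] - case[a] * case[b]) for (a, b) in dip)
print(case, max_gap)
```

Under these assumptions the case diplotype distribution factorises into a product of case haplotype frequencies, which is HWE in cases.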

6
Estimating haplotype frequencies under HWE
  • Ideal case: no haplotype uncertainty
  • k possible haplotypes
  • n persons with 2 haplotypes per person leads to
    2n observed haplotypes
  • frequency estimate
  • standard error of estimate
  • correlations of estimates (negative, because
    the frequencies sum to 1)
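
The slide's formulas did not survive extraction. A minimal sketch of the ideal case, using the standard multinomial results for 2n phase-known haplotypes (estimate n_h / 2n, standard error sqrt(p(1-p)/2n), pairwise correlation -sqrt(p_g p_h / ((1-p_g)(1-p_h)))), is given below. The haplotype labels and data are made up for illustration.

```python
import math
from collections import Counter


def haplotype_summary(diplotypes):
    """Frequency estimates, standard errors and correlations
    from phase-known diplotypes (pairs of haplotype labels)."""
    counts = Counter(h for pair in diplotypes for h in pair)
    total = 2 * len(diplotypes)  # 2n observed haplotypes
    freqs = {h: c / total for h, c in counts.items()}
    ses = {h: math.sqrt(p * (1 - p) / total) for h, p in freqs.items()}
    corr = {
        (g, h): -math.sqrt(freqs[g] * freqs[h] / ((1 - freqs[g]) * (1 - freqs[h])))
        for g in freqs
        for h in freqs
        if g < h
    }
    return freqs, ses, corr


# Five hypothetical individuals with known phase
people = [("00", "01"), ("00", "00"), ("01", "11"), ("00", "11"), ("00", "01")]
freqs, ses, corr = haplotype_summary(people)
print(freqs)
print(ses)
print(corr)
```

All pairwise correlations come out negative, because an increase in one frequency must be offset by the others.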

7
Practical situation: uncertain haplotypes
(heterozygous SNPs)
  • Problems solved by EM algorithm
  • Software can produce standard errors
  • (or a measure of uncertainty; Stram et al., 2003)
  • This does not tell you how much information is lost
  • or which individuals are to blame
  • It does not give you an overall measure of
    uncertainty
  • (it does not consider the possible correlations between
    the parameters)
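
For readers unfamiliar with the EM step mentioned above, here is a minimal sketch of EM estimation of haplotype frequencies under HWE from unphased genotypes. It is not the authors' software; the genotype coding (count of allele 1 per SNP) and the function names are assumptions made for the example.

```python
from itertools import combinations_with_replacement, product


def compatible_diplotypes(genotype):
    """All unordered haplotype pairs whose allele counts add up to the genotype."""
    haps = list(product((0, 1), repeat=len(genotype)))
    return [
        (a, b)
        for a, b in combinations_with_replacement(haps, 2)
        if all(x + y == g for x, y, g in zip(a, b, genotype))
    ]


def em_haplotype_freqs(genotypes, n_iter=200):
    """EM estimates of haplotype frequencies under HWE."""
    haps = list(product((0, 1), repeat=len(genotypes[0])))
    freqs = {h: 1.0 / len(haps) for h in haps}
    for _ in range(n_iter):
        expected = {h: 0.0 for h in haps}
        for g in genotypes:
            pairs = compatible_diplotypes(g)
            # E-step: posterior weight of each compatible pair under HWE
            # (an unordered pair of distinct haplotypes counts twice)
            w = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                expected[a] += wi / total
                expected[b] += wi / total
        # M-step: expected haplotype counts divided by 2n
        freqs = {h: c / (2 * len(genotypes)) for h, c in expected.items()}
    return freqs


# Unambiguous genotypes only: EM reproduces direct counting
unambiguous = [(0, 0), (0, 1), (2, 2)]
direct = em_haplotype_freqs(unambiguous)

# Adding double heterozygotes (1, 1) introduces phase uncertainty
mixed = unambiguous + [(1, 1), (1, 1)]
estimated = em_haplotype_freqs(mixed)
print(direct)
print(estimated)
```

A double heterozygote such as (1, 1) is compatible with two haplotype pairs, which is exactly the situation in which information is lost.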

8
Example. Data coming from the GAW......
  • 3 SNPs
  • 100 cases
  • 100 controls
  • Prevalence 3

9
Haplotype frequency estimates, standard errors
with/without phase uncertainty

10
  • Some notation
  • Haplotype denoted by a sequence H of k 0s and 1s
  • (for example, with 3 SNPs)
  • Haplotype frequency
  • If there is no uncertainty, the haplotype counts per individual are observed
  • (components of the count vector are 0, 1 or 2, and sum to 2)
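
The symbols on this slide were lost, so the following is only a sketch of the notation as it reads from the surviving text: haplotypes as 0/1 strings, and a per-individual count vector with entries 0, 1 or 2 that sum to 2. The function name and the ordering of haplotypes are assumptions.

```python
from itertools import product


def count_vector(diplotype, n_snps):
    """Number of copies of each possible haplotype carried by one individual.

    Haplotypes are ordered lexicographically: 000, 001, 010, ...
    """
    haps = ["".join(bits) for bits in product("01", repeat=n_snps)]
    return [sum(h == hap for h in diplotype) for hap in haps]


heterozygote = count_vector(("011", "110"), 3)
homozygote = count_vector(("000", "000"), 3)
print(heterozygote)
print(homozygote)
```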

11
Relation between Standard errors and Information
  • (using parametrization )
  • Total information in ideal case
  • Covariance matrix of the estimated parameters
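
The parametrization used on the slide is not recoverable. As an illustration of the relation between information and covariance, the sketch below assumes the free parameters are the first k-1 haplotype frequencies and checks that the multinomial covariance matrix is the inverse of the total information from 2n phase-known haplotypes. All names and numbers are illustrative.

```python
import math


def ideal_information(pi, n):
    """Fisher information for the first k-1 frequencies, 2n known haplotypes."""
    m = len(pi) - 1
    return [
        [2 * n * ((1 / pi[j] if j == l else 0.0) + 1 / pi[m]) for l in range(m)]
        for j in range(m)
    ]


def ideal_covariance(pi, n):
    """Multinomial covariance of the first k-1 frequency estimates."""
    m = len(pi) - 1
    return [
        [((pi[j] if j == l else 0.0) - pi[j] * pi[l]) / (2 * n) for l in range(m)]
        for j in range(m)
    ]


def matmul(a, b):
    """Plain-Python matrix product."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


pi = [0.5, 0.3, 0.2]  # illustrative haplotype frequencies
n = 100               # illustrative number of individuals
info = ideal_information(pi, n)
cov = ideal_covariance(pi, n)
identity_check = matmul(info, cov)
standard_errors = [math.sqrt(cov[j][j]) for j in range(len(cov))]
print(identity_check)
print(standard_errors)
```

The standard errors are the square roots of the diagonal of the inverse information, which is why a loss of information shows up as larger standard errors.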

12
In case of uncertain haplotypes
  • Louis (1982):
  • observed information = complete-data information
    - missing information
  • Covariance of the estimates increases due to
    loss of information
  • Approximately
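
Louis's decomposition can be computed directly for a single ambiguous individual. The sketch below assumes the first k-1 haplotype frequencies as free parameters and HWE, and takes the missing information to be the conditional variance of the complete-data score over the compatible diplotypes. The function name, frequencies and diplotype indices are illustrative, not taken from the slides.

```python
import math


def louis_information(pi, diplotypes):
    """Complete, missing and observed information for one individual.

    pi: frequencies of all k haplotypes (sum to 1); the free parameters
        are the first k-1 frequencies.
    diplotypes: haplotype index pairs (a, b) compatible with the genotype.
    """
    k = len(pi)
    m = k - 1
    weights, scores, infos = [], [], []
    for a, b in diplotypes:
        x = [0] * k  # haplotype count vector for this phase
        x[a] += 1
        x[b] += 1
        weights.append(pi[a] * pi[b] * (1 if a == b else 2))
        scores.append([x[j] / pi[j] - x[m] / pi[m] for j in range(m)])
        infos.append([
            [(x[j] / pi[j] ** 2 if j == l else 0.0) + x[m] / pi[m] ** 2
             for l in range(m)]
            for j in range(m)
        ])
    total = sum(weights)
    weights = [w / total for w in weights]  # posterior phase probabilities
    mean = [sum(w * s[j] for w, s in zip(weights, scores)) for j in range(m)]
    complete = [[sum(w * i[j][l] for w, i in zip(weights, infos))
                 for l in range(m)] for j in range(m)]
    missing = [[sum(w * s[j] * s[l] for w, s in zip(weights, scores))
                - mean[j] * mean[l] for l in range(m)] for j in range(m)]
    observed = [[complete[j][l] - missing[j][l] for l in range(m)]
                for j in range(m)]
    return complete, missing, observed


# Two SNPs, haplotypes indexed 0=00, 1=01, 2=10, 3=11 (illustrative values)
pi = [0.4, 0.2, 0.1, 0.3]
double_het = [(0, 3), (1, 2)]  # the two phases of a double heterozygote
complete, missing, observed = louis_information(pi, double_het)
_, missing_known, _ = louis_information(pi, [(0, 3)])  # phase known
print(missing)
print(observed)
```

When the phase is known there is a single compatible diplotype, the score has no conditional variance, and the missing information is zero.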

13
First practical application
  • List of loss-of-information per haplotype per
    individual (diagonal of C-matrix) for cases and
    controls.
  • Tells you which individuals need additional
    information and how much you would gain if you
    were able to set the haplotype for that
    individual.

14
Loss of information
Maximum information content is reached when there is no
haplotype uncertainty (complete data)

15
Loss of information and

16
Loss-of-information in cases
(legend: 1 and 2 = homozygotes 1/1 and 2/2; H = heterozygote)

17
Loss-of-information in controls

18
Further application: optimal selection per group
  • Selecting individuals on loss of information per haplotype
    (A-optimality) might not be ideal. It depends on the
    parameterization and ignores correlation between
    parameters.
  • Theory of optimal design
  • It is better to maximize the determinant of the
    information matrix (D-optimality)
  • (Leads automatically to equal groups in the
    two-sample problem)
  • Forward stepwise selection is applied for each
    group.
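
To make the contrast concrete, here is a small sketch of a greedy forward selection that maximizes the determinant of the information matrix, next to a ranking by total (trace) gain. It assumes each candidate individual is summarised by the information matrix that would be recovered if their haplotypes were resolved; the matrices and function names are invented for the example and do not come from the slides.

```python
def det(m):
    """Determinant by Laplace expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def forward_select(base_info, gains, n_select):
    """Greedy D-optimal selection: at each step pick the individual whose
    recovered information increases det(information) the most."""
    info, chosen = base_info, []
    remaining = set(range(len(gains)))
    for _ in range(n_select):
        best = max(sorted(remaining), key=lambda i: det(add(info, gains[i])))
        chosen.append(best)
        remaining.remove(best)
        info = add(info, gains[best])
    return chosen, info


def rank_by_total_gain(gains):
    """Order individuals by the trace of their recovered information."""
    return sorted(
        range(len(gains)),
        key=lambda i: -sum(gains[i][j][j] for j in range(len(gains[i]))),
    )


# Hypothetical observed information and per-individual recoverable information
base = [[10.0, 0.0], [0.0, 1.0]]
gains = [
    [[3.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 0.5]],
]
d_choice, final_info = forward_select(base, gains, 2)
trace_order = rank_by_total_gain(gains)
print(d_choice, trace_order)
```

In this toy example the trace ranking favours the individual who adds most to an already well-estimated parameter, while the determinant criterion first picks the individual who helps the poorly estimated one.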

19
Selection patterns
(panels: controls, cases)

20
Increase of Information
(panels: cases, controls)

21
Conclusion of these graphs: selection on the basis
of total loss is not that bad

22
Remaining issue 1
  • Best selection for joint determination of
    haplotype risks.
  • We want to maximize the information / minimize
    the error for
  • Can be achieved by minimizing the covariance for

23
minimize the cov

24
maximize power of global test
(chi-square with df = k-1)

25
minimize the cov
maximize power

26
Remaining issue 2
  • Does it pay off to collect additional information on parents?
  • (How much do we gain by genotyping parents?)

27
Resolved ambiguous individuals with additional
parental information (GAW) 100

28
600

29
Expected loss
  • Notation
  • - Ho: diplotypes of offspring
  • - Go: genotypes of offspring
  • - Gp: genotypes of both parents
  • - D: disease status (case or control)
  • Expected loss after genotyping parents

30
Expected loss expected gain (A-optimality)

31
Main results
  • Quantification of loss of information
  • - A-optimality: loss of information per haplotype
  • - D-optimality: overall loss of information
  • Optimal (forward stepwise) selection per group
  • Best selection for joint determination of
    haplotype risks
  • - Minimize error
  • - Maximize power
  • For general use: computation of expected gain

32
References
  • Fedorov VV. Theory of Optimal Experiments. New
    York: Academic Press, 1972
  • Hodge SE, Boehnke M, Spence MA. Loss of
    information due to ambiguous haplotyping. Nat
    Genet 1999; 21:360-361
  • Louis T. Finding the observed information matrix
    when using the EM algorithm. J R Stat Soc B
    1982; 44:226-233
  • Stram D et al. Modeling and E-M estimation of
    haplotype-specific relative risks from genotype
    data for a case-control study of unrelated
    individuals. Hum Hered 2003; 55:179-190

33
  • Extensions
  • - limit attention to interesting haplotypes
  • - additional information on sibs/children
  • - include missing data
  • Availability
  • An R program will be available on our website
  • Acknowledgements
  • This project is financially supported by
    GENOMEUTWIN.