Association Studies: Statistical And Study Design Issues - PowerPoint PPT Presentation

1
Genomics Flemming Pociot, MD, DMSc Steno Diabetes
Center STAR Research in Diabetology
Epidemiology
Association Studies: Statistical And Study Design
Issues
2
This lecture
  • Association in context
  • Genetic association analysis: some statistical
    issues
  • Study design: problems and issues
  • The future for association analysis

3
Some challenges
  • Field is young and changing rapidly
  • Literature can be difficult
  • Example: Haseman-Elston (HE) is a standard linkage
    method
  • 4-5 variations on HE recently proposed
  • Hard to compare approaches
  • Software often free, but sometimes not well-tested

4
Some challenges - continued
  • Methods are sometimes oversold.
  • The "sure thing" of the day:
  • Collecting X affected sib pairs
  • Collecting X unrelated cases and controls
  • Isolated populations
  • SNPs

5
Evidence for genetic effect on phenotype
[Figure: activity plotted against IBD sharing (0, 1, 2) and against genotype (aa, ab, bb); variance by relationship (population, MZ, sibs)]
6
How to analyze the genetics underlying a specific
trait or phenotype ?
8
Broad Genetic Epidemiology Study Design
Categories
  • Linkage Analysis
  • Follows meiotic events through families for
    co-segregation of disease and particular genetic
    variants
  • Large Families
  • Sibling Pairs (or other family pairs)
  • Works VERY well for Mendelian diseases
  • Association Studies
  • Detect association between genetic variants and
    disease across families; exploits linkage
    disequilibrium
  • Case-control designs
  • Cohort designs
  • Parents-affected child trios (TDT)
  • May be more appropriate for complex diseases

9
Allelic architecture and mapping strategy
[Figure: magnitude of effect plotted against frequency in population; regions labelled "Unlikely to exist" and "Fnct. Studies"]
10
What determines the allelic architecture?
Evolutionary selection
11
Association Study Approaches
  • Candidate gene search
  • Limited variants or haplotypes based on prior
    knowledge: expert opinion, linkage peaks
  • Genome-wide scan
  • Dense set of markers throughout genome

Family-based and population-based designs
12
Phenotype-genotype association
  • In practical terms, an observed statistical
    association between an allele and a phenotypic
    trait will be due to one of three situations:
  • The finding could be due to chance or artifact,
    e.g., confounding or selection bias;
  • The allele is in linkage disequilibrium with an
    allele at another locus that directly affects the
    expression of the phenotype; or
  • The allele itself is functional and directly
    affects the expression of the phenotype.

13
Candidate Allele Testing
  • Test markers for association with disease
    predisposition
  • One approach: perform standard single-locus
    chi-squared tests

By alleles: OR = ad/bc; test: χ² with 1 df
By genotype: test: χ² with 2 df
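The two single-locus tests named on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names and the table layout (cases in the first row, controls in the second) are assumptions made for the example, and every marginal total is assumed to be non-zero.

```python
from math import erfc, exp, sqrt

def allelic_test(a, b, c, d):
    """Allele-count 2x2 table (layout assumed for this sketch):
                 allele 1   allele 2
       cases        a          b
       controls     c          d
    Returns (odds ratio, chi-squared statistic, p-value on 1 df)."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df the upper tail is P(chi2 > x) = erfc(sqrt(x / 2)).
    return odds_ratio, chi2, erfc(sqrt(chi2 / 2))

def genotype_test(cases, controls):
    """cases and controls are 3-tuples of genotype counts (aa, ab, bb).
    Returns (chi-squared statistic, p-value on 2 df)."""
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    chi2 = 0.0
    for obs_case, obs_control in zip(cases, controls):
        column_total = obs_case + obs_control
        for observed, row_total in ((obs_case, n_cases), (obs_control, n_controls)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    # For 2 df the upper tail is P(chi2 > x) = exp(-x / 2).
    return chi2, exp(-chi2 / 2)
```

For example, `allelic_test(60, 40, 40, 60)` gives OR = 2.25 and a chi-squared statistic of 8.0 on 1 df.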
14
LD Gene Mapping
  • General idea
  • Exploit the phenomenon of linkage disequilibrium
    (LD) between alleles of closely linked markers to
    identify genetic regions associated with disease
    status.
  • I.e., test for LD between marker loci and
    disease allele
  • LD strength (magnitude) ∝ 1 / r
  • LD will be highest at areas of the genome that
    are closest to the disease locus
  • Use this to pinpoint (localize, fine-map) the
    disease gene region
  • E.g., fine-mapping:
  • From linkage analysis, may have a 10 cM candidate
    region. Next, add a dense set of markers in the
    significant region and perform LD analysis to
    narrow the region much further.
  • What about whole-genome approach?
  • Some have suggested genome-wide LD studies are
    feasible with densely spaced markers.
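To make "strength of LD" concrete, the sketch below computes the usual pairwise measures (D, D' and r²) from haplotype and allele frequencies, and the textbook decay of D under recombination. The function and parameter names are illustrative assumptions, and both loci are assumed to be polymorphic (allele frequencies strictly between 0 and 1).

```python
def ld_measures(p_ab, p_a, p_b):
    """p_ab: frequency of the haplotype carrying allele A and allele B;
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2).
    Returns (D, |D'|, r^2)."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

def ld_after_generations(d0, theta, generations):
    """Expected D after random mating: D_t = D_0 * (1 - theta)^t,
    where theta is the recombination fraction between the two loci."""
    return d0 * (1 - theta) ** generations
```

Because D shrinks by a factor of (1 - theta) each generation, markers close to the disease locus (small theta) keep their association longest, which is the property LD mapping relies on.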

16
Multiple testing
  • Multiple comparisons - 30,000 genes. Even if only
    one functional locus/gene is tested, there will
    be a very high number of false positives
  • Solutions
  • Simulation: empirical p-values
  • Replication
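One common way to obtain empirical p-values by simulation is to permute case/control labels and compare the best observed statistic with the best statistic in each permuted data set. The sketch below assumes genotypes coded as risk-allele counts (0, 1, 2) and uses a deliberately simple statistic (absolute difference in mean allele count); all names are illustrative, and this is not necessarily the procedure the presenter had in mind.

```python
import random

def mean_diff(genotypes, status):
    """Absolute difference in mean allele count between cases (1) and controls (0)."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_statistic(markers, status):
    """Largest single-marker statistic across all tested markers."""
    return max(mean_diff(genotypes, status) for genotypes in markers)

def empirical_p_value(markers, status, n_permutations=1000, seed=0):
    """Permutation p-value for the best marker, adjusted for testing all markers."""
    rng = random.Random(seed)
    observed = max_statistic(markers, status)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if max_statistic(markers, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)
```

Because the maximum is taken over every marker in each permutation, the resulting p-value accounts for the number of markers tested without assuming they are independent.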

18
Sharing identical by descent
Alleles shared IBD:  2     1     0
Expected ratio:      0.25  0.5   0.25
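The 0.25 / 0.5 / 0.25 ratio is the expectation for a pair of full siblings, which is assumed here to be the relationship the slide refers to. A small enumeration of the 16 equally likely transmission patterns reproduces it; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sib_pair_ibd_distribution():
    """Enumerate which parental allele (0 or 1) each sib receives from the
    father and from the mother, and count alleles shared identical by descent."""
    counts = Counter()
    for father_1, mother_1, father_2, mother_2 in product((0, 1), repeat=4):
        shared = int(father_1 == father_2) + int(mother_1 == mother_2)
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}
```

Linkage methods based on sib pairs look for marker regions where affected pairs share more alleles IBD than this null expectation.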
19
Identity by state (IBS) is not the same as
identity by descent (IBD)
20
What is a haplotype?
  • Some definitions
  • Haplotype: set of particular alleles at separate
    loci on the same transmitted chromosome
  • Linkage disequilibrium (LD): association between
    those particular alleles due to their proximity
    on the same chromosome (due to linkage)
  • Haplotype-based analyses provide increased
    informativity
  • Each allele (or mutation) is associated with a
    particular evolutionary history and will thus
    have a unique chromosomal background, or
    haplotype.
  • More Powerful

21
Motivation for Haplotype-based analysis
  • Advantage of combinatorial approach
  • Haplotypes important from population genetics
    standpoint
  • Increase ability to identify regions that are IBD
  • Biologically, combinations of alleles in a region
    may be functionally important, so set of variants
    on a chromosome may be the causative composite
    allele rather than a particular nucleotide at a
    particular SNP
  • Haplotype analyses can be more powerful than
    single-locus analyses when LD is exploited

22
Haplotype vs Single-locus Analyses
  • Consider a 2-locus system with a disease-bearing
    haplotype
  • A-dx-B

OR(A-B) = 2.0
23
Haplotype Determination Options
  • Collect and genotype family members
  • ?150 effort, cost
  • Family members not available
  • Laboratory-based techniques
  • Chromosome isolation
  • Long-range PCR
  • Limited results
  • Time consuming
  • Cost-prohibitive
  • Statistical estimation
  • Sequential rules (Clark, 1990)
  • Likelihood-based E-M algorithms
  • (Hill, 1974; Long et al., 1995; Hawley & Kidd, 1995;
    Excoffier & Slatkin, 1995; Fallin & Schork, 2000)
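The likelihood-based E-M idea can be sketched for the simplest case: two biallelic loci in unrelated individuals, where only double heterozygotes have ambiguous phase. This is an illustrative toy under Hardy-Weinberg assumptions, not the algorithm of any of the cited papers; the function name and the input format are assumptions.

```python
def em_haplotype_frequencies(genotype_counts, n_iterations=100):
    """genotype_counts maps (g1, g2) to a number of individuals, where g1 is
    the count (0, 1, 2) of allele A at locus 1 and g2 the count of allele B
    at locus 2. Returns estimated frequencies of haplotypes AB, Ab, aB, ab."""
    n_chromosomes = 2 * sum(genotype_counts.values())
    known = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    for (g1, g2), n in genotype_counts.items():
        if g1 == 1 and g2 == 1:
            continue  # phase-ambiguous, handled in the E-step
        if g1 != 1:  # homozygous at locus 1: both haplotypes are resolved
            first = "A" if g1 == 2 else "a"
            known[first + "B"] += n * g2
            known[first + "b"] += n * (2 - g2)
        else:  # heterozygous at locus 1, homozygous at locus 2
            second = "B" if g2 == 2 else "b"
            known["A" + second] += n
            known["a" + second] += n
    n_double_het = genotype_counts.get((1, 1), 0)
    freqs = {h: 0.25 for h in known}
    for _ in range(n_iterations):
        # E-step: probability that a double heterozygote is AB/ab, not Ab/aB.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(known)
        counts["AB"] += n_double_het * w
        counts["ab"] += n_double_het * w
        counts["Ab"] += n_double_het * (1 - w)
        counts["aB"] += n_double_het * (1 - w)
        freqs = {h: c / n_chromosomes for h, c in counts.items()}
    return freqs
```

With more loci the number of possible phase assignments grows quickly, which is why the general-purpose algorithms in the cited papers are considerably more involved than this two-locus case.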

24
Relative importance of low risk alleles
  • Population attributable fraction: the proportion
    of disease that would be eliminated if the allele
    were eliminated from the population
  • For GRR < 2, alleles with frequency < 0.15 have very
    little impact on disease in the population.
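The population attributable fraction can be computed with Levin's formula, PAF = f(RR - 1) / (1 + f(RR - 1)), where f is the proportion of the population exposed. The sketch below assumes a dominant model under Hardy-Weinberg equilibrium, so that f is the carrier frequency 1 - (1 - p)²; the slide does not state which genetic model it uses, so that choice and the function name are assumptions.

```python
def population_attributable_fraction(allele_frequency, grr):
    """Levin's formula with carriers as the exposed group.
    Assumes a dominant model and Hardy-Weinberg genotype proportions."""
    carrier_frequency = 1 - (1 - allele_frequency) ** 2
    excess = carrier_frequency * (grr - 1)
    return excess / (1 + excess)
```

Evaluating the function over a grid of allele frequencies and GRR values shows how quickly the attributable fraction falls as the allele becomes rarer or its effect weaker.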

25
Study design
  • Targeting SNPs likely to be important in the
    population
  • For GRR < 2, alleles/haplotypes with frequency
    < 0.10-0.15 have little impact on disease in the
    population
  • If the goal is to develop predictive or
    diagnostic tests, such alleles are of little
    commercial interest
  • Studies can be designed to have high power for
    moderate (≥ 0.10) allele frequencies and GRR ≥ 2;
    sample sizes on the order of 1000 cases and 1000
    controls are a good start

26
Association Studies: Potential Causes of
Inconsistent Results
  • Population stratification: differences between
    cases and controls; most often cited reason
  • Genetic heterogeneity: different genetic
    mechanisms in different populations
  • Random error: false positive/false negative
    results
  • Study design/analysis problems
  • Poorly defined phenotypes
  • Failure to correct for sub-group analysis and
    multiple comparisons
  • Poor control group selection
  • Small sample sizes
  • Failure to attempt replication

27
Population stratification
  • If cases and controls have different genetic
    backgrounds (are from different genetic
    sub-populations), there may be inherent gene
    frequency differences, increasing the possibility
    of a false positive (or negative) result
  • Association is due to ancestral population of
    origin rather than to linkage disequilibrium
    between the disease and marker loci

[Diagram: population of origin, genetic marker, disease]
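A small numerical illustration of how stratification can create a spurious association: in the made-up counts below the marker is unrelated to disease within each sub-population (OR = 1), yet the pooled table shows a strong association because the sub-populations differ in both carrier frequency and case/control ratio. All counts are invented for the example.

```python
def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds ratio for carrying the marker allele, cases versus controls."""
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)

# (carrier cases, non-carrier cases, carrier controls, non-carrier controls)
subpop_1 = (160, 40, 80, 20)   # 80% carriers; 200 cases, 100 controls
subpop_2 = (20, 80, 80, 320)   # 20% carriers; 100 cases, 400 controls
pooled = tuple(x + y for x, y in zip(subpop_1, subpop_2))

or_1 = odds_ratio(*subpop_1)       # 1.0 within sub-population 1
or_2 = odds_ratio(*subpop_2)       # 1.0 within sub-population 2
or_pooled = odds_ratio(*pooled)    # 3.1875 when the strata are ignored
```

Analysing within strata, or matching cases and controls on ancestry, removes the artifact in this toy example.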
29
Designs for Family-based LD studies
  • Sibling controls (discordant siblings)
  • Case-parent trios
  • Nuclear families
  • Extended families

30
Study Designs used for LD mapping
  • Family-based Designs for Association Studies
  • Advantages
  • Not susceptible to confounding due to population
    substructure
  • Tests for linkage and association
  • Can test for parent-of-origin effects
  • Disadvantages
  • Inefficient recruitment, only heterozygous
    parents informative
  • Often cannot test for environmental main-effects
  • Family members often not available (e.g.,
    late-onset diseases)

31
TDT (transmission-disequilibrium test)
  • Basic idea of TDT
  • Disease alleles are transmitted from parents to
    offspring
  • Marker alleles in LD with these alleles will also
    be transmitted preferentially to affected
    offspring
  • Test if heterozygous parents transmit a
    particular marker allele to affected offspring
    more frequently than expected
  • Looks for excess transmission of particular
    alleles from parents to affected children
  • Controls are non-transmitted alleles
  • For each individual, have a 2x2 table of 0s, 1s, or
    2s
  • Use all such tables to get a matched chi-square
    test for excess occurrence in cells b and c:
    McNemar's test
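The McNemar-type statistic used by the TDT depends only on the two discordant cells: with b the number of heterozygous parents who transmitted the marker allele to an affected child and c the number who transmitted the other allele, the statistic is (b - c)² / (b + c) on 1 df. A minimal sketch, with an illustrative function name and b + c assumed to be greater than zero:

```python
from math import erfc, sqrt

def tdt(transmitted, not_transmitted):
    """transmitted: heterozygous parents passing the marker allele to an
    affected child (cell b); not_transmitted: those passing the other
    allele (cell c). Returns (chi-squared statistic, p-value on 1 df)."""
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of chi-squared with 1 df: erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

For example, 60 transmissions against 40 non-transmissions gives a statistic of 4.0. Homozygous parents contribute nothing to b or c, which is why only heterozygous parents are informative.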

33
Population stratification
  • Random panels of SNPs can be used to test for
    population sub-structure (many new methods).
  • First studies - little empirical evidence of
    stratification in large samples from North
    America, Japan, Latin America and Europe.
  • Problems with Trios and TDT
  • Inefficient in genotyping and sampling
  • Difficult (or impossible) to collect
  • Can only really have a big problem if doing poor
    epidemiology.
  • Potential for bias has been greatly exaggerated.
  • Fear of population stratification led to
    substantial changes in study design and analytic
    methods and widespread adoption of trios design
  • Now no reason to adopt a family-based design
    solely to protect against stratification

Cardon LR, Palmer LJ. Population stratification
and spurious allelic association. The Lancet
2003;361:598-604.
34
Why Case/Control?
  • Advantages
  • Methodology is well-known
  • Convenient to collect
  • Common
  • Very large samples
  • More efficient recruitment than family-based
    sampling
  • Simultaneous assessment of disease allele
    frequency, penetrance, and AR
  • Unrelated controls can provide increased power
  • Limitations
  • 1. Possible population stratification
  • 2. Need for highly dense marker sets (capture LD)
  • 3. Lack of phase information
  • Lack of consistency of results

These can be overcome!
  1. Assessment and genomic control of stratification
  2. SNP maps
  3. Imputed haplotypes
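Genomic control can be sketched in a few lines: association statistics at presumed-null markers are used to estimate an inflation factor λ (their median divided by the median of a 1-df chi-squared distribution, about 0.455), and test statistics are divided by λ. The function name and the choice to floor λ at 1 are assumptions of this sketch.

```python
from statistics import median

# Median of the chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(null_chi2, test_chi2):
    """null_chi2: 1-df statistics from markers assumed unlinked to disease;
    test_chi2: 1-df statistics at the candidate markers.
    Returns (lambda, list of corrected test statistics)."""
    inflation = max(1.0, median(null_chi2) / CHI2_1DF_MEDIAN)
    return inflation, [x / inflation for x in test_chi2]
```

A λ close to 1 suggests little stratification; values well above 1 indicate that uncorrected statistics are inflated across the genome.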
35
Sample size requirements for case-control
analyses of SNPs (2 controls per case;
detectable difference of OR ≥ 1.5; power = 80%).
Statistical power: an increasing concern
Palmer, L. J. and W. O. C. M. Cookson (2001).
Using single nucleotide polymorphisms (SNPs) as
a means to understanding the pathophysiology of
asthma. Respiratory Research 2: 102-112.
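A rough version of this kind of calculation uses the standard normal-approximation formula for comparing two proportions with unequal group sizes. The sketch below is not the method behind the cited table: the two-sided α of 0.05, the use of carrier frequency in controls as the exposure, and the function name are all assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cases_needed(control_carrier_freq, odds_ratio, controls_per_case=2,
                 alpha=0.05, power=0.80):
    """Approximate number of cases for an unmatched case-control comparison
    of carrier frequencies (no continuity correction)."""
    p0 = control_carrier_freq
    # Carrier frequency in cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    k = controls_per_case
    p_bar = (p1 + k * p0) / (1 + k)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt((1 + 1 / k) * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / k)) ** 2
    return ceil(numerator / (p1 - p0) ** 2)
```

Calling the function for decreasing carrier frequencies shows the required sample growing sharply; a genome-wide study would also need a far smaller α than the 0.05 assumed here.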
36
Growing utilization of population-based designs
  • Increasingly apparent for many diseases that
    population-based studies of unrelated
    individuals, in which case-control and cohort
    studies serve as standard designs for genetic
    association analysis, may be a practical and
    powerful approach
  • Power
  • Ease and efficiency of collection
  • Cohort design: longitudinal data and prospective
    assessment
  • Birth cohorts
  • Value of historical cohorts
  • e.g., Nurses' Health Study: 120,000 nurses
    recruited in mid-70s; 98% follow-up of living
    cohort to current day.
  • Pharmacogenetics
  • GxE studies

37
An example: UK Biobank
  • Focus on binary outcomes and gene-environment
    interactions
  • Large cohort: 500,000 individuals
  • Age range at recruitment: 45-64 yrs → 45-69 yrs
  • Comprehensive exposure assessment at recruitment
  • Lifestyle factors, environmental exposures
  • Personal and family history of health and disease
  • Subsequent monitoring via NHS information systems
  • Power set on number of events in 10 years
  • ? 40M-60M

38
Failure to replicate association
  • Failure to replicate genetic association studies
    is a problem of genuine concern
  • But - more often involves poor study design and
    execution, in particular a lack of appreciation
    for the sample sizes required to detect modest
    genetic effects and over-interpretation of
    marginal results, than undetected population
    stratification.
  • Complex human diseases
  • Initial detection and replication will likely be
    very difficult.
  • Multiple testing, laboratory/measurement error,
    and positive publication and
    investigator-reporting biases
  • Population stratification is only one (and
    possibly amongst the least) of many possible
    reasons for non-replication of association
    results.

Cardon LR, Palmer LJ. Population stratification
and spurious allelic association. The Lancet
2003;361:598-604.
39
Study reproducibility: risk allele frequency
  • Risk allele frequency has a much greater impact
    on power than disease prevalence for allele
    frequencies < 0.2 and > 0.8

40
Outstanding Genetics Issues
  • Extent of linkage disequilibrium across the
    genome
  • Age of expected mutations in populations
  • Effect of selection on specific genes
  • Allele frequency differences between populations
  • Population demographics
  • Whether or when subtle population differences are
    important factors in association studies
  • Effect of stratification on haplotypes

41
Ultimate Goal