Advanced Data Analysis: Methods to Control for Confounding (Matching and Logistic Regression)

Provided by: UNC52

1
Advanced Data Analysis: Methods to Control for
Confounding (Matching and Logistic Regression)
2
Goals
  • Understand the issue of confounding in
    statistical analysis
  • Learn how to use matching and logistic regression
    to control for confounding

3
Confounding
  • Example: people in a gastrointestinal outbreak
  • Mostly members of the same dinner club BUT many
    club members also went to a city-wide food
    festival
  • Food handling practices in the dinner club might
    be blamed for the outbreak when food eaten at the
    festival was the cause
  • Membership in the dinner club could be a
    confounder of the relationship between attendance
    at the food festival and illness
  • Analyzing the data to account for both dinner
    club membership and food festival attendance
    could help determine which event was truly
    associated with the outcome

4
Confounding
  • Gastrointestinal outbreak (continued)
  • Stratification methods could be used to calculate
    the risk of illness due to the food festival for
    those in the dinner club vs. those not in the
    dinner club
  • If attending the food festival was a significant
    risk factor for illness in both groups, then the
    festival would be implicated because illness
    occurred whether or not people were members of
    the dinner club

5
Confounding
  • What if there are multiple factors that might be
    confounding the exposure-disease relationship?
  • Using our previous example, what if we had to
    stratify by membership in the dinner club and by
    health status? Or stratify by other potential
    confounders (age, occupation, income, etc.)?
  • Trying to stratify by all of these layers becomes
    difficult
  • At this point more advanced methods are needed
  • Logistic regression controls for many potential
    confounders at one time
  • Matching, when incorporated correctly into the
    study design, reduces confounding before analysis
    begins

6
Confounding and Confounders
  • In field epidemiology, we commonly compare two
    groups by using measures of association
  • Risk ratio (RR) in cohort studies
  • Odds ratio (OR) in case-control studies
  • An analysis may find multiple exposures
    significantly associated with disease, or no
    exposures associated at all
  • In these cases you need to explore whether a
    confounder is present, making it appear that
    exposures are associated with the disease (when
    they really are not) or making it appear that no
    association exists (when there really is one)

7
Confounders
  • A confounder is a variable that distorts the risk
    ratio or odds ratio of an exposure leading to an
    outcome
  • Confounding is a form of bias that can result in
    a distortion in the measure of association
    between an exposure and disease
  • Confounding must be eliminated for accurate
    results (1)
  • Confounding can occur in an observational
    epidemiologic study whenever two groups are
    compared to each other
  • Confounding is a mixing of effects when the
    groups are compared (the exposure-disease
    relationship can be affected by factors other
    than the exposure itself)

8
Common Confounders
  • Common confounders include age, socioeconomic
    status and gender.
  • Examples
  • Children born later in the birth order are more
    likely to have Down syndrome.
  • Does birth order cause Down syndrome?
  • No: the relationship is confounded by mother's
    age; older women are more likely to have children
    with Down syndrome
  • Mother's age confounds the association between
    birth order and Down syndrome: it appears there
    is an association when there is not (2)

9
Common Confounders--Examples
  • Women's use of hormone replacement therapy (HRT)
    and risk of cardiovascular disease
  • Some studies suggest an association, others do
    not
  • Women of higher socio-economic status (SES) are
    more likely to be able to afford HRT
  • Women of lower SES are at higher risk of
    cardiovascular disease
  • Differences in SES may thus confound the
    relationship between HRT and cardiovascular
    disease
  • Need to control for SES among study participants
    (3)

10
Common Confounders--Examples
  • Hypothetical outbreak of gastroenteritis at a
    restaurant
  • Study shows women were at much greater risk of
    the disease than men
  • Association is confounded by eating salad: women
    were much more likely to order salad than men
  • Salad was contaminated with disease-causing agent
  • Relationship between gender and disease was
    confounded by salad consumption (which was the
    true cause of the outbreak)
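
The salad example can be made concrete with a small sketch. The counts below are hypothetical, invented purely for illustration (the slide gives no data): illness risk depends on salad alone, yet the crude comparison makes gender look like a risk factor.

```python
# Hypothetical counts (not from the presentation): (ill, total) per group.
# Risk is 50% for salad eaters and 10% for non-eaters, regardless of gender.
groups = {
    ("women", "salad"): (40, 80),
    ("women", "no salad"): (2, 20),
    ("men", "salad"): (10, 20),
    ("men", "no salad"): (8, 80),
}

def risk(keys):
    """Pooled risk of illness over the listed (gender, salad) groups."""
    ill = sum(groups[k][0] for k in keys)
    total = sum(groups[k][1] for k in keys)
    return ill / total

# Crude risk ratio: women vs. men, ignoring salad.
crude_rr = risk([("women", "salad"), ("women", "no salad")]) / risk(
    [("men", "salad"), ("men", "no salad")]
)

# Stratum-specific risk ratios: women vs. men within each salad level.
rr_salad = risk([("women", "salad")]) / risk([("men", "salad")])
rr_no_salad = risk([("women", "no salad")]) / risk([("men", "no salad")])

print(f"crude RR = {crude_rr:.2f}")
print(f"RR among salad eaters = {rr_salad:.2f}")
print(f"RR among non-eaters = {rr_no_salad:.2f}")
```

With these made-up numbers the crude risk ratio for women is well above 1, while both stratum-specific risk ratios equal 1: the apparent gender effect disappears once salad is held fixed.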

11
Characteristics of Confounders
  • Confounders must have two key characteristics
  • A confounder must be associated with the disease
    being studied
  • A confounder must be associated with the exposure
    being studied

12
Controlling for Confounding
  • To control for confounding you must take the
    confounding variable out of the picture
  • There are 3 ways to do this
  • Restrict the analysis: analyze the
    exposure-disease relationship only among those at
    one level of the confounding variable
  • Example: look at the relationship between HRT and
    cardiovascular disease ONLY among women of high
    SES
  • Stratify: analyze the exposure-disease
    relationship separately for all levels of the
    confounding variable
  • Example: look at the relationship between HRT and
    cardiovascular disease separately among women of
    high SES and low SES
  • Conduct logistic regression: regression puts all
    the variables into a mathematical model
  • Makes it easy to account for multiple confounders
    that need to be controlled

13
Controlling for Confounding: Stratification
  • Stratification can be used to separate the
    effects of exposures and confounders
  • Example: tuberculosis (TB) outbreak among
    homeless men
  • Homeless shelter and soup kitchen implicated as
    the place of transmission
  • Men likely to spend time in both places
  • To determine which site is most likely, could
    examine the association between the homeless
    shelter and TB among men who did NOT go to the
    soup kitchen and among men who DID go to the soup
    kitchen

14
Stratification--Example
  • Outbreak at a reception: cookies and punch have
    both been implicated
  • Suspicion that one food item is confounding the
    other
  • Cannot tease out the effects without stratifying
    because many people consumed both cookies and
    punch

15
Stratification--Example
  • After conducting a case-control study, overall
    data show the following

Cookie Exposure
  • OR = (37 x 29)/(21 x 13) = 3.93; 95% CI, 1.69-9.15
  • p = 0.001
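
The cross-product calculation above can be sketched in a few lines. The function name and the a, b, c, d layout are assumptions for illustration: the counts are read off the slide's OR expression, taking the usual (a x d)/(b x c) arrangement, and the interval uses the Woolf (log-based) method, which the slide does not name.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Woolf (log-based) confidence interval."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Cookie exposure, assumed layout:
# a = cases who ate cookies, b = controls who ate cookies,
# c = cases who did not,     d = controls who did not.
or_cookies, lo, hi = odds_ratio_ci(37, 21, 13, 29)
print(f"OR = {or_cookies:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

For the cookie table this gives about 3.93 (1.69-9.15). Other interval methods, such as exact limits, give different bounds, so results for other tables need not match a given software package.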

16
Stratification--Example
  • Data continued..

Punch Exposure
  • OR = (40 x 30)/(20 x 10) = 6.00; 95% CI, 2.83-12.71
  • p = 0.0004

17
Stratification--Example
  • Both cookies and punch have a high odds ratio for
    illness and a confidence interval that does not
    include 1
  • OR (cookies) = 3.93; 95% CI, 1.69-9.15; p =
    0.001
  • OR (punch) = 6.00; 95% CI, 2.83-12.71; p =
    0.0004
  • To stratify by punch exposure, we want to know:
  • Among those who did not drink punch, what is the
    odds ratio for the association between cookies
    and illness?
  • Among those who did drink punch, what is the odds
    ratio for the association between cookies and
    illness?
  • If cookies are the culprit, there should be an
    association between cookies and illness,
    regardless of whether anyone drank punch

18
Stratification--Example
  • Stratification of the cookie association by punch
    exposure

Did have punch
  • OR = (35 x 3)/(17 x 5) = 1.24; 95% CI, 0.17-7.22
  • p = 1.0

19
Stratification--Example
  • Stratification of the cookie association by punch
    exposure

Did not have punch
  • OR = (2 x 26)/(4 x 8) = 1.63; 95% CI, 0.12-13.86
  • p = 0.63

20
Stratification--Example
  • To stratify by cookie exposure, we want to know:
  • Among those who did not eat cookies, what is the
    odds ratio for the association between punch and
    illness?
  • Among those who did eat cookies, what is the odds
    ratio for the association between punch and
    illness?
  • If punch is the culprit, there should be an
    association between punch and illness, regardless
    of whether anyone ate cookies

21
Stratification--Example
  • Stratification of the punch association by cookie
    exposure

Did have cookies
  • OR = (35 x 4)/(17 x 2) = 4.12; 95% CI, 0.52-48.47
  • p = 0.18

22
Stratification--Example
  • Stratification of the punch association by cookie
    exposure

Did not have cookies
  • OR = (5 x 26)/(3 x 8) = 5.42; 95% CI, <0.80-40.95
  • p = 0.08
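
The four stratified odds ratios can be recomputed from one small table. This is a sketch: the (cases, controls) counts are read off the cross-products printed on the slides, assuming the usual case-control layout, and the dictionary keys and function name are illustrative. Values are recomputed from those cross-products, so small rounding differences from the slide text are possible.

```python
# (cases, controls) for each exposure combination, read off the
# stratified cross-products shown on the slides.
counts = {
    ("punch", "cookies"): (35, 17),
    ("punch", "no cookies"): (5, 3),
    ("no punch", "cookies"): (2, 4),
    ("no punch", "no cookies"): (8, 26),
}

def odds_ratio(exposed, unexposed):
    """Cross-product odds ratio from two (cases, controls) pairs."""
    a, b = exposed
    c, d = unexposed
    return (a * d) / (b * c)

stratified = {
    # Cookie-illness association within each punch stratum.
    "cookies | punch": odds_ratio(
        counts["punch", "cookies"], counts["punch", "no cookies"]),
    "cookies | no punch": odds_ratio(
        counts["no punch", "cookies"], counts["no punch", "no cookies"]),
    # Punch-illness association within each cookie stratum.
    "punch | cookies": odds_ratio(
        counts["punch", "cookies"], counts["no punch", "cookies"]),
    "punch | no punch cookies".replace("no punch ", "no "): odds_ratio(
        counts["punch", "no cookies"], counts["no punch", "no cookies"]),
}

for label, value in stratified.items():
    print(f"OR({label}) = {value:.2f}")
```

The cookie odds ratios stay near 1 in both punch strata, while the punch odds ratios stay above 4 in both cookie strata, which is the pattern the stratified slides describe.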

23
Stratification
  • Stratification allows us to examine two risk
    factors independently of each other
  • In our cookies and punch example we can see that
    cookies were not really a risk factor independent
    of punch (stratified ORs close to 1)
  • Punch remained a potential risk factor
    independent of cookies (large ORs and p-values
    close to significant)

24
More on Stratification
  • Mantel-Haenszel odds ratio
  • Method of controlling for confounding using
    stratified analysis
  • Takes an association, stratifies it by a
    potential confounder and then combines these by
    averaging them into one estimate that is
    controlled for the stratifying variable
  • Cookies and punch example
  • 2 stratum-specific estimates of the association
    between punch and illness (ORs of 4.1 and 5.4)
  • More convenient to have only one estimate: can
    average the two estimates into a pooled or common
    odds ratio
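
As a sketch of the pooling step, the standard Mantel-Haenszel formula (not printed on the slide) weights each stratum by its size: OR_MH = sum(a*d/n) / sum(b*c/n). The function name and the (a, b, c, d) tuple layout are assumptions; the counts are those implied by the stratified punch tables.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Punch-illness tables stratified by cookie exposure, with
# (a, b, c, d) = (cases with punch, controls with punch,
#                 cases without punch, controls without punch).
punch_by_cookies = [
    (35, 17, 2, 4),   # ate cookies
    (5, 3, 8, 26),    # did not eat cookies
]

or_mh = mantel_haenszel_or(punch_by_cookies)
print(f"Mantel-Haenszel OR for punch = {or_mh:.2f}")
```

The pooled value, about 4.76, falls between the two stratum-specific estimates, as a weighted average should.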

25
Stratification and Effect Measure Modifiers
  • Effect measure modification
  • One stratum shows no association (OR = 1) while
    another stratum does have an association
  • No confounding third variable is present; rather,
    you need to identify and present estimates
    separately for each level or stratum
  • Example: if gender is an effect measure modifier,
    you should give 2 odds or risk ratios, 1 for men
    and 1 for women
  • You identify effect measure modification by
    stratification (same technique used to identify
    confounding) but you are looking for the measure
    of effect to be different between the 2 or more
    strata

26
Effect Measure Modifiers--Examples
  • Among the elderly, gender is an effect modifier
    of the association between nutritional intake and
    osteoporosis
  • Nutritional intake (calcium) is associated with
    osteoporosis among women
  • Among men this association is not so strong
    because men's bone mineral content is not
    affected as much by nutritional intake
  • In developing countries, sanitation is an effect
    modifier of the association between breastfeeding
    and infant mortality
  • In unsanitary conditions, breastfeeding has a
    strong effect in reducing infant mortality
  • In cleaner conditions infant mortality is not
    very different between breastfed and bottle-fed
    infants

27
Matching
  • Matching can reduce confounding
  • In case-control studies, cases are matched to
    controls on desired characteristics
  • In cohort studies, unexposed persons are matched
    to exposed persons on desired characteristics
  • You must account for matching when analyzing
    matched data
  • Important that the matched variables not be
    exposures of interest

28
Matching--Example
  • Hypothetical study where students in a high
    school have reported a strange smell and sudden
    illness
  • Test the association between smelling an unusual
    odor and a set of symptoms
  • Match cases and controls on gender, grade and
    hallway
  • There are precedents for outbreaks of illness
    related to unusual odors in buildings, possibly
    psychogenic (i.e., illness spread by panic rather
    than a true cause)
  • Rationale for matching: women are more reactive
    in this situation; grade level controls for age
    (different ages may react differently); and
    matching on hallway controls for the actual odor
    observed (different locations may produce
    different odors)

29
Matching--Example
  • With matched case-control pairs, a 2x2 table is
    set up to examine pairs

Table 1. Analysis of matched pairs for a
case-control study
  • Cells e and h are concordant cells because the
    case and the control have the same exposure
    status
  • Cells f and g are discordant because the case and
    control have a different exposure status
  • Only the discordant cells give us useful data to
    contrast the exposure between cases and controls
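
A minimal sketch of how matched pairs might be tallied into the four cells. The pair data are hypothetical, and the assignment of e to "both exposed" and h to "both unexposed" is an assumption (the transcript does not show the table); f and g follow the odds ratio definition given later in this example.

```python
from collections import Counter

def tabulate_pairs(pairs):
    """Count matched pairs into cells e, f, g, h.

    Each pair is (case_exposed, control_exposed) as booleans.
    """
    tally = Counter(pairs)
    return {
        "e": tally[(True, True)],    # both exposed (concordant, assumed label)
        "f": tally[(True, False)],   # case exposed, control unexposed
        "g": tally[(False, True)],   # case unexposed, control exposed
        "h": tally[(False, False)],  # both unexposed (concordant, assumed label)
    }

# Hypothetical pairs, invented for illustration only.
pairs = [(True, True), (True, False), (True, False), (False, True), (False, False)]
cells = tabulate_pairs(pairs)
print(cells)
```

Only the f and g counts feed into the matched analysis; e and h record pairs in which case and control cannot be told apart on exposure.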

30
Matching--Example
  • A chi-square for matched data (McNemar's
    chi-square) can be calculated using a statistical
    computing program
  • Calculation examines discordant pairs and results
    in a McNemar chi-square value and p-value
  • If the p-value is < 0.05, you can conclude that
    there is a statistically significant difference
    in exposure between cases and controls
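
For readers without a statistics package at hand, the McNemar calculation can be sketched directly. This assumes the standard formula chi-square = (f - g)^2 / (f + g) with 1 degree of freedom, plus an optional continuity correction; neither formula appears on the slide, and the function name is illustrative. The counts f = 12 and g = 4 are the discordant pairs from the sample data used later in this example.

```python
import math

def mcnemar(f, g, continuity=False):
    """McNemar chi-square (1 df) and p-value from the discordant pair counts."""
    diff = abs(f - g) - (1 if continuity else 0)
    diff = max(diff, 0)  # the correction should not push the difference below 0
    chi2 = diff ** 2 / (f + g)
    # Upper tail of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = mcnemar(12, 4)
chi2_cc, p_value_cc = mcnemar(12, 4, continuity=True)
print(f"uncorrected: chi-square = {chi2:.2f}, p = {p_value:.3f}")
print(f"corrected:   chi-square = {chi2_cc:.2f}, p = {p_value_cc:.3f}")
```

Software packages differ on whether they apply the continuity correction or an exact test by default, so reported p-values for small numbers of discordant pairs can vary.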

31
Matching--Example
  • A table of discordant pairs can also be used to
    calculate a measure of association

Table 2. Sample data for sudden illness in a high
school. Controls matched to cases on gender,
grade, and hallway in the school.
32
Matching--Example
  • Calculating the odds ratio
  • OR = (# pairs with exposed cases and
    unexposed controls) / (# pairs with unexposed
    cases and exposed controls)
  • = f / g = 12/4 = 3.0
  • Interpretation
  • The odds of having a sudden onset of nausea,
    vomiting, or fainting if students smelled an
    unusual odor in the school were 3.0 times the
    odds of having a sudden onset of these symptoms
    if students did not smell an unusual odor in the
    school, controlling for gender, grade, and
    location in the school.
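
The matched odds ratio can be sketched with an approximate confidence interval. The interval is an addition for illustration, not something the slide reports: it assumes the common log-based formula with standard error sqrt(1/f + 1/g), and the function name is hypothetical.

```python
import math

def matched_odds_ratio(f, g, z=1.96):
    """Matched-pair odds ratio f/g with a log-based confidence interval."""
    mor = f / g
    se = math.sqrt(1 / f + 1 / g)  # approximate standard error of ln(f/g)
    lower = math.exp(math.log(mor) - z * se)
    upper = math.exp(math.log(mor) + z * se)
    return mor, lower, upper

# Discordant pairs from the sample data: f = 12, g = 4.
mor, lower, upper = matched_odds_ratio(12, 4)
print(f"MOR = {mor:.1f}, 95% CI {lower:.2f}-{upper:.2f}")
```

With only 16 discordant pairs the interval is wide, which is typical for small matched studies.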

33
Matching
  • An important note about matching
  • Once you have matched on a variable, you cannot
    use that variable as a risk factor in your
    analysis
  • Cases and controls will have exactly the same
    values of the matched variables, so those
    variables are useless as risk factors
  • Do not match on any variable you suspect might be
    a risk factor

34
An Introduction to Logistic Regression
  • Logistic regression is a mathematical process
    that results in an odds ratio
  • Logistic regression can control for numerous
    confounders
  • The odds ratio produced by logistic regression is
    known as the adjusted odds ratio because its
    value has been adjusted for the confounders

35
An Introduction to Logistic Regression
  • In the form presented here, the outcome variable
    (sick or not sick) and exposure variable (exposed
    or not exposed) are both dichotomous
  • Other variables (the confounders) can be
    dichotomous, categorical, or continuous

36
An Introduction to Logistic Regression
  • Logistic regression uses an equation called a
    logit function to calculate the odds ratio
  • Using our earlier punch and cookies example, we
    suspect one of these food items is confounding
    the other
  • Variables would be
  • SICK (value is 1 if ill, 0 if not ill)
  • PUNCH (1 if drank punch, 0 if did not drink
    punch)
  • COOKIES (1 if ate cookies, 0 if did not eat
    cookies)

37
Logistic Regression--Example
  • General equation is
  • Logit (OUTCOME) = EXPOSURE + CONFOUNDER1 +
    CONFOUNDER2 + CONFOUNDER3 (etc.)
  • For our example
  • Outcome variable = SICK
  • Exposure variable = PUNCH
  • Confounder variable = COOKIES
  • Equation is Logit (SICK) = PUNCH + COOKIES
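
Written out with coefficients, which the slide leaves implicit, the model for this example takes the standard logistic form below. The symbols p and the beta coefficients are notation introduced here, not taken from the slides.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{PUNCH} + \beta_2\,\mathrm{COOKIES},
\qquad p = \Pr(\mathrm{SICK} = 1)

\mathrm{OR}_{\mathrm{PUNCH}} = e^{\beta_1},
\qquad \mathrm{OR}_{\mathrm{COOKIES}} = e^{\beta_2}
```

Exponentiating a coefficient gives the adjusted odds ratio for that variable, holding the other variable fixed.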

38
Logistic Regression--Example
  • Computer uses the math behind logistic regression
    to give the results as odds ratios
  • Each variable on the right side will have its own
    odds ratio
  • Odds ratio for PUNCH would be the odds of
    becoming ill if punch was consumed compared to
    the odds of becoming ill if punch was not
    consumed, controlling for COOKIES
  • Odds ratio for COOKIES is the odds of becoming
    ill if cookies were consumed compared to the odds
    of becoming ill if cookies were not consumed,
    controlling for PUNCH
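
To see what the computer is doing, the model Logit (SICK) = PUNCH + COOKIES can be fitted with a short Newton-Raphson routine. This is a sketch under stated assumptions, not the method of any particular package: the grouped counts are those implied by the stratified cookie and punch tables, and the helper names (solve, fit_logistic) are illustrative.

```python
import math

# (PUNCH, COOKIES) -> (cases, controls), implied by the stratified tables.
counts = {
    (1, 1): (35, 17),
    (1, 0): (5, 3),
    (0, 1): (2, 4),
    (0, 0): (8, 26),
}

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(counts, iterations=25):
    """Maximum-likelihood fit of intercept, PUNCH, and COOKIES coefficients."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (punch, cookies), (cases, controls) in counts.items():
            x = [1.0, float(punch), float(cookies)]
            n = cases + controls
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(3):
                grad[i] += (cases - n * p) * x[i]            # score vector
                for j in range(3):
                    info[i][j] += n * p * (1 - p) * x[i] * x[j]  # information
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

beta = fit_logistic(counts)
or_punch = math.exp(beta[1])    # adjusted for COOKIES
or_cookies = math.exp(beta[2])  # adjusted for PUNCH
print(f"adjusted OR (PUNCH)   = {or_punch:.2f}")
print(f"adjusted OR (COOKIES) = {or_cookies:.2f}")
```

Each adjusted odds ratio should land between the corresponding stratum-specific odds ratios from the stratified analysis, which gives a quick sanity check on the fit.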

39
Logistic Regression: Important Points
  • Each variable on the right side of the equation
    is controlling for all the other variables on the
    right side of the equation
  • If you are not sure whether one of several
    variables is a confounder, you can examine them
    all at the same time
  • Two important warnings
  • Do not put too many variables in the equation (a
    loose rule of thumb is you can add one variable
    for every 25 observations)
  • You cannot control for confounders you did not
    measure (Example: if a child's attendance at a
    particular daycare was a confounder of the
    SICK-PUNCH relationship, but you do not have data
    on children's daycare attendance, you cannot
    control for it.)

40
Logistic Regression and Matching
  • Logistic regression can also account for matching
    in the data analysis
  • Known as conditional logistic regression
  • Computer calculates odds ratios similar to
    McNemar's test but the results are conditioned
    on the matching variables
  • Can be done using Epi Info
  • Interpretation of matched odds ratios (MORs)
    using conditional logistic regression is the same
    as interpretation of MORs calculated from tables
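
The link between the two approaches can be illustrated for the simplest case: 1:1 matched pairs and a single dichotomous exposure. In that case only discordant pairs contribute to the conditional likelihood, and maximizing it reproduces f / g. The sketch below assumes that special case and uses an illustrative function name; it is not how Epi Info or any other package is implemented.

```python
import math

def conditional_logit_or(f, g, iterations=20):
    """Matched OR for one binary exposure in 1:1 matched pairs.

    Maximizes the conditional log-likelihood by Newton's method. Each of the
    f pairs (case exposed only) adds log(p) and each of the g pairs (control
    exposed only) adds log(1 - p), where p = exp(beta) / (1 + exp(beta)).
    """
    beta = 0.0
    n = f + g
    for _ in range(iterations):
        p = 1.0 / (1.0 + math.exp(-beta))
        score = f - n * p                # first derivative of the log-likelihood
        information = n * p * (1.0 - p)  # negative second derivative
        beta += score / information
    return math.exp(beta)

# Discordant pairs from the sample data: f = 12, g = 4.
mor = conditional_logit_or(12, 4)
print(f"conditional MOR = {mor:.2f}")
```

The fitted value agrees with the table-based matched odds ratio of 3.0, which is why the two are interpreted the same way. With additional covariates the conditional model no longer reduces to a simple ratio of cell counts.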

41
Logistic Regression
  • For many investigations you may not need to use
    logistic regression
  • Logistic regression is helpful in managing
    confounding variables; it is useful with large
    datasets, in studies designed to establish risk
    factors for chronic conditions, in cancer cluster
    investigations, and in other situations with
    numerous confounding factors
  • Many software packages can simplify data analysis
    using logistic regression
  • SAS, SPSS, STATA and Epi Info are a few examples

42
Logistic Regression Software Packages
  • Common software packages used for data analysis,
    including logistic regression
  • SAS (Cary, NC): http://www.sas.com/index.html
  • SPSS (Chicago, IL): http://www.spss.com/
  • STATA (College Station, TX): http://www.stata.com
  • Epi Info (Atlanta, GA): http://www.cdc.gov/EpiInfo/
  • Episheet (Boston, MA):
    http://members.aol.com/krothman/modepi.htm
  • (Episheet cannot do logistic regression but is
    useful for simpler analyses, e.g., 2x2 tables and
    stratified analyses.)
  • This is not a comprehensive list, and UNC does
    not specifically endorse any particular software
    package.

43
Logistic Regression--Examples
  • Wedding Reception, 1997 (4)
  • Guests complained of a diarrheal illness
    diagnosed as cyclosporiasis
  • Univariate analysis (using 2x2 tables) showed
    eating raspberries was the exposure most strongly
    associated with risk for illness
  • Multivariate logistic regression showed same
    results
  • Investigators determined raspberries had not been
    washed

44
Logistic Regression--Examples
  • Assessing the relationship between obesity and
    concern about food security (5)
  • Washington State Dept. of Health analyzed data
    from the 1995-99 Behavioral Risk Factor
    Surveillance System
  • A variable indicating concern about food security
    was analyzed using a logistic regression model
    with income and education as potential
    confounders
  • Persons who reported being concerned about food
    security were more likely to be obese than those
    who did not report such concerns (adjusted OR =
    1.29; 95% CI, 1.04-1.83)

45
Matching and Conditional Logistic
Regression--Examples
  • Foodborne Salmonella Newport outbreak, 2002 (6)
  • Affected 47 people from 5 different states
  • Case-control study carried out, controls matched
    by age-group
  • Logistic regression conducted to control for
    confounders
  • Cases were more likely than controls to have
    eaten ground beef (MOR = 2.3; 95% CI, 0.9-5.7)
    and more likely to have eaten raw or undercooked
    ground beef (MOR = 50.9; 95% CI, 5.3-489.0)
  • No specific contamination event identified but
    public health alert issued to remind consumers
    about safe food-handling practices

46
Matching and Conditional Logistic
Regression--Examples
  • Outbreak of typhoid fever in Tajikistan, 1996-97
    (7)
  • 10,000 people affected in outbreak, case-control
    study conducted
  • Cases were culture positive for the organism
    (Salmonella serotype Typhi)
  • Using 2x2 tables, illness was associated with
  • Drinking unboiled water in the 30 days before
    onset (MOR = 6.5; 95% CI, 3.0-24.0)
  • Using drinking water from a tap outside the home
    (MOR = 9.1; 95% CI, 1.6-82.0)
  • Eating food from a street vendor (MOR = 2.9; 95%
    CI, 1.4-7.2)
  • When all variables were included in conditional
    logistic regression, only drinking unboiled water
    (MOR = 9.6; 95% CI, 2.7-334.0) and obtaining
    water from an outside tap (MOR = 16.7; 95% CI,
    2.0-138.0) were significantly associated with
    illness
  • Routinely boiling drinking water was protective
    (MOR = 0.2; 95% CI, 0.05-0.5)

47
Conclusion
  • Controlling for confounding can be done using
    matched study design and logistic regression
  • While complicated, with practice these methods
    can be as easy to use as 2x2 tables

48
References
  • 1. Gregg MB. Field Epidemiology. 2nd ed. New
    York, NY: Oxford University Press; 2002.
  • 2. Hecht CA, Hook EB. Rates of Down syndrome at
    livebirth by one-year maternal age intervals in
    studies with apparent close to complete
    ascertainment in populations of European origin:
    a proposed revised rate schedule for use in
    genetic and prenatal screening. Am J Med Genet.
    1996;62:376-385.
  • 3. Humphrey LL, Nelson HD, Chan BKS, Nygren P,
    Allan J, Teutsch S. Relationship between hormone
    replacement therapy, socioeconomic status, and
    coronary heart disease. JAMA. 2003;289:45.
  • 4. Centers for Disease Control and Prevention.
    Update: Outbreaks of Cyclosporiasis -- United
    States, 1997. MMWR Morb Mort Wkly Rep.
    1997;46:461-462. Available at
    http://www.cdc.gov/mmwr/PDF/wk/mm4621.pdf.
    Accessed December 12, 2006.
  • 5. Centers for Disease Control and Prevention.
    Self-reported concern about food security
    associated with obesity --- Washington,
    1995-1999. MMWR Morb Mort Wkly Rep.
    2003;52:840-842. Available at
    http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5235a3.htm.
    Accessed December 12, 2006.