Title: 2004 Public Health Training and Information Network (PHTIN) Series
12004 Public Health Training and Information
Network (PHTIN) Series
2Site Sign-in Sheet
- Please mail or fax your site's sign-in sheet to:
- Linda White
- NC Office of Public Health Preparedness and Response
- Cooper Building
- 1902 Mail Service Center
- Raleigh, NC 27699
- Fax: (919) 715-2246
3Outbreak Investigation Methods
4 52004 PHTIN Training Development Team
- Pia MacDonald, PhD, MPH - Director, NCCPHP
- Jennifer Horney, MPH - Director, Training and Education, NCCPHP
- Anjum Hajat, MPH - Epidemiologist, NCCPHP
- Penny Padgett, PhD, MPH
- Amy Nelson, PhD - Consultant
- Sarah Pfau, MPH - Consultant
- Amy Sayle, PhD, MPH - Consultant
- Michelle Torok, MPH - Doctoral student
- Drew Voetsch, MPH - Doctoral Candidate
- Aaron Wendelboe, MSPH - Doctoral student
6Upcoming PHTIN Sessions
- November 9th: Techniques for Review of Surveillance Data
- December 14th: Risk Communication
- 10:00 am - 12:00 pm
- (with time for discussion)
7Session I - VI Slides
- After the airing of each session, NCCPHP will post PHTIN Outbreak Investigation Methods series slides on the following two web sites:
- NCCPHP Training web site
- http://www.sph.unc.edu/nccphp/phtin/index.htm
- North Carolina Division of Public Health, Office of Public Health Preparedness and Response
- http://www.epi.state.nc.us/epi/phpr/
8Session V
9Today's Presenters
- Michelle Torok, MPH
- Graduate Research Assistant and Doctoral Student, NCCPHP
- Sarah Pfau, MPH
- Consultant, NCCPHP
10Analyzing Data: Learning Objectives
- Upon completion of this session, you will:
- Understand what an analytic study contributes to an epidemiological outbreak investigation
- Understand the importance of data cleaning as a part of analysis planning
11Analyzing Data: Learning Objectives
- Know why and how to generate descriptive statistics to assess trends in your data
- Know how to generate and interpret epi curves to assess trends in your outbreak data
- Understand how to interpret measures of central tendency
12Analyzing Data: Learning Objectives (contd.)
- Know why and how to generate measures of association for a cohort or case-control study
- Understand how to interpret measures of association (risk ratios, odds ratios) and corresponding confidence intervals
- Know how to generate and interpret selected descriptive and analytic statistics in Epi Info software
13Analyzing Data
14Analyzing Data Session Overview
- Analysis planning
- Descriptive epidemiology
- Epi curves
- Spot maps
- Measures of central tendency
- Attack rates
- Analytic epidemiology
- Measures of association
- Case study analysis using Epi Info software
15Analysis Planning
16Analysis Planning
- Regardless of the data analysis software program you use, you will have access to numerous data manipulation and analysis commands
- However, you need to understand the function of each command to determine when and why to use one
17Analysis Planning
- Several factors influence, and sometimes limit, your approach to data analysis:
- Your research question
- Which variables will function as exposure and outcome
- Which study design you use
- How you select your sample population
- How you collect and code information obtained from study participants
18Analysis Planning
- Analysis planning can:
- Be an invaluable investment of time
- Help you select the most appropriate epidemiologic methods
- Help assure that the work leading up to analysis yields a database structure and content that your preferred analysis software needs to successfully run analysis programs
19Analysis Planning
- Three key considerations as you plan your analysis:
- Work backwards from the research question(s) to design the most efficient data collection instrument
- Study design will determine which statistical tests and measures of association you evaluate in the analysis output
- Consider the need to present, graph, or map data
20Analysis Planning
- Work backwards from the research question(s) to design the most efficient data collection instrument
- Develop a sound data collection instrument
- Collect pieces of information that can be counted, sorted, and recoded or stratified
- The analysis phase is not the time to realize that you should have asked questions differently!
21Analysis Planning
- Study design will determine which statistical tools you will use.
- Use risk ratio (RR) with cohort studies and odds ratio (OR) with case-control studies; you need to know which to evaluate, because both are generated simultaneously in Epi Info and SAS
- Some sampling methods (e.g., matching in case-control studies) require special types of analysis
22Analysis Planning
- Consider the need to present, graph, or map data
- Even if you collect continuous data, you may later categorize it so you can generate a bar graph and assess frequency distributions
- If you plan to map data, you may need X- and Y-coordinate or denominator data
23Basic Steps of an Outbreak Investigation
- Verify the diagnosis and confirm the outbreak
- Define a case and conduct case finding
- Tabulate and orient data: time, place, person
- Take immediate control measures
- Formulate and test hypotheses
- Plan and execute additional studies
- Implement and evaluate control measures
- Communicate findings
24Descriptive Epidemiology
25Step 3: Tabulate and orient data: time, place, person
- Descriptive epidemiology
- Familiarizes the investigator with the data
- Comprehensively describes the outbreak
- Is essential for hypothesis generation (step 5)
26Data Cleaning
- Check for accuracy
- Outliers
- Check for completeness
- Missing values
- Determine whether or not to create or collapse data categories
- Get to know the basic descriptive findings
27Data Cleaning: Outliers
- Outliers can be cases at the very beginning and end that may not appear to be related
- First check to make certain they are not due to a collection, coding or data entry error
- If they are not an error, they may represent:
- Baseline level of illness
- Outbreak source
- A case exposed earlier than the others
- An unrelated case
- A case exposed later than the others
- A case with a long incubation period
28Data Cleaning: Distribution of Variables
[Figure: distribution of a variable with an outlier marked]
29Data Cleaning: Missing Values
- The investigator can check into missing values that are expected versus those that are due to problems in data collection or entry
- The number of missing values for each variable can also be learned from frequency distributions
30Data Cleaning: Frequency Distributions
31Data Cleaning: Data Categories
- Which variables are continuous versus categorical?
- Collapse existing categories into fewer?
- Create categories from continuous? (e.g., age)
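A minimal Python sketch of creating categories from a continuous variable and counting missing values in the resulting frequency distribution. The ages and the cut points (0-19, 20-39, 40-59, 60+) are hypothetical, chosen only for illustration; `None` stands in for a missing value.

```python
from collections import Counter

# Hypothetical ages for illustration; None marks a missing value.
ages = [7, 10, 8, 5, 5, 37, 9, None, 62, 45]


def age_category(age):
    """Map a continuous age onto an illustrative category label."""
    if age is None:
        return "missing"
    if age < 20:
        return "0-19"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60+"


# Frequency distribution of the categories, including missing values.
frequencies = Counter(age_category(a) for a in ages)
for label in ("0-19", "20-39", "40-59", "60+", "missing"):
    print(label, frequencies[label])
```

Reviewing such a table is one way to spot sparse categories that may need to be collapsed before graphing.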
32Descriptive Epidemiology
- Comprehensively describes the outbreak
- Time
- Place
- Person
33Descriptive Epidemiology
34Descriptive Epidemiology Time
- Time
- Display time trends
- Epidemic curves
35Descriptive Epidemiology Time
36Descriptive Epidemiology: Time
- What is an epidemic curve and how can it help in an outbreak?
- An epidemic curve (epi curve) is a graphical depiction of the number of cases of illness by the date of illness onset
37Descriptive Epidemiology: Time
- An epi curve can provide information on the following characteristics of an outbreak:
- Pattern of spread
- Magnitude
- Outliers
- Time trend
- Exposure and / or disease incubation period
38Epidemic Curves
39Epidemic Curves
- The overall shape of the epi curve can reveal the type of outbreak:
- Common source
- Intermittent
- Continuous
- Point source
- Propagated
40Epidemic Curves: Common Source
- People are exposed to a common harmful source
- Period of exposure may be brief (point source), long (continuous) or intermittent
41Epi Curve: Common Source Outbreak with Intermittent Exposure
42Epi Curve: Common Source Outbreak with Continuous Exposure
43Epi Curve: Point Source Outbreak
44Epi Curve: Propagated Outbreak
45Epidemic Curves
46Epidemic Curves
47Epidemic Curves
48Epidemic Curves
- Provide information about the time trend of the outbreak
- Consider:
- Date of illness onset for the first case
- Date when the outbreak peaked
- Date of illness onset for the last case
49Epidemic Curves
50Epidemic Curves
- Period of Exposure / Incubation Period
51Epidemic Curves
- If the timing of the exposure is known, epi curves can be used to estimate the incubation period of the disease
- The time between the exposure and the peak of the epi curve represents the median incubation period
52Epidemic Curves
- In common source outbreaks with known incubation periods, epi curves can help determine the average period of exposure
- Find the average incubation period for the organism and count backwards from the peak case on the epi curve
53Epidemic Curves
- This can also be done to find the minimum incubation period
- Find the minimum incubation period for the organism and count backwards from the earliest case on the epi curve
54Exposure / Outbreak Incubation Period
- Average and minimum incubation periods should be close and should represent the probable period of exposure
- Widen the estimated exposure period by 10% to 20%
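The counting-backwards steps can be sketched in Python as follows. All dates and incubation periods here are hypothetical values chosen for illustration; they do not come from any outbreak in this session.

```python
from datetime import date, timedelta

# Hypothetical values for illustration only.
peak_onset = date(2004, 10, 12)      # date with the most cases on the epi curve
earliest_onset = date(2004, 10, 10)  # onset date of the first case
average_incubation = timedelta(days=3)
minimum_incubation = timedelta(days=1)

# Count backwards from the peak by the average incubation period,
# and from the earliest case by the minimum incubation period.
exposure_from_peak = peak_onset - average_incubation
exposure_from_earliest = earliest_onset - minimum_incubation

print(exposure_from_peak)      # 2004-10-09
print(exposure_from_earliest)  # 2004-10-09
```

In this made-up example both estimates fall on the same day, which is the agreement the slide describes; in practice the two dates bracket the probable period of exposure.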
55Calculating Incubation Period
Onset of illness among cases of E. coli O157:H7 infection, Massachusetts, December 1998.
56Epidemic Curves
- Creating an Epidemic Curve
57Creating an Epidemic Curve
- Provide a descriptive title
- Label each axis
- Plot the number of cases of disease reported during an outbreak on the y-axis
- Plot the time or date of illness onset on the x-axis
- Include the pre-epidemic period to show the baseline number of cases
58Epi Curve for a Common Source Outbreak with Continuous Exposure
[Figure: epi curve with the y-axis and x-axis labeled]
59Creating an Epidemic Curve
- X-axis considerations
- Choice of time unit for x-axis depends upon the incubation period
- Begin with a unit approximately one quarter the length of the incubation period
- Example:
- 1. Mean incubation period for influenza = 36 hours
- 2. 36 x ¼ = 9
- 3. Use 9-hour intervals on the x-axis for an outbreak of influenza lasting several days
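A short Python sketch of the one-quarter rule, followed by counting cases per interval as an epi curve would display them. The 36-hour incubation period comes from the example above; the onset times and the curve's start time are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

mean_incubation_hours = 36
interval_hours = mean_incubation_hours / 4  # one quarter of the incubation period
print(interval_hours)  # 9.0

# Hypothetical onset times for illustration only.
onsets = [
    datetime(2004, 1, 5, 2, 0),
    datetime(2004, 1, 5, 7, 30),
    datetime(2004, 1, 5, 10, 0),
    datetime(2004, 1, 5, 20, 15),
    datetime(2004, 1, 6, 1, 0),
]
start = datetime(2004, 1, 5, 0, 0)  # assumed left edge of the x-axis
width = timedelta(hours=interval_hours)

# Index of the 9-hour interval each onset falls into (0 = first interval).
cases_per_interval = Counter((t - start) // width for t in onsets)
for index in sorted(cases_per_interval):
    print(start + index * width, cases_per_interval[index])
```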
60Creating an Epidemic Curve
- X-axis considerations
- If the incubation period is not known, graph several epi curves with different time units
- Usually the day of illness onset is the best unit for the x-axis
61Epi Curve: X-Axis Considerations
X-axis unit of time = 1 week
X-axis unit of time = 1 day
62Descriptive Epidemiology
63Descriptive Epidemiology Place
- Provides information on the geographic boundaries of the outbreak
- May highlight outbreak patterns
64Descriptive Epidemiology Place
- Spot map
- Shows where cases live, work, spend time
- If population size varies between locations being
compared, use location-specific attack rates
instead of number of cases
65Descriptive Epidemiology Place
Source: http://www.phppo.cdc.gov/PHTN/catalog/pdf-file/LESSON4.pdf
66Descriptive Epidemiology
67Descriptive Epidemiology Person
- Data summarization for descriptive epidemiology of the population:
- Line listings
- Graphs
- Bar graphs
- Histograms
68Line Listing
69Bar Graph
70Histogram
Epidemic Curve for Outbreak of Gastrointestinal
Illness in a Nursing Home, 2002
715 minute break
72Descriptive Epidemiology
- Measures of Central Tendency
73Descriptive Epidemiology
- Measures of central tendency
- Mean
- Median
- Mode
- Range
74Measures of Central Tendency
- Mean (Average)
- The sum of all values divided by the number of values
- Example:
- Cases: 7, 10, 8, 5, 5, 37, 9 years old
- Mean = (7 + 10 + 8 + 5 + 5 + 37 + 9) / 7
- Mean = 11.6 years of age
75Measures of Central Tendency
- Median (50th percentile)
- The value that falls in the middle position when the measurements are ordered from smallest to largest
- Example:
- Ages: 7, 10, 8, 5, 5, 37, 9
- Ages sorted: 5, 5, 7, 8, 9, 10, 37
- Median age = 8
76Calculate a Median Value
- If the number of measurements is odd:
- Median = value with rank (n + 1) / 2
- 5, 5, 7, 8, 9, 10, 37
- n = 7, so (n + 1) / 2 = (7 + 1) / 2 = 4
- The 4th value = 8
- Where n = the number of values
77Calculate a Median Value
- If the number of measurements is even:
- Median = average of the two values with
- rank of n / 2 and
- (n / 2) + 1
- Where n = the number of values
- 5, 5, 7, 8, 9, 10
- n = 6, so n / 2 = 3 and the first value is 7
- (6 / 2) + 1 = 4, so the second value is 8
- (7 + 8) / 2 = 7.5
- The median value = 7.5
78Measures of Central Tendency
- Mode (modal value)
- The value that occurs the most frequently
- Example: 5, 5, 7, 8, 9, 10, 37
- Mode = 5
- It is possible to have more than one mode
- Example: 5, 5, 7, 8, 10, 10, 37
- Modes = 5 and 10
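The mean, median, and mode examples above can be checked with Python's standard `statistics` module. This is only an illustrative sketch using the ages from the slides; `statistics.multimode` (Python 3.8 or later) returns every value that shares the highest frequency.

```python
import statistics

ages = [7, 10, 8, 5, 5, 37, 9]

mean_age = statistics.mean(ages)      # sum of values / number of values
median_age = statistics.median(ages)  # middle value after sorting
modes = statistics.multimode(ages)    # all most-frequent values
age_range = (min(ages), max(ages))

print(round(mean_age, 1))  # 11.6
print(median_age)          # 8
print(modes)               # [5]
print(age_range)           # (5, 37)

# A data set with two modes, as in the second mode example.
two_modes = statistics.multimode([5, 5, 7, 8, 10, 10, 37])
print(two_modes)           # [5, 10]
```

Note how the single value of 37 pulls the mean well above the median, which is why the median is often the better summary for skewed data.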
79Measures of Central Tendency
- Mode (modal value)
- The value for the variable in which the greatest frequency of records fall
- Epi Info limitation:
- If multiple values share the same frequency that is also the highest frequency, Epi Info will identify only the first value it encounters as Mode as it scans the table in ascending order
80Measures of Central Tendency: Mode Software Limitation
Modal Values
The ages 11, 17, 35, and 62 all qualify for the status of mode, but Epi Info identifies Age 11 as the mode in analysis output for MEANS AGE in viewOswego.
81Measures of Central Tendency
- Min: 3
- Max: 77
- Mode: 11
- Median (50th percentile): 36.0
- Mean (average): 36.8
82Activity: Calculate Mean and Median
- Completion time: 5 minutes
83Calculate Mean and Median Age
- For an even number of measurements,
- Median = the average of two values ranked
- n / 2
- (n / 2) + 1
84Calculate Mean and Median Age
- Mean age
- 5 + 9 + 7 + 6 + 8 + 5 = 40
- 40 / 6 = 6.67 years
- Median age
- 5, 5, 6, 7, 8, 9
- Average of values ranked (n / 2) and (n / 2) + 1
- Ranks (6 / 2) = 3 and (6 / 2) + 1 = 4: average of 6 and 7
- (6 + 7) / 2 = 6.5 years
85Attack Rates
86Attack Rates (AR)
- AR = (# of cases of a disease) / (# of people at risk, for a limited period of time)
- Food-specific AR = (# of people who ate a food and became ill) / (# of people who ate that food)
87Food-Specific Attack Rates
CDC. Outbreak of foodborne streptococcal disease. MMWR 23:365, 1974.
88Stratified Attack Rates
Attack rate in women = 13 / 29 = 45%
Attack rate in men = 5 / 32 = 16%
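A small Python sketch of the attack rate calculation, using the stratified counts shown above. The function name `attack_rate` is just an illustrative helper, not an Epi Info command.

```python
def attack_rate(cases, at_risk):
    """Attack rate = number of cases / number of people at risk."""
    return cases / at_risk


ar_women = attack_rate(13, 29)
ar_men = attack_rate(5, 32)

print(f"Attack rate in women: {ar_women:.0%}")  # 45%
print(f"Attack rate in men: {ar_men:.0%}")      # 16%
```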
89Question & Answer Opportunity
90Hypothesis Generation vs. Hypothesis Testing
91Hypothesis Generation vs. Hypothesis Testing
- Step 5a. Formulate hypotheses
- Occurs after having spoken with some case patients and public health officials
- Based on information from literature review
- Based on descriptive epidemiology (step 3)
- Step 5b. Test hypotheses
- Occurs after hypotheses have been generated
- Based on analytic epidemiology
935 minute break
94Analytic Epidemiology
95Analytic Epidemiology
- Measures of Association
- Risk Ratio (cohort study)
- Odds Ratio (case-control study)
96Cohort versus Case-Control Study
97Cohort versus Case-Control Study
98Cohort Study
99Risk Ratio
100Risk Ratio
101Risk Ratio Example
RR = (43 / 54) / (3 / 21) = 5.6
102Interpreting a Risk Ratio
- RR = 1.0: no association between exposure and disease
- RR > 1.0: positive association
- RR < 1.0: negative association
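The risk ratio example can be reproduced with a few lines of Python. The cell labels (a, b, c, d) are an assumed 2 x 2 layout; the counts are derived from the fractions in the example, where 43 of 54 exposed people and 3 of 21 unexposed people became ill.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a cohort study 2 x 2 table.

    a = exposed and ill,    b = exposed and not ill
    c = unexposed and ill,  d = unexposed and not ill
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# 43 of 54 exposed were ill (b = 11); 3 of 21 unexposed were ill (d = 18).
rr = risk_ratio(43, 11, 3, 18)
print(round(rr, 1))  # 5.6
```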
103Case-Control Study
104Odds Ratio
105Odds Ratio
106Odds Ratio Example
OR = (60 / 18) / (25 / 55) = 7.3
107Interpreting an Odds Ratio
- The odds ratio is interpreted in the same way as a risk ratio
- OR = 1.0: no association between exposure and disease
- OR > 1.0: positive association
- OR < 1.0: negative association
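A Python sketch of the odds ratio example. The assignment of the four counts to cells is an assumption (60 exposed cases, 18 unexposed cases, 25 exposed controls, 55 unexposed controls); the cross-product gives the same result as the expression (60 / 18) / (25 / 55).

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a case-control 2 x 2 table.

    a = exposed cases,    b = exposed controls
    c = unexposed cases,  d = unexposed controls
    """
    return (a * d) / (b * c)


# Assumed cell layout for the example counts.
odds = odds_ratio(a=60, b=25, c=18, d=55)
print(round(odds, 1))  # 7.3
```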
108What to do with a Zero Cell
- Try to recruit more study participants
- Add 1 to each cell
- Remember to document / report this!
109Confidence Intervals
110Confidence Intervals
- Allow the investigator to:
- Evaluate statistical significance
- Assess the precision of the estimate (the odds ratio or risk ratio)
- Consist of a lower bound and an upper bound
- Example: RR = 1.9, 95% CI: 1.1 - 3.1
111Confidence Intervals
- Provide information on precision of estimate
- Narrow confidence intervals = more precise
- Wide confidence intervals = less precise
- Example: OR = 10, 95% CI: 0.9 - 44.0
- Example: OR = 10, 95% CI: 9.0 - 11.0
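One common way to approximate a 95% confidence interval for a risk ratio is the log (Taylor series) method, sketched below in Python. The choice of method is an assumption; the slides do not state which calculation the software uses. The counts reuse the risk ratio example from this session (43 of 54 exposed ill, 3 of 21 unexposed ill).

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with an approximate confidence interval (log method).

    a = exposed and ill,    b = exposed and not ill
    c = unexposed and ill,  d = unexposed and not ill
    z = 1.96 corresponds to a 95% interval.
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(43, 11, 3, 18)
print(round(rr, 1), round(lower, 1), round(upper, 1))  # 5.6 1.9 16.0
```

Because the lower bound is above 1.0, the interval excludes "no association"; its width reflects the small number of unexposed cases.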
112Analysis Output
113Step 6: Plan and Execute Additional Studies
- To gather more specific info
- Example: Salmonella muenchen
- Interventional study
- Example: implement intensive hand-washing
114Question & Answer Opportunity
115Epi Info Analysis
- Case Study
- Download Epi Info software for free at http://www.cdc.gov/epiinfo
116Case Study Overview
- Oswego County, New York, 1940
- 80 people attended a church supper on 4/18
- 46 people who attended the supper suffered from gastrointestinal illness beginning 4/18 and ending 4/19
- 75 people (ill and non-ill) interviewed
- Investigation focus: church supper as source of infection
117Church Supper Menu
118Case Study
119Case Study
- Investigators needed to determine:
- The type of outbreak occurring
- The pathogen causing the acute gastrointestinal illness
- The source of infection
120Data Cleaning
- Know your data! Know the
- Number of records
- Field formats and contents
- Special properties
- Table relationships
121Data Cleaning
Tell Epi Info which records to include in
analyses
122Case Study Line Listing
- Organize and review data about time, person, and
place that were collected via hypothesis
generating interviews.
123Epi Info Demonstration
- Display Variables
- Line Listing
- Means
124Case Study Line Listing
DO try this at home!
125Case Study Means
126Case Study Frequency Distributions
127Epi Info Demonstration
- Frequency Table
- Recode data
- Graph data
128Frequency by Gender
129Frequency by AGE Category
130AGE Distribution among Cases
131Case Study: Epidemic Curve
- Variable of Interest
- DATEONSET (date of onset of illness)
- Entered into database in mm/dd/yyyy hh:mm:ss AM/PM field format
132Case Study Epidemic Curve
133Point-Source Outbreak
Textbook distribution
Case Study distribution
134Case Study Epidemic Curve
Average incubation period
Maximum incubation period
Overlap
135Using Epi Info to Create Epi Curves
- To create an epi curve with Epi Info
- Open the Analyze data component
- Use the Read command to use the outbreak data
- Click on the Graph command
- Choose Histogram as the Graph Type
- Choose date / time of illness onset variable as the x-axis main variable
136Using Epi Info to Create Epi Curves
- To create an epi curve with Epi Info
- Choose count from the Show value of option beneath the y-axis option
- Choose weeks, days, hours, or minutes for the x-axis interval from the interval dropdown menu
- Type in graph title where it says Page title
- Click OK
137Determine Incubation Period
- Create a temporary variable called Incubation in Analyze Data
- INCUBATION = DATEONSET - TIMESUPPER
- Where field format is identical
- Date / time: mm/dd/yyyy hh:mm:ss AM/PM
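The same subtraction can be sketched outside Epi Info with Python's `datetime` module. The supper time and onset times below are hypothetical values for illustration, not the case study records, and the format string assumes the mm/dd/yyyy hh:mm:ss AM/PM layout.

```python
from datetime import datetime
from statistics import mean

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed mm/dd/yyyy hh:mm:ss AM/PM layout

# Hypothetical values for illustration only.
time_supper = datetime.strptime("04/18/1940 07:30:00 PM", FMT)
date_onsets = [
    datetime.strptime("04/18/1940 10:30:00 PM", FMT),
    datetime.strptime("04/19/1940 12:30:00 AM", FMT),
    datetime.strptime("04/18/1940 11:30:00 PM", FMT),
]

# INCUBATION = DATEONSET - TIMESUPPER, expressed in hours.
incubation_hours = [
    (onset - time_supper).total_seconds() / 3600 for onset in date_onsets
]

print(incubation_hours)        # [3.0, 5.0, 4.0]
print(mean(incubation_hours))  # 4.0
```

Both fields must be parsed with the same format, which mirrors the slide's requirement that the two field formats be identical.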
138Mean Incubation
139Calculate Mean Incubation in Epi Info
140Identify the Pathogen. . .
141Potential Enteric Agents
142Pathogen Identification: Resource
- CDC's Foodborne Outbreak Response and Surveillance Unit
- Guide to Confirming the Diagnosis in Foodborne Diseases
- http://www.cdc.gov/foodborneoutbreaks/guide_fd.htm
143Verify the Diagnosis: Find Plausible Agents
- Evaluate
- predominant signs and symptoms
- incubation period
- duration of symptoms
- suspected food
- laboratory testing of stool, blood, or vomitus
144Case Study: Attack Rates
- Obtain the information that you need to calculate food-specific attack rates via:
- Stratified Frequency Tables
- Line Listings
- Food-specific AR = (# of people who ate a food and became ill) / (# of people who ate that food)
145Stratified Frequency Tables
40 people ate cake; 27 people who ate cake are ill.
AR for people who consumed cake = 27 / 40 = 67.5%
35 people did not eat cake; 19 of those people are ill.
AR for people who did not consume cake = 19 / 35 = 54.3%
146Line Listings
13 + 27 = 40 people ate cake
27 people who ate cake are ill
AR for people who consumed cake = 27 / 40 = 67.5%
147Case Study Attack Rates
148Generate and Test a Hypothesis!
149Generate and Test a Hypothesis!
- The epi curve is indicative of a point-source outbreak
- Based on the incubation period, we suspect Staphylococcus aureus as the pathogen
- The food-specific attack rates lead us to believe that vanilla ice cream may be the source of infection
150Case Study
151Case Study
152Epi Info Demonstration
Tables command
153Tables Analysis Output
Epi Info 2 x 2 Table
2 x 2 Table Shell
154Tables Analysis Output
The risk of becoming ill was more than five
times greater for people who consumed vanilla ice
cream than for people who did not consume
vanilla ice cream.
155Case Study: Analytic Results
- Point-source outbreak
- Staphylococcus aureus is the suspected pathogen, based on a 4.3-hour average incubation period
- Vanilla ice cream is the suspected source of infection (highest food-specific AR, 80%)
- Vanilla ice cream RR = 5.6
- Vanilla ice cream C.I.: 1.9 - 16.0
156Question & Answer Opportunity
157Session V Summary
- Analysis planning can be an invaluable investment of time, help you select the most appropriate epidemiologic methods, and help assure that the work leading up to analysis yields a database structure and content that your preferred analysis software needs to successfully run analysis programs.
- As you plan your analysis: 1) Work backwards from the research question(s) to design the most efficient data collection instrument; 2) Consider your study design to guide which statistical tests and measures of association you evaluate in the analysis output; and 3) Consider the need to present, graph, or map data.
158Session V Summary
- Descriptive epidemiology: 1) Familiarizes the investigator with data about time, place, and person; 2) Comprehensively describes the outbreak; and 3) Is essential for hypothesis generation.
- Data cleaning is the first step in preparing to generate descriptive statistics, as it contributes to the accuracy and completeness of the data.
- Measures of central tendency provide a means of assessing the distribution of data. Measures include mean, median, mode, and range.
- Epi curves, spot maps, and line listings are all ways in which you can generate and review the time, place, and person elements, respectively, of descriptive statistics.
159Session V Summary
- Attack rates are descriptive statistics that are useful for comparing the risk of disease in groups with different exposures (e.g., consumption of individual food items).
- Analytic epidemiology allows you to test the hypotheses generated via review of descriptive statistics and the medical literature.
- The measures of association for case-control and cohort analytic studies, respectively, are odds ratios and risk ratios.
- Confidence intervals that accompany measures of association evaluate the statistical significance of the measures and assess the precision of the estimates.
160Next Session: November 9th, 10:00 a.m. - Noon
- Topic: Techniques for Review of Surveillance Data
161Session V Slides
- Following this program, please visit one of the web sites below to access and download a copy of today's slides:
- NCCPHP Training web site
- http://www.sph.unc.edu/nccphp/phtin/index.htm
- North Carolina Division of Public Health, Office of Public Health Preparedness and Response
- http://www.epi.state.nc.us/epi/phpr/
162Site Sign-in Sheet
- Please mail or fax your site's sign-in sheet to:
- Linda White
- NC Office of Public Health Preparedness and Response
- Cooper Building
- 1902 Mail Service Center
- Raleigh, NC 27699
- Fax: (919) 715-2246
163References and Resources
- Centers for Disease Control and Prevention (1992). Principles of Epidemiology, 2nd ed. Atlanta, GA: Public Health Practice Program Office.
- Division of Public Health Surveillance and Informatics, Epidemiology Program Office, Centers for Disease Control and Prevention (January 2003). Epi Info Support Manual. Included with installation of the software, which can be found at http://www.cdc.gov/epiinfo/index.htm
- Gordis L. (1996). Epidemiology. Philadelphia: WB Saunders.
164References and Resources
- Rothman KJ. (2002). Epidemiology: An Introduction. New York: Oxford University Press.
- Stehr-Green, J. and Stehr-Green, P. (2004). Hypothesis Generating Interviews. Module 3 of a Field Epidemiology Methods course being developed in the NC Center for Public Health Preparedness, UNC Chapel Hill.
- Torok, M. (2004). FOCUS on Field Epidemiology: Epidemic Curves. Volume 1, Issue 5. NC Center for Public Health Preparedness.