Title: The DIMACS Working Group on Disease and Adverse Event Surveillance
1. The DIMACS Working Group on Disease and Adverse Event Surveillance
- Henry Rolka and David Madigan
2. Background
- WG objective: bring together researchers in adverse event monitoring and disease surveillance
- Part of a 5-year special focus on computational and mathematical epidemiology
- 50 WG members: epidemiologists, public health professionals, biostatisticians, etc.
- Focus on analytic/statistical methods
- Two WG meetings plus week-long tutorial (02-03)
- Coordinated closely with National Syndromic Surveillance Conferences
3. Areas of Common Interest
4. Representation
- Carnegie-Mellon University
- FDA
- Quintiles Inc.
- CDC
- Rutgers University
- Emergint, Inc.
- AT&T Labs
- NJ State
- NYC Dept. of Health
- University of Pennsylvania
- Aventis
- ATSDR
- University of Connecticut
- Los Alamos National Lab
- Lincoln Technologies
- SAS Institute
5. Background, cont.
- WG conceived before September 11, 2001
- Surveillance landscape has changed drastically
- Major public health effort directed at bioterrorism detection
- Proliferation of novel surveillance projects in response to national threat
- Good for detecting outbreaks of various kinds
6. New Data Types for Public Health Surveillance
- Managed care patient encounter data
- Pre-diagnostic/chief complaint (text data)
- Over-the-counter sales transactions
- Drug store
- Grocery store
- 911-emergency calls
- Ambulance dispatch data
- Absenteeism data
- ED discharge summaries
- Prescription/pharmaceuticals
- Adverse event reports
7. New Analytic Methods and Approaches
- Spatial-temporal scan statistics
- Statistical process control (SPC)
- Bayesian applications
- Market-basket association analysis
- Text mining
- Rule-based surveillance
- Change-point techniques
8. ANALYTIC METHODS IN USE
- Scan statistics (e.g., Kulldorff's SaTScan)
- Statistical process control (e.g., Hutwagner's EARS)
- Association rule mining (e.g., Moore's WSARE)
- Bayesian shrinkage (e.g., DuMouchel's MGPS)
- Generalized linear mixed models (e.g., Kleinman)
- Sequential probability ratio tests (e.g., Spiegelhalter, Evans)
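To make the statistical process control entry concrete, here is a minimal sketch of a moving-window Shewhart-style check on daily counts. It is only loosely in the spirit of EARS-type methods and is not the EARS implementation. The function name `shewhart_flags`, the 7-day baseline window, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
import math
import statistics

def shewhart_flags(counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing baseline.

    Assumption: the baseline is the previous `window` days, and a day is
    flagged when it lies more than `threshold` sample standard deviations
    above the baseline mean.
    """
    flags = []
    for t in range(window, len(counts)):
        base = counts[t - window:t]
        mean = statistics.mean(base)
        sd = statistics.stdev(base)
        if sd > 0:
            z = (counts[t] - mean) / sd
        else:
            # Flat baseline: any increase is treated as infinitely unusual.
            z = math.inf if counts[t] > mean else 0.0
        if z > threshold:
            flags.append(t)
    return flags

# Hypothetical daily counts with one obvious jump on day index 8.
daily = [10, 12, 9, 11, 10, 13, 12, 11, 30, 12]
flagged_days = shewhart_flags(daily)
```

A trailing window keeps the check simple, but note that a spike enters the baseline of the following days and inflates their standard deviation, which is one reason operational systems often add a lag between the baseline and the day under test.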
9. SCAN STATISTICS
- Martin Kulldorff's SaTScan (Spatial and Space-Time Scan Statistics) software
- E.g., spatial scan using Poisson model: computes likelihood of all possible circles compared with likelihood under the null distribution
- Picks the circle with the biggest likelihood ratio
- P-value computed via Monte Carlo
- Big literature on disease clustering: Besag & Newell, Diggle, Moran test, Turnbull's method, Cuzick & Edwards, etc.
- Need methodology for multiple sources
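The circular Poisson scan outlined on this slide can be sketched as follows. This is a toy illustration under stated assumptions, not SaTScan's implementation: the inputs `points` (region centroids), `cases`, and `pops` are hypothetical names, circles are grown around each centroid by adding nearest neighbours, and the cap of 50% of the total population per circle is an assumption.

```python
import math
import random

def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only an excess of cases inside the zone is of interest
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside

def best_circle(points, cases, pops, max_frac=0.5):
    """Scan circles centred on each point; return (max LLR, zone indices)."""
    total_cases, total_pop = sum(cases), sum(pops)
    best = (0.0, None)
    for center in points:
        # Grow the circle by adding regions in order of distance from the centre.
        order = sorted(range(len(points)), key=lambda j: math.dist(center, points[j]))
        c = p = 0
        for k, j in enumerate(order):
            c += cases[j]
            p += pops[j]
            if p > max_frac * total_pop:
                break
            e = total_cases * p / total_pop  # expected cases under the null
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, frozenset(order[:k + 1]))
    return best

def monte_carlo_p(points, cases, pops, n_sim=199, seed=0):
    """Monte Carlo p-value: redistribute cases in proportion to population."""
    rng = random.Random(seed)
    observed, zone = best_circle(points, cases, pops)
    total = sum(cases)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(points)
        for j in rng.choices(range(len(points)), weights=pops, k=total):
            sim[j] += 1
        if best_circle(points, sim, pops)[0] >= observed:
            exceed += 1
    return observed, zone, (exceed + 1) / (n_sim + 1)

# Hypothetical 4x4 grid of equal-population regions with a two-region cluster.
grid = [(x, y) for x in range(4) for y in range(4)]
populations = [1000] * 16
counts = [5] * 16
counts[0] = counts[1] = 25
llr, zone, p_value = monte_carlo_p(grid, counts, populations)
```

Because the maximum is taken over many overlapping circles, the null distribution of the statistic has no convenient closed form, which is why the p-value is obtained by repeating the whole scan on simulated data.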
10. Farzad Mostashari
11. BAYESIAN SHRINKAGE ESTIMATION
- DuMouchel's GPS/MGPS
- Compares observed counts of market baskets to expected counts under some (simple) model. For example, saw 30 cases in the ER today with G.I. syndrome AND fever AND work in Newark, compared with an expectation of 3 cases
- 30-to-3 is more convincing than 3-to-0.3 but less convincing than 300-to-30. Idea: shrink the smaller ones towards one.
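The shrinkage idea can be illustrated with a deliberately simplified model. The sketch below assumes a single gamma prior with mean one on the observed-to-expected ratio and a Poisson count, so the posterior mean is (shape + observed) / (rate + expected). This is not DuMouchel's actual GPS/MGPS model; the function name `shrunk_ratio` and the prior parameters (2, 2) are assumptions for illustration only.

```python
def shrunk_ratio(observed, expected, prior_shape=2.0, prior_rate=2.0):
    """Posterior mean of the observed/expected ratio under a gamma prior.

    Assumption: ratio ~ Gamma(prior_shape, prior_rate), which has mean 1
    when shape == rate, and observed ~ Poisson(ratio * expected).
    """
    return (prior_shape + observed) / (prior_rate + expected)

# All three pairs have a raw ratio of 10, but different amounts of evidence.
for observed, expected in [(3, 0.3), (30, 3), (300, 30)]:
    raw = observed / expected
    print(observed, expected, raw, round(shrunk_ratio(observed, expected), 2))
```

Under these assumed prior values the three estimates come out at roughly 2.2, 6.4, and 9.4: the pair with the least data is pulled hardest towards one, matching the ordering described on the slide.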
12. GPS SHRINKAGE: AERS DATA
[Figure: GPS shrinkage on AERS data; axis label: number of reports]
13. BAYESIAN SHRINKAGE ESTIMATION
- Issues
- Appropriate amount of shrinkage?
- Where do the expected values come from?
- Temporal dimension?
- Covariate information
- Simpson's paradox (innocent bystander)
14. SEQUENTIAL PROBABILITY RATIO TESTS
- Classical, much-studied statistical method dating back to Wald (1948)
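As a minimal sketch of how a sequential probability ratio test might be applied to surveillance counts, the example below tests a baseline Poisson rate against an elevated one, updating the log likelihood ratio each day and stopping at Wald's approximate thresholds. The function name `poisson_sprt`, the choice of a Poisson model, and the example rates are assumptions, not something taken from the slides.

```python
import math

def poisson_sprt(counts, rate0, rate1, alpha=0.05, beta=0.05):
    """Sequential test of H0: rate0 against H1: rate1 (rate1 > rate0).

    Returns (decision, day), where day is the 1-based index at which the
    test stopped, or the number of days seen if no boundary was crossed.
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1 (signal)
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0 (no signal)
    llr = 0.0
    for day, x in enumerate(counts, start=1):
        # Log likelihood ratio contribution of one Poisson observation.
        llr += x * math.log(rate1 / rate0) - (rate1 - rate0)
        if llr >= upper:
            return "signal", day
        if llr <= lower:
            return "no signal", day
    return "continue", len(counts)

# Hypothetical baseline of 3 cases per day versus an outbreak rate of 6.
elevated = poisson_sprt([7, 8, 6], rate0=3, rate1=6)
quiet = poisson_sprt([2, 3, 2], rate0=3, rate1=6)
```

The appeal for surveillance is that the test stops as soon as the evidence is strong enough in either direction, instead of waiting for a fixed sample size.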
15. NATURAL LANGUAGE
- Important sources of health data begin life as free text chief complaints (ED visits, primary care encounters, adverse event reports, e-mail, etc.)
- Approximately 5 minutes after receiving flu and pneumonia vaccine pt began hollering, "Oh, Oh my neck is hurting. Feels like a knot in my throat, a medicine taste." Complained of chest pain moving to back and leg numbness.
- Some (successful) work on automated coding of free text
- Little work on direct surveillance of text data
16. CONCLUSION
- Analytic methods for surveillance have a long history in Statistics but currently attract substantial new interest from researchers in both CS and Statistics
- Urgently need new methods for multivariate, multi-data type streams
- Data availability a bottleneck; simulation non-trivial
- DARPA currently staging a competition
17. THE IDEA OF A COMPETITION
Thesis: Rapid growth in the number of deployed health surveillance systems and increasing complexity require new analytic methodologies
Goal: Stimulate mainstream Computer Science and Statistics researchers to focus on this area
How: A signal detection competition. Examples: the Message Understanding Conferences (MUC), Text Retrieval Conferences (TREC), KDD Cup, M3 Time Series competition
18. COMPETITION STATUS
- DIMACS Working Group on Adverse Event and Disease Reporting, Surveillance, Analysis
- Subgroup focused on competition; applied for funding; identified data sources
- Key challenge: appropriate methods for inserting signals into real data (spiking)
- Other groups face the same challenge (e.g., BioSTORM)
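To illustrate what spiking involves at its simplest, the sketch below adds a synthetic outbreak shape to a baseline series of daily counts and keeps a ground-truth record of what was injected, so that detection algorithms can be scored against it. The function name `spike`, the additive injection, and the example shape are assumptions; the slides do not specify a method, and realistic spiking would also need to respect weekly patterns, geography, and multiple data streams.

```python
def spike(baseline, start, shape):
    """Return (spiked series, truth labels) without modifying the baseline.

    Assumption: the outbreak is a fixed list of extra cases per day, added
    to the baseline from day index `start`; anything past the end of the
    series is dropped.
    """
    spiked = list(baseline)
    truth = [0] * len(baseline)
    for offset, extra in enumerate(shape):
        day = start + offset
        if day >= len(spiked):
            break
        spiked[day] += extra
        truth[day] = extra
    return spiked, truth

# Hypothetical week of baseline counts with a four-day outbreak from day 3.
baseline = [10, 12, 9, 11, 10, 13, 12]
spiked, truth = spike(baseline, start=3, shape=[2, 5, 9, 4])
```

Keeping the truth vector separate from the spiked series is what allows an objective comparison of competing algorithms on timeliness and false alarms.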
20. SCAN STATISTICS
- Martin Kulldorff's SaTScan (Spatial and Space-Time Scan Statistics) software
- E.g., spatial scan using Poisson model: computes a likelihood ratio for all possible circles, comparing event counts inside and outside
- Picks the circle with the biggest likelihood ratio
- P-value computed via Monte Carlo
- Big literature on disease clustering: Besag & Newell, Cuzick & Edwards, Diggle, Moran test, Pagano, Turnbull's method, etc.
- Need methodology for multiple sources
24. BAYESIAN SHRINKAGE ESTIMATION
- Issues
- Appropriate amount of shrinkage?
- Where do the expected values come from?
- Temporal dimension?
- Covariate information
25. SEQUENTIAL PROBABILITY RATIO TESTS
- Classical, much-studied statistical method dating back to Wald (1948). Mostly univariate.
28. HOW CAN THIS BE ACCOMPLISHED?
- Definitions of signals.
- Test data sets for refining signal detection procedures.
- Modular, interoperable signal generation algorithms.
- Computing efficiencies for Monte Carlo simulations of signal detection events in large complex data.
- Multidimensional graphical displays to interpret results and evaluate algorithms.
- Multivariate statistical techniques for evaluating signal detection profiles across multiple data sources.
30. CONCLUSION
- Short-term goals/benefits
- Promote coordination and collaboration
- Long-term goals/benefits
- Stimulate methodological research
- Provide objective evaluation of competing algorithms
- Produce high quality spiking algorithms
31. ANALYTICAL METHODS FOR HEALTH SURVEILLANCE
DAVID MADIGAN, DEPARTMENT OF STATISTICS, RUTGERS UNIVERSITY
32. Novel Surveillance Applications: Methodologies
- Early Aberration Reporting System (EARS), CDC
- What's Strange About Recent Events? (WSARE), U of Pittsburgh and Carnegie-Mellon U
- Spatial and Space-Time Scan Statistics (SaTScan™, Kulldorff)
- Web Visual Data Mining Environment (WebVDME), Lincoln Technologies, Inc.
33. Novel Surveillance Applications: Projects
- Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE III), DOD
- Real-time Outbreak and Disease Surveillance (RODS), U of Pittsburgh
- Biological Spatio-Temporal Outbreak Reasoning Module (BioSTORM), Stanford U
- Rapid Syndrome Validation Project (RSVP), Sandia NL, NM
- Alternative Surveillance Alert Program (ASAP), Health Canada
- Syndromic Surveillance Project, NYC
- Bioterrorism Syndromic Surveillance Demonstration Program, CDC/Harvard
34. Conceptual Taxonomy
Public Health Surveillance
Adverse event (to intervention exposure)
Disease
Traditional
Syndromic
Drug
Vaccine
Other
Infectious disease
Birth defect
Injuries
Etc.