Title: A Z-Score Based Multi-level Spatial Clustering Algorithm for the Detection of Disease Outbreaks
A Z-Score Based Multi-level Spatial Clustering Algorithm for the Detection of Disease Outbreaks
Department of Biomedical Informatics, University of Pittsburgh School of Medicine, http://www.dbmi.pitt.edu
- Jialan Que, Fu-Chiang Tsui, PhD, and Jeremy Espino, MD
Outline
- Introduction
  - Temporal algorithms
  - Spatial algorithms
- Methods
  - Z-Score Based Multi-level Spatial Clustering Algorithm (ZMSC)
  - Evaluation
- Results
- Discussion
- Future work
Temporal Detection Algorithms
- Cumulative Sum (CuSUM)
- Recursive least squares
- Bayesian change-point detector
- Wavelet anomaly detector (WAD)
Siegrist D, Pavlin JA. BioALIRT biosurveillance testbed evaluation. In: Syndromic surveillance: reports from a national conference, New York, NY. MMWR 2004;53(Suppl):152-8.
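The slide only names these detectors. To illustrate the simplest of them, here is a textbook one-sided CuSUM applied to standardized daily counts. The 28-day baseline, the allowance `k`, the threshold `h`, and the reset-after-alarm rule are illustrative assumptions. They are not the settings used in the cited evaluation.

```python
import math


def cusum_alarms(counts, baseline_days=28, k=0.5, h=4.0):
    """Return the day indices on which a one-sided (upper) CuSUM signals.

    Each day's count is standardized against the mean and sample standard
    deviation of the preceding `baseline_days` days. The statistic accumulates
    standardized excesses above the allowance k and signals when it exceeds h.
    """
    s = 0.0
    alarms = []
    for t in range(baseline_days, len(counts)):
        window = counts[t - baseline_days:t]
        mu = sum(window) / baseline_days
        var = sum((x - mu) ** 2 for x in window) / (baseline_days - 1)
        sd = math.sqrt(var) if var > 0 else 1.0  # guard against a flat baseline
        z = (counts[t] - mu) / sd
        s = max(0.0, s + z - k)
        if s > h:
            alarms.append(t)
            s = 0.0  # assumption: restart the statistic after each alarm
    return alarms


# Alternating 9/11 counts for four weeks, then a spike to 30 on day 28.
print(cusum_alarms([9, 11] * 14 + [30]))  # [28]
```

The allowance `k` stops ordinary day-to-day noise from accumulating. In the example, the alternating 9/11 pattern never builds the statistic past 0.5, while the single spike exceeds `h` at once.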
Spatial Detection Algorithms
- Kulldorff's spatial scan statistic (KSSS)
- Bayesian spatial scan statistic (BSSS)
Kulldorff M. A spatial scan statistic. Communications in Statistics: Theory and Methods. 1997;26(6):1481-96.
Neill DB, Moore AW, Cooper GF. A Bayesian spatial scan statistic. Advances in Neural Information Processing Systems. 2005;18:1003-10.
Kulldorff's Spatial Scan Statistic
[Figure: a scanning window over the study region; the detected cluster is labeled p = 0.001]
Hypotheses: an outbreak in a cluster (H1) vs. no outbreak (H0)
- Scan the study region with circular or elliptic windows
- Compute the likelihood ratio for each window
- Locate the cluster with the maximum likelihood ratio
- Compute its p-value using a randomization test
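The four steps above can be sketched for the Poisson version of the statistic, using circular windows only. The sketch makes several assumptions. Each unit area has an observed count, an expected (baseline) count, and centroid coordinates. A "circle" is approximated by an area plus its nearest neighbors, up to `max_size` areas. Elliptic windows are omitted. The function names are hypothetical, and this is not SaTScan's implementation.

```python
import math
import random


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a window with observed
    count c and expected count e, where expectations sum to `total`.
    Windows that are not elevated (c <= e) score 0."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when the window holds every case
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def circular_windows(coords, max_size):
    """Approximate circles: each area plus its nearest neighbours, 1..max_size areas."""
    n = len(coords)
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        for size in range(1, max_size + 1):
            yield tuple(order[:size])


def max_llr(counts, expected, windows):
    """Scan all windows; return (best LLR, best window)."""
    total = sum(counts)
    scale = total / sum(expected)  # rescale expectations to the observed total
    best, best_window = 0.0, None
    for w in windows:
        c = sum(counts[j] for j in w)
        e = scale * sum(expected[j] for j in w)
        llr = poisson_llr(c, e, total)
        if llr > best:
            best, best_window = llr, w
    return best, best_window


def kulldorff_scan(counts, expected, coords, max_size=None, replications=999, seed=0):
    """Return (most likely cluster, its LLR, Monte Carlo p-value)."""
    n = len(counts)
    max_size = max_size or max(1, n // 2)
    windows = list(circular_windows(coords, max_size))
    observed, cluster = max_llr(counts, expected, windows)
    # Randomization test: under H0, redistribute the same total number of cases
    # over areas in proportion to their expected counts, then rescan.
    rng = random.Random(seed)
    total = sum(counts)
    exceed = 0
    for _ in range(replications):
        sim = [0] * n
        for j in rng.choices(range(n), weights=expected, k=total):
            sim[j] += 1
        if max_llr(sim, expected, windows)[0] >= observed:
            exceed += 1
    return cluster, observed, (exceed + 1) / (replications + 1)
```

Take a 3 by 3 grid with an expected count of 10 in every area and 40 cases in area 0. The scan returns `(0,)` as the cluster, with an LLR of about 20.93. With `replications=999`, the number used in this study's SaTScan runs, the smallest attainable p-value is 1/1000 = 0.001. This sketch sums each window from scratch, which costs O(n) per window over O(n^2) windows. That is one way to see the O(n^3) cost listed among the disadvantages.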
Kulldorff's Spatial Scan Statistic
- Advantages
  - Close to complete search
- Disadvantages and limitations
  - Computationally intensive (O(n^3), where n is the number of unit areas)
  - Only finds clusters with circular or elliptic shapes
Bayesian Spatial Scan Statistic
[Figure: a scanning window over the study region; candidate clusters are labeled p = 0.03 and p = 0.2]
- Divide the study region into an m × m grid
- Scan the study region with rectangular windows
- Compute the posterior probability of an outbreak in each cluster, P(outbreak in cluster | Data)
- Locate the cluster with the highest posterior probability
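The steps above can be sketched with a simplified gamma-Poisson model. Each cell's count is Poisson with mean equal to a relative risk times its baseline. The risk has a Gamma(alpha0, beta0) prior under H0 everywhere. Under H1(S), the prior is Gamma(alpha1, beta1) inside rectangle S and Gamma(alpha0, beta0) outside it. This is not the exact model of the cited paper, which estimates its priors from historical data. The hyperparameters, the 5% prior outbreak probability, and the function names here are all illustrative assumptions.

```python
import math
from itertools import product


def log_marginal(count, baseline, alpha, beta):
    """log P(count | baseline) with count ~ Poisson(q * baseline), q ~ Gamma(alpha, rate beta).
    The per-cell factor prod(b_i**c_i / c_i!) is omitted: it is identical under
    every hypothesis, so it cancels in the posterior."""
    return (alpha * math.log(beta) + math.lgamma(alpha + count)
            - math.lgamma(alpha) - (alpha + count) * math.log(beta + baseline))


def rectangles(m):
    """All axis-aligned rectangles (x1, x2, y1, y2), inclusive, on an m x m grid."""
    for x1, y1 in product(range(m), repeat=2):
        for x2 in range(x1, m):
            for y2 in range(y1, m):
                yield x1, x2, y1, y2


def bayesian_scan(counts, baselines, prior_outbreak=0.05,
                  alpha0=1.0, beta0=1.0, alpha1=2.0, beta1=1.0):
    """counts, baselines: m x m nested lists. Returns (best rectangle, posterior)."""
    m = len(counts)
    C = sum(map(sum, counts))
    B = sum(map(sum, baselines))
    log_h0 = math.log(1 - prior_outbreak) + log_marginal(C, B, alpha0, beta0)
    rects = list(rectangles(m))
    log_prior_s = math.log(prior_outbreak / len(rects))  # uniform prior over regions
    log_h1 = []
    for x1, x2, y1, y2 in rects:
        c = sum(counts[x][y] for x in range(x1, x2 + 1) for y in range(y1, y2 + 1))
        b = sum(baselines[x][y] for x in range(x1, x2 + 1) for y in range(y1, y2 + 1))
        # Inside S: elevated-risk prior; outside S: null prior.
        log_h1.append(log_prior_s + log_marginal(c, b, alpha1, beta1)
                      + log_marginal(C - c, B - b, alpha0, beta0))
    # P(D) = P(D|H0)P(H0) + sum over S of P(D|H1(S))P(H1(S)), normalized in log space.
    top = max([log_h0] + log_h1)
    log_norm = top + math.log(math.exp(log_h0 - top)
                              + sum(math.exp(v - top) for v in log_h1))
    best = max(range(len(rects)), key=lambda i: log_h1[i])
    return rects[best], math.exp(log_h1[best] - log_norm)
```

On a 4 by 4 grid with baseline 10 and count 10 everywhere except 40 in cell (1, 1), the single-cell rectangle `(1, 1, 1, 1)` receives almost all of the posterior mass. No significance test is needed because the output is already a probability.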
Bayesian Spatial Scan Statistic
- Advantages
  - No significance testing needed
- Disadvantages and limitations
  - Only finds clusters with rectangular shapes
  - Still a nearly exhaustive search, so time consuming (O(m^4), where m is the grid size)
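The O(m^4) cost follows from counting the candidate windows. An axis-aligned rectangle is fixed by choosing 2 of the m + 1 grid lines along each axis. The arithmetic below assumes every rectangle is scored, in constant time each (for example, with cumulative sums). It also evaluates the count for the 32 by 32 grid used in this study's BSSS runs.

```latex
N_{\text{rect}}(m) = \binom{m+1}{2}^{2} = \left(\frac{m(m+1)}{2}\right)^{2} = O(m^{4}),
\qquad
N_{\text{rect}}(32) = 528^{2} = 278{,}784
```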
Methods: Z-Score Based Multi-level Spatial Clustering
- Only look at the subsets of areas that have high risks of outbreak occurrence
  - Risk rate z-score
- Compute the z-score of each cluster by combining the normalized time series of all the areas it includes
- Compute the p-values of the cluster z-scores
- Output the most significant cluster
[Figure: Top 20]
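The steps above can be sketched as follows. This is a hypothetical reconstruction, not the authors' implementation. The slides do not say how candidate clusters are grown, so the sketch assumes that level 1 holds single high-risk areas. Each further level then extends a cluster by one adjacent high-risk area. `top_k=20` is read from the "Top 20" label, and `max_level` and the adjacency map are assumptions. The Cluster Z-Score slide names WAD, but the sketch replaces it with a plain mean and standard-deviation baseline.

```python
import math
import statistics


def _baseline(series):
    """Mean and sample SD of the history (every value except the last)."""
    history = series[:-1]
    return statistics.fmean(history), statistics.stdev(history) or 1.0  # guard: flat history


def zscore_today(series):
    """z-score of the latest value against the series' own history."""
    mu, sd = _baseline(series)
    return (series[-1] - mu) / sd


def cluster_zscore(cluster, series_by_area):
    """Normalize each member area's series, average them, then z-score the average.
    Assumption: a mean/SD baseline stands in for WAD here."""
    normalized = []
    for a in cluster:
        mu, sd = _baseline(series_by_area[a])
        normalized.append([(x - mu) / sd for x in series_by_area[a]])
    averaged = [statistics.fmean(day) for day in zip(*normalized)]
    return zscore_today(averaged)


def zmsc(series_by_area, adjacency, top_k=20, max_level=4):
    """Return (most significant cluster, its z-score, one-sided p-value)."""
    # 1. Keep only the top_k areas ranked by their own risk z-score.
    ranked = sorted(series_by_area, key=lambda a: zscore_today(series_by_area[a]),
                    reverse=True)
    high_risk = set(ranked[:top_k])
    # 2. Assumed multi-level growth: level 1 = single areas; level L+1 = each
    #    level-L cluster plus one adjacent high-risk area.
    level = {frozenset([a]) for a in high_risk}
    candidates = set(level)
    for _ in range(max_level - 1):
        level = {c | {n} for c in level for a in c
                 for n in adjacency.get(a, ()) if n in high_risk and n not in c}
        candidates |= level
    # 3. Score every candidate; p-value from the upper tail of the standard normal.
    best = None
    for c in candidates:
        z = cluster_zscore(c, series_by_area)
        p = 0.5 * math.erfc(z / math.sqrt(2))
        if best is None or p < best[2]:
            best = (c, z, p)
    return best
```

As an example, take adjacent areas A and B whose 9/11 histories are only partly in step and which both jump to 20 on the last day. The pair {A, B} scores a z of about 13.2, versus about 9.35 for either area alone. Averaging two partly independent normalized series shrinks the variance of the baseline, and that is what lets a compact group of moderately elevated areas outrank any single member.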
Evaluation
- Data
  - Over-the-counter anti-diarrhea medication sales received from Pennsylvania (Jan. 1, 2004 to Aug. 31, 2007)
  - Semi-synthetic outbreaks (size K, strength )
- Evaluation metrics
  - ROC
  - AMOC
  - Running time
  - Cluster positive predictive value (PPV) = number of detected outbreak ZIP codes / output cluster size
  - Cluster sensitivity = number of detected outbreak ZIP codes / outbreak size
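The two cluster metrics can be computed from sets of ZIP codes, as in the sketch below. It reads "detected outbreak ZIP codes" as the ZIP codes that appear both in the output cluster and in the injected outbreak. Scoring an empty cluster or empty outbreak as 0 is an assumption, and the function name and ZIP codes are hypothetical.

```python
def cluster_ppv_sensitivity(output_cluster, outbreak_zips):
    """Return (cluster PPV, cluster sensitivity) for one detected cluster."""
    detected = set(output_cluster)
    outbreak = set(outbreak_zips)
    hits = len(detected & outbreak)  # detected outbreak ZIP codes
    ppv = hits / len(detected) if detected else 0.0  # assumption: empty output scores 0
    sensitivity = hits / len(outbreak) if outbreak else 0.0
    return ppv, sensitivity


# Hypothetical ZIP codes: 3 of the 4 output ZIP codes lie in a 6-ZIP-code outbreak.
print(cluster_ppv_sensitivity(
    ["15213", "15217", "15232", "15090"],
    ["15213", "15217", "15232", "15206", "15208", "15224"],
))  # (0.75, 0.5)
```

PPV penalizes ZIP codes in the output cluster that are not part of the outbreak. Sensitivity penalizes outbreak ZIP codes that the cluster misses. A compact cluster therefore tends to score high on PPV but lower on sensitivity.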
ROC Curves
AMOC Curves
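An AMOC curve plots detection delay against false alarm rate as the alarm threshold varies. The sketch below computes one AMOC point per threshold. Several conventions in it are assumptions, since the slides do not say which ones the study used. Higher scores mean more suspicious days, delay is counted in days from onset (0 means detected on the onset day), and an outbreak that is never detected is charged its full length.

```python
def amoc_point(scores_normal, outbreak_runs, threshold):
    """Return (false alarms per outbreak-free day, mean days to detection).

    scores_normal: detection scores on outbreak-free days.
    outbreak_runs: one list of daily scores per injected outbreak, from onset.
    """
    false_alarm_rate = sum(s >= threshold for s in scores_normal) / len(scores_normal)
    delays = []
    for run in outbreak_runs:
        # Assumption: an undetected outbreak is charged its full length.
        delay = next((day for day, s in enumerate(run) if s >= threshold), len(run))
        delays.append(delay)
    return false_alarm_rate, sum(delays) / len(delays)


def amoc_curve(scores_normal, outbreak_runs):
    """Sweep every observed score as a threshold and collect the AMOC points."""
    thresholds = sorted(set(scores_normal) | {s for run in outbreak_runs for s in run})
    return [amoc_point(scores_normal, outbreak_runs, t) for t in thresholds]


normal = [0.1, 0.2, 0.3, 0.9]
runs = [[0.2, 0.5, 0.95], [0.8, 0.99]]
print(amoc_point(normal, runs, 0.5))   # (0.25, 0.5)
print(amoc_point(normal, runs, 0.96))  # (0.0, 2.0)
```

Raising the threshold moves a point left (fewer false alarms) and up (later detection). This is the trade-off behind the observation that ZMSC detected later than BSSS at low false alarm rates.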
Areas under ROC and AMOC
Running Time
The KSSS method was executed using SaTScan (implemented in C); ZMSC, KSSS, and BSSS were implemented in Java and executed under JRE 1.5.
We configured the KSSS analysis in SaTScan as prospective, with a 1-day time window (daily time precision), so that only the most recent day is analyzed, which makes it comparable to ZMSC. The number of Monte Carlo replications was set to 999.
For BSSS, we calculated the expected values
using a 28-day moving average. The grid was
defined as 32 by 32.
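The expected values described above can be sketched as a trailing moving average. The sketch assumes that the 28-day window covers the days before day t and excludes day t itself, and that it is applied to each grid cell's series separately. Days without a full window of history get no expected value. The function name is hypothetical.

```python
def moving_average_expected(counts, window=28):
    """Expected count for day t = mean of counts[t - window : t].
    Days without a full window of history get None."""
    expected = [None] * len(counts)
    running = sum(counts[:window])  # sum of the window ending just before day `window`
    for t in range(window, len(counts)):
        expected[t] = running / window
        running += counts[t] - counts[t - window]  # slide the window forward one day
    return expected


print(moving_average_expected(list(range(30)))[28:])  # [13.5, 14.5]
```

Updating a running sum keeps the cost at O(1) per day per cell, which matters when the model is applied to all 1,024 cells of a 32 by 32 grid.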
Cluster Sensitivities
Cluster PPVs
Discussion
- Advantages
  - ZMSC can be applied to larger study regions because it runs fast
  - ZMSC tends to identify more compact clusters, which may help field epidemiologists
  - ZMSC is not limited to any artificial cluster shape
- Disadvantages
  - ZMSC detected clusters later than BSSS at low false alarm rates
Limitations
- Adjacency threshold ( ) was set to 0
- Injected outbreaks were artificial and simplified
Future Work
- Different outbreak models
  - BARD outbreaks
  - Other outbreak models
- Other data types
  - ED data
  - Water quality data, etc.
Acknowledgments
- NSF-IIS-0325581
- CDC-5R01PH00026-02
- NLM-5R21LM008278-03
- PADOH-ME-01737
- AFRL-F30602-01-2-0550
Cluster Z-Score
[Figure: area time series are normalized and averaged; WAD]
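One reading of "normalize and average" is written out below. Here x_i(t) is the count in area i on day t. The symbols μ_i and σ_i are the mean and standard deviation of area i over a baseline period, and μ_S and σ_S are the same statistics for the averaged series of cluster S. The current day is t_0. All of these symbols are assumptions. The slide's mention of WAD suggests that the original passes the averaged series to the wavelet anomaly detector instead of standardizing it with μ_S and σ_S.

```latex
z_i(t) = \frac{x_i(t) - \mu_i}{\sigma_i}, \qquad
\bar{z}_S(t) = \frac{1}{|S|} \sum_{i \in S} z_i(t), \qquad
Z_S = \frac{\bar{z}_S(t_0) - \mu_S}{\sigma_S}, \qquad
p_S = 1 - \Phi(Z_S)
```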