Empirical/Asymptotic P-values for Monte Carlo-Based Hypothesis Testing: an Application to Cluster Detection Using the Scan Statistic

Description:

Empirical/Asymptotic P-values for Monte Carlo-Based Hypothesis ... Harvard Medical School and Harvard Pilgrim Health Care. Presented at EVA, August 15, 2005 ... – PowerPoint PPT presentation

1
Empirical/Asymptotic P-values for Monte
Carlo-Based Hypothesis Testing: an Application to
Cluster Detection Using the Scan Statistic
  • Allyson Abrams, Martin Kulldorff, Ken Kleinman
  • Department of Ambulatory Care and Prevention,
  • Harvard Medical School and Harvard Pilgrim Health
    Care
  • Presented at EVA, August 15, 2005
  • This work was funded by the United States
    National Cancer Institute, grant number
    RO1-CA95979.

2
Background: Scan Statistics
  • Spatial scan statistic used to identify
    geographic clusters
  • Use moving circular window on map
  • Any point on map can be the center of a cluster
  • Each circle includes a different set of points
  • If the centroid of a region is included in the
    circle, the whole region is included
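
The centroid rule above can be sketched in a few lines. This is only an illustration: the function name `regions_in_window`, the region ids, and the coordinates are made up for the example and do not come from the presentation or from SaTScan.

```python
import math


def regions_in_window(centroids, center, radius):
    """Return the ids of all regions whose centroid lies inside the circle.

    centroids: dict mapping a region id to its (x, y) centroid (hypothetical data).
    A region is included as a whole as soon as its centroid is inside the window.
    """
    cx, cy = center
    return {
        region_id
        for region_id, (x, y) in centroids.items()
        if math.hypot(x - cx, y - cy) <= radius
    }


# Three made-up regions; the window is centered on the centroid of region "A".
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 4.0)}
print(sorted(regions_in_window(centroids, center=(0.0, 0.0), radius=1.5)))  # ['A', 'B']
```

Growing the radius changes the set of included regions only when the circle crosses another centroid, which is why the number of distinct windows is finite.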

3
Background: Scan Statistics
  • For each distinct window, calculate the
    likelihood, proportional to
    (n/μ)^n ((N - n)/(N - μ))^(N - n)
  • n = number of cases inside circle
  • N = total number of cases
  • μ = expected number of cases inside circle
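
As a rough sketch of this per-window calculation, the function below evaluates the logarithm of the Poisson-model likelihood ratio, (n/μ)^n ((N - n)/(N - μ))^(N - n), for one window. The formula itself is an assumption here, since it did not survive in the transcript; it is the standard Poisson form for the quantities n, N, and μ defined above. The function name and the example numbers are hypothetical.

```python
import math


def poisson_window_log_lr(n, mu, N):
    """Log likelihood ratio for one window under the Poisson model (assumed form).

    n:  observed cases inside the window
    mu: expected cases inside the window under the null (0 < mu < N)
    N:  total number of cases
    Windows with no excess of cases (n <= mu) get a log ratio of 0.
    """
    if n <= mu:
        return 0.0
    log_lr = n * math.log(n / mu)
    if n < N:  # when n == N the outside term equals 1 and contributes nothing
        log_lr += (N - n) * math.log((N - n) / (N - mu))
    return log_lr


# Made-up example: 20 cases observed where 10 were expected, out of 600 in total.
print(poisson_window_log_lr(20, 10.0, 600))
```

Working on the log scale avoids overflow for large counts and matches the log-likelihood values that the later slides refer to.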

4
Background: Scan Statistics
  • The scan statistic is the maximum likelihood over
    all possible circles
  • Identifies the most unusual cluster
  • To find p-value, use Monte Carlo hypothesis
    testing
  • Redistribute cases randomly and recalculate the
    scan statistic many times
  • Proportion of scan statistics from the Monte
    Carlo replicates which are greater than or equal
    to the scan statistic for the true cluster is the
    p-value
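
A minimal sketch of the Monte Carlo p-value follows. It uses the common rank-based convention in which the observed statistic is counted together with the replicates, so the result is (exceedances + 1) / (replicates + 1); that convention is an assumption of this example, as the slide only describes the proportion informally. The function name and the replicate values are made up.

```python
def monte_carlo_p_value(observed, replicates):
    """Rank-based Monte Carlo p-value for an observed scan statistic.

    replicates: scan statistics recalculated on randomly redistributed cases.
    The observed value is counted among the replicates, so the smallest
    attainable p-value is 1 / (len(replicates) + 1).
    """
    exceedances = sum(1 for value in replicates if value >= observed)
    return (exceedances + 1) / (len(replicates) + 1)


# 999 made-up replicate statistics: 1.0, 2.0, ..., 999.0
replicates = [float(i) for i in range(1, 1000)]
print(monte_carlo_p_value(1000.0, replicates))  # 0.001
print(monte_carlo_p_value(990.5, replicates))   # 0.01
```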

5
Background: Scan Statistics

6
Background: Scan Statistics
  • That discussion only considered spatial
    clustering
  • To extend to clustering in space and time, use
    cylinders instead of circles
  • The height of the cylinder represents time
  • The rest of the process is unchanged
  • SaTScan is freely available software that uses
    the scan statistic to detect clusters in space,
    time, or space-time (www.satscan.org)

7
Background: SaTScan
  • Main drawback to Monte Carlo hypothesis testing:
    increased precision for p-values can only be
    obtained by greatly increasing the number of
    Monte Carlo replicates
  • A big problem for small p-values
  • SaTScan can take anywhere from seconds to hours
    to run, depending on the data, the type of
    analysis, and the number of Monte Carlo replicates
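
The precision problem can be made concrete with a short calculation. The sketch below assumes the rank-based p-value convention, under which R replicates cannot produce a p-value below 1/(R + 1), and it approximates the sampling error of an estimated p-value by the binomial standard error sqrt(p(1 - p)/R). Both function names are hypothetical.

```python
import math


def smallest_p_value(num_replicates):
    """Smallest p-value attainable with rank-based Monte Carlo testing."""
    return 1 / (num_replicates + 1)


def mc_standard_error(p, num_replicates):
    """Approximate binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p * (1 - p) / num_replicates)


for r in (99, 999, 9999):
    print(r, smallest_p_value(r), mc_standard_error(0.01, r))
```

Because the standard error shrinks only with the square root of the number of replicates, halving it requires roughly four times as many replicates.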

8
Background
  • We use SaTScan for two main reasons:
  • Daily surveillance for disease outbreaks
  • Evaluating systems that use SaTScan for
    surveillance
  • In both cases, we need to limit the amount of
    time it takes to generate each p-value while
    still retaining enough precision in the p-value
    to determine how unusual a cluster is

9
Goal
  • Estimate distribution of the scan statistic using
    fewer Monte Carlo replicates
  • See how the p-values obtained from the fitted
    distributional parameters compare with the true
    p-value

10
Methods
  • Sample map: 245 counties in the northeast United
    States with 600 cases
  • Ran SaTScan on the sample map using 100,000,000
    Monte Carlo replicates to find the 'true'
    log-likelihood needed to obtain p-values of 0.01,
    0.001, 0.0001, 0.00001
  • These correspond to the following order
    statistics (counted from the largest) of the
    100,000,000 Monte Carlo replicates:
    1,000,000; 100,000; 10,000; 1,000
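
The order statistics follow from multiplying each target p-value by the number of replicates. The sketch below redoes that arithmetic with exact fractions to avoid floating-point rounding; the function name is made up for the example.

```python
from fractions import Fraction

NUM_REPLICATES = 100_000_000


def upper_order_statistic(p_value, num_replicates):
    """Rank, counted from the largest replicate, that corresponds to p_value.

    p_value is passed as a decimal string so that it is represented exactly.
    """
    rank = Fraction(p_value) * num_replicates
    if rank.denominator != 1:
        raise ValueError("p_value * num_replicates must be a whole number")
    return int(rank)


ranks = [
    upper_order_statistic(p, NUM_REPLICATES)
    for p in ("0.01", "0.001", "0.0001", "0.00001")
]
print(ranks)  # [1000000, 100000, 10000, 1000]
```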

11
Methods
  • Ran SaTScan 1000 times on the same map, each time
    generating 999 Monte Carlo replicates
  • For each of the 1000 SaTScan runs:
  • Found maximum likelihood estimates of the
    parameters for each distribution based on the 999
    Monte Carlo replicates
  • Distributions used: Normal, Lognormal, Gamma,
    Gumbel
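
To illustrate the fitting step for one of the four distributions, the sketch below computes maximum likelihood estimates of the Gumbel location (mu) and scale (beta) from a set of replicate values. It is not the authors' code: the bisection solver, the function name `fit_gumbel_mle`, and the simulated sample (which stands in for 999 replicate log-likelihoods) are all assumptions made for the example.

```python
import math
import random


def fit_gumbel_mle(sample, tol=1e-10):
    """Maximum likelihood estimates (mu, beta) for a Gumbel (maximum) distribution.

    The scale beta solves
        beta = mean(x) - sum(x * exp(-x / beta)) / sum(exp(-x / beta)),
    which is found here by bisection; mu then follows in closed form.
    """
    n = len(sample)
    mean = sum(sample) / n
    xmin, xmax = min(sample), max(sample)
    if xmax == xmin:
        raise ValueError("sample must contain at least two distinct values")

    def score(beta):
        # Exponents are shifted by xmin so the exponentials cannot overflow.
        weights = [math.exp(-(x - xmin) / beta) for x in sample]
        weighted_mean = sum(x * w for x, w in zip(sample, weights)) / sum(weights)
        return beta - mean + weighted_mean

    # score is negative for a very small beta and positive at beta = range.
    lo, hi = (xmax - xmin) * 1e-6, xmax - xmin
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    mean_weight = sum(math.exp(-(x - xmin) / beta) for x in sample) / n
    mu = xmin - beta * math.log(mean_weight)
    return mu, beta


# Synthetic stand-in for 999 replicate values, drawn from Gumbel(mu=5.0, beta=1.5).
rng = random.Random(2005)
sample = [5.0 - 1.5 * math.log(-math.log(rng.random())) for _ in range(999)]
mu_hat, beta_hat = fit_gumbel_mle(sample)
print(mu_hat, beta_hat)
```

With 999 values the estimates should land close to the parameters used to simulate the sample; the other three distributions would need their own estimators.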

12
Methods
  • The empirical/asymptotic p-value for each
    distribution is the area to the right of the
    observed log-likelihood for a given distribution
  • For each distribution, we generated
    empirical/asymptotic p-values at the 'true'
    log-likelihood values, i.e. the log-likelihoods
    that would have been required to generate
    p-values of 0.01, 0.001, 0.0001, 0.00001
  • The usual Monte Carlo-based p-value reported in
    SaTScan
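
The "area to the right" can be written in closed form for the Gumbel distribution and through the complementary error function for the normal distribution. The sketch below shows both; the parameter values are made up and are not results from the study, and the lognormal and gamma cases are left out.

```python
import math


def gumbel_upper_tail(x, mu, beta):
    """P(X >= x) for a Gumbel (maximum) distribution with location mu, scale beta.

    Written with expm1 so that very small tail areas keep their precision.
    """
    return -math.expm1(-math.exp(-(x - mu) / beta))


def normal_upper_tail(x, mean, sd):
    """P(X >= x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


# Hypothetical fitted parameters and a hypothetical observed log-likelihood.
observed = 15.0
print(gumbel_upper_tail(observed, mu=5.0, beta=1.5))
print(normal_upper_tail(observed, mean=5.87, sd=1.92))
```

Unlike a rank-based Monte Carlo p-value, these tail areas are not bounded below by 1/(R + 1), which is what makes p-values such as 0.00001 reachable from 999 replicates.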

13
Methods
  • Repeated the entire process using 60 and 6000
    cases
  • Results were almost identical
  • Using 600 cases, repeated entire process with 99
    and 9999 Monte Carlo replicates in each of the
    1000 simulations
  • Again, very similar results

14-21
Results
  • [Slides 14 to 21 contained figures only]

22
Results
  • The empirical/asymptotic p-values from the Gumbel
    distribution appear only slightly conservatively
    biased
  • Other tested distributions all resulted in
    anti-conservatively biased p-values
  • The ordinary Monte Carlo p-values reported from
    SaTScan had greater variance than the
    Gumbel-based p-values

23
Conclusions
  • Empirical/asymptotic p-values based on the Gumbel
    distribution can be preferable to true Monte
    Carlo p-values
  • Empirical/asymptotic p-values can accurately
    yield p-values smaller than is attainable from
    Monte Carlo p-values with a given number of
    replicates
  • We suggest empirical/asymptotic p-values as a
    hybrid method to accurately obtain small p-values
    with a relatively small number of Monte Carlo
    replicates

24
Future work
  • Results shown today are based on purely spatial
    analyses; we will also look at space-time
    analyses
  • An option will be added in SaTScan to allow the
    user to request the Gumbel-based p-value