1
Summary of "A Spatial Scan Statistic" by M. Kulldorff
  • Presented by Gauri S. Datta
  • gauri_at_stat.uga.edu
  • Mid-Year Meeting
  • February 3, 2006

2
Background
  • Scan Statistic
  • A tool to detect clusters in a point process
  • Naus (1965, JASA) studied it in one dimension
  • Tests whether a 1-dim point process is purely random
  • Point Process
  • Consider a time interval [a, b] and a window
    A = [t, t+w] of fixed width w
  • µ(A) = # of e-mails that arrived in the time window A
  • n(A) = nA = # of junk e-mails = number of
    points
  • Arrival times of junk e-mails define a point
    process

3
Main Idea in Scan Statistic
  • Move a window [t, t+w] of size w < b-a over a time
    interval [a, b]
  • Over all possible values of t, record the maximum
    number of points in the window
  • Compare this number with cut-off points under the
    hypothesis of a purely random Poisson process
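
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method of Naus or Kulldorff as published: the function names, the arrival times, and the Monte Carlo cut-off (uniform points on [a, b], conditional on the observed number of points) are assumptions made for the example.

```python
import random
from bisect import bisect_right

def scan_statistic(points, w):
    """Maximum number of points in any closed window [t, t + w]."""
    pts = sorted(points)
    best = 0
    for i, t in enumerate(pts):
        # For closed windows the maximum is attained with the left edge on a point.
        count = bisect_right(pts, t + w) - i
        best = max(best, count)
    return best

def mc_p_value(points, a, b, w, n_sim=999, seed=0):
    """Monte Carlo p-value under a purely random process on [a, b].

    Assumption: the null is simulated as len(points) uniform draws on [a, b].
    """
    rng = random.Random(seed)
    observed = scan_statistic(points, w)
    n = len(points)
    hits = sum(
        scan_statistic([rng.uniform(a, b) for _ in range(n)], w) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)

# Hypothetical arrival times (in hours) of junk e-mails on [0, 10]
arrivals = [0.5, 2.1, 2.3, 2.4, 2.9, 6.0, 9.2]
print(scan_statistic(arrivals, w=1.0))             # 4 points fall in [2.1, 3.1]
print(mc_p_value(arrivals, a=0.0, b=10.0, w=1.0))  # estimated p-value
```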

4
(No Transcript)
5
(No Transcript; figure labels p, p, q)
6
Building block of Scan Test
  • Repeated use of tests for equality of two
    Binomial or Poisson populations
  • Two populations are defined by the scanning
    window A and its complement A^c
  • As in multiple comparisons, these tests are
    dependent as one moves the scanning window

7
Spatial Scan Statistic (SSS)
  • Kulldorff (1997) used SSS to detect clusters in
    a spatial process
  • SSS can be used
  • In a multi-dim point process
  • With variable window size
  • With an inhomogeneous Poisson process or a
    Bernoulli process as the baseline process

8
SSS (continued)
  • Scanning window can be any predefined shape
  • SSS is on a geographical space G with a measure µ
  • In a traditional point process, G is a line, µ is a
    uniform measure
  • In 2-dim, G is a plane, µ a Lebesgue measure

9
(No Transcript; figure labels p, p, q)
10
Examples
  • Forestry
  • Spatial clustering of trees.
  • Want to look for clusters of a specific kind of
    tree after adjusting for the uneven spatial
    distribution of all trees
  • µ(A) = total # of trees in region A
  • nA = # of trees in A of the specific kind

11
Examples (continued)
  • Epidemiology
  • Interest in detecting geographical clusters of
    disease
  • Need to adjust for uneven population density
  • Rural vs. urban population
  • For data aggregated into census districts,
    measure is concentrated at the central
    coordinates of districts

12
Examples (continued)
  • If interest is in space-time clusters of a
    disease, the measure will still be concentrated
    in the geographical region as in the prior
    example
  • Adjusting for uneven population distribution is
    not always enough; confounding factors should
    also be taken into account. E.g., in epidemiology
    the measure can reflect the standardized expected
    incidence rate

13
SS LR statistic
  • For a fixed-size window, the scan statistic is the
    maximum # of points in the window at any given
    time/geographical region
  • The test statistic is equivalent to the LR test statistic for
    testing H0: θ1 = θ2 vs. Ha: θ1 > θ2
  • Generalization to the LR test is important for
    a variable window

14
Generalized SS Notation/Models
  • G = geographical area / study space
  • A = window ⊂ G
  • N(A) = random # of points in A
  • A spatial point process
  • Goal: to find the prominent cluster
  • Two useful models for the point process
  • (a) Bernoulli model
  • (b) Poisson model

15
Standard Models for SS
  • For the Bernoulli model, the measure µ is such that µ(A)
    is an integer for all subsets A of G
  • Two states (disease point or no disease) for
    each unit
  • Locations of the points define a point process

16
(No Transcript)
17
LR Test Bernoulli Model
18
LR Test Bernoulli Model
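
The formulas on these two slides were not transcribed. As a hedged sketch, the function below evaluates the log likelihood ratio for a single zone Z using the standard Bernoulli-model form from Kulldorff (1997), with p = nZ/µ(Z) estimated inside the zone and q = (nG - nZ)/(µ(G) - µ(Z)) outside. The function and argument names are illustrative choices, and the inputs are assumed to satisfy 0 < µ(Z) < µ(G).

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_log_lr(n_z, mu_z, n_g, mu_g):
    """log[L(Z) / L0] for one zone under the Bernoulli model.

    n_z, mu_z: cases and population inside the zone
    n_g, mu_g: cases and population in the whole study area
    Returns 0 when the rate inside does not exceed the rate outside.
    """
    p = n_z / mu_z
    q = (n_g - n_z) / (mu_g - mu_z)
    if p <= q:
        return 0.0
    r = n_g / mu_g  # common rate under H0
    log_l_z = (
        _xlogy(n_z, p)
        + _xlogy(mu_z - n_z, 1 - p)
        + _xlogy(n_g - n_z, q)
        + _xlogy((mu_g - mu_z) - (n_g - n_z), 1 - q)
    )
    log_l_0 = _xlogy(n_g, r) + _xlogy(mu_g - n_g, 1 - r)
    return log_l_z - log_l_0

# Hypothetical counts: 10 of 12 cases fall in a zone holding 10% of the population
print(bernoulli_log_lr(n_z=10, mu_z=100, n_g=12, mu_g=1000))
```

The scan statistic is then the maximum of this quantity over all candidate zones.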
19
Poisson Model
  • Under the Poisson model, points are generated by an inhomogeneous
    Poisson process. There is exactly one zone Z ⊂ G s.t.
    N(A) ~ Po(pµ(A ∩ Z) + qµ(A ∩ Z^c)) for all A.
  • Null hypothesis H0: p = q
  • Alternative hypothesis H1: p > q, Z ∈ 𝒵.
  • Under H0, N(A) ~ Po(pµ(A)) for all A.
  • - the parameter Z disappears under H0

20
Poisson Model (continued)
21
Poisson Model (continued)
22
Poisson Model (continued)
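
The formulas on the continuation slides were not transcribed. As a hedged sketch, the function below evaluates the Poisson-model log likelihood ratio for a single zone Z in the form given in Kulldorff (1997): the ratio compares the rate nZ/µ(Z) inside the zone and (nG - nZ)/(µ(G) - µ(Z)) outside with the common rate nG/µ(G) under H0. The function and argument names are illustrative, and the inputs are assumed to satisfy 0 < µ(Z) < µ(G).

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def poisson_log_lr(n_z, mu_z, n_g, mu_g):
    """log[L(Z) / L0] for one zone under the Poisson model.

    Returns 0 when the rate inside the zone does not exceed the rate outside.
    """
    inside = n_z / mu_z
    outside = (n_g - n_z) / (mu_g - mu_z)
    if inside <= outside:
        return 0.0
    return (
        _xlogy(n_z, inside)
        + _xlogy(n_g - n_z, outside)
        - _xlogy(n_g, n_g / mu_g)
    )

# Hypothetical counts: 10 of 12 cases fall in a zone holding 10% of the measure
print(poisson_log_lr(n_z=10, mu_z=100.0, n_g=12, mu_g=1000.0))
```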
23
Choice of Zones
  • How is the collection of zones 𝒵 selected? Possibilities:
  • (1) All circular subsets
  • (2) All circles centered at any of several foci on a
    fixed grid, with a possible upper limit on size
  • (3) Same as (2) but with a fixed size
  • (4) All rectangles of fixed size and shape
  • (5) If looking for space-time clusters, use
    cylinders scanning circular geographical areas
    over variable time intervals

24
Bernoulli vs. Poisson Model
  • Choice between a Bernoulli or Poisson model does
    not matter much if
  • n(G) << µ(G)
  • In other cases, use the model most appropriate
    for application

25
A Useful Result
  • An important result on the most likely cluster
    based on these models is given in the paper
    (Theorem 1). It states that as long as the points
    within the zone constituting the most likely
    cluster stay where they are, H0 will be rejected
    irrespective of the locations of the other points
    in G. If a cluster is located in Seattle, the
    locations of the points on the East Coast of the
    U.S. do not matter.

26
Computations and MC
  • To find the value of the test statistic λ, we need to
    calculate the LR maximized over the collection of zones
    in H1. This seems like a daunting task since the # of
    zones could be infinite.
  • The # of observed points is finite
  • For a fixed # of points in a zone, the likelihood
    decreases as µ(Z) increases

27
Computations (contd)
  • If the circle size increases for a fixed focus, we
    need to recalculate the likelihood whenever a new
    point enters the circle. For a finite # of points, the
    # of likelihood recalculations for each focus is finite.
  • The distribution of λ is difficult to derive. MC
    simulation is used to generate a histogram of λ. Under
    H0, replicate the data sets conditional on nG.
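
The computation described on these two slides can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: data are aggregated to point locations, zones are circles grown around each location and capped at a share of the total measure, the Poisson-model likelihood ratio is used, and the null replicates distribute the nG cases across locations with probability proportional to µ. All names and the example data are hypothetical.

```python
import math
import random

def log_lr(n_z, mu_z, n_g, mu_g):
    """Poisson-model log likelihood ratio for one zone (0 if no excess inside)."""
    if n_z * (mu_g - mu_z) <= (n_g - n_z) * mu_z:
        return 0.0
    out = n_g - n_z
    val = n_z * math.log(n_z / mu_z) - n_g * math.log(n_g / mu_g)
    if out > 0:
        val += out * math.log(out / (mu_g - mu_z))
    return val

def build_zones(coords, mu, max_share=0.5):
    """Grow a circle around each location; a new zone arises each time a point enters."""
    total = sum(mu)
    zones = []
    for c in coords:
        order = sorted(range(len(coords)), key=lambda j: math.dist(c, coords[j]))
        members, mass = [], 0.0
        for j in order:
            mass += mu[j]
            if mass > max_share * total:
                break
            members.append(j)
            zones.append(tuple(members))
    return zones

def max_log_lr(cases, mu, zones):
    """The scan statistic: log LR maximized over the finite collection of zones."""
    n_g, mu_g = sum(cases), sum(mu)
    return max(
        (log_lr(sum(cases[j] for j in z), sum(mu[j] for j in z), n_g, mu_g)
         for z in zones),
        default=0.0,
    )

def mc_p_value(cases, mu, zones, n_sim=999, seed=0):
    """Monte Carlo p-value, replicating data under H0 conditional on n_G."""
    rng = random.Random(seed)
    observed = max_log_lr(cases, mu, zones)
    n_g, k = sum(cases), len(mu)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * k
        for j in rng.choices(range(k), weights=mu, k=n_g):
            sim[j] += 1
        if max_log_lr(sim, mu, zones) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)

# Hypothetical 3 x 3 grid of locations with equal measure and one hot spot
coords = [(x, y) for x in range(3) for y in range(3)]
mu = [1000.0] * 9
cases = [30, 2, 2, 2, 2, 2, 2, 2, 2]
zones = build_zones(coords, mu)
print(mc_p_value(cases, mu, zones, n_sim=199))
```

Because zones only change when a point enters the growing circle, each focus contributes at most as many zones as there are locations, which keeps the maximization finite.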

28
Application of SSS to SIDS
  • Bernoulli and Poisson models are illustrated
    using the SIDS data from NC
  • For 100 counties in NC, total # of live births
    and # of SIDS cases for 1974-84.
  • Live births range from 567 to 52345
  • Locations of county seats are the coordinates.
    Measure is the # of live births in a county

29
Application to SIDS (continued)
  • Zones for scanning window are circles centered at
    a county coordinate point including at most half
    of the total population
  • Zones are circular only wrt the aggregated data.
    As circles around a county seat are drawn, other
    counties will either be completely part of a zone
    or else not at all, depending on whether its
    county seat is within the circle or not

30
Bernoulli model for SIDS
  • Bernoulli model is very natural. Each birth can
    correspond to at most one SIDS case. Table 1 summarizes
    the results of the analysis.
  • From Figure 1, the most likely cluster, A,
    consists of Bladen, Columbus, Hoke, Robeson, and
    Scotland counties.
  • Using a conservative test, a secondary cluster,
    B, consists of Halifax, Hertford, and Northampton
    counties.

31
Poisson model for SIDS
  • For a rare disease such as SIDS, the Poisson model
    gives a close approximation to the Bernoulli model.
    Results are reported in Table 1
  • Both models detect the same cluster
  • P-values for the primary cluster are the same for
    both models; p-values for the secondary
    cluster are very close

32
Application to SIDS (continued)
33
Two significant clusters based on SSS
34
SSS adjusted for Race
  • For SIDS one useful covariate is race
  • Race is related to SIDS through unobserved
    covariates such as quality of housing, access to
    health care
  • Overall incidence of SIDS for white children is
    1.512 per 1000 and for black children is 2.970
    per 1000.

35
SSS race-adjusted (continued)
  • Racial distribution differs widely among the
    counties in NC
  • This analysis leads to the same primary cluster
    (see Figure 2)
  • The previous secondary cluster disappears, but a
    new secondary cluster, C, emerges. Cluster C
    consists of several counties in the western
    part of the state
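
One way to picture the race adjustment is to replace the raw number of live births with a race-standardized expected number of cases, which then serves as the measure µ in the Poisson model. The sketch below assumes this is done by applying the two statewide incidence rates quoted above to each county's births by race; the slides do not spell out the exact computation, and the county names and birth counts are hypothetical.

```python
# Statewide SIDS incidence per live birth, as quoted on the slides
RATE_WHITE = 1.512 / 1000
RATE_BLACK = 2.970 / 1000

def expected_cases(white_births, black_births):
    """Race-standardized expected # of SIDS cases for one county."""
    return RATE_WHITE * white_births + RATE_BLACK * black_births

# Hypothetical counties: (white live births, black live births)
counties = {
    "County X": (8000, 2000),
    "County Y": (3000, 7000),
}

for name, (white, black) in counties.items():
    print(name, round(expected_cases(white, black), 3))
```

Two counties with the same total number of births can thus carry quite different measure, so an apparent excess in a county with a large share of black births is judged against a higher expected count.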

36
Application to SIDS (continued)
37
SSS to SIDS adjusted for race
38
A Bayesian alternative to SSS
  • Scott and Berger (2006): idea of Bayesian
    multiple testing.
  • Observe Xj ~ N(µj, σ²), j = 1, …, M
  • To determine which µj are nonzero, we have M
    (conditionally) independent tests, each
    testing
  • H0j: µj = 0 vs. H1j: µj ≠ 0
  • p0 = prior probability that µj is zero
  • Crucial point here: let the data estimate p0.
  • SB use the hierarchical model
  • 1. Xj | µj, σ², γj ~ N(γj µj, σ²),
    independently
  • 2. µj | τ² ~ i.i.d. N(0, τ²), γj | p0 ~
    i.i.d. Bern(1 - p0)
  • 3. (τ², σ²) ~ π(τ², σ²) ∝ (τ² + σ²)^(-2), p0 ~
    π(p0)
  • Several choices for π(p0): Uniform,
    Beta(α, 1)
  • SB computed the posterior probability that γj = 1.
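
To see what the posterior probability looks like, the sketch below treats the hyperparameters as known, which is a simplification: Scott and Berger place priors on them and integrate them out. With σ², τ², and p0 fixed, Xj is marginally N(0, σ² + τ²) when µj is nonzero and N(0, σ²) when µj = 0, so Bayes' rule gives the probability in closed form. The function names and the numeric values are illustrative assumptions.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_prob_nonzero(x, sigma2, tau2, p0):
    """P(mu_j is nonzero | X_j = x) for fixed sigma2, tau2, and p0."""
    signal = (1 - p0) * normal_pdf(x, sigma2 + tau2)  # mu_j ~ N(0, tau2)
    null = p0 * normal_pdf(x, sigma2)                 # mu_j = 0
    return signal / (signal + null)

# Hypothetical settings: sigma2 = 1, tau2 = 3, p0 = 0.5
for x in (0.0, 1.0, 2.0, 4.0):
    print(x, round(posterior_prob_nonzero(x, 1.0, 3.0, 0.5), 4))
```

An observation near zero lowers the probability below its prior value of 1 - p0, while a large |x| pushes it toward 1.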

39
Modification of SB Model
  • Assume Xj ~ N(µj, σ²), j = 1, …, M
  • To determine which µj are positive, we have M
    (conditionally) independent tests, each
    testing
  • H0j: µj = 0 vs. H1j: µj > 0
  • As before
  • 1. Xj | µj, σ², γj ~ N(γj µj, σ²),
    independently
  • 2. µj | µ(-j), ρ, τ² ~ N(ρ Σk qjk µk, τ²),
    CAR
  • γj | pj ~ ind. Bern(1 - pj)
  • 3. (τ², σ², ρ) ~ π(τ², σ², ρ) ∝ (τ² + σ²)^(-2)
  • 4. CAR model on logit(pj)
  • Compute the posterior probability of µj > 0.