Title: Some Ideas for Detecting Spurious Observations Based on Mixture Models
1 Some Ideas for Detecting Spurious Observations
Based on Mixture Models
- Jim Lynch
- NISS/SAMSI University of South Carolina
2 Some Ideas for Detecting Spurious Observations
- Work with Dave Dickey and Francisco Vera
- Very preliminary ideas
- Primarily motivated by Dave's American Airlines data and Proschan's (1963) paper on pooling to explain a decreasing failure rate and, to a lesser extent, M. J. Bayarri's talk on multiple testing
3 Outline
- 1. Introduction
- 2. Mixture Models
- 3. Some Ideas
- 4. Simulations
- 5. The American Airlines Data
4 Introduction: Some Motivation, AA Data (Largest Log Vol Removed)
- Some time series diagnostics suggest that the log volume ratio is an MA(1)
- Fit an MA(1) to the log volume ratio of the AA data
- Look at the residuals
5 Introduction
- Detecting spurious observations is an important area of research and has implications for anomaly detection (AD).
- The term spurious observation is used to distinguish it from an outlier, since outliers are usually extreme observations in the data while a spurious observation need not be.
- E.g., one could imagine that sophisticated intruders into computer systems would make sporadic intrusions and try to mimic normal behavior as closely as possible.
6 Introduction
- Goal
- To develop approaches to detect very transient spurious events, where the objectives are
- To detect when there are spurious events present and, if possible,
- To identify them
7 Introduction
- The Basic Data Analytic Model
- $X_1, \dots, X_n$ iid $f_p = (1-p) f_0 + p f_1$
- $f_0$ is the background model
- $f_1$ models the spurious behavior
- The likelihood is then $L(p) = \prod_{i=1}^{n} [(1-p) f_0(X_i) + p f_1(X_i)]$
8 Introduction
- A More Realistic Model
- Generate a configuration $C$ with probability $p(C)$
- Given $C$, for $i \in C$, the $X_i$ are iid $f_0$ and, for $i \in C^c$, the $X_i$ are iid $f_1$
- $C$ and $C^c$ model a spatial or temporal (e.g., a change-point) pattern
- You are pooling observations based on the configuration $C$
- The likelihood is then
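One plausible way to write the likelihood for this model is to average over configurations. This is only a sketch: the sum over all configurations and the product form are assumptions that follow from the description above, with $X_i \sim f_0$ for $i \in C$ and $X_i \sim f_1$ for $i \in C^c$.

```latex
L = \sum_{C} p(C) \prod_{i \in C} f_0(X_i) \prod_{i \in C^c} f_1(X_i)
```

If each index falls in $C^c$ independently with probability $p$, so that $p(C) = (1-p)^{|C|} p^{\,n-|C|}$, the sum factors into the basic mixture likelihood $\prod_i [(1-p) f_0(X_i) + p f_1(X_i)]$.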
9 Introduction: Some Approaches for Analyzing the MR (More Realistic) Model
- Envision that the data are the effects of pooling observations from $f_0$ and $f_1$.
- Treat the data as if they are from a mixture model and use a mixture model to determine the mle, $\hat p$, of the mixing proportion.
- Use $\hat p$ to test $H_0: p = 0$ versus $H_1: p > 0$. (Under $H_0$ and the mixture model, $n^{1/2} \hat p$ converges in distribution to $X$, where $X = 0$ with probability .5 and $X$ is distributed as $|Z|$, $Z \sim N(0, I_0^{-1})$, with probability .5; $I_0$ is the Fisher information for $p$ at $p = 0$.)
- If $H_0$ is rejected, see if the mixture model can give insights into the configuration $C_j$
- E.g., do an empirical Bayes analysis with prior $p(C_j) = (1-\hat p)^j \hat p^{\,n-j}$. Then
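A minimal sketch of the mixture-model step, assuming $f_0$ and $f_1$ are known. The choices $f_0 = N(0,1)$, $f_1 = N(1,1)$, the EM iteration, and all function names are illustrative assumptions, not the authors' procedure; the data are the ten simulated values from the n = 10 simulation slide. With the product prior $(1-\hat p)^j \hat p^{\,n-j}$, the posterior over configurations factors across observations, so a per-observation posterior probability is enough.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1); an illustrative choice for f0 and f1."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def posterior_spurious(x, p, f0, f1):
    """P(X_i came from f1 | X_i) under mixing proportion p."""
    d0, d1 = f0(x), f1(x)
    return p * d1 / ((1.0 - p) * d0 + p * d1)

def fit_mixing_proportion(x, f0, f1, n_iter=5000, p_init=0.5):
    """EM iterations for the mle of p in (1 - p) f0 + p f1 (f0, f1 known)."""
    p = p_init
    for _ in range(n_iter):
        # E-step gives memberships; M-step averages them.
        p = float(posterior_spurious(x, p, f0, f1).mean())
    return p

# Simulated values from the n = 10 slide (first five N(0,1), last five N(1,1)).
x = np.array([-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
              2.51937, 0.59549, 1.16238, 0.76632, 1.57752])
f0 = lambda t: normal_pdf(t, 0.0)   # assumed background density
f1 = lambda t: normal_pdf(t, 1.0)   # assumed spurious density

p_hat = fit_mixing_proportion(x, f0, f1)
post = posterior_spurious(x, p_hat, f0, f1)
print(p_hat)
print(np.round(post, 3))
```

At an interior maximum the average of the posterior probabilities equals $\hat p$, which gives a quick convergence check.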
10 Introduction: Another Approach
- Since $f_1$ models the spurious behavior, $p \approx 0$
- $p \approx 0$ suggests using the locally most powerful (LMP) test statistic for testing $H_0: p = 0$ versus $H_1: p > 0$ as the basis for discovering whether spurious observations are present
- The test statistic is essentially related to the gradient plot introduced by Lindsay (1983) to determine when a finite mixture mle is the global mixture mle in the mixed distribution model
11 Introduction: Another Approach
- The basis of this approach
- Use the gradient plot to determine if the one-point mixture mle is the global mixture mle
- When it isn't, this suggests that some spurious behavior is present
- One can then use the components in the mle mixed distribution to calculate assignment probabilities for the data, indicating which observations might be considered spurious
- The examples indicate that detecting the presence of spurious observations seems to be considerably simpler than identifying which ones they are
12 Introduction: Mining Data Graphs
- Data (Maguire, Pearson and Wynn, 1952): time between accidents with 10 or more fatalities
- At the right are the gradient plots for the 2- and 3-point mixture mles and the assignment function for the 3-point mle (mixing over exponentials)
- The 2- and 3-point mixture mles
- $\mu$ = 592.9, 166.2; $p$ = .175, .825
- $\mu$ = 595.5, 171.6, 29.1; $p$ = .171, .806, .023
13 Mixture Models
- $X_1, \dots, X_n$ iid $f_p = (1-p) f_0 + p f_1$
- $f_0$ is the background model
- $f_1$ models the spurious behavior
- Since the spurious observations are sporadic/transient, $p \approx 0$
- Denote the log likelihood by $\phi(f(X_1), \dots, f(X_n)) = \phi(f) = \log \prod_i f(X_i)$
- Denote the gradient function of $\phi$ by $\Phi(f_1; f_0) = \sum_i (f_1(X_i) - f_0(X_i)) / f_0(X_i)$
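14 Mixture Models: LMP
- Lemma: The locally most powerful test for testing $H_0: p = 0$ versus $H_1: p > 0$ is based on $\Phi(f_1; f_0)$.
- Proof: The LMP test for testing $H_0: p = p_0$ versus $H_1: p > p_0$ is based on the statistic $\frac{d}{dp} \log \prod_i f_p(X_i) \big|_{p = p_0} = \sum_i (f_1(X_i) - f_0(X_i)) / f_{p_0}(X_i)$
- For $p_0 = 0$ this reduces to $\sum_i (f_1(X_i) - f_0(X_i)) / f_0(X_i) = \Phi(f_1; f_0)$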
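The LMP statistic is simple to compute once $f_0$ and $f_1$ are fixed. The sketch below assumes $f_0 = N(0,1)$ and $f_1 = N(1,1)$ and uses the ten simulated values from the n = 10 simulation slide. The Monte Carlo calibration under $H_0$ (2000 replicates, seed 0) is an illustrative assumption for obtaining a reference distribution, not something taken from the slides.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1); an illustrative choice for f0 and f1."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def lmp_statistic(x, f0, f1):
    """Score at p = 0: sum_i (f1(X_i) - f0(X_i)) / f0(X_i)."""
    d0, d1 = f0(x), f1(x)
    return float(np.sum((d1 - d0) / d0))

x = np.array([-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
              2.51937, 0.59549, 1.16238, 0.76632, 1.57752])
f0 = lambda t: normal_pdf(t, 0.0)   # assumed background density
f1 = lambda t: normal_pdf(t, 1.0)   # assumed spurious density

stat = lmp_statistic(x, f0, f1)

# Reference distribution under H0: all observations drawn from f0 = N(0,1).
rng = np.random.default_rng(0)
null_stats = np.array([
    lmp_statistic(rng.standard_normal(x.size), f0, f1) for _ in range(2000)
])
p_value = float(np.mean(null_stats >= stat))
print(stat, p_value)
```

Large positive values of the statistic point toward $p > 0$; the simulated p-value is only as good as the assumed $f_0$.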
15 Mixture Model
- The Function $\Phi(f_1; f_0)$
- Plays a prominent role in the analysis of data from mixture models, where it is essentially the gradient function.
- Introduced by Lindsay (1983a,b and 1995) to determine when the mle for the mixing distribution with a finite number of points is the global mixture mle.
16 Mixture Model: Framework
- Family of densities $\{f_\theta : \theta \in \Theta\}$.
- $M$ is the set of probability measures on $\Theta$.
- The mixed distribution over the family with mixing distribution $Q$ is given by $f_Q(x) = \int f_\theta(x) \, dQ(\theta)$
- For $X_1, \dots, X_n$ iid from $f_Q$, the likelihood and log likelihood are given by
- $L(Q) = \prod_i f_Q(X_i)$ and $\phi(f_Q) = \log \prod_i f_Q(X_i)$
- $f_Q = (f_Q(X_1), \dots, f_Q(X_n))$.
17 Mixture Model: Framework
- The Directional Derivative
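In the notation of the framework slide, the directional derivative of the log likelihood at $Q$ toward a point mass at $\theta$ is usually written as below. This is a sketch following Lindsay (1983a); the symbol $\epsilon$ is notation assumed here.

```latex
D(\theta; Q)
  = \lim_{\epsilon \downarrow 0}
    \frac{\phi\big((1-\epsilon) f_Q + \epsilon f_\theta\big) - \phi(f_Q)}{\epsilon}
  = \sum_{i=1}^{n} \frac{f_\theta(X_i) - f_Q(X_i)}{f_Q(X_i)}
  = \Phi(f_\theta; f_Q)
```

Taking $f_Q = f_0$ and $f_\theta = f_1$ recovers the LMP statistic $\Phi(f_1; f_0)$.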
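18 Mixture Model: A Diagnostic
- Theorem 4.1 of Lindsay (1983a)
- A. The following three conditions are equivalent:
- $\hat Q$ maximizes $L(Q)$
- $\hat Q$ minimizes $\sup_\theta D(\theta; Q)$
- $\sup_\theta D(\theta; \hat Q) = 0$.
- B. Let $\hat f = f_{\hat Q}$. The point $(\hat f, \hat f)$ is a saddle point of $\Phi$, i.e.,
- $\Phi(f_{Q'}; \hat f) \le 0 = \Phi(\hat f; \hat f) \le \Phi(\hat f; f_{Q''})$ for $Q', Q'' \in M$.
- C. The support of $\hat Q$ is contained in the set of $\theta$ for which $D(\theta; \hat Q) = 0$.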
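The diagnostic in part A can be checked numerically by evaluating $D(\theta; Q)$ on a grid. The sketch below assumes the family $N(\theta, 1)$, for which the one-point mle is a point mass at the sample mean, and uses the ten simulated values from the n = 10 simulation slide together with the two-point fit reported on the following slide. The grid limits and function names are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1), the assumed component family."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def mixture_density(x, support, weights):
    """f_Q(x) for a discrete mixing distribution Q over N(theta, 1)."""
    x = np.asarray(x, dtype=float)
    return sum(w * normal_pdf(x, m) for m, w in zip(support, weights))

def gradient_function(theta_grid, x, support, weights):
    """D(theta; Q) = sum_i f_theta(X_i) / f_Q(X_i) - n on a grid of theta."""
    fq = mixture_density(x, support, weights)
    return np.array([np.sum(normal_pdf(x, t) / fq) - len(x) for t in theta_grid])

x = np.array([-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
              2.51937, 0.59549, 1.16238, 0.76632, 1.57752])
grid = np.linspace(-3.0, 4.0, 701)

# One-point candidate: point mass at the sample mean.
one_point = gradient_function(grid, x, [x.mean()], [1.0])
# Two-point candidate: values reported on the n = 10 results slide.
two_point = gradient_function(grid, x, [-0.487880, 0.929969],
                              [0.388813, 0.611187])

print("sup D, one-point:", one_point.max())
print("sup D, two-point:", two_point.max())
```

A supremum clearly above zero for the one-point candidate means it is not the global mixture mle, which is the signal the slides associate with spurious behavior.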
19 Mixture Model: The Assignment/Membership Function
20 Simulations: n = 10, 5 points N(0,1), 5 points N(1,1)
- source (0 = N(0,1), 1 = N(1,1)), x
- 0 -0.34964
- 0 -1.77582
- 0 -0.92900
- 0 0.58061
- 0 -0.36032
- 1 2.51937
- 1 0.59549
- 1 1.16238
- 1 0.76632
- 1 1.57752
21 Simulations: n = 10, 5 points N(0,1), 5 points N(1,1)
- $\mu$ $p$
- -.487880 .388813
- .929969 .611187
22 Simulations: The Assignment Function
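As a rough illustration of an assignment (membership) function, the sketch below computes, for each observation, the posterior probability of each fitted component. It assumes components of the form $N(\mu_k, 1)$ and plugs in the support points and weights reported for the n = 10 simulation; the function name is hypothetical.

```python
import numpy as np

def assignment_probabilities(x, support, weights):
    """Row i holds P(component k | X_i) for a fitted mixture of N(mu_k, 1)."""
    x = np.asarray(x, dtype=float)[:, None]
    mu = np.asarray(support, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    # The common factor 1/sqrt(2*pi) cancels in the ratio below.
    joint = w * np.exp(-0.5 * (x - mu) ** 2)
    return joint / joint.sum(axis=1, keepdims=True)

x = np.array([-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
              2.51937, 0.59549, 1.16238, 0.76632, 1.57752])
probs = assignment_probabilities(x, [-0.487880, 0.929969],
                                 [0.388813, 0.611187])
print(np.round(probs, 3))
```

Each row sums to one. Thresholding the second column is one way to flag observations as belonging to the upper component, though with components this close together the assignments are far from decisive.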
23 Simulations: n = 30, 25 points N(0,1), 5 points N(1,1)
- $\mu$ $p$
- -0.05537 0.867670
- 2.05801 0.132330
24 Simulations: n = 30, 25 points N(0,1), 5 points N(1,1)
25 Simulations: Another n = 30, 25 points N(0,1), 5 points N(1,1)
- $\mu$ $p$
- 0.78767 0.921009
- 3.30559 0.078991
26 Simulations: Another n = 30, 25 points N(0,1), 5 points N(1,1)
27 AA Data
- Francisco will discuss this and some other
simulations in a moment.
28 Closing Comments
- Is there an analogue (or alternative) of these ideas for the SCAN (or for the SCAN framework)?
- As an alternative, view the problem as having several (two) mechanisms creating observations:
- background
- infectious material is present.
- Just consider that the data are a pooling from all these sites. See if the data are a 2-component mixture. If they are, try to assign the sites to these components. (You might use thresholding of the assignment function to do this, or $\hat p$ in the LMP test statistic.)
- Instead of the assignment function, consider the following based on the LMP test statistic. Define $L_i = (f_1(X_i) - f_0(X_i)) / f_0(X_i)$. Let $L_{(1)} < L_{(2)} < \dots < L_{(n)}$ and let $j(i)$ denote the inverse rank, i.e., $L_{(i)} = L_{j(i)}$. For mixture or scanning purposes, consider the sets $C_i = \{j(n), \dots, j(n-i+1)\} = \{k : L_{(n-i+1)} \le L_k\}$. For mixtures with mle $\hat p$, assign $C_i$ to $f_1$ and $C_i^c$ to $f_0$, where $n \hat p \approx i$. For scanning purposes, look through the increasing sequence of sets $C_i$ for a spatial pattern to emerge.
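The ranking construction above can be sketched in a few lines. The code assumes $f_0 = N(0,1)$ and $f_1 = N(1,1)$ and reuses the ten simulated values from the n = 10 slide; `p_hat = 0.3` is a made-up value used only to show how $i \approx n \hat p$ would be chosen, and the function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1); an illustrative choice for f0 and f1."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def lmp_contributions(x, f0, f1):
    """L_i = (f1(X_i) - f0(X_i)) / f0(X_i) for each observation."""
    d0, d1 = f0(x), f1(x)
    return (d1 - d0) / d0

def candidate_set(L, i):
    """C_i: indices (0-based) of the i largest L_k, largest first."""
    order = np.argsort(L)[::-1]
    return order[:i].tolist()

x = np.array([-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
              2.51937, 0.59549, 1.16238, 0.76632, 1.57752])
L = lmp_contributions(x, lambda t: normal_pdf(t, 0.0),
                      lambda t: normal_pdf(t, 1.0))

p_hat = 0.3                          # hypothetical mle, for illustration only
i = int(round(len(x) * p_hat))       # choose i so that n * p_hat is about i
spurious = candidate_set(L, i)       # assigned to f1; the rest go to f0
print(spurious)

# For scanning, step through the nested sets C_1, C_2, ... and look for a pattern.
nested = [candidate_set(L, k) for k in range(1, len(x) + 1)]
```

Because the sets are nested, each step adds exactly one index, which makes it easy to watch for a spatial or temporal cluster as $i$ grows.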
29 REFERENCES
- Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, NY.
- Grego, J., Hsi, Hsiu-Li, and Lynch, J. D. (1990). A strategy for analyzing mixed and pooled exponentials. Applied Stochastic Models and Data Analysis, 6, 59-70.
- Lindsay, B. G. (1983a). The geometry of mixture likelihoods: a general theory. Ann. Statist., 11, 86-94.
- Lindsay, B. G. (1983b). The geometry of mixture likelihoods, Part II: the exponential family. Ann. Statist., 11, 783-792.
- Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications. NSF-CBMS lecture series, IMS/ASA.
- Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. (1952). The time interval between industrial accidents. Biometrika, 39, 168-180.
- Proschan, F. (1963). Theoretical explanation of observed decreasing failure rate. Technometrics, 5, 375-383.