Some Ideas for Detecting Spurious Observations Based on Mixture Models

1
Some Ideas for Detecting Spurious Observations
Based on Mixture Models

Jim Lynch
NISS/SAMSI and University of South Carolina

2
Some Ideas for Detecting Spurious Observations

Work with Dave Dickey and Francisco Vera
Very Preliminary Ideas
Primarily motivated by Dave's American Airlines
data and Proschan's (1963) paper on pooling to
explain a decreasing failure rate and, to a
lesser extent, M. J. Bayarri's talk on multiple
testing

3
Outline

1. Introduction
2. Mixture Models
3. Some Ideas
4. Simulations
5. The American Airlines Data

4
Introduction: Some Motivation, AA Data (Largest
Log Vol Removed)

Some Time Series Diagnostics Suggest That Log
Volume Ratio is an MA(1)
Fit an MA(1) to the log volume ratio of the AA data
Look At The Residuals
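
A minimal sketch of the fit-then-inspect step, assuming a zero-mean MA(1), $y_t = e_t + \theta e_{t-1}$, fitted by conditional least squares over a grid of $\theta$ values. The series is simulated as a stand-in because the AA data are not reproduced here; `ma1_residuals` and `fit_ma1` are illustrative names, not the authors' code.

```python
import random

def ma1_residuals(y, theta):
    """Residuals e_t = y_t - theta * e_{t-1} of a zero-mean MA(1), starting from e_0 = 0."""
    resid, prev = [], 0.0
    for value in y:
        e = value - theta * prev
        resid.append(e)
        prev = e
    return resid

def fit_ma1(y, grid_size=399):
    """Conditional least squares: choose theta in (-1, 1) minimizing the residual sum of squares."""
    mean = sum(y) / len(y)
    centered = [v - mean for v in y]
    best_theta, best_sse = 0.0, float("inf")
    for k in range(1, grid_size + 1):
        theta = -1.0 + 2.0 * k / (grid_size + 1)
        sse = sum(e * e for e in ma1_residuals(centered, theta))
        if sse < best_sse:
            best_theta, best_sse = theta, sse
    return best_theta, ma1_residuals(centered, best_theta)

# Hypothetical stand-in series (NOT the AA data): simulated MA(1) with theta = 0.6.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(501)]
series = [noise[t] + 0.6 * noise[t - 1] for t in range(1, 501)]

theta_hat, residuals = fit_ma1(series)
print("estimated theta:", round(theta_hat, 3))
```

The residuals returned by `fit_ma1` are what one would then plot or screen for observations that do not fit the background model.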

5
Introduction

Detecting spurious observations is an important
area of research and has implications for anomaly
detection (AD).
The term spurious observation is used to
distinguish it from an outlier, since outliers
are usually extreme observations in the data
while a spurious observation need not be.
E.g., one could imagine that sophisticated
intruders into computer systems would make
sporadic intrusions and try to mimic normal
behavior as closely as possible.

6
Introduction

Goal:
To develop approaches to detect very transient
spurious events, where the objectives are
1. To detect when spurious events are present
and, if possible,
2. To identify them

7
Introduction

The Basic Data Analytic Model
$X_1, \dots, X_n$ iid $f_p = (1-p) f_0 + p f_1$
$f_0$ is the background model
$f_1$ models the spurious behavior
The likelihood is then
$L(p) = \prod_{i=1}^{n} \left[ (1-p) f_0(X_i) + p f_1(X_i) \right]$
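
A small sketch of this mixture likelihood and of the mle of the mixing proportion, assuming both component densities are known. The unit-variance normal choices for f0 and f1, the sample, and the EM iteration are illustrative assumptions, not taken from the slides.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance normal density (an illustrative choice for f0 and f1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def mixture_loglik(p, data, f0, f1):
    """log L(p) = sum_i log[(1 - p) f0(x_i) + p f1(x_i)]."""
    return sum(math.log((1.0 - p) * f0(x) + p * f1(x)) for x in data)

def mle_mixing_proportion(data, f0, f1, p_start=0.5, iterations=500):
    """EM updates for p with f0 and f1 held fixed."""
    p = p_start
    for _ in range(iterations):
        # E-step: probability that each point came from f1.
        resp = [p * f1(x) / ((1.0 - p) * f0(x) + p * f1(x)) for x in data]
        # M-step: the new p is the average of those probabilities.
        p = sum(resp) / len(resp)
    return p

# Hypothetical sample and component densities, for illustration only.
sample = [-0.3, 0.1, 1.2, -1.1, 0.4, 2.3, 0.0, -0.6, 0.9, 1.7]
f0 = lambda x: normal_pdf(x, 0.0)   # assumed background model
f1 = lambda x: normal_pdf(x, 1.0)   # assumed spurious model

p_hat = mle_mixing_proportion(sample, f0, f1)
print("p_hat:", round(p_hat, 4), "log-likelihood:", round(mixture_loglik(p_hat, sample, f0, f1), 4))
```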

8
Introduction

A More Realistic Model
Generate a configuration C with probability p(C)
Given C, for $i \in C$, the $X_i$ are iid $f_0$ and, for
$i \in C^c$, the $X_i$ are iid $f_1$
C and Cc model a spatial or temporal (e.g., a
change-point) pattern
You are pooling observations based on the
configuration C
The likelihood is then

9
Introduction: Some Approaches for Analyzing the
MR (More Realistic) Model

Envision that the data are the result of pooling
observations from $f_0$ and $f_1$.
Treat the data as if they came from a mixture model
and use the mixture model to determine the mle,
$\hat p$, of the mixing proportion.
Use $\hat p$ to test $H_0: p = 0$ versus $H_1: p > 0$.
(Under $H_0$ and the mixture model, $n^{1/2}\hat p$
converges in distribution to $X$, where $X = 0$ with
probability .5 and $X$ is distributed as $|Z|$,
$Z \sim N(0, I_0^{-1})$, with probability .5.)
If $H_0$ is rejected, see if the mixture model can
give insights into the configuration $C_j$.
E.g., do an empirical Bayes analysis with prior
$p(C_j) = (1-\hat p)^{j}\,\hat p^{\,n-j}$. Then
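
As a sketch of where this prior leads, assume $C$ is the set of indices drawn from $f_0$ (so the prior weight of a configuration with $|C| = j$ is $(1-\hat p)^j \hat p^{\,n-j}$) and that the observations are independent given $C$. Bayes' rule then gives a posterior that factorizes over observations; the notation $\pi(C \mid X)$ and $w_i$ is introduced here for illustration.

```latex
\pi(C \mid X)
  = \frac{\prod_{i \in C} (1-\hat p)\, f_0(X_i) \;\prod_{i \in C^c} \hat p\, f_1(X_i)}
         {\prod_{i=1}^{n} \left[ (1-\hat p)\, f_0(X_i) + \hat p\, f_1(X_i) \right]}
  = \prod_{i \in C} (1 - w_i) \prod_{i \in C^c} w_i ,
\qquad
w_i = \frac{\hat p\, f_1(X_i)}{(1-\hat p)\, f_0(X_i) + \hat p\, f_1(X_i)} .
```

Under these assumptions each observation is flagged as spurious independently with probability $w_i$, which is the same quantity as the two-component assignment probability.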

10
Introduction: Another Approach

Since $f_1$ models the spurious behavior, $p \approx 0$.
$p \approx 0$ suggests using the locally most powerful
(LMP) test statistic for testing $H_0: p = 0$ versus
$H_1: p > 0$ as the basis for discovering whether
spurious observations are present.
The test statistic is essentially the gradient
plot introduced by Lindsay (1983) to determine
when a finite mixture mle is the global mixture
mle in the mixed distribution model.

11
Introduction: Another Approach

The basis of this approach:
Use the gradient plot to determine whether the
one-point mixture mle is the global mixture mle.
When it isn't, this suggests that some spurious
behavior is present.
One can then use the components in the mle mixed
distribution to calculate assignment
probabilities for the data, indicating which
observations might be considered spurious.
The examples indicate that detecting the presence
of spurious observations seems to be considerably
simpler than identifying which ones they are.

12
Introduction: Mining Data Graphs

Data (Maguire, Pearson and Wynn, 1952): time
between accidents with 10 or more fatalities.
At the right are the gradient plots for the 2- and
3-point mixture mles and the assignment function
for the 3-point mle (mixing over exponentials).
The 2- and 3-point mixture mles:
2-point: $\mu$ = 592.9, 166.2; $p$ = .175, .825
3-point: $\mu$ = 595.5, 171.6, 29.1; $p$ = .171, .806, .023

13
Mixture Models

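$X_1, \dots, X_n$ iid $f_p = (1-p) f_0 + p f_1$
$f_0$ is the background model
$f_1$ models the spurious behavior
Since the spurious observations are
sporadic/transient, $p \approx 0$
Denote the log likelihood by
$\phi(f(X_1), \dots, f(X_n)) = \phi(\mathbf{f}) = \log \prod_i f(X_i)$
Denote the gradient function of $\phi$ by
$\Phi(f_1; f_0) = \sum_{i=1}^{n} \frac{f_1(X_i) - f_0(X_i)}{f_0(X_i)}$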

14
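Mixture Models: LMP

Lemma: The locally most powerful test for testing
$H_0: p = 0$ versus $H_1: p > 0$ is based on $\Phi(f_1; f_0)$.
Proof: The LMP test for testing $H_0: p = p_0$ versus
$H_1: p > p_0$ is based on the statistic
$\left. \frac{d}{dp} \log L(p) \right|_{p = p_0} = \sum_{i=1}^{n} \frac{f_1(X_i) - f_0(X_i)}{(1-p_0) f_0(X_i) + p_0 f_1(X_i)}$
For $p_0 = 0$ this reduces to
$\sum_{i=1}^{n} \frac{f_1(X_i) - f_0(X_i)}{f_0(X_i)} = \Phi(f_1; f_0)$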
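
A hedged sketch of computing the LMP statistic, i.e. the sum over observations of (f1(x) - f0(x)) / f0(x), together with a Monte Carlo reference distribution under H0. The normal densities, the simulated sample, and the Monte Carlo p-value are assumptions made for illustration; the slides do not say how the statistic is calibrated.

```python
import math
import random

def normal_pdf(x, mean):
    """Unit-variance normal density (illustrative choice for f0 and f1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def lmp_statistic(data, f0, f1):
    """Sum over observations of (f1(x) - f0(x)) / f0(x)."""
    return sum((f1(x) - f0(x)) / f0(x) for x in data)

def monte_carlo_p_value(observed, n, f0, f1, sample_f0, draws=2000, seed=0):
    """Share of null samples (all n points from f0) with a statistic at least `observed`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        null_sample = [sample_f0(rng) for _ in range(n)]
        if lmp_statistic(null_sample, f0, f1) >= observed:
            hits += 1
    return (hits + 1) / (draws + 1)

f0 = lambda x: normal_pdf(x, 0.0)            # assumed background model
f1 = lambda x: normal_pdf(x, 1.0)            # assumed spurious model
sample_f0 = lambda rng: rng.gauss(0.0, 1.0)  # sampler matching f0

# Hypothetical sample: 25 background points and 5 spurious points.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(25)] + [rng.gauss(1.0, 1.0) for _ in range(5)]

stat = lmp_statistic(sample, f0, f1)
p_value = monte_carlo_p_value(stat, len(sample), f0, f1, sample_f0)
print("LMP statistic:", round(stat, 3), "Monte Carlo p-value:", round(p_value, 4))
```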

15
Mixture Model

The Function $\Phi(f_1; f_0)$
Plays a prominent role in the analysis of data
from mixture models, where it is essentially the
gradient function.
Introduced by Lindsay (1983a,b and 1995) to
determine when the mle for the mixing
distribution with a finite number of points is
the global mixture mle.

16
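Mixture Model: Framework

Family of densities $\{f_\theta : \theta \in \Theta\}$.
$M$ is the set of probability measures on $\Theta$.
The mixed distribution over the family with
mixing distribution $Q$ is given by
$f_Q(x) = \int f_\theta(x)\, dQ(\theta)$
For $X_1, \dots, X_n$ iid from $f_Q$, the likelihood and
log likelihood are given by
$L(Q) = \prod_i f_Q(X_i)$ and $\phi(\mathbf{f}_Q) = \log \prod_i f_Q(X_i)$,
where $\mathbf{f}_Q = (f_Q(X_1), \dots, f_Q(X_n))$.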

17
Mixture Model: Framework

The Directional Derivative
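
A sketch of the directional derivative in the form used by Lindsay (1983a), written for a family of densities $f_\theta$ and a mixing distribution $Q$ with mixed density $f_Q$. Here $\phi$ is the log likelihood and $\mathbf{f}_\theta$, $\mathbf{f}_Q$ are the vectors of density values at the data; the symbol $D(\theta; Q)$ is the gradient function referred to in the diagnostic theorem.

```latex
D(\theta; Q)
  = \lim_{\varepsilon \downarrow 0}
    \frac{\phi\big((1-\varepsilon)\,\mathbf{f}_Q + \varepsilon\,\mathbf{f}_\theta\big) - \phi(\mathbf{f}_Q)}{\varepsilon}
  = \sum_{i=1}^{n} \frac{f_\theta(X_i) - f_Q(X_i)}{f_Q(X_i)}
  = \sum_{i=1}^{n} \frac{f_\theta(X_i)}{f_Q(X_i)} - n .
```

It measures the rate at which the log likelihood changes when a small amount of mass is moved from $Q$ to a point mass at $\theta$.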

18
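Mixture Model: A Diagnostic

Theorem 4.1 of Lindsay (1983a)
A. The following three conditions are equivalent:
$\hat Q$ maximizes $L(Q)$;
$\hat Q$ minimizes $\sup_\theta D(\theta; Q)$;
$\sup_\theta D(\theta; \hat Q) = 0$.
B. Let $\hat{\mathbf{f}} = \mathbf{f}_{\hat Q}$. The point $(\hat{\mathbf{f}}, \hat{\mathbf{f}})$ is a saddle
point of $\Phi$, i.e.,
$\Phi(\mathbf{f}_{Q'}; \hat{\mathbf{f}}) \le 0 = \Phi(\hat{\mathbf{f}}; \hat{\mathbf{f}}) \le \Phi(\hat{\mathbf{f}}; \mathbf{f}_{Q''})$ for $Q'$,
$Q'' \in M$.
C. The support of $\hat Q$ is contained in the set of $\theta$
for which $D(\theta; \hat Q) = 0$.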
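
The diagnostic can be sketched numerically: evaluate the gradient function on a grid and check whether its maximum exceeds zero. The sketch assumes a unit-variance normal location kernel, uses the ten simulated points listed on the n = 10 simulation slide, and takes the one-point mixture at the sample mean (the one-point mle for this kernel) as the candidate. The function names and the grid are illustrative.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance normal density (assumed kernel f_theta)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def mixture_density(x, support, weights):
    """f_Q(x) for a discrete mixing distribution Q."""
    return sum(w * normal_pdf(x, t) for t, w in zip(support, weights))

def gradient(theta, data, support, weights):
    """D(theta; Q) = sum_i f_theta(x_i) / f_Q(x_i) - n."""
    ratios = (normal_pdf(x, theta) / mixture_density(x, support, weights) for x in data)
    return sum(ratios) - len(data)

def max_gradient(data, support, weights, grid):
    """Largest value of the gradient function over a grid of theta values."""
    return max(gradient(t, data, support, weights) for t in grid)

# The ten simulated points from the n = 10 example.
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]
grid = [-3.0 + 0.01 * k for k in range(701)]

mean = sum(data) / len(data)
one_point_max = max_gradient(data, [mean], [1.0], grid)
print("max gradient for the one-point fit:", round(one_point_max, 4))
```

A maximum above zero indicates that the one-point fit is not the global mixture mle; the same function can be applied to a fitted two- or three-point mixture by passing its support points and weights.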

19
Mixture Model: The Assignment/Membership Function
20
Simulations: n = 10; 5 points N(0,1), 5 points
N(1,1)

Mean   x
0      -0.34964
0      -1.77582
0      -0.92900
0       0.58061
0      -0.36032
1       2.51937
1       0.59549
1       1.16238
1       0.76632
1       1.57752

21
Simulations: n = 10; 5 points N(0,1), 5 points
N(1,1)

$\mu$           $p$
-0.487880   0.388813
 0.929969   0.611187

22
Simulations: The Assignment Function
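
A minimal sketch of an assignment function, taken here to be the posterior probability that an observation belongs to each mixture component. It uses the two-point fit reported for the n = 10 simulation and assumes unit-variance normal components; that kernel choice and the function names are assumptions, not stated on the slides.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance normal density (assumed component density)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def assignment_probabilities(x, support, weights):
    """P(component j | x) = p_j f_j(x) / sum_k p_k f_k(x)."""
    parts = [w * normal_pdf(x, t) for t, w in zip(support, weights)]
    total = sum(parts)
    return [part / total for part in parts]

# Two-point fit and data from the n = 10 simulation.
support = [-0.487880, 0.929969]
weights = [0.388813, 0.611187]
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

for x in data:
    low, high = assignment_probabilities(x, support, weights)
    print(f"x = {x:8.5f}  P(low component) = {low:.3f}  P(high component) = {high:.3f}")
```

Thresholding the second column gives one way of labelling observations, in the spirit of the assignment plots.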
23
Simulations: n = 30; 25 points N(0,1), 5 points
N(1,1)

$\mu$          $p$
-0.05537   0.867670
 2.05801   0.132330

24
Simulations: n = 30; 25 points N(0,1), 5 points
N(1,1)
25
Simulations: Another n = 30; 25 points N(0,1), 5
points N(1,1)

$\mu$         $p$
0.78767   0.921009
3.30559   0.078991

26
Simulations: Another n = 30; 25 points N(0,1), 5
points N(1,1)
27
AA Data

Francisco will discuss this and some other
simulations in a moment.

28
Closing Comments

Is there an analogue (or alternative) of these
ideas for the SCAN (or for the SCAN framework)?
As an alternative, view the problem as having
several (two) mechanisms creating observations:
background;
infectious material is present.
Just consider that the data are a pooling from
all these sites. See if the data are a
2-component mixture. If they are, try to assign
the sites to these components. (You might use a
thresholding of the assignment function to do
this, or $\hat p$ in the LMP test statistic.)
Instead of the assignment function, consider the
following, based on the LMP test statistic.
Define $L_i = (f_1(X_i) - f_0(X_i))/f_0(X_i)$. Let
$L_{(1)} \le L_{(2)} \le \dots \le L_{(n)}$ and let $j(i)$ denote the
inverse rank, i.e., $L_{(i)} = L_{j(i)}$. For mixture or
scanning purposes, consider the sets
$C_i = \{j(n), \dots, j(n-i+1)\} = \{k : L_{(n-i+1)} \le L_k\}$.
For mixtures with mle $\hat p$, assign $C_i$ to $f_1$ and
$C_i^c$ to $f_0$, where $n\hat p \approx i$. For scanning
purposes, look through the increasing sequence
of sets $C_i$ for a spatial pattern to emerge.
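
The ranking construction can be sketched as follows, assuming f0 = N(0,1) and f1 = N(1,1) and reusing the ten simulated points from the n = 10 example. The value of `p_hat` is a hypothetical placeholder, and the function names are illustrative.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance normal density (assumed form of f0 and f1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def lmp_scores(data, f0, f1):
    """L_i = (f1(x_i) - f0(x_i)) / f0(x_i) for each observation."""
    return [(f1(x) - f0(x)) / f0(x) for x in data]

def nested_sets(scores):
    """Increasing sets C_1, C_2, ...: C_i holds the indices of the i largest scores."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return [set(order[:i]) for i in range(1, len(order) + 1)]

data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]
f0 = lambda x: normal_pdf(x, 0.0)
f1 = lambda x: normal_pdf(x, 1.0)

scores = lmp_scores(data, f0, f1)
sets = nested_sets(scores)

p_hat = 0.5                                # hypothetical mixing-proportion estimate
i = max(1, round(len(data) * p_hat))       # choose i with n * p_hat close to i
flagged = sets[i - 1]                      # indices assigned to f1
print("indices assigned to f1:", sorted(flagged))
```

With these two densities $L_i = e^{X_i - 1/2} - 1$ is increasing in $X_i$, so $C_i$ simply collects the $i$ largest observations; for other choices of $f_0$ and $f_1$ the ordering need not follow the data values.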

29
REFERENCES

Ferguson, T. S. (1967). Mathematical Statistics: A
Decision Theoretic Approach. Academic Press,
NY.
Grego, J., Hsi, Hsiu-Li, and Lynch, J. D. (1990).
A strategy for analyzing mixed and pooled
exponentials. Applied Stochastic Models and Data
Analysis, 6, 59-70.
Lindsay, B. G. (1983a). The geometry of mixture
likelihoods: a general theory. Ann. Statist.,
11, 86-94.
Lindsay, B. G. (1983b). The geometry of mixture
likelihoods, Part II: the exponential family.
Ann. Statist., 11, 783-792.
Lindsay, B. G. (1995). Mixture Models: Theory,
Geometry and Applications. NSF-CBMS lecture series,
IMS/ASA.
Maguire, B. A., Pearson, E. S., and Wynn, A. H. A.
(1952). The time interval between industrial
accidents. Biometrika, 39, 168-180.
Proschan, F. (1963). Theoretical explanation of
decreasing failure rate. Technometrics, 5,
375-383.