Title: Hypotheses and Testing: Some Thoughts and Questions (using ROC as a crutch)
1 Hypotheses and Testing: Some Thoughts and Questions (using ROC as a crutch!)
- CSEP Meeting
- June 6-8, 2006
- Bernard Minster
2 ROC on Google... mostly Health Sciences
3 ROC DEFINITION
- The Receiver Operating Characteristic (ROC) curve of a prediction algorithm is defined by plotting the proportion of successfully predicted events in the chosen region, relative to the total number of events, as a function of the fraction of the space-time volume in which an alarm is declared.
- Or, alternatively, by plotting the proportion of missed events as a function of the space-time fraction.
4 ROC Curve
- The desired characteristic of the ROC curve is upward curvature above the main diagonal, indicating good prediction performance for a small fraction of the space-time volume.
- The ROC is constructed by reckoning the prediction success rate as a function of the fractional space-time volume V(alarm)/V(total).
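The construction above can be sketched in a few lines. This is only an illustration under stated assumptions: space-time is taken to be divided into discrete cells, each cell carries an alarm score (higher means the alarm is declared there first), and the names `roc_curve`, `alarm_score`, `event_count`, and `cell_volume` are hypothetical, not part of the talk.

```python
def roc_curve(alarm_score, event_count, cell_volume=None):
    """Return ROC points (V(alarm)/V(total), fraction of events predicted).

    Assumption: space-time is split into cells; alarms are switched on
    cell by cell in order of decreasing score. Ties are not treated specially.
    """
    n = len(alarm_score)
    if cell_volume is None:
        cell_volume = [1.0] * n  # equal-volume cells by default
    total_volume = sum(cell_volume)
    total_events = sum(event_count)
    if total_events == 0:
        raise ValueError("at least one event is needed to compute a success rate")

    # Cells with the highest score enter the alarm set first.
    order = sorted(range(n), key=lambda i: alarm_score[i], reverse=True)

    points = [(0.0, 0.0)]
    volume = 0.0
    hits = 0
    for i in order:
        volume += cell_volume[i]
        hits += event_count[i]
        points.append((volume / total_volume, hits / total_events))
    return points


if __name__ == "__main__":
    for tau, success in roc_curve([0.9, 0.1, 0.5, 0.3], [2, 0, 1, 1]):
        print(f"V(alarm)/V(total) = {tau:.2f}  success rate = {success:.2f}")
```

A curve that rises well above the diagonal at small volume fractions corresponds to the desired behavior described above.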
5 ROC Curve
[Figure: ROC curve. Horizontal axis: Space-Time Volume (0 to 100); vertical axis: Successful Predictions (0 to 100). Curves labeled "Desired" and "Uniform Probability".]
6 Methodology
- Use a decision criterion for prediction and test it against a null hypothesis, using the ROC as a tool.
- A possible null hypothesis involves comparison with random sampling according to a particular probability density.
- Obvious candidates include a uniform probability density, the intensity of seismicity, or an official risk map.
- This takes care of successful predictions and failures to predict, but we must also assess the occurrence of false alarms.
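One way to make the comparison with random sampling concrete is a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the method used in the talk: space-time is discretized into cells, `density` holds the reference weights per cell (uniform, seismicity intensity, or a risk map), and the function name `null_hit_pvalue` and its parameters are hypothetical.

```python
import random


def null_hit_pvalue(alarm_cells, density, n_events, observed_hits,
                    n_trials=10000, seed=0):
    """Estimate P(hits >= observed_hits) when events are drawn at random
    from the reference density instead of being predicted.

    Assumption: events are independent draws from `density` over cells.
    """
    rng = random.Random(seed)
    cells = list(range(len(density)))
    alarm = set(alarm_cells)
    at_least_as_good = 0
    for _ in range(n_trials):
        # Draw a synthetic catalog of n_events cells from the null density.
        sample = rng.choices(cells, weights=density, k=n_events)
        hits = sum(1 for c in sample if c in alarm)
        if hits >= observed_hits:
            at_least_as_good += 1
    return at_least_as_good / n_trials


if __name__ == "__main__":
    uniform = [1.0] * 20
    # Alarm covers 4 of 20 cells; 6 of 10 observed events fell inside it.
    print(null_hit_pvalue([0, 1, 2, 3], uniform, n_events=10, observed_hits=6))
```

A small returned value suggests that the observed number of successes would be unlikely under the chosen null density. Swapping `uniform` for a seismicity-based weight vector changes the null hypothesis without changing the code.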
7 Definition of space in space-time
- An acceptable definition of space-time is critical for computing a useful ROC curve.
- Whole Earth ... too generous! Any algorithm that predicts earthquakes in seismically active regions will look extremely good.
- Small selected subregion ... not enough. Most current samples will not fairly represent the performance of the algorithm.
- Etc., etc.
- For most algorithms this is ambiguous.
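8 Effect of Space selection
[Figure: ROC curves for different choices of space. Horizontal axis: Space-Time Volume (0 to 100); vertical axis: Successful Predictions (0 to 100). Curves labeled "Large Space", "Small Space", and "Uniform Probability".]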
9 Hypothesis Test
- Decision Rule R: A state of alarm is declared if the current likelihood L(t) exceeds a threshold L_o.
- Example Null Hypothesis:
- Given the decision rule R, the algorithm samples the space-time volume consistently with a uniform probability density.
- Alternate Hypothesis:
- Given the decision rule R, the algorithm samples the space-time volume following a non-uniform (very concentrated) probability density. That is, the algorithm is more efficient than throwing darts at the map at random times.
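As a rough illustration of the test above, the sketch below applies the threshold rule to a likelihood series and then asks how surprising the observed hits would be under the uniform null. It assumes equal-size time bins and independent events, so that each event falls inside the alarm with probability equal to the alarm fraction and the hit count is binomial. The names `declare_alarms` and `binomial_tail` are hypothetical.

```python
from math import comb


def declare_alarms(likelihood, threshold):
    """Decision rule R: alarm is on wherever L(t) exceeds the threshold."""
    return [value > threshold for value in likelihood]


def binomial_tail(n_events, hits, tau):
    """P(at least `hits` of `n_events` fall in the alarm) when each event
    independently lands in the alarm with probability tau (uniform null)."""
    return sum(
        comb(n_events, k) * tau ** k * (1.0 - tau) ** (n_events - k)
        for k in range(hits, n_events + 1)
    )


if __name__ == "__main__":
    likelihood = [0.1, 0.8, 0.3, 0.9]      # L(t) in four equal time bins
    events = [0, 1, 0, 1]                  # events observed per bin
    alarms = declare_alarms(likelihood, threshold=0.5)
    tau = sum(alarms) / len(alarms)        # fraction of time under alarm
    hits = sum(e for e, a in zip(events, alarms) if a)
    print(tau, hits, binomial_tail(sum(events), hits, tau))
```

A small tail probability is evidence against the dart-throwing null in favor of the alternate hypothesis. With only a handful of events, as in this toy input, the test has very little power.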
10 ROC, Earthquake Density Criterion
Typically, only a very small fraction of the space-time (S-T) volume in which an alarm is declared ends up containing an event. The rest measures false alarms!
[Figure labels: Events sampled; Correct positive; False alarms]
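The bookkeeping behind this statement can be sketched as follows, assuming space-time is divided into cells and each cell is flagged as alarm or no alarm, event or no event. The function name `alarm_outcomes` and its dictionary keys are hypothetical choices for illustration.

```python
def alarm_outcomes(alarm, event):
    """Classify cells given equal-length lists of booleans.

    alarm[i] is True if an alarm was declared in cell i;
    event[i] is True if cell i contained at least one event.
    """
    pairs = list(zip(alarm, event))
    correct = sum(1 for a, e in pairs if a and e)        # alarm and event
    false_alarms = sum(1 for a, e in pairs if a and not e)  # alarm, no event
    missed = sum(1 for a, e in pairs if e and not a)     # event, no alarm
    n_alarm = correct + false_alarms
    return {
        "correct": correct,
        "false_alarms": false_alarms,
        "missed": missed,
        # Share of the alarm volume that contained no event.
        "false_alarm_fraction": false_alarms / n_alarm if n_alarm else 0.0,
    }


if __name__ == "__main__":
    alarm = [True, True, True, False, False]
    event = [True, False, False, True, False]
    print(alarm_outcomes(alarm, event))
```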
11 Relative seismic intensity (RI) 1932-2000, M 3
12 A Minimax Problem
- Successful algorithms should maximize successes while minimizing the false alarm rate.
- This tradeoff determines where to operate on the ROC curve.
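The talk does not say how the tradeoff should be resolved. One literal reading of "minimax", offered here only as an assumption, is to pick the threshold whose worse error rate (miss rate or false alarm rate) is as small as possible. The function `pick_operating_point` and the candidate tuples are hypothetical.

```python
def pick_operating_point(candidates):
    """Choose the operating point that minimizes the larger of the two errors.

    candidates: list of (threshold, miss_rate, false_alarm_rate) tuples,
    one per point on the ROC curve. This criterion is an assumption made
    for illustration; other loss functions are equally defensible.
    """
    return min(candidates, key=lambda c: max(c[1], c[2]))


if __name__ == "__main__":
    candidates = [
        (0.2, 0.1, 0.9),  # low threshold: few misses, many false alarms
        (0.5, 0.3, 0.4),
        (0.8, 0.7, 0.1),  # high threshold: many misses, few false alarms
    ]
    print(pick_operating_point(candidates))
```

If misses and false alarms carry different costs, the two rates could be weighted before taking the maximum. The weights would be a policy choice rather than something the ROC curve itself provides.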
13 Another issue
- Typically, when using observed catalogs, we deal with small samples.
- Question: Do the results really represent faithfully the performance of the algorithm?
- What if we repeated the experiment on many planets?
- How do we assign error bars to the observed ROC, and use these in hypothesis testing?
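Since only one planet is available, a common stand-in for repeating the experiment is to resample the observed catalog. The sketch below uses a percentile bootstrap for the success rate at one fixed alarm setting. The bootstrap is a technique swapped in here for illustration and is not named in the talk; it also assumes the events are exchangeable, which clustered seismicity may violate. The name `bootstrap_hit_fraction` is hypothetical.

```python
import random


def bootstrap_hit_fraction(event_in_alarm, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the fraction of predicted events.

    event_in_alarm: list of booleans, one per observed event, True if the
    event fell inside the alarm volume.
    """
    rng = random.Random(seed)
    n = len(event_in_alarm)
    fractions = []
    for _ in range(n_boot):
        # Resample the catalog with replacement ("another planet").
        resample = [event_in_alarm[rng.randrange(n)] for _ in range(n)]
        fractions.append(sum(resample) / n)
    fractions.sort()
    lower = fractions[int((alpha / 2) * n_boot)]
    upper = fractions[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    catalog = [True] * 7 + [False] * 3   # 7 of 10 events predicted
    print(bootstrap_hit_fraction(catalog))
```

Repeating this at each alarm fraction would give a pointwise band around the observed ROC. With ten events the band is wide, which is the small-sample concern raised above.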
14 Comparison of 3 hypotheses
Does the sample accurately represent the performance of the algorithm? What are the error estimates? Use distribution-free confidence intervals for the ROC based on seismicity density, and use analytical formulae (hypergeometric distribution) for the uniform pdf.
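For the uniform case, the hypergeometric formula can be written down directly. The sketch below assumes a discretization in which space-time has `n_cells` equal cells, `n_alarm` of them are under alarm, and the events occupy `n_events` distinct cells placed uniformly at random. These names and the discretization are assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(n_cells, n_alarm, n_events, hits):
    """P(at least `hits` event cells lie in the alarm) under the uniform pdf.

    P(H = h) = C(n_alarm, h) * C(n_cells - n_alarm, n_events - h)
               / C(n_cells, n_events)
    """
    total = comb(n_cells, n_events)
    upper = min(n_alarm, n_events)
    favorable = sum(
        comb(n_alarm, h) * comb(n_cells - n_alarm, n_events - h)
        for h in range(hits, upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # 100 cells, 10 under alarm, 8 event cells, 4 of them inside the alarm.
    print(hypergeom_tail(100, 10, 8, 4))
```

The returned tail probability plays the role of a p-value against the uniform null, and evaluating it over all alarm sizes traces out analytical confidence limits for the diagonal of the ROC plot.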