Categorical Forecast Verification Issues - PowerPoint PPT Presentation

About This Presentation

Title:

Categorical Forecast Verification Issues

Description:

Categorical Forecast Verification Issues. Eric M. Kemp. 9/23/09. Topics ... Reviewing aspects of verification to select appropriate statistical scores. ...

Transcript and Presenter's Notes

Title: Categorical Forecast Verification Issues

1
Categorical Forecast Verification Issues

Eric M. Kemp

2
Topics

Evaluating aspects of an ARPS forecast (e.g.,
composite reflectivity) using a categorical
forecast approach.
Reviewing aspects of verification to select
appropriate statistical scores.
Investigating use of signal detection theory.

3
Categorical Forecasts

Yes/No forecasts of some type of category (>40
dBZ reflectivity, occurrence of precipitation,
tornado, etc.).
Categorical forecasts are matched with
observations of events in a contingency table.
For this work the contingency table is 2×2.

4
Contingency Table
5
Contingency Table

Useful value:
Event Frequency (EF) = (H + M)/N
(H = hits, M = misses, FA = false alarms,
CN = correct nulls, N = H + M + FA + CN)
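
The counts behind the table can be tallied directly from paired yes/no forecasts and observations. The sketch below is a minimal illustration, assuming the usual reading of the symbols (H = hits, M = misses, FA = false alarms, CN = correct nulls). The function name `contingency_counts` and the sample data are hypothetical and do not come from the presentation.

```python
def contingency_counts(forecasts, observations):
    """Tally a 2x2 contingency table from paired yes/no values."""
    h = m = fa = cn = 0
    for fcst, obs in zip(forecasts, observations):
        if fcst and obs:
            h += 1    # hit: forecast yes, observed yes
        elif obs:
            m += 1    # miss: forecast no, observed yes
        elif fcst:
            fa += 1   # false alarm: forecast yes, observed no
        else:
            cn += 1   # correct null: forecast no, observed no
    return h, m, fa, cn


# Hypothetical sample: 8 forecast/observation pairs.
forecasts = [True, True, False, False, True, False, False, False]
observations = [True, False, True, False, True, False, False, False]

h, m, fa, cn = contingency_counts(forecasts, observations)
n = h + m + fa + cn
event_frequency = (h + m) / n  # EF = (H + M)/N
print(h, m, fa, cn, event_frequency)
```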

6
Aspects of Verification

Consistency. Correspondence between judgments
and forecasts.
Quality. Correspondence between forecasts and
observations.
Value. Incremental benefits of forecasts to
users.
Murphy, Wea. Forecasting, 1993.

7
Aspects of Quality: Discrimination

Discrimination. Correspondence between
conditional mean forecast and conditioning
observation, averaged over all observations.
Murphy, Wea. Forecasting, 1993.

8
Aspects of Quality: Discrimination

Discrimination of Yes Events
Hit Rate (HR) = H/(H + M)
(Probability of Detection, Prefigurance)
Miss Rate (MR) = M/(H + M)
(Frequency of Misses)
Note: MR = 1 - HR.

9
Aspects of Quality: Discrimination

Discrimination of No Events
False Alarm Rate (FAR) = FA/(FA + CN)
(Different from False Alarm Ratio!!!)
(Probability of False Detection)
Correct Null Rate (CNR) = CN/(FA + CN)
(Probability of Null Detection)
Note: CNR = 1 - FAR.

10
Aspects of Quality: Discrimination

Overall Discrimination
Peirce Discrimination Score = HR - FAR.
(Hanssen-Kuipers Discriminant, True Skill
Score, others)
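
The discrimination measures on the last three slides all come from the same four counts. The following is a minimal sketch of those formulas; the function name `discrimination_scores` and the example counts are hypothetical.

```python
def discrimination_scores(h, m, fa, cn):
    """Return the discrimination measures for one 2x2 table."""
    hit_rate = h / (h + m)              # HR: share of observed events forecast as yes
    miss_rate = m / (h + m)             # MR = 1 - HR
    false_alarm_rate = fa / (fa + cn)   # FAR: share of non-events forecast as yes
    correct_null_rate = cn / (fa + cn)  # CNR = 1 - FAR
    return {
        "HR": hit_rate,
        "MR": miss_rate,
        "FAR": false_alarm_rate,
        "CNR": correct_null_rate,
        "Peirce": hit_rate - false_alarm_rate,  # HR - FAR
    }


# Hypothetical counts: 40 observed events, 160 non-events.
scores = discrimination_scores(h=30, m=10, fa=20, cn=140)
print(scores)
```

Note that FAR here is the False Alarm Rate, conditioned on observed non-events, not the False Alarm Ratio.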

11
Quality: What Should We Look At?

Discrimination scores appear to be more useful
(provided that the scores are consistent from
case to case).
Can be used to optimize a forecast system through
the use of signal detection theory (SDT).

12
Signal Detection Theory

A system (human, guinea pig, computer model,
etc.) responds to a stimulus by discriminating
(correctly or incorrectly) between signal and
noise. In the simplest case, there are two
possible stimuli ("noise" and "signal plus
noise") and two possible categorical responses.

13
Signal Detection Theory

After subjecting the system to a number of
trials, the categorical responses are matched
with the "noise" and "signal plus noise" stimuli
to construct a 2×2 contingency table, which is
then used to calculate HR and FAR.
Results vary with the decision criterion.

14
Signal Detection Theory

By changing the decision criterion for a
response, we can construct multiple contingency
tables and plot a curve of (FAR, HR) points based
on the tables. The curve describes the system's
discrimination ability and is called the Relative
Operating Characteristic (ROC) curve.
ROC curves can be used to compare multiple
systems and/or to select an optimal decision
criterion.

15
Signal Detection Theory

Idea: Use SDT and ROC curves to evaluate how
well ARPS forecasts significant reflectivity (>40
dBZ).
For our purposes, the response is the
categorical forecast of significant reflectivity
for a specific decision criterion, the signal
is the observed significant reflectivity, and the
stimulus is the ARPS reflectivity.

16
Signal Detection Theory

Adjust the decision criterion for model
reflectivity to be considered significant
(triggering a yes response/forecast). The criterion
for observations to be treated as signal is kept
constant. Lenient (strict) model criteria ->
larger (smaller) forecast fields -> larger (smaller)
HR and FAR scores.
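
The procedure can be sketched as a loop over model criteria with the observation criterion held at 40 dBZ. This is a minimal illustration only: the function name `roc_points`, the grid-point values, and the list of criteria are hypothetical, and it assumes the sample contains both observed events and non-events so that HR and FAR are defined.

```python
def roc_points(model_dbz, observed_dbz, model_criteria, obs_criterion=40.0):
    """Return (criterion, FAR, HR) for each model decision criterion."""
    # The observation criterion stays fixed; only the model criterion varies.
    obs_yes = [obs > obs_criterion for obs in observed_dbz]
    points = []
    for criterion in model_criteria:
        h = m = fa = cn = 0
        for model_value, observed in zip(model_dbz, obs_yes):
            forecast_yes = model_value > criterion
            if forecast_yes and observed:
                h += 1
            elif observed:
                m += 1
            elif forecast_yes:
                fa += 1
            else:
                cn += 1
        hit_rate = h / (h + m)             # HR = H/(H + M)
        false_alarm_rate = fa / (fa + cn)  # FAR = FA/(FA + CN)
        points.append((criterion, false_alarm_rate, hit_rate))
    return points


# Hypothetical reflectivity values (dBZ) at 8 grid points.
model_dbz = [10, 25, 35, 42, 48, 55, 30, 45]
observed_dbz = [5, 20, 45, 38, 50, 60, 15, 41]

points = roc_points(model_dbz, observed_dbz, model_criteria=[20, 30, 40, 50])
for criterion, far, hr in points:
    print(criterion, far, hr)
```

In this sample the most lenient criterion (20 dBZ) gives the largest HR and FAR, and both shrink as the criterion becomes stricter.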

20
Beyond SDT

Another idea: Keep the model criterion constant,
but adjust the criterion for the observation.
Calculate the Hit Ratio and Miss Ratio for each
criterion and plot. The result is a Relative
Operating Level (ROL) curve, which can be used to
compare the reliability of multiple systems.
Mason and Graham, Wea. Forecasting, 1999.
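
A ROL-style sweep mirrors the ROC sweep with the roles reversed. The sketch below assumes that the two quantities plotted are the reliability measures, Hit Ratio H/(H + FA) and Miss Ratio M/(M + CN); the transcript's abbreviations do not settle this, so treat it as an assumption. The function name `rol_points` and all data values are hypothetical, and the sample must contain both yes and no forecasts.

```python
def rol_points(model_dbz, observed_dbz, obs_criteria, model_criterion=40.0):
    """Return (criterion, miss ratio, hit ratio) for each observation criterion."""
    # The model criterion stays fixed; only the observation criterion varies.
    forecast_yes = [value > model_criterion for value in model_dbz]
    points = []
    for criterion in obs_criteria:
        h = m = fa = cn = 0
        for forecast, obs_value in zip(forecast_yes, observed_dbz):
            observed = obs_value > criterion
            if forecast and observed:
                h += 1
            elif observed:
                m += 1
            elif forecast:
                fa += 1
            else:
                cn += 1
        hit_ratio = h / (h + fa)   # conditioned on yes forecasts
        miss_ratio = m / (m + cn)  # conditioned on no forecasts
        points.append((criterion, miss_ratio, hit_ratio))
    return points


# Hypothetical reflectivity values (dBZ) at 8 grid points.
model_dbz = [10, 25, 35, 42, 48, 55, 30, 45]
observed_dbz = [5, 20, 45, 38, 50, 60, 15, 41]

rol = rol_points(model_dbz, observed_dbz, obs_criteria=[30, 40, 50])
for criterion, miss_ratio, hit_ratio in rol:
    print(criterion, miss_ratio, hit_ratio)
```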

21
Aspects of Quality: Bias

Bias. Correspondence between mean forecast and
mean observation.
Murphy, Wea. Forecasting, 1993

22
Aspects of Quality: Bias

Bias = (H + FA)/(H + M)
Can be shown to be equivalent to
Bias = (HR - FAR) + (FAR/EF)
As EF increases (decreases), Bias decreases
(increases).
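
The equivalence follows from substituting H = HR × EF × N and FA = FAR × (1 - EF) × N into the count form. A quick numerical check, using hypothetical counts and function names, is sketched here.

```python
import math


def bias_from_counts(h, m, fa, cn):
    """Bias = (H + FA)/(H + M)."""
    return (h + fa) / (h + m)


def bias_from_rates(hit_rate, false_alarm_rate, ef):
    """Bias = (HR - FAR) + (FAR/EF)."""
    return (hit_rate - false_alarm_rate) + false_alarm_rate / ef


# Hypothetical counts.
h, m, fa, cn = 30, 10, 20, 140
n = h + m + fa + cn
hit_rate = h / (h + m)
false_alarm_rate = fa / (fa + cn)
ef = (h + m) / n

bias_counts = bias_from_counts(h, m, fa, cn)
bias_rates = bias_from_rates(hit_rate, false_alarm_rate, ef)
print(bias_counts, bias_rates)

# Same HR and FAR, but events twice as frequent: the bias drops.
bias_more_events = bias_from_rates(hit_rate, false_alarm_rate, 2 * ef)
print(bias_more_events)
```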

23
Aspects of Quality: Reliability

Reliability. Correspondence between conditional
mean observation and conditioning forecast,
averaged over all forecasts.
Murphy, Wea. Forecasting, 1993.

24
Aspects of Quality: Reliability

Reliability of Yes Forecasts
Hit Ratio = H/(H + FA)
(Different from Hit Rate; also called Correct
Alarm Ratio, Postagreement, Frequency of Hits)
False Alarm Ratio = FA/(H + FA)
(Different from False Alarm Rate)
Note: False Alarm Ratio = 1 - Hit Ratio.

25
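Aspects of Quality: Reliability

Hit Ratio is equivalent to
Hit Ratio = HR / [(HR - FAR) + (FAR/EF)]
where HR and FAR on the right-hand side are the
Hit Rate and False Alarm Rate.
Or: Hit Ratio = HR/Bias.
As EF increases (decreases), the Hit Ratio
increases (decreases) and the False Alarm Ratio
decreases (increases).
If Bias = 1, Hit Ratio = Hit Rate and
False Alarm Ratio = Miss Rate.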
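
Both statements can be checked numerically. The sketch below uses hypothetical counts and a hypothetical helper name; the second table is chosen so that FA = M, which makes Bias equal to 1.

```python
import math


def hit_ratio_from_rates(hit_rate, false_alarm_rate, ef):
    """Hit Ratio = Hit Rate / Bias, with Bias = (HR - FAR) + (FAR/EF)."""
    bias = (hit_rate - false_alarm_rate) + false_alarm_rate / ef
    return hit_rate / bias


# Hypothetical table with Bias = 1.25.
h, m, fa, cn = 30, 10, 20, 140
ef = (h + m) / (h + m + fa + cn)
hit_ratio_counts = h / (h + fa)
hit_ratio_rates = hit_ratio_from_rates(h / (h + m), fa / (fa + cn), ef)
print(hit_ratio_counts, hit_ratio_rates)

# Hypothetical unbiased table (FA equals M, so Bias = 1).
h1, m1, fa1, cn1 = 30, 10, 10, 150
unbiased_hit_ratio = h1 / (h1 + fa1)
unbiased_hit_rate = h1 / (h1 + m1)
unbiased_false_alarm_ratio = fa1 / (h1 + fa1)
unbiased_miss_rate = m1 / (h1 + m1)
print(unbiased_hit_ratio, unbiased_hit_rate)
```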

26
Aspects of Quality: Reliability

Reliability of No Forecasts
Miss Ratio = M/(M + CN)
(Detection Failure Ratio; different from Miss Rate)
Correct Null Ratio = CN/(M + CN)
(Frequency of Correct Null Forecasts; different
from Correct Null Rate)
Note: Correct Null Ratio = 1 - Miss Ratio.

27
Aspects of Quality: Reliability

Miss Ratio is equivalent to
Miss Ratio = (1 - HR) / [(FAR - HR) + (1 - FAR)/EF]
where HR and FAR are the Hit Rate and
False Alarm Rate.
As EF increases (decreases), the Miss Ratio
increases (decreases) and the Correct Null Ratio
decreases (increases).
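
As a check on the rate form of the Miss Ratio, the sketch below compares it with the count form M/(M + CN) for one hypothetical table and then raises EF while holding the Hit Rate and False Alarm Rate fixed. The function name is hypothetical.

```python
import math


def miss_ratio_from_rates(hit_rate, false_alarm_rate, ef):
    """Miss Ratio = (1 - HR) / [(FAR - HR) + (1 - FAR)/EF]."""
    denominator = (false_alarm_rate - hit_rate) + (1 - false_alarm_rate) / ef
    return (1 - hit_rate) / denominator


# Hypothetical counts.
h, m, fa, cn = 30, 10, 20, 140
hit_rate = h / (h + m)
false_alarm_rate = fa / (fa + cn)
ef = (h + m) / (h + m + fa + cn)

miss_ratio_counts = m / (m + cn)
miss_ratio_rates = miss_ratio_from_rates(hit_rate, false_alarm_rate, ef)
print(miss_ratio_counts, miss_ratio_rates)

# Same rates, events twice as frequent: the Miss Ratio rises.
miss_ratio_more_events = miss_ratio_from_rates(hit_rate, false_alarm_rate, 2 * ef)
print(miss_ratio_more_events)
```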

28
Aspects of Quality: Accuracy and Skill

Accuracy. Average correspondence between
individual pairs of forecasts and observations.
Skill. Accuracy of forecasts of interest
relative to accuracy of forecasts produced by
standard of reference.
Murphy, Wea. Forecasting, 1993

29
Aspects of Quality: Accuracy and Skill

Accuracy
Proportion Correct (PC) = (H + CN)/N
Mean Square Error (MSR) = (M + FA)/N
Note: MSR = 1 - PC.

30
Aspects of Quality: Accuracy and Skill

PC is equivalent to
PC = (HR + FAR - 1)EF + (1 - FAR)
If (HR + FAR - 1) > 0: as EF increases, PC
increases.
If (HR + FAR - 1) < 0: as EF increases, PC
decreases.
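
The sign rule can be illustrated with a short numerical check. The counts and function name below are hypothetical; in this example HR + FAR - 1 is negative, so PC falls as EF rises.

```python
import math


def pc_from_rates(hit_rate, false_alarm_rate, ef):
    """PC = (HR + FAR - 1)EF + (1 - FAR)."""
    return (hit_rate + false_alarm_rate - 1) * ef + (1 - false_alarm_rate)


# Hypothetical counts.
h, m, fa, cn = 30, 10, 20, 140
n = h + m + fa + cn
hit_rate = h / (h + m)             # 0.75
false_alarm_rate = fa / (fa + cn)  # 0.125
ef = (h + m) / n                   # 0.2

pc_counts = (h + cn) / n
pc_rates = pc_from_rates(hit_rate, false_alarm_rate, ef)
print(pc_counts, pc_rates)

# HR + FAR - 1 = -0.125 < 0, so a larger EF lowers PC.
pc_more_events = pc_from_rates(hit_rate, false_alarm_rate, 0.5)
print(pc_more_events)
```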

31
Aspects of Quality: Accuracy and Skill

General form of Skill Score:
SS = (PCs - PCr)/(1 - PCr)
where PCs is the PC of a forecast system, and PCr
is the PC of a reference forecast system
(climatology, persistence, chance, etc.)

32
Aspects of Quality: Accuracy and Skill

Heidke Skill Score: PCr is generated using
random forecasts with the same bias as the
forecast system being evaluated.
HSS = 2(H×CN - M×FA) / [(H + M)(M + CN) + (H + FA)(FA + CN)]

33
Aspects of Quality: Accuracy and Skill

The Heidke Skill Score is equivalent to
HSS = [(2FAR - 2HR)(EF^2) + (2HR - 2FAR)EF] /
      [(2FAR - 2HR)(EF^2) + (HR - 3FAR + 1)EF + FAR]
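
Three routes to the Heidke Skill Score should agree: the count form, the rate form, and the general skill score with PCr taken as the proportion correct expected from random forecasts that share the system's marginal totals. The sketch below checks this for one hypothetical table; the function names are illustrative only.

```python
import math


def heidke_from_counts(h, m, fa, cn):
    """HSS = 2(H*CN - M*FA) / [(H+M)(M+CN) + (H+FA)(FA+CN)]."""
    return 2 * (h * cn - m * fa) / ((h + m) * (m + cn) + (h + fa) * (fa + cn))


def heidke_from_rates(hr, far, ef):
    """HSS written in terms of Hit Rate, False Alarm Rate, and EF."""
    numerator = (2 * far - 2 * hr) * ef**2 + (2 * hr - 2 * far) * ef
    denominator = (2 * far - 2 * hr) * ef**2 + (hr - 3 * far + 1) * ef + far
    return numerator / denominator


def heidke_from_skill_form(h, m, fa, cn):
    """SS = (PCs - PCr)/(1 - PCr) with a random-forecast reference."""
    n = h + m + fa + cn
    pc_system = (h + cn) / n
    # Expected proportion correct when forecasts are independent of observations.
    pc_random = ((h + m) * (h + fa) + (m + cn) * (fa + cn)) / n**2
    return (pc_system - pc_random) / (1 - pc_random)


# Hypothetical counts.
h, m, fa, cn = 30, 10, 20, 140
ef = (h + m) / (h + m + fa + cn)

hss_counts = heidke_from_counts(h, m, fa, cn)
hss_rates = heidke_from_rates(h / (h + m), fa / (fa + cn), ef)
hss_skill = heidke_from_skill_form(h, m, fa, cn)
print(hss_counts, hss_rates, hss_skill)
```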

34
Aspects of Quality: Accuracy and Skill

Appleman Skill Score: PCr is generated using
constant forecasts of the most frequently
observed event (the best unskilled predictor).
A = (H + CN - x)/(N - x),
x = MAX( (H + M), (FA + CN) )

35
Aspects of Quality: Accuracy and Skill

The Appleman Skill Score is equivalent to
A = HR + FAR - (FAR/EF)
if EF < 0.5.
A = (HR - 1)/(1 - EF) - HR - FAR + 2
if EF > 0.5.
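
The two branches can be checked against the count form with one rare-event table and one frequent-event table. The counts and function names below are hypothetical. At EF = 0.5 both branches reduce to HR - FAR, so the sketch sends that case to the second branch.

```python
import math


def appleman_from_counts(h, m, fa, cn):
    """A = (H + CN - x)/(N - x), x = max(H + M, FA + CN)."""
    n = h + m + fa + cn
    x = max(h + m, fa + cn)
    return (h + cn - x) / (n - x)


def appleman_from_rates(hr, far, ef):
    """Appleman score from Hit Rate, False Alarm Rate, and EF."""
    if ef < 0.5:
        return hr + far - far / ef
    return (hr - 1) / (1 - ef) - hr - far + 2


def rates(h, m, fa, cn):
    """Return (HR, FAR, EF) for a 2x2 table."""
    return h / (h + m), fa / (fa + cn), (h + m) / (h + m + fa + cn)


rare = (30, 10, 20, 140)      # hypothetical, EF = 0.2
frequent = (120, 40, 5, 35)   # hypothetical, EF = 0.8

a_rare_counts = appleman_from_counts(*rare)
a_rare_rates = appleman_from_rates(*rates(*rare))
a_frequent_counts = appleman_from_counts(*frequent)
a_frequent_rates = appleman_from_rates(*rates(*frequent))
print(a_rare_counts, a_rare_rates)
print(a_frequent_counts, a_frequent_rates)
```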

43
Quality: What Should We Look At?

Since accuracy, skill, reliability, and bias are
all sensitive to event frequency, it is more
difficult to use these types of scores to compare
two forecast systems. This is especially true if
there is a wide variability in the frequency of
events.
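
One way to see the point is to hold the Hit Rate and False Alarm Rate fixed and vary only the event frequency. In the sketch below, which uses hypothetical values and a hypothetical function name, the Peirce score stays the same while Bias, PC, and the Hit Ratio all move.

```python
import math


def scores_at(hit_rate, false_alarm_rate, ef):
    """Build table fractions from fixed rates and a chosen EF, then score them."""
    h = hit_rate * ef                       # hits as a fraction of N
    m = (1 - hit_rate) * ef                 # misses
    fa = false_alarm_rate * (1 - ef)        # false alarms
    cn = (1 - false_alarm_rate) * (1 - ef)  # correct nulls
    return {
        "Peirce": h / (h + m) - fa / (fa + cn),
        "Bias": (h + fa) / (h + m),
        "PC": h + cn,
        "HitRatio": h / (h + fa),
    }


# Same discrimination (HR = 0.75, FAR = 0.125) in a rare-event and a common-event case.
rare_case = scores_at(0.75, 0.125, ef=0.1)
common_case = scores_at(0.75, 0.125, ef=0.5)
for name in rare_case:
    print(name, rare_case[name], common_case[name])
```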

44
The End