Title: Categorical Forecast Verification Issues
1. Categorical Forecast Verification Issues
2. Topics
- Evaluating aspects of an ARPS forecast (e.g.,
composite reflectivity) using a categorical
forecast approach.
- Reviewing aspects of verification to select
appropriate statistical scores.
- Investigating the use of signal detection theory.
3. Categorical Forecasts
- Yes/no forecasts of some type of category (>40
dBZ reflectivity, occurrence of precipitation,
tornado, etc.).
- Categorical forecasts are matched with
observations of events in a contingency table.
- For this work, the contingency table is 2×2.
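4. Contingency Table
               Observed yes       Observed no
Forecast yes   Hit (H)            False alarm (FA)
Forecast no    Miss (M)           Correct null (CN)
N = H + M + FA + CN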
5. Contingency Table
- Useful value:
- Event Frequency (EF) = (H + M)/N
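A minimal sketch of how the table counts and EF might be tallied from paired yes/no forecasts and observations. It assumes H, M, FA, and CN stand for hits, misses, false alarms, and correct nulls (inferred from the score names on the later slides); the `Contingency` class, the function names, and the sample data are hypothetical.

```python
from typing import NamedTuple, Sequence


class Contingency(NamedTuple):
    """Counts for a 2x2 contingency table (hypothetical helper)."""
    H: int   # hits: forecast yes, observed yes
    M: int   # misses: forecast no, observed yes
    FA: int  # false alarms: forecast yes, observed no
    CN: int  # correct nulls: forecast no, observed no

    @property
    def N(self) -> int:
        """Total number of forecast/observation pairs."""
        return self.H + self.M + self.FA + self.CN


def build_table(forecast: Sequence[bool], observed: Sequence[bool]) -> Contingency:
    """Match each yes/no forecast with its yes/no observation."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    h = m = fa = cn = 0
    for f, o in zip(forecast, observed):
        if f and o:
            h += 1
        elif o:
            m += 1
        elif f:
            fa += 1
        else:
            cn += 1
    return Contingency(h, m, fa, cn)


def event_frequency(t: Contingency) -> float:
    """EF = (H + M)/N, the fraction of pairs in which the event was observed."""
    return (t.H + t.M) / t.N


# Made-up sample: True means "yes" (e.g., reflectivity above 40 dBZ).
forecast = [True, True, False, False, True, False, False, False]
observed = [True, False, True, False, True, False, False, False]
table = build_table(forecast, observed)
ef = event_frequency(table)
print(table, "N =", table.N, "EF =", ef)
```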
6. Aspects of Verification
- Consistency. Correspondence between judgments
and forecasts.
- Quality. Correspondence between forecasts and
observations.
- Value. Incremental benefits of forecasts to
users.
- Murphy, Wea. Forecasting, 1993.
7. Aspects of Quality: Discrimination
- Discrimination. Correspondence between
conditional mean forecast and conditioning
observation, averaged over all observations.
- Murphy, Wea. Forecasting, 1993.
8. Aspects of Quality: Discrimination
- Discrimination of Yes Events
- Hit Rate (HR) = H/(H + M)
- (Probability of Detection, Prefigurance)
- Miss Rate (MR) = M/(H + M)
- (Frequency of Misses)
- Note: MR = 1 - HR.
9. Aspects of Quality: Discrimination
- Discrimination of No Events
- False Alarm Rate (FAR) = FA/(FA + CN)
- (Different from False Alarm Ratio!!!)
- (Probability of False Detection)
- Correct Null Rate (CNR) = CN/(FA + CN)
- (Probability of Null Detection)
- Note: CNR = 1 - FAR.
10. Aspects of Quality: Discrimination
- Overall Discrimination
- Peirce Discrimination Score = HR - FAR.
- (Hanssen-Kuipers Discriminant, True Skill
Score, others)
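The discrimination scores on these slides can be sketched as a small helper. This is an illustration only: the function name and the counts are made up, and HR and FAR here are the Hit Rate and False Alarm Rate (not the ratios).

```python
def discrimination_scores(H: int, M: int, FA: int, CN: int) -> dict:
    """Discrimination scores, each conditioned on what was observed."""
    hit_rate = H / (H + M)              # HR: share of observed yes events forecast as yes
    miss_rate = M / (H + M)             # MR = 1 - HR
    false_alarm_rate = FA / (FA + CN)   # FAR: share of observed no events forecast as yes
    correct_null_rate = CN / (FA + CN)  # CNR = 1 - FAR
    return {
        "HR": hit_rate,
        "MR": miss_rate,
        "FAR": false_alarm_rate,
        "CNR": correct_null_rate,
        "Peirce": hit_rate - false_alarm_rate,  # overall discrimination
    }


# Made-up counts for illustration.
scores = discrimination_scores(H=30, M=10, FA=20, CN=140)
print(scores)
```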
11. Quality: What Should We Look At?
- Discrimination scores appear to be more useful
(provided that the scores are consistent from
case to case).
- Can be used to optimize a forecast system through
the use of signal detection theory (SDT).
12. Signal Detection Theory
- A system (human, guinea pig, computer model,
etc.) responds to a stimulus by discriminating
(correctly or incorrectly) between signal and
noise. In the simplest case, there are two
possible stimuli ("noise" and "signal plus
noise") and two possible categorical responses.
13. Signal Detection Theory
- After subjecting the system to a number of
trials, the categorical responses are matched
with the "noise" and "signal plus noise" stimuli
to construct a 2×2 contingency table, which is
then used to calculate HR and FAR.
- Results vary with the decision criterion.
14. Signal Detection Theory
- By changing the decision criterion for a
response, we can construct multiple contingency
tables and plot a curve of (HR, FAR) points based
on the tables. The curve, called the Relative
Operating Characteristic (ROC) curve, describes
the system's discrimination ability.
- ROC curves can be used to compare multiple
systems and/or to select an optimal decision
criterion.
15. Signal Detection Theory
- Idea: Use SDT and ROC curves to evaluate how well
ARPS forecasts significant reflectivity (>40
dBZ).
- For our purposes, the response is the
categorical forecast of significant reflectivity
for a specific decision criterion, the signal
is the observed significant reflectivity, and the
stimulus is the ARPS reflectivity.
16. Signal Detection Theory
- Adjust the decision criterion for model
reflectivity to be considered significant
(triggering a yes response/forecast). The criterion
for observations to be treated as signal is kept
constant.
- Lenient (strict) model criteria → larger (smaller)
forecast fields → larger (smaller) HR and FAR scores.
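The procedure above can be sketched as follows: hold the observation criterion fixed, sweep the model criterion, and collect one (FAR, HR) point per criterion. This is a toy illustration, not the code used for the ARPS evaluation; `roc_points`, its parameters, and the reflectivity values are hypothetical, and only the 40 dBZ observation threshold comes from the slides.

```python
from typing import List, Sequence, Tuple


def roc_points(
    model_dbz: Sequence[float],
    observed_dbz: Sequence[float],
    model_thresholds: Sequence[float],
    obs_threshold: float = 40.0,
) -> List[Tuple[float, float, float]]:
    """Return (model threshold, FAR, HR) for each model decision criterion."""
    # The criterion that defines the observed "signal" stays fixed.
    signal = [o > obs_threshold for o in observed_dbz]
    points = []
    for thr in model_thresholds:
        H = M = FA = CN = 0
        for m, s in zip(model_dbz, signal):
            yes = m > thr  # categorical forecast for this criterion
            if yes and s:
                H += 1
            elif s:
                M += 1
            elif yes:
                FA += 1
            else:
                CN += 1
        # Rates are undefined if a whole observed category is empty.
        hit_rate = H / (H + M) if (H + M) else float("nan")
        false_alarm_rate = FA / (FA + CN) if (FA + CN) else float("nan")
        points.append((thr, false_alarm_rate, hit_rate))
    return points


# Made-up paired grid-point values (dBZ).
model_dbz = [55, 48, 42, 38, 35, 30, 25, 20]
observed_dbz = [52, 45, 30, 44, 41, 20, 15, 10]
points = roc_points(model_dbz, observed_dbz, [30.0, 40.0, 50.0])
for thr, far, hr in points:
    print(f"model criterion > {thr:.0f} dBZ: FAR = {far:.2f}, HR = {hr:.2f}")
```

In this sample the most lenient criterion (30 dBZ) gives the largest HR and FAR, and the strictest (50 dBZ) gives the smallest, which is the behavior the slide describes.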
20. Beyond SDT
- Another idea: Keep the model criterion constant,
but adjust the criterion for the observation.
Calculate the HR and MR for each criterion and
plot. The result is a Relative Operating Level
(ROL) curve, which can be used to compare the
reliability of multiple systems.
- Mason and Graham, Wea. Forecasting, 1999.
21. Aspects of Quality: Bias
- Bias. Correspondence between mean forecast and
mean observation.
- Murphy, Wea. Forecasting, 1993.
22. Aspects of Quality: Bias
- Bias = (H + FA)/(H + M)
- Can be shown to be equivalent to
- Bias = (HR - FAR) + (FAR/EF)
- For fixed HR and FAR, as EF increases
(decreases), Bias decreases (increases).
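A quick numerical check of the Bias identity, with HR and FAR taken as the Hit Rate and False Alarm Rate. The counts and function names are made up for illustration.

```python
import math


def bias_from_counts(H: int, M: int, FA: int, CN: int) -> float:
    """Bias = (H + FA)/(H + M): forecast yes count over observed yes count."""
    return (H + FA) / (H + M)


def bias_from_rates(hit_rate: float, false_alarm_rate: float, ef: float) -> float:
    """Bias = (HR - FAR) + FAR/EF, written with the rates and event frequency."""
    return (hit_rate - false_alarm_rate) + false_alarm_rate / ef


# Made-up counts.
H, M, FA, CN = 30, 10, 20, 140
hit_rate = H / (H + M)              # 0.75
false_alarm_rate = FA / (FA + CN)   # 0.125
ef = (H + M) / (H + M + FA + CN)    # 0.2

print(bias_from_counts(H, M, FA, CN), bias_from_rates(hit_rate, false_alarm_rate, ef))

# Hold HR and FAR fixed and vary EF: Bias falls as EF rises.
sweep = [bias_from_rates(0.75, 0.125, e) for e in (0.1, 0.2, 0.4)]
print(sweep)
```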
23. Aspects of Quality: Reliability
- Reliability. Correspondence between conditional
mean observation and conditioning forecast,
averaged over all forecasts.
- Murphy, Wea. Forecasting, 1993.
24. Aspects of Quality: Reliability
- Reliability of Yes Forecasts
- Hit Ratio (HR) = H/(H + FA)
- (Correct Alarm Ratio, Postagreement,
Frequency of Hits)
- False Alarm Ratio (FAR) = FA/(H + FA)
- Note: FAR = 1 - HR (the ratios here, not the rates).
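25. Aspects of Quality: Reliability
- Hit Ratio is equivalent to
- Hit Ratio = HR / [(HR - FAR) + (FAR/EF)],
where HR and FAR on the right-hand side are the
Hit Rate and False Alarm Rate.
- Or Hit Ratio = HR/Bias.
- For fixed Hit Rate and False Alarm Rate, as EF
increases (decreases), the Hit Ratio increases
(decreases) and the False Alarm Ratio decreases
(increases).
- If Bias = 1, Hit Ratio = Hit Rate and False Alarm
Ratio = Miss Rate.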
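The relation between the Hit Ratio and the Hit Rate can be checked with a few made-up counts. The names below are spelled out in full because the slides use HR and FAR for both the rates and the ratios; the function names and numbers are hypothetical.

```python
import math


def hit_ratio_from_counts(H: int, FA: int) -> float:
    """Hit Ratio = H/(H + FA): share of yes forecasts that verified."""
    return H / (H + FA)


def hit_ratio_from_rates(hit_rate: float, false_alarm_rate: float, ef: float) -> float:
    """Hit Ratio = Hit Rate / Bias, with Bias = (HR - FAR) + FAR/EF."""
    bias = (hit_rate - false_alarm_rate) + false_alarm_rate / ef
    return hit_rate / bias


# Case 1: biased forecasts (Bias = 1.25).
H, M, FA, CN = 30, 10, 20, 140
ratio_counts = hit_ratio_from_counts(H, FA)
ratio_rates = hit_ratio_from_rates(H / (H + M), FA / (FA + CN), (H + M) / (H + M + FA + CN))
print(ratio_counts, ratio_rates)

# Case 2: unbiased forecasts (FA equals M, so Bias = 1).
H2, M2, FA2, CN2 = 30, 10, 10, 150
unbiased_hit_ratio = hit_ratio_from_counts(H2, FA2)
unbiased_hit_rate = H2 / (H2 + M2)
unbiased_false_alarm_ratio = FA2 / (H2 + FA2)
unbiased_miss_rate = M2 / (H2 + M2)
print(unbiased_hit_ratio, unbiased_hit_rate)
```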
26. Aspects of Quality: Reliability
- Reliability of No Forecasts
- Miss Ratio (MR) = M/(M + CN)
- (Detection Failure Ratio)
- Correct Null Ratio (CNR) = CN/(M + CN)
- (Frequency of Correct Null Forecasts)
- Note: CNR = 1 - MR (the ratios here, not the rates).
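27. Aspects of Quality: Reliability
- Miss Ratio is equivalent to
- Miss Ratio = (1 - HR) / [(FAR - HR) + (1 - FAR)/EF],
where HR and FAR on the right-hand side are the
Hit Rate and False Alarm Rate.
- For fixed Hit Rate and False Alarm Rate, as EF
increases (decreases), the Miss Ratio increases
(decreases) and the Correct Null Ratio decreases
(increases).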
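A short sketch of how the Miss Ratio responds to event frequency when the Hit Rate and False Alarm Rate are held fixed. The function name and the values 0.75 and 0.125 are made up for illustration.

```python
import math


def miss_ratio_from_rates(hit_rate: float, false_alarm_rate: float, ef: float) -> float:
    """Miss Ratio = (1 - HR) / [(FAR - HR) + (1 - FAR)/EF], using the rates."""
    return (1 - hit_rate) / ((false_alarm_rate - hit_rate) + (1 - false_alarm_rate) / ef)


# Cross-check against counts: H=30, M=10, FA=20, CN=140 gives EF = 0.2.
miss_ratio_counts = 10 / (10 + 140)
miss_ratio_rates = miss_ratio_from_rates(0.75, 0.125, 0.2)
print(miss_ratio_counts, miss_ratio_rates)

# Same rates, different event frequencies.
efs = (0.1, 0.2, 0.5)
miss_ratios = [miss_ratio_from_rates(0.75, 0.125, e) for e in efs]
correct_null_ratios = [1 - mr for mr in miss_ratios]
for e, mr, cnr in zip(efs, miss_ratios, correct_null_ratios):
    print(f"EF = {e:.1f}: Miss Ratio = {mr:.3f}, Correct Null Ratio = {cnr:.3f}")
```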
28. Aspects of Quality: Accuracy and Skill
- Accuracy. Average correspondence between
individual pairs of forecasts and observations.
- Skill. Accuracy of forecasts of interest
relative to accuracy of forecasts produced by
a standard of reference.
- Murphy, Wea. Forecasting, 1993.
29. Aspects of Quality: Accuracy and Skill
- Accuracy
- Proportion Correct (PC) = (H + CN)/N
- Mean Square Error (MSR) = (M + FA)/N
- Note: MSR = 1 - PC.
30. Aspects of Quality: Accuracy and Skill
- PC is equivalent to
- PC = (HR + FAR - 1)EF + (1 - FAR)
- If (HR + FAR - 1) > 0: as EF increases, PC
increases.
- If (HR + FAR - 1) < 0: as EF increases, PC
decreases.
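The PC identity and its two regimes can be illustrated numerically. HR and FAR are the Hit Rate and False Alarm Rate; the function name and the sample rates are assumptions made for this sketch.

```python
import math


def pc_from_rates(hit_rate: float, false_alarm_rate: float, ef: float) -> float:
    """PC = (HR + FAR - 1)EF + (1 - FAR): linear in EF for fixed rates."""
    return (hit_rate + false_alarm_rate - 1) * ef + (1 - false_alarm_rate)


# Cross-check against counts: H=30, M=10, FA=20, CN=140 (N = 200, EF = 0.2).
pc_counts = (30 + 140) / 200
pc_rates = pc_from_rates(0.75, 0.125, 0.2)
print(pc_counts, pc_rates)

efs = (0.1, 0.3, 0.5)
# HR + FAR - 1 = -0.125 < 0, so PC falls as EF rises.
falling = [pc_from_rates(0.75, 0.125, e) for e in efs]
# HR + FAR - 1 = 0.2 > 0, so PC rises as EF rises.
rising = [pc_from_rates(0.9, 0.3, e) for e in efs]
print(falling, rising)
```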
31. Aspects of Quality: Accuracy and Skill
- General form of Skill Score:
- SS = (PCs - PCr)/(1 - PCr)
- Where PCs is the PC of a forecast system, and PCr
is the PC of a reference forecast system
(climatology, persistence, chance, etc.).
32. Aspects of Quality: Accuracy and Skill
- Heidke Skill Score: PCr is generated using
random forecasts with the same bias as the
forecast system being evaluated.
- HSS = 2(H × CN - M × FA)
/ [(H + M)(M + CN) + (H + FA)(FA + CN)]
33. Aspects of Quality: Accuracy and Skill
- The Heidke Skill Score is equivalent to
- HSS = [(2FAR - 2HR)EF^2 + (2HR - 2FAR)EF]
/ [(2FAR - 2HR)EF^2 + (HR - 3FAR + 1)EF + FAR]
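As a sanity check, the two forms of the Heidke Skill Score can be compared on made-up counts. HR and FAR are the Hit Rate and False Alarm Rate; the function names and numbers are hypothetical.

```python
import math


def hss_from_counts(H: int, M: int, FA: int, CN: int) -> float:
    """HSS = 2(H*CN - M*FA) / [(H + M)(M + CN) + (H + FA)(FA + CN)]."""
    return 2 * (H * CN - M * FA) / ((H + M) * (M + CN) + (H + FA) * (FA + CN))


def hss_from_rates(hr: float, far: float, ef: float) -> float:
    """The same score written with Hit Rate, False Alarm Rate, and EF."""
    numerator = (2 * far - 2 * hr) * ef**2 + (2 * hr - 2 * far) * ef
    denominator = (2 * far - 2 * hr) * ef**2 + (hr - 3 * far + 1) * ef + far
    return numerator / denominator


# Made-up counts.
H, M, FA, CN = 30, 10, 20, 140
N = H + M + FA + CN
hss_counts = hss_from_counts(H, M, FA, CN)
hss_rates = hss_from_rates(H / (H + M), FA / (FA + CN), (H + M) / N)
print(hss_counts, hss_rates)
```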
34. Aspects of Quality: Accuracy and Skill
- Appleman Skill Score: PCr is generated using
constant forecasts of the most frequently
observed event (the best unskilled predictor).
- A = (H + CN - x)/(N - x),
- x = MAX( (H + M), (FA + CN) )
35. Aspects of Quality: Accuracy and Skill
- The Appleman Skill Score is equivalent to
- A = HR + FAR - (FAR/EF)
- if EF < 0.5.
- A = (HR - 1)/(1 - EF) - HR - FAR + 2
- if EF > 0.5.
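The two branches of the Appleman Skill Score can be checked against the count form, one case on each side of EF = 0.5. HR and FAR are the Hit Rate and False Alarm Rate; the function names and counts are made up. At EF = 0.5 both branches reduce to HR - FAR, so the sketch arbitrarily assigns that case to the first branch.

```python
import math


def appleman_from_counts(H: int, M: int, FA: int, CN: int) -> float:
    """A = (H + CN - x)/(N - x), with x the count of the more frequent observed category."""
    N = H + M + FA + CN
    x = max(H + M, FA + CN)
    return (H + CN - x) / (N - x)


def appleman_from_rates(hr: float, far: float, ef: float) -> float:
    """The same score written with Hit Rate, False Alarm Rate, and EF."""
    if ef <= 0.5:
        return hr + far - far / ef
    return (hr - 1) / (1 - ef) - hr - far + 2


def rates(H: int, M: int, FA: int, CN: int):
    """Return (Hit Rate, False Alarm Rate, EF) for a set of counts."""
    return H / (H + M), FA / (FA + CN), (H + M) / (H + M + FA + CN)


rare = (30, 10, 20, 140)    # EF = 0.2: "no" is the more frequent observation
common = (120, 40, 5, 35)   # EF = 0.8: "yes" is the more frequent observation

a_rare_counts = appleman_from_counts(*rare)
a_rare_rates = appleman_from_rates(*rates(*rare))
a_common_counts = appleman_from_counts(*common)
a_common_rates = appleman_from_rates(*rates(*common))
print(a_rare_counts, a_rare_rates, a_common_counts, a_common_rates)
```

A negative value, as in the second case, means the forecasts verified less often than a constant forecast of the more frequent event would have.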
43. Quality: What Should We Look At?
- Since accuracy, skill, reliability, and bias are
all sensitive to event frequency, it is more
difficult to use these types of scores to compare
two forecast systems. This is especially true if
there is a wide variability in the frequency of
events.
44. The End
- Comments?
- Ideas?
- Constructive Criticism?