1
Categorical Forecast Verification Issues
  • Eric M. Kemp
  • 9/23/09

2
Topics
  • Evaluating aspects of an ARPS forecast (e.g.,
    composite reflectivity) using a categorical
    forecast approach.
  • Reviewing aspects of verification to select
    appropriate statistical scores.
  • Investigating use of signal detection theory.

3
Categorical Forecasts
  • Yes/No forecasts of some type of category (>40
    dBZ reflectivity, occurrence of precipitation,
    tornado, etc.)
  • Categorical forecasts are matched with
    observations of events in a contingency table.
  • For this work, the contingency table is 2×2.

4
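Contingency Table
                   Observed Yes        Observed No
  Forecast Yes     Hits (H)            False Alarms (FA)
  Forecast No      Misses (M)          Correct Nulls (CN)
  • N = H + M + FA + CN (total number of forecast/observation pairs)
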
5
Contingency Table
  • Useful value:
  • Event Frequency: $\mathrm{EF} = (H + M)/N$
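A minimal sketch of how the table and EF might be tabulated from paired yes/no forecasts and observations. It assumes H, M, FA, and CN denote hits, misses, false alarms, and correct nulls; the names `Contingency`, `tabulate`, and `event_frequency` are illustrative and do not come from the presentation.

```python
from typing import Iterable, NamedTuple


class Contingency(NamedTuple):
    """Counts for a 2x2 contingency table (illustrative container)."""
    H: int   # forecast yes, observed yes (hits)
    M: int   # forecast no, observed yes (misses)
    FA: int  # forecast yes, observed no (false alarms)
    CN: int  # forecast no, observed no (correct nulls)

    @property
    def N(self) -> int:
        """Total number of forecast/observation pairs."""
        return self.H + self.M + self.FA + self.CN


def tabulate(forecast: Iterable[bool], observed: Iterable[bool]) -> Contingency:
    """Count the four outcomes from paired yes/no forecasts and observations."""
    h = m = fa = cn = 0
    for f, o in zip(forecast, observed):
        if f and o:
            h += 1
        elif o:
            m += 1
        elif f:
            fa += 1
        else:
            cn += 1
    return Contingency(h, m, fa, cn)


def event_frequency(t: Contingency) -> float:
    """EF = (H + M) / N."""
    return (t.H + t.M) / t.N


# Made-up example: five forecast/observation pairs.
table = tabulate([True, True, False, False, True],
                 [True, False, True, False, True])
print(table, event_frequency(table))
```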

6
Aspects of Verification
  • Consistency. Correspondence between judgments
    and forecasts.
  • Quality. Correspondence between forecasts and
    observations.
  • Value. Incremental benefits of forecasts to
    users.
  • Murphy, Wea. Forecasting, 1993.

7
Aspects of Quality: Discrimination
  • Discrimination. Correspondence between
    conditional mean forecast and conditioning
    observation, averaged over all observations.
  • Murphy, Wea. Forecasting, 1993.

8
Aspects of Quality: Discrimination
  • Discrimination of Yes Events
  • Hit Rate: $\mathrm{HR} = H/(H + M)$
  • (Probability of Detection, Prefigurance)
  • Miss Rate: $\mathrm{MR} = M/(H + M)$
  • (Frequency of Misses)
  • Note: $\mathrm{MR} = 1 - \mathrm{HR}$.

9
Aspects of Quality: Discrimination
  • Discrimination of No Events
  • False Alarm Rate: $\mathrm{FAR} = \mathrm{FA}/(\mathrm{FA} + \mathrm{CN})$
  • (Different from False Alarm Ratio!!!)
  • (Probability of False Detection)
  • Correct Null Rate: $\mathrm{CNR} = \mathrm{CN}/(\mathrm{FA} + \mathrm{CN})$
  • (Probability of Null Detection)
  • Note: $\mathrm{CNR} = 1 - \mathrm{FAR}$.

10
Aspects of Quality: Discrimination
  • Overall Discrimination
  • Peirce Discrimination Score $= \mathrm{HR} - \mathrm{FAR}$.
  • (Hanssen-Kuipers Discriminant, True Skill
    Score, others)
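The discrimination scores can be sketched in a few lines. This assumes H, M, FA, and CN are the hit, miss, false-alarm, and correct-null counts of the 2×2 table; the function name and dictionary keys are illustrative.

```python
def discrimination_scores(H: int, M: int, FA: int, CN: int) -> dict:
    """Discrimination scores from the four contingency-table counts."""
    hit_rate = H / (H + M)              # HR, conditioned on observed yes
    false_alarm_rate = FA / (FA + CN)   # FAR, conditioned on observed no
    return {
        "hit_rate": hit_rate,
        "miss_rate": 1 - hit_rate,                  # MR = 1 - HR
        "false_alarm_rate": false_alarm_rate,
        "correct_null_rate": 1 - false_alarm_rate,  # CNR = 1 - FAR
        "peirce": hit_rate - false_alarm_rate,      # HR - FAR
    }


# Made-up counts, for illustration only.
print(discrimination_scores(H=30, M=10, FA=20, CN=140))
```

Each rate is conditioned on the observation, so none of these scores depends on how often the event occurs.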

11
Quality: What Should We Look At?
  • Discrimination scores appear to be more useful
    (provided that the scores are consistent from
    case to case).
  • Can be used to optimize a forecast system through
    the use of signal detection theory (SDT).

12
Signal Detection Theory
  • A system (human, guinea pig, computer model,
    etc.) responds to a stimulus by discriminating
    (correctly or incorrectly) between signal and
    noise. In the simplest case, there are two
    possible stimuli (noise, and signal plus
    noise) and two possible categorical responses.

13
Signal Detection Theory
  • After subjecting the system to a number of
    trials, the categorical responses are matched
    with the noise and signal plus noise stimuli
    to construct a 2×2 contingency table, which is
    then used to calculate HR and FAR.
  • Results vary with decision criterion.

14
Signal Detection Theory
  • By changing the decision criterion for a
    response, we can construct multiple contingency
    tables and plot a curve of (HR, FAR) points based
    on the tables. The curve, called the Relative
    Operating Characteristic (ROC) curve, describes
    the system's discrimination ability.
  • ROC curves can be used to compare multiple
    systems and/or to select an optimal decision
    criterion.

15
Signal Detection Theory
  • Idea: use SDT and ROC curves to evaluate how
    ARPS forecasts significant reflectivity (>40
    dBZ).
  • For our purposes, the response is the
    categorical forecast of significant reflectivity
    for a specific decision criterion, the signal
    is the observed significant reflectivity, and the
    stimulus is the ARPS reflectivity.

16
Signal Detection Theory
  • Adjust the decision criterion for model
    reflectivity to be considered significant
    (triggering a yes response/forecast). The criterion
    for observations to be treated as signal is kept
    constant.
  • Lenient (strict) model criteria produce larger
    (smaller) fields, and hence larger (smaller) HR
    and FAR scores.
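A rough sketch of the threshold sweep described above, assuming model and observed reflectivity are available as paired values in dBZ. The function name, the sample values, and the chosen thresholds are hypothetical; only the fixed 40 dBZ observation criterion comes from the slides. The sketch also assumes the sample contains at least one observed event and one observed non-event.

```python
def roc_points(model_dbz, observed_dbz, model_thresholds, obs_threshold=40.0):
    """Return one (FAR, HR) pair per model threshold.

    The observation criterion stays fixed; only the model criterion varies.
    """
    signal = [o > obs_threshold for o in observed_dbz]
    n_yes = sum(signal)             # H + M
    n_no = len(signal) - n_yes      # FA + CN
    points = []
    for thr in model_thresholds:
        forecast = [m > thr for m in model_dbz]
        hits = sum(f and s for f, s in zip(forecast, signal))
        false_alarms = sum(f and not s for f, s in zip(forecast, signal))
        points.append((false_alarms / n_no, hits / n_yes))
    return points


# Synthetic grid-point values, for illustration only.
model = [10, 25, 35, 45, 55, 30, 50, 20]
obs = [5, 45, 30, 50, 60, 20, 35, 42]
for thr, (far, hr) in zip([20, 40], roc_points(model, obs, [20, 40])):
    print(f"model threshold {thr} dBZ: FAR={far:.2f}, HR={hr:.2f}")
```

Plotting HR against FAR for many thresholds would trace the ROC curve; the stricter threshold in this toy sample gives the smaller HR and FAR.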

17-19
(No Transcript)
20
Beyond SDT
  • Another idea: keep the model criterion constant,
    but adjust the criterion for the observation.
    Calculate the HR and MR for each criterion and
    plot. The result is a Relative Operating Level
    (ROL) curve, which can be used to compare the
    reliability of multiple systems.
  • Mason and Graham, Wea. Forecasting, 1999.

21
Aspects of Quality: Bias
  • Bias. Correspondence between mean forecast and
    mean observation.
  • Murphy, Wea. Forecasting, 1993

22
Aspects of Quality: Bias
  • $\text{Bias} = (H + \mathrm{FA})/(H + M)$
  • Can be shown to be equivalent to
  • $\text{Bias} = (\mathrm{HR} - \mathrm{FAR}) + \mathrm{FAR}/\mathrm{EF}$
  • As EF increases (decreases), Bias decreases
    (increases).
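The equivalence follows from writing the counts in terms of the Hit Rate (HR), False Alarm Rate (FAR), and Event Frequency (EF). A short derivation sketch, using only the definitions from the earlier slides:

```latex
\begin{align*}
H &= \mathrm{HR}\cdot\mathrm{EF}\cdot N, \qquad
\mathrm{FA} = \mathrm{FAR}\,(1-\mathrm{EF})\,N, \qquad
H + M = \mathrm{EF}\cdot N \\
\text{Bias} &= \frac{H + \mathrm{FA}}{H + M}
  = \frac{\mathrm{HR}\cdot\mathrm{EF} + \mathrm{FAR}\,(1-\mathrm{EF})}{\mathrm{EF}}
  = (\mathrm{HR} - \mathrm{FAR}) + \frac{\mathrm{FAR}}{\mathrm{EF}}
\end{align*}
```

Only the last term depends on EF, and it shrinks as EF grows, which is why Bias decreases with increasing EF when HR and FAR are held fixed.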

23
Aspects of Quality: Reliability
  • Reliability. Correspondence between conditional
    mean observation and conditioning forecast,
    averaged over all forecasts.
  • Murphy, Wea. Forecasting, 1993.

24
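Aspects of Quality: Reliability
  • Reliability of Yes Forecasts
  • Hit Ratio: $\mathrm{HR} = H/(H + \mathrm{FA})$
  • (Correct Alarm Ratio, Postagreement,
    Frequency of Hits)
  • False Alarm Ratio: $\mathrm{FAR} = \mathrm{FA}/(H + \mathrm{FA})$
  • Note: $\mathrm{FAR} = 1 - \mathrm{HR}$ (here HR and FAR denote the
    ratios, not the Hit Rate and False Alarm Rate).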

25
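Aspects of Quality: Reliability
  • Hit Ratio is equivalent to
    $$\text{Hit Ratio} = \frac{\mathrm{HR}}{(\mathrm{HR} - \mathrm{FAR}) + \mathrm{FAR}/\mathrm{EF}}$$
    where HR and FAR on the right-hand side are the
    Hit Rate and False Alarm Rate.
  • Or: Hit Ratio = Hit Rate / Bias.
  • As EF increases (decreases), the Hit Ratio increases
    (decreases) and the False Alarm Ratio decreases (increases).
  • If Bias = 1, Hit Ratio = Hit Rate and False Alarm
    Ratio = Miss Rate.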
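A small numerical check of these relations, assuming H, M, FA, and CN are the hit, miss, false-alarm, and correct-null counts. The function name, dictionary keys, and sample counts are illustrative; the sketch computes the Hit Ratio directly from the counts and again from the Hit Rate, False Alarm Rate, and EF.

```python
def yes_forecast_reliability(H, M, FA, CN):
    """Reliability of yes forecasts, from counts and from rates."""
    n = H + M + FA + CN
    ef = (H + M) / n                     # event frequency
    hit_rate = H / (H + M)               # discrimination score
    false_alarm_rate = FA / (FA + CN)    # discrimination score
    bias = (H + FA) / (H + M)
    return {
        "hit_ratio": H / (H + FA),
        "false_alarm_ratio": FA / (H + FA),
        "hit_rate": hit_rate,
        "miss_rate": 1 - hit_rate,
        "bias": bias,
        # Same Hit Ratio, rebuilt from the rates and EF only.
        "hit_ratio_from_rates": hit_rate
        / ((hit_rate - false_alarm_rate) + false_alarm_rate / ef),
    }


# Made-up counts: a biased system, then one with Bias = 1 (FA equals M).
print(yes_forecast_reliability(30, 10, 20, 140))
print(yes_forecast_reliability(30, 10, 10, 150))
```

In the second call the forecast is unbiased, so the Hit Ratio matches the Hit Rate and the False Alarm Ratio matches the Miss Rate.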

26
Aspects of Quality: Reliability
  • Reliability of No Forecasts
  • Miss Ratio: $\mathrm{MR} = M/(M + \mathrm{CN})$
  • (Detection Failure Ratio)
  • Correct Null Ratio: $\mathrm{CNR} = \mathrm{CN}/(M + \mathrm{CN})$
  • (Frequency of Correct Null Forecasts)
  • Note: $\mathrm{CNR} = 1 - \mathrm{MR}$.

27
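Aspects of Quality: Reliability
  • Miss Ratio is equivalent to
    $$\text{Miss Ratio} = \frac{1 - \mathrm{HR}}{(\mathrm{FAR} - \mathrm{HR}) + (1 - \mathrm{FAR})/\mathrm{EF}}$$
    where HR and FAR are the Hit Rate and False
    Alarm Rate.
  • As EF increases (decreases), the Miss Ratio increases
    (decreases) and the Correct Null Ratio decreases
    (increases).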
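A derivation sketch for this form of the Miss Ratio, with HR and FAR denoting the Hit Rate and False Alarm Rate and using only the definitions from the earlier slides:

```latex
\begin{align*}
M &= (1-\mathrm{HR})\,\mathrm{EF}\,N, \qquad
\mathrm{CN} = (1-\mathrm{FAR})(1-\mathrm{EF})\,N \\
\text{Miss Ratio} &= \frac{M}{M + \mathrm{CN}}
  = \frac{(1-\mathrm{HR})\,\mathrm{EF}}{(1-\mathrm{HR})\,\mathrm{EF} + (1-\mathrm{FAR})(1-\mathrm{EF})} \\
  &= \frac{1-\mathrm{HR}}{(1-\mathrm{HR}) + (1-\mathrm{FAR})/\mathrm{EF} - (1-\mathrm{FAR})}
  = \frac{1-\mathrm{HR}}{(\mathrm{FAR}-\mathrm{HR}) + (1-\mathrm{FAR})/\mathrm{EF}}
\end{align*}
```

The denominator shrinks as EF grows, so the Miss Ratio increases with EF when HR and FAR are held fixed.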

28
Aspects of Quality: Accuracy and Skill
  • Accuracy. Average correspondence between
    individual pairs of forecasts and observations.
  • Skill. Accuracy of forecasts of interest
    relative to accuracy of forecasts produced by
    standard of reference.
  • Murphy, Wea. Forecasting, 1993

29
Aspects of Quality: Accuracy and Skill
  • Accuracy
  • Proportion Correct: $\mathrm{PC} = (H + \mathrm{CN})/N$
  • Mean Square Error: $\mathrm{MSR} = (M + \mathrm{FA})/N$
  • Note: $\mathrm{MSR} = 1 - \mathrm{PC}$.

30
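Aspects of Quality: Accuracy and Skill
  • PC is equivalent to
  • $\mathrm{PC} = (\mathrm{HR} + \mathrm{FAR} - 1)\,\mathrm{EF} + (1 - \mathrm{FAR})$
  • If $(\mathrm{HR} + \mathrm{FAR} - 1) > 0$: as EF increases, PC
    increases.
  • If $(\mathrm{HR} + \mathrm{FAR} - 1) < 0$: as EF increases, PC
    decreases.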
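A derivation sketch for this form of PC, with HR and FAR denoting the Hit Rate and False Alarm Rate:

```latex
\begin{align*}
\mathrm{PC} &= \frac{H + \mathrm{CN}}{N}
  = \mathrm{HR}\cdot\mathrm{EF} + (1-\mathrm{FAR})(1-\mathrm{EF}) \\
  &= \mathrm{HR}\cdot\mathrm{EF} + (1-\mathrm{FAR}) - \mathrm{EF} + \mathrm{FAR}\cdot\mathrm{EF}
  = (\mathrm{HR} + \mathrm{FAR} - 1)\,\mathrm{EF} + (1-\mathrm{FAR})
\end{align*}
```

PC is linear in EF, so the sign of the slope $(\mathrm{HR} + \mathrm{FAR} - 1)$ decides whether PC rises or falls as events become more frequent.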

31
Aspects of Quality: Accuracy and Skill
  • General form of Skill Score
  • $\mathrm{SS} = (\mathrm{PC}_s - \mathrm{PC}_r)/(1 - \mathrm{PC}_r)$
  • where $\mathrm{PC}_s$ is the PC of a forecast system and
    $\mathrm{PC}_r$ is the PC of a reference forecast system
    (climatology, persistence, chance, etc.)

32
Aspects of Quality: Accuracy and Skill
  • Heidke Skill Score: $\mathrm{PC}_r$ is generated using
    random forecasts with the same bias as the
    forecast system being evaluated.
    $$\mathrm{HSS} = \frac{2\,(H \cdot \mathrm{CN} - M \cdot \mathrm{FA})}{(H + M)(M + \mathrm{CN}) + (H + \mathrm{FA})(\mathrm{FA} + \mathrm{CN})}$$

33
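Aspects of Quality: Accuracy and Skill
  • The Heidke Skill Score is equivalent to
    $$\mathrm{HSS} = \frac{(2\,\mathrm{FAR} - 2\,\mathrm{HR})\,\mathrm{EF}^2 + (2\,\mathrm{HR} - 2\,\mathrm{FAR})\,\mathrm{EF}}{(2\,\mathrm{FAR} - 2\,\mathrm{HR})\,\mathrm{EF}^2 + (\mathrm{HR} - 3\,\mathrm{FAR} + 1)\,\mathrm{EF} + \mathrm{FAR}}$$
    where HR and FAR are the Hit Rate and False
    Alarm Rate.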
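A sketch that computes the Heidke Skill Score three ways and checks that they agree: from the counts, from the Hit Rate, False Alarm Rate, and EF, and from the general skill-score form. The reference PC is taken as the expected proportion correct of random forecasts with the same marginal totals. Function names and sample counts are illustrative.

```python
def skill_score(pc_s, pc_r):
    """General form: SS = (PCs - PCr) / (1 - PCr)."""
    return (pc_s - pc_r) / (1 - pc_r)


def heidke_from_counts(H, M, FA, CN):
    """HSS written directly in terms of the contingency-table counts."""
    return 2 * (H * CN - M * FA) / ((H + M) * (M + CN) + (H + FA) * (FA + CN))


def heidke_from_rates(hr, far, ef):
    """HSS written in terms of Hit Rate, False Alarm Rate, and EF."""
    num = (2 * far - 2 * hr) * ef**2 + (2 * hr - 2 * far) * ef
    den = (2 * far - 2 * hr) * ef**2 + (hr - 3 * far + 1) * ef + far
    return num / den


def heidke_from_reference(H, M, FA, CN):
    """HSS via the general skill score with a random reference forecast."""
    n = H + M + FA + CN
    pc_s = (H + CN) / n
    # Expected PC of random forecasts with the same yes/no totals.
    pc_r = ((H + M) * (H + FA) + (M + CN) * (FA + CN)) / n**2
    return skill_score(pc_s, pc_r)


# Made-up counts, for illustration only.
H, M, FA, CN = 30, 10, 20, 140
n = H + M + FA + CN
print(heidke_from_counts(H, M, FA, CN))
print(heidke_from_rates(H / (H + M), FA / (FA + CN), (H + M) / n))
print(heidke_from_reference(H, M, FA, CN))
```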

34
Aspects of Quality: Accuracy and Skill
  • Appleman Skill Score: $\mathrm{PC}_r$ is generated using
    constant forecasts of the most frequently
    observed event (the best unskilled predictor).
  • $A = (H + \mathrm{CN} - x)/(N - x)$,
  • $x = \max\big((H + M),\ (\mathrm{FA} + \mathrm{CN})\big)$

35
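Aspects of Quality: Accuracy and Skill
  • The Appleman Skill Score is equivalent to
  • $A = \mathrm{HR} + \mathrm{FAR} - \mathrm{FAR}/\mathrm{EF}$
  • if $\mathrm{EF} < 0.5$.
  • $A = (\mathrm{HR} - 1)/(1 - \mathrm{EF}) - \mathrm{HR} - \mathrm{FAR} + 2$
  • if $\mathrm{EF} > 0.5$.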
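A sketch comparing the count-based Appleman score with the two rate-based forms, where HR and FAR are the Hit Rate and False Alarm Rate. Function names and sample counts are illustrative. The two branches give the same value at EF = 0.5, so the sketch applies the second form for EF ≥ 0.5.

```python
def appleman_from_counts(H, M, FA, CN):
    """A = (H + CN - x) / (N - x), x = count of the more frequent observed class."""
    n = H + M + FA + CN
    x = max(H + M, FA + CN)
    return (H + CN - x) / (n - x)


def appleman_from_rates(hr, far, ef):
    """Appleman score from Hit Rate, False Alarm Rate, and EF."""
    if ef < 0.5:
        # Non-events dominate: reference forecast is always "no".
        return hr + far - far / ef
    # Events dominate (or tie): reference forecast is always "yes".
    return (hr - 1) / (1 - ef) - hr - far + 2


# Made-up counts: a rare event (EF = 0.2) and a common one (EF = 0.75).
for H, M, FA, CN in [(30, 10, 20, 140), (120, 30, 10, 40)]:
    n = H + M + FA + CN
    print(appleman_from_counts(H, M, FA, CN),
          appleman_from_rates(H / (H + M), FA / (FA + CN), (H + M) / n))
```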

36-42
(No Transcript)
43
Quality: What Should We Look At?
  • Since accuracy, skill, reliability, and bias are
    all sensitive to event frequency, it is more
    difficult to use these types of scores to compare
    two forecast systems. This is especially true if
    there is a wide variability in the frequency of
    events.
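A sketch of this sensitivity, holding the Hit Rate and False Alarm Rate fixed and varying only the event frequency, using the equivalences from the earlier slides. The function name and the chosen HR, FAR, and EF values are illustrative.

```python
def scores_at(hr, far, ef):
    """Scores implied by a fixed Hit Rate and False Alarm Rate at a given EF."""
    bias = (hr - far) + far / ef
    return {
        "EF": ef,
        "Bias": bias,
        "PC": (hr + far - 1) * ef + (1 - far),
        "HitRatio": hr / bias,
        "Peirce": hr - far,   # does not involve EF
    }


# Same discrimination ability, three different event frequencies.
rows = [scores_at(0.75, 0.125, ef) for ef in (0.05, 0.2, 0.5)]
for row in rows:
    print({k: round(v, 3) for k, v in row.items()})
```

The discrimination score is identical in every row, while Bias, PC, and the Hit Ratio change with EF even though the system's ability to separate events from non-events is unchanged.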

44
The End
  • Comments?
  • Ideas?
  • Constructive Criticism?