1
Measuring forecast skill: is it real skill or
is it the varying climatology?
  • Tom Hamill
  • NOAA / ESRL / PSD, Boulder, Colorado
    tom.hamill@noaa.gov, www.cdc.noaa.gov/people/tom.hamill
  • Josip Juras
  • University of Zagreb, Croatia

2
Hypothesis
  • If climatological event probability varies among
    samples, then many verification metrics will
    credit a forecast with extra skill it doesn't
    deserve - the extra skill comes from the
    variations in the climatology.

3
Consider Equitable Threat Scores
4
  • (1) ETS is location-dependent, related to
    climatological probability.
5
  • (2) Average of ETS at individual grid points: 0.28
6
  • (3) ETS after data lumped into one big table: 0.42
7
Considering three metrics
  • (1) Brier Skill Score (1 = perfect, 0 = reference)
  • (2) Relative Operating Characteristic
  • (3) Equitable Threat Score
(each will show this tendency to have scores vary
depending on how they're calculated)
8
Equitable Threat Score: standard method of calculation

Assume we have a deterministic forecast. All m samples,
at all locations and times, populate one contingency table:

                      Event forecast?
                       YES      NO
Event        YES        a        b
observed?    NO         c        d

The score is

ETS = (a - a_ref) / (a + b + c - a_ref),

where a_ref = (a + b)(a + c) / m is the number of hits
expected by chance.
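As an illustrative sketch (not from the talk), the standard pooled ETS can be computed from one table's counts, with a = hits, b = false alarms, c = misses, d = correct rejections:

```python
def ets(a, b, c, d):
    """Equitable Threat Score from one 2x2 contingency table.

    a: hits, b: false alarms, c: misses, d: correct rejections.
    a_ref is the number of hits expected by chance.
    """
    m = a + b + c + d
    a_ref = (a + b) * (a + c) / m
    return (a - a_ref) / (a + b + c - a_ref)
```

A perfect forecast (b = c = 0) gives ETS = 1; a chance forecast gives ETS near 0.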
9
Equitable Threat Score: alternative method of calculation

Consider the possibility of different regions with
different climates. Assume nc contingency tables, each
associated with samples with a distinct climatological
event frequency; ns(k) out of the m samples were used to
populate the kth table. The ETS is calculated separately
for each contingency table, and the alternative,
weighted-average ETS is calculated as

ETS_alt = Σ_{k=1..nc} (ns(k) / m) · ETS(k)

(Analogous alternative calculations for the BSS and ROC
are given in the backup slides.)
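A minimal sketch of the weighted-average alternative, assuming a hypothetical input layout of one (a, b, c, d) count table per climate region:

```python
def weighted_ets(tables):
    """Sample-weighted average ETS over per-climatology tables.

    tables: list of (a, b, c, d) counts, one table per region
    with a distinct climatological event frequency.
    """
    m = sum(sum(t) for t in tables)      # total samples over all tables
    total = 0.0
    for a, b, c, d in tables:
        ns = a + b + c + d               # samples in this table
        a_ref = (a + b) * (a + c) / ns   # chance hits for this region
        total += (ns / m) * (a - a_ref) / (a + b + c - a_ref)
    return total
```

With a single table this reduces to the standard ETS; with several tables each region's score is weighted by its sample fraction ns(k)/m.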
10
ETS calculated two ways
11
Example of unexpected skill: two islands, zero meteorologists

Imagine a planet with a global ocean and two isolated
islands. Weather forecasting other than climatology for
each island is impossible.

Island 1: forecast and observed uncorrelated, each N(0, 1)
Island 2: forecast and observed uncorrelated, each N(μ, 1), 0 ≤ μ ≤ 5
Event: observed > 0
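The two-island setup can be simulated to show where the unexpected skill comes from. This is a hedged sketch of the slide's experiment (assuming island 1's climatology is N(0,1)); both "forecasts" are independent draws from each island's own climatology, so neither island's forecast has any real skill:

```python
import random

def pooled_ets(mu, n=100_000, seed=0):
    """Monte Carlo sketch: ETS from one table pooled over both islands.

    Island 1 climatology N(0,1), island 2 N(mu,1); event: observed > 0.
    Forecast and observation are INDEPENDENT climatology draws.
    """
    rng = random.Random(seed)
    a = b = c = d = 0  # hits, false alarms, misses, correct rejections
    for island_mean in (0.0, mu):
        for _ in range(n):
            f = rng.gauss(island_mean, 1.0)  # "forecast": climatology draw
            o = rng.gauss(island_mean, 1.0)  # observation
            if f > 0 and o > 0:
                a += 1
            elif f > 0:
                b += 1
            elif o > 0:
                c += 1
            else:
                d += 1
    m = a + b + c + d
    a_ref = (a + b) * (a + c) / m
    return (a - a_ref) / (a + b + c - a_ref)
```

At μ = 0 the pooled ETS is near 0, as it should be; as μ grows, the pooled table shows apparently positive ETS even though each island's forecast is just a random climatology draw.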
12
Two islands

(Figure: climatological distributions for Island 1 and
Island 2 as μ increases.)

But still, each island's forecast is no better
than a random draw from its climatology. Expect
no skill.
13
Skill with conventional methods of calculation

The reference climatology implicitly becomes the mixture
of N(0,1) and N(μ,1), not N(0,1) or N(μ,1) separately.
14
The new reference climatology
15
Are standard methods wrong?
  • Assertion: we've just re-defined climatology;
    these are the correct scores with reference to that
    climatology.
  • Response: you can calculate them this way, but
    you shouldn't.
  • You will draw improper inferences due to a lurking
    variable - i.e., the varying climatology should
    be a predictor.
  • Discerning real skill, or a skill difference, gets
    tougher.

"One method that is sometimes used is to combine
all the data into a single 2x2 table; this
procedure is legitimate only if the probability
p of an occurrence (on the null hypothesis) can
be assumed to be the same in all the individual
2x2 tables. Consequently, if p obviously varies
from table to table, or we suspect that it may
vary, this procedure should not be used."
- W. G. Cochran, 1954, discussing χ² tests for
combined 2x2 contingency tables
16
Related problem: when means are the same but
climatological variances differ
  • Event: v > 2.0
  • Island 1: f ~ N(0,1), v ~ N(0,1), Corr(f,v) = 0.0
  • Island 2: f ~ N(0,σ), v ~ N(0,σ), 1 ≤ σ ≤ 3,
    Corr(f,v) = 0.9
  • Expectation: positive skill over the two islands,
    but not a function of σ

17
Scores vary with σ
18
The island with the greater climatological
uncertainty of the observed event ends up
dominating the calculations.
19
Solutions?
  • (1) Analyze events where climatological
    probabilities are the same at all locations,
    e.g., terciles.
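A sketch of solution (1): define the event relative to each location's own upper tercile, so the climatological event probability is about 1/3 everywhere. The function name and the {location: [values]} input layout are my assumptions; statistics.quantiles supplies the tercile cut:

```python
import statistics

def upper_tercile_events(values_by_location):
    """Convert raw values to 'exceeds local upper tercile' events.

    Because each location uses its OWN tercile boundary, the
    climatological event probability is ~1/3 at every location.
    """
    events = {}
    for loc, values in values_by_location.items():
        upper_cut = statistics.quantiles(values, n=3)[1]  # upper-tercile cut
        events[loc] = [v > upper_cut for v in values]
    return events
```

Two locations with very different climates then contribute events at the same base rate, removing the lurking-variable effect.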

20
Solutions, continued
  • (2) Use sample-weighted averages, e.g.
    BSS = Σ_{k=1..nc} (ns(k)/m) · BSS(k)
    (and similarly for ROC hit and false-alarm rates;
    see the backup slides)
21
Conclusions
  • Many conventional verification metrics like the BSS,
    RPSS, threat scores, ROC, potential economic value,
    etc. can be overestimated if climatology
    varies among samples.
  • This results in false inferences: you think there's
    skill where there's none.
  • It complicates evaluation of model improvements:
    Model A is better than Model B, but doesn't appear
    quite so, since both are inflated in skill.
  • Fixes:
  • Consider events where climatology doesn't vary,
    such as the exceedance of a quantile of the
    climatological distribution.
  • Combine after calculating for distinct
    climatologies.
  • Please document your method for calculating a
    score!

Acknowledgements: Matt Briggs, Dan Wilks, Craig
Bishop, Beth Ebert, Steve Mullen, Simon Mason,
Bob Glahn, Neill Bowler, Ken Mylne, Bill Gallus,
Frederic Atger, Francois LaLaurette, Zoltan
Toth, Jeff Whitaker.
22
Brier Skill Scores from raw ensembles
The event is whether the observed weather will be above
a threshold T. Let Xe(j) = X1(j), ..., Xn(j) be the
n-member ensemble forecast of the relevant scalar
variable for the jth of m samples (taken over many case
days and/or locations), sorted from lowest to highest.
Convert the sorted ensemble to an n-member binary
forecast Ie(j) = I1(j), ..., In(j), where Ii(j) = 1 if
Xi(j) > T and Ii(j) = 0 if Xi(j) ≤ T. The observed
weather is also converted to binary, denoted by Io(j).

The forecast probability is

pf(j) = (1/n) Σ_{i=1..n} Ii(j)

and the Brier score of the forecast is

BSf = (1/m) Σ_{j=1..m} (pf(j) - Io(j))²
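The slide's definitions can be sketched directly (function name and list-based inputs are my assumptions):

```python
def ensemble_brier_score(ensembles, observations, threshold):
    """Brier score of raw-ensemble forecasts of 'value > threshold'.

    ensembles: list of m member lists; observations: list of m scalars.
    Forecast probability = fraction of members exceeding the threshold.
    """
    m = len(observations)
    bs = 0.0
    for members, obs in zip(ensembles, observations):
        p_f = sum(x > threshold for x in members) / len(members)  # pf(j)
        i_o = 1.0 if obs > threshold else 0.0                     # Io(j)
        bs += (p_f - i_o) ** 2
    return bs / m
```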
23
Brier Skill Score, continued
Standard method of calculation: a single climatological
event probability is calculated over all m samples,

pc = (1/m) Σ_{j=1..m} Io(j)

with the Brier score of climatology

BSc = (1/m) Σ_{j=1..m} (pc - Io(j))²

and, by definition of the Brier Skill Score,

BSS = 1.0 - BSf / BSc
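A minimal sketch of the standard calculation (names and inputs are my assumptions; forecast_probs are the pf(j), event_obs the binary Io(j)):

```python
def brier_skill_score(forecast_probs, event_obs):
    """Standard BSS: one pooled climatological probability over all samples."""
    m = len(event_obs)
    bs_f = sum((p - o) ** 2 for p, o in zip(forecast_probs, event_obs)) / m
    p_clim = sum(event_obs) / m                            # pooled climatology pc
    bs_c = sum((p_clim - o) ** 2 for o in event_obs) / m   # BSc
    return 1.0 - bs_f / bs_c
```

A perfect probabilistic forecast scores 1.0; always forecasting the pooled climatology scores 0.0.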
24
Brier Skill Score, continued
Alternative 1: multiple climatological probabilities are
calculated for different regions, then summed.

Suppose the m samples are split up into nc subsets, each
with a distinct climatological event frequency. Let
pc(k) be the climatological probability in the kth of
the nc subsets, with ns(k) samples in this subset. Let
rk = r(1), ..., r(ns(k)) be the associated set of
sample indices out of the m samples.

The Brier score of climatology is calculated separately
for each subset,

BSc(k) = (1/m) Σ_{j=1..ns(k)} (pc(k) - Io(r(j)))²

and the overall Brier score of climatology is the sum of
the scores for each subset:

BSc = Σ_{k=1..nc} BSc(k)
25
Brier Skill Score, continued
Alternative 2: the final BSS is the sample-weighted
average of the BSS for each subset,

BSS = Σ_{k=1..nc} (ns(k)/m) · (1 - BSf(k) / BSc(k))

where BSf(k) is the forecast Brier score calculated
separately for each distinct region (as was done for
climatology), and ns(k)/m is the weight applied to that
region's BSS.
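A sketch of Alternative 2 (input layout is my assumption; it also assumes every subset contains both events and non-events, so each subset's climatological Brier score is nonzero):

```python
def weighted_bss(subsets):
    """Sample-weighted average of per-subset Brier skill scores.

    subsets: list of (forecast_probs, event_obs) pairs, one per
    distinct climatology, with binary event_obs.
    """
    m = sum(len(obs) for _, obs in subsets)
    bss = 0.0
    for probs, obs in subsets:
        ns = len(obs)
        bs_f = sum((p - o) ** 2 for p, o in zip(probs, obs)) / ns
        p_clim = sum(obs) / ns                        # subset climatology pc(k)
        bs_c = sum((p_clim - o) ** 2 for o in obs) / ns
        bss += (ns / m) * (1.0 - bs_f / bs_c)         # weight = ns(k)/m
    return bss
```

Because each subset is scored against its own climatology, a forecast that merely reproduces regional climatology gets zero credit here, unlike in the pooled calculation.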
26
Relative Operating Characteristic: standard method of calculation

Populate 2x2 contingency tables, a separate one for each
sorted ensemble member. The contingency table for the
ith sorted ensemble member is

                      Event forecast by ith member?
                       YES      NO
Event        YES        ai       bi
observed?    NO         ci       di

(with ai + bi + ci + di = 1)

FARi = bi / (bi + di)   (false alarm rate)
HRi  = ai / (ai + ci)   (hit rate)

The ROC is a plot of hit rate (y) vs. false alarm rate
(x), commonly summarized by the area under the curve
(AUC): 1.0 for a perfect forecast, 0.5 for climatology.
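A sketch of the standard calculation with a trapezoidal AUC (function name and input layout are my assumptions):

```python
def roc_auc(member_tables):
    """ROC points and trapezoidal AUC from per-member 2x2 tables.

    member_tables: list of (a, b, c, d), one per sorted ensemble
    member; entries may be counts or fractions.
    """
    points = [(0.0, 0.0)]
    for a, b, c, d in member_tables:
        far = b / (b + d)            # false alarm rate (x)
        hr = a / (a + c)             # hit rate (y)
        points.append((far, hr))
    points.append((1.0, 1.0))
    points.sort()                    # order along the false-alarm axis
    auc = sum((x2 - x1) * (y1 + y2) / 2.0
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc
```

A climatology-like table (all four cells equal) lands on the diagonal and gives AUC = 0.5; perfect discrimination gives AUC = 1.0.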
27
Relative Operating Characteristic: alternative method of calculation

As with the BSS, suppose the samples can be partitioned
into nc subsets, each associated with a distinct
climatological event frequency. Using the ns(k) samples
in the kth subset, compute per-climatology hit rates
HRi(k) and false alarm rates FARi(k); then calculate the
sample-weighted average hit rates and false alarm rates:

HRi  = Σ_{k=1..nc} (ns(k)/m) · HRi(k)
FARi = Σ_{k=1..nc} (ns(k)/m) · FARi(k)
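A sketch of the sample-weighted rates for a single ensemble member (the (ns, a, b, c, d) per-region layout is my assumption):

```python
def weighted_roc_rates(region_tables):
    """Sample-weighted hit and false-alarm rates for one member.

    region_tables: list of (ns, a, b, c, d), one entry per
    climatology subset; ns is the subset's sample count.
    """
    m = sum(ns for ns, *_ in region_tables)
    hr = sum((ns / m) * a / (a + c) for ns, a, b, c, d in region_tables)
    far = sum((ns / m) * b / (b + d) for ns, a, b, c, d in region_tables)
    return hr, far
```

The weighted (FAR, HR) pairs then trace the alternative ROC curve, member by member.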
28
Island 1 (climatological event frequency 0.0232):
ETS = -0.0022, HR = 0.0172
Island 2, σ = 1 (frequency 0.0288):
ETS = 0.4195, HR = 0.5937
ETS (combined table) = 0.193, HR = 0.336
Island 2, σ = 3 (frequency 0.26):
ETS = 0.5327, HR = 0.778
ETS (combined table) = 0.499, HR = 0.715