1
Verification of Probability Forecasts at Points
WMO QPF Verification Workshop
Prague, Czech Republic, 14-16 May 2001
Barbara G. Brown
NCAR, Boulder, Colorado, U.S.A.
bgb@ucar.edu
2
Why probability forecasts?
  • "... the widespread practice of ignoring uncertainty
    when formulating and communicating forecasts
    represents an extreme form of inconsistency and
    generally results in the largest possible
    reductions in quality and value."
  • -- Murphy (1993)

3
Outline
  • Background and basics
  • Types of events
  • Types of forecasts
  • Representation of probabilistic forecasts in the
    verification framework

4
Outline continued
  • Verification approaches (focus on the 2-category case)
  • Measures
  • Graphical representations
  • Using statistical models
  • Signal detection theory
  • Ensemble forecast verification
  • Extensions to multi-category verification problem
  • Comparing probabilistic and categorical forecasts
  • Connections to value
  • Summary, conclusions, issues

5
Background and basics
  • Types of events
  • Two-category
  • Multi-category
  • Two-category events
  • Either event A happens or event B happens
  • Examples: Rain/No-rain
  • Hail/No-hail
  • Tornado/No-tornado
  • Multi-category events
  • Event A, B, C, ..., or Z happens
  • Example: Precipitation categories
    (< 1 mm, 1-5 mm, 5-10 mm, etc.)

6
Background and basics cont.
  • Types of forecasts
  • Completely confident
  • Forecast probability is either 0 or 1
  • Example: Rain/No-rain
  • Probabilistic
  • Objective (deterministic, statistical,
    ensemble-based)
  • Subjective
  • Probability is stated explicitly

7
Background and basics cont.
  • Representation of probabilistic forecasts in the
    verification framework
  • x = 0 or 1
  • f = 0, ..., 1.0
  • f may be limited to only certain values between
    0 and 1
  • Joint distribution
  • p(f,x), where x = 0, 1
  • Ex: If there are 12 possible values of f, then
    p(f,x) is comprised of 24 elements (a small
    numerical sketch follows)
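A minimal sketch of this representation, using a hypothetical sample of forecast probabilities restricted to 12 allowed values (the values and sample are illustrative, not from the presentation):

```python
import numpy as np

# Hypothetical sample: forecast probabilities limited to 12 allowed values,
# and binary observations x (1 = event occurred, 0 = did not).
allowed_f = np.array([0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
                      0.6, 0.7, 0.8, 0.9, 1.0])
f = np.random.choice(allowed_f, size=1000)
x = (np.random.rand(1000) < f).astype(int)    # synthetic, well-calibrated obs

# Empirical joint distribution p(f, x): 12 x 2 = 24 relative frequencies.
p_fx = np.zeros((allowed_f.size, 2))
for i, fi in enumerate(allowed_f):
    for xi in (0, 1):
        p_fx[i, xi] = np.mean((f == fi) & (x == xi))

print(p_fx.sum())    # the 24 elements sum to 1.0
```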

8
Background and basics, cont.
  • Factorizations: Conditional and marginal
    probabilities
  • Calibration-Refinement factorization
  • p(f,x) = p(x|f) p(f)
  • p(x=0|f) = 1 - p(x=1|f) = 1 - E(x|f)
  • Only one number is needed to specify the
    distribution p(x|f) for each f
  • p(f) is the frequency of use of each forecast
    probability
  • Likelihood-Base Rate factorization
  • p(f,x) = p(f|x) p(x)
  • p(x) is the relative frequency of a Yes
    observation (e.g., the sample climatology of
    precipitation); p(x) = E(x) (see the sketch below)
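A minimal sketch of the two factorizations on synthetic data (variable names are illustrative, not from the presentation):

```python
import numpy as np

# Synthetic forecasts on 0.0, 0.1, ..., 1.0 and binary observations.
f = np.round(np.random.rand(1000), 1)
x = (np.random.rand(1000) < f).astype(int)

values = np.unique(f)
p_f   = np.array([np.mean(f == v) for v in values])           # marginal p(f)
p_x1f = np.array([x[f == v].mean() for v in values])          # p(x=1|f) = E(x|f)

p_x1  = x.mean()                                              # base rate p(x=1) = E(x)
p_fx1 = np.array([np.mean(f[x == 1] == v) for v in values])   # p(f|x=1)

# Both factorizations reproduce the same joint distribution p(f, x=1):
joint_CR  = p_x1f * p_f       # Calibration-Refinement: p(x=1|f) p(f)
joint_LBR = p_fx1 * p_x1      # Likelihood-Base Rate:   p(f|x=1) p(x=1)
assert np.allclose(joint_CR, joint_LBR)
```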

9
Attributes from Murphy and Winkler (1992)
[Table of forecast attributes, including sharpness; not reproduced in the transcript]
10
Verification approaches: 2x2 case
Completely confident forecasts
[2x2 contingency table of forecast (Yes/No) vs. observation (Yes/No)]
Use the counts in this table to compute various
common statistics (e.g., POD, POFD, H-K, FAR,
CSI, Bias, etc.), as in the sketch below.
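A minimal sketch (not from the presentation) of these measures, using the same cell labels as the SDT slides later on: YY = forecast Yes / observed Yes, YN = forecast Yes / observed No, NY = forecast No / observed Yes, NN = forecast No / observed No. The counts in the example call are illustrative only.

```python
def two_by_two_measures(YY, YN, NY, NN):
    pod  = YY / (YY + NY)          # probability of detection (hit rate)
    pofd = YN / (YN + NN)          # probability of false detection
    far  = YN / (YY + YN)          # false alarm ratio
    csi  = YY / (YY + YN + NY)     # critical success index (threat score)
    bias = (YY + YN) / (YY + NY)   # frequency bias
    hk   = pod - pofd              # Hanssen-Kuipers (H-K) statistic
    return dict(POD=pod, POFD=pofd, FAR=far, CSI=csi, Bias=bias, HK=hk)

print(two_by_two_measures(YY=28, YN=72, NY=23, NN=2680))   # illustrative counts
```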
11
Verification measures for 2x2 (Yes/No) completely
confident forecasts
12
Relationships among measures in the 2x2 case
Many of the measures in the 2x2 case are strongly
related in surprisingly complex ways. For example:
13
[Figure: lines labeled 0.10, 0.30, 0.50, 0.70, 0.90 indicate different
values of POD and POFD (where POD = POFD). From Brown and Young (2000)]
14
CSI as a function of p(x=1) and POD = POFD
[Figure: curves labeled 0.9, 0.7, 0.5, 0.3, 0.1]
15
CSI as a function of FAR and POD
16
Measures for Probabilistic Forecasts
  • Summary measures
  • Expectation
  • Conditional
  • E(f|x=0), E(f|x=1)
  • E(x|f)
  • Marginal
  • E(f)
  • E(x) = p(x=1)
  • Correlation
  • Joint distribution
  • Variability
  • Conditional
  • Var(f|x=0), Var(f|x=1)
  • Var(x|f)
  • Marginal
  • Var(f)
  • Var(x) = E(x)[1 - E(x)]

17
From Murphy and Winkler (1992): Summary measures
for joint and marginal distributions
18
From Murphy and Winkler (1992): Summary measures
for conditional distributions
19
Performance measures
  • Brier score (sketch below)
  • Analogous to MSE; negative orientation
  • For perfect forecasts, BS = 0
  • Brier skill score
  • Analogous to the MSE skill score
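A minimal sketch of the two scores using the standard definitions (BS is the mean squared difference between forecast probability and observation; the skill score is computed relative to a sample-climatology forecast):

```python
import numpy as np

def brier_score(f, x):
    f, x = np.asarray(f, float), np.asarray(x, float)
    return np.mean((f - x) ** 2)            # 0 for perfect forecasts

def brier_skill_score(f, x):
    bs      = brier_score(f, x)
    f_clim  = np.full(len(np.asarray(x)), np.mean(x))   # climatology forecast
    bs_clim = brier_score(f_clim, x)
    return 1.0 - bs / bs_clim               # 1 = perfect, 0 = no skill vs. climatology

f = np.array([0.1, 0.7, 0.9, 0.3, 0.0])
x = np.array([0,   1,   1,   0,   0  ])
print(brier_score(f, x), brier_skill_score(f, x))
```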

20
From Murphy and Winkler (1992)
21
Brier score displays
From Shirey and Erickson,
http://www.nws.noaa.gov/tdl/synop/amspapers/masmrfpap.htm
22
Brier score displays
From http://www.nws.noaa.gov/tdl/synop/mrfpop/mainframes.htm
23
Decomposition of the Brier Score
  • Break the Brier score into more elemental components
    (a computational sketch follows):
  • BS = Reliability - Resolution + Uncertainty
  • Reliability = (1/N) sum_i n_i [f_i - p(x=1|f_i)]^2
  • Resolution = (1/N) sum_i n_i [p(x=1|f_i) - p(x=1)]^2
  • Uncertainty = p(x=1)[1 - p(x=1)]
  • where I is the number of distinct probability values,
    n_i is the number of forecasts with value f_i, and
    N = sum_i n_i (i = 1, ..., I)
  • Then the Brier Skill Score can be re-formulated as
  • BSS = (Resolution - Reliability) / Uncertainty
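A minimal sketch of the decomposition above (standard Murphy decomposition; function and variable names are illustrative):

```python
import numpy as np

def brier_decomposition(f, x):
    f, x = np.asarray(f, float), np.asarray(x, float)
    N = f.size
    xbar = x.mean()                          # sample climatology p(x=1)
    rel = res = 0.0
    for fi in np.unique(f):                  # loop over the I distinct values
        sel = (f == fi)
        n_i = sel.sum()
        xbar_i = x[sel].mean()               # observed relative frequency p(x=1|f_i)
        rel += n_i * (fi - xbar_i) ** 2
        res += n_i * (xbar_i - xbar) ** 2
    rel, res = rel / N, res / N
    unc = xbar * (1.0 - xbar)
    bs  = rel - res + unc                    # reproduces the Brier score
    bss = (res - rel) / unc                  # re-formulated Brier Skill Score
    return bs, rel, res, unc, bss
```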
24
Graphical representations of measures
  • Reliability diagram
  • p(x=1|f_i) vs. f_i
  • Sharpness diagram
  • p(f)
  • Attributes diagram
  • Reliability, Resolution, Skill/No-skill
  • Discrimination diagram
  • p(f|x=0) and p(f|x=1)
  • Together, these diagrams provide a relatively
    complete picture of the quality of a set of
    probability forecasts (see the sketch after this list)
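A minimal plotting sketch (assumed layout, not the presentation's figures) of a reliability diagram paired with a sharpness histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

def reliability_and_sharpness(f, x):
    f, x = np.asarray(f, float), np.asarray(x, float)
    values   = np.unique(f)
    obs_freq = np.array([x[f == v].mean() for v in values])   # p(x=1|f_i)
    counts   = np.array([(f == v).sum() for v in values])     # basis of p(f)

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(5, 7))
    ax1.plot([0, 1], [0, 1], "k--", label="perfect reliability")
    ax1.plot(values, obs_freq, "o-", label="forecasts")
    ax1.set_xlabel("forecast probability f")
    ax1.set_ylabel("observed relative frequency p(x=1|f)")
    ax1.legend()
    ax2.bar(values, counts / counts.sum(), width=0.05)         # sharpness: p(f)
    ax2.set_xlabel("forecast probability f")
    ax2.set_ylabel("relative frequency of use p(f)")
    plt.show()
```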

25
Reliability and Sharpness (from Wilks 1995)
[Figure: example reliability diagrams labeled "Climatology", "Minimal RES",
"Underforecasting", "Good RES, at expense of REL", "Reliable forecasts of
rare event", and "Small sample size"]
26
Reliability and Sharpness (from Murphy and Winkler 1992)
[Figure: reliability diagrams for St. Louis 12-24 h PoP, cool season,
comparing subjective ("Sub") and model ("Model") forecasts, with
"No skill" and "No RES" reference lines]
27
Attributes diagram (from Wilks 1995)
28
Icing forecast examples
29
Use of statistical models to describe
verification features
  • Exploratory study by Murphy and Wilks (1998)
  • Case study
  • Use regression model to model reliability
  • Use Beta distribution to model p(f) as measure of
    sharpness
  • Use multivariate diagram to display combinations
    of characteristics
  • Promising approach that is worthy of more
    investigation

30
Fit Beta distribution to p(f): 2 parameters, p and q
Ideal: p < 1, q < 1
[Figure: example Beta densities for p(f) on the interval 0 to 1]
A fitting sketch follows.
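A minimal sketch (assumed approach) of fitting a Beta(p, q) distribution to the forecast probabilities as a two-parameter summary of sharpness; p < 1 and q < 1 give a U-shaped density, i.e. sharp forecasts. The sample below is synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic forecast probabilities, kept strictly inside (0, 1) for the fit.
f = np.clip(np.random.beta(0.7, 0.8, size=500), 1e-6, 1 - 1e-6)

# Fix the support to [0, 1] and estimate the two shape parameters.
p_hat, q_hat, _, _ = stats.beta.fit(f, floc=0, fscale=1)
print(p_hat, q_hat)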
31
Fit regression to reliability diagram: p(x|f)
vs. f; 2 parameters, b0 and b1
Murphy and Wilks (1997)
32
Summary Plot
Murphy and Wilks 1997
33
Signal Detection Theory (SDT)
  • Approach that has commonly been applied in
    medicine and other fields
  • Brought to meteorology by Ian Mason (1982)
  • Evaluates the ability of forecasts to
    discriminate between occurrence and
    non-occurrence of an event
  • Summarizes characteristics of the Likelihood-Base
    Rate decomposition of the framework
  • Tests model performance relative to specific
    threshold
  • Ignores calibration
  • Allows comparison of categorical and
    probabilistic forecasts

34
Mechanics of SDT
  • Based on the Likelihood-Base Rate decomposition
  • p(f,x) = p(f|x) p(x)
  • Basic elements
  • Hit rate (HR)
  • HR = POD = YY / (YY + NY)
  • Estimate of p(f=1|x=1)
  • False alarm rate (FA)
  • FA = POFD = YN / (YN + NN)
  • Estimate of p(f=1|x=0)
  • Relative Operating Characteristic (ROC) curve
  • Plot HR vs. FA (sketch below)
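A minimal sketch (assumed approach) of building a ROC from probability forecasts: convert to Yes/No decisions at a series of thresholds, compute HR and FA at each, and estimate the area with the trapezoidal rule. As a later slide notes, the straight-line (trapezoidal) area can underestimate the area obtained by fitting a normal-distribution model to HR and FA.

```python
import numpy as np

def roc_points(f, x, thresholds=np.linspace(0.05, 0.95, 19)):
    f, x = np.asarray(f, float), np.asarray(x, int)
    hr, fa = [], []
    for t in thresholds:
        yes = f >= t                          # "Yes" decision at this threshold
        hr.append(np.mean(yes[x == 1]))       # HR = estimate of p(f=1|x=1)
        fa.append(np.mean(yes[x == 0]))       # FA = estimate of p(f=1|x=0)
    return np.array(fa), np.array(hr)

def roc_area(fa, hr):
    # Add the (0,0) and (1,1) endpoints, sort by FA, integrate (trapezoids).
    fa = np.concatenate(([0.0], fa, [1.0]))
    hr = np.concatenate(([0.0], hr, [1.0]))
    order = np.argsort(fa)
    return np.trapz(hr[order], fa[order])
```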

35
ROC Examples: Mason (1982)
36
ROC Examples: Icing forecasts
37
ROC
  • Area under the ROC is a measure of forecast skill
  • Values less than 0.5 indicate negative skill
  • Measurement of the ROC area often is better if a
    normal distribution model is used to model HR and
    FA
  • The area can be underestimated if the curve is
    approximated by straight line segments
  • Harvey et al. (1992), Mason (1982), Wilson (2000)

38
Idealized ROC (Mason 1982)
[Figure: idealized ROC curves constructed from the conditional
distributions f(x=1) and f(x=0), for S = 2, S = 1, and S = 0.5,
where S = s0 / s1]
39
Comparison of Approaches
  • Brier score
  • Based on squared error
  • Strictly proper scoring rule
  • Calibration is an important factor; lack of
    calibration impacts scores
  • Decompositions provide insight into several
    performance attributes
  • Dependent on frequency of occurrence of the event
  • ROC
  • Considers forecasts' ability to discriminate
    between Yes and No events
  • Calibration is not a factor
  • Less dependent on frequency of occurrence of
    event
  • Provides verification information for individual
    decision thresholds

40
Relative operating levels
  • Analogous to the ROC, but from the
    Calibration-Refinement perspective (i.e., given
    the forecast)
  • Curves based on
  • Correct Alarm Ratio
  • Miss Ratio
  • These statistics are estimates of two conditional
    probabilities
  • Correct Alarm Ratio = p(x=1|f=1)
  • Miss Ratio = p(x=1|f=0)
  • For a system with no skill, p(x=1|f=1) =
    p(x=1|f=0) = p(x) (sketch below)
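A minimal sketch (assumed approach, mirroring the ROC sketch above but conditioning on the forecast rather than the observation): at each probability threshold, compute the Correct Alarm Ratio p(x=1|f=1) and the Miss Ratio p(x=1|f=0).

```python
import numpy as np

def rol_points(f, x, thresholds=np.linspace(0.05, 0.95, 19)):
    f, x = np.asarray(f, float), np.asarray(x, int)
    car, mr = [], []
    for t in thresholds:
        yes = f >= t                                            # forecast "Yes"
        car.append(x[yes].mean() if yes.any() else np.nan)      # p(x=1|f=1)
        mr.append(x[~yes].mean() if (~yes).any() else np.nan)   # p(x=1|f=0)
    return np.array(mr), np.array(car)
```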

41
ROC Diagram (Mason and Graham 1999)
42
ROL Diagram (Mason and Graham 1999)
43
Verification of ensemble forecasts
  • Output of ensemble forecasting systems can be
    treated as
  • A probability distribution
  • A probability
  • A categorical forecast
  • Probabilistic forecasts from ensemble systems can
    be verified using standard approaches for
    probabilistic forecasts
  • Common methods
  • Brier score
  • ROC

44
Example: Palmer et al. (2000), Reliability
[Figure: reliability diagrams for the ECMWF ensemble and a multi-model
ensemble, panels labeled "< 0" and "< 1"]
45
Example: Palmer et al. (2000), ROC
[Figure: ROC curves for the ECMWF ensemble and a multi-model ensemble]
46
Verification of ensemble forecasts (cont.)
  • A number of methods have been developed
    specifically for use with ensemble forecasts. For
    example
  • Rank histograms (sketch below)
  • Rank position of observations relative to
    ensemble members
  • Ideal: Uniform distribution
  • Non-ideal histograms can occur for many reasons (Hamill
    2001)
  • Ensemble distribution approach
  • (Wilson et al. 1999)
  • Fit distribution to ensemble
  • Determine probability associated with that
    observation
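A minimal sketch of a rank histogram: for each case, count how many ensemble members fall below the observation; a reliable ensemble yields a roughly uniform histogram over the M+1 possible ranks. The data below are synthetic and purely illustrative.

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """ensemble: (n_cases, n_members) array; obs: (n_cases,) array."""
    ensemble, obs = np.asarray(ensemble, float), np.asarray(obs, float)
    # Rank = number of ensemble members below the observation (0 .. M).
    ranks = np.sum(ensemble < obs[:, None], axis=1)
    n_members = ensemble.shape[1]
    counts = np.bincount(ranks, minlength=n_members + 1)
    return counts / counts.sum()

# Hypothetical example: 10-member ensemble and observations drawn from the
# same distribution, so the histogram should be approximately flat.
ens = np.random.randn(500, 10)
obs = np.random.randn(500)
print(rank_histogram(ens, obs))
```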

47
Rank histograms
48
Distribution approach (Wilson et al. 1999)
49
Extensions to multiple categories
  • Examples
  • QPF with several thresholds/categories
  • Approach 1: Evaluate each category on its own
  • Compute Brier score, reliability, ROC, etc. for
    each category separately
  • Problems
  • Some categories will be very rare and have few Yes
    observations
  • Throws away important information related to the
    ordering of predictands and the magnitude of errors

50
Example: Brier skill score for several categories
From http://www.nws.noaa.gov/tdl/synop/mrfpop/mainframes.htm
51
Extensions to multiple categories (cont.)
  • Approach 2: Evaluate all categories
    simultaneously
  • Ranked Probability Score (RPS) (sketch below)
  • Analogous to the Brier Score for multiple categories
  • Skill score
  • Decompositions analogous to BS, BSS
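A minimal sketch of the RPS for a single forecast over J ordered categories (standard definition based on squared differences of cumulative probabilities; some authors divide by J-1):

```python
import numpy as np

def rps(forecast_probs, obs_category):
    """forecast_probs: length-J probabilities; obs_category: index 0..J-1."""
    forecast_probs = np.asarray(forecast_probs, float)
    obs = np.zeros_like(forecast_probs)
    obs[obs_category] = 1.0
    cum_f, cum_o = np.cumsum(forecast_probs), np.cumsum(obs)
    return np.sum((cum_f - cum_o) ** 2)

# Example: 4 precipitation categories, observation falls in category 1.
print(rps([0.5, 0.3, 0.15, 0.05], obs_category=1))
```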

52
Multiple categories: Examples of alternative approaches
  • Continuous ranked probability score
    (Bouttier 1994; Brown 1974; Matheson and Winkler
    1976; Unger 1985) and decompositions (Hersbach 2000)
  • Analogous to RPS with an infinite number of classes
  • Decompose into Reliability and Resolution/Uncertainty
    components
  • Multi-category reliability diagrams (Hamill 1997)
  • Measures calibration in a cumulative sense
  • Reduces impact of categories with few forecasts
  • Other references
  • Bouttier 1994
  • Brown 1974
  • Matheson and Winkler 1976
  • Unger 1985

53
Continuous RPS example (Hersbach 2000)
54
MCRD example (Hamill 1997)
55
Connections to value
  • Cost-Loss ratio model
  • Optimal to protect whenever C < pL, i.e., p > C/L,
  • where p is the probability of adverse weather
    (worked sketch below)
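A minimal sketch of the cost-loss decision rule above, with illustrative numbers (not from the presentation): protection cost C, potential loss L if the adverse event occurs unprotected.

```python
C, L = 10.0, 50.0                  # protect whenever p > C/L = 0.2

def expected_expense(p, protect):
    """Expected expense for one decision given event probability p."""
    return C if protect else p * L

for p in (0.1, 0.2, 0.3, 0.5):
    protect = p > C / L
    print(p, protect, expected_expense(p, protect))
```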

56
Wilks Value Score (Wilks 2001)
  • VS is the percent improvement in value between
    climatological and perfect information as a
    function of C/L
  • VS is impacted by (lack of) calibration
  • VS can be generalized for particular/idealized
    distributions of C/L

57
VS example: Wilks (2001)
Las Vegas PoP, April 1980 - March 1987
58
VS example: Icing forecasts
59
VS Beta model example (Wilks 2001)
60
Richardson approach
  • ROC context
  • Calibration errors don't impact the score

61
Miscellaneous issues
  • Quantifying the uncertainty in verification
    measures
  • Issue: Spatial and temporal correlation
  • A few approaches
  • Parametric methods
  • Ex: Seaman et al. (1996)
  • Robust methods (confidence intervals for medians)
  • Ex: Brown et al. (1997),
  • Velleman and Hoaglin (1981)
  • Bootstrap methods
  • Ex: Hamill (1999),
  • Kane and Brown (2001)
  • Treatment of observations as probabilistic?

62
Conclusions
  • Basis for evaluating probability forecasts was
    established many years ago (Brier, Murphy,
    Epstein)
  • Recent renewal in interest has led to new ideas
  • Still more to do
  • Develop and implement a cohesive set of
    meaningful and useful methods
  • Develop greater understanding of methods we have
    and how they inter-relate

63
Verification of Probabilistic QPFs: Selected References
  • Brown, B.G., G. Thompson, R.T. Bruintjes, R. Bullock, and T. Kane,
    1997: Intercomparison of in-flight icing algorithms. Part II:
    Statistical verification results. Weather and Forecasting, 12, 890-914.
  • Davis, C., and F. Carr, 2000: Summary of the 1998 workshop on mesoscale
    model verification. Bulletin of the American Meteorological Society,
    81, 809-819.
  • Hamill, T.M., 1997: Reliability diagrams for multicategory
    probabilistic forecasts. Weather and Forecasting, 12, 736-741.
  • Hamill, T.M., 1999: Hypothesis tests for evaluating numerical
    precipitation forecasts. Weather and Forecasting, 14, 155-167.
  • Hamill, T.M., 2001: Interpretation of rank histograms for verifying
    ensemble forecasts. Monthly Weather Review, 129, 550-560.

64
References (cont.)
  • Harvey, L.O., Jr., K.R. Hammond, C.M. Lusk, and E.F. Mross, 1992: The
    application of signal detection theory to weather forecasting behavior.
    Monthly Weather Review, 120, 863-883.
  • Hersbach, H., 2000: Decomposition of the continuous ranked probability
    score for ensemble prediction systems. Weather and Forecasting, 15,
    559-570.
  • Hsu, W.-R., and A.H. Murphy, 1986: The attributes diagram: A
    geometrical framework for assessing the quality of probability
    forecasts. International Journal of Forecasting, 2, 285-293.
  • Kane, T.L., and B.G. Brown, 2000: Confidence intervals for some
    verification measures: a survey of several methods. Preprints, 15th
    Conference on Probability and Statistics in the Atmospheric Sciences,
    8-11 May, Asheville, NC, U.S.A., American Meteorological Society
    (Boston), 46-49.

65
References (cont.)
  • Mason, I., 1982: A model for assessment of weather forecasts.
    Australian Meteorological Magazine, 30, 291-303.
  • Mason, I., 1989: Dependence of the critical success index on sample
    climate and threshold probability. Australian Meteorological Magazine,
    37, 75-81.
  • Mason, S., and N.E. Graham, 1999: Conditional probabilities, relative
    operating characteristics, and relative operating levels. Weather and
    Forecasting, 14, 713-725.
  • Murphy, A.H., 1993: What is a good forecast? An essay on the nature of
    goodness in weather forecasting. Weather and Forecasting, 8, 281-293.
  • Murphy, A.H., and D.S. Wilks, 1998: A case study of the use of
    statistical models in forecast verification: Precipitation probability
    forecasts. Weather and Forecasting, 13, 795-810.

66
References (cont.)
  • Murphy, A.H., and R.L. Winkler, 1992: Diagnostic verification of
    probability forecasts. International Journal of Forecasting, 7,
    435-455.
  • Richardson, D.S., 2000: Skill and relative economic value of the ECMWF
    ensemble prediction system. Quarterly Journal of the Royal
    Meteorological Society, 126, 649-667.
  • Seaman, R., I. Mason, and F. Woodcock, 1996: Confidence intervals for
    some performance measures of Yes-No forecasts. Australian
    Meteorological Magazine, 45, 49-53.
  • Stanski, H., L.J. Wilson, and W.R. Burrows, 1989: Survey of common
    verification methods in meteorology. WMO World Weather Watch Tech.
    Rep. 8, 114 pp.
  • Velleman, P.F., and D.C. Hoaglin, 1981: Applications, Basics, and
    Computing of Exploratory Data Analysis. Duxbury Press, 354 pp.

67
References (cont.)
  • Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences.
    Academic Press, San Diego, CA, 467 pp.
  • Wilks, D.S., 2001: A skill score based on economic value for
    probability forecasts. Meteorological Applications, in press.
  • Wilson, L.J., W.R. Burrows, and A. Lanzinger, 1999: A strategy for
    verification of weather element forecasts from an ensemble prediction
    system. Monthly Weather Review, 127, 956-970.