Title: An Overview of the Benefits of Calibration Using Reforecasts
1. An Overview of the Benefits of Calibration Using Reforecasts
prepared for 2006 NCEP Ensemble Users Workshop,
Oct-Nov 2006
NOAA Earth System Research Laboratory
- Tom Hamill and Jeff Whitaker
- NOAA / ESRL, Physical Sciences Div.
- tom.hamill_at_noaa.gov
- Inspiration: Paul Dallavalle presentation, Norfolk WAF/NWP meeting, summer 1996
2. NOAA's reforecast data set
- Reforecast definition: a data set of retrospective numerical forecasts using the same model as is used to generate real-time forecasts.
- Model: T62L28 NCEP GFS, circa 1998
- Initial states: NCEP-NCAR Reanalysis II plus 7 +/- bred modes.
- Duration: 15-day runs every day at 00Z from 19781101 to now (http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/refcst/week2).
- Data: selected fields (winds, hgt, temp on 5 press levels, precip, t2m, u10m, v10m, pwat, prmsl, rh700, heating). NCEP/NCAR reanalysis verifying fields included (Web form to download at http://www.cdc.noaa.gov/reforecast).
- Real-time probabilistic precipitation forecasts: http://www.cdc.noaa.gov/reforecast/narr
3. Main Points
- Large improvement in probabilistic forecast skill and reliability by calibrating using a large, stable data set of NWP forecasts / obs.
- Generally:
  - smaller training sample size --> small benefit.
  - large training sample size --> large benefit.
- Improvements are larger for surface variables (surface temperature, precipitation) than for upper-air variables (Z500).
- Use for bias correction, of course. But also useful for calibration of spread deficiencies and for statistical downscaling.
4. More background in January 2006 BAMS and other articles. Reference list provided after conclusions.
5. Wouldn't it be nice if we could calibrate with only the past few forecasts?
But consider training with a short sample in a climatologically dry region. How could you calibrate this latest forecast?
You'd like enough training data to have some similar events at a similar time of year to this one.
6. Calibration principles
- Would like f(O|F), that is, the pdf of the expected observed state given the forecast.
- Calibration should implicitly:
  - adjust for model bias
  - adjust for any spread deficiency
  - downscale (coarse prediction grid --> predictable local detail in observations).
7. Analog high-resolution precipitation forecast calibration technique
(actually run with 10 to 75 analogs)
8. Analog high-resolution precipitation forecast calibration technique
Approximate O|F
(actually run with 10 to 75 analogs)
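The analog idea on these slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `analog_probability`, the RMS-difference similarity measure, and the synthetic arrays are all hypothetical choices made for the example. Given today's forecast pattern, it finds the closest past forecasts and treats the observations on those dates as an ensemble approximating the distribution of observations given the forecast.

```python
import numpy as np

def analog_probability(today_fcst, past_fcsts, past_obs, n_analogs, threshold):
    """Estimate P(obs > threshold | forecast) from the closest past forecasts.

    today_fcst : (npts,) today's forecast pattern in a local region
    past_fcsts : (ndates, npts) archived reforecast patterns
    past_obs   : (ndates,) observed value at the target point on each date
    """
    # Similarity measure (an assumption here): RMS difference over the pattern.
    rms = np.sqrt(np.mean((past_fcsts - today_fcst) ** 2, axis=1))
    # Indices of the n_analogs dates whose forecasts most resemble today's.
    closest = np.argsort(rms)[:n_analogs]
    # Observations on those dates form an ensemble approximating f(O|F).
    analog_obs = past_obs[closest]
    return float(np.mean(analog_obs > threshold)), analog_obs

# Synthetic demonstration data (not real reforecasts).
rng = np.random.default_rng(0)
past_fcsts = rng.gamma(shape=2.0, scale=3.0, size=(2000, 16))
past_obs = past_fcsts.mean(axis=1) * rng.uniform(0.5, 1.5, size=2000)
today_fcst = np.full(16, 12.0)  # a relatively wet forecast

prob, analog_obs = analog_probability(today_fcst, past_fcsts, past_obs, 50, 6.0)
```

Because the analog members are observations rather than model output, bias, spread, and local detail are all taken from what actually occurred on similar forecast days. The number of analogs (50 here) is a tunable choice.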
9. Reforecasts and statistical downscaling
Downscaling using PRISM / Mountain Mapper technology (C. Daly, Oregon St.; NOAA RFCs, OHD)
10. Recent OR-WA floods, 3-6 day forecast
11. Verified over 25 years of forecasts; skill scores use the conventional method of calculation, which may overestimate skill (Hamill and Juras 2006).
12. Effect of training sample size
Colors of dots indicate which size of analog ensemble provided the largest amount of skill.
13. Calibration of Z500, T850, T2m
- Errors are generally better behaved than for precipitation: more normally distributed.
- However, spread deficiency is worse for T2m.
14. Calibration techniques
- Uncalibrated: PDF from raw ensemble
- Gross Bias Correction
  - (1) Calculate mean bias B = mean(F - O)
  - (2) Corrected ens = raw ens - B
- Analog method
  - Similar to method for precipitation, but now find forecast analogs using only the current grid point's data. 50 members.
- Wilks and Hamill (2006, MWR, to appear) found that many other calibration methods (e.g. logistic regression, non-homogeneous Gaussian regression) were similar in performance.
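The two gross bias correction steps can be written out directly. The sketch below is illustrative only: the function name `gross_bias_correction` and the synthetic training sample are assumptions, and the bias is pooled over the whole training sample rather than stratified by date or lead time.

```python
import numpy as np

def gross_bias_correction(raw_ens, train_fcst, train_obs):
    """Subtract the mean training error B = mean(F - O) from every member."""
    bias = np.mean(train_fcst - train_obs)  # step (1): mean of F - O
    return raw_ens - bias, bias             # step (2): shift the raw ensemble

# Synthetic training sample (not real data): forecasts run about 1.5 K too warm.
rng = np.random.default_rng(1)
train_obs = rng.normal(280.0, 5.0, size=500)
train_fcst = train_obs + 1.5 + rng.normal(0.0, 1.0, size=500)

raw_ens = np.array([284.0, 285.0, 286.5, 283.2, 285.8])
corrected, bias = gross_bias_correction(raw_ens, train_fcst, train_obs)
```

Note that the shift moves every member by the same amount, so the ensemble spread is unchanged. A spread deficiency in the raw ensemble therefore survives this correction.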
15. Verification of Z500, T850, T2m
- Northern Hemisphere (Z500, T850); 00Z North American surface obs with > 97% complete record from 1979 - 2004 (T2m)
- Use continuous ranked probability skill score (CRPSS; 0 = no skill, 1 = perfect); use method of calculation in Hamill and Juras (2006, Oct. QJRMS) to avoid overestimating skill when climatology varies.
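For readers unfamiliar with the score, here is a minimal sketch of an ensemble CRPS and the resulting skill score against a climatological reference. It uses the conventional pooled calculation, not the Hamill and Juras (2006) method referred to above; the function names and synthetic data are assumptions for illustration.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against a scalar observation."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the observation.
    term1 = np.mean(np.abs(members - obs))
    # Half the mean absolute difference between all pairs of members.
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

def crpss(fcst_ens, clim_ens, obs):
    """Skill score: 0 = no better than climatology, 1 = perfect."""
    crps_f = np.mean([crps_ensemble(f, o) for f, o in zip(fcst_ens, obs)])
    crps_c = np.mean([crps_ensemble(clim_ens, o) for o in obs])
    return 1.0 - crps_f / crps_c

# Synthetic cases (not real data).
rng = np.random.default_rng(2)
truth = rng.normal(0.0, 5.0, size=300)
clim_ens = rng.normal(0.0, 5.0, size=200)  # sample standing in for climatology
fcst_ens = truth[:, None] + rng.normal(0.0, 1.0, size=(300, 20))
score = crpss(fcst_ens, clim_ens, truth)
```

Pooling all cases against a single climatological sample is the simplification that can overestimate skill when climatology differs among locations or seasons, which is the issue the Hamill and Juras method is meant to address.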
16. Z500 CRPSS
17. T850 CRPSS
18. T2m CRPSS
19. Issues (1): should reanalyses be part of the reforecast process?
- Want homogeneous characteristics of forecasts: skill the same for 1980s forecasts as 2006 forecasts.
- Part of the better skill of current forecasts is the better initial condition.
- Reanalysis would improve skill of old forecasts.
- Reanalyses should use same or similar model as used in reforecasts.
20. Issues (2): are reforecasts still necessary with improved models?
ECMWF produced a short reforecast data set. Calibration using their week-2 reforecasts produced a skill increase of 11; for our reforecast, skill improvement was 16 (Whitaker and Vitart 2006).
21. Issues (3): NCEP proposes a single-member T126 reforecast. Is that enough?
Analog reforecast process repeated, as in prior cartoon. But now, rather than matching the ensemble-mean pattern, match today's control forecast to past control forecasts. Grey area measures degradation relative to the baseline using the ensemble mean. Not much degradation in skill, esp. at short leads! (And you don't even have to run an ensemble to get a probabilistic forecast.)
22. Conclusions
- Large improvement in probabilistic forecast skill and reliability by calibrating using a large, stable data set of NWP forecasts / obs.
- The benefit you'll get from a much smaller training sample size is correspondingly much smaller.
- Improvements are larger for surface variables (surface temperature, precipitation) than for upper-air variables (Z500).
- Calibration achieves more if you do more than a bias correction for the mean error.
23. References
Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble re-forecasting: improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434-1447. http://www.cdc.noaa.gov/people/tom.hamill/reforecast_mwr.pdf
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts, an important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33-46. http://www.cdc.noaa.gov/people/tom.hamill/refcst_bams.pdf
Whitaker, J. S., F. Vitart, and X. Wei, 2006: Improving week two forecasts with multi-model re-forecast ensembles. Mon. Wea. Rev., 134, 2279-2284. http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/Manuscripts/multimodel.pdf
Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: theory and application. Mon. Wea. Rev., in press. http://www.cdc.noaa.gov/people/tom.hamill/reforecast_analog_v2.pdf
Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: is it real skill or is it the varying climatology? Quart. J. Royal Meteor. Soc., in press. http://www.cdc.noaa.gov/people/tom.hamill/skill_overforecast_QJ_v2.pdf
Wilks, D. S., and T. M. Hamill, 2006: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., in press. http://www.cdc.noaa.gov/people/tom.hamill/WilksHamill_emos.pdf
Hamill, T. M., and J. S. Whitaker, 2006: White Paper. Producing high-skill probabilistic forecasts using reforecasts: implementing the National Research Council vision. Available at http://www.cdc.noaa.gov/people/tom.hamill/whitepaper_reforecast.pdf
24. Daily Max Temp CRPSS
- Notes:
  - (1) Skill much lower than T850: station data, Tmax trained on 00Z temp, worse model biases?
  - (2) Consistent 1-day impact of large sample size; Wilks 45-d?
26. Prior 45 days?
28. Bias correction using forecast and observed CDFs?
29. T2m CRPSS, low and high climatological spread
30. 850 hPa temperature bias for a grid point in the central U.S.
Spread of yearly bias estimates from 31-day running mean F-O. Note the spread is often larger than the bias, especially for long leads.
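The quantity described on this slide, a 31-day running mean of F-O computed separately for each year, can be sketched as below. The function name `running_mean_bias`, the array layout, and the synthetic data (with day-to-day independent errors, which real forecast errors are not) are assumptions for illustration only.

```python
import numpy as np

def running_mean_bias(fcst, obs, window=31):
    """Running mean of F - O along the day axis, one row per year.

    fcst, obs : (nyears, ndays) arrays for one grid point and lead time.
    Returns an (nyears, ndays - window + 1) array of yearly bias estimates.
    """
    err = np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float)
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the full window fits inside the year.
    return np.array([np.convolve(row, kernel, mode="valid") for row in err])

# Synthetic example (not real data): true bias 0.5 K, random error sd 3 K.
rng = np.random.default_rng(3)
nyears, ndays = 25, 120
obs = rng.normal(0.0, 4.0, size=(nyears, ndays))
fcst = obs + 0.5 + rng.normal(0.0, 3.0, size=(nyears, ndays))

yearly_bias = running_mean_bias(fcst, obs)
mean_bias = yearly_bias.mean(axis=0)      # multi-year bias estimate per day
spread = yearly_bias.std(axis=0, ddof=1)  # year-to-year spread of the estimates
```

Comparing `spread` with `mean_bias` shows how noisy a bias estimate from a single year's 31-day window can be, which is the motivation for training on many years of reforecasts.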
31. Comparison against NCEP medium-range T126 ensemble, ca. 2002
The improvement is a little bit of increased reliability, a lot of increased resolution.