Transcript and Presenter's Notes

Title: ENSEMBLE PRODUCTS: DATABASE, PRODUCT GENERATION TOOLS, DELIVERY, VERIFICATION


1
ENSEMBLE PRODUCTS: DATABASE, PRODUCT GENERATION
TOOLS, DELIVERY, VERIFICATION
  • Zoltan Toth
  • Environmental Modeling Center
  • NOAA/NWS/NCEP
  • Acknowledgements: Steve Lord, Ensemble Team
  • http://wwwt.emc.ncep.noaa.gov/gmb/ens/index.html

2
OUTLINE / SUMMARY
  • REQUIREMENTS
  • Critical in driving changes in operational
    organization like NWS
  • Must be reformulated and formalized for ensemble
    era
  • BASIC ENSEMBLE DATABASE
  • Raw operational and hind-cast ensemble member
    data
  • Bias corrected / downscaled member data
  • PRODUCT GENERATION
  • Internal
  • Distribute derived products
  • External: Decision Support Systems
  • Distribute basic dataset
  • DISTRIBUTION
  • Passive
  • All basic data must be available via ftp
  • On demand
  • Subsets of basic data
  • Derived products
  • VERIFICATION
  • User relevant - Bias corrected, downscaled

3
REQUIREMENTS
  • How to effect change in operational environment?
  • NWS is operational organization
  • Operations driven by requirements
  • How to change requirements without new supporting
    infrastructure?
  • Chicken and egg dilemma
  • New requirements are needed to support change
    from
  • Old traditional paradigm
  • Single value forecast to
  • New paradigm
  • To include forecast uncertainty
  • Areas to be considered
  • Observing system: Observational uncertainty
  • Numerical guidance products: Ensembles
  • Centralized (NCEP) and distributed (WFO) forecast
    products: Design new suite
  • Adaptive procedures throughout forecast process,
    with 2-way feedback loop
  • Resource implications
  • Computational needs
  • Telecommunication needs
  • Training needs

4
ENSEMBLE DATABASE
  • Types of ensemble data
  • Raw ensemble output (operational)
  • Hind-cast ensemble for use in bias correction
  • Bias corrected ensemble data (centrally
    corrected)
  • Member by member
  • Distributional characteristics for selected
    variables
  • Downscaled ensemble data
  • Members
  • Distributions only
  • Requirements (ideally)
  • Ensemble member information needed for
  • Temporal/spatial covariance and cross-variable
    forecast information
  • Assumption: ensemble has valuable information
    (needs to be verified)
  • General user
  • Bias corrected / downscaled data for all members,
    all variables of interest
  • Specialized user who wants to do special bias
    correction
  • Raw operational & hind-cast data for all
    members/variables

5
INTERROGATION / PRODUCT GENERATION TOOLS
  • User interface
  • Users can ask all meaningful questions based on
    ensemble
  • Answers delivered in specified format
  • Data access
  • Utility that returns requested ensemble forecast
    data
  • Output to be used by
  • Product generator (internal)
  • Decision Support System (external)
  • Product generation
  • Prepares derived products based on ensemble data
  • Probabilities, etc
  • 10 functionalities within NAWIPS
  • Applications within NOMADS
  • Decision Support Systems
  • User specific application / product generator /
    decision support information
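
As a rough illustration of such a data access utility, the sketch below subsets one variable of an ensemble dataset served over OPeNDAP or stored locally. The URL, variable name, and dimension names ("ens", "time", "lat", "lon") are placeholders, not the actual NCEP/NOMADS conventions; this is a minimal sketch only.

    import xarray as xr

    def get_ensemble_subset(url, variable, lead_index, region):
        """Return one variable, all members, one lead time, over a lat/lon box.
        url may point to a local file or an OPeNDAP endpoint (placeholder)."""
        ds = xr.open_dataset(url)
        lat_min, lat_max, lon_min, lon_max = region
        field = ds[variable].isel(time=lead_index)        # one lead time
        return field.sel(lat=slice(lat_max, lat_min),     # assumes latitude stored descending
                         lon=slice(lon_min, lon_max))

    # hypothetical use by a product generator or a Decision Support System:
    # subset = get_ensemble_subset("http://example/ens.nc", "t2m", 8, (25, 50, 230, 300))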

6
WHAT PRODUCTS?
  • MUST SUPPORT HIERARCHY OF PROGRESSIVELY MORE
    COMPLEX PRODUCTS
  • First moment only (traditional approach)
  • Best estimate of first moment (univariate mean or
    median)
  • Most Likely Forecast (multivariate)
  • Close to mean at short lead time
  • Display value only in highly non-linear phase (no
    unique solution)
  • Second moment related information added
  • Two bounds of distribution (10 and 90 percentiles
    of pdf)
  • Likely scenarios (with likelihood), if they exist
    (based on statistical tests)
  • Multiple modes (univariate)
  • Clusters (multivariate)
  • Display value
  • Probability of events that are defined by
  • Single variable (univariate)
  • Multiple variables (multivariate)
  • Quantitative use
  • Full information (preferred approach)
  • Pdf (univariate)
  • Ensemble member trajectories (multivariate)
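
To make the hierarchy concrete, here is a minimal sketch of how the simpler products could be derived from an array of ensemble members (member dimension first); the threshold and array shape are illustrative assumptions.

    import numpy as np

    def basic_products(members, threshold):
        """members: (n_members, ...) array for one variable and lead time."""
        mean = members.mean(axis=0)                    # first moment
        p10  = np.percentile(members, 10, axis=0)      # lower bound of pdf
        p90  = np.percentile(members, 90, axis=0)      # upper bound of pdf
        prob = (members > threshold).mean(axis=0)      # univariate event probability
        return mean, p10, p90, prob

    # e.g., 20 members on a 181 x 360 grid
    fcst = np.random.randn(20, 181, 360)
    mean, p10, p90, prob_exceed = basic_products(fcst, threshold=1.0)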

7
HOW TO DISSEMINATE PRODUCTS?
  • MULTIPLE CHANNELS
  • Routinely prepare and actively distribute
    often-used basic products
  • Uni-variate ensemble mean, spread, 10-90
    percentiles, PQPF, etc
  • AWIPS, NAWIPS, ftp, web
  • Stage all raw and bias-corrected ensemble
    forecasts on ftp sites
  • Internally, for product generation engine
  • On ftp sites, for professional users
  • Include hind-casts to permit user specific bias
    correction
  • Downscaled information will require orders of
    magnitude more storage
  • On-demand access to basic and derived information:
    web/ftp access
  • Partner with public & private sector service
    providers to serve their data needs
  • Use data access software to serve up required
    ensemble data for
  • Ftp data access (e.g., NOMADS & NCDC plans)
  • User processes information (Decision Support
    Systems, etc)
  • Product generator
  • Computation of derived products (on schedule or
    on demand)

8
VERIFICATION
  • Purposes
  • Assess value added at each step of forecast
    process
  • Assess value of newly developed components or
    configuration
  • Evaluation criteria
  • User relevant verification
  • What matters to different user groups
  • Based on best performance and other constraints
  • System that produces best downscaled information
    at given computational cost
  • Infrastructure requirement
  • Unified verification software for ensemble /
    probabilistic forecasts
  • Part of unified general verification framework
  • Shared across EMC, NCEP, NWS, NOAA, broader
    community?
  • THORPEX connection

9
OUTLINE / SUMMARY
  • REQUIREMENTS
  • Critical in driving changes in operational
    organization like NWS
  • Must be reformulated and formalized for ensemble
    era
  • BASIC ENSEMBLE DATABASE
  • Raw operational and hind-cast ensemble member
    data
  • Bias corrected / downscaled member data
  • PRODUCT GENERATION
  • Internal
  • Distribute derived products
  • External: Decision Support Systems
  • Distribute basic dataset
  • DISTRIBUTION
  • Passive
  • All basic data must be available via ftp
  • On demand
  • Subsets of basic data
  • Derived products
  • VERIFICATION
  • User relevant - Bias corrected, downscaled

10
BACKGROUND
11
OUTLINE / SUMMARY
  • WHY DO WE NEED ENSEMBLES?
  • Scientifically: Capture case dependent variations
    in uncertainty
  • Users: Downstream applications forced by nonlinear
    trajectories
  • WHEN ARE THEY CRITICAL?
  • Case-dependent variations in chaotic and model
    error growth
  • HOW CAN THEY BE GENERATED?
  • Multiple integrations
  • Initial perturbations
  • Model perturbations
  • Bias correction
  • Product generation
  • ISSUES IDENTIFIED
  • Often no consensus solution, further research
    needed to refine issues
  • Apparently most promising paths recommended

12
WHY ENSEMBLES?
  • TRADITIONAL PARADIGM
  • Single value forecast incomplete from viewpoints
    of
  • Science: Inherently statistically inconsistent
    with observations
  • Applications: Significantly fewer users, with
    less value
  • Probabilistic forecasts needed; generate them
    through
  • Single forecast integration
  • Accumulate error statistics over many cases
    (bias correction, eg, MOS)
  • Pro: Maximum possible fidelity in forecast - all
    comp. resources go into one solution
  • Improved statistical reliability; slight increase
    in statistical resolution
  • Cons: Aggregate statistics - no case dependent
    variations in uncertainty captured
  • As errors become nonlinear, single solution
    becomes unrepresentative
  • Loss of statistical resolution
  • Liouville equations
  • Theoretically proper solution in perfect model
    framework
  • Pdf of initial state integrated in time
  • Impractical, enormous computational costs
  • Ensemble forecasts
  • Multiple integrations started with sample from
    estimated initial pdf
  • Provides multiple trajectories for critical
    downstream applications
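
For reference, the Liouville equation mentioned above is the continuity equation for the forecast pdf \rho(\mathbf{x}, t) in the model's phase space, with model dynamics d\mathbf{x}/dt = F(\mathbf{x}) (standard form):

    \frac{\partial \rho}{\partial t} + \sum_i \frac{\partial}{\partial x_i}\bigl(\rho\, F_i(\mathbf{x})\bigr) = 0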

13
ENSEMBLES WHEN?
  • Single forecast approach favored when
  • Case-dependent variations are weak in
  • Level of linear error growth at short lead times
  • Pdf evolution at short lead times (ie,
    quasi-linear behaviour)
  • Model-related error behaviour (at any lead time)
  • Aggregate bias-correction algorithms adequate
  • Use ensembles otherwise
  • review criteria above for each application
  • Bias-correct both single value & ensemble
    forecasts (ie, pdf)
  • Decide on forecast configuration based on results
  • Generic configuration
  • Higher resolution control for short lead time if
    beneficial
  • Lower resolution ensemble out to longer lead
    times
  • Benefits from combining hi-res control & lo-res
    ensemble at shorter leads?
  • Considerations
  • Integrations must resolve phenomena of interest
  • Unless sophisticated statistical down-scaling
    techniques can be developed

14
ENSEMBLES HOW?
  • How to represent initial value related
    uncertainty?
  • Perturb initial conditions
  • How to represent model related uncertainty?
  • Perturb model integration
  • How many sample trajectories needed?
  • Ensemble size
  • How to convey forecasts?
  • Trajectories and derived products
  • Unified approach across all applications when
    practical
  • Based on general scientific principles
  • Choices based on / supported by experimental
    results when possible
  • Computational feasibility considered
  • Facilitated by ESMF framework: common interface
    for
  • Initial & model perturbations, bias correction,
    product generation
    product generation
  • Applicable in most cases
  • Adjusted if/when necessary
  • Maintenance economic

15
HOW TO REPRESENT INITIAL VALUE RELATED
UNCERTAINTY?
  • Two distinct problems
  • Estimate analysis uncertainty
  • All estimates ultimately sample based =>
    difficult to disentangle from sampling
  • Implicit solutions (ie, not explicit pdf)
  • Choice among sampling strategies, given an
    estimate
  • Brute force (Monte Carlo) sampling: Perturbed
    Observations method
  • Run multiple analysis cycles with perturbed
    observations (Canadian approach)
  • Both growing and non-growing error space sampled
    with realistic amplitude
  • Very poor sampling of myriad non-growing
    directions
  • Noise hurts analysis performance
  • Directed sampling
  • Singular vectors: fastest growth for
    pre-selected time period
  • Transient growth emphasized
  • Computationally expensive
  • No general solution. If transient growth is
  • Important - Need different perturbations for
    various lead times
  • Not important - No need for SVs
  • Most often norm used is uncoupled from analysis
    error estimates
  • Relevant dynamics identified via growth
    optimization calculations

16
HOW TO REPRESENT INITIAL VALUE RELATED
UNCERTAINTY?
  • Proposed solution: Random sampling in growing
    sub-space (ET / ETKF)
  • Link with DA
  • GSI + ET
  • Take error variance from GSI to specify ensemble
    perturbation level
  • Feed back information from ensemble into
    background error covariance
  • Ensemble-based DA: ETKF
  • Same ensemble principles, except 2-way
    interactions tuned simultaneously
  • Ensemble Filter or
  • Variational solutions
  • General applicability
  • As long as transient behaviour is not dominating
  • Ideal for downstream applications
  • ET provides series of perturbed analyses
    consistent in time
  • Important for wave, land surface, etc ensembles
    where
  • Instantaneous error/perturbation dependent on
    forcing history
  • Need for collaboration where DA & ensemble
    overlap
  • Analysis is not complete until uncertainty
    estimate provided and assessed
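
A much-simplified stand-in for the ET idea (not the operational algorithm): rescale centered ensemble perturbations so their pointwise variance matches an analysis-error variance supplied by the DA system, then add them to the analysis to obtain perturbed initial conditions. The arrays below are synthetic placeholders for GSI output.

    import numpy as np

    def rescale_perturbations(ens_fcst, analysis, sigma_a2, eps=1e-12):
        """ens_fcst: (n_members, n_grid) short forecasts; analysis: (n_grid,);
        sigma_a2: analysis-error variance from the DA system, (n_grid,)."""
        pert = ens_fcst - ens_fcst.mean(axis=0)     # centered perturbations
        ens_var = pert.var(axis=0) + eps            # current ensemble variance
        scale = np.sqrt(sigma_a2 / ens_var)         # pointwise rescaling mask
        return analysis + pert * scale              # perturbed analyses

    n_mem, n_grid = 20, 1000
    perturbed = rescale_perturbations(np.random.randn(n_mem, n_grid),
                                      np.zeros(n_grid),
                                      np.full(n_grid, 0.5))   # stand-in for GSI variance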

17
HOW TO REPRESENT MODEL RELATED UNCERTAINTY?
  • Theoretically not well understood problem
  • Numerical model different from reality
  • Truncated model representation in resolved
  • Space and time scales (dynamics, physics)
  • Processes (physics)
  • Other approximations (numerics)
  • Other approximations/errors in representing
    nature due to
  • Lack of full understanding of nature
  • Mistakes (science and coding bugs)
  • Lack of accounting for model related
    uncertainties (for ensemble applications)
  • Deficiency in perturbation growth
  • Performance metrics for modelling guidance differ
    depending on use
  • Traditional, single forecasts
  • Minimize single forecast error
  • Ensemble requirement
  • Account for case dependent model related
    uncertainty
  • Systematic effort needed
  • Incorporate capability of simulating model
    related uncertainty
  • Strong collaboration between modelling and
    ensemble communities

18
HOW TO REPRESENT MODEL RELATED UNCERTAINTY?
  • Current approaches
  • Stochastic perturbations
  • Catch-all efforts to represent effect of
    unresolved scales of motion on resolved scales
  • Increase growth of spread (ie, properly simulate
    real level of predictability)
  • Multi-model (version) method
  • Pragmatic effort
  • Works to minimize effect of unidentified
    modelling errors
  • Possibly reduces case dependent biases (that
    cannot be removed statistically)
  • High development/maintenance costs
  • Effort given up by pioneering center (MSC)
  • Scientifically not appealing
  • Admits fractured nature of our knowledge
  • Must transcend
  • Proposed solution
  • Continue development of stochastic perturbation
    method
  • Perturb resolved scales within ensemble subspace
  • Continue use of multi-model approach
  • Share development/maintenance costs with other
    centers
  • NAEFS: MSC, FNMOC, others
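
One generic way to "perturb resolved scales within the ensemble subspace" is to add, at intervals during the integration, random zero-mean linear combinations of the member deviations from the ensemble mean. The sketch below is illustrative, not the NCEP/NAEFS scheme; the amplitude is an assumed tuning parameter.

    import numpy as np

    def stochastic_perturbation(ensemble, amplitude=0.1, rng=None):
        """ensemble: (n_members, n_grid) model states at some integration step."""
        rng = np.random.default_rng() if rng is None else rng
        dev = ensemble - ensemble.mean(axis=0)         # spans the ensemble subspace
        n = ensemble.shape[0]
        w = rng.standard_normal((n, n)) / np.sqrt(n)   # random combination weights
        w -= w.mean(axis=0, keepdims=True)             # leave the ensemble mean unchanged
        return ensemble + amplitude * (w @ dev)        # perturbed states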

19
MEMBERSHIP VS. MODEL RESOLUTION?
  • TRADE-OFF BETWEEN MODEL VS. PDF RESOLUTION
    (LIMITED RESOURCES)
  • Step-wise changes from single forecast to
    ensembles of increasing size (1 => N)
  • Decrease in model resolution
  • Degrading fidelity / statistical consistency
  • Bias correction / downscaling becomes more
    demanding
  • Increase in membership
  • Improving statistical resolution (case-dependent
    variations in pdf captured)
  • Potentially better forecasts
  • Membership questions
  • Fewer members needed in phase of
  • Linear error growth (short lead time)
  • Bias correction to generate pdf
  • More members needed in phase of
  • Nonlinear error growth (longer lead times)
  • Highly nonlinear phenomena, eg, hurricane genesis
  • Bias correction / downscaling to improve fidelity
  • To resolve higher moments of pdf

20
MEMBERSHIP VS. MODEL RESOLUTION?
  • PROPOSED SOLUTION
  • Considerations
  • Cannot increase membership with lead time
  • Must compromise, considering entire time range of
    ensemble forecast
  • Integrations must resolve phenomena of interest
  • Unless sophisticated statistical down-scaling
    techniques can be developed
  • Potential gain from more members capped by level
    of refinement in other parts of forecast process
  • No point refining one aspect of forecast process;
    skill limited by weakest link
  • No use of very large ensemble with poor model
  • Bias correction / downscaling can
    interpolate/extrapolate pdf based on smaller
    ensemble
  • Generic configuration guidelines for maximum
    overall benefits
  • Higher resolution control for short lead time if
    beneficial
  • Lower resolution ensemble out to longer lead
    times
  • Benefits from combining hi-res control & lo-res
    ensemble at shorter leads?
  • Ratio of 1:2 horizontal resolution for ensemble
    vs. hi-res control
  • O(10) membership

21
STATISTICAL POST-PROCESSING
  • Distinguish between
  • Bias correction on model grid
  • Eliminate lead time dependency
  • Coarse to coarse resolution mapping - Cheap
  • Downscaling to (much) higher resolution grid
    (NDFD)
  • Needed if effective model resolution is below
    desired output resolution
  • Coarse to fine resolution mapping - Expensive
  • Sub-grid variability must be added for ensembles
  • Best done by dynamical methods
  • LAM or variable resolution global model - Very
    expensive
  • Background
  • Based on sample of forecast-truth pairs
  • Model, nature must be stationary
  • Quality depends on signal (systematic error) to
    noise (random error) ratio
  • Improved when
  • Random error smaller (short range)
  • Sample size larger (hind-casts important for
    longer leads with larger random errors)
  • Degraded when more details sought
  • Need for larger sample

22
STATISTICAL POST-PROCESSING
  • Approaches
  • 1-step downscaling
  • Potential advantage in reduced noise
  • 2 steps
  • Lead-time dependent bias correction on model grid
    (cheap)
  • Diagnostic evaluation of model forecasts possible
  • Lead-time independent downscaling to finer grid
    (more expensive)
  • Applied on bias corrected forecasts - Perfect
    prog approach, not dependent on lead time
  • More flexibility
  • Applications differ in
  • Statistical method for extracting info from
    forecast-truth pairs
  • Linear regression, Bayesian, analog method
  • What they bias correct statistically (info that is
    skilful but not statistically reliable)
  • Only 1st, or also additional moments?
  • How rest of information from forecast treated
  • Retain from raw forecasts (if info statistically
    reliable)
  • Replace stochastically (if info not skilful, not
    reliable)
  • Analog approach - difficult to control retained
    vs. lost information
  • Stochastic generator: preferred solution
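
A toy version of the 2-step idea above, assuming the simplest possible estimators: a per-gridpoint mean bias for the lead-time dependent step, and interpolation plus a climatological "downscaling increment" for the lead-time independent step. The operational Bayesian / analog / stochastic-generator methods are considerably more elaborate.

    import numpy as np

    def step1_bias_correct(ens_fcst, hind_fcst, hind_obs):
        """Lead-time dependent correction on the coarse model grid.
        hind_fcst, hind_obs: (n_cases, n_grid) forecast-truth pairs for this lead."""
        bias = (hind_fcst - hind_obs).mean(axis=0)   # systematic error estimate
        return ens_fcst - bias                       # applied member by member

    def step2_downscale(coarse_field, coarse_lons, fine_lons, downscaling_increment):
        """Lead-time independent mapping to a finer 1-D grid (placeholder logic)."""
        fine = np.interp(fine_lons, coarse_lons, coarse_field)
        return fine + downscaling_increment          # add climatological fine-scale detail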

23
STATISTICAL POST-PROCESSING: PROPOSED SOLUTION
  • PROPOSED SOLUTION: Follow 2-step approach,
    develop centrally applied
  • Bias correction on model grid
  • Bayesian approach: can handle all non-Gaussian,
    non-linear situations
  • Can optimally merge hi-res control, lo-res
    ensemble, and climate information
  • Bias-corrected ensemble trajectories
  • Downscaling to NDFD grid
  • Current methods: no sub-grid processes
    considered/added
  • Linear function of grid-scale info: limited
    utility
  • Climate anomaly
  • Downscaling vector
  • Alternative methods: sub-grid processes
    considered/added
  • More information / larger sample needed: MDL,
    other collaborators
  • Local analogs
  • Must mosaic together independent patches
  • Cannot well control what information is retained
    vs. stochastically replaced in ensemble
  • Stochastic generator
  • More general solution
  • Difficult to construct?
  • Forecast configuration evaluation

24
HIND-CASTING
  • What
  • With operational process, generate forecasts for
    past cases
  • Must use operational procedures, otherwise
    purpose is lost
  • Resource intensive
  • Purpose
  • Increase sample size for bias correction /
    downscaling
  • Required for
  • Longer lead bias correction
  • Shorter lead bias correction with more details
    (regime dependent)
  • Options
  • Freeze operational system
  • Generate hind-cast data set prior to use in
    operations
  • Labor intensive
  • Any improvements must wait until next hind-cast
    dataset can be prepared
  • Generate hind-casts in real time, on continuous
    basis
  • Can upgrade forecast system any time following a
    2-month parallel experiment
  • Computationally more expensive: re-computes
    hind-casts with new system every year
  • Logistically simpler, institutionalized process
  • Cheaper in terms of human resources

25
HIND-CASTING PROPOSED SOLUTION
  • Consider real-time generation arrangement as part
    of operations
  • Use forecast process for hind-casts identical to
    operations
  • Cannot share hind-casts across applications if
    operational forecasts are not shared
  • Assumption: bias at longer lead does not depend
    on analysis technique
  • Critical for longer range applications
  • Global ensemble
  • Highly nonlinear applications such as river flow
    forecasting
  • Assume bias at long lead independent of analysis
    technique (use reanalysis)
  • Coupled ensemble
  • Needed for refined, regime dependent bias
    correction / downscaling for
  • Regional ensemble, etc
  • Short-range bias depends on analysis technique
  • Must regenerate reanalysis with current DA system
  • Can reanalysis also be done in real time on next
    machine (2010)?

26
REAL-TIME GENERATION OF HIND-CAST DATASET?
(Schematic: for today's Julian date (TJD), hind-casts initialized at TJD+30 are generated today for each past year back to 1979; hind-casts, or their statistics, for TJD +/- 30 are saved on disc, alongside the actual ensemble generated today. Timeline spans 1979 - 2006.)
27
WHAT PRODUCTS?
  • HIERARCHY OF PROGRESSIVELY MORE COMPLEX PRODUCTS
  • First moment only (traditional approach)
  • Best estimate of first moment (univariate mean or
    median)
  • Most Likely Forecast (multivariate)
  • Close to mean at short lead time
  • Display value only in highly non-linear phase (no
    unique solution)
  • Second moment related information added
  • Two bounds of distribution (10 and 90 percentiles
    of pdf)
  • Likely scenarios (with likelihood), if they exist
    (based on statistical tests)
  • Multiple modes (univariate)
  • Clusters (multivariate)
  • Display value
  • Probability of events that are defined by
  • Single variable (univariate)
  • Multiple variables (multivariate)
  • Quantitative use
  • Full information (preferred approach)
  • Pdf (univariate)
  • Ensemble member trajectories (multivariate)

28
HOW TO DISSEMINATE PRODUCTS?
  • MULTIPLE CHANNELS
  • Routinely prepare and actively distribute
    often-used basic products
  • Univariate ensemble mean, spread, 10-90
    percentiles, PQPF, etc
  • AWIPS, NAWIPS, ftp, web
  • Stage all raw and bias-corrected ensemble
    forecasts on ftp sites
  • Internally, for product generation engine
  • On ftp sites, for professional users
  • Include hind-casts to permit user specific bias
    correction
  • Downscaled information will require orders of
    magnitude more storage
  • On-demand access to derived information: web/ftp
    access
  • Use product generation engine
  • Accesses bias-corrected ensemble database
  • Derives any desired product
  • NAWIPS software developed in collaboration with
    NCO
  • NOMADS functionalities serve User Support Systems
  • Strongly encouraged by NRC panel report
  • Conflict with some private sector partners?

29
INTERFACE OF UNIFIED ENSEMBLE APPROACH WITH
DIFFERENT FORECAST SYSTEMS
  • Experts working on different aspects of unified
    ensemble approach (columns)
  • Others responsible for pulling together all
    pieces for specific systems (rows)
  • Critical area High-Impact ensemble systems
  • Unified framework for very high resolution
    ensemble, embedded into regional ensemble
  • Tropical storm, severe weather, storm surge, fire
    weather, air quality, etc. applications
  • Must share basic infrastructure to prevent
    proliferation of systems
  • Can adapt basic structure as needed for special
    applications (eg, different model versions used)
  • Suggested long term goal: Variable resolution
    modeling
  • Single framework to address multiscale processes,
    replaces current global and multitude of LAM
    integrations
  • Simplified modeling, DA, ensemble infrastructure
  • Scientifically challenging - 5 years (global
    resolution higher by then)
  • Adaptively configure model to serve all high
    impact cases with pre-defined priorities
  • THORPEX research/development resources may be
    available

30
BACKGROUND
31
USER REQUIREMENTS: PROBABILISTIC FORECAST
INFORMATION IS CRITICAL

32
(No Transcript)
33
BACKGROUND-2
34
EVALUATION OF FORECAST SYSTEMS
  • Some statistics based on forecast system only
  • Other statistics based on comparison of forecast
    and observed systems =>
  • FORECAST SYSTEM ATTRIBUTES
  • Abstract concepts (like length)
  • Reliability and Resolution
  • Both can be measured through different statistics
  • Statistical properties
  • Interpreted for large set of forecasts (ie,
    describe behavior of forecast system),
  • not for a single forecast
  • For their definition
  • Assume that forecasts
  • Can be of any format
  • Take a finite number of different classes
  • Consider empirical frequency distribution of
  • Verifying observations corresponding to large
    number of forecasts of same class =>
  • Observed Frequency Distribution (ofd)

35
STATISTICAL RELIABILITY (STAT. AGGREGATE):
STATISTICAL CONSISTENCY OF FORECASTS WITH
OBSERVATIONS
  • BACKGROUND
  • Consider particular forecast class Fa
  • Consider distribution of observations Oa that
    follow forecasts Fa
  • DEFINITION
  • If forecast Fa has the exact same form as Oa, for
    all forecast classes,
  • the forecast system is statistically consistent
    with observations =>
  • The forecast system is perfectly reliable
  • MEASURES OF RELIABILITY
  • Based on different ways of comparing Fa and Oa

36
STATISTICAL RESOLUTION (TEMPORAL EVOLUTION):
ABILITY TO DISTINGUISH, AHEAD OF TIME, AMONG
DIFFERENT OUTCOMES
  • BACKGROUND
  • Assume observed events are classified into finite
    number of classes
  • DEFINITION
  • If all observed classes are preceded by
    distinctly different forecasts, the forecasts
    resolve the problem =>
  • The forecast system has perfect resolution
  • MEASURES OF RESOLUTION
  • Based on degree of separation of distributions of
    observations that follow various forecast classes
  • Measured by difference between ofds & climate
    distribution
  • Measures differ by how differences between
    distributions are quantified
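
The two attributes can be quantified, for binary-event probability forecasts, with the standard Murphy decomposition of the Brier score: forecasts are grouped into classes Fa and compared with the observed frequencies Oa that follow them. A minimal sketch (bin count and inputs are illustrative):

    import numpy as np

    def brier_decomposition(prob_fcst, obs, n_bins=10):
        """prob_fcst: forecast probabilities in [0, 1]; obs: 0/1 outcomes."""
        clim = obs.mean()                                 # climatological frequency
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        which = np.clip(np.digitize(prob_fcst, edges) - 1, 0, n_bins - 1)
        rel = res = 0.0
        for k in range(n_bins):                           # loop over forecast classes Fa
            idx = which == k
            if idx.any():
                nk, fk, ok = idx.sum(), prob_fcst[idx].mean(), obs[idx].mean()
                rel += nk * (fk - ok) ** 2                # reliability: Fa vs. following Oa
                res += nk * (ok - clim) ** 2              # resolution: separation from climate
        n = len(obs)
        return rel / n, res / n, clim * (1 - clim)        # Brier = rel - res + uncertainty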

(Schematic examples: distributions of forecasts and corresponding observations.)
37
QUALITY OF BIAS CORRECTION
  • REAL WORLD APPLICATIONS
  • More complex problem, processes not stationary
    but influenced by
  • Seasonal cycle
  • Bias estimate as function of seasonal cycle
  • Can be done but increases sample size requirement
  • Regime changes
  • Bias estimate as function of regime
  • Can be done
  • Use of recent data works well for short lead
  • Increases sample size requirement for long lead
  • Changes in forecast process (periodic upgrades of
    NWP DA & modeling)
  • Estimation can be done but may require
    regeneration of large hind-cast dataset
  • How to balance influence of different factors?
  • Update model continuously
  • Best statistical resolution
  • Lack of large hind-cast dataset corrupts ability
    to bias correct
  • Generate large hind-cast dataset
  • Best reliability
  • Corrupts statistical resolution by not using
    best available NWP system

38
ISSUES
  • How much skill do we lose by using an 8-10 year
    old forecast system?
  • 2 days of skill at D3, 3 days at D7.5, 4 days at D9
  • Estimate: ~2% rms error reduction per year in
    recent yrs
  • How much skill do we gain due to use of larger
    sample?
  • Reduction depends on level of bias in forecast
  • CDC hind-casts have much larger bias
  • Large bias reduction observed
  • Still does not compensate for loss of skill until
    10 day lead time
  • Operational forecast system
  • Not known
  • Estimate via synthetic data
  • Use synthetic data to estimate bias error
    reduction with more data & current fcst
  • What is current level of NH extratropical time
    mean error (estimated bias)?
  • 6-14% of climate standard deviation (2m temp)
  • Gain from expanding sample
  • From 100 days (Kalman Filter method) to 5,000
    days (50 yrs hind-cast)
  • 5-10% (depending on method) of random rms error
  • This is equivalent to 2.5-5 yrs of model
    development
  • If DA/model has to be frozen for 3-5 yrs

39
Raw, Optimal & Actual Bias Corrected Ensembles
Annual Mean RPSS (20040301 - 20050228), 500 mb
Height over Northern Hemisphere
  • Decaying average
  • bias correction improves RPSS for all lead
    time vs. raw oper. ens.
  • Climate error removed
  • bias corrected reforecast gains significant
    improvement for all lead time vs. raw reforecast

3 OPR ENS.
  • Operational vs. reforecast ens.
  • oper. fcst is better than the bias-corrected
    reforecast out to 9-10 days. Beyond 10 days,
    bias-corrected reforecast becomes competitive to
    or better than oper. fcst

3 RFC ENS.
  • Sign of improvement: larger for CDC reforecast
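
For reference, a minimal Ranked Probability Skill Score computation over discrete categories; the climatological reference probabilities and category definitions are illustrative assumptions, not the exact setup behind the figure.

    import numpy as np

    def rps(prob, obs_cat):
        """prob: (n_cases, n_cat) category probabilities; obs_cat: observed category index."""
        n_cases, n_cat = prob.shape
        obs = np.zeros_like(prob)
        obs[np.arange(n_cases), obs_cat] = 1.0
        cum_f, cum_o = prob.cumsum(axis=1), obs.cumsum(axis=1)
        return ((cum_f - cum_o) ** 2).sum(axis=1).mean()   # ranked probability score

    def rpss(prob_fcst, prob_clim, obs_cat):
        """RPSS = 1 - RPS(forecast) / RPS(climatology); 1 is perfect, 0 is no skill."""
        return 1.0 - rps(prob_fcst, obs_cat) / rps(prob_clim, obs_cat)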

40
Gain in 5 yrs: 15 hrs or 6 m (10%)
Gain per year: ~2% rms error reduction
Gain in 5 yrs: 0.06 or 15 hrs
41
(No Transcript)
42
ISSUES - 2
  • Regime dependent vs. climate mean bias correction
  • Regime dependent (with small sample)
  • Better at short lead
  • Climate mean (with much larger sample)
  • Better at long lead
  • Could try regime dependent correction using
    hind-cast sample
  • Combined method may be best?
  • Is it worth the effort to generate hind-cast
    dataset?
  • Questionable
  • Are there additional factors/gains to consider?
  • Bias correction for highly non-linear downstream
    applications and for rare events
  • River flow forecasting?
  • Can we have the good aspects of both worlds?
  • Latest model AND large hind-cast dataset?
  • Real time generation of hind-cast data
  • Assume biggest problem is bias in first moment
  • Can be estimated using large sample of single
    forecasts

43
Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 500 mb Height, Northern Hemisphere, 2004
Summer
  • Decaying average applied to
  • CDC reforecast
  • decaying method gives better results than
    climate mean bias estimation for short range
    (< day 5), showing value of regime dependent correction
  • some gain from climate mean bias correction
    after ~5 days

44
Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 500 mb Height, Northern Hemisphere, 2004
Summer
  1. Raw operational forecast much better than
    optimally bias-corrected hindcast with 8-yr old
    DA/model up to D8
  2. Regime-dependent bias correction superior to
    climate mean correction (with hindcast data) out
    to D3
  3. Climate mean correction superior to
    regime-dependent method beyond D4

45
(No Transcript)
46
REAL-TIME GENERATION OF HIND-CAST DATASET?
  • When computing facilities are upgraded (2007 at
    NCEP)
  • Forgo increasing resolution/membership
  • Instead generate hind-cast ensemble
  • Arrangement
  • Generate control forecast from re-analysis
    initial conditions
  • Initialized from 30 days ahead of current date
  • For each year in re-analysis dataset (last 27
    yrs)
  • Additional cost is 2-3 times current ensemble
    configuration - DOABLE
  • Bias correction methodology
  • Same as currently, except
  • Estimate climate mean bias as mean of difference
    between
  • Re-analysis & hind-cast forecast (+/- 30 days x 30
    yrs = ~1,800 samples)
  • Use centrally weighted recursive filter to save
    on disc storage requirements
  • Combine regime dependent Kalman filter & climate
    mean bias estimates
  • More weight on Kalman filter at short leads
  • More weight on climate mean estimate at long
    leads
  • Explore possible regime dependent corrections at
    long lead
  • Save & use entire sample of 1,800 hind-casts in
    research
  • Discard 30 day old hind-cast data
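
A small sketch of the bookkeeping implied by the schematic on the next slide: each day, launch one hind-cast per past re-analysis year initialized at calendar day TJD+30, and keep on disc only hind-casts (or their statistics) whose dates fall within TJD +/- 30. The 30-day window and the starting year are taken from the slides; the rest is illustrative.

    import datetime as dt

    def hindcast_starts_today(today, first_year=1979, window=30):
        """Initialization dates of hind-casts to run today: calendar day
        (today + window) in every past re-analysis year."""
        target = today + dt.timedelta(days=window)
        starts = []
        for year in range(first_year, today.year):
            try:
                starts.append(target.replace(year=year))
            except ValueError:                  # target is Feb 29 in a non-leap year
                starts.append(target.replace(year=year, day=28))
        return starts

    def keep_on_disc(hindcast_date, today, window=30):
        """True if a hind-cast date (any year) is within TJD +/- window of today."""
        d = abs(hindcast_date.timetuple().tm_yday - today.timetuple().tm_yday)
        return min(d, 365 - d) <= window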

47
REAL-TIME GENERATION OF HIND-CAST DATASET?
(Schematic: for today's Julian date (TJD), hind-casts initialized at TJD+30 are generated today for each past year back to 1967; hind-casts, or their statistics, for TJD +/- 30 are saved on disc, alongside the actual ensemble generated today. Timeline spans 1967 - 2006.)
48
LIMITATIONS
  • Only first moment can be corrected
  • Single hind-cast dataset (hindcast ensemble
    prohibitively expensive)
  • Second & higher moments will be corrected using
    only recent forecast data
  • Only longer range forecasts can be corrected
  • Hindcasts initialized with old reanalysis
  • Short-range hindcast stat. characteristics
    different from operational fcsts
  • Short-range forecasts will be corrected using
    only recent forecast data
  • TESTS
  • Two alternatives with comparable cpu requirements
    to be tested
  • Hindcast generation (T126L64)
  • Increased horizontal resolution, without
    hindcasts (T170L64)
  • Decision based on verification results
  • Each alternative evaluated after bias correction
  • Expected result
  • Hindcast better for longer ranges

49
OTHER POTENTIAL APPLICATIONS OF REAL-TIME
HINDCASTING
  • Real-time generation of reanalysis?
  • Consistency between reanalysis and operational
    analysis
  • Allows for testing applicability of hindcasts for
    short-range applications
  • Allows for testing utility of hind-cast ensembles
  • Test if high cost justified by improved utility
    of forecasts
  • Real-time generation of coupled ocean-atmosphere
    forecasts?
  • Allows periodic upgrades to coupled DA/model

50
BACKGROUND
51
QUALITY & UTILITY OF FORECASTS
  • Quality of forecast process depends on its
  • Statistical resolution
  • Ability to distinguish (provide unique signals
    before) future events
  • Temporal sequence foreseen - Inherent value of
    forecast process
  • NWP methods used in 6 hr - 15 days range, where
    statistical methods are not viable
  • Can be improved by NWP DA & model development
  • Statistical reliability
  • Ability to simulate (not predict) nature
    faithfully
  • Realism, fidelity - But no info on temporal
    sequence
  • Can be improved by
  • NWP model development
  • Use of statistical bias correction methods
  • Better statistical methods
  • Larger data sample - Can be perfectly corrected
    with large enough sample
  • Utility of forecasts depends on both resolution
    and reliability
  • Dual requirement of
  • Continuously improved DA & model
  • Routinely done a couple of times per year

52
STATISTICAL POST-PROCESSING ISSUES
  • GOAL
  • Improve reliability while maintaining resolution
    in NWP forecasts
  • Reduce systematic errors (improve reliability)
    while
  • Not increasing random errors (maintaining
    resolution)
  • Retain all useful information in NWP forecast
  • METHODOLOGY - Requirements
  • Use bias-free estimators of systematic error
  • Need methods with fast convergence
  • APPROACH: Computational efficiency
  • Remove lead-time dependent bias on model grid
  • Working on coarser model grid allows use of more
    complex methods
  • Feedback on systematic errors to model
    development
  • Downscale bias-corrected forecast to finer grid
  • Further refinement/complexity added
  • No dependence on lead time
  • Mapping from coarse to fine grid

53
STATISTICAL POST-PROCESSING ISSUES - 2
  • APPLICATION: User requirements
  • Statistically correct all forecast information
    before transmitted to users
  • Modest reduction in bias can have large impact on
    users (prob. fcst)
  • Make raw forecast data also available for
    specialized users
  • OUTPUT FORMAT
  • Adjusted ensemble trajectories
  • Corrected pdf
  • BIAS CORRECTION TECHNIQUES: Array of methods
  • Frequentist's approach
  • Estimate/correct bias moment by moment (e.g., D.
    Unger et al.)
  • Simple approach, implemented partially
  • May be less applicable for extreme cases
  • Bayesian approach (e.g., Roman Krzysztofowicz)
  • Allows simultaneous adjustment of all modes
    considered
  • More complex, currently under development
  • BMA, Ridging are variants of approach
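
A minimal example of the frequentist, moment-by-moment idea: remove a trained first-moment (mean) bias and rescale the spread so the second moment matches the error variance seen in training. This is only a sketch of the concept, not the actual method of D. Unger et al.

    import numpy as np

    def correct_two_moments(ens, hind_mean, hind_var, hind_obs):
        """ens: (n_members, n_grid) raw ensemble for one lead time.
        hind_mean, hind_var: (n_cases, n_grid) hind-cast ensemble mean and variance;
        hind_obs: (n_cases, n_grid) verifying analyses."""
        mean_bias = (hind_mean - hind_obs).mean(axis=0)            # 1st moment
        err_var = ((hind_mean - mean_bias - hind_obs) ** 2).mean(axis=0)
        inflate = np.sqrt(err_var / np.maximum(hind_var.mean(axis=0), 1e-12))
        m = ens.mean(axis=0)
        return (m - mean_bias) + (ens - m) * inflate               # 2nd moment rescaled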

54
STATISTICAL POST-PROCESSING ISSUES - 3
  • DOWNSCALING METHODS
  • Bias correction onto finer grid
  • Cumulative distribution matching
  • Downscaling vector
  • Statistically generated spatio-temporal variance
    on fine grid
  • DATA NEEDS: Hierarchy of possible adjustments
  • More data allows better quality / more detailed
    bias estimation
  • Hierarchy of adjustments based on amount of data
  • Seasonally changing climate mean
  • Regime dependent
  • Requires much more data except
  • When most recent data used
  • Works for short lead time only since regimes
    change
  • Situation dependent
  • Use LAM integrations
  • Dual-resolution ensemble approach of Jun Du
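
The "cumulative distribution matching" entry can be sketched as quantile mapping: a bias-corrected coarse-grid value is assigned the fine-grid (analysis) value that has the same climatological non-exceedance probability. The climatological samples below are placeholders.

    import numpy as np

    def cdf_match(fcst_value, coarse_climo, fine_climo):
        """Map a forecast value through the coarse-grid climatological CDF onto
        the fine-grid climatological distribution (quantile mapping)."""
        coarse_sorted = np.sort(coarse_climo)
        q = np.searchsorted(coarse_sorted, fcst_value) / coarse_sorted.size
        return np.quantile(fine_climo, np.clip(q, 0.0, 1.0))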

55
(Figure panels: estimated vs. actual bias before and after correction, for the current method with real and with simulated data, for current vs. new methods with simulated data, and for the new method with simulated data.)
56
Statistical Post-processing
  • Bias correction methods
  • Decaying averaging bias correction (~46 day
    independent training data)
  • Operational NCEP ensemble
  • CDC reforecast data
  • 25-yr climatological mean forecast error (Hamill &
    Whitaker)
  • CDC reforecast
  • 31-day centered running mean forecast error
    (dependent data), used as optimal benchmark
  • Operational NCEP ensemble
  • CDC reforecast data
  • Hybrid system, 2 step adjustment

FCST_calibrated = FCST_OPR + BIAS_Decaying(OPR - RFC) + BIAS_(RFC - REA) (25-YRS)
  • Data sets
  • NCEP operational global ens. Data (2004/05
    analysis/modeling system)
  • CDC reforecast data set (1998 modeling system,
    1978 - 2005)
  • 500 mb height, 850 mb temp, 2m temp, 10m U and V,
    Mar. 2004-Feb. 2005
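
The "decaying averaging" bias correction above keeps a running bias estimate that is updated every cycle with a small weight, so that roughly the most recent ~46 days of forecast-analysis differences dominate; the 2% weight below is illustrative, not the operational setting.

    def update_decaying_bias(prev_bias, fcst, analysis, weight=0.02):
        """One cycle of decaying-average bias estimation on the model grid
        (works elementwise on scalars or NumPy arrays):
        new_bias = (1 - w) * old_bias + w * (forecast - verifying analysis)."""
        return (1.0 - weight) * prev_bias + weight * (fcst - analysis)

    # applied member by member at each lead time:  fcst_calibrated = fcst_raw - bias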

57
Raw, Optimal & Actual Bias Corrected Ensembles
Annual Mean RPSS (20040301 - 20050228), 500 mb
Height over Northern Hemisphere
  • Decaying average
  • bias correction improves RPSS for all lead
    time vs. raw oper. ens.
  • Climate error removed
  • bias corrected reforecast gains significant
    improvement for all lead time vs. raw reforecast

3 OPR ENS.
  • Operational vs. reforecast ens.
  • oper. fcst is better than the bias-corrected
    reforecast out to 9-10 days. Beyond 10 days,
    bias-corrected reforecast becomes competitive to
    or better than oper. fcst

3 RFC ENS.
  • Sign of improvement: larger for CDC reforecast

58
Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 850 mb Temperature, Northern Hemisphere,
2004 Summer
  • Decaying average
  • oper. ens. with bias correction has better
    performance than the raw fcst.

3 OPR ENS.
  • Climate error removed
  • bias corrected reforecast gains significant
    improvement for most lead time vs. raw reforecast

3 RFC ENS.
  • Operational vs. reforecast ens.
  • both the raw and post-processed oper. fcst.
    are better than the bias-corrected reforecast

59
Raw, Optimal & Actual Bias Corrected Ensembles
  • Decaying average
  • RMS error of OPR_DAV2 reduced for first week
  • Climate error removed
  • RMS error of RFC_COR improvement for all lead
    times vs. RFC_RAW


60
Bias-correction algorithms
  • Current method (method 1): equal-weight bias
    estimation
  • New method (method 2): bias estimated as a
    weighted average, using a Kalman Filter weight
(Bias estimation and bias correction formulas shown as images on the original slide.)
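
A guess at the flavor of the Kalman Filter weighting: a scalar Kalman filter tracks the slowly varying bias, and its gain plays the role of the adaptive weight that replaces the equal weights of method 1. The noise variances q and r are illustrative assumptions, not the operational values.

    def kalman_bias_update(bias, var, innovation, q=0.01, r=1.0):
        """One update of a scalar Kalman filter for the forecast bias.
        innovation: latest forecast-minus-analysis difference at a grid point."""
        var = var + q                          # predict: bias assumed to drift slowly
        gain = var / (var + r)                 # Kalman gain = adaptive weight
        bias = bias + gain * (innovation - bias)
        var = (1.0 - gain) * var
        return bias, var, gain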
61
(No Transcript)
62
(No Transcript)
63
(No Transcript)