1
PM2.5 Model Performance: Lessons Learned and Recommendations
  • Naresh Kumar
  • Eladio Knipping
  • EPRI
  • February 11, 2004

2
Acknowledgements
  • Atmospheric and Environmental Research, Inc. (AER)
  • Betty Pun, Krish Vijayaraghavan and Christian
    Seigneur
  • Tennessee Valley Authority (TVA)
  • Elizabeth Bailey, Larry Gautney, Qi Mao and
    others
  • University of California, Riverside
  • Zion Wang, Chao-Jung Chien and Gail Tonnesen

3
Overview
  • Model Performance Issues
  • Need for Performance Guidelines/Benchmarking
  • Review of Statistics
  • Summary

4
Model Performance Issues
  • Evaluation of Modeling Systems
  • Local vs. Regional Evaluation
  • Daily/Episodic/Seasonal/Annual Averaging
  • Threshold and Outliers
  • What Species to Evaluate?
  • Sampling/Network Issues

5
Examples from Model Applications
  • Two applications of CMAQ-MADRID
  • Southeastern U.S. (SOS 1999 Episode)
  • Big Bend National Park, Texas (BRAVO) Four-Month
    Study
  • Statistical performance for SO4²⁻, EC, OM, and PM2.5

6
Application in Southeastern U.S.
  • Southern Oxidant Study (SOS 1999)
  • June 29 to July 10, 1999
  • Meteorology processed from MM5 simulations using
    MCIP2.2
  • Emissions files courtesy of TVA
  • Simulation
  • Continental U.S. Domain
  • 32-km horizontal resolution without nesting

7
Application to Big Bend National Park
REMSAD
CMAQ-MADRID
  • The Georgia Tech/Goddard Global Ozone Chemistry
    Aerosol Radiation Transport (GOCART) model
    prescribed boundary conditions for SO2 and SO4²⁻
    to the REMSAD domain.
  • Preliminary Base Case simulation used boundary
    conditions as prescribed from a simulation of the
    larger outer domain by REMSAD.
  • SO2 and SO4²⁻ concentrations were scaled at the
    CMAQ-MADRID boundary according to CASTNet and
    IMPROVE network observations.

8
BRAVO Monitoring Network
9
Local vs. Regional (SOS 1999)
10
Local vs. Regional (BRAVO)
11
Daily SO4²⁻ Prediction/Observation (P/O) Pairs with Different Averaging
12
Daily SO4²⁻ P/O Pairs for Each Month
13
Effect of Threshold
14
Mean-Normalized/Fractional Statistics
15
Need for Model Performance Guidelines
  • If no guidelines exist
  • Conduct model simulation with best estimate of
    emissions and meteorology
  • Perform model evaluation using favorite
    statistics
  • Difficult to compare across models
  • State that model performance is "quite good" or
    "adequate" or "reasonable" or "not bad" or "as
    good as it gets"
  • Use relative reduction factors
  • With guidelines for ozone modeling
  • If model didn't perform within specified
    guidelines
  • Extensive diagnostics performed to understand
    poor performance
  • Improved appropriate elements of modeling system
  • Enhanced model performance

16
Issues with Defining Performance Guidelines for
PM2.5 Models
  • What is "reasonable", "acceptable" or "good"
    model performance?
  • Past experience: How well have current models
    done?
  • What statistical measures should be used to
    evaluate the models?

17
Criteria to Select Statistical Measures I
  • Simple yet Meaningful
  • Easy to Interpret
  • Relevant to Air Quality Modeling Community
  • Properties of Statistics
  • Normalized vs. Absolute
  • Paired vs. Unpaired
  • Non-Fractional vs. Fractional
  • Symmetry (see the worked example below)
  • Underestimates and overestimates must be
    perceived equally
  • Underestimates and overestimates must be weighted
    equally
  • Scalable: biases scale appropriately in statistics
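
A minimal worked illustration of the symmetry criterion (added here, not from
the original slides), using the usual mean-normalized form for a single
prediction/observation pair: a factor-of-2 overestimate and a factor-of-2
underestimate should be perceived as equally severe, yet

    \frac{2O - O}{O} = +100\%, \qquad \frac{O/2 - O}{O} = -50\%,

whereas a logarithmic bias treats the two symmetrically:

    \ln\frac{2O}{O} = +\ln 2, \qquad \ln\frac{O/2}{O} = -\ln 2.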

18
Criteria to Select Statistical Measures II
  • Statistics that can attest to
  • Bias
  • Error
  • Ability to capture variability
  • Peak accuracy (to some extent)
  • Normalizes daily predictions paired with
    corresponding daily observations
  • Inherently minimizes effect of outliers
  • Some statistics/figures may be preferable for
    EVALUATION, whereas others may be preferred for
    DIAGNOSTICS

19
Problems with Thresholds and Outliers
  • Issues with addressing low-end comparisons via a
    threshold (see the sketch after this list)
  • Instrumental uncertainty: detection limit,
    signal-to-noise
  • Operational uncertainty
  • Additional considerations: network, background
    concentration, geography, demographics
  • Inspection for outliers
  • Outlier vs. valid high observation
  • Definition of outlier must be objective and
    unambiguous
  • Clear guidance necessary for performance
    analysis
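
A minimal sketch of threshold filtering (added for illustration; the array
names and the threshold value are assumptions, not from the slides):

    import numpy as np

    def apply_threshold(pred, obs, threshold=2.0):
        """Keep only prediction/observation pairs whose observation exceeds
        a low-end concentration threshold (e.g., near the detection limit).
        Units assumed ug/m3; the value 2.0 is purely illustrative."""
        pred = np.asarray(pred, dtype=float)
        obs = np.asarray(obs, dtype=float)
        mask = obs >= threshold
        return pred[mask], obs[mask]

    # Example: pairs below the threshold are dropped before computing
    # bias and error statistics.
    pred = np.array([1.0, 3.5, 8.0, 0.4])
    obs = np.array([0.5, 4.0, 7.0, 1.9])
    p, o = apply_threshold(pred, obs)   # keeps only the 2nd and 3rd pairs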

20
Review of Statistics
  • Ratio of Means (Bias of Means) or
    Quantile-Quantile Comparisons
  • Defeats purpose of daily observations: completely
    unpaired
  • Hides any measure of true model performance
  • Normalized Mean Statistics (not to be confused
    with Mean-Normalized)
  • Defeats purpose of daily observations: weighs all
    errors equally regardless of the magnitude of
    individual daily observations
  • Masks bias results (e.g., positive and negative
    errors can cancel in the numerator)
  • Based on Linear Regressions
  • Slope of Least Squares Regression, Root
    (Normalized) Mean Square Error
  • Slope of Least Median of Squares Regression
    (Rousseeuw regression)
  • Can be skewed; neglects magnitude of
    observations; good for cross-comparisons
  • Fractional Statistics
  • Taints integrity of statistics by placing
    predictions in the denominator; not scalable
    (a comparison sketch of these forms follows below)
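
A minimal sketch (illustrative only; the slides do not show formulas, so these
follow the common definitions) contrasting the unpaired ratio of means, the
normalized-mean, the mean-normalized and the fractional forms of bias:

    import numpy as np

    def ratio_of_means(pred, obs):
        # Completely unpaired: only the two means matter.
        return np.mean(pred) / np.mean(obs)

    def normalized_mean_bias(pred, obs):
        # Sum of errors normalized by the sum of observations;
        # opposite-sign errors can cancel in the numerator.
        return np.sum(pred - obs) / np.sum(obs)

    def mean_normalized_bias(pred, obs):
        # Each daily error normalized by its own paired observation.
        return np.mean((pred - obs) / obs)

    def fractional_bias(pred, obs):
        # Predictions appear in the denominator as well.
        return np.mean(2.0 * (pred - obs) / (pred + obs))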

21
Bias Statistics
  • Mean Normalized Bias/Arithmetic Bias Factor
  • Same statistic; ABF is the form used for
    symmetric perception
  • ABF = 2:1 for +100% MNB, ABF = 1:2 for -50% MNB
  • MNB in % can be useful during diagnostics due to
    its simple and meaningful comparison to MNE, but
    the comparison is flawed.
  • The statistics give less weight to
    underpredictions than to overpredictions.
  • Logarithmic Bias Factor/Logarithmic-Mean
    Normalized Bias
  • Wholly symmetric representation of bias that
    satisfies all criteria
  • Can be written in "factor" form or in
    "percentage" form (see the sketch below)
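
A minimal sketch of the arithmetic and logarithmic bias statistics (the exact
slide formulas are not shown; these are inferred from the factor/percentage
values quoted on the later criteria slides):

    import numpy as np

    def mnb(pred, obs):
        # Mean Normalized Bias: each daily error normalized by its paired
        # observation; bounded at -100% below, unbounded above.
        return np.mean((pred - obs) / obs)

    def abf(pred, obs):
        # Arithmetic Bias Factor: the same statistic written as a factor,
        # e.g. 2.0 (reported 2:1) for MNB = +100%, 0.5 (1:2) for MNB = -50%.
        return 1.0 + mnb(pred, obs)

    def lbf(pred, obs):
        # Logarithmic Bias Factor: geometric-mean prediction/observation
        # ratio, symmetric for over- and underprediction.
        return np.exp(np.mean(np.log(pred / obs)))

    def lmnb(pred, obs):
        # LBF in percentage form (1.25 -> +25%, 1/1.25 -> -20%).
        return lbf(pred, obs) - 1.0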

22
Error Statistics
  • Mean Normalized Error
  • Each data point normalized with paired
    observation
  • Similar flaw as the Arithmetic Mean-Normalized
    Bias: the statistic gives less weight to
    underpredictions than to overpredictions.
  • Logarithmic Error Factor/Logarithmic-Mean
    Normalized Error
  • Satisfies all criteria
  • Comparisons between logarithmic-based statistics
    (bias and error) are visibly meaningful when
    expressed in "factor" form (see the sketch below)
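
A companion sketch for the error statistics (again illustrative, with the
logarithmic forms inferred from the factor/percentage values on the later
criteria slides):

    import numpy as np

    def mne(pred, obs):
        # Mean Normalized Error: each absolute error normalized by its
        # paired observation; bounded at 100% for underpredictions,
        # unbounded above.
        return np.mean(np.abs(pred - obs) / obs)

    def lef(pred, obs):
        # Logarithmic Error Factor: geometric mean of the factor by which
        # each prediction misses its observation (always >= 1).
        return np.exp(np.mean(np.abs(np.log(pred / obs))))

    def lmne(pred, obs):
        # LEF in percentage form (1.56 -> 56%, 2.25 -> 125%).
        return lef(pred, obs) - 1.0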

23
Comparing Bias and Error Statistics Based on
Arithmetic and Logarithmic Means
24
Mean Normalized/Fractional Statistics
25
Logarithmic/Arithmetic Statistics
26
Logarithmic/Arithmetic Statistics
Note: MNB/ABF and MNE use the 95% data interval. FB,
FE, LMNB/LBF and LMNE/LEF use 100% of the data.
27
Relating Criteria for LBF/LMNB and LEF/LMNE
  • Criterion for Logarithmic EF/MNE can be
    Established from Criterion for Logarithmic BF/MNB
  • For example: Error twice the amplitude of Bias
  • Logarithmic Bias Factor/Logarithmic-Mean
    Normalized Bias
  • LBF = 1.25:1 to 1:1.25; LMNB = +25% to -20%
  • Logarithmic Error Factor/Logarithmic-Mean
    Normalized Error
  • LEF = 1.56; LMNE = 56%
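
Reading "error twice the amplitude of bias" in logarithmic terms (an
interpretation added here, consistent with the numbers on this and the next
slide):

    \ln \mathrm{LEF} = 2\,|\ln \mathrm{LBF}| \;\Rightarrow\; \mathrm{LEF} = \mathrm{LBF}^2,
    \qquad 1.25^2 \approx 1.56, \qquad 1.50^2 = 2.25.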

28
Relating Criteria for LBF/LMNB and LEF/LMNE
  • Criterion for Logarithmic EF/MNE can be
    Established from Criterion for Logarithmic BF/MNB
  • For example: Error twice the amplitude of Bias
  • Logarithmic Bias Factor/Logarithmic-Mean
    Normalized Bias
  • LBF = 1.50:1 to 1:1.50; LMNB = +50% to -33%
  • Logarithmic Error Factor/Logarithmic-Mean
    Normalized Error
  • LEF = 2.25; LMNE = 125%

29
Variability Statistics
  • Coefficient of Determination (R²)
  • Should not be used in absence of previous
    statistics
  • Coefficient of Determination of Linear
    Regressions
  • Least Squares Regression through the Origin (Ro²)
  • Used by some in the global model community as a
    measure of performance and ability to capture
    variability
  • Least Median of Squares Regression
  • More robust; inherently minimizes effects of
    outliers
  • Comparison of Coefficients of Variation (see the
    sketch after this list)
  • Comparison of Standard Deviation/Mean of
    predictions and observations
  • Other statistical metrics?
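
A minimal sketch of the variability measures named above (illustrative;
function names are assumptions):

    import numpy as np

    def r_squared(pred, obs):
        # Coefficient of determination of the ordinary linear fit
        # (square of the Pearson correlation coefficient).
        return np.corrcoef(pred, obs)[0, 1] ** 2

    def slope_through_origin(pred, obs):
        # Least-squares slope of P = a * O forced through the origin.
        return np.sum(pred * obs) / np.sum(obs * obs)

    def coefficient_of_variation(x):
        # Standard deviation divided by the mean; compare CV(pred) to
        # CV(obs) to judge whether modeled variability matches observed.
        return np.std(x) / np.mean(x)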

30
Summary: Items for Discussion
  • What spatial scales to use for model performance?
  • Single Site; Local/Region of Interest
  • Large Domain/Continental
  • What statistics should be used?
  • What are the guidelines/benchmarks for
    performance evaluation?
  • Should the same guidelines be used for all
    components?
  • Sulfate, Nitrate, Carbonaceous, PM2.5
  • Ammonium, Organic Mass, EC, Fine Soil, Major
    Metal Oxides
  • How are network considerations taken into account
    in guidelines?
  • Should models meet performance guidelines for an
    entire year and/or other time scales (monthly,
    seasonal)?
  • Should there be separate guidelines for different
    time scales?
  • Statistics based on daily P/O pairs
  • Average daily results to create weekly, monthly,
    seasonal or annual statistics (see the sketch
    below)
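
A minimal sketch of the last point (illustrative; the use of pandas and the
column names 'pred' and 'obs' are assumptions, not from the slides):

    import numpy as np
    import pandas as pd

    # df is assumed to hold daily data with a DatetimeIndex and columns
    # 'pred' and 'obs'.
    def monthly_lmnb(df):
        # Daily logarithmic bias terms, averaged month by month, then
        # converted back to the percentage form used above.
        daily = np.log(df["pred"] / df["obs"])
        return np.exp(daily.resample("MS").mean()) - 1.0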

31
More Examples
  • More examples of Comparison of Statistics
  • Fractional
  • Arithmetic-Mean Normalized
  • Logarithmic-Mean Normalized

32
Mean Normalized/Fractional Statistics
33
Logarithmic/Arithmetic Statistics
34
Mean Normalized/Fractional Statistics
35
Logarithmic/Arithmetic Statistics
36
Mean Normalized/Fractional Statistics
37
Logarithmic/Arithmetic Statistics