Title: Benchmark database based on surrogate climate records
Goals of COST-HOME working group 1
- Literature survey
- Benchmark dataset
- Known inhomogeneities
- Test the homogenisation algorithms (HA)
Benchmark dataset
- Real (inhomogeneous) climate records
- Most realistic case
- Investigate if various HA find the same breaks
- Good meta-data
- Synthetic data
- For example, Gaussian white noise
- Insert known inhomogeneities
- Test performance
- Surrogate data
- Empirical distribution and correlations
- Insert known inhomogeneities
- Compare to synthetic data: test of assumptions
Creation of the benchmark: outline of the talk
- Start with homogeneous data
- Multiple surrogate and synthetic realisations
- Mask surrogate records
- Add global trend
- Insert inhomogeneities in station time series
- Published on the web
- Homogenize by COST participants and third parties
- Analyse the results and publish
1) Start with homogeneous data
- Monthly mean temperature and precip (France)
- Later also daily data
- Later maybe other variables
- Homogeneous
- No missing data
- Detrended
- 20 to 30 years is enough for good statistics
- Longer surrogates are based on multiple copies
- Larger scale correlations are small
- Distribution well defined with 30a data
- Generated networks are 50, 100 and 200 a long
2) Multiple surrogate realisations
- Multiple surrogate realisations
- Temporal correlations
- Station cross-correlations
- Empirical distribution function
- Annual cycle removed before, added at the end
- Number of stations between 5 and 20
- Cross correlation varies as much as possible
- Show plot temporal structure of surrogates
- Show plot cross correlations
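The bullet "annual cycle removed before, added at the end" can be illustrated with a minimal sketch. It assumes the annual cycle is the mean of each calendar month over the whole record; the slides do not say how the cycle is estimated, and the function names and sample values are hypothetical.

```python
from statistics import mean


def split_annual_cycle(monthly):
    """Split a monthly series into a 12-value mean annual cycle and anomalies.

    Assumption: the cycle is the long-term mean of each calendar month.
    """
    if len(monthly) % 12:
        raise ValueError("expected whole years of monthly data")
    cycle = [mean(monthly[m::12]) for m in range(12)]
    anomalies = [x - cycle[i % 12] for i, x in enumerate(monthly)]
    return cycle, anomalies


def add_annual_cycle(anomalies, cycle):
    """Put the annual cycle back onto a series of anomalies."""
    return [x + cycle[i % 12] for i, x in enumerate(anomalies)]


# Two years of made-up monthly temperatures (illustrative values only).
series = [1.0, 2.0, 5.0, 9.0, 13.0, 16.0, 18.0, 17.0, 14.0, 10.0, 5.0, 2.0,
          0.0, 3.0, 6.0, 8.0, 14.0, 17.0, 19.0, 18.0, 13.0, 9.0, 6.0, 1.0]
cycle, anomalies = split_annual_cycle(series)
restored = add_annual_cycle(anomalies, cycle)
```

In the benchmark workflow the surrogates would be generated from the anomalies, and the cycle added back to each surrogate afterwards.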
One station with annual cycle
One station anomalies
Multiple stations, 10-year zoom
Multiple stations, 10-year zoom
IAAFT algorithm smooths jumps
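IAAFT is the iterative amplitude adjusted Fourier transform algorithm of Schreiber and Schmitz. The sketch below is a univariate version in plain Python, meant only to show the two alternating steps (impose the Fourier amplitudes, then impose the value distribution). It is not the benchmark's implementation: reproducing station cross-correlations needs a multivariate variant, which is not shown. The naive O(n^2) transform, the iteration count and the AR(1) test series are illustrative assumptions.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive discrete Fourier transform, adequate for short series."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def iaaft(x, iterations=30, seed=0):
    """Univariate IAAFT surrogate of the series x."""
    rng = random.Random(seed)
    n = len(x)
    sorted_x = sorted(x)                       # target distribution
    target_amp = [abs(c) for c in dft(x)]      # target Fourier amplitudes
    s = list(x)
    rng.shuffle(s)                             # random starting point
    for _ in range(iterations):
        # Step 1: keep the current phases, impose the target amplitudes.
        spec = dft(s)
        adjusted = [a * cmath.exp(1j * cmath.phase(c))
                    for a, c in zip(target_amp, spec)]
        filtered = [c.real for c in dft(adjusted, inverse=True)]
        # Step 2: impose the original values by rank ordering.
        order = sorted(range(n), key=filtered.__getitem__)
        s = [0.0] * n
        for rank, idx in enumerate(order):
            s[idx] = sorted_x[rank]
    return s


# Made-up AR(1) series standing in for one station's anomalies.
_rng = random.Random(1)
x = [0.0]
for _ in range(63):
    x.append(0.7 * x[-1] + _rng.gauss(0.0, 1.0))
surrogate = iaaft(x)
```

Because the final step is a rank ordering, the surrogate contains exactly the original values; only their order in time differs.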
3) Mask surrogate records
- Beginning of records jagged (rough)
- Linear increase in number of stations
- Last station starts after 25% of the full time
- End of record all stations are measuring
- Influence of jagged edge on detection and correction
- But trend is also increasing in time (i.e. different)!
- Is this a problem?
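A minimal sketch of the masking step, assuming station start dates are spread evenly between the first time step and 25% of the record length, and that missing values are represented by `None`. Both choices, and the function names, are assumptions made for illustration.

```python
def start_indices(n_stations, n_steps, ramp_fraction=0.25):
    """First time index with data for each station.

    Starts increase linearly; the last station begins at
    ramp_fraction * n_steps (assumed 25% of the full record).
    """
    if n_stations == 1:
        return [0]
    last_start = int(ramp_fraction * n_steps)
    return [round(i * last_start / (n_stations - 1))
            for i in range(n_stations)]


def mask_network(network, ramp_fraction=0.25, missing=None):
    """Blank out the beginning of each station record."""
    n_steps = len(network[0])
    starts = start_indices(len(network), n_steps, ramp_fraction)
    return [[missing] * s + list(series[s:])
            for series, s in zip(network, starts)]


# Five stations, 100 time steps of dummy data.
network = [[float(i)] * 100 for i in range(5)]
masked = mask_network(network)
starts = start_indices(5, 100)
```

With this mask all stations report from the 25% mark to the end of the record, matching the bullet "end of record all stations are measuring".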
3) Mask surrogate records
4) Add global trend
- NASA GISS Surface Temperature Analysis (GISTEMP) by J. Hansen
- Global mean surface temperature
- Last year of any surrogate network is 1999
5) Insert inhomogeneities in stations
- Random breaks (implemented)
- Frequency of breaks 1/20a, 1/40a
- Size constants for temperature 0.25, 0.5, 1.0 °C
- Size factors for rain 0.8, 0.9, 1.1, 1.2
- Simultaneous breaks
- Frequency of breaks 1/50a
- In 10 to 50% of network
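The random breaks can be sketched as follows. The slides give only frequencies and size constants, so several details here are assumptions: a break occurs independently at each monthly step with probability (breaks per year)/12, temperature breaks get a random sign, and the shift applies from the break date to the end of the record. Simultaneous breaks are not covered. The function name is hypothetical.

```python
import random


def insert_breaks(series, sizes, breaks_per_year, steps_per_year=12,
                  multiplicative=False, seed=0):
    """Insert random break inhomogeneities; return (series, break list)."""
    rng = random.Random(seed)
    p = breaks_per_year / steps_per_year   # break probability per time step
    out = list(series)
    breaks = []
    for t in range(1, len(out)):
        if rng.random() < p:
            size = rng.choice(sizes)
            if not multiplicative:
                size *= rng.choice((-1, 1))   # assumption: random sign
            breaks.append((t, size))
            # Assumption: the shift affects everything from the break onwards.
            for i in range(t, len(out)):
                out[i] = out[i] * size if multiplicative else out[i] + size
    return out, breaks


# 50 years of monthly data; constant series make the breaks easy to see.
temperature = [0.0] * 600
temp_out, temp_breaks = insert_breaks(temperature, [0.25, 0.5, 1.0], 1 / 20)
rain = [1.0] * 600
rain_out, rain_breaks = insert_breaks(rain, [0.8, 0.9, 1.1, 1.2], 1 / 40,
                                      multiplicative=True, seed=1)
```

Temperature breaks are additive constants and precipitation breaks are multiplicative factors, following the two "size" bullets above.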
5) Insert inhomogeneities in stations
- Outliers
- Frequency 1 to 3%
- Size 99 and 99.9 percentiles
- Local trends (only temperature)
- Linear increase or decrease in one station
- Duration 30, 60a
- Maximum size 0.2 to 1.5 °C
- Frequency once in 10% of the stations
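A minimal sketch of a local trend inhomogeneity in one station, assuming a linear ramp from zero to the maximum size over the stated duration, with the offset held constant afterwards. Whether the offset persists after the trend ends is not stated on the slide, so that part is an assumption; outliers are not sketched here. The function name is hypothetical.

```python
def add_local_trend(series, start, duration, max_size):
    """Add a linear local trend starting at index `start`.

    The perturbation grows from 0 to max_size over `duration` steps and
    is then held at max_size (assumption). A negative max_size gives a
    linear decrease.
    """
    out = list(series)
    for t in range(start, len(out)):
        frac = min((t - start) / duration, 1.0)
        out[t] += max_size * frac
    return out


# 100 annual values, trend of +1.5 over 30 steps starting at index 10.
trended = add_local_trend([0.0] * 100, start=10, duration=30, max_size=1.5)
```

For monthly data a 30 a trend would correspond to `duration=360` steps.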
6) Published on the web
- Inhomogeneous data will be published on the COST-HOME homepage
- Everyone is welcome to download and homogenize the data
7) Homogenize by participants
- Return homogenised data
- Should be in COST-HOME file format (next slide)
- Return break detections
- BREAK
- OUTLI
- BEGTR
- ENDTR
- Multiple breaks at one date possible
7) Homogenize by participants
- COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf
- For benchmark COST homogenisation software
- One data and one quality-flag file per station
- Filename: variable, resolution, quality, station
- ASCII network-file with station names
- ASCII break-file with dates and station names
COST-HOME file format: monthly data
COST-HOME file format: network file
8) Analyse the results
- Detailed analysis will be performed in the working groups
- Detection
- Correction
- Daily data homogenisation
- Synthetic and surrogate data
- RMS Error
- No. breaks detected (function of size)
- Application: reduction in the scatter in the trends
- Performance difference between synthetic (Gaussian, white noise) and surrogate data
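Two of the measures listed above, RMS error and number of breaks detected, can be sketched as follows. The matching rule (a detection counts if it lies within a tolerance of a true break, and each detection is used at most once) and the 12-step tolerance are assumptions; the slides do not define how detections are scored.

```python
import math


def rmse(homogenised, truth):
    """Root-mean-square error, skipping positions where either is None."""
    pairs = [(h, t) for h, t in zip(homogenised, truth)
             if h is not None and t is not None]
    return math.sqrt(sum((h - t) ** 2 for h, t in pairs) / len(pairs))


def count_hits(true_breaks, detected, tolerance=12):
    """Number of true breaks matched by a detection within `tolerance`.

    Break positions are time indices; each detection matches one break
    at most (assumption).
    """
    remaining = sorted(detected)
    hits = 0
    for tb in sorted(true_breaks):
        match = next((d for d in remaining if abs(d - tb) <= tolerance), None)
        if match is not None:
            hits += 1
            remaining.remove(match)
    return hits


error = rmse([0.0, 0.0], [3.0, 4.0])
hits = count_hits([100, 300], [105, 500])
```

Grouping the hits by the size of the inserted break would give the "function of size" view mentioned above.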
Work in progress
- Monthly precipitation
- Implement some inhomogeneity types
- Daily data: other inhomogeneities
- Synthetic data (Gaussian white noise)
- More input data!
- Agree on the details of the benchmark
- Next meeting?
- Set deadline for the availability of the benchmark
- Deadline for the return of the homogenised data
Questions
- Ideas for a better benchmark
- For example, for other inhomogeneities, constants
- Types of inhomogeneities for daily data
- Automatic processing
- In the order of 100 networks
7) Homogenize by participants
- COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf
- For benchmark COST homogenisation software
- Regular ASCII matrix (columns)
- One data and one quality-flag file per station
- Yearly, daily, subdaily data: columns for time, one for data
- Monthly data: year column, 12 columns for data
- Filename: variable, resolution, quality, station
- ASCII network-file with station names
- ASCII break-file with dates and station names
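A minimal sketch of reading the monthly layout described above (one year column followed by 12 data columns). It assumes whitespace-separated columns and no header line; the authoritative definition is the file format document linked above, and the function name and sample values here are made up for illustration.

```python
def parse_monthly(text):
    """Parse a monthly ASCII matrix into {year: [12 monthly values]}.

    Assumptions: whitespace-separated columns, no header, one year per row.
    """
    records = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) != 13:
            raise ValueError(f"expected 13 columns, got {len(fields)}")
        records[int(fields[0])] = [float(v) for v in fields[1:]]
    return records


# Made-up sample rows, not taken from the benchmark.
sample = """\
1998  1.2  2.3  5.1  9.0 13.4 16.2 18.0 17.5 14.1 10.0  5.2  2.1
1999  0.8  3.0  6.2  8.7 14.0 17.1 19.3 18.2 13.6  9.4  6.0  1.5
"""
records = parse_monthly(sample)
```

The quality-flag file, which shares the same matrix layout according to the slide, could be read the same way with an integer conversion instead of `float`, if the flags are integers.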