Title: 3rd Training Workshop, 16-19 June, Oostende-IOC Offices General description of data management procedures
Description of the different steps: "From data collection to the SeaDataNet management system"
Topics
- PART A: Presentation of general data management rules
  - SeaDataNet Data Policy and Data Licence
  - Quality Controls
  - Rules for metadata submission to prevent duplication
- PART B: Identification of main stages and available tools
  - Reformatting from observation format to common data format
  - Quality Controls
  - Metadata Concepts and Management (online, offline)
I. SDN Data Policy History
- Drafted by Project Office, 02/2007
- Reviewed by the Steering Committee
- Validated by the Coordination Group
- sdn_po07_Data_policy.doc, 04/2007
- http://www.seadatanet.org/media/seadatanet/files/publications/seadatanet_data_policy
I. SeaDataNet Data Policy
- It is derived from the INSPIRE directive for spatial information, taking into account national rules and the needs of SeaDataNet users.
- Objectives:
  - to serve the scientific community, public organizations and environmental agencies
  - to facilitate the data flow through the Transnational Activities by clearly stating the conditions for data submission, access and use
I. Links and Framework
- The SeaDataNet Data Policy is fully compatible with EU Directives, international policies, laws and data principles:
  - Directive 2003/4/EC of the European Parliament and of the Council of 28 January 2003 on public access to environmental information and repealing Council Directive 90/313/EEC (http://ec.europa.eu/environment/aarhus/index.htm)
  - INSPIRE Directive for spatial information in the Community (http://inspire.jrc.it/home.html)
  - IOC Data Policy (http://ioc3.unesco.org/iode/contents.php?id=200)
  - ICES Data Policy 2006 (https://www.ices.dk/Datacentre/Data_Policy_2006.pdf)
  - WMO Resolution 40 (Cg-XII; see http://www.nws.noaa.gov/im/wmor40.htm)
  - Implementation Plan for the Global Observing System for Climate in support of the UNFCCC, 2004, GCOS-92, WMO/TD No. 1219
  - Global Earth Observation System of Systems (GEOSS) 10-Year Implementation Plan Reference Document (Final Draft), 2005, GEO 204, February 2005
  - CLIVAR Initial Implementation Plan, 1998, WCRP No. 103, WMO/TS No. 869, ICPO No. 14, June 1998
I. Policy for Data Access and Use
- Metadata
  - free and open access, no registration required
  - each data centre is obliged to provide the metadata in a standardized format to populate the catalogue services
- Data and products
  - visualisation is freely available
  - the general case is free and open access (e.g. for academic purposes)
  - however, due to national policies, user registration is mandatory (using the Single Sign On (SSO) Service)
  - a SeaDataNet role (partner, academic, commercial, etc.) is attributed to each individual user through the Authentication, Authorization and Administration (AAA) Service
    - each NODC attributes the roles to the users of its country
    - outside the partnership, the roles are assigned by the SeaDataNet user desk
  - when registering, the user must accept the SDN licence agreement
  - each data centre node delivers data according to the user's role and its local regulations
  - each data centre should freely provide the data sets necessary to develop the common products
I. SeaDataNet Users Management
I. User Agreement on SeaDataNet Licence
II. SeaDataNet Services
- SeaDataNet Quality Control is one of the off-line services; it provides methodologies, standards and tools to ensure the reliability, compatibility and coherence of the data:
  - a common Quality Control Protocol
  - a tool for visualization and automatic checks (ODV)
(Diagram: SeaDataNet on-line services and off-line services)
II. QC procedures
- Overview (IOC, ICES, EU recommendations, MEDAR Protocol): automatic and visual controls on the data and their metadata.
- Data measured by the same instrument during the same cruise are organized in the same file, reformatted to the same exchange format and then subjected to a series of quality tests:
  - check of the format
  - check of the location and time
  - check of the measurements
- The results of the automatic control are attached as QC flags to each data value.
- Validation or correction is applied manually to the QC flags and NOT to the data.
- The results of the QC are reported to the data originator to give feedback and ask questions.
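To make the last two points concrete, here is a minimal Python sketch of a parameter stored with one QC flag per value: an automatic check writes the flags, and manual validation edits a flag but never the measured value. The `FlaggedSeries` class, the `apply_check` helper and the -2 to 40 range in the usage lines are hypothetical, chosen only for illustration; the flag values follow the scales given in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Flag values taken from the flag scales in this document (0 = no QC, 1 = good, 4 = bad).
NO_QC, GOOD, BAD = 0, 1, 4


@dataclass
class FlaggedSeries:
    """Hypothetical container: one parameter's values plus one QC flag per value."""
    values: List[float]
    flags: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every value starts as "no QC" until a check has been run.
        if not self.flags:
            self.flags = [NO_QC] * len(self.values)

    def set_flag(self, index: int, flag: int) -> None:
        """Manual validation/correction: only the flag is changed, never the value."""
        self.flags[index] = flag


def apply_check(series: FlaggedSeries, passes: Callable[[float], bool]) -> None:
    """Attach the result of an automatic check as a flag to each value."""
    for i, value in enumerate(series.values):
        series.flags[i] = GOOD if passes(value) else BAD


# Usage: an illustrative range check (limits are assumptions), then a manual review.
temps = FlaggedSeries([13.2, 13.1, 99.0])
apply_check(temps, lambda t: -2.0 <= t <= 40.0)
temps.set_flag(2, 3)  # operator sets "probably bad"; the stored value 99.0 is untouched
```

Keeping values and flags side by side means every decision made by the operator stays reversible and traceable, which is the point of correcting flags rather than data.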
MEDATLAS Quality Flag values (based on the GTSPP Flag Scale definition)
- 0: No QC
- 1: Correct value
- 2: Out of statistics but not obviously wrong
- 3: Doubtful value
- 4: Bad value
- 5: Modified value (only for the location, date, bottom depth)
- 9: Missing value
SEADATANET Quality Flag values (L201) (based on IGOSS/UOT/GTSPP Argo QC flags)
- Quality flags
  - 0: No quality control
  - 1: The value appears to be correct
  - 2: The value appears to be probably good
  - 3: The value appears to be probably bad
  - 4: The value appears erroneous
- Information flags
  - 5: The value has been changed
  - 6: Below detection limit
  - 7: In excess of quoted value
  - 8: Interpolated value
  - 9: Missing value
  - A: Incomplete information
II. Main QC procedures description
- Format check: detects anomalies such as wrong platform codes or names, wrong parameter names or units, and missing mandatory information such as the reference to a cruise or observation system, the source laboratory or the sensor type.
- No further control should be made before the archive format has been corrected and validated.
II. Main QC procedures description
- Check of date and location
  - For vertical profiles:
    - duplicate entries
    - date: reasonable date; station date within the begin and end dates of the cruise
    - ship velocity between two consecutive stations (e.g., a speed > 15 knots means a wrong station date or a wrong station location)
    - location/shoreline: position on land
    - bottom sounding: out of the regional scale, compared with the reference surroundings
  - For time series from fixed moorings:
    - sensor depth: checks that it is less than the bottom depth
    - series duration: checks consistency with the start and end dates of the dataset
    - duplicate mooring checks
    - land position checks
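The date and ship-velocity checks for vertical profiles reduce to simple arithmetic. The sketch below assumes stations are plain dicts with hypothetical `lat`, `lon` and `time` keys, uses the haversine great-circle distance (our choice; the protocols may use another distance formula) and applies the 15-knot limit quoted above.

```python
import math
from datetime import datetime

EARTH_RADIUS_NM = 3440.065  # mean Earth radius (6371 km) in nautical miles


def distance_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two positions, in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def station_in_cruise(time: datetime, cruise_start: datetime, cruise_end: datetime) -> bool:
    """Station date must fall within the begin and end dates of the cruise."""
    return cruise_start <= time <= cruise_end


def speed_ok(st1: dict, st2: dict, max_knots: float = 15.0) -> bool:
    """False when reaching st2 from st1 would need more than max_knots,
    i.e. a wrong station date or a wrong station location."""
    hours = (st2["time"] - st1["time"]).total_seconds() / 3600.0
    if hours <= 0:
        return False  # not in chronological order: date problem or duplicate
    knots = distance_nm(st1["lat"], st1["lon"], st2["lat"], st2["lon"]) / hours
    return knots <= max_knots
```

For example, two stations one degree of longitude apart at 43°N (about 44 nautical miles) and one hour apart fail the check, while the same pair five hours apart passes.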
II. Main QC procedures description
- Conventional techniques
  - Algorithms:
    - comparison of the location and time of the measurements (5 miles, 15 min in GTSPP)
    - comparison of the measurements
    - comparison of extra metadata (platform codes, float IDs, etc.)
  - Visualization of ship tracks, transects, etc.
- Advanced techniques
  - Unique data identifier: CRC tag (GTSPP report 2002)
- Keep the most complete data set
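A minimal sketch of the conventional location/time comparison used to spot candidate duplicates (5 miles and 15 minutes, as quoted for GTSPP) and of a content checksum used as a data identifier. We assume the 5 miles are nautical miles and that stations are dicts with hypothetical `lat`, `lon` and `time` keys; `crc_tag` is a hypothetical scheme built on Python's `zlib.crc32` and is not the GTSPP CRC definition, which is described in the 2002 GTSPP report.

```python
import math
import zlib
from datetime import datetime, timedelta

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def candidate_duplicate(a: dict, b: dict, max_miles: float = 5.0,
                        max_minutes: float = 15.0) -> bool:
    """Two stations are candidate duplicates when close in both time and position."""
    if abs(a["time"] - b["time"]) > timedelta(minutes=max_minutes):
        return False
    return distance_nm(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_miles


def crc_tag(values, decimals: int = 2) -> int:
    """CRC-32 of the measurement values formatted with a fixed number of decimals,
    so copies with identical content get identical tags (hypothetical scheme)."""
    text = ",".join(f"{v:.{decimals}f}" for v in values)
    return zlib.crc32(text.encode("ascii"))
```

Candidates found this way still need the measurement and metadata comparisons listed above before deciding which copy, normally the most complete one, is kept.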
II. Main QC procedures description
- presence of at least two parameters: the vertical/time reference and a measurement
- pressure/time must be monotonically increasing
- the profile/time series must not be constant (sensor jammed)
- broad range check: checks for extreme regional values, compared with the minimum and maximum values for the region. The broad range check is performed before the narrow range check.
- data points below the bottom depth
- spike detection: usually requires visual inspection. For time series, a filter is first applied to remove the effect of tides and internal waves.
- narrow range check: comparison with pre-existing climatological statistics. Time series are compared with internal statistics.
- density inversion test (potential density anomaly; Fofonoff and Millard, 1983; Millero and Poisson, 1981)
- Redfield ratio for nutrients: ratio of the oxygen, nitrate and alkalinity (carbonates) concentrations over the phosphate concentration (172, 16 and 122 in the Atlantic and Indian Oceans; Takahashi et al.)
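Several of the checks above are straightforward to automate. The sketch below covers the presence of a reference and a measurement, the monotonic reference, the constant-profile (jammed sensor) test, the broad range check and points below the bottom depth. `vmin`/`vmax` stand in for the regional broad-range limits, whose actual values come from the MEDAR/MEDATLAS regional parameterization and are not reproduced here; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def strictly_increasing(ref: Sequence[float]) -> bool:
    """Pressure/time must be monotonically increasing."""
    return all(b > a for a, b in zip(ref, ref[1:]))


def is_constant(values: Sequence[float]) -> bool:
    """A constant profile/time series suggests a jammed sensor."""
    return len(values) > 1 and len(set(values)) == 1


def check_profile(ref: Sequence[float], values: Sequence[float],
                  vmin: float, vmax: float,
                  bottom_depth: Optional[float] = None) -> List[str]:
    """Basic profile checks; returns a list of problems (empty list = passed).
    bottom_depth only makes sense when ref is a depth axis (vertical profiles)."""
    problems = []
    if not ref or len(ref) != len(values):
        problems.append("missing vertical/time reference or measurement")
        return problems
    if not strictly_increasing(ref):
        problems.append("reference not monotonically increasing")
    if is_constant(values):
        problems.append("constant values (sensor jammed?)")
    out_of_range = [i for i, v in enumerate(values) if not vmin <= v <= vmax]
    if out_of_range:
        problems.append(f"broad range failure at levels {out_of_range}")
    if bottom_depth is not None:
        below = [i for i, z in enumerate(ref) if z > bottom_depth]
        if below:
            problems.append(f"levels below bottom depth: {below}")
    return problems
```

Returning a list of problems rather than stopping at the first failure mirrors the idea that all results are reported back to the originator.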
II. Broad Range Check
- Regional parameterization in MEDAR/MEDATLAS II
- (plus depth parameterization)
II. Main QC procedures description
- Narrow range check
  - QC flag 2 (probably good data) after the automatic control
  - QC flag 1 assigned manually
  - The automatic comparison with reference climatologies is made by linearly interpolating the references at the level of the observation.
  - Outliers are detected if the data points differ from the references by more than:
    - 5 x standard deviation over the shelf (depth < 200 m)
    - 4 x standard deviation on the slope and in the straits region (200 m < depth < 400 m)
    - 3 x standard deviation in the deep sea (depth > 400 m)
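The narrow range check above can be sketched as follows: linearly interpolate the climatological mean and standard deviation at each observation level and flag points further than k standard deviations away, with k chosen from the depth class. We assume the depth that selects the class is the station's bottom depth, that values at exactly 200 m or 400 m fall in the slope class, and that levels outside the climatology reuse its end values; none of these details is stated in the protocol.

```python
from bisect import bisect_left
from typing import List, Sequence


def interpolate(levels: Sequence[float], values: Sequence[float], z: float) -> float:
    """Linear interpolation of a reference profile at level z
    (outside the reference levels the end value is used, an assumption)."""
    if z <= levels[0]:
        return values[0]
    if z >= levels[-1]:
        return values[-1]
    i = bisect_left(levels, z)
    z0, z1 = levels[i - 1], levels[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (z - z0) / (z1 - z0)


def std_multiplier(bottom_depth: float) -> float:
    """5 sigma on the shelf (< 200 m), 4 on slope/straits (200-400 m), 3 in the deep sea."""
    if bottom_depth < 200:
        return 5.0
    if bottom_depth <= 400:
        return 4.0
    return 3.0


def narrow_range_outliers(obs_levels: Sequence[float], obs_values: Sequence[float],
                          clim_levels: Sequence[float], clim_mean: Sequence[float],
                          clim_std: Sequence[float], bottom_depth: float) -> List[int]:
    """Indices of observations further than k standard deviations from the climatology."""
    k = std_multiplier(bottom_depth)
    outliers = []
    for i, (z, v) in enumerate(zip(obs_levels, obs_values)):
        mean = interpolate(clim_levels, clim_mean, z)
        std = interpolate(clim_levels, clim_std, z)
        if abs(v - mean) > k * std:
            outliers.append(i)
    return outliers
```

Per the slide, the indices returned would receive flag 2 automatically, leaving the final decision to the operator.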
II. Main QC procedures description
- Spike test
  - The test is sensitive to the vertical/time resolution.
  - It requires at least 3 consecutive good/acceptable values.
  - It requires 2 consecutive values at the surface and at the bottom.
  - The IOC algorithm detects spikes by taking into account the difference in values (for regularly spaced data such as CTD):
    $|V_2 - (V_3 + V_1)/2| - |(V_1 - V_3)/2| > \mathrm{THRESHOLD}$
  - For irregularly spaced values (such as bottle data), a better algorithm detects spikes by taking into account the difference in gradients instead of the difference in values:
    $|(V_2 - V_1)/(P_2 - P_1) - (V_3 - V_1)/(P_3 - P_1)| - |(V_3 - V_1)/(P_3 - P_1)| > \mathrm{THRESHOLD}$
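Both spike statistics translate directly into code, with V2 the tested value and V1, V3 its neighbours above and below. In this sketch the caller supplies the threshold (the protocol's threshold values are not given here), the points tested are assumed to be already restricted to good/acceptable values, and the special handling of the first and last two levels is not covered. Function names are hypothetical.

```python
from typing import List, Sequence


def ioc_spike(v1: float, v2: float, v3: float) -> float:
    """|V2 - (V3 + V1)/2| - |(V1 - V3)/2|, for regularly spaced data (e.g. CTD)."""
    return abs(v2 - (v3 + v1) / 2) - abs((v1 - v3) / 2)


def gradient_spike(v1: float, v2: float, v3: float,
                   p1: float, p2: float, p3: float) -> float:
    """|(V2-V1)/(P2-P1) - (V3-V1)/(P3-P1)| - |(V3-V1)/(P3-P1)|, for irregular spacing."""
    g12 = (v2 - v1) / (p2 - p1)
    g13 = (v3 - v1) / (p3 - p1)
    return abs(g12 - g13) - abs(g13)


def find_spikes(values: Sequence[float], levels: Sequence[float],
                threshold: float, regular: bool = True) -> List[int]:
    """Indices of interior points whose spike statistic exceeds the threshold."""
    spikes = []
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        if regular:
            stat = ioc_spike(v1, v2, v3)
        else:
            stat = gradient_spike(v1, v2, v3, levels[i - 1], levels[i], levels[i + 1])
        if stat > threshold:
            spikes.append(i)
    return spikes
```

Subtracting the second term makes both statistics ignore a steady change across the three points, so only an isolated departure of V2 is reported.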
II. QC procedures description
- Density inversion test: the importance of visual check
- Example of a density inversion due to a temperature increase with depth
Suggested threshold value: 0.03 for high-resolution data, 0.05 for near-surface and low-resolution data
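A minimal sketch of the density inversion test, assuming the potential density anomaly (sigma-theta) has already been computed for each level, for example with the UNESCO equation of state cited earlier, and that levels are ordered from the surface downwards. The threshold defaults to the 0.03 suggested for high-resolution data; the function name is hypothetical.

```python
from typing import List, Sequence


def density_inversions(sigma_theta: Sequence[float], threshold: float = 0.03) -> List[int]:
    """Indices i where potential density decreases from level i-1 to level i
    (levels ordered from surface to bottom) by more than `threshold`.
    Suggested thresholds: 0.03 for high-resolution data, 0.05 for near-surface
    and low-resolution data."""
    return [i for i in range(1, len(sigma_theta))
            if sigma_theta[i - 1] - sigma_theta[i] > threshold]
```

As the slide stresses, an inversion found this way still deserves a visual check before the flag is confirmed.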
II. Main QC procedures description
- Large temperature inversion and gradient tests
  - (World Ocean Data Centre, NODC Ocean Climate Laboratory)
Relying solely on temperature data to quantify the maximum allowable temperature increase with depth (inversion) and decrease with depth (excessive gradient): 0.3 °C per m and 0.7 °C per m, respectively.
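The temperature-only tests follow the same pattern: compute the temperature change per metre between consecutive levels and compare it with the 0.3 °C/m (increase) and 0.7 °C/m (decrease) limits. This is a sketch only; levels are assumed sorted by depth and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def temperature_gradient_failures(depths: Sequence[float], temps: Sequence[float],
                                  max_increase: float = 0.3,
                                  max_decrease: float = 0.7) -> List[Tuple[int, str]]:
    """(level index, reason) for each pair of consecutive levels exceeding the limits."""
    failures = []
    for i in range(1, len(depths)):
        dz = depths[i] - depths[i - 1]
        if dz <= 0:
            continue  # non-increasing depth is caught by the monotonic check instead
        grad = (temps[i] - temps[i - 1]) / dz  # °C per m; positive = warming with depth
        if grad > max_increase:
            failures.append((i, "inversion"))
        elif -grad > max_decrease:
            failures.append((i, "excessive gradient"))
    return failures
```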
II. Main QC procedures description
- ARGO Real-Time QC on vertical profiles
Based on the Global Temperature and Salinity Profile Project (GTSPP) of IOC/IODE, the automatic QC tests are:
- Platform identification: checks whether the float's ID corresponds to the correct WMO number.
- Impossible date test: checks whether the observation date and time from the float are sensible.
- Impossible location test: checks whether the observation latitude and longitude from the float are sensible.
- Position on land test: requires that the observation latitude and longitude from the float be located in an ocean.
- Impossible speed test: checks the position and time of the float.
- Global range test: applies a gross filter on observed values of temperature and salinity.
- Regional range test: checks for extreme regional values.
- Pressure increasing test: checks for monotonically increasing pressure.
- Spike test: checks for large differences between adjacent values.
- Gradient test: fails when the difference between vertically adjacent measurements is too steep.
- Digit rollover test: checks whether the temperature and salinity values exceed the float's storage capacity.
- Stuck value test: checks whether all measurements of temperature or salinity in a profile are identical.
- Density inversion: densities are compared at consecutive levels in a profile, in both directions, i.e. from top to bottom and from bottom to top.
- Grey list (7 items): stops the real-time dissemination of measurements from a sensor that is not working correctly.
- Gross salinity or temperature sensor drift: detects a sudden and important sensor drift.
- Frozen profile test: detects a float that reproduces the same profile (with very small deviations) over and over again.
- Deepest pressure test: requires that the profile has pressures not higher than DEEPEST_PRESSURE plus 10%.
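Two of the simpler tests in this list, the stuck value test and the deepest pressure test, are sketched below. We read the margin on DEEPEST_PRESSURE as 10 %, the value we understand the Argo quality control manual to use; it is a parameter here so it can be checked against the manual. The `failed_tests` helper is hypothetical.

```python
from typing import List, Sequence


def stuck_value(values: Sequence[float]) -> bool:
    """Stuck value test: True (failed) when all measurements in the profile are identical."""
    return len(values) > 1 and all(v == values[0] for v in values)


def deepest_pressure_ok(pressures: Sequence[float], deepest_pressure: float,
                        margin: float = 0.10) -> bool:
    """Deepest pressure test: no pressure may exceed DEEPEST_PRESSURE plus the margin."""
    limit = deepest_pressure * (1.0 + margin)
    return all(p <= limit for p in pressures)


def failed_tests(temps: Sequence[float], pressures: Sequence[float],
                 deepest_pressure: float) -> List[str]:
    """Names of the tests (among these two) that the profile fails."""
    failures = []
    if stuck_value(temps):
        failures.append("stuck value")
    if not deepest_pressure_ok(pressures, deepest_pressure):
        failures.append("deepest pressure")
    return failures
```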
II. Main QC procedures description
- CORIOLIS Real-Time QC on time series
Automatic quality control tests:
- test 1: Platform Identification
- test 2: Impossible Date Test
- test 3: Impossible Location Test
- test 4: Position on Land Test
- test 5: Impossible Speed Test
- test 6: Global Range Test
- test 7: Regional Global Parameter Test for Red Sea and Mediterranean Sea
- test 8: Spike Test
- test 10: comparison with climatology
II. Main QC procedures description
- CORIOLIS Delayed Mode QC on profiles and time series
  - Automated and visual QC (already described)
  - Objective analysis and residual analysis (to correct sensor drift and offsets)
    - Objective analysis
    - Post-objective-analysis subjective checks (to detect unrealistic bull's-eye features in data-sparse areas)
II. References
- Argo quality control manual, V2.2, 2006 (http://www.coriolis.eu.org/cdc/argo/argo-quality-control-manual.pdf)
- Coriolis Data Centre, In-situ data quality control, V1.3, 2005 (http://www.coriolis.eu.org/cdc/documents/cordo-rap-04-047-quality-control.pdf)
- GOSUD Real-time QC, go-um-03-01, V1.0, 2003 (https://www.ifremer.fr/bscw/bscw.cgi/0/53815)
- Data Type guidelines, ICES Working Group on Marine Data Management (12 data types) (http://www.ices.dk/Ocean/guidelines.htm)
- GTSPP Real-Time Quality Control Manual, 1990 (IOC Manuals and Guides 22) (http://www.meds-sdmm.dfo-mpo.gc.ca/ALPHAPRO/gtspp/qcmans/MG22/guide22_e.htm)
- UNESCO/IOC/IODE and MAST, Manual of Quality Control Procedures for Validation of Oceanographic Data, 1993 (Manuals and Guides 26) (http://www.jodc.go.jp/info/ioc_doc/Manual/mg26.pdf)
- Medar-Medatlas protocol, Part I: Exchange format and quality checks for observed profiles, V3, 2001 (http://www.ifremer.fr/medar/qc_doc/med_manv3.doc)
- Quality Control of Sea Level Observations, ESEAS-RI, V1.0, 2006 (http://www.eseas.org/eseas-ri/deliverables/d1.2/)
- Boyer, T. and Levitus, S., 1994. Quality Control Processing of Historical Oceanographic Temperature, Salinity, and Oxygen Data. National Oceanographic Data Centre, Ocean Climate Laboratory
- World Ocean Database 2005 Documentation. Ed. Sydney Levitus. NODC Internal Report 18, U.S. Government Printing Office, Washington, D.C., 163 pp.
- Quality checks at Ifremer/Sismer (http://www.ifremer.fr/sismer/program/qc_phy/quality_UK.htm)
- IGOSS Quality Flags (http://www.nodc.noaa.gov/argo/qc_flags.htm)
III. Causes of the duplicates
- RT (real-time) and DM (delayed-mode) profiles from operational oceanography
- Data sets from the GTS (real-time transmission) with rounded values and poorly documented profiles
- International programmes and data exchange/dissemination
- Data insufficiently documented and attributed to two different sources
- PTS files and the same station with other parameters
- Data declassified by the Navies with poor metadata
III. Why prevent duplications?
- They affect the preparation of products (they bias the computations)
- Data are mistakenly reported and disseminated
III. How to handle the duplicates?
- There are copies of one data set in several regional databases (ICES), project databases (MEDAR) and global databases (WOD05)
- The duplicate data should not reach the aggregation level
- The simplest way: the duplicate descriptions (metadata) must not enter the system
- Submit only your national metadata
  - (project coordinator country, collator/data centre country)
IV. Quality Control Procedures within SeaDataNet
- Reformatting
- Quality Controls
- Metadata Management (information compilation)
IV. Data reformatting
- In general, the original formats of the data files cannot be used in data management:
  - incomplete/non-standardized metadata
  - incompatibility with the input formats of QC and other processing
- A unique format is needed for safeguarding and exchanging the data sets of the same type
- Data management format, archiving format and transport (exchange) format are not necessarily the same
IV. Sustainability of an archiving format
- The archiving format should:
  - be independent of the computer (and libraries): relational databases (RDBS) are not appropriate
  - ensure that any isolated data includes enough metadata to be processed (e.g. location and date)
  - be compatible with, and include at least, the mandatory fields (metadata) requested for the agreed exchange format(s)
  - include additional textual or standardized history or comment fields to prevent any loss of information
  - provide a similar structure and metadata for different data types such as vertical profiles and time series
- These rules are normally also followed for exchange formats.
IV. SeaDataNet adopted transport formats
- Obligatory formats
  - NetCDF (binary) for gridded data and 3D observation data such as ADCP
  - ODV4 spreadsheet for other data types (vertical profiles and time series)
- Optional
  - ASCII Medatlas
IV. SeaDataNet Tool for data reformatting
- A new reformatting tool to convert any ASCII file to Medatlas and ODV formats
- In addition, it interacts with Mikado to produce ISO 19115 XML metadata descriptions
- How does it work? See the next presentation by M. Fichaut
V. SeaDataNet Quality Control Standards
- SeaDataNet quality control flags (L201)
- SeaDataNet Protocol V1
- Ocean Data View V4
SEADATANET Quality Flag values (based on IGOSS/UOT/GTSPP Argo QC flags)
V. Tool for quality control
- Ocean Data View for automatic checks and visualization
- Integration for DIVA
  - (presentation by R. Schlitzer)
VI. Why Metadata?
- We need metadata to discover the data
- SeaDataNet built a metadata system to discover the data
- It is ISO 19115 compliant for interoperability with other systems
- Partners maintain the system by submitting metadata
- No metadata, no discovery of partners' data
VI. Metadata Discovery System
VI. System Maintenance and Upgrade
- 1. Version 0 (2006-2007)
  - Continuation and maintenance of the existing Sea-Search system
  - data access needs several different requests, one to each data centre
  - and the data sets are delivered in different formats
- 2. Version 1 (2008-2010)
  - Setup of the integrated online data services for users
  - networking of 10 interoperable data centres of the Technical Task Team
  - a unique request to the interconnected data centres
  - and the data sets are delivered in a unique format
  - Presently under test, with progressive integration of 10 data centres during 2008
VI. How to submit Metadata?
- 1. Compile the information
  - For all types of data, information is required about:
    - Where the data were collected: location (preferably as latitude and longitude) and depth/height
    - When the data were collected (date and time in UTC or a clearly specified local time zone)
    - How the data were collected (e.g. sampling methods, instrument types, analytical techniques)
    - How the data are referenced (e.g. station numbers, cast numbers)
    - Who collected the data, including the name and institution of the data originator(s) and the principal investigator
    - What has been done to the data (e.g. details of processing and calibrations applied, algorithms used to compute derived parameters)
    - Comments for other users of the data (e.g. problems encountered and comments on data quality)
VI. How to submit Metadata?
- 2. Check for missing mandatory information and ask the originator
  - (mandatory fields are shown in bold in the documents describing the XML schemas, available on BSCW)
- 3. Map your metadata to the fields described in the XML documents
- 4. Prepare the XML files
  - Manually
    - using the on-line tools (CMS forms)
    - MIKADO
  - Automatically (if metadata are in an RDBS, configuration files are needed)
    - using MIKADO
    - local tools
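Step 2 can be partly automated. The sketch below lists the mandatory fields that are absent or blank in a metadata record so the originator can be asked for them. The field names are hypothetical placeholders modelled on the "where / when / how / who" list of step 1; the real mandatory set is the one shown in the XML schema documents.

```python
from typing import Dict, List, Optional

# Hypothetical field names mirroring the "where / when / how / who" list of step 1.
# The authoritative mandatory fields are those marked in the SeaDataNet XML schema
# documents; replace this tuple with them.
MANDATORY_FIELDS = ("location", "date_time_utc", "collection_method",
                    "station_reference", "originator")


def missing_mandatory(record: Dict[str, Optional[str]]) -> List[str]:
    """Mandatory fields that are absent, None or blank in a metadata record."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

Running such a check before mapping the fields (step 3) keeps incomplete descriptions from reaching the XML preparation stage.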
VI. How to submit Metadata?
- Always validate the XML files using the available XSD schemas before sending them to the directory manager (if you used the off-line tools)
- Practical work follows
VI. Population tools and updating mechanisms: Synopsis
Population is done through on-line (e.g. CMS forms) and off-line tools that produce ISO 19115 compliant XML exchanges, developed by the Technical Task Team in the joint research activities JRA1 and JRA2.
SDN Discovery System Contents
- EDMED: 3,500
- CSR: 39,648
- EDMERP: 1,600
- EDMO: 1,134
- CDI: 341,499