1
3rd Training Workshop, 16-19 June, Oostende-IOC Offices
General description of data management procedures
Description of the different steps: "From data collection to the SeaDataNet management system"
  • Sissy Iona
  • HNODC, Greece

2
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

3
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

4
I. SDN Data Policy History
  • Drafted by Project Office, 02/2007
  • Reviewed by the Steering Committee
  • Validated by the Coordination Group
  • sdn_po07_Data_policy.doc, 04/2007
  • http://www.seadatanet.org/media/seadatanet/files/publications/seadatanet_data_policy

5
I. SeaDataNet Data Policy
  • It is derived from the INSPIRE directive for
    spatial information, taking into account national
    rules and the SeaDataNet users' needs.
  • Objectives
  • to serve the scientific community, public
    organizations, environmental agencies
  • to facilitate the data flow through the
    Transnational Activities by stating clearly the
    conditions for data submission, access and use

6
I. Links and Framework
  • SeaDataNet Data Policy is fully compatible with
    the EU Directives, International Policies, Laws
    and Data Principles
  • Directive 2003/4/EC of the European Parliament
    and of the Council of 28 January 2003 on public
    access to environmental information and repealing
    Council Directive 90/313/EEC
    (http://ec.europa.eu/environment/aarhus/index.htm)
  • INSPIRE Directive for spatial information in the
    Community (http://inspire.jrc.it/home.html)
  • IOC Data Policy
    (http://ioc3.unesco.org/iode/contents.php?id=200)
  • ICES Data Policy 2006
    (https://www.ices.dk/Datacentre/Data_Policy_2006.pdf)
  • WMO Resolution 40 (Cg-XII, see
    http://www.nws.noaa.gov/im/wmor40.htm)
  • Implementation plan for the Global Observing
    System for Climate in support of the UNFCCC,
    2004 GCOS 92, WMO/TD No.1219.
  • Global Earth Observation System of Systems GEOSS
    10-Year Implementation Plan Reference Document
    (Final Draft) 2005. GEO 204. February 2005.
  • CLIVAR Initial Implementation Plan, 1998 WCRP
    No. 103, WMO/TS No. 869, ICPO No. 14. June 1998.

7
I. Policy for Data Access and Use
  • Metadata
  • free and open access, no registration required
  • each data centre is obliged to provide the
    metadata in a standardized format to populate the
    catalogue services
  • Data and products
  • visualisation freely available
  • the general case is free and open access (e.g.
    academic purposes)
  • however (due to national policies) mandatory user
    registration is required (using the Single Sign
    On (SSO) Service)
  • a SeaDataNet role (partner, academic,
    commercial, etc.) is attributed to each individual
    user using the Authentication, Authorization
    and Administration (AAA) Service
  • each NODC attributes the roles to the users of
    its own country
  • outside the partnership, roles are assigned by
    the SeaDataNet user desk
  • when registering, the user must accept the SDN
    licence agreement
  • each data centre node delivers data according to
    the user's role and its local regulations
  • each data centre should provide freely the data
    sets necessary to develop the common products

8
I. SeaDataNet Users Management
9
I. User Agreement on SeaDataNet Licence
10
(No Transcript)
11
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

12
II. SeaDataNet Services
  • SeaDataNet Quality Control is one of the
    off-line services; it provides methodologies,
    standards and tools to ensure the reliability,
    compatibility and coherence of the data
  • a common Quality Control Protocol
  • a tool for visualization and automatic checks
    (ODV)

(diagram: on-line services vs. off-line services)
13
II. QC procedures
  • Overview (IOC, ICES, EU recommendations, MEDAR
    Protocol)
  • automatic and visual controls on the data and
    their metadata.
  • Data measured by the same instrument and coming
    from the same cruise are organized in the same
    file, reformatted to the same exchange format and
    then subjected to a series of quality tests
  • check of the format
  • check of the location and time
  • check of measurements
  • The results of the automatic control are
    attached as QC flags to each data value.
  • Validation or correction is made manually to the
    QC flags and NOT to the data.
  • The results of the QC are reported to the data
    originator to give feedback and ask questions.

14
MEDATLAS Quality Flags values (based on the GTSPP
Flag Scale definition)
  • 0 No QC
  • 1 Correct value
  • 2 Out of statistics but not obviously wrong
  • 3 Doubtful value
  • 4 Bad value
  • 5 Modified value (only for the location, date,
    bottom depth)
  • 9 missing value

15
SEADATANET Quality Flags values (L021) (Based on
IGOSS/UOT/GTSPP Argo QC flags)
  • Quality flags
  • 0 No quality control
  • 1 The value appears to be correct
  • 2 The value appears to be probably good
  • 3 The value appears probably bad
  • 4 The value appears erroneous
  • Information flags
  • 5 The value has been changed
  • 6 Below detection limit
  • 7 In excess of quoted value
  • 8 Interpolated value
  • 9 Missing value
  • A Incomplete information

16
II. Main QC procedures description
  • Format Check
  • Detects anomalies such as wrong platform codes or
    names, wrong parameter names or units, and missing
    mandatory information such as the reference to a
    cruise or observation system, the source laboratory,
    or the sensor type
  • No further control should be made before the
    correction and validation of the archive format

17
II. Main QC procedures description
  • Check of date and location
  • For vertical profiles
  • duplicate entries
  • date: reasonable date; station date within the
    begin and end dates of the cruise
  • ship velocity between two consecutive stations
    (e.g., speed > 15 knots means wrong station date
    or wrong station location); see the sketch after
    this list
  • location/shoreline: on-land position
  • bottom sounding out of the regional scale,
    compared with the reference surroundings
  • For time series of fixed moorings
  • sensor depth checks: less than the bottom depth
  • series duration checks: consistency with the
    start and end dates of the dataset
  • duplicate moorings checks
  • land position checks
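As a concrete illustration of the ship-velocity check mentioned above, here is a minimal sketch in Python, assuming positions in decimal degrees and times as datetime objects; the 15-knot threshold is the one quoted on the slide and the function names are illustrative.

from datetime import datetime
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_NM = 3440.065   # mean Earth radius in nautical miles
MAX_SPEED_KNOTS = 15.0       # threshold quoted on the slide

def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_NM * asin(sqrt(a))

def speed_plausible(lat1, lon1, t1, lat2, lon2, t2):
    """True if the implied ship speed between two consecutive stations is <= 15 knots."""
    dist = distance_nm(lat1, lon1, lat2, lon2)
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return dist == 0.0   # same time but a different place is suspect
    return dist / hours <= MAX_SPEED_KNOTS

# Two stations about 39 NM apart but only 2 hours apart -> ~20 knots -> flagged
print(speed_plausible(35.0, 25.0, datetime(2008, 6, 16, 10, 0),
                      35.0, 25.8, datetime(2008, 6, 16, 12, 0)))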

18
II. Main QC procedures description
  • Duplicates checks
  • Conventional techniques
  • Algorithms
  • - comparison of the location and time of the
    measurements (5 miles, 15 mins in GTSPP)
  • - comparison of the measurements
  • - comparison of extra metadata (platform codes,
    float IDs, ...)
  • Visualization of ships' tracks, transects, ...
  • Advanced techniques
  • Unique data identifier -CRC Tag (GTSPP report
    2002)
  • Keep the most complete data set
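A minimal sketch of the conventional duplicate check described above, using the GTSPP proximity thresholds quoted on the slide (5 nautical miles, 15 minutes); the station record structure (a plain dict) is an illustrative assumption.

from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3440.065 * asin(sqrt(a))

def is_candidate_duplicate(sta_a, sta_b, max_nm=5.0, max_dt=timedelta(minutes=15)):
    """Stations closer than 5 NM and 15 minutes are candidate duplicates."""
    close_in_space = distance_nm(sta_a["lat"], sta_a["lon"],
                                 sta_b["lat"], sta_b["lon"]) <= max_nm
    close_in_time = abs(sta_a["time"] - sta_b["time"]) <= max_dt
    return close_in_space and close_in_time

a = {"lat": 38.00, "lon": 23.50, "time": datetime(2008, 6, 16, 10, 0)}
b = {"lat": 38.02, "lon": 23.52, "time": datetime(2008, 6, 16, 10, 10)}
print(is_candidate_duplicate(a, b))   # True -> compare measurements and metadata next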

19
II. Main QC procedures description
  • Measurements: main checks
  • presence of at least two parameters: the
    vertical/time reference and a measurement
  • pressure/time must be monotonically increasing
  • the profile/time series must not be constant
    (sensor jammed)
  • broad range checks: check for extreme regional
    values compared with the min. and max. values for
    the region. The broad range check is performed
    before the narrow range check.
  • data points below the bottom depth
  • spike detection: usually requires visual
    inspection. For time series a filter is applied
    first to remove the effect of tides and internal
    waves.
  • narrow range check: comparison with pre-existing
    climatological statistics. Time series are
    compared with internal statistics.
  • density inversion test (potential density
    anomaly; Fofonoff and Millard, 1983; Millero and
    Poisson, 1981)
  • Redfield ratio for nutrients: ratio of the
    oxygen, nitrate and alkalinity (carbonate)
    concentrations over the phosphate concentration
    (172, 16 and 122 in the Atlantic and Indian
    Oceans, Takahashi et al.)
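As a concrete illustration, here is a minimal sketch in Python of three of the checks listed above (monotonically increasing pressure, jammed-sensor detection, broad range check); the regional limits and the flag values are illustrative placeholders.

def pressure_monotonic(pressures):
    """Pressure (or time) must be monotonically increasing."""
    return all(p2 > p1 for p1, p2 in zip(pressures, pressures[1:]))

def sensor_jammed(values):
    """A constant profile or time series suggests a jammed sensor."""
    return len(set(values)) == 1

def broad_range_flags(values, regional_min, regional_max):
    """Flag 4 (bad value) outside the broad regional range, 1 otherwise."""
    return [1 if regional_min <= v <= regional_max else 4 for v in values]

pressures = [0, 10, 20, 30, 50]
temps = [25.1, 24.8, 21.3, 45.0, 14.9]          # 45.0 is outside the regional range
print(pressure_monotonic(pressures))            # True
print(sensor_jammed(temps))                     # False
print(broad_range_flags(temps, -2.0, 35.0))     # [1, 1, 1, 4, 1]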

20
II. Broad Range Check
  • Regional parameterization in MEDAR/MEDATLAS II
  • (plus depth parameterization)

21
II. Main QC procedures description
  • Narrow range check
  • QC flag = 2, probably good data, after the
    automatic control
  • QC flag = 1 is assigned manually
  • The automatic comparison with reference
    climatologies is made by linearly interpolating
    the references at the level of the observation.
  • Outliers are detected if the data points differ
    from the references by more than
  • 5 x standard deviation over the shelf
    (depth < 200 m)
  • 4 x standard deviation at the slope and straits
    region (200 m < depth < 400 m)
  • 3 x standard deviation in the deep sea
    (depth > 400 m)
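A minimal sketch of the narrow range check described above, assuming the climatological mean and standard deviation have already been interpolated to the observation level; the multipliers follow the shelf/slope/deep-sea rules on the slide, while the flag given to detected outliers (3, probably bad) is an assumption.

def std_multiplier(bottom_depth_m):
    """Standard deviation multiplier by region depth (shelf / slope and straits / deep sea)."""
    if bottom_depth_m < 200:
        return 5
    if bottom_depth_m <= 400:
        return 4
    return 3

def narrow_range_flag(value, clim_mean, clim_std, bottom_depth_m):
    """Return 2 (probably good) after the automatic check, or 3 for an outlier
    (assumed flag); upgrading 2 to 1 is done manually, as stated on the slide."""
    k = std_multiplier(bottom_depth_m)
    return 2 if abs(value - clim_mean) <= k * clim_std else 3

print(narrow_range_flag(16.2, clim_mean=14.0, clim_std=0.5, bottom_depth_m=800))  # 3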

22
II. Main QC procedures description
  • Spikes check
  • The test is sensitive to the vertical/time
    resolution.
  • It requires at least 3 consecutive
    good/acceptable values.
  • It requires 2 consecutive values at the surface
    and the bottom.
  • The IOC algorithm to detect spikes, taking
    into account the difference in values (for
    regularly spaced data like CTD), is
  • |V2 - (V3 + V1)/2| - |(V1 - V3)/2| > THRESHOLD VALUE
  • For irregularly spaced values (like bottle data)
    a better algorithm to detect spikes, taking
    into account the difference in gradients instead
    of the difference in values, is
  • |(V2 - V1)/(P2 - P1) - (V3 - V1)/(P3 - P1)| -
    |(V3 - V1)/(P3 - P1)| > THRESHOLD VALUE
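A minimal sketch of the two spike tests in Python, for a triplet of consecutive values V1, V2, V3 at pressures P1, P2, P3, with the absolute-value bars placed as reconstructed in the formulas above; the threshold is parameter dependent and supplied by the caller.

def spike_regular(v1, v2, v3, threshold):
    """IOC spike test for regularly spaced data (e.g. CTD)."""
    return abs(v2 - (v3 + v1) / 2) - abs((v1 - v3) / 2) > threshold

def spike_irregular(v1, v2, v3, p1, p2, p3, threshold):
    """Gradient-based spike test for irregularly spaced data (e.g. bottle data)."""
    g12 = (v2 - v1) / (p2 - p1)
    g13 = (v3 - v1) / (p3 - p1)
    return abs(g12 - g13) - abs(g13) > threshold

# A 2-degree temperature spike at the middle of three levels
print(spike_regular(14.0, 16.0, 13.9, threshold=1.0))   # True -> flag V2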

23
II. QC procedures description
  • density inversion test: the importance of the
    visual check
  • example of a density inversion due to a
    temperature increase with depth

Suggested threshold value = 0.03 for high
resolution data, 0.05 for near-surface and low
resolution data
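A minimal sketch of the density inversion test, assuming the potential density anomalies (sigma-theta, in kg/m3) have already been computed with an equation of state (e.g. Fofonoff and Millard, 1983) and are listed from top to bottom.

def density_inversion_levels(sigma_theta, threshold=0.03):
    """Indices of levels where density decreases with depth by more than the
    threshold (0.03 for high resolution data, 0.05 for near-surface and low
    resolution data, as suggested above)."""
    return [i + 1 for i, (upper, lower) in enumerate(zip(sigma_theta, sigma_theta[1:]))
            if upper - lower > threshold]

print(density_inversion_levels([27.10, 27.15, 27.05, 27.20]))   # [2] -> inspect visually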
24
II. Main QC procedures description
  • Large temperature inversion and gradient tests
  • (World Ocean Data Centre, NODC Ocean Climate
    Laboratory)

Relying solely on temperature data to quantify
the maximum allowable temperature increase with
depth (inversion: 0.3 °C per m) and decrease with
depth (excessive gradient: 0.7 °C per m)
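A minimal sketch of the large inversion and excessive gradient tests in Python, using the thresholds quoted above; the pairing of 0.3 °C per metre with inversions and 0.7 °C per metre with excessive gradients follows the slide's wording.

def inversion_gradient_flags(depths_m, temps_c, max_inversion=0.3, max_gradient=0.7):
    """Indices of levels where temperature rises faster than 0.3 degC/m
    (inversion) or falls faster than 0.7 degC/m (excessive gradient)."""
    suspect = []
    for i in range(1, len(depths_m)):
        dz = depths_m[i] - depths_m[i - 1]
        if dz <= 0:
            continue                      # depth ordering problems are handled elsewhere
        dt_per_m = (temps_c[i] - temps_c[i - 1]) / dz
        if dt_per_m > max_inversion or dt_per_m < -max_gradient:
            suspect.append(i)
    return suspect

print(inversion_gradient_flags([0, 10, 20, 30], [20.0, 19.5, 24.0, 12.0]))   # [2, 3]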
25
II. Main QC procedures description
  • ARGO Real-Time QC on vertical profiles

Based on the Global Temperature and Salinity
Profile Project (GTSPP) of IOC/IODE, the automatic
QC tests are:
  • Platform identification: checks whether the
    float's ID corresponds to the correct WMO number.
  • Impossible date test: checks whether the
    observation date and time from the float are
    sensible.
  • Impossible location test: checks whether the
    observation latitude and longitude from the float
    are sensible.
  • Position on land test: the observation latitude
    and longitude from the float must be located in
    an ocean.
  • Impossible speed test: checks the positions and
    times of the float.
  • Global range test: applies a gross filter on
    observed values for temperature and salinity.
  • Regional range test: checks for extreme regional
    values.
  • Pressure increasing test: checks for
    monotonically increasing pressure.
  • Spike test: checks for large differences between
    adjacent values.
  • Gradient test: fails when the difference between
    vertically adjacent measurements is too steep.
  • Digit rollover test: checks whether the
    temperature and salinity values exceed the
    float's storage capacity.
  • Stuck value test: checks for all measurements of
    temperature or salinity in a profile being
    identical.
  • Density inversion: densities are compared at
    consecutive levels in a profile, in both
    directions, i.e. from top to bottom and from
    bottom to top.
  • Grey list (7 items): stops the real-time
    dissemination of measurements from a sensor that
    is not working correctly.
  • Gross salinity or temperature sensor drift:
    detects a sudden and important sensor drift.
  • Frozen profile test: detects a float that
    reproduces the same profile (with very small
    deviations) over and over again.
  • Deepest pressure test: the profile has pressures
    not higher than DEEPEST_PRESSURE plus 10%.
26
II. Main QC procedures description
  • CORIOLIS Real-Time QC on time series

Automatic quality control tests:
  • test 1: Platform Identification
  • test 2: Impossible Date Test
  • test 3: Impossible Location Test
  • test 4: Position on Land Test
  • test 5: Impossible Speed Test
  • test 6: Global Range Test
  • test 7: Regional Global Parameter Test for the
    Red Sea and the Mediterranean Sea
  • test 8: Spike Test
  • test 10: comparison with climatology
27
II. Main QC procedures description
  • CORIOLIS Delayed Mode QC on profiles and time
    series
  • Automated and Visual QC (already described)
  • Objective analysis and residual analysis (to
    correct sensor drift and offsets)
  • World Ocean Data Centre
  • Objective Analysis
  • Post-objective-analysis subjective checks (to
    detect unrealistic "bull's eye" features in
    data-sparse areas)

28
II. References
  • Argo quality control manual, V2.2, 2006
    (http://www.coriolis.eu.org/cdc/argo/argo-quality-control-manual.pdf)
  • Coriolis Data Centre, In-situ data quality
    control, V1.3, 2005
    (http://www.coriolis.eu.org/cdc/documents/cordo-rap-04-047-quality-control.pdf)
  • GOSUD Real-time QC, go-um-03-01, V1.0, 2003
    (https://www.ifremer.fr/bscw/bscw.cgi/0/53815)
  • Data Type Guidelines - ICES Working Group on
    Marine Data Management (12 data types)
    (http://www.ices.dk/Ocean/guidelines.htm)
  • GTSPP Real-Time Quality Control Manual, 1990
    (IOC Manuals and Guides 22)
    (http://www.meds-sdmm.dfo-mpo.gc.ca/ALPHAPRO/gtspp/qcmans/MG22/guide22_e.htm)
  • UNESCO/IOC/IODE and MAST, Manual of Quality
    Control Procedures for Validation of
    Oceanographic Data, 1993 (Manuals and Guides 26)
    (http://www.jodc.go.jp/info/ioc_doc/Manual/mg26.pdf)
  • MEDAR/MEDATLAS protocol, Part I: Exchange format
    and quality checks for observed profiles, V3,
    2001 (http://www.ifremer.fr/medar/qc_doc/med_manv3.doc)
  • Quality Control of Sea Level Observations,
    ESEAS-RI, V1.0, 2006
    (http://www.eseas.org/eseas-ri/deliverables/d1.2/)
  • Quality Control Processing of Historical
    Oceanographic Temperature, Salinity, and Oxygen
    Data. Timothy Boyer and Sydney Levitus, 1994.
    National Oceanographic Data Centre, Ocean Climate
    Laboratory
  • World Ocean Database 2005 Documentation. Ed.
    Sydney Levitus. NODC Internal Report 18, U.S.
    Government Printing Office, Washington, D.C.,
    163 pp.
  • Quality checks at Ifremer/Sismer
    (http://www.ifremer.fr/sismer/program/qc_phy/quality_UK.htm)
  • IGOSS Quality Flags
    (http://www.nodc.noaa.gov/argo/qc_flags.htm)

29
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

30
III. Causes of the duplicates
  • Real-time (RT) and delayed-mode (DM) profiles
    from operational oceanography
  • Data sets from the GTS (real time transmission)
    with rounded values and poorly documented
    profiles
  • International Programmes and data
    exchange/dissemination
  • Data insufficiently documented and attributed to
    two different sources
  • PTS files and same station with other parameters
  • Data declassified by the Navies with poor
    meta-data

31
III. Why prevent duplicates?
  • they affect product preparation
  • (bias the computations)
  • mistakenly reported and disseminated data

32
III. How to handle the duplicates ?
  • There are copies of one data set in several
    regional (ICES), project (MEDAR) and global
    (WOD05) databases
  • The duplicate data should not reach the
    aggregation level
  • The simplest way: the duplicates' descriptions
    (metadata) must not enter the system
  • Submit only your national metadata
  • (Project coordinator country collator/data
    center country)

33
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

34
IV. Quality Control Procedures within SeaDataNet
  • Reformatting
  • Quality Controls
  • Metadata Management Information Compilation

35
IV. Data reformatting
  • In general the original formats of the data
    files cannot be used in data management
  • incomplete / non-standardized metadata
  • incompatibility with the QC and other processing
    input formats
  • Need for a unique format for safeguarding and
    exchanging data sets of the same type
  • The data management format, archiving format and
    transport (exchange) format are not necessarily
    the same

36
IV. Sustainability of an archiving format
  • The archiving format should
  • be independent from the computer (and libraries):
    RDBMS are not appropriate
  • ensure that any isolated data record includes
    enough metadata to be processed (e.g. location
    and date)
  • be compatible with, and include at least, the
    mandatory fields (metadata) requested by the
    agreed exchange format(s)
  • include additional free-text or standardized
    history or comment fields to prevent any loss
    of information
  • provide a similar structure and metadata for
    different data types such as vertical profiles
    and time series
  • These rules are normally followed also for
    exchange formats.

37
IV. SeaDataNet adopted transport formats
  • Obligatory formats
  • NetCDF (binary) for gridded data and 3D
    observation data such as ADCP
  • ODV4 spreadsheet for other data types (vertical
    profiles and time series)
  • Optional
  • ASCII MEDATLAS
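To give a feel for the ODV spreadsheet transport format, here is a minimal sketch that writes one short profile as a tab-separated file; the comment line, column labels and QV (quality value) columns are illustrative only, and the authoritative header layout is the one defined in the SeaDataNet ODV4 format description.

import csv

header = ["Cruise", "Station", "Type", "yyyy-mm-ddThh:mm:ss.sss",
          "Longitude [degrees_east]", "Latitude [degrees_north]",
          "Depth [m]", "QV", "Temperature [degC]", "QV"]

rows = [
    ["TRAINING08", "ST01", "C", "2008-06-16T10:00:00.000",
     "23.50", "38.00", "0", "1", "25.1", "1"],
    ["", "", "", "", "", "", "10", "1", "24.8", "1"],
]

with open("profile_odv.txt", "w", newline="") as f:
    f.write("// illustrative comment line (ODV comment lines start with //)\n")
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(header)
    writer.writerows(rows)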

38
IV. SeaDataNet Tool for data reformatting
  • a new reformatting tool to convert any ASCII file
    to the MEDATLAS and ODV formats
  • in addition, it interacts with MIKADO to produce
    ISO 19115 XML metadata descriptions
  • How does it work? See the next presentation by
    M. Fichaut

39
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

40
V. SeaDataNet Quality Control Standards
  • SeaDataNet quality control flags (L201)
  • SeaDataNet Protocol V1
  • Ocean Data View V4

41
SEADATANET Quality Flags values (Based on
IGOSS/UOT/GTSPP Argo QC flags)
42
V. Tool for quality control
  • Ocean Data View for automatic checks and
    visualization
  • Integration for DIVA
  • (presentation by R. Schlitzer)

43
Topics
  • PART A Presentation of general data management
    rules
  • SeaDataNet Data Policy and Data Licence
  • Quality Controls
  • Rules for metadata submission to prevent
    duplication
  • PART B Identification of main stages and
    available tools
  • Reformatting from observation format to common
    data format
  • Quality Controls
  • Metadata Concepts and Management (online,
    offline)

44
VI. Why Metadata?
  • We need the metadata to discover the data
  • SeaDataNet built a metadata system to discover
    the data
  • It is ISO 19115 compliant for interoperability
    with other systems
  • Partners maintain the system by submitting
    metadata
  • No metadata, no discovery of partners' data

45
VI. Metadata Discovery System
46
VI. Metadata Discovery System
47
VI. Metadata Discovery System
48
VI. Metadata Discovery System
49
VI. System Maintenance and Upgrade
  • 1. Version 0 (2006-2007)
  • Continuation and maintenance of the existing
    Sea-Search system
  • data access needs several different requests to
    each data centre
  • and the data sets are delivered in different
    formats
  • 2. Version 1 (2008-2010)
  • Setup of the integrated online data services for
    users
  • networking of 10 interoperable data centres of
    the Technical Task Team
  • a unique request to the interconnected data
    centres
  • and the data sets are delivered in a unique
    format
  • Presently under test, with progressive integration
    of 10 data centres during 2008

50
VI. How to submit Metadata?
  • 1. Compile the information
  • For all types of data, information is required
    about:
  • Where the data were collected: location
    (preferably as latitude and longitude) and
    depth/height
  • When the data were collected (date and time in
    UTC or a clearly specified local time zone)
  • How the data were collected (e.g. sampling
    methods, instrument types, analytical techniques)
  • How the data are referenced (e.g. station
    numbers, cast numbers)
  • Who collected the data, including the name and
    institution of the data originator(s) and the
    principal investigator
  • What has been done to the data (e.g. details of
    processing and calibrations applied, algorithms
    used to compute derived parameters)
  • Comments for other users of the data (e.g.
    problems encountered and comments on data quality)
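A minimal sketch of collecting this information into one record and checking it for completeness; the field names and the list of mandatory fields below are illustrative, since the actual mandatory fields are the ones shown in bold in the XML schema documents mentioned in the next step.

MANDATORY = ["latitude", "longitude", "date_time_utc", "instrument_type",
             "originator", "data_centre"]          # illustrative list

def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in a metadata record."""
    return [field for field in MANDATORY if not record.get(field)]

station = {
    "latitude": 38.0, "longitude": 23.5, "date_time_utc": "2008-06-16T10:00Z",
    "instrument_type": "CTD", "originator": "HNODC",
    # "data_centre" is missing -> go back and ask the data originator
}
print(missing_mandatory(station))   # ['data_centre']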

51
VI. How to submit Metadata?
  • 2. Check for missing mandatory information and
    ask the originator
  • (mandatory fields are shown in bold in the
    documents describing the XML schemas, available
    on BSCW)
  • 3. Map your metadata to the fields described in
    the XML documents
  • 4. Prepare the XML files
  • Manually
  • using the online tools (CMS forms)
  • MIKADO
  • Automatically (if the metadata are in an RDBMS,
    configuration files are needed)
  • using MIKADO
  • local tools

52
VI. How to submit Metadata?
  • Always validate the XML files against the
    available XSD schemas before sending them to the
    directory manager (if you used the off-line
    tools); see the sketch after this list
  • Practical work follows
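A minimal sketch of such a validation using the lxml library; the file names are placeholders for your own XML entry and the XSD schema distributed with the SeaDataNet documentation.

from lxml import etree

schema = etree.XMLSchema(etree.parse("cdi_schema.xsd"))   # placeholder schema file
doc = etree.parse("my_metadata_entry.xml")                # placeholder metadata file

if schema.validate(doc):
    print("XML is valid against the schema")
else:
    for error in schema.error_log:
        print(error.line, error.message)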

53
VI. Population tools and updating mechanisms:
Synopsis
Through on-line (like CMS forms) and off-line
tools that produce ISO 19115 compliant XML
exchange files, developed by the Technical Task
Team in the joint research activities JRA1 and JRA2
54
SDN Discovery System Contents
EDMED: 3,500 - CSR: 39,648 - EDMERP: 1,600 -
EDMO: 1,134 - CDI: 341,499