XML Data Validation - PowerPoint PPT Presentation

About This Presentation
Title:

XML Data Validation

Description:

But almost all values in the record are fake and invalid ... sch:pattern name = 'Final Checks' id = 'completed' sch:rule context = 'house' ... – PowerPoint PPT presentation

Slides: 15
Provided by: exchange
Category:
Tags: xml | data | fake | id | validation

Transcript and Presenter's Notes


1
The Exchange Network Node Mentoring Workshop
  • XML Data Validation
  • An Open QA Framework
  • February 28, 2005

2
Topics
  • XML Schema Validation
  • Limitations of Schema Validation
  • Schematron and Extensible Stylesheet Language
    Transformations (XSLT)
  • Data Validation Process
  • Implementation and Tools
  • Conclusion

3
XML Schema Validation
  • Validates that an instance is a well-formed XML
    document
  • Validates data types
  • Validates data structures (child and
    sibling relationships)
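
As a rough illustration of the first bullet only, the Python sketch below checks well-formedness with the standard library's xml.etree.ElementTree. The helper name is_well_formed and the sample documents are assumptions made for this sketch. Data type and structure checks need a schema-aware validator, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Made-up sample documents for the sketch.
good = "<house><wall/><wall/></house>"
bad = "<house><wall></house>"  # <wall> is never closed

print(is_well_formed(good), is_well_formed(bad))
```

A well-formed document can still break every rule in the schema, so this check is only the first gate.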

4
Limitations of Schema Validation
  • Schema validation cannot:
  • Constrain attributes: if attribute X has a value,
    attribute Y is required
  • Validate logical relations: if the parent of
    element A is element B, A must have an attribute
    Y, otherwise an attribute Z
  • Validate dependencies: if element X has a value M,
    then Y must exist
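
The three kinds of rule listed above could be checked in code along the following lines. This is a minimal Python sketch, not the workshop's implementation. It reuses the slide's placeholder names (A, B, X, Y, Z, M) literally and assumes that "Y must exist" means a sibling element Y. The function name check_constraints is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_constraints(root):
    """Return error messages for the three example rules (hypothetical helper)."""
    errors = []
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    for elem in root.iter():
        # Attribute constraint: if attribute X has a value, attribute Y is required.
        if elem.get("X") and elem.get("Y") is None:
            errors.append(f"{elem.tag}: attribute X is set but attribute Y is missing")

        # Logical relation: an A inside a B needs attribute Y; any other A needs Z.
        if elem.tag == "A":
            parent = parent_of.get(elem)
            needed = "Y" if parent is not None and parent.tag == "B" else "Z"
            if elem.get(needed) is None:
                errors.append(f"A: attribute {needed} is required here")

        # Dependency (assumed reading): if child X has the value M,
        # a sibling element Y must exist.
        x = elem.find("X")
        if x is not None and (x.text or "").strip() == "M" and elem.find("Y") is None:
            errors.append(f"{elem.tag}: X is 'M' but Y is missing")
    return errors


valid = ET.fromstring("<root><B><A Y='1'/></B><A Z='1'/><rec><X>M</X><Y/></rec></root>")
invalid = ET.fromstring("<root><B><A Z='1'/></B><rec X='1'><X>M</X></rec></root>")

print(check_constraints(valid))
for message in check_constraints(invalid):
    print(message)
```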

5
Limitations of Schema Validation
  • Formatted strings: a date must have the format
    mm-dd-yyyy
  • Length constraints: a value's length must be
    between 9 and 10 characters
  • Multiple ranges: data must fall in the 45-50 or
    100-200 range
  • Custom simple types, e.g., FacilityID
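
For comparison, the first three checks above are short to express in a general-purpose language. The Python sketch below is one possible reading of them. The function names are hypothetical, and the date check looks only at the mm-dd-yyyy shape, so it would still accept an impossible date such as 02-31-2005. FacilityID is left out because the slides do not give its format.

```python
import re

# mm-dd-yyyy: month 01-12, day 01-31, four-digit year (shape check only).
DATE_RE = re.compile(r"(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[0-9]{4}")


def is_mm_dd_yyyy(value: str) -> bool:
    """Formatted string: the whole value must look like mm-dd-yyyy."""
    return DATE_RE.fullmatch(value) is not None


def has_length_9_to_10(value: str) -> bool:
    """Length constraint: between 9 and 10 characters, inclusive."""
    return 9 <= len(value) <= 10


def in_allowed_ranges(number: float) -> bool:
    """Multiple ranges: the value must fall in 45-50 or in 100-200."""
    return 45 <= number <= 50 or 100 <= number <= 200


print(is_mm_dd_yyyy("02-28-2005"), is_mm_dd_yyyy("2005-02-28"))
print(has_length_9_to_10("123456789"), has_length_9_to_10("12345678"))
print(in_allowed_ranges(47), in_allowed_ranges(75), in_allowed_ranges(150))
```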

6
NEI Data Example
  • The XML segment is valid according to the NEI
    schema, but almost all values in the record are
    fake and invalid
  • You cannot assure data quality using
    schema validation alone
  <TransmittalSubmissionGroup schemaVersion="3.0">
    <TransmittalRecordTypeCode>OO</TransmittalRecordTypeCode>
    <CountyStateFIPSCode>String</CountyStateFIPSCode>
    <OrganizationFormalName>String</OrganizationFormalName>
    <TransactionTypeCode>St</TransactionTypeCode>
    <InventoryYear>1000</InventoryYear>
    <InventoryTypeCode>String</InventoryTypeCode>
    <TransactionCreationDate>10000000</TransactionCreationDate>
    <SubmissionNumber>0</SubmissionNumber>
    <ReliabilityIndicator>0</ReliabilityIndicator>
    <TransactionComment>String</TransactionComment>
    <IndividualFullName>String</IndividualFullName>
    <TelephoneNumber>String</TelephoneNumber>
    <TelephoneNumberTypeName>String</TelephoneNumberTypeName>
    <ElectronicAddressText>String</ElectronicAddressText>
    <ElectronicAddressTypeName>String</ElectronicAddressTypeName>
    <SourceTypeCode>String</SourceTypeCode>
    <AffiliationTypeText>String</AffiliationTypeText>
    <FormatVersionNumber>0</FormatVersionNumber>
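
To make the point concrete, the Python sketch below runs two content checks that a schema's type system would not catch on a few fields copied from the segment. Both checks are assumptions made for illustration: the year window 1990-2005 is invented, as is the idea of treating the literal text "String" as a placeholder. Neither comes from the NEI schema or from the workshop.

```python
import xml.etree.ElementTree as ET

# A few fields copied from the segment; the root is closed here so it parses.
SEGMENT = """
<TransmittalSubmissionGroup schemaVersion="3.0">
  <OrganizationFormalName>String</OrganizationFormalName>
  <InventoryYear>1000</InventoryYear>
  <SubmissionNumber>0</SubmissionNumber>
</TransmittalSubmissionGroup>
"""

# Assumption: this plausible-year window is invented for the sketch.
MIN_YEAR, MAX_YEAR = 1990, 2005


def find_suspect_values(root):
    """Return (tag, value, reason) tuples for values that look fake."""
    suspects = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text == "String":
            suspects.append((elem.tag, text, "placeholder text"))
        elif elem.tag == "InventoryYear" and text.isdigit():
            if not MIN_YEAR <= int(text) <= MAX_YEAR:
                suspects.append((elem.tag, text, "implausible year"))
    return suspects


suspects = find_suspect_values(ET.fromstring(SEGMENT))
for tag, value, reason in suspects:
    print(f"{tag}={value!r}: {reason}")
```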

7
Schematron
  • An XML schema language
  • Combines powerful validation capability with
    simple syntax
  • Based on XSLT and XPath
  • Open Source Implementation (OSI)
  • Currently undergoing International Organization
    for Standardization (ISO) standardization as part
    of ISO/IEC 19757, Document Schema Definition
    Languages (DSDL)

8
Schematron Rules
  • A Schematron rule has three major parts:
  • The context: the element to which a rule applies
  • An assertion: a statement about an element,
    usually an XPath expression
  • A result: a statement to be reported if an
    assertion fails or succeeds

9
Schematron Rule Example
  <sch:pattern name="Final Checks" id="completed">
    <sch:rule context="house">
      <sch:assert test="count(wall) = 4">A house should have 4 walls.</sch:assert>
    </sch:rule>
  </sch:pattern>
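
The context / assertion / result structure of the rule above can be mimicked in a few lines of Python. This is a sketch of the idea, not a Schematron processor: real Schematron expresses the context and the test as XPath and is typically run through XSLT, whereas here the context is a plain tag name and the test is a Python function. Rule and run_rules are hypothetical names, and the sample street document is made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    context: str                        # tag of the elements the rule applies to
    test: Callable[[ET.Element], bool]  # the assertion, as a Python predicate
    message: str                        # reported when the assertion fails


def run_rules(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Apply each rule to every matching element and collect failure messages."""
    failures = []
    for rule in rules:
        for elem in root.iter(rule.context):
            if not rule.test(elem):
                failures.append(rule.message)
    return failures


four_walls = Rule(
    context="house",
    test=lambda house: len(house.findall("wall")) == 4,  # count(wall) = 4
    message="A house should have 4 walls.",
)

# Made-up document: the first house passes, the second has only three walls.
doc = ET.fromstring(
    "<street>"
    "<house><wall/><wall/><wall/><wall/></house>"
    "<house><wall/><wall/><wall/></house>"
    "</street>"
)
print(run_rules(doc, [four_walls]))
```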

10
Flow Data Validation Process

11
Pros and Cons
  • Pros:
  • Simple rule-based XML validation framework
  • Promotes natural-language descriptions of errors
  • Based on open standards (XSLT and XPath)
  • Open source Schematron implementation
  • Cons:
  • Lack of regular expression support
  • Custom validations against existing registries /
    dictionaries not available

12
Schematron with Extensions

13
Current Implementation
  • A set of Web methods
  • Provides both schema validation and schematron
    validation
  • Has synchronous and asynchronous modes
  • Supports table lookups against any database table
  • Can process compressed or uncompressed XML
    documents
  • Accessible to any node, application, or user
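
As one illustration of the table-lookup bullet, the Python sketch below validates element values against a database table using the standard library's sqlite3 module. Everything specific here is hypothetical: the valid_codes table, its contents, the Code element, and the unknown_codes helper are invented for the sketch and do not describe the workshop's Web methods.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical lookup table and codes, created in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_codes (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO valid_codes VALUES (?)", [("AA",), ("BB",)])


def unknown_codes(root, tag, conn):
    """Return values of <tag> elements that are not present in valid_codes."""
    missing = []
    for elem in root.iter(tag):
        value = (elem.text or "").strip()
        row = conn.execute(
            "SELECT 1 FROM valid_codes WHERE code = ?", (value,)
        ).fetchone()
        if row is None:
            missing.append(value)
    return missing


# Made-up document: AA is in the table, ZZ is not.
doc = ET.fromstring("<batch><Code>AA</Code><Code>ZZ</Code></batch>")
print(unknown_codes(doc, "Code", conn))
```

A lookup of this kind covers the registry and dictionary checks that plain Schematron rules cannot express on their own.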

14
Conclusion
  • Streamlined data validation is crucial to
    successful data exchange
  • Data validation should happen as early as
    possible
  • Technologies and tools are available for boosting
    data quality
  • Schematron is a recommended direction