Title: W3C XML Schema: what you might not know (and might or might not like!)
1 W3C XML Schema: what you might not know (and might or might not like!)
- Noah Mendelsohn
- Distinguished Engineer
- IBM Corp.
- October 10, 2002
2 Topics
- Quick review of XML concepts
- Why XML Schema?
- What is XML Schema?
- Where do schemas come from?
- A few validation tricks
- Wrapup
3 Warning! To save screen space, some examples are simplified: namespace
declarations are omitted, only the key parts of schema declarations are
shown, etc.
4 Quick review of XML concepts
5 This is an XML document
<?xml version="1.0"?>
<e1>
  <e2>
    <e3 a1="123" />
  </e2>
</e1>
6 Infoset: the XML data model
<?xml version="1.0"?>
<e1>
  <e2>
    <e3 a1="123" />
  </e2>
</e1>
7 More on XML infosets
- XML 1.0 describes only documents with angle bracket syntax <>
- Infosets also describe DOM, SAX, and other representations
- XML Schema validates infosets: applies to all of the representations
- XML Schema can validate from any element information item (e.g. e1 or e2)
8 Why XML Schema?
9 What are schemas for?
- Contracts: agreeing on formats
- Tool building: know what the data will be before the first instance shows up
- Database integration
- User interface tools
- Programming language bindings
- Validation: make sure we got what we expected
10 What is XML Schema?
11 This is an XML document
<?xml version="1.0"?>
<myns:e1 xmlns:myns="http://example.org/myns"
         xmlns:yourns="http://example.org/yourns">
  <myns:e2>
    <yourns:e1 a1="xyz"/>
    <myns:e3 a1="123" myns:a1="456"/>
    <yourns:e1 myns:a1="456"/>
  </myns:e2>
  <yourns:e4/>
</myns:e1>
12 This is an XML schema
<xsd:schema targetNamespace="http://example.org/myns"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            ..namespaces omitted to protect innocent..>
  <!-- declare element e1 -->
  <xsd:element name="e1">
    <xsd:sequence>
      <xsd:element name="e2"/>
      <xsd:element ref="yourns:e4"/>
    </xsd:sequence>
  </xsd:element>
</xsd:schema>
13 This is an XML schema document
<xsd:schema targetNamespace="http://example.org/myns"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            ..namespaces omitted to protect innocent..>
  <!-- declare element e1 -->
  <xsd:element name="e1">
    <xsd:sequence>
      <xsd:element name="myns:e2"/>
      <xsd:element ref="yourns:e4"/>
    </xsd:sequence>
  </xsd:element>
</xsd:schema>
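The slide versions are deliberately abbreviated. A fuller sketch of the same schema document might look like the following. The `xsd:complexType` wrapper, the `xsd:import`, `elementFormDefault`, and the relative `yourns.xsd` location are assumptions added so that the document is well-formed and processable; they are not on the original slide. In a real schema document a `name` attribute is always unprefixed (the target namespace supplies the namespace), so the sketch writes `name="e2"`.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:yourns="http://example.org/yourns"
            targetNamespace="http://example.org/myns"
            elementFormDefault="qualified">

  <!-- Assumed: the yourns vocabulary lives in a separate schema document -->
  <xsd:import namespace="http://example.org/yourns"
              schemaLocation="yourns.xsd"/>

  <!-- declare element e1 -->
  <xsd:element name="e1">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element in the target namespace. No type is given,
             so it defaults to xsd:anyType and accepts any content. -->
        <xsd:element name="e2"/>
        <!-- Reference to a global element declared in the other namespace -->
        <xsd:element ref="yourns:e4"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```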
14 This is an XML document
<?xml version="1.0"?>
<myns:e1 xmlns:myns="http://example.org/myns"
         xmlns:yourns="http://example.org/yourns">
  <myns:e2>
    <yourns:e1 a1="xyz"/>
    <myns:e3 a1="123" myns:a1="456"/>
    <yourns:e1 myns:a1="456"/>
  </myns:e2>
  <yourns:e4/>
</myns:e1>
To validate this, we need more than one schema document.
15 Import brings in declarations for other namespaces
<xsd:schema targetNamespace="http://example.org/myns.xsd"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            ..namespace omitted to protect innocent..>
  <xsd:import namespace="http://example.org/yourns"
              schemaLocation="http://example.org/yourns.xsd"/>
  <xsd:element name="e1">
    <xsd:sequence>
      <xsd:element name="myns:e2"/>
      <xsd:element ref="yourns:e4"/>
    </xsd:sequence>
  </xsd:element>
  <!-- declare element e2 -->
  <xsd:element name="e2" type="..."/>
</xsd:schema>
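For the import to be useful, some schema document has to declare `yourns:e4`. The talk does not show that file, so the following is only a guess at what `http://example.org/yourns.xsd` might contain. The empty content model for `e4` and the string type for `a1` are assumptions chosen to match the instance on the earlier slide.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/yourns">

  <!-- Global declaration that ref="yourns:e4" resolves to.
       Empty content is an assumption for illustration. -->
  <xsd:element name="e4">
    <xsd:complexType/>
  </xsd:element>

  <!-- yourns:e1 as used in the instance: empty, with an optional
       unqualified a1 attribute, and open to attributes from other
       namespaces such as myns:a1. -->
  <xsd:element name="e1">
    <xsd:complexType>
      <xsd:attribute name="a1" type="xsd:string"/>
      <xsd:anyAttribute namespace="##other" processContents="lax"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```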
16 Terminology
17 Cool tricks with components
- In memory schemas
- Handy tools for working with schemas
- Build the components for you
- Resolve subtyping across namespaces, etc.
- Examples
- http://www.eclipse.org/xsd
- Henry Thompson's XSV
- Conformance testing
18 How to read the spec.
- 3.3 Element Declarations
- 3.3.1 The Element Declaration Schema Component
- 3.3.2 XML Representation of Element Declaration Schema Components
- 3.3.3 Constraints on XML Representations of Element Declarations
- 3.3.4 Element Declaration Validation Rules
- 3.3.5 Element Declaration Information Set Contributions
- 3.3.6 Constraints on Element Declaration Schema Components
- Warning: the spec. never gives any rule twice!
19 Post-schema validation infoset (PSVI)
- Fearsome title, simple concept
- Infoset: the data model for an XML document; tells you what you can know (that matters) after a parse.
- PSVI: tells you what you can know after a validation
- What parts of doc are valid?
- Per which types?
- Default values
- Etc.
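As a rough illustration of the difference between the infoset and the PSVI, consider a schema that supplies a default attribute value. The element name `price` and attribute name `currency` are hypothetical. The property names in the comment ([validity], [type definition]) are taken from the XML Schema specification, but the comment is an informal summary, not a full PSVI dump.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- 'price' and 'currency' are hypothetical names -->
  <xsd:element name="price">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:decimal">
          <xsd:attribute name="currency" type="xsd:string" default="USD"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>

  <!-- Instance:  <price>9.99</price>
       Infoset after a plain parse: one element, no attributes,
       character content "9.99".
       PSVI after validation additionally records, for example:
         [validity] = valid
         [type definition] = the anonymous type above (base xsd:decimal)
         a defaulted attribute currency = "USD" -->
</xsd:schema>
```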
20 Self-describing vs. schema-described docs
- You can use xsi:type in your documents: <e xsi:type="xsd:integer">123</e>
- Use xsi:type with built-ins (and no attributes)
- Your document is nearly self-describing
- SOAP encoding supports this
- xsi:type with your own types
- Partially self-describing
- You know the type names; need schema to know what types are
- SOAP 1.2 Encoding supports this too!
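The two cases above can be put side by side in one small instance. The `values` wrapper element and the `po:quantityType` type name are hypothetical; only the `xsi` and `xsd` namespace URIs are standard.

```xml
<?xml version="1.0"?>
<values xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:po="http://example.org/po">
  <!-- Built-in type: a reader needs no schema to know this is an integer -->
  <e xsi:type="xsd:integer">123</e>
  <!-- User-defined type: the type name is visible in the document, but
       what po:quantityType allows must be looked up in a schema -->
  <e xsi:type="po:quantityType">123</e>
</values>
```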
21 Where do schemas come from?
22 How are schema components found?
- In short, wherever you want!
- Hint from schema
- <xsd:import ns="..." schemaLocation="yyy.xsd"/>
- Hint from instance
- <myns:e1 schemaLocation="myNSUri yyy.xsd"/>
- Processor command line or config
- Compiled into application (validating HTML editor)
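Spelled out in full, the hint from the instance is the `xsi:schemaLocation` attribute, whose value is a list of namespace-URI and location pairs. The two `.xsd` locations below are assumed for illustration. Since this is only a hint, a processor may follow it, ignore it, or use schemas obtained some other way.

```xml
<?xml version="1.0"?>
<myns:e1 xmlns:myns="http://example.org/myns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://example.org/myns http://example.org/myns.xsd
                             http://example.org/yourns http://example.org/yourns.xsd">
  <!-- Each pair reads: namespace URI, then a location where a schema
       document for that namespace might be found -->
</myns:e1>
```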
23 Why all this flexibility?
- >1 schema / namespace (versions, bug fixes, experiments, etc.)
- Who gets control?
- Docheads want to name schema in instance
- eCommerce: do you trust the schema named in a purchaseOrder?
- Ultimately the application chooses
- Synthetic: DB builds it dynamically
24 Streaming
- Most validation can be done in 1 pass
- Id/idref, key/keyref require limited lookaside
- Problem
  <myinstance>
    10Mbytes of data here
    <!-- oops..need a new schema! -->
    <newns:a schemaLoc="newnsUri xxx">
    lots more data
  </myinstance>
- Answer
- Assemble schema incrementally or in advance
- Result must be same: can't tell which from the outside!
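To see why key/keyref needs some lookaside during a one-pass validation, here is a sketch of an identity constraint. The vocabulary (`order`, `part`, `line`, `partRef`) is hypothetical. A streaming validator has to remember the `id` values seen under `order` until the `order` element closes, so that every `partRef` can be checked against them.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="part" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="id" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="line" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="partRef" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Every part must have a unique id within this order -->
    <xsd:key name="partKey">
      <xsd:selector xpath="part"/>
      <xsd:field xpath="@id"/>
    </xsd:key>
    <!-- Every line's partRef must match some part's id -->
    <xsd:keyref name="lineToPart" refer="partKey">
      <xsd:selector xpath="line"/>
      <xsd:field xpath="@partRef"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```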
25 Our language vs. your language: why <import>?
<xsd:schema targetNamespace="http://example.org/ns1"
            xmlns:ns1="http://example.org/ns1"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="X"/>
  <xsd:element ref="ns1:X"/>
</xsd:schema>
A fragment of a schema document
26 Our language vs. your language: why <import>?
<xsd:schema targetNamespace="http://example.org/ns1"
            xmlns:ns1="http://example.org/ns1"
            xmlns:ns2="http://example.org/ns2"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://example.org/ns2"/>
  <xsd:element name="X"/>
  <xsd:element ref="ns1:X"/>
  <xsd:element ref="ns2:Y"/>
</xsd:schema>
Add a reference to an external element
27 Our language vs. your language: why <import>?
Unimported namespaces enhance the schema language.
Imported namespaces enhance your language.
<xsd:schema targetNamespace="http://example.org/ns1"
            xmlns:ns1="http://example.org/ns1"
            xmlns:ns2="http://example.org/ns2"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd2="http://www.w3.org/2004/XMLSchema">
  <xsd:import namespace="http://example.org/ns2"/>
  <xsd:element name="X"/>
  <xsd:element ref="ns1:X"/>
  <xsd:element ref="ns2:Y"/>
  <xsd2:betterElement name="newone"/>
</xsd:schema>
Enhance the schema language!
28 A few validation tricks: Modeling content
29 How to validate this?
<soap:Envelope>
  <soap:Body>
    your message here
  </soap:Body>
</soap:Envelope>
- What is the content model for <soap:Body>?
- Can you validate the contents?
30 Inside/out vocabularies (some specific SOAP examples)
<!-- SOAP PURCHASE ORDER -->
<soap:Envelope xmlns:soap="http://www.w3.org/2002/06/soap-envelope">
  <soap:Body>
    <po:purchaseOrder xmlns:po="http://example.org/po">
    </po:purchaseOrder>
  </soap:Body>
</soap:Envelope>

<!-- SOAP INVOICE -->
<soap:Envelope xmlns:soap="http://www.w3.org/2002/06/soap-envelope">
  <soap:Body>
    <inv:invoice xmlns:inv="http://example.org/inv">
    </inv:invoice>
  </soap:Body>
</soap:Envelope>
31 Schemas for envelopes
Putting "skip" here says: don't validate the content of the body.
<xsd:complexType name="bodyType">
  ...
  <xsd:sequence>
    <xsd:any processContents="skip"/>
  </xsd:sequence>
  ...
</xsd:complexType>
32 Schemas for envelopes
Putting "strict" here says: you must have declarations and must successfully
validate the contents of body.
<xsd:complexType name="bodyType">
  ...
  <xsd:sequence>
    <xsd:any processContents="strict"/>
  </xsd:sequence>
  ...
</xsd:complexType>
33 Schemas for envelopes
Putting "lax" here says: validate only if your schema has declarations for
the contents.
<xsd:complexType name="bodyType">
  ...
  <xsd:sequence>
    <xsd:any processContents="lax"/>
  </xsd:sequence>
  ...
</xsd:complexType>
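Putting the pieces together, a complete envelope schema using the lax wildcard might look roughly like this. It is a simplified sketch and not the normative SOAP schema: the `Envelope` declaration, the occurrence bounds, and `namespace="##any"` are assumptions added for illustration, and header handling is left out.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:soap="http://www.w3.org/2002/06/soap-envelope"
            targetNamespace="http://www.w3.org/2002/06/soap-envelope"
            elementFormDefault="qualified">

  <xsd:element name="Envelope">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Body" type="soap:bodyType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="bodyType">
    <xsd:sequence>
      <!-- Any number of elements from any namespace. With "lax", a
           po:purchaseOrder is validated if a declaration for it is
           available, and accepted unchecked if not. -->
      <xsd:any namespace="##any" processContents="lax"
               minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Changing `lax` to `strict` would make the same purchase order invalid unless the processor also has a schema for the `po` namespace; `skip` would accept any well-formed body content without looking at it.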
34 Versioning vocabularies & schemas
- It's hard!
- Use namespaces?
- Do 50 bug fixes give you 50 namespaces?
- How much interop? Does old schema accept new version?
- What about XPath?
- For better or worse, XML Schema has no organized model for versioning
35 Inheritance: why have it?
- Allow reuse of definitions
- Model real-world inheritance and polymorphism
- Substitutability
- Mappings to programming systems w/inheritance
- Schemas provide mechanisms offering partial solutions to these problems
36 Refinement vs. Extension
- Data inheritance is different from method inheritance
- No active code: receiver sees everything; order matters, e.g. for multiple inheritance
- Innovation(?) in schema
- Restriction: subtype is a subset (supports substitutability)
- Extension: subtype builds on base (supports modular development, some mappings to real world and programming languages.)
- No multiple inheritance (for now)
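A small sketch of both derivation methods follows. All type and element names (`addressType`, `usAddressType`, `domesticAddressType`, `quantityType`) are hypothetical. Restriction restates the base content model with tighter constraints, so every valid instance of the derived type is also valid against the base; extension appends new content after the base content.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:po="http://example.org/po"
            targetNamespace="http://example.org/po"
            elementFormDefault="qualified">

  <!-- Base type -->
  <xsd:complexType name="addressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Extension: the base content followed by one more element -->
  <xsd:complexType name="usAddressType">
    <xsd:complexContent>
      <xsd:extension base="po:addressType">
        <xsd:sequence>
          <xsd:element name="zip" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Restriction: the optional country element is no longer allowed,
       so the valid instances are a subset of those of addressType -->
  <xsd:complexType name="domesticAddressType">
    <xsd:complexContent>
      <xsd:restriction base="po:addressType">
        <xsd:sequence>
          <xsd:element name="street" type="xsd:string"/>
          <xsd:element name="city" type="xsd:string"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Restriction of a simple type: integers from 1 to 100 only -->
  <xsd:simpleType name="quantityType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```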
37 Wrapup
38 Some things I learned
- No such thing as a simple feature
- Big committee -> big language
- Documents & data together are cool
- But neither community gets a simple schema language
- Make realistic schedules: we didn't make time to pull features
39 Thank you!