XML Primer - PowerPoint PPT Presentation

Slides: 65
Provided by: columbusst

Transcript and Presenter's Notes



1
XML Primer
2
XML
  • Extensible Markup Language
  • Introduced in 1998
  • A way to mark up a document to expose the
    structure of the document to machine processing

3
What Was Wrong With HTML?
  • Intended to expose documents to browsers for
    display
  • Browsers need to display documents that aren't
    properly marked up
  • Documents don't have to conform
  • Markup tags are predefined and can't be created
    by a user

4
What's Good About XML?
  • Standard format
  • If a document doesn't conform, it isn't XML
  • User-defined tags (invent your own language)
  • XML is evolving
  • Many technologies have sprung up around it
  • DTDs, Schema, Namespaces, Encryption, Signature,
    XPointer, XLink, XPath, XSLT, DOM, SAX, RDF,
    SOAP, JAXB, JAXP, JAXM, JAXR, WSDL, UDDI,
    BPEL, ...

5
What's Needed For Web Services?
  • The rules for creating XML documents
  • XML Schema: a way to describe the structure of a
    document
  • XML Namespaces: definitions of mechanisms for
    combining documents from different sources
  • XML Processing: how to parse and manipulate a
    document from Java

6
What Does XML Look Like?
  • Optional Prolog
  • Root Element
  • Elements
  • Attributes

7
Prolog
  • Identifies the document as XML
  • Includes comments about the document
  • Includes meta-information about the content

8
Processing Instruction (PI)
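  • <? ... ?>
  • <?PITarget ... ?>
  • PITarget: a meaningful keyword
  • <?xml version="1.0" encoding="UTF-8"?>
    (formally the XML declaration, which borrows PI syntax)
  • UTF-8: 8-bit Unicode Transformation Format. Good for English over the
    Internet; 7-bit ASCII text is encoded unchanged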
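The sketch below illustrates the prolog. It parses a small document with the JDK's built-in JAXP DOM parser and reads the version and encoding back from the XML declaration. The class and method names (`PrologDemo`, `declaration`) are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class PrologDemo {
    /** Parses xml and returns "version encoding" as recorded from its XML declaration. */
    static String declaration(String xml) {
        try {
            // Parse from bytes so the parser reads the encoding pseudo-attribute itself.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // DOM Level 3 accessors for the values inside <?xml ... ?>.
            return doc.getXmlVersion() + " " + doc.getXmlEncoding();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>David Woolbright</name>";
        System.out.println(declaration(xml)); // 1.0 UTF-8
    }
}
```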

9
Comments
  • <!-- This is a comment -->
  • Can span multiple lines
  • Can't be nested

10
Elements
  • An element is the pairing of a start tag with an
    end tag
  • <name>David Woolbright</name>
  • Every start tag must have a matching end tag
  • Everything between the tags is the content of the
    element
  • Tags are user-defined

11
Tags
  • Begin with a letter or underscore
  • May then contain 0-9, A-Z, a-z, _, -, .
  • XML is case sensitive

12
Content Types of Elements
  • Element only
  • <a><name>David</name></a>
  • Mixed
  • <a><b>xxx</b> yyy </a>
  • Empty
  • <age></age>
  • or
  • <age/>

13
XML Uses Proper Nesting
  • Tags can contain tags, but tags cannot overlap
  • <a> <b> </b> </a>   Yes
  • <a> <b> </a> </b>   No

14
Documents Have a Single Root Element
  • <?xml version="1.0" encoding="UTF-8"?>
  • <!-- Sample document -->
  • <name>
  • <first>David</first>
  • <mi>E</mi>
  • <last>Woolbright</last>
  • </name>

15
XML Rules Produce Hierarchies
[Figure: hierarchy with name at the root and first, mi, last as its children]
16
XML Terminology
  • Parent element
  • Child element
  • Sibling element
  • Ancestor
  • Descendant

[Figure: example tree of elements A, B, C, D, E illustrating these relationships]
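The family terms can be made concrete with a short walk over the `name` document from the earlier slide using the JDK's DOM API. This sketch assumes the document is supplied as a string with no whitespace between tags. `TreeDemo`, `root` and `children` are illustrative names and are not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TreeDemo {
    // The single-root document from the earlier slide, without whitespace between tags.
    static final String NAME_XML =
            "<name><first>David</first><mi>E</mi><last>Woolbright</last></name>";

    /** Parses xml and returns its root element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Tag names of the element children of parent, in document order. */
    static List<String> children(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element name = root(NAME_XML);
        Node mi = name.getElementsByTagName("mi").item(0);
        System.out.println("children of name: " + children(name));              // [first, mi, last]
        System.out.println("parent of mi: " + mi.getParentNode().getNodeName()); // name
        System.out.println("next sibling of mi: " + mi.getNextSibling().getNodeName()); // last
    }
}
```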
17
Attributes
  • An attribute is a name-value pair
  • Tags can contain 0 or more attributes
  • Attribute syntax: name="value"
  • <po id="1276" custid="83730"> </po>
  • Some attributes are reserved
  • xml:lang="en"
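Here is a short sketch of reading attributes through DOM, assuming the JDK's JAXP parser. Namespace awareness is switched on so that the reserved `xml:lang` attribute can be looked up by its namespace. `AttributeDemo` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class AttributeDemo {
    /** Parses xml with namespace support and returns the root element. */
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets xml:lang resolve to the reserved XML namespace
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element po = root("<po id=\"1276\" custid=\"83730\"> </po>");
        System.out.println(po.getAttribute("id"));          // 1276
        System.out.println(po.getAttributes().getLength()); // 2
        System.out.println(po.hasAttribute("date"));         // false

        Element p = root("<p xml:lang=\"en\">Hello</p>");
        // XMLConstants.XML_NS_URI is the namespace bound to the reserved xml prefix.
        System.out.println(p.getAttributeNS(XMLConstants.XML_NS_URI, "lang")); // en
    }
}
```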

18
Semantics
  • XML applications can attach any semantics they
    choose to XML markup
  • Attributes can be used to refer to other parts of
    a document in order to prevent duplication of
    information

19
Elements vs. Attributes
  • <work number="653.323.3938"/>
  • Or
  • <work>
  • <area>653</area>
  • <exchange>323</exchange>
  • <number>3938</number>
  • </work>

20
Character Data and Entities
  • All character data must comply with the
    document's encoding
  • Other characters must be escaped
  • Start with &#, finish with ;
  • Example: &#x80; (hexadecimal) = &#128; (decimal)
  • Predefined entities: &lt; &gt; &quot; &apos; &amp;
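The sketch below assumes the JDK's DOM parser. It shows that the parser expands the predefined entities and character references before the application ever sees the text. `EscapeDemo` and `text` are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class EscapeDemo {
    /** Returns the text content of the root element after all references are expanded. */
    static String text(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The five predefined entities.
        System.out.println(text("<t>&lt;b&gt; &amp; &quot;hi&quot; &apos;x&apos;</t>")); // <b> & "hi" 'x'
        // Decimal and hexadecimal character references.
        System.out.println(text("<t>&#65;&#x42;</t>")); // AB
        // The slide's example: hex 80 and decimal 128 name the same character.
        System.out.println(text("<t>&#x80;</t>").equals(text("<t>&#128;</t>"))); // true
    }
}
```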

21
CDATA
  • Multi-character escape construct
  • Syntax
  • <![CDATA[any sequence of chars]]>
  • (the sequence itself may not contain ]]>)
  • Example
  • <MYXMLDATA>
  • <![CDATA[<A><B></B></A>]]>
  • </MYXMLDATA>
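This is a minimal check that markup inside a CDATA section comes back as plain text rather than as elements. It assumes the JDK's DOM parser with default settings. `CdataDemo` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CdataDemo {
    /** Parses xml and returns its root element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element data = root("<MYXMLDATA><![CDATA[<A><B></B></A>]]></MYXMLDATA>");
        System.out.println(data.getTextContent());                      // <A><B></B></A>
        System.out.println(data.getElementsByTagName("A").getLength()); // 0: nothing inside was parsed as markup
        // With the factory's default (non-coalescing) setting, the CDATA node is kept as such.
        System.out.println(data.getFirstChild().getNodeType() == Node.CDATA_SECTION_NODE); // true
    }
}
```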

22
XML Namespaces
  • XML documents can be composed to create new
    documents
  • Name conflicts can occur when documents are
    combined
  • Conflicts are resolved by qualification
  • Qualified name = namespace ID + local name

23
URIs U Know?
  • XML namespaces use Uniform Resource Identifiers
    (URI) for Namespace Identifiers
  • URIs can be locators, names or both
  • URL: http://www.colstate.edu
  • URN: URIs that are globally unique and
    persistent
  • UUID: Universally Unique Identifiers, 128-bit
    IDs that are globally unique (Ethernet address +
    high-precision timestamp + increment counter).
    Used as unique IDs in UDDI

24
Namespace Syntax
  • URIs are long and may contain characters not
    allowed in XML element names
  • Syntax involves two steps
  • Namespace ID is associated with a prefix
  • Qualified names are obtained as a combination of
    the prefix, a colon character, and the local
    element name

25
Namespace Example
  • <msg:message from="xxx"
    xmlns:msg="http://www.xcomme.com/ns/message"
    xmlns:po="http://www.skatestown.com/ns/po">
  • <msg:text>Hi there</msg:text>
  • <po:text>Hello all</po:text>
  • </msg:message>

26
Default Namespaces
  • Namespaces increase document size and reduce
    readability
  • Default namespaces can be specified
  • Elements in the default namespace don't need a prefix
  • <message from="xxx"
    xmlns="http://www.xcomme.com/ns/message"
    xmlns:po="http://www.skatestown.com/ns/po">
  • <text>Hi there</text>
  • <po:text>Hello all</po:text>
  • </message>
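The sketch below shows how qualification resolves the name conflict. It parses the default-namespace example with a namespace-aware JAXP parser and prints each child as `{namespace URI}local name`. The two `text` elements share a local name but end up in different namespaces. `NamespaceDemo` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    static final String XML =
            "<message from=\"xxx\""
            + " xmlns=\"http://www.xcomme.com/ns/message\""
            + " xmlns:po=\"http://www.skatestown.com/ns/po\">"
            + "<text>Hi there</text>"
            + "<po:text>Hello all</po:text>"
            + "</message>";

    /** Returns {namespaceURI}localName for each element child of the root. */
    static List<String> expandedChildNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> names = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add("{" + n.getNamespaceURI() + "}" + n.getLocalName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // {http://www.xcomme.com/ns/message}text
        // {http://www.skatestown.com/ns/po}text
        expandedChildNames(XML).forEach(System.out::println);
    }
}
```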

27
Namespace-Prefixed Attributes
  • Attributes can have namespaces
  • <po:item sku="318-BP" xmlns:po="http://www.xxx.yyy">
  • <po:order po:priority="high">

28
Dereferencing URI
  • In many cases the URI is a URL
  • What happens if the URL resource is unavailable?
  • It doesn't matter
  • URI is for identification purposes only

29
XML Schemas
  • Document Type Definition (DTD): a set of rules
    for describing the structure of an XML document
  • DTDs help attach meaning to a document
  • DTDs don't address namespace integration or flexible
    content models
  • DTDs aren't written in XML

30
Well-formed XML
  • If a document conforms to the rules of XML syntax
    (nested tags, one root element, ...), it is well-formed.
  • XML processing software can read well-formed
    documents without problems
  • XML parsers generate immediate, non-recoverable
    errors when they detect that the document isn't
    well-formed
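The following sketch assumes the JDK's JAXP DOM parser and uses the parser itself as a well-formedness checker, so any syntax error surfaces as a `SAXException`. The sample strings exercise rules from the earlier slides: nesting, a single root, case sensitivity and no nested comments. `WellFormedCheck` is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** True if the parser accepts xml; false at the first well-formedness error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser stopped at a non-recoverable error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b></b></a>",               // proper nesting
            "<a><b></a></b>",               // overlapping tags
            "<a/><b/>",                     // two root elements
            "<Name></name>",                // tags are case sensitive
            "<a><!-- x <!-- y --> --></a>", // nested comment
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```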

31
Valid XML
  • A document is valid if it conforms to the rules
    of a DTD or Schema
  • The logic for making sure the document is valid
    lies inside the parser, relieving the application
    of this burden

32
Schema Benefits
  • Schemas enable the following
  • Identification of the elements the document can
    contain
  • Identification of the order and relation of the
    elements
  • Identification of the attributes of every element
    and whether they are optional
  • Identification of the datatype of attribute
    content

33
A Simple Schema (W3Schools)
  • <?xml version="1.0"?>
  • <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  • <xs:element name="note">
  • <xs:complexType>
  • <xs:sequence>
  • <xs:element name="to" type="xs:string"/>
  • <xs:element name="from" type="xs:string"/>
  • <xs:element name="heading"
    type="xs:string"/>
  • <xs:element name="body" type="xs:string"/>
  • </xs:sequence> </xs:complexType>
  • </xs:element>
  • </xs:schema>

34
XML Referencing the Schema
  • <?xml version="1.0"?>
  • <note xmlns="http://www.w3schools.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3schools.com
    note.xsd">
  • <to>Tove</to>
  • <from>Jani</from>
  • <heading>Reminder</heading>
  • <body>Don't forget me this weekend!</body>
  • </note>
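Validation against the note schema can be sketched with the JDK's `javax.xml.validation` API. In this sketch the schema and documents are inlined as strings rather than loaded from `note.xsd`, so the `xsi:schemaLocation` hint is omitted. `NoteValidation` and its members are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class NoteValidation {
    static final String NOTE_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
            + " targetNamespace=\"http://www.w3schools.com\""
            + " xmlns=\"http://www.w3schools.com\" elementFormDefault=\"qualified\">"
            + "<xs:element name=\"note\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"to\" type=\"xs:string\"/>"
            + "<xs:element name=\"from\" type=\"xs:string\"/>"
            + "<xs:element name=\"heading\" type=\"xs:string\"/>"
            + "<xs:element name=\"body\" type=\"xs:string\"/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String GOOD_NOTE =
            "<note xmlns=\"http://www.w3schools.com\"><to>Tove</to><from>Jani</from>"
            + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";

    static final String NOTE_WITHOUT_BODY =
            "<note xmlns=\"http://www.w3schools.com\"><to>Tove</to><from>Jani</from>"
            + "<heading>Reminder</heading></note>";

    /** Compiles an XML Schema held in a string. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("the schema itself is invalid", e);
        }
    }

    /** True if xml is well-formed and conforms to schema. */
    static boolean isValid(Schema schema, String xml) {
        try {
            // With no ErrorHandler set, the Validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(NOTE_XSD);
        System.out.println(isValid(schema, GOOD_NOTE));         // true
        System.out.println(isValid(schema, NOTE_WITHOUT_BODY)); // false: the sequence requires <body>
    }
}
```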

35
The Schema Root
  • <?xml version="1.0"?>
  • <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  • ...
  • </xs:schema>

36
Schemas Define Elements
  • A simple element is one that only contains text
  • Syntax for defining a simple element
  • <xs:element name="xxx" type="yyy"/>
  • XML Schema has built-in data types

37
XML Schema Built-in Data Types
  • xs:string
  • xs:decimal
  • xs:integer
  • xs:boolean
  • xs:date
  • xs:time

38
Some Schema Element Definitions
  • <xs:element name="firstname" type="xs:string"/>
  • The document could contain
  • <firstname>David</firstname>
  • <xs:element name="age" type="xs:integer"
    default="0"/>
  • The document could contain
  • <age>89</age>

39
Attributes
  • Simple elements can't have attributes
  • Elements with attributes are complex
  • Attributes can have a default value or a fixed,
    specified value

40
Some Schema Attribute Definitions
  • <xs:attribute name="firstname" type="xs:string"/>
  • The document could contain (on an element whose type declares it)
  • firstname="David"
  • <xs:attribute name="age" type="xs:integer"
    default="0"/>
  • The document could contain
  • age="89"

41
Default and Fixed Values
  • <xs:attribute name="firstname" type="xs:string"
    default="Joe"/>
  • <xs:attribute name="firstname" type="xs:string"
    fixed="Joe"/>

42
Optional and Required Attributes
  • All attributes are optional by default
  • Specify use="required" for required attributes
  • <xs:attribute name="firstname" type="xs:string"
    use="required"/>
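Here is a small sketch of `use="required"` in action, again assuming `javax.xml.validation`. The slides show only the attribute declarations, so the `person` element that carries them is hypothetical, as is the class name `RequiredAttributeDemo`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class RequiredAttributeDemo {
    // Hypothetical element "person"; the attribute declarations follow the slides.
    static final String PERSON_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"person\"><xs:complexType>"
            + "<xs:attribute name=\"firstname\" type=\"xs:string\" use=\"required\"/>"
            + "<xs:attribute name=\"age\" type=\"xs:integer\" default=\"0\"/>"
            + "</xs:complexType></xs:element></xs:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("the schema itself is invalid", e);
        }
    }

    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(PERSON_XSD);
        System.out.println(isValid(schema, "<person firstname=\"David\"/>"));             // true
        System.out.println(isValid(schema, "<person age=\"89\"/>"));                    // false: firstname missing
        System.out.println(isValid(schema, "<person firstname=\"David\" age=\"old\"/>")); // false: not an integer
    }
}
```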

43
Restrictions on Content
  • When an element or attribute has a defined data type, the
    content of the XML document must conform to the
    type; otherwise the document won't validate
  • Other restrictions, called facets, can be added
    to elements and attributes

44
Restriction Types
  • length, minLength, maxLength: the exact, minimum,
    and maximum character length of the value
  • pattern: a regular expression for the value
  • enumeration: a list of possible values
  • whiteSpace: rules for handling whitespace
  • minExclusive, minInclusive, maxExclusive, maxInclusive:
    the range of values allowed
  • totalDigits: the maximum number of digits in a numeric
    value
  • fractionDigits: the maximum number of digits after the
    decimal point

45
Facets Restricting Range
  • <xs:element name="age">
  • <xs:simpleType>
  • <xs:restriction base="xs:integer">
  • <xs:minInclusive value="0"/>
  • <xs:maxInclusive value="120"/>
  • </xs:restriction>
  • </xs:simpleType>
  • </xs:element>

46
Facets Restricting Values
  • <xs:element name="car">
  • <xs:simpleType>
  • <xs:restriction base="xs:string">
  • <xs:enumeration value="Audi"/>
  • <xs:enumeration value="Golf"/>
  • <xs:enumeration value="BMW"/>
  • </xs:restriction>
  • </xs:simpleType>
  • </xs:element>

47
Reformulated
  • <xs:element name="car" type="cartype"/>
  • <xs:simpleType name="cartype">
  • <xs:restriction base="xs:string">
  • <xs:enumeration value="Audi"/>
  • <xs:enumeration value="Golf"/>
  • <xs:enumeration value="BMW"/>
  • </xs:restriction>
  • </xs:simpleType>

48
Using Patterns
  • <xs:simpleType name="skuType">
  • <xs:restriction base="xs:string">
  • <xs:pattern value="\d{3}-[A-Z]{2}"/>
  • </xs:restriction>
  • </xs:simpleType>
  • Three digits, followed by a dash, followed by two
    uppercase letters
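The range and pattern facets can be tried together in a sketch like the one below, which assumes the JDK's `javax.xml.validation` API. `age` and a hypothetical `sku` element are both declared globally, so either can serve as a document root. `FacetDemo` is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class FacetDemo {
    static final String FACETS_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            // Range facets, as on the "Facets Restricting Range" slide.
            + "<xs:element name=\"age\"><xs:simpleType><xs:restriction base=\"xs:integer\">"
            + "<xs:minInclusive value=\"0\"/><xs:maxInclusive value=\"120\"/>"
            + "</xs:restriction></xs:simpleType></xs:element>"
            // Pattern facet, as on the "Using Patterns" slide; the sku element is hypothetical.
            + "<xs:element name=\"sku\" type=\"skuType\"/>"
            + "<xs:simpleType name=\"skuType\"><xs:restriction base=\"xs:string\">"
            + "<xs:pattern value=\"\\d{3}-[A-Z]{2}\"/>"
            + "</xs:restriction></xs:simpleType>"
            + "</xs:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("the schema itself is invalid", e);
        }
    }

    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(FACETS_XSD);
        System.out.println(isValid(schema, "<age>89</age>"));     // true
        System.out.println(isValid(schema, "<age>150</age>"));    // false: above maxInclusive
        System.out.println(isValid(schema, "<sku>318-BP</sku>")); // true
        System.out.println(isValid(schema, "<sku>318-bp</sku>")); // false: lowercase letters
    }
}
```

XML Schema patterns always match the whole value, so no `^` or `$` anchors are needed.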

49
Complex Types
  • Complex types address elements that can have
    nested children, sequencing, multiplicity of
    child elements
  • Syntax
  • <xsd:complexType name="typeName">
  • <xsd:someTopLevelModelGroup>
  • <!-- Sequencing, multiplicity
    constraints, ... -->
  • </xsd:someTopLevelModelGroup>
  • <!-- Attribute declarations -->
  • </xsd:complexType>

50
ComplexType
  • <xsd:complexType name="poType">
  • <xsd:sequence>
  • <xsd:element name="billto"
    type="addressType"/>
  • <xsd:element name="shipto"
    type="addressType"/>
  • <xsd:element name="order">
  • <xsd:complexType>
  • <xsd:sequence>
  • <xsd:element name="item"
    type="itemType"
    maxOccurs="unbounded"/>
  • </xsd:sequence>
  • </xsd:complexType>
  • </xsd:element>
  • </xsd:sequence>
  • <xsd:attribute name="submitted"
    use="required"
    type="xsd:date"/>
  • </xsd:complexType>

51
Global and Local Elements and Attributes
  • An element or attribute defined in a complex type
    is local to that definition
  • An element or attribute defined in the top level
    (xsd:schema) is global
  • Global elements can be document roots
  • Global attributes can be used on any element in
    the document that allows them

52
Basic Schema Reusability
  • Element References
  • Elements have a name and a type

53
Processing XML
  • Parsing is a process that involves breaking the
    text of an XML document into pieces (start tags,
    end tags, text, PIs, ...)
  • We can call the pieces tokens
  • Many parsers already exist for reading valid XML

54
Types of Parsers
  • Pull parser: The application asks the parser to
    give it the next token. It "pulls" the token
    from the parser
  • Push parser: The parser sends notifications to
    the application about the types of tokens it
    encounters during parsing. Simple API for XML
    (SAX) defines an event-driven push interface
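The slides don't name a specific pull API. One option in the JDK is StAX (`javax.xml.stream`), which the sketch below uses. The application drives the loop by calling `next()` for each token. `PullDemo` is an illustrative name, and coalescing is switched on so that each run of text arrives as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PullDemo {
    /** Pulls tokens one at a time and records start tags, end tags and non-blank text. */
    static List<String> tokens(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next(); // the application asks for the next token
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        out.add("start " + reader.getLocalName());
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        out.add("end " + reader.getLocalName());
                    } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                        out.add("text " + reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        tokens("<name><first>David</first><last>Woolbright</last></name>")
                .forEach(System.out::println);
    }
}
```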

55
Types of Parsers
  • One-step parser: The parser reads the whole
    document and generates a parse tree. XML DOM
    (Document Object Model) describes these types of
    trees
  • Hybrid parser: Combines the other three
    techniques to produce a specialized parser. For
    example, a one-step approach is combined with
    pull parsing

56
Parsing in Java
  • Java API for XML Processing (JAXP) exists to
    instantiate XML parsers using either SAX or DOM
  • JDOM is the Java community's attempt to develop
    an API that fits Java computational patterns
    better than SAX or DOM. JDOM isn't complete at
    this point

57
Processing Architecture
[Figure: processing architecture, with labels XML Doc, Character Stream, Parser, Serializer, Standard XML APIs, Application]
58
Data-Oriented XML Processing
  • Parsing or generating XML is syntax-oriented
  • Application may want a higher-level view of the data
    using an operation-centric approach

[Figure: layers labeled Syntax-oriented APIs, Data Abstraction Layer, Application Logic]
59
Invoice Checker
  • package com.skatestown.invoice;
  • import java.io.*;
  • /**
  •  * SkatesTown Invoice Checker
  •  */
  • public interface InvoiceChecker {
  •     void checkInvoice(InputStream invoiceXML) throws Exception;
  • }

60
checkInvoice()
  • Obtain an XML parser
  • Parse the XML from the input stream
  • Initialize a running total
  • Find all order items, calculate subtotals, add to
    running total
  • Add tax to the total
  • Add shipping and handling
  • Compare running total to invoice total
  • If they are different, throw an exception
  • Otherwise, return
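The steps above can be sketched as a DOM-based implementation of the `InvoiceChecker` interface. The invoice layout in this sketch is entirely hypothetical: `item` elements carry `quantity` and `unitPrice` attributes, and the root carries `taxRate`, `shippingAndHandling` and `total` attributes. The class name `SimpleInvoiceChecker` and the `passes` helper are also illustrative. The package declaration is left out so that the block stands alone. `BigDecimal` is used to avoid binary rounding errors in money arithmetic.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Same contract as the SkatesTown InvoiceChecker interface (package omitted). */
interface InvoiceChecker {
    void checkInvoice(InputStream invoiceXML) throws Exception;
}

/** Assumes a hypothetical invoice layout; element and attribute names are illustrative. */
final class SimpleInvoiceChecker implements InvoiceChecker {
    @Override
    public void checkInvoice(InputStream invoiceXML) throws Exception {
        // Obtain an XML parser and parse the XML from the input stream.
        Element invoice = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(invoiceXML).getDocumentElement();

        // Initialize a running total.
        BigDecimal total = BigDecimal.ZERO;

        // Find all order items, calculate subtotals, add to running total.
        NodeList items = invoice.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            BigDecimal quantity = new BigDecimal(item.getAttribute("quantity"));
            BigDecimal unitPrice = new BigDecimal(item.getAttribute("unitPrice"));
            total = total.add(quantity.multiply(unitPrice));
        }

        // Add tax to the total, rounded to cents.
        BigDecimal taxRate = new BigDecimal(invoice.getAttribute("taxRate"));
        total = total.add(total.multiply(taxRate).setScale(2, RoundingMode.HALF_UP));

        // Add shipping and handling.
        total = total.add(new BigDecimal(invoice.getAttribute("shippingAndHandling")));

        // Compare with the invoice total: throw if they differ, otherwise return.
        BigDecimal declared = new BigDecimal(invoice.getAttribute("total"));
        if (total.compareTo(declared) != 0) {
            throw new Exception("Invoice total " + declared + " does not match computed " + total);
        }
    }

    /** Convenience wrapper: true if checkInvoice returns normally (any failure counts as false). */
    static boolean passes(String xml) {
        try {
            new SimpleInvoiceChecker().checkInvoice(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 2 x 10.00 + 1 x 5.00 = 25.00; 5% tax = 1.25; plus 4.00 shipping = 30.25
        String invoice = "<invoice taxRate=\"0.05\" shippingAndHandling=\"4.00\" total=\"30.25\">"
                + "<item quantity=\"2\" unitPrice=\"10.00\"/>"
                + "<item quantity=\"1\" unitPrice=\"5.00\"/></invoice>";
        System.out.println(passes(invoice));                          // true
        System.out.println(passes(invoice.replace("30.25", "31.00"))); // false
    }
}
```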

61
Data-Centric Approach
  • Working with XML is reduced to mapping XML to and
    from application data
  • Converting data to XML is called marshalling
  • Converting data from XML is called unmarshalling

62
Schema Compilers
  • Schema compilers are tools that analyze XML
    schema and code-generate marshalling and
    unmarshalling modules
  • Binding Customization can help the Schema
    compiler bind the XML data to specific data
    structures
  • The Java community has developed tools and an API
    for mapping schema to Java data types: Java
    Architecture for XML Binding (JAXB)

63
SAX Parsing Architecture
[Figure: SAX parsing architecture, with labels SAXParser Factory, SAXParser, SAX Reader, Parse( ), XML, Content Handler, Error Handler, DTD Handler, Entity Handler]
64
SAX Callback Interfaces
  • void startDocument( )
  • void endDocument( )
  • void startElement(String namespaceURI,
    String localName,
    String qName,
    Attributes atts)
  • void characters(char[] ch, int start, int length)
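Here is a minimal push-style counterpart, assuming the JDK's `SAXParserFactory`. The handler extends `org.xml.sax.helpers.DefaultHandler` and records the callbacks listed above as the parser fires them. `SaxDemo` and `Recorder` are illustrative names. Text is buffered because a parser may split one run of characters across several `characters` calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxDemo {
    /** Records each callback the parser pushes to the application. */
    static final class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            events.add("startElement " + qName + " (" + atts.getLength() + " attributes)");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            events.add("endElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks, so buffer it
        }

        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                events.add("characters " + s);
            }
            text.setLength(0);
        }
    }

    /** Parses xml with a namespace-aware SAX parser and returns the recorded callbacks. */
    static List<String> run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            Recorder recorder = new Recorder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
            return recorder.events;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        run("<po id=\"1276\"><item>Skates</item></po>").forEach(System.out::println);
    }
}
```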