Structured-Document Processing Languages Spring 2006

Provided by: pekkakil
Transcript and Presenter's Notes
1
Structured-Document Processing Languages Spring 2006
  • Course Review

Repetitio mater studiorum est! (Repetition is the mother of learning!)
2
Goals of the Course
  • Learn about central models and languages for
    manipulating, representing, transforming and
    querying structured documents (or XML)
  • "Generic XML processing technology"

3
Methodological Goals
  • Some central professional skills: consulting
    technical specifications and experimenting with
    software implementations
  • Ability to think? That is, to find out
    relationships and to apply knowledge in new
    situations
  • ("Pidgin English" for scientific communication)

4
XML?
  • Extensible Markup Language is not a markup
    language!
  • It does not fix a tag set or its semantics (as
    markup languages such as HTML do)
  • XML is a way to use markup to represent
    information
  • XML is a metalanguage: it supports the definition
    of specific markup languages through XML DTDs or
    Schemas
  • E.g. XHTML, a reformulation of HTML using XML

5
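XML Encoding of Structure: Example
[Figure: a document tree with root element S, child elements W, E and W, an attribute A=1, and the text "Hello" and "world!", shown beside its XML encoding with <S> and <W> tags]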
6
Basics of XML DTDs
  • A Document Type Declaration provides a grammar
    (document type definition, DTD) for a class of
    documents
  • Syntax (in the prolog of a document
    instance): <!DOCTYPE rootElemType SYSTEM
    "ex.dtd" <!-- "external subset" in file ex.dtd
    --> [ <!-- "internal subset" may come here -->
    ]>
  • DTD is the union of the external and internal
    subset

7
What Do Declarations Look Like?
  • <!ELEMENT invoice (client, item)>
  • <!ATTLIST invoice num NMTOKEN #REQUIRED>
  • <!ELEMENT client (name, email?)>
  • <!ATTLIST client num NMTOKEN #REQUIRED>
  • <!ELEMENT name (#PCDATA)>
  • <!ELEMENT email (#PCDATA)>
  • <!ELEMENT item (#PCDATA)>
  • <!ATTLIST item
        price NMTOKEN #REQUIRED
        unit (FIM | EUR) "EUR">

8
Element type declarations
  • The general form is <!ELEMENT elementTypeName
    (E)> where E is a content model
  • A content model is a regular expression over
    element names
  • Content model operators: E | F alternation; E,
    F concatenation; E? optional; E* zero or
    more; E+ one or more; (E) grouping
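As a rough illustration of how such declarations are used, the sketch below asks a validating JAXP parser to check two small documents against the invoice declarations. It assumes the declarations are placed in an internal subset; the class name, the helper isValid and the sample documents are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Assumption: the invoice declarations from the slides, written as an internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE invoice [\n"
        + "  <!ELEMENT invoice (client, item)>\n"
        + "  <!ATTLIST invoice num NMTOKEN #REQUIRED>\n"
        + "  <!ELEMENT client (name, email?)>\n"
        + "  <!ATTLIST client num NMTOKEN #REQUIRED>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT email (#PCDATA)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "  <!ATTLIST item price NMTOKEN #REQUIRED\n"
        + "                 unit (FIM | EUR) \"EUR\">\n"
        + "]>\n";

    // Hypothetical helper: true if the given root element is valid against DOCTYPE.
    static boolean isValid(String rootElement) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations are reported as recoverable errors.
                public void error(SAXParseException e) { valid[0] = false; }
                // Well-formedness violations are fatal and abort the parse.
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + rootElement)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Matches the content model (client, item); email is optional.
        String good = "<invoice num='1'><client num='7'><name>John Doe</name></client>"
                    + "<item price='10'>Pen</item></invoice>";
        // Violates the content model: the client element is missing.
        String bad = "<invoice num='1'><item price='10'>Pen</item></invoice>";
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

The error handler is needed because a validating parser reports validity violations through error() and otherwise keeps parsing.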

9
XML Schema Definition Language
  • XML syntax: schema documents are easier to
    manipulate by programs (than the special DTD
    syntax)
  • Compatibility with namespaces: can validate
    documents using declarations from multiple
    sources
  • Content datatypes: 44 built-in datatypes
    (including primitive Java datatypes, datatypes of
    SQL, and XML attribute types), and mechanisms to
    derive user-defined datatypes
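To make the point about content datatypes concrete, here is a small sketch using the javax.xml.validation package of the JDK. The one-element schema, the element name price and the class and method names are assumptions made for this illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaDatatypeSketch {
    // Hypothetical schema: a single element whose content must be an xs:decimal.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='price' type='xs:decimal'/>"
        + "</xs:schema>";

    // Compiles a schema document given as a string.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns false if the document is invalid (or not well-formed).
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(XSD);
        System.out.println(isValid(schema, "<price>12.50</price>"));
        System.out.println(isValid(schema, "<price>cheap</price>"));
    }
}
```

A DTD could only declare such content as #PCDATA; the datatype check is what the schema adds here.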

10
XML Namespaces
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/TR/xhtml1/strict">
      <!-- XHTML is the default namespace -->
      <xsl:template match="doc/title">
        <H1>
          <xsl:apply-templates />
        </H1>
      </xsl:template>
    </xsl:stylesheet>
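One way to see what the namespace declarations do is to parse the stylesheet with a namespace-aware DOM parser and inspect the resulting names. The following is only a sketch: the class name and the describe helper are invented for the example, and the stylesheet is a whitespace-free copy of the one on this slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceSketch {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";
    static final String XHTML_NS = "http://www.w3.org/TR/xhtml1/strict";

    // The slide's stylesheet: xsl is bound to XSL_NS, the default namespace to XHTML_NS.
    static final String SHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "' xmlns='" + XHTML_NS + "'>"
        + "<xsl:template match='doc/title'><H1><xsl:apply-templates/></H1></xsl:template>"
        + "</xsl:stylesheet>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical helper: prints an element name as {namespace-URI}local-name.
    static String describe(Element e) {
        return "{" + e.getNamespaceURI() + "}" + e.getLocalName();
    }

    public static void main(String[] args) {
        Document doc = parse(SHEET);
        Element root = doc.getDocumentElement();
        // H1 has no prefix, so it falls into the default (XHTML) namespace.
        Element h1 = (Element) doc.getElementsByTagNameNS(XHTML_NS, "H1").item(0);
        System.out.println(describe(root));
        System.out.println(describe(h1));
    }
}
```

The prefix itself carries no meaning; applications are expected to compare the namespace URI and the local name.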

11
3. XML Processor APIs
  • How can applications manipulate structured
    documents?
  • An overview of document parser interfaces
  • 3.1 SAX: an event-based interface
  • 3.2 DOM: an object-based interface
  • 3.3 JAXP: Java API for XML Processing

12
A SAX-based application
[Figure: the application main routine calls Parse(); while reading the input <A i="1">Hi!</A>, the parser invokes the application's callback routines startDocument(), startElement(), characters() and endElement()]
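The callback pattern of this slide can be sketched in Java roughly as follows. The handler simply records each callback it receives; the class name SaxEventsSketch and the helper events are made up for this example, and the parser is obtained through JAXP.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {
    // Hypothetical helper: parses the input and returns a log of the callbacks received.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                log.add("startDocument");
            }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("startElement " + qName + " i=" + atts.getValue("i"));
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks.
                log.add("characters " + new String(ch, start, length));
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
            @Override public void endDocument() {
                log.add("endDocument");
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The main routine hands control to the parser, which calls back into handler.
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<A i=\"1\">Hi!</A>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is the main difference from DOM.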
13
DOM: What is it?
  • An object-based, language-neutral API for XML and
    HTML documents
  • Allows programs and scripts to build, navigate,
    and modify documents
  • In contrast to "Serial Access XML", DOM could be
    thought of as "Directly Obtainable in Memory"

14
DOM structure model
<invoice form="00" type="estimated">
  <addressdata>
    <name>John Doe</name>
    <address>
      <streetaddress>Pyynpolku 1</streetaddress>
      <postoffice>70460 KUOPIO</postoffice>
    </address>
  </addressdata>
  ...
[Figure: the DOM tree of this fragment: a Document node above the Element node invoice, whose attributes form="00" and type="estimated" are held in a NamedNodeMap; nested Element nodes addressdata, name, address, streetaddress and postoffice; and Text nodes "John Doe", "Pyynpolku 1" and "70460 KUOPIO"]
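A minimal Java sketch of navigating and modifying this structure through the DOM interfaces is given below. The invoice fragment is closed right after addressdata only so that it parses (the part elided with "..." is not reconstructed), and the class and helper names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class DomNavigationSketch {
    // Assumption: the invoice fragment of the slide, closed early so that it is well-formed.
    static final String XML =
          "<invoice form=\"00\" type=\"estimated\"><addressdata>"
        + "<name>John Doe</name>"
        + "<address><streetaddress>Pyynpolku 1</streetaddress>"
        + "<postoffice>70460 KUOPIO</postoffice></address>"
        + "</addressdata></invoice>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical helper: text content of the first element with the given tag name.
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);

        // Navigate: Document -> root Element -> its attributes (a NamedNodeMap).
        Element invoice = doc.getDocumentElement();
        NamedNodeMap atts = invoice.getAttributes();
        System.out.println(invoice.getTagName() + " has " + atts.getLength() + " attributes");
        System.out.println(text(doc, "name") + ", " + text(doc, "postoffice"));

        // Modify: change an attribute and append a new element with a text child.
        invoice.setAttribute("type", "final");
        Element note = doc.createElement("note"); // "note" is a made-up element name
        note.appendChild(doc.createTextNode("checked"));
        invoice.appendChild(note);
        System.out.println(invoice.getAttribute("type") + ", " + text(doc, "note"));
    }
}
```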
15
Overview of XSLT Transformation
16
JAXP (Java API for XML Processing)
  • An interface for plugging in and using XML
    processors in Java applications
  • Includes the packages
  • org.xml.sax: SAX 2.0 interface
  • org.w3c.dom: DOM Level 2 interface
  • javax.xml.parsers: initialization and use of
    parsers
  • javax.xml.transform: initialization and use of
    transformers (XSLT processors)
  • Included in JDK starting from version 1.4

17
JAXP: Using a SAX parser (1)
[Figure: a SAX parser is obtained with .newSAXParser() and applied to the XML file f.xml]
18
JAXP: Using a DOM parser (1)
[Figure: a DOM parser is obtained with .newDocumentBuilder() and applied to the file f.xml]
19
JAXP: Using Transformers (1)
[Figure: a transformer is obtained with .newTransformer() from an XSLT script]
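The transformer usage can be sketched as below, using only the javax.xml.transform classes of the JDK. The stylesheet is a cut-down variant of the one on the namespaces slide (the default XHTML namespace is left out to keep the output short, and an xsl:output element is added); the input document and the class and method names are assumptions of this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {
    // Assumption: simplified stylesheet that wraps the title of a doc in an H1 element.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:template match='doc/title'><H1><xsl:apply-templates/></H1></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to the input and returns the result as text.
    static String transform(String xslt, String xml) {
        try {
            // The factory compiles the XSLT script into a Transformer.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, "<doc><title>Hello</title></doc>"));
    }
}
```

Sources and results may also be DOM trees or SAX event streams (DOMSource, SAXSource and their Result counterparts); streams are used here only because they are the shortest to write.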
20
CSS - Cascading Style Sheets
  • A stylesheet language
  • mainly to specify the representation of web pages
    by attaching style (fonts, colours, margins, ...)
    to HTML/XML documents
  • Example style rule: H1 { color: blue;
    font-weight: bold }

21
CSS Processing Model (simplified)
  • 0. Parse the document into a tree
  • 1. Match style rules to elements of the tree:
    annotate each element with a value assigned for
    each relevant property; inheritance and, in case
    of competing rules, elaborate "cascade" rules are
    applied to select which value is assigned
  • 2. Generate a formatting structure of the
    annotated document tree, consisting of nested
    rectangular boxes
  • 3. Render the formatting structure: display,
    print, audio-synthesize, ...

22
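XSL: Transformation and Formatting
[Figure: XSL processing in two phases labelled I and II (transformation, directed by an XSLT script, and formatting)]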
23
Page regions
  • A simple page can contain 1-5 regions, specified
    by child elements of the simple-page-master

24
Top-level formatting objects
  • Slightly simplified

fo:root
  fo:layout-master-set
    (fo:simple-page-master | fo:page-sequence-master)
      fo:region-body
      fo:region-before?
      fo:region-after?
      fo:region-start?
      fo:region-end?
  fo:page-sequence
    fo:flow
25
XQuery in a Nutshell
  • Functional expression language
  • A query is a side-effect-free expression
  • Strongly typed: (XML Schema) types may be
    assigned to expressions statically, and results
    can be validated
  • Extends XPath 2.0 (but not all axes required)
  • Common to XQuery 1.0 and XPath 2.0: Functions
    and Operators, W3C Candidate Recommendation
    11/2005
  • Roughly: XQuery ≈ XPath' + XSLT' + SQL'

26
FLWOR ("flower") Expressions
  • Constructed from for, let, where, order by and
    return clauses (cf. SQL select-from-where)
  • Form: (ForClause | LetClause)+ WhereClause?
    OrderByClause? "return" Expr
  • FLWOR binds variables to values, and uses these
    bindings to construct a result (an ordered
    sequence of nodes)

27
XQuery Example
for $pn in distinct-values(doc("sp.xml")//pno)
let $sp := doc("sp.xml")//sp_tuple[pno = $pn]
where count($sp) > 3
order by $pn
return
  <well_supplied_item>
    <pno>{$pn}</pno>
    <avgprice>{avg($sp/price)}</avgprice>
  </well_supplied_item>
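The JDK does not ship an XQuery processor, so the sketch below is not XQuery. It re-expresses the same for/let/where/order by/return steps in plain Java with the XPath 1.0 support of javax.xml.xpath, purely to show what each clause contributes. The sample supplier data, the threshold parameter and all class and method names are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlworSketch {
    // Hypothetical stand-in for sp.xml: part P1 has four suppliers, P2 has one.
    static final String SAMPLE = "<sp>"
        + "<sp_tuple><sno>S1</sno><pno>P1</pno><price>10</price></sp_tuple>"
        + "<sp_tuple><sno>S2</sno><pno>P1</pno><price>20</price></sp_tuple>"
        + "<sp_tuple><sno>S3</sno><pno>P1</pno><price>30</price></sp_tuple>"
        + "<sp_tuple><sno>S4</sno><pno>P1</pno><price>40</price></sp_tuple>"
        + "<sp_tuple><sno>S1</sno><pno>P2</pno><price>5</price></sp_tuple>"
        + "</sp>";

    // Returns one serialized result element per part with more than minCount tuples.
    static List<String> wellSupplied(String spXml, int minCount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(spXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // for + order by: a TreeSet yields the distinct pno values in sorted order.
            NodeList pnos = (NodeList) xpath.evaluate("//pno", doc, XPathConstants.NODESET);
            TreeSet<String> distinct = new TreeSet<>();
            for (int i = 0; i < pnos.getLength(); i++) {
                distinct.add(pnos.item(i).getTextContent());
            }

            List<String> result = new ArrayList<>();
            for (String pn : distinct) {
                // let: the tuples of this part (assumes pn contains no quote characters).
                String sp = "//sp_tuple[pno='" + pn + "']";
                double count = (Double) xpath.evaluate(
                        "count(" + sp + ")", doc, XPathConstants.NUMBER);
                if (count > minCount) { // where
                    double avg = (Double) xpath.evaluate(
                            "sum(" + sp + "/price) div count(" + sp + ")",
                            doc, XPathConstants.NUMBER);
                    // return: construct the result element as text.
                    result.add("<well_supplied_item><pno>" + pn + "</pno><avgprice>"
                            + avg + "</avgprice></well_supplied_item>");
                }
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String item : wellSupplied(SAMPLE, 3)) {
            System.out.println(item);
        }
    }
}
```

The loop, the filter and the string building that take some twenty lines here are what the FLWOR expression states in six, which is the point of a dedicated query language.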
28
Course Main Message
  • XML is a universal way to represent information
    as tree-like data structures
  • There are specialized and powerful technologies
    for processing it
  • The worst hype has settled
  • Lots of R&D activities going on