Title: XML is a set of rules for building markup languages. It is not just glorified HTML or only for the i
1. Characterizing XML
- XML is a set of rules for building markup languages. It is not just glorified HTML, and it is not only for the internet.
- XML is a family of technologies that can do everything from formatting documents to filtering data.
- XML is a philosophy of information handling that seeks to maximize the usefulness and flexibility of data by refining it to its purest and most structured form.
2. XML: A Family of Technologies
- XML to formatted presentation (HTML, PDF, user-created format, etc.)
- Combines what we previously wanted to keep separate: markup and style.
- Why?
  - The author can concentrate on meaning
  - The designer can concentrate on appearance
  - More options for presentation
3. XML Application: DocBook
http://www.oasis-open.org/docbook/
- DocBook: a markup language (DTD) for technical documentation of computer hardware and software.
- Consists of several hundred elements.
- Elements include <bibliography>, <keyword>, <itemizedlist>, <figure>, <table>
4. DocBook vs. Word Processor
- <table> keeps a user from putting data into an image, where it cannot be searched.
- Papers would be easier to write! Each publisher could still use its own display format (different CSS or XSL style sheets), and users would only need to worry about content.
- Searching for scientific papers on a topic would be easier and faster.
- Search methods not possible now could be done.
  - Can search by element instead of just by keyword.
  - Can have more complex querying.
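Searching by element instead of by keyword can be sketched with Python's standard xml.etree.ElementTree module. The document below is a hypothetical DocBook-like fragment (it is not validated against the real DocBook DTD), and the helper names keyword_search and element_search are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like paper; not validated against the real DocBook DTD.
PAPER = """
<article>
  <title>Ripening Rates of Stored Fruit</title>
  <keywordset>
    <keyword>banana</keyword>
    <keyword>storage</keyword>
  </keywordset>
  <para>This paper does not discuss XML, only banana storage.</para>
</article>
"""


def keyword_search(text, word):
    """Plain full-text search: matches the word anywhere in the file."""
    return word.lower() in text.lower()


def element_search(root, tag, word):
    """Structured search: matches the word only inside elements named `tag`."""
    return [el.text for el in root.iter(tag)
            if el.text and word.lower() in el.text.lower()]


root = ET.fromstring(PAPER)
print(keyword_search(PAPER, "xml"))               # True: "XML" occurs in a paragraph
print(element_search(root, "keyword", "xml"))     # []: it is not a declared keyword
print(element_search(root, "keyword", "banana"))  # ['banana']
```

The plain keyword search reports a hit for a paper that only mentions XML in passing, while the element search looks only at what the author marked up as a keyword.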
5. XML Applications and Tools
- XML application: a markup language derived from XML rules (e.g. DocBook)
- XML software applications: a set of components, each performing a crucial step on an assembly line (XML processors)
- XML tools: commercially available programs that help a user work with XML
6. XML Processors
- A set of components, each performing a crucial step on an assembly line:
  1. Parser: translates XML markup and data into tokens
  2. Event switcher: routes tokens to event-handling routines (CSS)
  3. Tree representation: a tree structure is built if more complicated processing of an XML document is needed
     a. Simple hierarchy of nodes
     b. Object model: each node is represented as an object
  4. Tree processor
     - Traverses a tree so operations can be done on the tree model
     - Ranges from a validity checker to a full transformation engine
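A minimal sketch of the four stages, using Python's standard xml.sax module as the parser. The class names Node and TreeBuilder and the function count_elements are hypothetical names chosen for this illustration; real XML processors are far more elaborate.

```python
import xml.sax


class Node:
    """Step 3: one node in a simple hierarchy (name, attributes, text, children)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.text = ""
        self.children = []


class TreeBuilder(xml.sax.ContentHandler):
    """Step 2: event-handling routines that the SAX parser calls for each token."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = Node(name, attrs)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        if self.stack:
            self.stack[-1].text += content

    def endElement(self, name):
        self.stack.pop()


def count_elements(node, name):
    """Step 4: a tiny tree processor that counts nodes with the given name."""
    own = 1 if node.name == name else 0
    return own + sum(count_elements(child, name) for child in node.children)


DOC = (b'<fruit_basket><banana banana_number="1"/>'
       b'<banana banana_number="2"/></fruit_basket>')

builder = TreeBuilder()
xml.sax.parseString(DOC, builder)  # step 1: the parser turns markup into events
print(builder.root.name)                       # fruit_basket
print(count_elements(builder.root, "banana"))  # 2
```

If only the events are needed (for example, to count elements), the tree-building step can be skipped and the handlers can do the work directly.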
7. XML Software Applications
http://www.garshol.priv.no/download/xmltools/cat_ix.html
- Editing and Composition
- Electronic Delivery
- Control Information and Development
- Conversion
- Document Storage and Management
- Parsers and Engines
8. XML Software Applications
- XML has a lot of different parts and at first seems very complex.
- Each part by itself is simple.
- There is no need to be an expert in all of XML to use it productively.
9. XML Software Applications
- Editing and composition (tools for interactive creation, modification and composition of XML documents)
  - XML editors
    - Text: emacs, vi
    - Graphical: HTML-Kit, etc.
- Electronic delivery (tools for electronic delivery and display of XML documents)
  - XML browsers
    - Amaya (W3C): HTML/XHTML browser/editor with CSS and XLink support
    - InDelv XML Client: uses XSLT style sheets for display; supports XPath and XPointer
    - IE 5.5: displays XML with CSS or XSLT
    - Mozilla (Netscape): displays XML with CSS
    - IBM Alphaworks: DTD aware
10. XML Software Applications
- Control information and development (tools for creating, modifying and documenting DTDs, XSL style sheets, etc.)
  - CSS editors / DTD editors (similar to XML editors, with some overlap)
  - DTD documenters
    - LiveDTD: parses XML DTDs and generates HTML documentation files from them, with cross-links to element and parameter entity definitions.
  - DTD generators
    - Data Descriptors by Example: automatically generates an XML DTD or schema from a set of document instances.
    - Rhythmyx XSpLit: claims to be able to automatically generate an XML DTD and an XSLT style sheet from a sample HTML document. The XSLT style sheet can then be combined with XML documents that conform to the DTD to produce HTML pages with the same design as the original, but with new content.
    - SAXON: includes a small application that can generate DTDs from sample input files.
  - DTD parsers
  - Schema converters
    - DTD2RELAX: converts a DTD into a RELAX schema module.
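The idea behind a DTD generator can be sketched in a few lines of Python. This is not how Data Descriptors by Example, XSpLit or SAXON work internally; it is a deliberately naive illustration under stated assumptions: the function name infer_dtd is hypothetical, namespaces are ignored, every attribute is declared as optional CDATA, and child order and occurrence counts are not inferred.

```python
import xml.etree.ElementTree as ET


def infer_dtd(documents):
    """Infer a deliberately loose DTD from a list of sample XML strings."""
    children = {}  # element name -> set of child element names seen
    attrs = {}     # element name -> set of attribute names seen
    has_text = {}  # element name -> True once non-blank text was seen
    order = []     # element names in first-seen order, for stable output

    for doc in documents:
        for el in ET.fromstring(doc).iter():
            if el.tag not in children:
                children[el.tag] = set()
                attrs[el.tag] = set()
                has_text[el.tag] = False
                order.append(el.tag)
            children[el.tag].update(child.tag for child in el)
            attrs[el.tag].update(el.attrib)
            # Text directly inside el: its own text plus the tails of its children.
            texts = [el.text] + [child.tail for child in el]
            if any(t and t.strip() for t in texts):
                has_text[el.tag] = True

    lines = []
    for name in order:
        kids = sorted(children[name])
        if kids and has_text[name]:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"  # mixed content
        elif kids:
            model = "(" + " | ".join(kids) + ")*"            # any order, any count
        elif has_text[name]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attrs[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


SAMPLES = [
    '<fruit_basket><banana banana_number="1"/></fruit_basket>',
    '<fruit_basket><banana banana_number="1"/><apple>red</apple></fruit_basket>',
]
print(infer_dtd(SAMPLES))
```

The result accepts every sample it was built from, but it is much more permissive than a hand-written DTD would be, which is why generated DTDs usually need to be tightened by hand.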
11. XML Software Applications
- Control information and development (cont.)
  - XSL checkers
    - XSL Lint: checks XSLT style sheets for mistakes
    - XSL Trace: debugger for XSLT style sheets
  - XSL converters
    - XSLT is a subset of the more general XSL; these programs convert from XSL to XSLT
  - XSLT editors
  - XSLT generators
    - WH2FO: reads HTML files produced by Microsoft Word and converts them into an XML document with two XSLT style sheets, one for conversion back to HTML and one for conversion to XSL-FO (Extensible Stylesheet Language Formatting Objects, another subset of XSL)
    - Rhythmyx XSpLit
12. XML Software Applications
- Conversion (tools for scripted creation and modification of XML documents)
  - General N-converters: convert from non-XML (usually a word processing document) to XML
  - General S-converters: process XML documents (transformation)
  - Publishing converters
    - TeXML: an XML DTD and a Java application called TeXMLatte. TeXMLatte takes TeXML documents and converts them to TeX. This can be used with, for example, an XSL XML-to-TeXML conversion to produce TeX output from XML source documents. TeXML can also convert to plain text.
13. XML Software Applications
- Document storage and management (tools for supporting document management, such as document databases and search engines)
  - XML document database systems (systems for persistently storing XML documents and providing access to their structure and individual parts)
    - Lore: a DBMS built specifically for the XML data model, complete with query language, query optimizer, indexing, multi-user support and recovery.
    - XML-DBMS: a Java library that can be used to move data from XML to a relational database and back again.
  - XML search engines
    - Xset: an XML search engine oriented towards performance. It keeps its working set in memory (using paging to support large documents) and can be accessed through RMI. The query language is very simple.
    - sgrep: a general tool for searching and indexing text that supports XML (and SGML). It also has its own very powerful query language.
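Moving data from XML to a relational database and back can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. This is not the XML-DBMS API (which is a Java library); the table layout, the element text values and the function names xml_to_db and db_to_xml are assumptions made for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the "ripe"/"green" text is made up for this sketch.
DOC = """
<fruit_basket>
  <banana banana_number="1">ripe</banana>
  <banana banana_number="2">green</banana>
</fruit_basket>
"""


def xml_to_db(conn, xml_text):
    """Store each <banana> element as one row of a `banana` table."""
    conn.execute("CREATE TABLE banana (banana_number INTEGER, state TEXT)")
    rows = [(int(el.get("banana_number")), el.text)
            for el in ET.fromstring(xml_text).iter("banana")]
    conn.executemany("INSERT INTO banana VALUES (?, ?)", rows)


def db_to_xml(conn):
    """Rebuild a <fruit_basket> document from the rows of the table."""
    root = ET.Element("fruit_basket")
    query = "SELECT banana_number, state FROM banana ORDER BY banana_number"
    for number, state in conn.execute(query):
        el = ET.SubElement(root, "banana", banana_number=str(number))
        el.text = state
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
xml_to_db(conn, DOC)
print(db_to_xml(conn))
```

The round trip keeps the data but not the original whitespace or formatting, which is typical when a document is mapped onto tables rather than stored as a document.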
14. XML Software Applications
- Parsers and engines (XML parsers, parsing toolkits, HyTime engines and DSSSL engines)
  - DOM implementations: sets of Java interfaces declaring methods that the developer must implement
  - DSSSL engines: for formatting SGML documents
  - RDF parsers
  - SGML/XML parsers
  - XLink/XPointer engines
    - Jaxen: parses XPath expressions and evaluates them against an XML tree representation
  - XML middleware: general software packages for making XML-aware applications of some form
  - XML parsers: translate XML markup and data into tokens to be processed
  - XML validators: software for validating XML documents by means other than DTDs
  - XSL engines: engines that support the XSL formatting objects specification
  - XSLT engines: engines that support the XSL Transformations specification
15. DTD vs. XML Schema
- DTD drawbacks
  - Enforcing an element's range of occurrences
    - A fruit_basket can have between 5 and 7 banana elements:
      <!ELEMENT fruit_basket ( (banana, banana, banana, banana, banana)
                             | (banana, banana, banana, banana, banana, banana)
                             | (banana, banana, banana, banana, banana, banana, banana) )>
  - Enforcing a numbering scheme on child elements
    - A fruit_basket can have 3 bananas, each numbered 1 through 3
    - Cannot be done with a DTD:
      <!ELEMENT fruit_basket (banana)>
      <!ELEMENT banana EMPTY>
      <!ATTLIST banana banana_number (1 | 2 | 3) "1">
16. DTD vs. XML Schema
- Enforcing an element's range of occurrences
  - Use XML Schema:
    <xsd:complexType name="fruit_basket">
      <xsd:sequence>
        <xsd:element name="banana" minOccurs="9" maxOccurs="11"/>
      </xsd:sequence>
    </xsd:complexType>
17. DTD vs. XML Schema
- Enforcing a numbering scheme on child elements: XSLT style sheet
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  xmlns="http://www.w3.org/1999/xhtml" version="1.0">
    <!-- Process fruit_basket element(s) -->
    <xsl:template match="fruit_basket">
      <html>
        <body>
          <!-- Validate number of banana children in fruit_basket -->
          <xsl:choose>
            <!-- Note escaped form of boolean > and < operators -->
            <xsl:when test="count(banana) &gt; 8 and count(banana) &lt; 12">
              <h3># of banana children OK</h3>
            </xsl:when>
            <xsl:otherwise>
              <h3>Whoops! # of banana children is <xsl:value-of select="count(banana)"/></h3>
            </xsl:otherwise>
          </xsl:choose>
          <!-- Set up table of info about banana children -->
          <table border="1">
            <tr>
              <th>banana #</th>
              <th>banana_number</th>
            </tr>
            <!-- Process all banana children of fruit_basket -->
            <xsl:apply-templates select="banana"/>
          </table>
18. DTD vs. XML Schema
        </body>
      </html>
    </xsl:template>
    <!-- Process banana element(s) -->
    <xsl:template match="banana">
      <!-- Each banana element goes in its own table row -->
      <tr>
        <th><xsl:value-of select="position()"/></th>
        <td>
          <!-- Test for banana's position matching banana_number attribute value -->
          <xsl:choose>
            <xsl:when test="position() = @banana_number">OK</xsl:when>
            <xsl:otherwise>
              <strong>Whoops!</strong>... <xsl:value-of select="@banana_number"/>
            </xsl:otherwise>
          </xsl:choose>
        </td>
      </tr>
    </xsl:template>
  </xsl:stylesheet>
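The two checks the style sheet performs (a count between 9 and 11, and each banana_number matching the banana's position) can be restated in a few lines of Python for readers who find the XSLT hard to follow. This is an analogous sketch, not a replacement for the style sheet: the function names check_basket and make_basket are hypothetical, and the attribute is compared as text, so a value such as "01" would be rejected here although XPath's numeric comparison would accept it.

```python
import xml.etree.ElementTree as ET


def check_basket(xml_text, low=9, high=11):
    """Return a list of problems found in a <fruit_basket> document."""
    basket = ET.fromstring(xml_text)
    bananas = basket.findall("banana")
    problems = []
    # Analogous to: count(banana) > 8 and count(banana) < 12
    if not (low <= len(bananas) <= high):
        problems.append(f"# of banana children is {len(bananas)}")
    # Analogous to: position() = @banana_number (positions start at 1)
    for position, banana in enumerate(bananas, start=1):
        number = banana.get("banana_number")
        if number != str(position):
            problems.append(f"banana {position} has banana_number {number}")
    return problems


def make_basket(numbers):
    """Build a test basket with one <banana> per entry in `numbers`."""
    items = "".join(f'<banana banana_number="{n}"/>' for n in numbers)
    return f"<fruit_basket>{items}</fruit_basket>"


print(check_basket(make_basket(range(1, 10))))  # nine correctly numbered bananas
print(check_basket(make_basket([1, 3, 2])))     # too few, and two out of order
```

The first call reports no problems; the second reports the wrong count and the two bananas whose banana_number does not match their position.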
19. Cascading Style Sheets
- Limitations of CSS
  - Its simplicity rules out more complex formatting
  - Elements are processed in the order of their appearance
  - Arithmetic operations on element positions or values cannot be done
- May be replaced by XSL-FO
  - More detailed than CSS
  - Is an XML application
  - Is more closely tied to XML's nested-container structure
20. XSLT (subset of XSL)
- XSLT (Extensible Stylesheet Language Transformations)
  - Allows data in a document to be used later by applications that do searches, queries and other sophisticated operations.
  - Transforms a document into something else
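"Transforming a document into something else" can be illustrated without an XSLT engine. The sketch below is plain Python with xml.etree.ElementTree, not XSLT; it only shows the idea of deriving two different outputs from one structured source. The sample document and the function names to_html and to_text are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document reusing the fruit_basket example.
SOURCE = """
<fruit_basket>
  <banana banana_number="1"/>
  <banana banana_number="2"/>
</fruit_basket>
"""


def to_html(basket):
    """Transform the basket into an HTML table, one row per banana."""
    table = ET.Element("table", border="1")
    for banana in basket.iter("banana"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = banana.get("banana_number")
    return ET.tostring(table, encoding="unicode")


def to_text(basket):
    """Transform the same basket into a plain-text summary."""
    numbers = [b.get("banana_number") for b in basket.iter("banana")]
    return "bananas: " + ", ".join(numbers)


basket = ET.fromstring(SOURCE)
print(to_html(basket))
print(to_text(basket))
```

An XSLT style sheet expresses the same kind of mapping declaratively, as templates matched against elements, so the transformation can be changed without touching program code.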