XPipe - An XML Processing Methodology

Description:

XML SIG, NY USA. Feb 12, 2002. Sean McGrath. CTO ... XML SIG NY, Sean McGrath http://www.propylon.com ... NY, 2002. Lunch is a complex, hierarchical structure ...

1
XPipe - An XML Processing Methodology
  • XML SIG, NY USA
  • Feb 12, 2002
  • Sean McGrath
  • CTO
  • Propylon

2
What is XPipe?
  • It is an architecture / methodology / framework
    for developing robust, scalable, manageable XML
    processing systems.
  • Based on proven mechanical manufacturing
    techniques. Specifically:
  • The Assembly Line Principle
  • Component assembly and component re-use

3
What is XPipe?
  • An open source project hosted on Sourceforge
  • http://xpipe.sourceforge.net
  • A contribution to the blossoming meme of using
    pipeline based processing to tame the burgeoning
    complexity of XML transformations
  • (If you do not find XML transformation
    complicated, you are not sufficiently well
    informed.)
  • (And no, XSLT does not solve all your problems)

4
What is XPipe?
  • A way of thinking about systems that focuses on
    structured dataflows rather than Object APIs
  • It is also
  • A Scandinavian sewage treatment technology
  • An exhaust pipe system for high performance
    engines
  • A VT100 based strategy game for DEC's VAX/VMS
    Operating System

5
Contents of this talk
  • The XPipe philosophy
  • Major functional elements
  • Some examples
  • The XGrid and Commoditized XML Processing
  • Some anticipated objections (and answers)
  • Relationship to other technologies

6
Contents of this talk
  • Current status
  • Current problems
  • Future plans
  • Some (contentious) musings
  • Something cold to drink

7
XPipe Philosophy
  • XML is all about (potentially) complex,
    hierarchical data structures

8
XPipe Philosophy
Cars are complex, hierarchical structures
Henry Ford's Model T assembly line, 1914
9
XPipe Philosophy
Lunch is a complex, hierarchical structure
Lunch Assembly Line. NY, 2002
10
XPipe Philosophy
We are complex, hierarchical structures
11
XPipe philosophy
  • What have these scenes got in common?
  • Complex construction of cars, tuna melts and
    tendons made possible and efficient through
  • assembly line manufacturing
  • re-usable component processes and component
    materials
  • Why not apply this approach to XML
    manufacturing?

12
XPipe philosophy
  • Why does the assembly line approach work?
  • Transformation task decomposition
  • Re-usable transformation components
  • Transformation decomposition is the key to
    complexity management. Just ask
  • Henry Ford
  • Herbert Simon (the two watchmakers parable in
    "The Architecture of Complexity")
  • George Miller (7 ± 2)
  • Adam Smith (An Inquiry into the Nature And Causes
    of the Wealth of Nations,1776)
  • Any electrical or chemical engineer.

13
XPipe philosophy
  • Component re-use is the key to productivity
  • Ask any form of engineer (electrical, chemical
    etc.) apart from software engineers
  • Component re-use remains a holy grail in software
    engineering
  • XPipe is yet another attempt

14
XPipe philosophy
  • A lot of data processing for the foreseeable
    future will consist of XML to XML transformation
  • A lot of non-XML data processing can consist of
    XML to XML transformations with the addition of
    top and tail transformations
  • Mantra
  • Get data into XML as quickly as possible
  • Keep it in XML until the last possible minute
  • Bring all your XML tools to bear on solving the
    data processing problem

15
XPipe philosophy
[Diagram: Non-XML Input -> Top Transformation -> Input XML -> ... -> Output XML -> Tail Transformation -> Non-XML Output]
16
XPipe philosophy
  • The philosophy hinges on the fact that every
    complex XML transformation can be broken down
    into a series of smaller ones that can be chained
    together
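
As a rough illustration of chaining, here is a minimal Python sketch. It is not XPipe's own API: the stage names `normalize_space` and `add_count` and the helper `chain` are invented for this example. Each stage takes an XML tree and returns an XML tree, which is what lets small stages be strung together.

```python
import xml.etree.ElementTree as ET


def normalize_space(root):
    """Stage (hypothetical): trim text and drop whitespace-only text."""
    for el in root.iter():
        el.text = (el.text or "").strip() or None
        el.tail = (el.tail or "").strip() or None
    return root


def add_count(root):
    """Stage (hypothetical): record the number of child elements on the root."""
    root.set("count", str(len(root)))
    return root


def chain(*stages):
    """Build one XML-text-to-XML-text transformation from smaller stages."""
    def run(xml_text):
        root = ET.fromstring(xml_text)
        for stage in stages:
            root = stage(root)
        return ET.tostring(root, encoding="unicode")
    return run


pipe = chain(normalize_space, add_count)
result = pipe("<list> <item> a </item> <item>b</item> </list>")
print(result)
```

Either stage can be built and tested on its own, then reused in other chains.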

17
XPipe philosophy
  • Only so many ways to re-arrange an XML tree
    structure
  • A finite number of fundamental transformations,
    from which all higher order transformations can
    be derived

18
XPipe philosophy
  • Transformation Decomposition leads to
  • a series of small, manageable, stand alone
    problems with an XML input spec and an XML
    output spec.
  • Can build, test, use and then re-use these
    transformation components
  • Very team development friendly
  • High cohesion, loose coupling just like the
    professor advised

19
XPipe philosophy
  • Pipeline approach means you can mix 'n' match
    black-box components that internally use whatever
    paradigm best suits the problem
  • Lexical
  • SAX
  • DOM
  • XSLT
  • XDuce, Pyxie, Haskell, AF-NG

20
Sample XPipe
[Diagram: sample XPipe. Stage labels: DB/CMS, Character Set Mods, Add Doctype / validate / strip doctype, Re-arrange Elements, Validation, SQL Replace, XHTML Generate, Stats, FTP. Implementation labels: Lexical, DOM, Schematron/RelaxNG/Rhino, Jython, Java, XSLT]
21
XPipe philosophy
  • Assertion: developers would use a component
    based approach to XML processing if they did not
    have to write the plumbing (orchestration,
    exception handling) themselves
  • "Gee, this problem is complex. Maybe I'll do it
    in multiple stages! Gee, now I have to
    orchestrate the stages somehow. Batch files/shell
    scripts/driver program: all ugly and error
    prone. Maybe I'll just write a single program
    after all."

22
XPipe philosophy
  • "Professional developers spend 50 percent of
    their time writing plumbing" - Adam Bosworth
  • XPipe aims to look after the plumbing, letting
    developers concentrate on the interesting stuff

23
Philosophy Summary
  • Preambles
  • Make things as complex as necessary but not more
    complex than necessary
  • Solve all the world's problems, but only one at
    a time
  • Don't even think about performance until it is
    too late; then it will look after itself
  • Only increase complexity linearly w.r.t.
    functionality and only in elevator pitch sized
    functionality quanta

24
Philosophy Summary 1/2
  • Data processing = data transformation w.r.t.
    time.
  • XML is the current runaway winner in the
    self-descriptive data stakes and a very good QDDL
    (Quiescent Data Description Language)

25
Philosophy summary 2/2
  • Inside every complex XML transformation is a
    sequence of simpler XML transformations trying to
    get out: a Pipe
  • Decomposed transformation = new transformations
    + already componentized transformations ->
    Component Reuse
  • Inside every graph transformation (read
    workflow or business process model) is a
    combination of simple Pipes trying to get out

26
XPipe Philosophy
Leveled architecture: levels build on one
another, but any level is usable independently of
higher levels
[Diagram: three stacked levels, each with its own In and Out: Level 2 - XRigs, Level 1 - XPipes, Level 0 - XComponents]
27
Major Functional Elements - XComponents
[Diagram: In -> XComponent -> Out]
  • Developed in any language that runs on the Java
    Virtual Machine (Jython, Java, XSLT, Rhino
    (JavaScript) etc.)
  • All XComponents are standalone programs of the
    form
  • Name InputXML OutputXML ErrorXML
    [Optional Args]
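
The calling convention above could be honoured by a program shaped roughly like the following Python sketch. The `<error>` element, the `component` attribute, and the upper-casing payload are all assumptions made for illustration; the slides do not specify the ErrorXML format. The demo at the bottom runs the component against temporary files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def transform(root, args):
    """Hypothetical payload: upper-case all character data."""
    for el in root.iter():
        if el.text:
            el.text = el.text.upper()
        if el.tail:
            el.tail = el.tail.upper()


def main(argv):
    """argv = [Name, InputXML, OutputXML, ErrorXML, *optional args]."""
    name, in_path, out_path, err_path, *args = argv
    try:
        tree = ET.parse(in_path)
        transform(tree.getroot(), args)
        tree.write(out_path, encoding="utf-8", xml_declaration=True)
        return 0
    except (OSError, ET.ParseError) as exc:
        # Report the failure as XML too (assumed format), so that
        # downstream tools can read it like any other document.
        err = ET.Element("error", {"component": name})
        err.text = str(exc)
        ET.ElementTree(err).write(err_path, encoding="utf-8",
                                  xml_declaration=True)
        return 1


# Demo: one successful run and one failing run (missing input file).
with tempfile.TemporaryDirectory() as tmp:
    in_path, out_path, err_path = (
        os.path.join(tmp, n) for n in ("in.xml", "out.xml", "err.xml")
    )
    with open(in_path, "w", encoding="utf-8") as fh:
        fh.write("<greeting>hello world</greeting>")
    ok_status = main(["upcase", in_path, out_path, err_path])
    output_text = ET.parse(out_path).getroot().text
    bad_status = main(["upcase", os.path.join(tmp, "missing.xml"),
                       out_path, err_path])
    error_component = ET.parse(err_path).getroot().get("component")
```

Run from a shell, the same `main` would simply be handed `sys.argv` and its return value used as the exit status.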

28
Major Functional Elements - XComponents
  • XComponents described in XML form. An XComponent
    consists of
  • Metadata (keywords etc.)
  • Documentation
  • Pre and Post Conditions
  • Unit Tests (input, output XML stream pairs +
    Pre/Post Conditions)
  • Code (Java / Jython / XSLT / Exec)

29
Major Functional Elements - XPipes
[Diagram: In -> XPipe -> Out]
  • A linear assembly of XComponents that together
    achieve some useful transformation function
  • Described in XML
  • Documentation
  • Metadata (keywords etc.)
  • Pre/Post conditions
  • Unit Tests (input, output XML stream pairs +
    Pre/Post Conditions)
  • References to XComponents (URIs) which are
    resolved when the XPipe is installed/executed

30
Major Functional Elements - XRigs
[Diagram: XRig with two In flows and two Out flows]
  • An assembly of XPipes that together achieve some
    useful transformation function
  • Described in XML
  • Documentation
  • Metadata (keywords etc.)
  • Pre/Post conditions
  • Unit Tests (input, output XML stream pairs +
    Pre/Post Conditions)
  • References to XPipes (URIs) which are resolved
    when the XRig is installed/executed

31
Major Functional Elements
  • Unit Testers
  • XComponent, XPipe and XRig level Test Harnesses
  • Executives
  • XComponent, XPipe and XRig level Execution
    Environments (on-the-fly, disk install, compiled,
    web service)
  • (Executing an XComponent is identical to
    executing an XPipe of arity 1, which is identical
    to executing an XRig of arity 1)

32
Major Functional Elements
  • Executives
  • Uniprocessor Execution
  • Executed on 1 CPU, possibly with separate threads
    for each instantiated X
  • Multiprocessor Execution (Vapor)
  • XML based protocol to implement Job Shop work
    distribution over a P2P network (XJCL)

33
Major Functional Elements - XPipe Monitor (Vapor)
34
Major Functionality Elements - Miscellany (Vapor)
  • Whizzy GUI Component and Pipe Editors
  • XComponent Creators
  • Wrap Java, XSLT etc. into XComponent compliant
    XML, Ant build target
  • XComponent Proxies: pretend to be a simple
    XComponent but invoke some external functionality,
    anything from a Windows DLL to a SOAP end-point
  • XPipe masquerading as XComponent: this could be
    a very powerful paradigm

35
Major Functionality Elements - Miscellany (Vapor)
  • Compilers / Packers
  • Pack XPipes/XRigs into standalone XPipes/XRigs
    for distribution (with or without an executive)
  • Compile pure XSLT XPipe into a self contained
    translet (self contained or as an XComponent)
  • Compile away/optimize intermediate files via a
    variety of tricks (Jackson Inversion, Java IO
    hook, shadow marshalling etc.)

36
Simple XComponent examples
  • Fundamental Operation - Rename Element
  • Rename
  • Input: <foo>baz</foo>
  • Output: <bar>baz</bar>

[Diagram: the tree foo/baz becomes bar/baz]
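
A Rename component could be sketched in a few lines of Python using the standard library's ElementTree. The function name `rename` and its signature are assumptions for illustration, not part of XPipe.

```python
import xml.etree.ElementTree as ET


def rename(xml_text, old, new):
    """Rename every <old> element to <new>, leaving content untouched."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter(old)):
        el.tag = new
    return ET.tostring(root, encoding="unicode")


renamed = rename("<foo>baz</foo>", "foo", "bar")
print(renamed)
```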
37
Simple XComponent examples
  • Fundamental Operation - Peel
  • Input: <foo><bar>baz</bar></foo>
  • Output: <foo>baz</foo>

[Diagram: the tree foo/bar/baz becomes foo/baz]
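
One possible reading of Peel is "remove the element's tags but keep its content in place". The Python sketch below follows that reading; the function names are invented, and the root element itself is never peeled (an assumption, since the slide does not say).

```python
import xml.etree.ElementTree as ET


def _peel_children(parent, tag):
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != tag:
            _peel_children(child, tag)
            i += 1
            continue
        kids = list(child)
        lead = child.text or ""
        if kids:
            # Text that followed the peeled element now follows its last child.
            kids[-1].tail = (kids[-1].tail or "") + (child.tail or "")
        else:
            lead += child.tail or ""
        # Text that led the peeled element joins whatever precedes it.
        if i == 0:
            parent.text = (parent.text or "") + lead
        else:
            prev = parent[i - 1]
            prev.tail = (prev.tail or "") + lead
        del parent[i]
        for offset, kid in enumerate(kids):
            parent.insert(i + offset, kid)
        # i is not advanced: promoted children are examined next.


def peel(xml_text, tag):
    """Strip <tag> wrappers below the root, keeping their content."""
    root = ET.fromstring(xml_text)
    _peel_children(root, tag)
    return ET.tostring(root, encoding="unicode")


peeled = peel("<foo><bar>baz</bar></foo>", "bar")
peeled_mixed = peel("<foo>a<bar>b<i>c</i>d</bar>e</foo>", "bar")
```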
38
Simple XComponent examples
  • Compound Operation - Matryoshka
  • Input:
  • <foo><bar>baz</bar></foo>
  • Output:
  • <foo></foo><bar></bar>baz

[Diagram: nested foo/bar/baz becomes the siblings foo, bar, baz]
39
Simple XComponent examples
  • KlingonCloak
  • Input:
  • <foo><bar>baz</bar></foo>
  • Output:
  • <tag name="foo"><tag name="bar">baz</tag></tag>

[Diagram: foo/bar/baz becomes tag type=foo / tag type=bar / baz]
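
KlingonCloak, as shown, hides every element name inside a generic `tag` element. A hedged Python sketch follows; `cloak` and `uncloak` are invented names, and any existing `name` attribute would be overwritten, a limitation of this illustration.

```python
import xml.etree.ElementTree as ET


def cloak(xml_text):
    """Replace each element by <tag name="original-name">."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter()):
        el.set("name", el.tag)
        el.tag = "tag"
    return ET.tostring(root, encoding="unicode")


def uncloak(xml_text):
    """Inverse of cloak: restore element names from the name attribute."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter("tag")):
        el.tag = el.attrib.pop("name")
    return ET.tostring(root, encoding="unicode")


cloaked = cloak("<foo><bar>baz</bar></foo>")
restored = uncloak(cloaked)
```

Because the cloaked form has a single fixed vocabulary, generic components can process it without knowing the original schema, and `uncloak` restores the names afterwards.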
40
Sample XComponents
  • Once you start thinking in terms of Pipes
    components appear everywhere
  • Regular fragmentations
  • Doctype changer
  • Namespace normalizer
  • Character set transcoder
  • Hash generator
  • Architectural Forms
  • RelaxNG/Schematron etc
  • A validator can be thought of as a component in
    an XPipe that mirrors its input on its output

41
Sample XComponents
  • Reading a file is an XML to XML transformation
  • <file>lewisscarrol.xml</file>
  • <poem><line>'Twas brillig, and the slithy toves
    did gyre and gimble in the wabe</line></poem>

42
Sample XComponents
  • Arithmetic is an XML to XML transformation
  • <expr>1 + 2</expr>
  • <res>3</res>

43
Sample XComponents
  • Unix pipe utilities e.g. tr
  • hello world
  • HELLO WORLD

44
Sample XComponents
  • Conditionals are XML to XML transformation tee
    junctions triggered by XPaths

[Diagram: In -> "if XPath" junction -> TRUE branch or FALSE branch]
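
A tee junction of this kind might look like the following Python sketch. The names `tee` and `mark` are hypothetical, and ElementTree supports only a limited XPath subset, so this stands in for whatever XPath engine a real component would use.

```python
import xml.etree.ElementTree as ET


def tee(xpath, on_true, on_false):
    """Route a document to one of two branches depending on an XPath match."""
    def stage(root):
        branch = on_true if root.find(xpath) is not None else on_false
        return branch(root)
    return stage


def mark(label):
    """Hypothetical branch: stamp the chosen route on the root element."""
    def stage(root):
        root.set("route", label)
        return root
    return stage


route = tee(".//item[@urgent='yes']", mark("express"), mark("standard"))

urgent = route(ET.fromstring("<order><item urgent='yes'/></order>"))
normal = route(ET.fromstring("<order><item/></order>"))
```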
45
Validation as an XComponent
[Diagram: a validating XComponent (RelaxNG, Schematron, Jython/Java/JACL). Input: XML A. Output: XML A. Error: Validation Log]
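
The shape of a validating component (input mirrored on output, findings sent to the error stream) can be sketched in Python as below. The hand-written "required children" rule is only a stand-in for a real RelaxNG or Schematron check, and the function name and log wording are invented.

```python
import xml.etree.ElementTree as ET


def validate(xml_text, required):
    """Return (output, log): output mirrors the input; log lists problems."""
    root = ET.fromstring(xml_text)
    log = [f"<{root.tag}> is missing <{name}>"
           for name in required if root.find(name) is None]
    return xml_text, log


doc = "<poem><title>Jabberwocky</title></poem>"
output, log = validate(doc, ["title", "line"])
```

Since the output equals the input, such a component can be dropped between any two stages of a pipe without changing what flows downstream.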
46
Some related open technologies
  • Unix Pipes
  • SAX Filters
  • TRAX
  • XBeans
  • Cocoon
  • AxKit
  • Ant
  • JXTA
  • Translets
  • TupleSpaces

47
The XGrid
  • Grid Technologies: computational power on tap
    (http://www.gridforum.org)
  • The XGrid: computational power on tap to
    execute XPipes/XRigs

48
The XGrid
[Diagram: the XGrid, with In and Out flows crossing a DMZ]
49
Some objections (with some answers)
  • It will be slow
  • No it won't - premature optimization is the root
    of all evil!
  • Speed is a three headed monster. I'm old enough
    to have left the X axis and am currently heading
    for Y through Z

[Diagram: The 3 Axes to Speed]
50
Some objections (with some answers)
  • It will be slow (cont.)
  • Massive Parallelism will kill all von Neumann
    throughput arguments
  • Documents per second, not seconds per document:
    throughput is the true measure of XML processing
    speed
  • Document fulcra: locality of reference (Denning)
    applies to XML processing (more on this later)
  • A myriad of compile time optimizations on
    XPipes possible
  • Keep the architecture simple and speed will
    sort itself out

51
Some objections (with some answers)
  • Component based software? Harumph! We have heard
    that one before
  • XPipe is data flow based, not API based (COM,
    VBX, CORBA). The payload is what is important,
    not the plumbing
  • Information integration (needed on the server
    side) not application integration (needed on the
    client side)

52
Document fulcra and the scatter/gather pattern
  • For any given task t to be performed on documents
    conforming to schema s, there is a fragment
    expression that can be used to chop any document
    into n pieces on which t can be performed
    independently
  • These points are called fulcra and are a function
    of (t,s)

53
Document fulcra and scatter/gather pattern
  • Having identified the fulcra:
  • Chop the input document into fragments (scatter
    phase)
  • Perform t on each fragment
  • Join all the processed fragments together to
    constitute the output document (gather phase)
  • Three-stage XPipe: scatter and gather are (or,
    more accurately, soon will be) standard XPipe
    components
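
The three phases can be sketched in Python as follows. This is an illustration under assumptions, not the XPipe scatter/gather components: the fulcrum is taken to be a simple ElementTree path, the task `t` is a made-up per-record total, and a thread pool stands in for real work distribution.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def scatter(root, fulcrum):
    """Chop the document at the fulcrum into standalone XML fragments."""
    return [ET.tostring(el, encoding="unicode") for el in root.findall(fulcrum)]


def task(fragment):
    """Hypothetical t: add a total attribute to one record."""
    rec = ET.fromstring(fragment)
    total = sum(int(a.text) for a in rec.findall("amount"))
    rec.set("total", str(total))
    return ET.tostring(rec, encoding="unicode")


def gather(root_tag, fragments):
    """Join processed fragments under a fresh root element."""
    out = ET.Element(root_tag)
    out.extend([ET.fromstring(f) for f in fragments])
    return ET.tostring(out, encoding="unicode")


def scatter_gather(xml_text, fulcrum, t, workers=4):
    root = ET.fromstring(xml_text)
    fragments = scatter(root, fulcrum)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(t, fragments))  # map keeps input order
    return gather(root.tag, processed)


ledger = ("<ledger><rec><amount>1</amount><amount>2</amount></rec>"
          "<rec><amount>10</amount></rec></ledger>")
totals = scatter_gather(ledger, "rec", task)
```

Each fragment is small, so `task` can use a convenient in-memory tree while the memory needed per worker stays bounded by the fragment size.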

54
Document Fulcra
[Diagram: Input Doc -> Scatter -> n fragments -> Invoke t once per fragment (plotted against TIME) -> n fragments -> Gather -> Output Doc]
55
Document Fulcra
  • For data-oriented XML, the fulcra often coincide
    with the record iteration in the XML schema and
    may be independent of t.
  • For document-oriented XML, the fulcra are much
    more dependent on t.
  • <Colloquial>A good fulcra based scatter/gather
    will make performance head north faster, cheaper
    and with a higher upper limit than any amount of
    hand-crafted, genius level XML coding of your
    transformations.</Colloquial>

56
The XSLT/DOM -> SAX non sequitur
  • XSLT and DOM are memory bound: in the trade-off
    between ease of use and resource usage, ease of
    use is favoured
  • SAX is not memory bound: in the same trade-off,
    low resource usage is favoured
  • On xml-dev, users are often advised to rewrite
    their apps using SAX! Ugh!

57
XSLT/DOM -> XPipe
  • XPipe and scatter/gather allow you to keep the
    ease of use of XSLT/DOM with the finite resource
    utilization of SAX
  • As long as you can identify a good fulcrum
    function
  • They exist more often than not
  • If they exist, they are very easily found

58
Current status
  • The philosophy is known to work
  • Seven years a-growing in a consulting company
    (IDM 1995, Digitome)
  • Uniprocessor XPipe used to develop
  • 80-C pipe from Hub notation for a complex
    document type to a legacy mainframe display
    notation. 120 page spec.
  • 20-C pipe for semantic validation of legislation
    documents

59
Current Status
  • Version 0.6
  • Schemas for XPipes and XComponents on
    xpipe.sourceforge.net. Feedback required
  • Sample components (Java/XSLT/Jython) and some
    documentation
  • Simple, illustrative XComponent and XPipe
    uniprocessor executive

60
Current Status
  • Object model for XComponents in Jython/Java
    (David Starr)
  • Object model for XPipes in Jython
  • Execution, testing utilities in Jython
  • Start of a NetBeans based XComponent editor

61
Current Status
  • Uniprocessor XPipe used to develop
  • 80-C pipe from Hub notation for a complex
    document type to a legacy mainframe display
    notation. 120 page spec.
  • 20-C pipe for semantic validation of legislation
    documents
  • XPipe and XComponent validators

62
Current Status
  • Some aspects of the XComponent model need testing
  • Parameters
  • Exec XComponents
  • Pre/Post condition checking
  • This will be a point release in late Feb. Then
    focus on developing the XComponent repository in
    parallel with core dev.
  • Scatter/Gather raises some interesting scheduling
    issues currently being grappled with
  • Balance between developer-hit and ease of
    execution: currently in favour of low
    developer-hit

63
Current Problems
  • No GUI stuff and not enough documentation?
  • Everybody agrees that an XML document is a tree
    but
  • The content and structure of the tree depends on
    the parser
  • The content and structure of re-generated XML
    (The round-tripping problem)
  • Roll on XML-SW!

64
Current Problems
  • Naming things
  • Taxonomy of XTLs (XML Transformation Languages)
  • Taxonomy of re-usable XComponents and XPipes

65
Current Problems
  • Flexible transformation scheduling is hard
  • Optimal transformation scheduling is very hard
  • Calling all process engineers help!

66
Future Plans
  • Evangelize the idea that DTD validated XML 1.0 is
    just Well Formed XML that has been through a pipe
    consisting of
  • A transclusion component (entity expansion)
  • A macro pre-processor (conditional marked
    sections)
  • An attribute decorator (implied/fixed attributes)
  • A grammar checker

67
Valid XML
[Diagram: Well Formed XML -> Parameter Entity Expansion -> Conditional Sections -> General Entity Expansion -> Attribute Decoration -> Grammar Validation -> Valid XML]
68
Future plans
  • When DOCTYPE goes away (which it will), provide
    all DTD functionality as a set of XComponents

69
Future Plans
  • Getting to the point where we can grow the
    XComponent repository is priority 1
  • XRigs, XPipes, and XComponents as web services
    (SOAP/XML-RPC, WSDL, UDDI etc.)
  • Getting the P2P and Grid Technology communities
    input into XGrid/XJCL
  • See if a P2P execution environment for
    XRigs/XPipes can be shortcircuited e.g. JXTA
  • Getting help to develop the XPipe reference
    implementation on Sourceforge

70
Future Plans
  • Development of commercial implementations of
    XPipe integrated with leading EAI systems
    (Ongoing)
  • Use of SCADA tools to develop XPipe process
    control and monitoring systems
  • Use of UML tools to create XPipes and XRigs using
    state transition diagrams

71
Future Plans
  • Use of Animation Engineering techniques for CAXTE
    tools (Computer Aided XML Transformation
    Engineering)
  • Digging around swarm intelligence, hierarchy
    theory, complexity theory, self-assembly,
    bio-informatics and nanofabrication for concepts
    and tools applicable to XML transformations

72
In conclusion
  • XPipe is simple
  • Simplicity works!
  • Plenty of evidence outside of XML engineering
    that this approach will work
  • Plenty of lore and tools from other fields of
    science can be brought to bear to build systems
    using the XPipe approach

73
Musings 1 - Debugging
  • XPipe is very debugging friendly
  • log2(N) time required for fault diagnosis
  • Probes in the form of loggers, RelaxNG
    validators, easily plug-inable to a pipe to watch
    what is going on.
  • Pre/Post condition on/off switch is a useful
    design by contract debugger
  • Unit testing at Rig, Pipe and Component level
    allows layer at a time re-assembly after a fault
    has been fixed.
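
The log2(N) figure can be read as a binary search over the stages of a pipe. The Python sketch below assumes that the pipe's input is known to be good, and that a fault, once introduced, stays visible in every later stage's output; `first_bad_stage` and `is_ok` are invented names, and plain strings stand in for XML documents.

```python
def first_bad_stage(stages, source, is_ok):
    """Return (index of the first faulty stage, probes used), or None."""
    def output_after(n):
        data = source
        for stage in stages[:n]:
            data = stage(data)
        return data

    if is_ok(output_after(len(stages))):
        return None
    # Invariant: output after `lo` stages is good, after `hi` stages is bad.
    lo, hi, probes = 0, len(stages), 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if is_ok(output_after(mid)):
            lo = mid
        else:
            hi = mid
    return hi - 1, probes


# Eight stages; the stage at index 5 corrupts the payload.
stages = [lambda s: s + "."] * 8
stages[5] = lambda s: s + "!"
found = first_bad_stage(stages, "doc", lambda s: "!" not in s)
```

In a real pipe the probe would be a logger or validator plugged in at the midpoint rather than a re-run from the start.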

74
Musings 2 - Inbetweening and XComponent development
  • Transformation analysts spec the transformation
  • Only need to code new components
  • Spec XComponent or XPipe with doc, pre/post
    etc. but no code
  • Built in JIT-style acceptance test
  • Outsource friendly and third-party market friendly

75
Musing 3 - Web Services
  • First generation will be a total blind alley:
    RPC
  • Document Oriented Messaging, not Object Oriented
    Messaging, is the next stage in encapsulation and
    loose coupling; something like XPipe will be a
    pre-requisite.

76
Musing 4 - Parametric Typing of XComponents
  • Numerous XComponents that do the same thing, not
    necessarily duplication
  • Space
  • Time
  • Infoset considerations

77
Musing 5 - Pre-validation Transformation
  • Killing ourselves seeking one-shot expressivity
    in schema validation languages
  • Many complex validations become a lot simpler if
    you do some transformation(s) first
  • Co-occurrence constraints
  • Contextual constraints
  • Clear analog with formatting (pre-flow
    transformation(s) flow)

78
Musing 6 - location, location, location
  • Abstraction 1: keep code and data on the same
    high-speed bus (monolithic systems)
  • Abstraction 2: allow code to be downloaded from
    the Web (sandbox required owing to security
    issues)
  • Abstraction 3: leave the code out there and
    move the data (bandwidth issues, and data >> code)

79
Musing 6 - location, location, location
  • Monolithic: bad (have to install stuff, which
    is very 20th century)
  • Sandbox: bad (the better the sandbox, the less
    useful the code running in it)
  • XGrid: design as if data is pulled by the code
    (easy model) but DMZ the code and data; the only
    thing that flows over the firewall is the
    transformed data

80
Thank you
  • http://xpipe.sourceforge.net