XML Data Binding: Encoding for High-Performance Content-Based Event Routing

1
XML Data Binding: Encoding for High-Performance
Content-Based Event Routing
  • Gail Kaiser
  • Phil Gross
  • Columbia University
  • Programming Systems Lab

2
Overview
  • PSL Intro
  • MEET Project
  • Encoding Conversion Efficiency
  • Encoding Size Efficiency
  • Encoding Classification Efficiency

3
Programming Systems Lab
  • PSL conducts research on Web technologies,
    collaborative work, virtual worlds,
    process/workflow, extended transaction models,
    software development environments and tools,
    software engineering, information management, and
    distributed programming systems
  • Lately, lots of XML stuff

4
PSL XML-related Research
  • FlexML: Flexible XML
  • Open-ended XML streams that may include new
    tags
  • Dynamic schema and semantics discovery and
    composition
  • XUES: XML-based Universal Event Service
  • Event Packager: Data mining over XML structured
    data
  • Event Distiller: XML event poset pattern matching
  • Learning new application-domain events to
    recognize
  • DISCUS: Decentralized Information Spaces for
    Composition and Unification of Services
  • Rapid and secure application composition using
    Web Services
  • Trust Evolution: PGP Trust, KeyNote, real-world
    business

5
MEET
  • Multiply Extensible Event Transport
  • Content-based multicast routing
  • Must be efficient enough for embedded and
    high-performance applications

6
MEET Motivations
  • Personal Life Recorder (sensor oriented)
  • GroupWork Recorder (computer/DB oriented)
  • Parallel/Grid computing
  • Distributed simulation
  • Battlefield C4I
  • Last, but not least
  • Dissertation submission

7
Relationship to Other Work
  • Generally modeling communication like
  • What actually goes over the line is an afterthought
  • But with N-Way Internet-scale communication
  • Millions of publishers and subscribers
  • We can (must!) do better than ASCII text
  • Line speed > 250 assembly instructions per
    packet

8
MEET Extensibility
  • Want to scale up, to millions of pubs and subs
  • Want to scale down, to embedded and wireless
  • No single solution satisfactory at all scales
  • Composed of hot-swappable subsystems
  • Router, transports, clock/causality, types, etc.

9
Why Types?
  • Event data is not just an opaque bag of bits
  • Subscriptions are Boolean functions over events
  • Type safety would be nice
  • What type system to use?
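
The idea that a subscription is a Boolean function over typed events can be sketched in a few lines of Python. This is only an illustration: the `TemperatureEvent` type, the `hot_in_lab` predicate, and the `route` helper are hypothetical names, not part of MEET.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TemperatureEvent:
    """Hypothetical typed event; the fields are assumptions for illustration."""
    sensor_id: str
    celsius: float


# A subscription is simply a predicate over a typed event.
Subscription = Callable[[TemperatureEvent], bool]


def hot_in_lab(event: TemperatureEvent) -> bool:
    # Boolean function combining two conditions on the event's fields.
    return event.sensor_id.startswith("lab-") and event.celsius > 30.0


def route(event: TemperatureEvent, subs: Dict[str, Subscription]) -> List[str]:
    """Return the names of subscribers whose predicate accepts the event."""
    return [name for name, predicate in subs.items() if predicate(event)]


subs = {"alice": hot_in_lab, "bob": lambda e: e.celsius < 0.0}
matched = route(TemperatureEvent("lab-7", 35.5), subs)
```

Because the event has declared fields, a type checker can reject a predicate that refers to a field the event does not have, which is the kind of type safety the slide asks for.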

10
Initial MEET Type Design
  • Initial design calls for supporting Java, C, and
    XML Schema defined objects out of the box
  • XML Schema used as Ur-language/Esperanto for
    conversions
  • Subscriptions are arbitrary boolean functions on
    datatypes
  • XML Schema is not ideal ur-type
  • Excessively complex, verbose, etc.

11
Encodings for Efficiency
  • Java, C, XML, ASN.1 have well-defined but
    proprietary encodings for instances
  • Would be nice to have an independent encoding
    scheme with some desirable properties missing
    from the above
  • Fast serialization/deserialization
  • Elimination of redundant information from message
    sequences
  • Data organized for rapid classification/routing

12
Conversion Efficiency
  • Need to get to and from wire format as fast as
    possible
  • Leverage homogeneity to eliminate unnecessary
    conversions, e.g., network byte order
  • ECho system from Eisenhauer et al., Georgia Tech
  • Using native data for ultra-low latency
  • Necessary for HPC
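
One way to avoid unnecessary byte-order conversion is to let the sender write in its native order and tag the message, so that a receiver with the same architecture has nothing to swap. The sketch below shows only the wire layout; the one-byte flag, the field choice (a `uint32` and a `float64`), and the function names are assumptions, and this is not ECho's actual API. Python's `struct` module always copies values, so the latency benefit itself would only appear in a native-code implementation.

```python
import struct
import sys

LITTLE, BIG = 0, 1
NATIVE = LITTLE if sys.byteorder == "little" else BIG
PREFIX = {LITTLE: "<", BIG: ">"}  # struct byte-order prefixes


def encode(seq, value, order=NATIVE):
    """Write a one-byte order flag, then a uint32 and a float64 in that order."""
    return bytes([order]) + struct.pack(PREFIX[order] + "Id", seq, value)


def decode(buf):
    """Read the flag and unpack the fields using the sender's byte order."""
    return struct.unpack_from(PREFIX[buf[0]] + "Id", buf, 1)


wire = encode(7, 2.5)      # sender uses its own native order
seq, value = decode(wire)  # receiver converts only if the flag differs
```

In a homogeneous cluster the flag always matches the receiver's native order, so the conversion path is never taken.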

13
Size Efficiency
  • Ideal for single message is self-describing data
  • With multiple messages of same type, one can pull
    out redundant type info, e.g., schema
  • Goal is to go further: if 90% of the content of
    messages is the same, generate a new subtype with
    fixed values
  • From self-describing to all-schema is a continuum
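
The subtype idea can be illustrated with plain dictionaries standing in for messages. In this sketch, fields that hold the same value in every sample message are moved into a derived subtype, and only the remaining fields are sent. The function names and the sample fields are hypothetical, and the sketch assumes all messages of a type share the same field names.

```python
def find_fixed_fields(messages):
    """Return the fields that hold the same value in every sample message."""
    first = messages[0]
    return {k: v for k, v in first.items() if all(m[k] == v for m in messages)}


def encode_with_subtype(message, fixed):
    """Drop fields implied by the subtype; fall back to the full message."""
    if all(k in message and message[k] == v for k, v in fixed.items()):
        return ("subtype", {k: v for k, v in message.items() if k not in fixed})
    return ("full", dict(message))


def decode_with_subtype(kind, payload, fixed):
    """Restore the fixed values when the message was sent as the subtype."""
    return {**fixed, **payload} if kind == "subtype" else payload


samples = [
    {"source": "lab-7", "unit": "C", "version": 1, "value": 21.5},
    {"source": "lab-7", "unit": "C", "version": 1, "value": 21.7},
    {"source": "lab-7", "unit": "C", "version": 1, "value": 22.0},
]
fixed = find_fixed_fields(samples)
kind, payload = encode_with_subtype(samples[0], fixed)
```

Here three of the four fields become part of the subtype, so each later message carries only `value`. A message that deviates from the fixed values is still sent in full, which keeps the scheme lossless.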

14
Classification Efficiency
  • When bits start arriving serially at the router,
    would like to begin cut-through routing as soon
    as possible
  • Avoid the curse of IP/IPv6: source address first
  • Want key routing bits as close to the front as
    possible
  • Want data in fixed locations

15
Fast Classifying: First Things First
  • In the packet, type info first (after magic)
  • Would like to represent type codes as a bit string
    with most significant info (e.g., parent type)
    first, followed by subtype identifier,
    sub-subtype, etc.
  • Need access to type hierarchy
  • Popular classification fields at the front
  • Need to tag with popularity metadata
  • "Subscribers will want to select on me"
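
A parent-first type code lets a router match a whole subtree of the type hierarchy with a single shift and compare. The sketch below assumes a hypothetical fixed-width code with 4 bits per level and at most 3 levels; the widths, the sample hierarchy, and the function names are illustrative choices, not MEET's actual encoding.

```python
LEVEL_BITS = 4  # assumed width of each hierarchy level
MAX_DEPTH = 3   # assumed maximum depth, giving a 12-bit type code


def type_code(path):
    """Pack per-level ids (1..15) into one code, parent first; 0 pads unused levels."""
    code = 0
    for depth in range(MAX_DEPTH):
        ident = path[depth] if depth < len(path) else 0
        code = (code << LEVEL_BITS) | ident
    return code


def matches(event_code, subscribed_path):
    """True if the event's type is the subscribed type or one of its descendants."""
    shift = LEVEL_BITS * (MAX_DEPTH - len(subscribed_path))
    return (event_code >> shift) == (type_code(subscribed_path) >> shift)


# Hypothetical hierarchy: sensor (1) > temperature (1, 2) > lab temperature (1, 2, 3)
SENSOR, TEMPERATURE, LAB_TEMP, AUDIO = (1,), (1, 2), (1, 2, 3), (2,)
event = type_code(LAB_TEMP)
```

Since the parent identifier occupies the leading bits, a router receiving bits serially can decide whether an event falls under `SENSOR` after the first 4 bits of the code have arrived.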

16
Fast Classifying: Fixed Positions
  • Would like to avoid scanning through long or
    variable-length fields
  • Long/Variable data needs to be in a separate
    channel/section
  • Primitives and fixed-length references at the
    front
  • References point into data section
  • Classifier can jump over large, uninteresting data
    quickly
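
A fixed-position layout might look like the following sketch: a fixed-size header holds the primitives and an (offset, length) reference, and the variable-length data sits in a trailing section. The field set, the `b"MEET"` magic value, and the function names are assumptions made for illustration.

```python
import struct

# Assumed header: 4-byte magic, uint16 type code, float64 timestamp,
# then a uint32 offset and uint32 length referring into the data section.
HEADER = struct.Struct("!4sHdII")
TYPE_CODE_OFFSET = 4  # the type code sits right after the magic


def pack_event(type_code, timestamp, payload):
    """Build a packet whose variable-length payload follows the fixed header."""
    header = HEADER.pack(b"MEET", type_code, timestamp, HEADER.size, len(payload))
    return header + payload


def read_type_code(packet):
    """Classifier path: read one field at a known offset, never scan the payload."""
    return struct.unpack_from("!H", packet, TYPE_CODE_OFFSET)[0]


def read_payload(packet):
    """Follow the fixed-length reference into the data section."""
    _magic, _code, _ts, offset, length = HEADER.unpack_from(packet)
    return packet[offset:offset + length]


packet = pack_event(0x123, 1.5, b"x" * 1000)
```

The classifier touches only the first 6 bytes of the packet, regardless of how large the data section is.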

17
Plus: Schema Format
  • We'd like the schema format to be amenable to
    programmatic manipulation and analysis
  • For instance, when negotiating formats, we'd like
    to be able to compute how our original format
    offer differs from the counter-offer
  • XML Schema is pretty good for this

18
Conclusions
  • Efficient instance transfer is an interesting
    case for data-binding
  • Special needs for efficiency
  • But we can negotiate our own format among the
    communicating parties
  • Some explicit support for this in a general
    data-binding solution could help acceptance
