OneSAF: XML Performance in Simulation (03S-SIW-030)

1
OneSAF: XML Performance in Simulation (03S-SIW-030)
  • Boaventura (Ben) DaCosta
  • Robin Outar

2
Agenda
  • OneSAF Overview
  • XML Overview
  • Document Object Model (DOM) Overview
  • Simple API for XML (SAX) Overview
  • XML Performance Measures
  • XML Parsing
  • Validation
  • Namespaces
  • Performance Benchmarks
  • XML Encoding
  • OneSAF XML Performance Tips
  • Future Efforts

3
OneSAF Overview
A composable, next-generation Computer-Generated
Forces (CGF) simulation that can represent a full
range of operations, systems, and control processes
(TTP) from the entity level up to the brigade level,
with variable levels of fidelity, supporting
multiple Army Modeling and Simulation (M&S) domain
(ACR, RDA, TEMO) applications
  • Software only
  • Automated, composable, extensible, interoperable
  • Platform independent
  • Fielded to National Guard armories, RDECs /
    Battle Labs, Reserve training centers, and all
    active-duty brigades and battalions
  • Designed to eventually replace legacy
    entity-based simulations: BBS, ModSAF, JANUS,
    CCTT SAF, and AVCATT SAF
4
eXtensible Markup Language Overview
  • XML is a meta-language, not a programming
    language
  • XML is a family of technologies that includes
    XML, XML Schema, XSL/XSLT, and XPath
  • XML helps define rules (a grammar) for designing
    structured data
  • XML is a language for describing languages
  • XML was created by the World Wide Web Consortium
    (W3C) and released in 1998
  • Think of structured data as spreadsheets,
    address books, configuration files, etc.
5
eXtensible Markup Language Overview
XML looks like HTML, but it isn't
HTML
  • HTML specifies what each tag means and how the
    text will be formatted
  • HTML tags elements for presentation
      <table>
        <tr><td>EMPNO</td><td>EMPNAME</td></tr>
        <tr><td>123456</td><td>John Adams</td></tr>
      </table>
XML
  • XML uses tags to delimit data; what the tags
    mean is left to the client
  • XML tags elements as data
  • XML is a vehicle for sharing and interchanging
    structured data
      <?xml version="1.0"?>
      <EMPLOYEE>
        <EMPNO>123456</EMPNO>
        <EMPNAME>John Adams</EMPNAME>
      </EMPLOYEE>
6
Document Object Model (DOM) Overview
  • Documents are typically logically structured in
    memory as a hierarchical tree
  • A DOM is a platform-independent and
    language-neutral API, which allows applications
    to dynamically (1) read, (2) manipulate, and
    (3) write the content, structure, and style of
    both HTML and XML documents
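The read, manipulate, and write cycle described on this slide can be sketched in Java with the standard JAXP DOM and transformer classes. This is a minimal illustration and not OneSAF code: the class name `DomRoundTrip` and its helper methods are hypothetical, and the sample document is the employee record from the XML overview slide.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: reads XML into a DOM tree, edits it, and writes it back.
class DomRoundTrip {

    // (1) Read: parse the whole document into an in-memory tree.
    static Document read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // (2) Manipulate: replace the text of the first element with the given tag name.
    static void setText(Document doc, String tagName, String newText) {
        doc.getElementsByTagName(tagName).item(0).setTextContent(newText);
    }

    // (3) Write: serialize the tree back to a string without an XML declaration.
    static String write(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = read("<EMPLOYEE><EMPNO>123456</EMPNO><EMPNAME>John Adams</EMPNAME></EMPLOYEE>");
        setText(doc, "EMPNAME", "Jane Adams");
        System.out.println(write(doc));
    }
}
```

Because the entire tree is held in memory, this style is convenient for editing but scales with document size.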

7
Simple API for XML (SAX) Overview
  • SAX is considered by many to be the de facto
    standard for event-based XML parsing
  • SAX supports only reading
  • XML Document
      <X>
        <Y1>
          <Z>foo</Z>
        </Y1>
      </X>
  • SAX Callbacks
      startDocument()
      startElement(X)
      startElement(Y1)
      startElement(Z)
      characters("foo")
      endElement(Z)
      endElement(Y1)
      endElement(X)
      endDocument()
  • SAX is supported in a number of languages and
    toolkits: Microsoft's MSXML 3.0, Pascal, SAX in
    C, Xerces-C, and others
  • SAX is generally faster than DOM because it does
    not build an in-memory tree
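The callback sequence on this slide can be reproduced with a small Java handler. This is a sketch using the standard JAXP SAX classes; the class name `SaxCallbackTrace` is hypothetical, and whitespace-only text events are dropped to keep the trace short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each SAX callback in the order the parser fires it.
class SaxCallbackTrace {

    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument()"); }
            @Override public void endDocument() { events.add("endDocument()"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes attributes) {
                events.add("startElement(" + qName + ")");
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement(" + qName + ")");
            }
            @Override public void characters(char[] ch, int start, int length) {
                // Ignore whitespace-only chunks so indentation does not clutter the trace.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) events.add("characters(\"" + text + "\")");
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<X><Y1><Z>foo</Z></Y1></X>")) {
            System.out.println(event);
        }
    }
}
```

The handler only observes the document as it streams past; nothing is retained unless the application stores it, which is why SAX is read-only and light on memory.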
8
XML Performance Measures
  • OneSAF XML performance has been measured in two
    ways: the amount of memory consumed and the
    execution speed during the parsing and/or
    translation of XML documents
  • A number of factors may affect these measures:
  • Physical memory constraints (RAM)
  • Parsers used
  • Validation of XML documents
  • Use of namespaces
  • XML encoding

9
XML Parsing
  • XML documents must be parsed in order to access
    the data stored in them
  • Parsing and validation of XML data on OneSAF is
    currently being achieved using Xerces
  • Xerces is an XML parser that complies with XML
    Schema and provides support for XML validation
    and eXtensible Stylesheet Language
    Transformations (XSLT)
  • The biggest problem with using DOM is memory
    limitations
  • A one-megabyte XML document can use as much as
    ten megabytes of RAM
  • The biggest problem with using SAX is that it is
    read-only
  • OneSAF uses a combination of both DOM and SAX

10
XML Validation
  • Validation ensures that data content conforms to
    the grammar and structure as defined by the DTD
    or XML Schema that it references
  • Validation is an important part of the OneSAF
    data architecture because it provides a level of
    confirmation and verification that the data
    stored in any one XML document conforms to the
    grammar and structure that define it
  • OneSAF currently uses Xerces 1.4.3 (with current
    efforts moving towards Xerces 2) to validate all
    XML content against both XML Schema and DTDs
  • XML Schema validation uses the W3C XML Schema
    1.0 Recommendation
  • DTDs are supported only for legacy systems
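As an illustration of schema validation, the sketch below checks a document against a W3C XML Schema using the `javax.xml.validation` API that ships with current JDKs. This is an assumption for the sake of a runnable example: it is not the Xerces 1.4.3 configuration OneSAF used, and the class `SchemaCheck` and the small employee schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates an employee document against an inline XML Schema.
class SchemaCheck {

    // Illustrative schema: EMPNO must be an integer, EMPNAME a string, in that order.
    static final String EMPLOYEE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='EMPLOYEE'><xs:complexType><xs:sequence>"
      + "<xs:element name='EMPNO' type='xs:int'/>"
      + "<xs:element name='EMPNAME' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(EMPLOYEE_XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<EMPLOYEE><EMPNO>123456</EMPNO><EMPNAME>John Adams</EMPNAME></EMPLOYEE>"));
        System.out.println(isValid(
            "<EMPLOYEE><EMPNO>not-a-number</EMPNO><EMPNAME>John Adams</EMPNAME></EMPLOYEE>"));
    }
}
```

In a real system the compiled `Schema` object would normally be built once and shared, since compiling the schema is itself part of the validation cost.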

11
XML Namespaces
  • Namespaces allow documents to use multiple markup
    vocabularies from external sources through URI
    references
  • Namespaces promote reuse of markup instead of
    re-inventing it
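A short Java sketch of what namespace support changes at parse time, using the standard JAXP DOM classes. The class `NamespaceDemo`, the sample document, and the `urn:example:...` URIs are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses the same document with namespace support on and off.
class NamespaceDemo {

    // Two vocabularies in one document: a default namespace and a prefixed one.
    static final String XML =
        "<order xmlns='urn:example:orders' xmlns:u='urn:example:units'>"
      + "<u:quantity>3</u:quantity>"
      + "</order>";

    // Returns { namespace URI, local name, node name } of the first child element.
    static String[] describeFirstChild(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            Element child = (Element) doc.getDocumentElement().getFirstChild();
            return new String[] {
                child.getNamespaceURI(), child.getLocalName(), child.getNodeName()
            };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", describeFirstChild(true)));
        // Without namespace support the URI and local name are not resolved.
        System.out.println(describeFirstChild(false)[2]);
    }
}
```

Resolving each prefix to its URI is extra work the parser does for every element and attribute, which is one reason namespace support shows up in the benchmark figures.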

12
XML Validation and Namespace Performance
  • OneSAF has examined XML Performance using both
    version 1.4.3 and version 2 of the Xerces parser
  • Performance benchmarking and testing was
    performed under the following environment
  • XML Document Specifications: file size 1.95 MB
    (2,055,937 bytes) with an element count of 40,735
  • Software:
  • JProbe 4.0.2 (benchmarking software)
  • JUnit testing framework (unit testing software)
  • Xerces 1.4.3 and Xerces 2.0 (XML parsing API)
  • JDK 1.4.1 (JVM version)
  • OneSAF software: DOMReader and TestDOMReader
    classes
  • Operating System: Microsoft Windows 2000 with
    Service Pack 3
  • Hardware: Pentium III 1.0 GHz with 512 MB RAM
    and a 60 GB HDD
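A comparable measurement can be approximated without a profiler. The sketch below is not the OneSAF DOMReader or TestDOMReader code and does not use JProbe; it is a hypothetical harness, `ParseBenchmark`, that times DOM parsing of a synthetic document with namespace support on and off using `System.nanoTime()`. The element count of 40,735 is taken from the document specification above; the document content itself is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical timing harness; a rough stand-in for a profiler-based benchmark.
class ParseBenchmark {

    // Builds a synthetic document containing exactly elementCount elements (root included).
    static String syntheticDocument(int elementCount) {
        StringBuilder sb = new StringBuilder("<root xmlns='urn:example:bench'>");
        for (int i = 1; i < elementCount; i++) {
            sb.append("<item id='").append(i).append("'>value</item>");
        }
        return sb.append("</root>").toString();
    }

    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Average wall-clock nanoseconds per parse over the given number of runs.
    static long averageParseNanos(String xml, boolean namespaceAware, int runs) {
        parse(xml, namespaceAware); // warm-up run, excluded from the timing
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            parse(xml, namespaceAware);
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        String xml = syntheticDocument(40735);
        System.out.println("namespaces off: " + averageParseNanos(xml, false, 5) + " ns");
        System.out.println("namespaces on:  " + averageParseNanos(xml, true, 5) + " ns");
    }
}
```

Timings from a harness like this vary with the JVM, hardware, and document shape, so they are only useful for comparing settings on the same machine.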

13
XML Validation and Namespace Performance
  • OneSAF Xerces Benchmarking Results are summarized
    here

14
XML Validation and Namespace Overall Performance
  • Xerces 1.4.3
  • Overall, turning on namespace and validation
    support resulted in increased memory consumption
    of approximately 44.5% and processing that is
    approximately 1.8 times slower than if both had
    been turned off
  • Xerces 2
  • Overall, turning on namespace and validation
    support resulted in increased memory consumption
    of approximately 1% and processing that is
    approximately 2.2 times slower than if both had
    been turned off

15
XML Validation and Namespace Overall Performance
  • The performance improvement from Xerces 1.4.3
    to Xerces 2 is significant
  • When validation and namespace support are both
    turned on, documents take slightly longer to
    parse, but the memory consumed during parsing
    is less than half
  • This has allowed OneSAF to double the size of
    the XML documents it can successfully parse in
    DOM, compared with what was possible using
    Xerces 1.4.3

16
XML Encoding
  • The most commonly used encodings are ASCII
    (US-ASCII) and Unicode (UTF-8 and UTF-16)
  • The W3C XML Recommendation requires all XML
    processors to support UTF-8 and UTF-16
  • US-ASCII is guaranteed to be a single byte per
    character and maps directly to the equivalent
    Unicode value -- FAST
  • UTF-8 and UTF-16 result in multiple-byte
    sequences being read and converted for each
    character -- SLOW
  • Use US-ASCII if characters DO NOT go beyond the
    ASCII range; otherwise use UTF-8 or UTF-16
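The byte-level difference between these encodings can be checked directly in Java. The sketch below is illustrative only; the class `EncodingSizes` is hypothetical. It uses `UTF_16BE` rather than `UTF_16` so that the count excludes the byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares how many bytes the same text needs in each encoding.
class EncodingSizes {

    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // True when every character is inside the ASCII range, so US-ASCII is an option.
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String ascii = "John Adams";
        String accented = "Jos\u00e9"; // contains one character beyond the ASCII range

        System.out.println(byteLength(ascii, StandardCharsets.US_ASCII));    // 10
        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));       // 10
        System.out.println(byteLength(ascii, StandardCharsets.UTF_16BE));    // 20
        System.out.println(fitsInAscii(accented));                           // false
        System.out.println(byteLength(accented, StandardCharsets.UTF_8));    // 5
    }
}
```

For text that stays within the ASCII range, UTF-8 produces the same bytes as US-ASCII, while UTF-16 doubles the size.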

17
XML Encoding
  • US-ASCII is guaranteed to be a single byte per
    character and maps directly to the equivalent
    Unicode value -- FAST
  • UTF-8 and UTF-16 result in multiple-byte
    sequences being read and converted for each
    character -- SLOW
  • OneSAF testing and research found UTF-8 to be
    the best encoding

18
XML General Performance Tips
  • Understand your data
  • Forecast what a typical fielded-XML document size
    might be
  • Examine the worst-case scenario
  • Determine the best XML technologies for the short
    and long term
  • For example, documents may be small during
    development and DOM may suffice as a solution,
    but will these documents grow once fielded? Will
    DOM still be a good solution then? Is this
    solution scalable?
  • Don't use XML where it doesn't make sense
  • Avoid using XML when there is no existing or
    future purpose
  • Doing a lot of translation and/or parsing may
    result in bad performance
  • Manipulating XML requires CPU resources, memory
    usage, and may be network intensive

19
XML General Performance Tips
  • More than one XML solution may be needed
  • A number of XML technologies may be necessary to
    accomplish a task
  • For example, both DOM and SAX may have to be used
  • Don't limit an implementation to only one XML
    technology
  • Examine hardware requirements
  • Depending on the implementation, XML can be CPU,
    memory, and network intensive
  • Make certain development and fielded hardware can
    support XML use
  • If memory is limited, avoid DOM, consider SAX

20
XML Parsing Performance Tips
  • Keep XML documents small
  • The bigger the documents, the higher the
    parsing/translation costs and the worse the
    performance
  • If documents are too large, consider logically
    breaking them up into smaller XML documents
  • Reduce the character count
  • Replace elements with attributes where it makes
    sense
  • Avoid excessive use of spaces because parsers
    must scan through them
  • Use tabs in place of spaces if possible
  • Avoid lengthy element and attribute names, e.g.
    <systemRepositoryServiceIndexFileRootElementTag>
    → <index>
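The effect of shorter names and of attributes in place of child elements can be estimated by counting bytes. In the sketch below, the long and short tag names come from the slide; the class `MarkupSize`, the `entryName` child element, the `name` attribute, and the value `terrain` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the size of a verbose and a compact encoding
// of the same piece of data.
class MarkupSize {

    // Long element name with the value carried in a child element.
    static final String VERBOSE =
        "<systemRepositoryServiceIndexFileRootElementTag>"
      + "<entryName>terrain</entryName>"
      + "</systemRepositoryServiceIndexFileRootElementTag>";

    // Short element name with the value carried in an attribute.
    static final String COMPACT = "<index name=\"terrain\"/>";

    static int utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("verbose: " + utf8Bytes(VERBOSE) + " bytes");
        System.out.println("compact: " + utf8Bytes(COMPACT) + " bytes");
    }
}
```

The saving applies to every occurrence of the element, so it grows with the number of repeated records in a document.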

21
XML Parsing Performance Tips
  • Explicit or Meta-model
  • Look at whether an explicit or meta-model
    approach should be used
  • Meta-model approaches simplify schemas, but
    increase grammar needed in content documents
    which may also inhibit validation
  • Consider redesigning the XML grammar and
    structure in order to decrease the amount of
    elements and attributes
  • Avoid default values in attributes
  • Too many simply slow down processing
  • Avoid external entities and DTDs
  • Doing so causes overhead

22
XML Parsing Performance Tips
  • Reuse parser instances whenever possible
  • Don't create a new parser each time you need one
  • Create a pool of reusable parser instances
    (especially if in a multi-threaded environment
    and multiple parsers need to be run at once)
  • Turn validation off when not needed
  • Validation is expensive
  • Only validate when you have to
  • If using DTDs, avoid using DOCTYPE in XML
    documents. Some parsers will read the DTD if
    DOCTYPE is specified even if validation is turned
    off
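One way to reuse parser instances in a multi-threaded Java program is to keep one `DocumentBuilder` per thread and reset it between uses. This is a sketch under assumptions: the class `ParserPool` is hypothetical, a per-thread instance stands in for a full pool, and `DocumentBuilder.reset()` is relied on as implemented by the JDK's built-in parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: one reusable, non-validating DocumentBuilder per thread.
class ParserPool {

    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // validation stays off unless it is needed
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("could not create parser", e);
        }
    });

    // Returns this thread's builder, reset to its original configuration.
    static DocumentBuilder builder() {
        DocumentBuilder builder = BUILDER.get();
        builder.reset();
        return builder;
    }

    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Both documents are parsed by the same builder instance.
        System.out.println(parse("<a/>").getDocumentElement().getNodeName());
        System.out.println(parse("<b><c/></b>").getDocumentElement().getNodeName());
    }
}
```

Parser instances are generally not safe to share between threads at the same time, which is why each thread gets its own here.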

23
XML Parsing Performance Tips
  • Check parser configuration carefully
  • Parsers may perform differently depending on
    whether DTDs or XML Schema is used
  • Check for the recommended parser configuration
    (if there is one)
  • Check for default features being turned on
  • Only use what you need

24
XML Parsing Performance Tips
  • Use the appropriate encoding
  • The three most common encoding schemes are ASCII
    ("US-ASCII") and Unicode ("UTF-8" or "UTF-16").
  • The W3C XML 1.0 Recommendation requires parsers
    to assume UTF-8 if no encoding is specified.
  • US-ASCII is the fastest to parse because each
    character is guaranteed to be a single byte that
    maps directly to its equivalent Unicode value.
  • Documents needing Unicode characters beyond the
    ASCII range must use either "UTF-8" or "UTF-16".
  • Multiple byte sequences must be read and
    converted for each character resulting in a
    performance hit.

25
Summary
  • OneSAF is still learning from the use of XML
  • XML is a maturing technology
  • Today it is more than just a document markup
    language; it is a viable vehicle for sharing and
    interchanging structured data
  • As XML grows in popularity and becomes more
    widespread, better solutions will become
    available to address the performance concerns
    being tackled today

26
Contact Information
  • Boaventura (Ben) DaCosta
  • Dynamics Research Corporation
  • 407-380-1200
  • bdacosta_at_drc.com
  •  
  • Robin Outar
  • Science Applications International Corporation
  • 321-235-7660
  • routar_at_ideorlando.org