Customizing Middleware to Improve Performance and Footprint

Slides: 19
Provided by: nanbo
1
Customizing Middleware to Improve Performance
and Footprint
Arvind S. Krishna arvindk_at_dre.vanderbilt.edu
Institute for Software Integrated Systems
Vanderbilt University Nashville, Tennessee
2
Motivation (1/2)
  • Where are we right now?
  • Maturation of Distributed Object Computing (DOC)
    middleware
  • ACE+TAO middleware
  • Open-source implementation of CORBA and Real-time
    CORBA
  • Highly optimized implementation of almost all
    CORBA features
  • From stovepiped to reusable architectures

Functionality factored into middleware
  • Product Line Architectures
  • Set of systems that share common core features
  • Families of systems are then built using the core
    features
  • Reduce time-to-market pressure and cost, improve
    productivity, etc.
  • Example: Boeing Bold Stroke architecture

Product line architectures minimize cost for
building variants
3
Motivation (2/2)
  • Model-Driven Development (MDD) paradigm
  • Reduces costs of building new families of
    systems
  • Compose different systems at the modeling level
  • Model-check for correctness
  • Code generators synthesize artifacts: XML
    deployment information, configuration
    information, benchmarking code, etc.

Models capture system properties: structure and
behavior
  • Middleware for Product-Lines
  • Still general-purpose and layered
  • Enables different variants to be hosted by
    different configurations
  • However, not optimized for each variant

Information propagation
What do we need? Optimizations that customize
middleware based on system invariants
4
Customizing Middleware via Partial Evaluation
  • Partial Evaluation
  • Technique for automatically specializing programs
    based on parameters known ahead of time
  • Two-level mechanism
  • First level: annotating information
  • Second level: synthesizing code
  • Templates and template metaprogramming
  • Research will examine
  • Applying techniques from programming languages
    to middleware
  • Moving from a general-purpose to a more
    specialized architecture

Optimized Implementation Stack
General Purpose Layered Architecture
Optimize the known knowns, leave known unknowns
to the middleware, and use exceptions for unknown
unknowns
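The two-level idea can be illustrated outside of middleware with the classic power function. This is a minimal sketch, not code from TAO: the names `power`, `PowerSpl`, and `check_power` are made up for illustration. The template argument plays the role of the annotation (first level), and the compiler's template instantiation plays the role of the code synthesizer (second level).

```cpp
#include <cassert>
#include <cstdint>

// General program: both inputs arrive at run time, so the loop and its
// termination test execute on every call.
inline std::int64_t power(std::int64_t x, unsigned n) {
    std::int64_t r = 1;
    while (n-- > 0) r *= x;
    return r;
}

// Specialized program: the exponent is an ahead-of-time parameter.
// Template recursion lets the compiler generate N multiplications with
// no run-time loop test.
template <unsigned N>
struct PowerSpl {
    static constexpr std::int64_t apply(std::int64_t x) {
        return x * PowerSpl<N - 1>::apply(x);
    }
};

// Base case that ends the recursion.
template <>
struct PowerSpl<0> {
    static constexpr std::int64_t apply(std::int64_t) { return 1; }
};

// Usage: the specialized form must agree with the general one.
inline void check_power() {
    assert(power(3, 4) == PowerSpl<4>::apply(3));
}
```

The specialized version only pays off when the exponent really is fixed for the whole system, which is the same condition the slides place on middleware invariants.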
5
Existing Middleware Optimizations
  • Footprint Reduction Optimization
  • Micro ORB Architecture → Virtual Component
    Pattern
  • Micro POA Architecture → Pluggable components
  • Request Demux/Dispatch Optimizations
  • Connection Management → Acceptor-Connector
    pattern, Reactor
  • Buffer Management Strategies
  • Request Demultiplexing → Active Demultiplexing,
    Perfect Hashing
  • Aren't these optimizations enough?
  • They have worked really well for applications in
    different domains
  • General-purpose middleware is still layered
  • Techniques that fold layers (code and
    run-time checks) to improve performance
  • These will add to the general-purpose
    optimizations

6
Capturing System Invariants in Models (1/2)
  • Example System
  • Basic Simple (BasicSP): three-component
    Distributed Real-time Embedded (DRE) application
    scenario
  • Timer Component triggers periodic refresh rates
  • GPS Component generates periodic position
    updates
  • Airframe Component processes input from the GPS
    component and feeds to Navigation display
  • Navigation Display displays GPS position
    updates
  • Hypothesis → Solution Approach
  • Use early-binding parameters to tailor middleware
  • Techniques applied could range from
  • Conditional compilation
  • Optimized stub/skeleton generation
  • Strategy pattern to handle alternatives
  • Program Specialization Invariants
  • Must hold for all specializations
  • output(p_orig) = output(p_spl)
  • speed(p_spl) > speed(p_orig)

Boeing product-line scenario: representative
rate-based DRE application
ACE_wrappers/TAO/CIAO/DaNCE/examples/BasicSP
CoSMIC/examples/BasicSP
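The two specialization invariants on this slide (same output, higher speed) can be checked with a small harness. The sketch below is an assumption about how such a check might look, not part of TAO or CoSMIC; `same_output` and `seconds_for` are hypothetical helper names. Only the output invariant is deterministic; the timing helper just reports a measurement for comparison.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Output invariant: the original and the specialized program must agree
// on every sample input.
template <typename Orig, typename Spl, typename In>
bool same_output(Orig p_orig, Spl p_spl, const std::vector<In>& samples) {
    for (const In& s : samples) {
        if (!(p_orig(s) == p_spl(s))) return false;
    }
    return true;
}

// Speed invariant: measure wall-clock time for a program over the samples.
// Callers compare the value for p_spl against the value for p_orig.
template <typename F, typename In>
double seconds_for(F f, const std::vector<In>& samples, int repeats) {
    assert(repeats > 0);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (const In& s : samples) {
            volatile auto sink = f(s);  // keep the call from being optimized away
            (void)sink;
        }
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A specialization that fails `same_output` on any representative input should be rejected regardless of how much faster it runs.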
7
Capturing System Invariants in Models (2/2)
Component Deployment
Component Interactions
Same Endianness
Periodic Timer
Single method interfaces
Collocated Components
  • Mapping Ahead of Time (AOT) System Properties to
    Specializations
  • Periodicity → Pre-create marshaled Request
  • Single Interface Operations → Pre-fetch POA,
    Servant, Skeleton servicing request
  • Same Endianness → Avoid de-marshaling (byte order
    swapping)
  • Collocated Components → Specialize for target
    location (remove remoting)
  • Same operation invoked → Cache CORBA Request
    header/update arguments only
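The mapping above can be written down as a small selection function, in the way a model interpreter might turn model properties into build settings. This is a sketch only: the struct, field, and enumerator names are illustrative and do not come from TAO or CoSMIC.

```cpp
#include <cassert>
#include <cstdint>

// Ahead-of-time properties read off the model (illustrative field names).
struct SystemInvariants {
    bool periodic_timer;
    bool single_operation_interface;
    bool same_endianness;
    bool collocated;
    bool same_operation_invoked;
};

// One bit per specialization listed on the slide.
enum Specialization : std::uint32_t {
    kNone               = 0,
    kPrecreateRequest   = 1u << 0,
    kPrefetchDispatch   = 1u << 1,
    kSkipByteSwap       = 1u << 2,
    kRemoveRemoting     = 1u << 3,
    kCacheRequestHeader = 1u << 4
};

// Each invariant that holds enables exactly one specialization.
inline std::uint32_t select_specializations(const SystemInvariants& s) {
    std::uint32_t out = kNone;
    if (s.periodic_timer)             out |= kPrecreateRequest;
    if (s.single_operation_interface) out |= kPrefetchDispatch;
    if (s.same_endianness)            out |= kSkipByteSwap;
    if (s.collocated)                 out |= kRemoveRemoting;
    if (s.same_operation_invoked)     out |= kCacheRequestHeader;
    return out;
}

// Usage: a deployment that only guarantees the same byte order selects
// exactly one specialization.
inline void example_selection() {
    SystemInvariants s{false, false, true, false, false};
    assert(select_specializations(s) == kSkipByteSwap);
}
```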

8
Specializations Implemented in TAO
  • Client Side Specialization
  • Request Header Caching
  • Pre-creating Requests
  • Marshaling checks
  • Target Location
  • Server Side Specialization
  • Specialize Request Processing
  • Avoid Demarshaling checks
  • Cumulative Effect
  • Combining specializations yields a more than
    additive improvement
  • For example
  • Client-side request caching
  • Server-side specialized request processing
  • 1 + 1 = 3?

9
Specialize for Target Location (1/2)
Intent: Specialize a path based on the knowledge
that objects are collocated
  • Model Invariants
  • All communication between the GPS, Airframe and
    Display components is collocated
  • All invocations are local
  • Remoting code is not needed (connection code not
    required)
  • Transformations to TAO (footprint)
  • Eliminate connection handling code
  • Connection Strategies, Flushing Strategies
  • Eliminate Invocation classes
  • Remote Invocation classes
  • One-way and two-way invocation classes
  • Transformations to TAO (performance)
  • Eliminate Remoting Checks
  • Object Proxy checks for remoting
  • Invocation Adapter checks for remoting for each
    invocation
  • Checks for one-way or two-way invocation

10
Specialize for Target Location (2/2)
  • TAO Implementation & Automation
  • All implementations are present in the branch
    TAO_PE_Collocation
  • Specialization implemented by conditional
    compilation (TAO_HAS_COLLOCATION flag) to remove
    remoting
  • Profiled the optimistic case of absolutely no
    remoting (i.e., no code to handle requests and
    replies)
  • Configuration
  • 2.4.21-27.0.1.ELsmp 1 SMP Red Hat kernel
  • Dual-processor Athlon, 2 GHz
  • 1 GB RAM and 256 KB cache for each processor
  • Test run: TAO's
    performance-tests/Latency/Collocation

Optimization: Code subsetting (removed connection-related code); performance (elimination of remoting checks)
Performance Improvements: libTAO 6% (100 kB reduction); application 15%; improved by 10% (over and above Thru_POA collocation)
CORBA Compliance: Compliant with CORBA specification
Automation: Realized by macros; Invocation classes can be separated out as libraries
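The conditional-compilation technique for removing remoting can be sketched as follows. This is not TAO code: `SKETCH_WITH_REMOTING` is a hypothetical stand-in for a build flag, and `Target` and `invoke` are illustrative names. The point is that the specialized build contains neither the per-invocation location check nor the remote path.

```cpp
#include <cassert>
#include <string>

// Illustrative description of an invocation target.
struct Target {
    bool collocated;
    std::string endpoint;
};

// Define SKETCH_WITH_REMOTING (hypothetical flag) to compile the
// general-purpose path back in.
inline std::string invoke(const Target& t, const std::string& op) {
#if defined(SKETCH_WITH_REMOTING)
    // General-purpose build: the location check runs on every invocation.
    if (!t.collocated) {
        return "remote:" + t.endpoint + "/" + op;
    }
#else
    // Specialized build: relies on the model invariant that every target
    // is collocated, so the check and the remote branch are compiled out.
    assert(t.collocated && "specialized build assumes collocation");
    (void)t;
#endif
    return "local:" + op;
}
```

The footprint saving comes from the remote branch, and everything it would pull in, never reaching the binary; the performance saving comes from dropping the check itself.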
11
Specialize CORBA Request Header (1/4)
Intent: Avoid the considerable overhead of
creating new CORBA requests and replies for each
of a series of request calls
  • Model Invariants
  • Timer Component periodically sends the same event
  • Operations to retrieve data from the models are
    also the same
  • Update Rather than Create
  • Do not create a new Request each time
  • Re-use the old request and its Request Header
  • Various levels of re-use possible
  • Reuse only the Request Header
  • Reuse both the Request Header and the
    Message-Specific Header
  • Reuse the entire request

This approach is similar to TCP header prediction
12
Specialize CORBA Request Header (2/4)
  • Request Header Caching
  • First-level specialization: cache only the
    Request Header part
  • Everything else in the request is variable
  • Avoid marshaling and de-marshaling costs for the
    header part alone
  • Implemented at the client side
  • TAO Implementation
  • First request creates the entire request (code
    flow same as the normal path)
  • Cache the header information (marshaled)
  • On subsequent messages, update only the total
    size and ID after request creation
  • Implemented via conditional compilation

Optimization: Cache GIOP Request Header part
Performance Improvements: Roundtrip throughput improved by 50-100 calls/sec
CORBA Compliance: Compliant with CORBA specification
Automation: Realized by macros
Not much gain by doing this
13
Specialize CORBA Request Header (3/4)
  • TAO Implementation
  • Move buffer pointer to start of data segment
  • Write out the arguments for the call
  • Update the total size of the request (SIZE) and
    REQUEST_ID fields in the request
  • Message Specific Header Caching
  • Cache both the Request Header and the
    Message-Specific Header
  • Object Key is the same
  • Service context information is the same
  • Operation name is the same, e.g., get_data

Server side → Only when thread-per-connection is
used
GIOP Formats → Only for GIOP 1.2, as in 1.0 and
1.1 service contexts are written first
Optimization: Cache Request Header and Request Message
Performance Improvements: Roundtrip throughput improved by 300-350 calls/sec (5%); latency 3 µsecs (5%)
CORBA Compliance: Compliant with CORBA specification (service contexts)
Automation: Realizable by using policies at object level at client side
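The header-caching steps (marshal the header once, rewind to the start of the data segment, write the arguments, then patch the size and request ID) can be sketched with a plain byte buffer. The layout used here is an assumption made for the sketch and is not the GIOP wire format; `CachedRequest` and its members are hypothetical names, not TAO classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified layout assumed for this sketch (NOT the GIOP wire format):
//   bytes [0..3]  total message size
//   bytes [4..7]  request id
//   bytes [8..]   operation name (length-prefixed), then arguments
class CachedRequest {
public:
    // First request: marshal the invariant header part exactly once.
    explicit CachedRequest(const std::string& operation) {
        buf_.resize(8);  // placeholders for size and request id
        put_u32(static_cast<std::uint32_t>(operation.size()));
        buf_.insert(buf_.end(), operation.begin(), operation.end());
        args_offset_ = buf_.size();  // start of the data segment
    }

    // Subsequent requests: reuse the marshaled header and rewrite only
    // the arguments, the total size, and the request id.
    const std::vector<unsigned char>& prepare(
            std::uint32_t request_id,
            const std::vector<unsigned char>& args) {
        buf_.resize(args_offset_);  // rewind to the start of the data segment
        buf_.insert(buf_.end(), args.begin(), args.end());
        patch_u32(0, static_cast<std::uint32_t>(buf_.size()));
        patch_u32(4, request_id);
        return buf_;
    }

    std::size_t args_offset() const { return args_offset_; }

private:
    void put_u32(std::uint32_t v) {
        unsigned char b[4];
        std::memcpy(b, &v, 4);
        buf_.insert(buf_.end(), b, b + 4);
    }
    void patch_u32(std::size_t off, std::uint32_t v) {
        assert(off + 4 <= buf_.size());
        std::memcpy(&buf_[off], &v, 4);
    }

    std::vector<unsigned char> buf_;
    std::size_t args_offset_ = 0;
};
```

Because the operation name and anything else in the cached part are never re-marshaled, this only stays correct while the model invariant (same operation, same target) holds.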
14
Specialize CORBA Request Header (4/4)
  • Intent
  • Instead of caching only the headers (Request and
    Message-Specific), pre-create the entire CORBA
    request
  • Model Invariants
  • Timer component sends triggers (heartbeats) to
    the recipient component. Similar situation for
    timeouts
  • Request and data contents are the same
  • Proposed TAO implementation
  • Special IDL flag that will pre-create (marshal)
    the request
  • Each time, the same request is sent to the client
  • Update only the request ID of the request
  • Save the cost of request construction and
    marshaling

Optimization: Entire CORBA Request
Performance Improvements: Avoids marshaling data completely; can eliminate multiple layers by directly sending the request
CORBA Compliance: Not compliant with spec
Automation: IDL compiler can pre-create and generate the entire request
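For a heartbeat whose contents never change, the proposal amounts to marshaling once and patching a single field per send. The sketch below assumes a made-up layout with the request ID in the first four bytes; `PrecreatedRequest` is a hypothetical name, and nothing here reflects what an IDL compiler would actually generate.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Pre-marshaled request. Layout is an assumption of this sketch:
// bytes [0..3] hold the request id, the rest is the fixed payload.
class PrecreatedRequest {
public:
    // Marshaling cost is paid once, at construction.
    explicit PrecreatedRequest(const std::vector<unsigned char>& fixed_payload)
        : wire_(4 + fixed_payload.size()) {
        if (!fixed_payload.empty()) {
            std::memcpy(wire_.data() + 4, fixed_payload.data(),
                        fixed_payload.size());
        }
    }

    // Per-send work is a single 4-byte patch; nothing is re-marshaled.
    const std::vector<unsigned char>& next() {
        assert(wire_.size() >= 4);
        ++request_id_;
        std::memcpy(wire_.data(), &request_id_, 4);
        return wire_;
    }

private:
    std::vector<unsigned char> wire_;
    std::uint32_t request_id_ = 0;
};
```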
15
Specialized Request Processing (1/2)
  • Intent
  • Resolve the mapping of incoming requests to the
    POA, Servant, Skeleton, and operation to which
    they are dispatched only once, then use these
    precomputed results to optimize the dispatch of
    subsequent requests
  • Model Invariants
  • The get_data operation is always invoked on the
    same component, located in the same POA and
    serviced by the same servant and operation
  • Once-Per-Connection Resolution of Dispatch
  • TAO provides Active Demultiplexing and Perfect
    Hashing for an O(1) lookup time bound
  • Caching just the POA may not give much
    performance improvement

16
Specialized Request Processing (2/2)
  • TAO Implementation
  • As the operation names are the same, we directly
    cache the skeleton and advance the current buffer
    pointer to the beginning of the arguments
  • The length is calculated only for the first
    request and re-used; the cost is amortized over
    the number of operations
  • Implemented via the TAO_CACHE_SERVANT_REF
    conditional compilation macro
  • TAO_ROOT/performance-tests/Latency/Single-Threaded

Optimization: Cache skeleton directly
Performance Improvements: Round-trip latency 6 µsecs (5%); throughput 300 calls/sec (5%)
CORBA Compliance: Caching skeletons is not compliant; cannot be used in Default Servant and Servant Locator classes
Automation: Provide policies at the POA (now that it is refactored) to implement this layer folding; implemented as a separate IIOPConnection handler class
This is similar to the Direct Collocation
optimization for a collocated request
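Once-per-connection resolution can be sketched as a dispatcher that performs the full lookup for the first request and calls a cached skeleton afterwards. This is an illustration under assumed names (`ConnectionDispatcher`, `Skeleton`); it does not reproduce TAO's POA or connection handler classes, and an ordinary map stands in for the demultiplexing tables.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for a generated skeleton: takes a demarshaled argument and
// returns a result.
using Skeleton = std::function<int(int)>;

class ConnectionDispatcher {
public:
    void register_operation(const std::string& object_key,
                            const std::string& op, Skeleton s) {
        table_[std::make_pair(object_key, op)] = std::move(s);
    }

    // The first request resolves the target; later requests on this
    // connection skip the lookup and go straight to the cached skeleton.
    // After caching, object_key and op are no longer inspected, which is
    // only valid while the model invariant (same target, same operation)
    // holds.
    int dispatch(const std::string& object_key, const std::string& op,
                 int arg) {
        if (!cached_) {
            ++lookups_;
            cached_ = table_.at(std::make_pair(object_key, op));
            assert(static_cast<bool>(cached_));
        }
        return cached_(arg);
    }

    int lookups() const { return lookups_; }

private:
    std::map<std::pair<std::string, std::string>, Skeleton> table_;
    Skeleton cached_;
    int lookups_ = 0;
};
```

The fact that later requests ignore the key is exactly why the slide flags this optimization as unusable with Default Servant and Servant Locator setups, where the target can legitimately differ per request.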
17
Specialize Marshaling/De-marshaling
  • Intent
  • To mask endianness, the GIOP Request header
    contains a flag that indicates the endianness of
    the request
  • If the endianness differs, do byte swapping
  • Model Invariants
  • The two machines on which the components are
    hosted have the same endianness (byte order), so
    no checks for byte order are required
  • ACE Implementation
  • ACE_CDR streams provide the ACE_SWAP_ON_WRITE and
    ACE_DISABLE_SWAP_ON_READ macros, which can be used
    to eliminate checks for byte ordering
  • These macros are not set by default. Model
    interpreters could generate configuration
    settings to enable them

Optimization: Demarshaling check elimination
Performance Improvements: Removes more than 10 if-conditions from the path of a normal CORBA request; improvements on both client and server side; used in conjunction with header caching optimizations
CORBA Compliance: Compliant with CORBA specification
Automation: Conditional compilation techniques
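The effect of removing byte-order checks can be shown on a single 32-bit read. This sketch does not use ACE_CDR; `swap32`, `read_u32`, and `read_u32_same_order` are illustrative names. The general reader tests a flag for every value, while the specialized reader relies on the model invariant that both hosts share a byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// General-purpose read: consults the sender's byte-order flag for every
// value that is demarshaled.
inline std::uint32_t read_u32(const unsigned char* p, bool sender_differs) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return sender_differs ? swap32(v) : v;
}

// Specialized read: both the flag test and the swap are gone. Valid only
// when sender and receiver are known to share a byte order.
inline std::uint32_t read_u32_same_order(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

A single check is cheap, but it is repeated for every multi-byte field of every request, which is where the cumulative saving comes from.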
18
Concluding Remarks & Future Work
  • Specialization can be used to fold layers based
    on system invariants
  • The current implementation is a first cut that
    uses conditional compilation. More appropriate
    strategies for implementing these
    specializations need to be examined
  • Request Header Caching: strategies controlled by
    svc.conf
  • Specialize Request Processing: POA request
    processing policy
  • Marshaling/de-marshaling: ACE level
  • Pre-create request: IDL-generated code
  • Collocation specialization: macros, strategies
    (Invocation classes)

Examine specialization at the Component
Middleware level and Infrastructural Middleware
level