A Framework for Assessing the Real-Time Performance of Generic Code

1
A Framework for Assessing the Real-Time
Performance of Generic Code
Mark A. Rybka, Washington University in Saint Louis
Advisors: Dr. Ron K. Cytron, Dr. Christopher D. Gill
Department of Computer Science and Engineering, Washington University
This work was supported in part by DARPA under
contracts F33615-00-C-1697 and F33615-03-C-4111.
The author thanks The Boeing Company for
supporting his graduate studies.
2
Context
What was the research about?
Set out to understand the performance
characteristics of the C++ Standard Template
Library (STL)
LIST<TYPE, ALLOC>
DEQUE<TYPE, ALLOC>
VECTOR<TYPE, ALLOC>
3
Context, continued
What was the research about?
Developed a framework for profiling the
performance of three different interfaces of the
STL
4
Context, continued
What was the research about?
  • Used the framework to examine the performance of
    push_back and pop_back operations of the STL
    sequence containers
  • Noticed patterns in the three different
    instances of the framework developed
  • Used patterns to describe a conceptual
    architecture for assessing the performance of
    generic code

5
Context, continued
Issues and Challenges Addressed
Black box versus White Box Testing
  • STL and any generic code defines an interface
    that is a black box without visibility of an
    implementation
  • Some aspects of the implementation may need to
    be addressed to perform testing accurately
  • Black box testing: the goal is to identify
    problems, not to prove their absence

6
Context, continued
Issues and Challenges Addressed
  • Determining the interface-to-subinterface
    interactions
  • Determining percentages of time spent in
    subinterfaces during operations

[Figure: CONTAINER<TYPE, ALLOC<TYPE> > with unknown ("?") interactions with TYPE and ALLOC<?>]
7
Context, continued
Issues and Challenges Addressed
  • Reducing interference from the test platform or
    framework
  • Correlation of spikes in the data to software
    versus system

8
Real-time Systems
A real-time system is one in which correctness of
the system depends not only on the LOGICAL
RESULTS but also on THE TIME AT WHICH the results
are provided
  • Scheduling analysis requires reliable estimates
    of the running time of the program's tasks
  • If bounds are overestimated, CPU resources can
    be wasted, or scheduling may be deemed
    infeasible
  • If bounds are underestimated, deadlines may be
    missed, causing system failure

9
Real-time Systems, continued
Items that must be addressed to ensure execution
time is bounded
  • Understanding of the runtime system
  • OS process priorities
  • Memory system and potential paging
  • Understanding of software subcomponents
    (middleware)
  • Must be able to predict the time bounds of a
    component
  • Must be made ready for real-time, which implies
    a mechanism to profile performance readiness
  • Non-predictability is contagious

10
Type Independent Performance Framework
Requirements
11
Experimental Procedure
Description
  • One test run consists of 30,000 push_backs on
    the container, one after the other
  • Time for each operation is output
  • Interface-to-subinterface interactions output
  • Type of container changed between tests
  • Allocator changes
  • Container changes
  • Contained Type changes

12
Experimental Data, continued
Handling System Jumps
The STL specification says inserts on a list take
constant time. Why are there jumps in the data? Is
it system noise or the list implementation?
13
Experimental Data, continued
Handling System Jumps
[Figure: plot with two spikes labeled "FILTERED OUT"]
  • Software test does the same thing every time it
    is run
  • Multiple runs can be performed and jumps can be
    compared
  • Jumps that do not recur across test runs are
    not attributed to the software
  • 10 test runs were performed for each test
  • Spikes that did not occur in every test were
    filtered out
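
One simple way to approximate this rule is to take, for each operation index, the minimum time observed across all runs: a spike survives only if it appears at that index in every run. The slides do not state the exact filter that was used, so the function below (`filter_system_noise`, a hypothetical name) is only a sketch of the idea and assumes all runs have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// runs[r][i] is the time of operation i in run r.
// Returns, per operation index, the smallest time seen in any run, so a
// spike that is missing from even one run is removed from the result.
std::vector<long long> filter_system_noise(
        const std::vector<std::vector<long long>>& runs) {
    if (runs.empty()) return {};
    std::vector<long long> filtered = runs.front();
    for (const auto& run : runs) {
        for (std::size_t i = 0; i < filtered.size() && i < run.size(); ++i) {
            filtered[i] = std::min(filtered[i], run[i]);
        }
    }
    return filtered;
}
```

With three runs `{5,100,5,5}`, `{5,100,5,90}`, and `{5,100,70,5}`, the spike at index 1 is kept because it occurs in every run, while the isolated spikes at indices 2 and 3 are dropped.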

14
Experimental Data, continued
Handling System Jumps
Unfiltered
True Behavior of List
The STL specification says inserts on a list take
constant time. The list implementation still shows
jumps after system spikes have been filtered out.
15
Experimental Data, continued
List using an ACE cached allocator
Using a different allocator made the list
performance tightly bounded. The allocator has
been isolated as the cause of the jumps (after
system noise filtering).
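
To illustrate why a cached allocator can bound allocation time, here is a minimal free-list pool written as a standard-style allocator. It is not ACE's implementation and does not use ACE's API; the names `FixedPool` and `PoolAllocator` and the pool size of 40,000 slots are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Hypothetical fixed-size pool: every slot is carved out up front, so
// take() and give() are constant-time pointer swaps with no heap calls.
template <class T>
class FixedPool {
public:
    static FixedPool& instance() {
        static FixedPool pool(kSlots);
        return pool;
    }
    void* take() {
        if (free_ == nullptr) throw std::bad_alloc();  // pool exhausted
        Slot* s = free_;
        free_ = s->next;
        return s;
    }
    void give(void* p) {
        Slot* s = static_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }

private:
    static constexpr std::size_t kSlots = 40000;  // arbitrary size for the sketch
    union Slot {
        Slot* next;                                   // link while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // payload while in use
    };
    explicit FixedPool(std::size_t n) : slots_(n) {
        for (Slot& s : slots_) { s.next = free_; free_ = &s; }
    }
    std::vector<Slot> slots_;
    Slot* free_ = nullptr;
};

// Allocator that serves single-object requests from the pool and falls
// back to the global heap for array requests.
template <class T>
struct PoolAllocator {
    using value_type = T;
    PoolAllocator() = default;
    template <class U>
    PoolAllocator(const PoolAllocator<U>&) {}
    T* allocate(std::size_t n) {
        if (n != 1) return static_cast<T*>(::operator new(n * sizeof(T)));
        return static_cast<T*>(FixedPool<T>::instance().take());
    }
    void deallocate(T* p, std::size_t n) {
        if (n != 1) { ::operator delete(p); return; }
        FixedPool<T>::instance().give(p);
    }
};
template <class T, class U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }
```

A `std::list<int, PoolAllocator<int>>` then draws every node from the pre-built pool, which removes the general-purpose heap from the push_back path.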
16
Experimental Data, continued
Another example of system noise filtering
Unfiltered
True Behavior of List
Filtering on this run clearly shows the need for
system noise filtering. Note that the jump on the
first push_back operation survives the filter.
17
Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, vector, and deque with
SimpleClass for push_back.
18
Experimental Data, continued
List
  • In 30,000 push_back calls
  • 30,000 T copy constructors
  • 30,000 calls to allocate
  • One copy constructor call and one allocate call
    per push_back. Spikes are due to the allocator,
    as shown.
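
Counts like these can be gathered with an instrumented element type and an instrumented allocator. The sketch below is an assumption about how such counting could be done, not the framework's code; `Tracked`, `CountingAllocator`, and `count_list_push_backs` are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <new>

struct Counts {
    std::size_t copies = 0, destroys = 0, allocs = 0, deallocs = 0;
};
inline Counts& counts() { static Counts c; return c; }

// Element type that counts its copy constructions and destructions.
struct Tracked {
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++counts().copies; }
    Tracked& operator=(const Tracked&) = default;
    ~Tracked() { ++counts().destroys; }
};

// Allocator that counts allocate and deallocate calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++counts().allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ++counts().deallocs;
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Pushes n copies onto a list and returns the counters as they stand
// before the list itself is destroyed.
Counts count_list_push_backs(std::size_t n) {
    counts() = Counts{};
    std::list<Tracked, CountingAllocator<Tracked>> l;
    const Tracked t(1);
    for (std::size_t i = 0; i < n; ++i) l.push_back(t);
    return counts();
}
```

For `n = 30000` this reports 30,000 copy constructions. The allocate count is at least 30,000; some standard library implementations make an extra bookkeeping allocation when the list is constructed.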

19
Experimental Data, continued
Deque
  • In 30,000 push_back calls
  • 30,937 T copy constructors
  • 937 T destructors
  • 945 calls to allocate
  • 8 calls to deallocate
  • Does not allocate on every push_back. Some
    spikes are due to the allocator, as shown.

20
Experimental Data, continued
Deque
21
Experimental Data, continued
Vector
  • In 30,000 push_back calls
  • 62,767 T copy constructors
  • 32,767 T destructors
  • 16 calls to allocate
  • 15 calls to deallocate
  • Large spikes occur with 1 allocate, 1 deallocate,
    and many copy constructor and destructor calls.
    The ACE allocator does not make vector tightly
    bounded.
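
The vector counts are consistent with a capacity that starts at 1 and doubles on each reallocation, with every existing element copied to the new buffer and then destroyed in the old one. The slides do not state the growth policy, so that is an assumption; the small simulation below (hypothetical `simulate_doubling`) only checks the arithmetic and does not touch `std::vector`.

```cpp
#include <cstddef>

struct GrowthCounts {
    std::size_t allocs, deallocs, copies, destroys;
};

// Simulates `pushes` push_back calls on a buffer whose capacity goes
// 1, 2, 4, ... and whose elements are copied on every reallocation.
GrowthCounts simulate_doubling(std::size_t pushes) {
    GrowthCounts g{0, 0, 0, 0};
    std::size_t size = 0, capacity = 0;
    for (std::size_t i = 0; i < pushes; ++i) {
        if (size == capacity) {
            ++g.allocs;                        // new, larger buffer
            g.copies += size;                  // copy old elements across
            g.destroys += size;                // destroy the originals
            if (capacity != 0) ++g.deallocs;   // release the old buffer
            capacity = (capacity == 0) ? 1 : capacity * 2;
        }
        ++g.copies;                            // the pushed element itself
        ++size;
    }
    return g;
}
```

For 30,000 pushes the capacities run from 1 to 32,768, giving 16 allocations and 15 deallocations. Reallocation copies total 1 + 2 + ... + 16,384 = 32,767, so there are 30,000 + 32,767 = 62,767 copy constructions and 32,767 destructions, matching the figures on this slide.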

22
Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, vector, and deque for
push_back.
30,000 push_back operations invoke.
23
Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, vector, and deque for
push_back.
Deque using the default allocator wins on both
average and worst-case time. Its worst-to-average
(W/A) ratio is similar to that of list with the
default allocator.
  • Deque does not need to allocate on every
    push_back like list, lowering its average time
  • Deque does not need to mass copy and destroy
    like vector, lowering its worst-case time
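
The average, worst, and W/A figures used in this comparison can be computed from per-operation timings as in the sketch below. The `Summary` struct and `summarize` function are hypothetical names, and W/A is taken here to mean worst time divided by average time.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Summary {
    double average;             // mean time per operation
    long long worst;            // largest single-operation time
    double worst_over_average;  // W/A ratio; 0 if there is no data
};

Summary summarize(const std::vector<long long>& samples) {
    if (samples.empty()) return Summary{0.0, 0, 0.0};
    const long long total = std::accumulate(samples.begin(), samples.end(), 0LL);
    const double average = static_cast<double>(total) / samples.size();
    const long long worst = *std::max_element(samples.begin(), samples.end());
    const double ratio = (average > 0.0) ? worst / average : 0.0;
    return Summary{average, worst, ratio};
}
```

A W/A ratio close to 1 indicates tightly bounded behavior; a large ratio indicates occasional operations that are far slower than the typical one.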

24
Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, deque, and vector with a more
complicated TYPE (map) for push_back.
25
Experimental Data, continued
Comparing the performance of the sequence
containers
  • In 30,000 push_back calls
  • 30,000 T copy constructors
  • 30,000 calls to allocate
  • One copy constructor call and one allocate call
    per push_back. Not all spikes are due to the
    allocator: the ACE allocator does not fully
    help. Is TYPE now the problem?

26
Experimental Data, continued
Comparing the performance of the sequence
containers
The copy constructor for map uses the default
allocator, which causes jumps. Giving map the ACE
allocator makes list behavior tightly bounded again.
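
The effect can be reproduced with two separately tagged counting allocators: the allocator given to the list only sees the list's own node, while copying the contained map goes through the map's allocator. This is an illustrative sketch; `TaggedAllocator`, the tag types, and `allocs_for_one_push_back` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <map>
#include <new>
#include <utility>

// One allocate counter per tag type.
template <class Tag>
std::size_t& alloc_count() { static std::size_t n = 0; return n; }

template <class T, class Tag>
struct TaggedAllocator {
    using value_type = T;
    TaggedAllocator() = default;
    template <class U>
    TaggedAllocator(const TaggedAllocator<U, Tag>&) {}
    T* allocate(std::size_t n) {
        ++alloc_count<Tag>();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U, class Tag>
bool operator==(const TaggedAllocator<T, Tag>&, const TaggedAllocator<U, Tag>&) { return true; }
template <class T, class U, class Tag>
bool operator!=(const TaggedAllocator<T, Tag>&, const TaggedAllocator<U, Tag>&) { return false; }

struct ListTag {};
struct MapTag {};

using Map = std::map<int, int, std::less<int>,
                     TaggedAllocator<std::pair<const int, int>, MapTag>>;
using ListOfMaps = std::list<Map, TaggedAllocator<Map, ListTag>>;

// Returns {allocations via the list's allocator, allocations via the
// map's allocator} caused by a single push_back of a three-entry map.
std::pair<std::size_t, std::size_t> allocs_for_one_push_back() {
    Map m;
    m[1] = 10; m[2] = 20; m[3] = 30;
    ListOfMaps l;
    alloc_count<ListTag>() = 0;  // ignore setup allocations
    alloc_count<MapTag>() = 0;
    l.push_back(m);              // copies the map into a new list node
    return {alloc_count<ListTag>(), alloc_count<MapTag>()};
}
```

The list's allocator is asked for one node, while the map's allocator is asked for at least three (one per copied entry, plus any implementation bookkeeping). Swapping only the list's allocator therefore leaves the map's allocations on whatever allocator the map was declared with.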
27
Experimental Data, continued
Deque
  • In 30,000 push_back calls
  • 30,714 T copy constructors
  • 714 T destructors
  • 721 calls to allocate
  • 7 calls to deallocate
  • Moving to the ACE allocator eliminated the spikes
    caused by deque accessing the allocator, as before.
    How is the map copy constructor affecting deque?

28
Experimental Data, continued
Deque
Map using the ACE allocator does not have the spikes
associated with its copy constructor. The graph
again looks more like the behavior of deque with
SimpleClass.
29
Experimental Data, continued
Vector
Changing allocator for both vector and map did
not change overall container behavior
significantly.
30
Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list and deque for push_back.
Deque with the default allocator beats list on both
average and worst-case time. Its W/A ratio is
similar to that of list with the ACE allocator.
  • Deque performs better than list with map as the
    type. In the list case, each map copy constructor
    spike coincides with an allocate call; in the
    deque case it does not necessarily coincide with
    one
  • Deque with the default allocator performs better
    than with the ACE allocator. Remember, the
    allocator given to deque is not the cause of the
    worst spike. The map copy constructor spike is not
    as large because deque uses the default allocator

31
Type Independent Performance Framework Design
Patterns
Problem/Context: Need a module to represent a
single operation on an interface.
32
Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to represent the
usage-pattern of a type for a test run, including
operations that change object state.
33
Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to represent an
already executed test run, correlating the Test
Signature and the results.
34
Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to execute the
tests and measure the time. Must take a Test
Signature and return a Test Record.
35
Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to query time.
36
Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module that can
statistically reduce the data found in Test
Records.
37
Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module that can store
Test Records for future queries to save time
associated with rerunning tests.
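
The roles named in these pattern slides could fit together roughly as sketched below. Only the role names (operation, Test Signature, Test Record, tester, time query) come from the slides; every type and function name here is hypothetical, and the statistical reducer and record store are left out for brevity. A real harness would likely avoid `std::function` in the timed path to reduce interference.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A single operation on the interface under test.
template <class Subject>
struct Operation {
    std::string name;
    std::function<void(Subject&)> apply;
};

// The usage pattern for one test run: an ordered list of operations.
template <class Subject>
using TestSignature = std::vector<Operation<Subject>>;

// An executed run: each operation name paired with its elapsed nanoseconds.
struct TestRecord {
    std::vector<std::pair<std::string, long long>> timings;
};

// Time query, injectable so a test can substitute a fake clock.
using Clock = std::function<long long()>;

inline long long steady_now_ns() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
}

// The tester: takes a Test Signature, returns a Test Record.
template <class Subject>
TestRecord run_test(Subject& subject, const TestSignature<Subject>& signature,
                    const Clock& now = steady_now_ns) {
    TestRecord record;
    record.timings.reserve(signature.size());
    for (const auto& op : signature) {
        const long long start = now();
        op.apply(subject);
        const long long stop = now();
        record.timings.emplace_back(op.name, stop - start);
    }
    return record;
}
```

A signature for a sequence container would hold entries such as `{"push_back", [](std::vector<int>& v) { v.push_back(1); }}`, and the returned record can then be handed to a reducer or stored for later queries.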
38
Performance Framework Realized on STL
  • Type independent performance framework design
    patterns realized on three different interfaces,
    each related to the C++ Standard Template Library
    (STL)
  • Sequence Containers - Exercises and queries the
    performance of the push_back and pop_back
    operations of provided type
  • Allocators - Exercises and queries the
    performance of the allocate and deallocate
    operations of the provided type
  • Container types - Exercises any type that may
    be used in a container (default constructor,
    destructor, copy constructor, assignment operator
    of any type)

39
Performance Framework Realized on STL
Container Tester
How does a container interface with the
subcomponents? Answering this became a unique
responsibility of the Container Tester, which uses
a signature generation technique.
40
Performance Framework Realized on STL
Allocator Tester
Which allocator is actually used in the container
operations? Answering this became a unique
responsibility of the Allocator Tester, which uses
a rebind allocator interception technique.
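
The question can be made concrete with an allocator that records the type it is instantiated for each time `allocate` is called. The sketch below is an assumption about what such interception could look like, not the Allocator Tester itself; `SpyAllocator` and the helper functions are hypothetical, and `typeid(T).name()` returns an implementation-defined string.

```cpp
#include <cstddef>
#include <list>
#include <new>
#include <set>
#include <string>
#include <typeinfo>

// Names of every type for which allocate() has been called.
inline std::set<std::string>& allocated_types() {
    static std::set<std::string> names;
    return names;
}

template <class T>
struct SpyAllocator {
    using value_type = T;
    SpyAllocator() = default;
    // The container converts SpyAllocator<int> into the rebound
    // allocator for its internal node type through this constructor.
    template <class U>
    SpyAllocator(const SpyAllocator<U>&) {}
    T* allocate(std::size_t n) {
        allocated_types().insert(typeid(T).name());
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const SpyAllocator<T>&, const SpyAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const SpyAllocator<T>&, const SpyAllocator<U>&) { return false; }

// Returns the type names a list<int> actually asked its allocator for.
std::set<std::string> types_allocated_by_int_list() {
    allocated_types().clear();
    std::list<int, SpyAllocator<int>> l;
    l.push_back(1);
    return allocated_types();
}
```

On common standard library implementations the returned set contains an internal node type and does not contain `int`: the list never allocates through the `SpyAllocator<int>` it was given, only through a rebound copy of it.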
41
Conclusions
Black box versus White Box Testing
  • Profiling the behavior of a black box library can
    only find problems, not prove their absence
  • Finding problems is useful for diagnosis and
    solution
  • Profiling behavior can build confidence

42
Conclusions, continued
Issues and Challenges Addressed
White box testing versus black box testing
  • Some items needed white box examination, in
    particular:
  • Allocator rebind type
  • No assurance that only one type is used for all
    allocator requests by a container
  • Nothing in interface says that rebind even
    occurs
  • First run-time interception point for allocator
  • Explaining software jumps in data needs some
    implementation knowledge
  • Examination for specializations, in case of
    interface-to-subinterface interactions

43
Conclusions, continued
Issues and Challenges Addressed
  • Determining the interface-to-subinterface
    interactions
  • Used signature generation techniques
  • Determining percentages of time spent in
    subinterfaces during operations
  • Given generated subinterface test signature, can
    execute to see how much time sub-operations take
  • Can provide percentage of time in sub-operations
    on a per test basis

44
Conclusions, continued
Issues and Challenges Addressed
  • Reducing interference from the framework
  • Run test of known performance through framework
    and should see expected behavior
  • Get confidence that parts of framework executed
    during that test did not interfere
  • Correlation of spikes in the data to software
    versus system
  • Running the software tests deterministically
    means software spikes should occur in the same
    place in every run; identifying them requires
    some white box information
  • Spikes that do not occur in all tests are
    considered system noise and removed

45
Conclusions, continued
C++/STL Workarounds
  • Container rebinding: taking a container
    parameterized on one type and rebinding it to
    associated types
  • Useful for creating a signature generating
    container in a generic way
  • Generic programming rule: provide rebind on all
    parameterized types?
  • Compile time access to rebound allocator types
  • Some way to know more about how a container
    interfaces with the allocator
  • Type-to-string conversions
  • Some way to get a string from a type, built-in
    to the language
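
Container rebinding of the kind described in the first item can be approximated with a template template parameter, as in the sketch below. The trait name `rebind_container` is hypothetical, and it only handles containers of the form `C<T, Alloc>`, which covers `std::list`, `std::deque`, and `std::vector`.

```cpp
#include <deque>
#include <list>
#include <memory>
#include <type_traits>
#include <vector>

// Primary template: only the partial specialization below is defined.
template <class Container, class U>
struct rebind_container;

// Given C<T, A>, produce C<U, A rebound to U>.
template <template <class, class> class C, class T, class A, class U>
struct rebind_container<C<T, A>, U> {
    using type =
        C<U, typename std::allocator_traits<A>::template rebind_alloc<U>>;
};

// Compile-time checks that the rebound types are what one would expect.
static_assert(std::is_same<rebind_container<std::vector<int>, double>::type,
                           std::vector<double>>::value,
              "vector<int> rebinds to vector<double>");
static_assert(std::is_same<rebind_container<std::deque<char>, long>::type,
                           std::deque<long>>::value,
              "deque<char> rebinds to deque<long>");
```

`std::allocator_traits<A>::rebind_alloc<U>` gives compile-time access to the rebound allocator type, but it still says nothing about which rebound types a container will actually request at run time.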

46
Conclusions, continued
Future Work
  • Extend framework template to other interfaces
  • Generative containers
  • May use existing Test Signature concept
  • Container choice is left to system based on
    provided usage-pattern
  • Code is self-documenting
  • Compile-time Test Records, stored in C++