Title: A Framework for Assessing the Real-Time Performance of Generic Code
1A Framework for Assessing the Real-Time
Performance of Generic Code
Mark A. Rybka Washington University in Saint Louis
Advisors Dr. Ron K. Cytron, Dr. Christopher D.
Gill Department of Computer Science and
Engineering Washington University
This work supported in part by DARPA under
contracts F33615-00-C-1697 and F33615-03-C-4111
The author thanks The Boeing Company for
supporting his graduate studies
2Context
What was the research about?
Set out to understand the performance
characteristics of the C++ Standard Template
Library (STL)
LIST<TYPE, ALLOC>
DEQUE<TYPE, ALLOC>
VECTOR<TYPE, ALLOC>
3Context, continued
What was the research about?
Developed a framework for profiling the
performance of three different interfaces of the
STL
4Context, continued
What was the research about?
- Used the framework to examine the performance of push_back and pop_back operations of the STL sequence containers
- Noticed patterns in the three different instances of the framework developed
- Used patterns to describe a conceptual architecture for assessing the performance of generic code
5Context, continued
Issues and Challenges Addressed
Black box versus White Box Testing
- The STL, like any generic code, defines an interface that is a black box, with no visibility into the implementation
- Some aspects of the implementation may need to be addressed to perform testing accurately
- Black box testing: the goal is to identify problems, not to prove the absence of problems
6Context, continued
Issues and Challenges Addressed
- Determining the interface-to-subinterface interactions
- Determining percentages of time spent in subinterfaces during operations
[Diagram: CONTAINER<TYPE, ALLOC<TYPE> >, with its interactions with TYPE and ALLOC<?> marked "?"]
7Context, continued
Issues and Challenges Addressed
- Reducing interference from the test platform or framework
- Correlation of spikes in the data to software versus system
8Real-time Systems
A real-time system is one in which correctness of
the system depends not only on the LOGICAL
RESULTS but also on THE TIME AT WHICH the results
are provided
- Scheduling analysis requires reliable estimates of the running time of the program's tasks
- If bounds are overestimated, CPU resources can be wasted, or scheduling may be deemed infeasible
- If bounds are underestimated, deadlines may be missed, causing system failure
9Real-time Systems, continued
Items that must be addressed to ensure execution
time is bounded
- Understanding of the runtime system
  - OS process priorities
  - Memory system and potential paging
- Understanding of software subcomponents (middleware)
  - Must be able to predict the time bounds of a component
  - Must be made ready for real-time, which implies a mechanism to profile performance readiness
  - Non-predictability is contagious
10Type Independent Performance Framework
Requirements
11Experimental Procedure
Description
- One test run consists of 30,000 push_backs on the container, one after the other
- Time for each operation is output
- Interface-to-subinterface interactions output
- Type of container changed between tests
  - Allocator changes
  - Container changes
  - Contained Type changes
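The procedure above can be sketched as a loop that times each push_back individually. This is a minimal illustration, not the framework's actual code: the function name `time_push_backs` and the choice of `std::chrono::steady_clock` as the time source are assumptions.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: times each push_back on its own and returns one
// duration per operation. `Container` is any sequence container that
// provides push_back (list, deque, vector).
template <class Container>
std::vector<std::chrono::nanoseconds> time_push_backs(
    Container& c, const typename Container::value_type& value,
    std::size_t count) {
  using Clock = std::chrono::steady_clock;  // monotonic clock (assumed choice)
  std::vector<std::chrono::nanoseconds> samples;
  samples.reserve(count);  // reserve up front so recording never reallocates
  for (std::size_t i = 0; i < count; ++i) {
    const Clock::time_point start = Clock::now();
    c.push_back(value);
    const Clock::time_point stop = Clock::now();
    samples.push_back(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));
  }
  return samples;
}
```

Calling `time_push_backs(container, value, 30000)` on a list, deque, or vector yields 30,000 samples, one per operation, which can then be plotted or filtered.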
12Experimental Data, continued
Handling System Jumps
The STL specification says constant time inserts
on a list. Why are there jumps in the data? Is
it system noise or the list implementation?
13Experimental Data, continued
Handling System Jumps
[Figure annotation: FILTERED OUT]
- The software test does the same thing every time it is run
- Multiple runs can be performed and jumps can be compared
- Jumps that do not recur in every test run are not attributed to the software
- 10 test runs were performed for each test
- Spikes that did not occur in every test run were filtered out
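One way to realize this kind of filter is to keep, for each operation index, the smallest time observed across all runs, so that a spike survives only when every run shows it at that index. The sketch below assumes that interpretation; the function name `filter_system_noise` is hypothetical and the slides do not state the exact filter that was used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical filter: for each operation index, keep the minimum time seen
// across all runs. A spike at index i remains only if it appears in every run.
std::vector<double> filter_system_noise(
    const std::vector<std::vector<double>>& runs) {
  assert(!runs.empty());
  std::vector<double> filtered = runs.front();
  for (std::size_t r = 1; r < runs.size(); ++r) {
    assert(runs[r].size() == filtered.size());  // runs must be the same length
    for (std::size_t i = 0; i < filtered.size(); ++i) {
      filtered[i] = std::min(filtered[i], runs[r][i]);
    }
  }
  return filtered;
}
```

With 10 runs of 30,000 operations each, `runs` would hold 10 vectors of 30,000 timings, and the result is a single series in which only repeatable jumps remain.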
14Experimental Data, continued
Handling System Jumps
[Figure panels: Unfiltered; True Behavior of List]
The STL specification says constant time inserts
on a list. The list implementation still has jumps
after system spikes have been filtered out.
15Experimental Data, continued
List using an ACE cached allocator
Using a different allocator made the list
performance tightly bounded. The allocator has
been isolated as the cause of the jumps (after
system noise filtering).
16Experimental Data, continued
Another example of system noise filtering
[Figure panels: Unfiltered; True Behavior of List]
Filtering on this run clearly highlights the
need for system noise filtering. Note that the
jump on the first push_back operation remains
after filtering.
17Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, vector, and deque with
SimpleClass for push_back.
18Experimental Data, continued
List
- In 30,000 push_back calls
- 30,000 T copy constructors
- 30,000 calls to allocate
- One copy constructor and one allocate for each
push_back. Spikes due to allocator, as shown.
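Counts like these can be gathered by instrumenting the contained type. The sketch below is an assumption about how such counting might look, not the framework's signature generation code; `CountedType` and `count_copies_for_push_backs` are hypothetical names.

```cpp
#include <cstddef>
#include <list>

// Hypothetical instrumented element type: counts copy constructions and
// destructions through static counters.
struct CountedType {
  static long copies;
  static long destructions;
  int payload = 0;
  CountedType() = default;
  CountedType(const CountedType& other) : payload(other.payload) { ++copies; }
  CountedType& operator=(const CountedType&) = default;
  ~CountedType() { ++destructions; }
};
long CountedType::copies = 0;
long CountedType::destructions = 0;

// Pushes `count` copies of one value onto a fresh container and returns the
// number of copy constructions the container performed.
template <class Container>
long count_copies_for_push_backs(std::size_t count) {
  CountedType::copies = 0;
  CountedType::destructions = 0;
  CountedType value;
  Container c;
  for (std::size_t i = 0; i < count; ++i) c.push_back(value);
  return CountedType::copies;  // read before `c` and `value` are destroyed
}
```

For `std::list<CountedType>` and 30,000 push_backs this reports one copy construction per call. Counts for deque and vector depend on the library implementation's block size and growth policy.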
19Experimental Data, continued
Deque
- In 30,000 push_back calls
- 30,937 T copy constructors
- 937 T destructors
- 945 calls to allocate
- 8 calls to deallocate
- Does not do an allocate on every push_back. Some
spikes due to allocator, as shown.
20Experimental Data, continued
Deque
21Experimental Data, continued
Vector
- In 30,000 push_back calls
- 62,767 T copy constructors
- 32,767 T destructors
- 16 calls to allocate
- 15 calls to deallocate
- Large spikes occur with 1 allocate, 1 deallocate,
and many copy constructors and destructors.
The ACE allocator does not make vector tightly bounded.
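The vector counts above are consistent with a capacity that doubles on each reallocation, starting from 1. That growth policy is an assumption (the slides do not state it), but a short simulation under it reproduces the reported numbers. `GrowthCounts` and `simulate_doubling` are hypothetical names.

```cpp
// Tallies of the operations a doubling vector would perform.
struct GrowthCounts {
  long copies;
  long destructions;
  long allocations;
  long deallocations;
};

// Simulates `n` push_backs on a vector whose capacity goes 0, 1, 2, 4, ...
// (assumed policy). Each reallocation copies and then destroys all existing
// elements; each push_back adds one more copy construction.
GrowthCounts simulate_doubling(long n) {
  GrowthCounts counts{0, 0, 0, 0};
  long size = 0;
  long capacity = 0;
  for (long i = 0; i < n; ++i) {
    if (size == capacity) {
      ++counts.allocations;                       // new, larger buffer
      counts.copies += size;                      // copy old elements over
      counts.destructions += size;                // destroy the originals
      if (capacity > 0) ++counts.deallocations;   // release the old buffer
      capacity = (capacity == 0) ? 1 : capacity * 2;
    }
    ++counts.copies;  // the pushed element itself
    ++size;
  }
  return counts;
}
```

For `n = 30000` the reallocations copy 1 + 2 + 4 + ... + 16,384 = 32,767 elements, giving 62,767 copy constructions, 32,767 destructions, 16 allocations, and 15 deallocations.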
22Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, vector, and deque for
push_back.
30,000 push_back operations invoke.
23Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, vector, and deque for
push_back.
Deque using the default allocator wins on both
average and worst case! Its W/A ratio is similar
to that of list with the default allocator!
- Deque does not need to allocate on every push_back like list, lowering its average
- Deque does not need to mass copy and destroy like vector, lowering its worst case
24Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list, deque, and vector for
push_back with a more complicated TYPE, map.
25Experimental Data, continued
Comparing the performance of the sequence
containers
- In 30,000 push_back calls
- 30,000 T copy constructors
- 30,000 calls to allocate
- One copy constructor and one allocate for each
push_back. Spikes are not all due to the allocator;
the ACE allocator does not fully help! Is TYPE now
the problem?
26Experimental Data, continued
Comparing the performance of the sequence
containers
The copy constructor for map uses the default
allocator, causing jumps! Giving map the ACE allocator
makes list behavior tightly bounded again.
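The point that the contained map allocates through its own allocator, independently of the list's allocator, can be illustrated with two separately counted allocators. This is a sketch under assumptions: `TaggedAllocator` is a hypothetical stand-in that only counts calls and forwards to `std::allocator`; it is not the ACE cached allocator.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <map>
#include <memory>
#include <utility>

// One counter per tag so list and map allocations can be told apart.
template <class Tag> struct AllocCount { static long calls; };
template <class Tag> long AllocCount<Tag>::calls = 0;

// Hypothetical counting allocator; forwards storage requests to std::allocator.
template <class T, class Tag>
struct TaggedAllocator {
  using value_type = T;
  template <class U> struct rebind { using other = TaggedAllocator<U, Tag>; };
  TaggedAllocator() = default;
  template <class U>
  TaggedAllocator(const TaggedAllocator<U, Tag>&) noexcept {}
  T* allocate(std::size_t n) {
    ++AllocCount<Tag>::calls;
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>().deallocate(p, n);
  }
};
template <class T, class U, class Tag>
bool operator==(const TaggedAllocator<T, Tag>&,
                const TaggedAllocator<U, Tag>&) noexcept { return true; }
template <class T, class U, class Tag>
bool operator!=(const TaggedAllocator<T, Tag>&,
                const TaggedAllocator<U, Tag>&) noexcept { return false; }

struct ListTag {};
struct MapTag {};
using Map = std::map<int, int, std::less<int>,
                     TaggedAllocator<std::pair<const int, int>, MapTag>>;
using ListOfMaps = std::list<Map, TaggedAllocator<Map, ListTag>>;

// Copies a map holding `entries` elements onto the list `count` times.
void fill_list_of_maps(ListOfMaps& out, std::size_t entries,
                       std::size_t count) {
  Map prototype;
  for (std::size_t i = 0; i < entries; ++i) prototype[static_cast<int>(i)] = 0;
  for (std::size_t i = 0; i < count; ++i) out.push_back(prototype);
}
```

After `fill_list_of_maps`, `AllocCount<ListTag>::calls` reflects the list's node allocations while `AllocCount<MapTag>::calls` reflects the allocations made by each map copy. Changing only the list's allocator leaves the second count, and any jitter it causes, untouched.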
27Experimental Data, continued
Deque
- In 30,000 push_back calls
- 30,714 T copy constructors
- 714 T destructors
- 721 calls to allocate
- 7 calls to deallocate
- Moving to the ACE allocator eliminated the spikes
caused by deque accessing the allocator, as before.
How is the map copy constructor affecting deque?
28Experimental Data, continued
Deque
With map using the ACE allocator, the spikes
associated with its copy constructor disappear. The
graph again looks more like the behavior of deque with
SimpleClass.
29Experimental Data, continued
Vector
Changing allocator for both vector and map did
not change overall container behavior
significantly.
30Experimental Data, continued
Comparing the performance of the sequence
containers
A comparison of list and deque for push_back.
Deque with the default allocator beats list on both
average and worst case! Its W/A ratio is similar to
that of list with the ACE allocator!
- Deque performs better than list with map as type. The map copy constructor spike occurs with an allocate call in the list case, but not necessarily with an allocate call in the deque case
- Deque with the default allocator performs better than with the ACE allocator! Remember, the allocator given to deque is not the cause of the worst spike. The map copy constructor spike is not as large due to deque's use of the default allocator
31Type Independent Performance Framework Design
Patterns
Problem/Context: Need a module to represent a
single operation on an interface.
32Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to represent the
usage-pattern of a type for a test run, including
operations that change object state.
33Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to represent an
already executed test run, correlating the Test
Signature and the results.
34Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to execute the
tests and measure the time. Must take a Test
Signature and return a Test Record.
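The relationship between these modules can be sketched as follows. All names here (`Operation`, `TestSignature`, `TestRecord`, `execute`) are hypothetical illustrations of the pattern descriptions, restricted to push_back and pop_back; the framework's real interfaces are not shown on the slides.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// A single operation on the interface under test.
enum class Operation { PushBack, PopBack };

// The usage pattern for one test run: an ordered list of operations.
using TestSignature = std::vector<Operation>;

// An executed test run: the signature together with its measured results.
struct TestRecord {
  TestSignature signature;
  std::vector<std::chrono::nanoseconds> timings;  // one entry per operation
};

// Executes a signature against a container and returns the record.
template <class Container>
TestRecord execute(const TestSignature& signature, Container& c,
                   const typename Container::value_type& value) {
  using Clock = std::chrono::steady_clock;  // assumed time source
  TestRecord record{signature, {}};
  record.timings.reserve(signature.size());
  for (Operation op : signature) {
    const Clock::time_point start = Clock::now();
    if (op == Operation::PushBack) {
      c.push_back(value);
    } else if (!c.empty()) {  // pop_back on an empty container is skipped
      c.pop_back();
    }
    const Clock::time_point stop = Clock::now();
    record.timings.push_back(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));
  }
  return record;
}
```

Keeping the signature inside the record means a stored result can later be matched to the exact usage pattern that produced it.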
35Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module to query time.
36Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module that can
statistically reduce the data found in Test
Records.
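A reduction of this kind might compute the average, the worst case, and the worst-to-average (W/A) ratio used in the container comparisons. The sketch below is a minimal assumption of what such a module could do; `Summary` and `summarize` are hypothetical names, and timings are assumed to be positive.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Reduced view of one test run.
struct Summary {
  double average;
  double worst;
  double worst_to_average;  // the W/A ratio
};

// Reduces per-operation timings to average, worst case, and W/A ratio.
// Assumes a non-empty series of positive timings.
Summary summarize(const std::vector<double>& timings) {
  assert(!timings.empty());
  const double total = std::accumulate(timings.begin(), timings.end(), 0.0);
  const double average = total / static_cast<double>(timings.size());
  const double worst = *std::max_element(timings.begin(), timings.end());
  return Summary{average, worst, worst / average};
}
```

A W/A ratio close to 1 indicates tightly bounded behavior, while a large ratio indicates that occasional operations cost far more than the typical one.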
37Type Independent Performance Framework Design
Patterns, continued
Problem/Context: Need a module that can store
Test Records for future queries to save time
associated with rerunning tests.
38Performance Framework Realized on STL
- Type independent performance framework design patterns realized on three different interfaces, each related to the C++ Standard Template Library (STL)
  - Sequence Containers: exercises and queries the performance of the push_back and pop_back operations of the provided type
  - Allocators: exercises and queries the performance of the allocate and deallocate operations of the provided type
  - Container types: exercises any type that may be used in a container (default constructor, destructor, copy constructor, assignment operator of any type)
39Performance Framework Realized on STL
Container Tester
How does a container interface with the
subcomponents? This became a unique
responsibility of the Container Tester. Used a
signature generation technique.
40Performance Framework Realized on STL
Allocator Tester
Which allocator is actually used in the container
operations? This became a unique responsibility
of the Allocator Tester. Used a rebind allocator
interception technique.
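The idea of intercepting the rebound allocator can be illustrated with an allocator that logs the size of the type it is actually asked to allocate. This is a sketch of the general technique, not the Allocator Tester itself; `InterceptingAllocator`, `allocation_log`, and `intercept_list_allocations` are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

// Shared log of sizeof(T) for every allocate call, across all rebound
// instantiations of the allocator.
inline std::vector<std::size_t>& allocation_log() {
  static std::vector<std::size_t> log;
  return log;
}

// Hypothetical intercepting allocator: records which instantiation is used.
template <class T>
struct InterceptingAllocator {
  using value_type = T;
  InterceptingAllocator() = default;
  // Converting constructor used when the container rebinds to another type.
  template <class U>
  InterceptingAllocator(const InterceptingAllocator<U>&) noexcept {}
  T* allocate(std::size_t n) {
    allocation_log().push_back(sizeof(T));
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>().deallocate(p, n);
  }
};
template <class T, class U>
bool operator==(const InterceptingAllocator<T>&,
                const InterceptingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const InterceptingAllocator<T>&,
                const InterceptingAllocator<U>&) noexcept { return false; }

// Returns the element sizes requested while pushing `count` ints onto a list.
std::vector<std::size_t> intercept_list_allocations(std::size_t count) {
  allocation_log().clear();
  std::list<int, InterceptingAllocator<int>> l;
  for (std::size_t i = 0; i < count; ++i) l.push_back(static_cast<int>(i));
  return allocation_log();  // copied before the list releases its nodes
}
```

Although the list is declared with `InterceptingAllocator<int>`, the logged sizes are larger than `sizeof(int)`, because the list rebinds the allocator to its internal node type before allocating.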
41Conclusions
Black box versus White Box Testing
- Profiling the behavior of a black box library can only find problems, not prove the absence of problems
- Finding problems is useful for diagnosis and solution
- Profiling behavior can build confidence
42Conclusions, continued
Issues and Challenges Addressed
White box testing versus black box testing
- Some items needed white box examination. In particular:
  - Allocator rebind type
    - No assurance that only one type is used for all allocator requests by a container
    - Nothing in the interface says that rebind even occurs
    - First run-time interception point for allocator
  - Explaining software jumps in data needs some implementation knowledge
  - Examination for specializations, in case of interface-to-subinterface interactions
43Conclusions, continued
Issues and Challenges Addressed
- Determining the interface-to-subinterface interactions
  - Used signature generation techniques
- Determining percentages of time spent in subinterfaces during operations
  - Given a generated subinterface test signature, can execute it to see how much time sub-operations take
  - Can provide percentage of time in sub-operations on a per-test basis
44Conclusions, continued
Issues and Challenges Addressed
- Reducing interference from the framework
  - Run a test of known performance through the framework; the expected behavior should appear
  - Gain confidence that the parts of the framework executed during that test did not interfere
- Correlation of spikes in the data to software versus system
  - Deterministically running the software tests means spikes in software should occur in the same place between tests; this requires some white box information
  - Spikes that do not occur in all tests are considered system noise and removed
45Conclusions, continued
C++/STL Workarounds
- Container rebinding: taking a container parameterized on one type and rebinding to associated types
  - Useful for creating a signature generating container in a generic way
  - Generic programming rule: provide rebind on all parameterized types?
- Compile time access to rebound allocator types
  - Some way to know more about how a container interfaces with the allocator
- Type-to-string conversions
  - Some way to get a string from a type, built in to the language
46Conclusions, continued
Future Work
- Extend framework template to other interfaces
- Generative containers
  - May use existing Test Signature concept
  - Container choice is left to system based on provided usage-pattern
  - Code is self-documenting
- Compile-time Test Records, stored in C++