Title: The Diary of a Datum: An Approach to Modeling Runtime Complexity in Framework-Based Applications
- Nick Mitchell, Gary Sevitsky (speaker), Harini Srinivasan
- IBM T.J. Watson Research Center
- Oct. 16, 2005
Background
- Applications are increasingly built by integrating libraries and frameworks
  - Lots of standard frameworks (J2EE, servlets, XML, JSPs, EMF, ...)
  - Plus industry-specific frameworks and in-house frameworks
- Our research group has been diagnosing performance problems in large-scale framework-based Java applications for more than five years
  - High-volume web-based servers
  - Client-side applications built on large frameworks like Eclipse
Problem
- It takes a lot of work to perform very simple tasks, even after tuning at the application level
- Example: conversion of a stock purchase date field from SOAP to a Java business object field
Source: SOAP client, Trade benchmark v.3.1
What are these applications doing that is so expensive?
- Not what you would expect.
- Example: accessing the database?
  - Inefficiencies in the multiple layers of frameworks that process queries are the source of many performance problems.
- Example: an expensive sort algorithm?
  - More often the problem is in the coupling of the sort algorithm and the comparator, or of the sort algorithm and the UI framework that calls it.
- In general, problems are not due to poor algorithms, nor are they located in a few hot methods or paths.
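The slides give no code for the comparator point. As a hypothetical illustration (the class name, method names, and date format are ours, not the study's), the sketch below sorts the same date strings twice: once with a comparator that re-parses both operands on every comparison, and once after converting each element a single time. A counter records how many String-to-Date conversions each version performs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration: the same sort, with and without the
// conversion work coupled into the comparator.
class ComparatorCoupling {
    // Number of String -> Date conversions performed so far.
    static int conversions = 0;

    // Each call builds a new formatter and parses one string.
    static Date toDate(String s) {
        conversions++;
        try {
            return new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad date: " + s, e);
        }
    }

    // Coupled: every comparison re-parses both operands.
    static List<String> sortCoupled(List<String> dates) {
        List<String> copy = new ArrayList<>(dates);
        copy.sort((a, b) -> toDate(a).compareTo(toDate(b)));
        return copy;
    }

    // Decoupled: parse each element once, then sort on the precomputed key.
    static List<String> sortDecoupled(List<String> dates) {
        List<Map.Entry<Date, String>> keyed = new ArrayList<>();
        for (String s : dates) {
            keyed.add(Map.entry(toDate(s), s));
        }
        keyed.sort(Map.Entry.comparingByKey());
        List<String> sorted = new ArrayList<>();
        for (Map.Entry<Date, String> e : keyed) {
            sorted.add(e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> dates = List.of("2005-10-16", "2004-01-02", "2005-03-04", "2003-12-31");
        conversions = 0;
        System.out.println(sortCoupled(dates) + " conversions=" + conversions);
        conversions = 0;
        System.out.println(sortDecoupled(dates) + " conversions=" + conversions);
    }
}
```

For n elements the decoupled version always performs exactly n conversions, while the coupled version performs two per comparison, so its count grows with the number of comparisons the sort makes. The sort algorithm is identical in both cases; only its coupling to the comparator differs.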
What is costing so much?
- Most activity is transformation of data
  - To meet the requirements of framework APIs or external standards
- Each transformation often contains many smaller transformations
- Much effort is also spent facilitating these transformations
  - e.g., initializing converters or looking up schemas
- Usually there is little or no change to the information content
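To make the facilitation cost concrete, here is a minimal sketch, assuming a date field that is converted from text to a date object and back. `FacilitationCost`, `newFormatter`, and the `uuuu-MM-dd` pattern are hypothetical; the counter only tracks how many converters are built, not time spent.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of "facilitation" cost: building a converter
// for every conversion versus building it once and reusing it.
class FacilitationCost {
    // Number of formatter (converter) constructions so far.
    static int formatterInits = 0;

    static DateTimeFormatter newFormatter() {
        formatterInits++;
        return DateTimeFormatter.ofPattern("uuuu-MM-dd");
    }

    // Per-call setup: each direction of the round trip builds its own converter.
    static String roundTripPerCall(String text) {
        LocalDate date = LocalDate.parse(text, newFormatter()); // String -> LocalDate
        return date.format(newFormatter());                      // LocalDate -> String
    }

    // Shared setup: the caller supplies one converter for all records.
    static String roundTripShared(String text, DateTimeFormatter formatter) {
        return LocalDate.parse(text, formatter).format(formatter);
    }

    public static void main(String[] args) {
        int records = 1000;

        formatterInits = 0;
        for (int i = 0; i < records; i++) roundTripPerCall("2005-10-16");
        System.out.println("per-call inits: " + formatterInits);

        formatterInits = 0;
        DateTimeFormatter shared = newFormatter();
        for (int i = 0; i < records; i++) roundTripShared("2005-10-16", shared);
        System.out.println("shared inits: " + formatterInits);

        // Either way, the round trip leaves the information content unchanged.
        System.out.println(roundTripShared("2005-10-16", shared));
    }
}
```

Running `main` reports 2000 formatter constructions for the per-call version and 1 for the shared one, while the round trip returns the original text in both cases: work is spent, but the information content does not change. Sharing is safe here because `DateTimeFormatter` is immutable and thread-safe; `SimpleDateFormat` is not thread-safe, so code that shares it needs confinement or synchronization.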
From a customer application: Diary of a timecard
- One timecard record has 11 fields
- Each step can be very expensive, and usually contains many smaller transformations
How can we understand the sources of inefficiency and runtime complexity?
- We would like to view a run in terms that make these transformations visible
  - Existing performance tools are focused on control flow, and report in terms of methods, paths, and packages.
  - Most of the work in these applications is massaging data. This work doesn't line up with methods, paths, or packages.
- We would like to understand the general causes of cost and complexity in these applications
  - So we can compare diverse implementations
  - So we can surface more general characteristics: API design practices, implementation practices, opportunities for automated optimization, etc.
  - Existing performance tools only help find specific bottlenecks
Approach
- Structure a run into a hierarchy of diaries
  - Organized according to the transformation of logical content
  - e.g., flow of an Employee record from SOAP to Java to HTML
- Metrics for cost and complexity
- Manual approach right now
  - Lots of opportunities for automation
- Allows insights into single implementations, and comparisons across diverse implementations
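The approach is manual today, but the structure itself can be sketched. The following is a minimal sketch, assuming a diary is a tree of steps, each recording the logical content it carries and the representations it moves between. The `Step` type, its fields, and the `hireDate` sub-step are hypothetical, invented to follow the slide's Employee SOAP-to-Java-to-HTML example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a diary hierarchy: each step records how some
// logical content moves from one representation to another, and may be
// refined by a sub-diary of smaller steps.
class DiarySketch {
    static final class Step {
        final String content; // logical content, e.g. "Employee record"
        final String from;    // source representation
        final String to;      // target representation
        final List<Step> subDiary = new ArrayList<>();

        Step(String content, String from, String to) {
            this.content = content;
            this.from = from;
            this.to = to;
        }

        // Appends a finer-grained step and returns this step for chaining.
        Step add(Step child) {
            subDiary.add(child);
            return this;
        }

        // Number of steps in this diary, including this one.
        int size() {
            int n = 1;
            for (Step c : subDiary) n += c.size();
            return n;
        }
    }

    // Invented example following the slide's Employee SOAP -> Java -> HTML flow.
    static Step exampleDiary() {
        return new Step("Employee record", "SOAP message", "HTML page")
            .add(new Step("Employee record", "SOAP message", "Java object")
                .add(new Step("hireDate field", "XML text", "java.util.Date")))
            .add(new Step("Employee record", "Java object", "HTML page"));
    }

    // Renders the hierarchy with two spaces of indentation per level.
    static String render(Step root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(Step s, int level, StringBuilder out) {
        out.append("  ".repeat(level))
           .append(s.content).append(": ")
           .append(s.from).append(" -> ").append(s.to).append('\n');
        for (Step c : s.subDiary) render(c, level + 1, out);
    }

    public static void main(String[] args) {
        System.out.print(render(exampleDiary()));
    }
}
```

`render(exampleDiary())` prints each step on its own line, indented two spaces per level, so the `hireDate` field conversion appears nested under the SOAP-to-Java step.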
Example
- Conversion of a stock purchase date field from SOAP to a Java business object field
Source: SOAP client, Trade benchmark v.3.1
From Trade: Diary of a Date (SOAP parsing level)
- Detail of just the first step of the previous slide
From Trade: Diary of a Date (Java SimpleDateFormat parsing)
- Detail of the SimpleDateFormat parse step from the previous slide
From Trade: Diary of a year/month/day
- Detail of the extract-and-parse-subfield step from the previous slide
- Six transformations to parse a year!
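The slides' diagrams of the Trade date are not reproduced here. As a rough, hypothetical stand-in, the sketch below logs the outer chain of representations a SOAP-style date string passes through when parsed with the standard `SimpleDateFormat` and `GregorianCalendar` classes, marking converter setup separately from actual transformations. The input format, class name, and diary labels are assumptions; the sketch does not reconstruct the six finer-grained transformations counted on the slide.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: the outer chain of representations a SOAP-style date
// passes through on its way to year/month/day integers, logged as a flat diary.
// It does not reproduce the finer-grained steps inside the JDK classes.
class DateDiary {
    static int[] parseYearMonthDay(String soapDate, List<String> diary) {
        TimeZone utc = TimeZone.getTimeZone("UTC");

        // Facilitation: build and configure the converter.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        format.setTimeZone(utc);
        diary.add("facilitate: create SimpleDateFormat");

        Date date;
        try {
            date = format.parse(soapDate);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad SOAP date: " + soapDate, e);
        }
        diary.add("transform: String -> java.util.Date");

        // Facilitation: obtain a calendar to break the instant into fields.
        Calendar calendar = new GregorianCalendar(utc, Locale.ROOT);
        diary.add("facilitate: create GregorianCalendar");

        calendar.setTime(date);
        diary.add("transform: java.util.Date -> Calendar");

        int[] ymd = {
            calendar.get(Calendar.YEAR),
            calendar.get(Calendar.MONTH) + 1, // Calendar months are 0-based
            calendar.get(Calendar.DAY_OF_MONTH)
        };
        diary.add("transform: Calendar -> int year/month/day");
        return ymd;
    }

    public static void main(String[] args) {
        List<String> diary = new ArrayList<>();
        int[] ymd = parseYearMonthDay("2005-10-16T14:30:00Z", diary);
        diary.forEach(System.out::println);
        System.out.println(ymd[0] + "/" + ymd[1] + "/" + ymd[2]);
    }
}
```

For `"2005-10-16T14:30:00Z"` the method returns year 2005, month 10, day 16 and logs five entries, three of them transformations. Fixing `Locale.ROOT` and UTC keeps the result independent of the default locale and time zone.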
Metrics of cost and complexity
- Cost: aggregate costs by transformation
  - Aids understanding by measuring something accomplished
  - e.g., 268 calls and 70 objects to parse a field
- Complexity: count transformations
  - Shows the complexity hidden in each step
  - Histogram by level shows how far afield the processing goes
  - e.g., 36 transformations parsing subfields
- These metrics enable comparisons across diverse implementations
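The cost and complexity metrics can be sketched over a diary tree. In this minimal sketch, assuming each step is annotated with the calls and objects it performs directly, cost is summed over a step's subtree, complexity is the number of transformations nested beneath it, and the histogram counts those transformations per level. The `Step` type and every number in `exampleField` are invented for illustration; they are not the measurements reported above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two kinds of metric on a diary tree:
// cost aggregated by transformation, and complexity as a count of
// nested transformations (with a histogram by level).
class DiaryMetrics {
    static final class Step {
        final String name;
        final int calls;   // calls made directly by this step (excluding sub-steps)
        final int objects; // objects allocated directly by this step
        final List<Step> subSteps = new ArrayList<>();

        Step(String name, int calls, int objects, Step... children) {
            this.name = name;
            this.calls = calls;
            this.objects = objects;
            this.subSteps.addAll(Arrays.asList(children));
        }
    }

    // Cost: calls attributed to a step plus all of its sub-steps.
    static int totalCalls(Step s) {
        int total = s.calls;
        for (Step c : s.subSteps) total += totalCalls(c);
        return total;
    }

    // Cost: objects attributed to a step plus all of its sub-steps.
    static int totalObjects(Step s) {
        int total = s.objects;
        for (Step c : s.subSteps) total += totalObjects(c);
        return total;
    }

    // Complexity: number of transformations nested anywhere beneath a step.
    static int nestedTransformations(Step s) {
        int n = 0;
        for (Step c : s.subSteps) n += 1 + nestedTransformations(c);
        return n;
    }

    // Histogram: nested transformations per level (level 1 = direct sub-steps).
    static Map<Integer, Integer> histogramByLevel(Step s) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        addLevels(s, 1, histogram);
        return histogram;
    }

    private static void addLevels(Step s, int level, Map<Integer, Integer> histogram) {
        for (Step c : s.subSteps) {
            histogram.merge(level, 1, Integer::sum);
            addLevels(c, level + 1, histogram);
        }
    }

    // Invented numbers, for illustration only.
    static Step exampleField() {
        return new Step("parse field", 5, 2,
            new Step("extract subfields", 3, 1,
                new Step("parse year", 2, 1),
                new Step("parse month", 2, 1),
                new Step("parse day", 2, 1)),
            new Step("build Date", 4, 2));
    }

    public static void main(String[] args) {
        Step field = exampleField();
        System.out.println("calls=" + totalCalls(field) + " objects=" + totalObjects(field));
        System.out.println("transformations=" + nestedTransformations(field));
        System.out.println("by level=" + histogramByLevel(field));
    }
}
```

For the invented tree, `main` reports 18 calls, 8 objects, 5 nested transformations, and a level histogram of `{1=2, 2=3}`.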
Ongoing research
- Validation by hand on applications (large and small examples)
- Automation of structuring into diaries
  - Combination of static and dynamic analysis
  - Automation will also enable further validation of the approach
- Classification of transformations
  - Developing a framework-independent vocabulary for what transformations accomplish
    - e.g., various kinds of change in physical representation
    - e.g., various kinds of change in logical content
  - Developing metrics based on classification
    - Enables descriptive characterization of a run
    - Also gives us a more formal definition of transformation
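Classification is still being developed, so the following is only a hypothetical sketch of how a framework-independent vocabulary could feed metrics. The two `Kind` values echo the slide's examples (change in physical representation, change in logical content); the enum, the `Transformation` record, and the example run are our assumptions.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of classifying transformations and summarizing a run
// by class. The two kinds echo the slide's examples; the names are ours.
class TransformationClasses {
    enum Kind {
        PHYSICAL_REPRESENTATION, // same information, different encoding
        LOGICAL_CONTENT          // the information itself is selected, combined, or derived
    }

    record Transformation(String description, Kind kind) {}

    // Number of transformations of each kind in a run (every kind present, possibly 0).
    static Map<Kind, Integer> countByKind(List<Transformation> run) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) counts.put(k, 0);
        for (Transformation t : run) counts.merge(t.kind(), 1, Integer::sum);
        return counts;
    }

    // Share of a run's transformations that change only the physical representation.
    static double physicalShare(List<Transformation> run) {
        if (run.isEmpty()) return 0.0;
        return (double) countByKind(run).get(Kind.PHYSICAL_REPRESENTATION) / run.size();
    }

    public static void main(String[] args) {
        List<Transformation> run = List.of(
            new Transformation("SOAP text -> java.util.Date", Kind.PHYSICAL_REPRESENTATION),
            new Transformation("java.util.Date -> year field", Kind.LOGICAL_CONTENT),
            new Transformation("java.util.Date -> HTML text", Kind.PHYSICAL_REPRESENTATION));
        System.out.println(countByKind(run));
        System.out.println("physical share: " + physicalShare(run));
    }
}
```

Counting by class gives a short descriptive profile of a run; for example, a high share of physical-representation changes would indicate a run that mostly re-encodes information without changing it.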