Title: The Mathematics of Performance Management and Capacity Planning - Overview: Descriptive and Predictive Analytics in the Age of Virtual Systems

1. The Mathematics of Performance Management and Capacity Planning - Overview: Descriptive and Predictive Analytics in the Age of Virtual Systems
Tim Browning. Presented at the Greater Atlanta Computer Measurement Group Fall Conference, October 22, 2008.
2. On Mathematics and Statistics
- "There are two kinds of statistics, the kind you look up and the kind you make up." (Rex Stout, Death of a Doxy)
- "How many times can you subtract 7 from 83, and what is left afterwards? You can subtract it as many times as you want, and it leaves 76 every time." (Author Unknown)
- "In ancient times, they had no statistics, so they had to fall back on lies." (Stephen B. Leacock)
3. Goals: Performance Engineering and Capacity Management
- Goals of Performance Engineering:
- Monitor, manage, and predict system performance
- Reflect and understand the customer experience
- Provide the foundation of evidence-based capacity management
- Goals of Capacity Management:
- Assure that computing supply is available to meet business demand
- Determine the best use of existing resources (optimization)
4. Probability, Probity, and Authority
- Before the seventeenth century, legal evidence in Europe was considered of greater weight if the person testifying had probity. Empirical evidence was barely a concept. Probity was a measure of authority, so evidence came from authority; a noble person had probity. Yet today, probability is the very measure of the weight of empirical evidence in science, arrived at through inductive or statistical inference.
- The term "probable" (Latin probabilis) meant approvable, and was applied in that sense to opinion and to action. A probable action or opinion was one that sensible people would undertake or hold in the circumstances.
- Even so, the jury of executive opinion in the business-government enterprise is most often swayed by the consensus of expert opinion, usually at considerable cost.
5.
- Probability and statistics are not the same. They are related, but circuitously.
- Probability can be viewed either as the long-run frequency of occurrence or as a measure of the plausibility of an event given incomplete knowledge, but not both.
- Statistics are functions of the observations (data) that often have useful and even surprising properties.
- So we see the relationship(s) between probability and statistics: from the observations we compute statistics that we use to estimate population parameters, which index the probability density, from which we can compute the probability of a future observation from that density.
- In general, probability asks what is likely to happen; statistics describes what has already happened (and forms the basis for what is likely).
- In statistics, you don't know how a process works but are able to observe the outcomes; in probability, you already know how a process works but want to predict what will happen. The combination is the foundation of statistical inference.
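As a minimal sketch of that loop from data to statistics to probability, the Python example below (standard library only; the sample response times and the 1.5-second threshold are invented for illustration) estimates a normal density's parameters from observations and then uses the fitted density to ask about a future observation:

# Estimate population parameters from a sample, then use the fitted
# density to make a probabilistic statement about a future observation.
# The sample values are hypothetical response times in seconds.
from statistics import NormalDist, mean, stdev

observed = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.95, 1.05]

mu = mean(observed)       # statistic estimating the population mean
sigma = stdev(observed)   # statistic estimating the population std deviation

fitted = NormalDist(mu, sigma)  # the density indexed by those parameters

# Probability that the next observation exceeds 1.5 seconds:
p_slow = 1 - fitted.cdf(1.5)
print(f"mean={mu:.3f}, stdev={sigma:.3f}, P(next > 1.5s)={p_slow:.3f}")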
6.
- Descriptive statistics are used to describe the basic features of the data gathered from an experimental study in various ways. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.
- There are two objectives for formulating a summary statistic:
- To choose a statistic that shows how different units seem similar. Statistical textbooks call one solution to this objective a measure of central tendency.
- To choose another statistic that shows how they differ. This kind of statistic is often called a measure of statistical variability.
7.
- Central Tendency
- Central: middle value, center.
- Tendency: expected value, most frequent, representative.
- Arithmetic Mean
- The arithmetic mean is the most common measure of central tendency.
- It is simply the sum of the numbers divided by the number of numbers.
- The symbol μ is used for the mean of a population; the symbol M is used for the mean of a sample. The formula for μ is

  μ = ΣX / N

  where ΣX is the sum of all the numbers in the sample and N is the number of numbers in the sample.
- As an example, the mean of the numbers 1, 2, 3, 6, 8 is (1 + 2 + 3 + 6 + 8) / 5 = 20 / 5 = 4, regardless of whether the numbers constitute the entire population or just a sample from the population.
8.
- Other, less common measures of central tendency:
- Median: the middle value, the point where half the values lie on each side of the number, i.e. half are larger and half are smaller; the middle of the distribution of values.
- The number separating the higher half of a sample, a population, or a probability distribution from the lower half.
- If you divide a distribution into fourths (quartiles), then the median is the 2nd quartile.
- Useful in performance management in the presence of outliers, where we are more concerned about frequency of occurrence relative to a central value than a theoretical average that may not even occur in the data. For example, response time.
- Percentiles group data by putting equal numbers of data points into each group. The nth percentile is the point below which n% of the data are found.
- Useful in performance analysis, as it provides a very good view of the user's experience.
- Useful in capacity planning for sizing a system based on accommodation of its historical high points. For example, the 90th percentile of CPU busy.
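A short Python sketch of these measures (using numpy; the CPU-busy samples are hypothetical) shows how the median and the 90th percentile summarize the same data differently from the mean:

import numpy as np

# Hypothetical hourly CPU-busy percentages, with one outlier hour.
cpu_busy = np.array([35, 40, 42, 38, 45, 41, 39, 44, 43, 98])

print("mean:           ", np.mean(cpu_busy))             # pulled upward by the outlier
print("median:         ", np.median(cpu_busy))           # robust central value
print("90th percentile:", np.percentile(cpu_busy, 90))   # historical high point for sizing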
9.
- When to use the arithmetic mean:
- When your data contain no outliers (extreme values that are not typical or normative).
- When the variability between values is low, for example in utilization metrics, where the variability is less than 20%.
- What can you do about outliers (dirty data)?
- Eliminate them (i.e., when they are few and unlikely to recur).
- Use a weighted mean that discounts the outliers. The weighted mean is similar to an arithmetic mean (the most common type of average), but instead of each of the data points contributing equally to the final average, some data points contribute more than others.
- Use the geometric mean, which has remarkable insensitivity to outliers.
10. The Dirty Data Experiment with the Geometric Mean

11. The Dirty Data Experiment with the Weighted Mean

With 19 data points, each starts with equal weight 1/19. The outlier's weight is reduced to (1/19) - (1/19)(0.2), and the amount removed, (1/19)(0.2), is redistributed so that each of the other 18 points receives (1/19) + ((1/19)(0.2))/18; the weights still sum to 1. A convex combination is a linear combination of points (which can be vectors, scalars, etc.) where all coefficients are non-negative and sum up to 1.
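A small Python sketch of this convex-combination discounting (the 19 data values and the placement of the outlier are hypothetical; the weights mirror the formulas above):

import numpy as np

# 18 typical response times plus one outlier (hypothetical values).
data = np.array([1.0] * 9 + [1.2] * 9 + [25.0])   # 19 points; last is the outlier
n = len(data)                                      # 19

weights = np.full(n, 1.0 / n)            # equal weights: 1/19 each
discount = 0.2 * weights[-1]             # (1/19)(0.2) taken from the outlier
weights[-1] -= discount                  # outlier: (1/19) - (1/19)(0.2)
weights[:-1] += discount / (n - 1)       # others: (1/19) + ((1/19)(0.2))/18

assert abs(weights.sum() - 1.0) < 1e-12  # convex combination: weights sum to 1
print("arithmetic mean:", data.mean())
print("weighted mean:  ", np.dot(weights, data))  # outlier's pull is reduced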
12. There are liars, outliers, and out-and-out liars.
- What are outliers?
- Extreme values not typical of the group.
- Rare events that do not fit within the range of other data values.
- Non-normative data: anomalous, exceptional, etc.
- How are they detected?
- Visually, using statistical graphics
- Statistical filtering
- Interquartile fencing: flag values less than the lower quartile minus 1.5 times the interquartile range, or greater than the upper quartile plus 1.5 times the interquartile range
- More advanced methods: Grubbs' test, etc.
- There is no such thing as a simple test!
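A minimal sketch of interquartile fencing in Python (numpy; the 1.5 multiplier is the conventional Tukey fence, and the data values are hypothetical):

import numpy as np

data = np.array([12, 14, 13, 15, 16, 14, 13, 95, 15, 14])  # one suspicious value

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as a potential outlier.
outliers = data[(data < lower_fence) | (data > upper_fence)]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)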
13.
- The Geometric Mean
- Instead of adding the set of numbers and then dividing the sum by the count of numbers in the set, n, the numbers are multiplied and then the nth root of the resulting product is taken.
- For instance, the geometric mean of two numbers, say 2 and 8, is just the square root (i.e., the second root) of their product, 16, which is 4. As another example, the geometric mean of 1, ½, and ¼ is the cube root (i.e., the third root) of their product (0.125), which is ½.

In SQL-ese: SELECT EXP(AVG(LN(Response_Time))) AS GEOMEAN FROM ...
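The same log-transform identity, sketched in Python (exp of the mean of the logs equals the nth root of the product; the values are from the slide's own example):

import math

response_times = [2.0, 8.0]  # geometric mean should be 4.0

# exp(mean(ln(x))) is equivalent to the nth root of the product of the x's.
geomean = math.exp(sum(math.log(x) for x in response_times) / len(response_times))
print(geomean)  # 4.0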
14.
- The "geometry" part of the Geometric Mean
- Consider a line where the beginning is at point A and the end is at point C: where is the middle (point B)?

(Figure: a line segment with endpoints A and C and an unknown middle point B.)
15.
- Measures of variability
- Variance: the amount of spread in the data around the mean.
- Standard deviation: the square root of the variance.
- In a normal distribution, approximately 2/3 of the data are within one standard deviation of the mean on either side.
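For reference, the usual sample formulas, written out in LaTeX (x̄ is the sample mean and N the sample size):

s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}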
In performance work, large response-time standard deviations are usually bad: you want response time to be low and repeatable. Wide variations upset people more than long but consistent times.
16.
- The Geometric Standard Deviation
- The antilog of the standard deviation of the natural-log-transformed values of x, or GSD = exp(stddev(ln x)).

In SQL-ese: SELECT EXP(STDDEV(LN(Response_Time))) AS GEOSTDEV FROM the_data WHERE Response_Time > 0
17.
- Correlation and Regression
- Correlation: how things vary together (or not); the strength and direction of a linear relationship between two random variables, or the departure of two variables from independence.
- There are several correlation coefficients, Pearson's being the most common in performance analysis (though mis-named).
- Probably the most misused statistical tool.
- Obtained by dividing the covariance of the two variables by the product of their standard deviations.
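That last definition, written out in LaTeX:

r_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \,\sigma_Y}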
18.
- Linear regression and its cousins (non-linear, multiple, logistic, etc.) are all methods for fitting curves or lines to data in a statistically optimal manner. "The best way of drawing a line since the invention of the straight edge." (Pat Artis)
- Often used by managers to observe trends and predict the future (or explain the past). Often misused for the same purpose.
- In statistics, linear regression is a form of regression analysis in which the relationship between one or more independent variables and another variable, called the dependent variable, is modeled by a least-squares function, called the linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients. A linear regression equation with one independent variable represents a straight line. The results are subject to statistical analysis.
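A minimal single-variable least-squares fit in Python (numpy; the workload and response-time pairs are hypothetical):

import numpy as np

# Hypothetical monthly workload (x) and response-time (y) observations.
workload = np.array([100, 150, 200, 250, 300, 350])
response = np.array([1.1, 1.4, 1.8, 2.1, 2.6, 2.9])

# Fit y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(workload, response, 1)
print(f"response ~= {slope:.4f} * workload + {intercept:.3f}")

# Extrapolate (the classic use, and the classic misuse) to a future workload.
print("predicted at workload 500:", slope * 500 + intercept)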
19.
- Linear regression in Excel
- Using graphical techniques
20. Examples of Capacity/Performance Reporting in Use Now

Traditional time-series line charts
21. Advanced Statistical Graphics

(Panels: 3-D performance surface; multi-temporal density plot; expected high/low/actual.)
22. SAP CCMS Metrics via SAS/Graph
23. Application Response Time Modeling

(Figure: application response time plotted against increasing application workload. At low load, large changes in workload have small impact on response time; near saturation, small changes have large impact, until the system becomes unresponsive.)
24. How Does Modeling Differ from Trending in Prediction?

(Figure: application modeling vs. linear regression via trending. Application response time is plotted against the system load measurement over Jan-Dec as application workload grows, with an SLA threshold line. The straight trend line and the modeled, non-linear response curve cross the SLA threshold at different points, so the date predicted via trending differs from the date predicted via modeling.)
25. Modern Dynamic Systems are Challenging to Understand
27.
- Response to a Capacity/Performance Crisis
- 1. System/application tuning, re-engineering, and optimization
- Benefit: Considerable merit is obtained, sometimes in the hundreds of percent improvement. Achieved via system administrative action (usually parametric changes to the OS) and by algorithmic and parametric re-specification (for the application). No capital expense. Efficient use of resources.
- Detriments: The effects may not be enduring for dynamic systems, as version/release changes and application functionality changes can, and do, degrade performance tuning effects quickly. Often a system reinitialization (reboot, IPL) is required, which creates an availability/service-delivery issue. Application re-engineering for performance may be, and often is, cost prohibitive and/or unsupported by executive management.
- 2. Capacity increase via upgrade/replacement or technology refresh
- Benefit: Reduces the risk of unsupported/unrecoverable infrastructure conditions. The effect is usually long term. Accommodates increased application functionality for business utility.
- Detriments: Capital expense may be incurred. Inefficiencies remain. Risk management to avoid undersizing or oversizing requires expensive predictive modeling tools. Predictive analytics requires advanced skills in technical staffing. There are risks associated with new technologies, which may increase complexity (e.g. virtualization). Costs may be unsupported by executive management.
28. Modeling? Why?

Reactive problem solving vs. modeling: damage grows rapidly with time. The longer an error goes undiscovered, the more useless and damaging work based on the error will be done. When the error is discovered, it and all the associated damage have to be removed, and the system will then need therapy to recover. The death rate increases dramatically with late discovery; alternatively, the survival rate increases dramatically with early discovery. "Crude measures of the right things are better than precise measures of the wrong things." (from Jim Clemmer's article, "Strategic Measurements Guide Change and Improvement")
29. Summary of Performance Analysis Techniques
30.
- Predictive Analytics Benefits
- Predictive analytics provide a practical way to detect problems, allow early correction, and avoid resource-saturation conditions.
- Simulation provides a practical way to detect such problems and allow early correction. Avoiding the use of simulation substantially increases the risk of failure.
- Analytical modeling provides fast and accurate answers based on existing performance data. It allows a variety of what-if scenarios to be easily crafted to determine the best course of action when systems are experiencing change.
- Statistical forecasting and analysis provides descriptive and predictive views of the IT performance data topology through the use of measures of central tendency, variability, correlation, linear regression, and statistical pattern recognition.
31. SAP-Specific Capacity Planning Methodology for CCE
- We want to acquire capacity to provide required service levels for sustained busy periods. Typical examples:
- Month-end closing
- Busy daily window (e.g., 0900 to 1100)
- Mondays
- Completing the batch window on time to deliver operational reports or schedule deliveries/shipments/print picking papers/etc.
- The best approach is to choose the percentile you want to satisfy:
- The 90th percentile of hourly MIPS across the month is reflective of busy daily periods.
- Likewise, the 95th percentile reflects the sustained busy period where there is a pronounced financial-systems month-end-closing effect.
- In legacy OLTP we often see peak-to-average ratios between 1.5:1 and 2:1, depending on the definition of peak (e.g., 90th vs. 95th percentile).
- This really is a view of sustained busy.
- No one can afford to buy for absolute peaks (the 99th or 100th percentile).
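A sketch of this percentile-based sizing in Python (numpy; the hourly MIPS samples and the month-end spike are hypothetical):

import numpy as np

rng = np.random.default_rng(42)
# Hypothetical hourly MIPS consumption for a 720-hour month,
# with a pronounced month-end closing spike in the last three days.
hourly_mips = rng.normal(500, 80, 720)
hourly_mips[-72:] += 300

avg = hourly_mips.mean()
p90 = np.percentile(hourly_mips, 90)   # busy daily periods
p95 = np.percentile(hourly_mips, 95)   # sustained busy incl. month-end effect

print(f"average: {avg:.0f}  p90: {p90:.0f}  p95: {p95:.0f}")
print(f"peak-to-average (p95): {p95 / avg:.2f}:1")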
32. Capacity Planning for the Newly Virtual
- Three essential elements:
- Measurement, to ascertain critical data like IT resource availability, utilization, and usage patterns
- Second-level analysis, to focus on the long-term needs of the enterprise rather than the immediate concern to bump up resources
- Business realignment, to ensure that IT is keeping pace with business needs, not the other way around
33. Capacity Planning for the Newly Virtual
- Over half (54%) of virtual-server adopters have experienced a net growth in capacity, while only 7% reported a net decrease (ESG Research).
- Focus on understanding our virtualization factors:
- The effect of non-concurrent peaks of multiple workloads
- "Follow the sun" in a global operation
- A better understanding of these effects can be gained by looking at the 90th/95th percentiles.
- Landscape dimensions:
- a workload level,
- a platform (processor complex) level,
- a Sysplex/cluster level,
- a server/LPAR level, etc.
- The virtualization analysis will tell us how much we can over-commit resources:
- Compare the 95th percentile of the sums vs. the sum of the 95th percentiles.
- It is often the case that we have the ability to load to 115% as measured by the sum of the 95th percentiles.
34. Organizational Support
- Institutionalize the process.
- The resource reporting and modeling is actually the easy part of this.
- The more difficult, and more important, part of institutionalizing the process is connecting the application blueprinting/design process to the capacity planning process.
- This creates the understanding of the business drivers, which is key to scaling factors and calibration.
- This is also a potential trigger for alerting the organization to the need for a risk-mitigation plan; for example, step-function workload increases with new workloads, which should lead to a performance-testing activity.
35. Organizational Support for Capacity Planning
- Market the lesser-known benefits of capacity planning:
- Strengthened relationships with developers and end users. Communication, negotiation, and a sense of joint ownership can all combine to nurture a healthy, professional relationship between IT and its customers.
- Improved communications with suppliers. Involving key suppliers and support staffs in your capacity plans can promote effective communications among these groups.
- Increased collaboration with other infrastructure groups. Network services, technical support, database administration, operations, desktop support, and even facilities may all play a role in capacity planning. For the plan to be thorough and effective, all of these groups must support and collaborate with each other.
- Promotion of a culture of strategic planning as opposed to tactical firefighting. One of the most significant benefits of developing an overall and ongoing capacity-planning program is the institutionalization of a strategic-planning culture.
36. Author/Contact