EIN 4905/ESI 6912 Decision Support Systems Excel presentation

About This Presentation

Transcript and Presenter's Notes

Title: EIN 4905/ESI 6912 Decision Support Systems Excel

1
Spreadsheet-Based Decision Support Systems
Chapter 7 Statistical Analysis with Excel
Prof. Name
name_at_email.com Position
(123) 456-7890 University Name
2
Overview

7.1 Introduction
7.2 Understanding Data
7.3 Relationships in Data
7.4 Distributions
7.5 Summary

3
Introduction

Performing basic statistical analysis of data
using Excel functions
Statistical features of the Data Analysis ToolPak
Trend curves for analyzing data patterns
Basic linear regression techniques in Excel
Several different distribution functions in Excel

4
Understanding Data

Statistical Functions
Descriptive Statistics
Histograms

5
Statistical Functions

AVERAGE
Finds the mean of a set of data.
AVERAGE(range or range_name)
MEDIAN
Finds the middle number in a list of sorted data.
MEDIAN(range or range_name)
STDEV
Finds the standard deviation of a set of data.
This is equal to the square root of the variance,
which measures the difference between the mean of
the data set and the individual values.
STDEV.P(range or range_name)
STDEV.S(range or range_name)

6
Figures 7.1 and 7.2
7
Figures 7.3 and 7.4
8
Data Analysis ToolPak

An Excel Add-In which includes several
statistical analysis techniques
To ensure that it is an active Add-in
Display Excel Options dialog box
Select Options from the list of options in the
File tab.
Select the Add-Ins tab on the left side of the
dialog box.
Select Analysis ToolPak listed on the Add-ins
window.

9
Descriptive Statistics

Provides a list of statistical information about
your data set including
Mean
Median
Standard deviation
Variance
Click on Data gt Analysis gt Data Analysis command
to display the Data Analysis dialog box.
Choose the Descriptive Statistics option and
click OK.

10
Descriptive Statistics (contd)

The Input Range refers to the location of the
data set.
Check the option button Columns or Rows to
indicate how your data is grouped.
If there are labels in the first row of each
column of data, then check the Labels in First
Row box.
The Output Range refers to where the results of
the analysis will be displayed in the current
worksheet.
Check the Summary Statistics box to calculate the
most commonly used statistics.

11
Figure 7.7

Quarterly stock returns for three different
companies are recorded. We want to know
Average stock return
Variability of stock returns
Which quarters had the highest and lowest stock
returns

12
Figures 7.8 and 7.9
13
Figure 7.10

Almost all of the data points lie between 2s and
2s from the mean.
Outliers are data that are inconsistent with the
main pattern of data.

14
Figure 7.11

The standard deviation is used to identify
outliers in a data set.

15
Figure 7.12

Conditional Formatting with the Formula Is option
is used to identify outliers.
Select the column of values in the data set and
fill in the Conditional Formatting dialog box to
highlight outlier points.

16
Figure 7.13

The cell that holds an outlier is highlighted.

17
More Descriptive Statistics

Confidence Level for Mean
The mean is calculated using the specified
confidence level (for example, 95 or 99), the
standard deviation, and the size of the sample
data.
The confidence level and calculated mean are then
added to the analysis report.
You can compare the actual mean to this
calculated mean based on the specified confidence
level.
Kth Largest
Gives the largest ranked data value for a
specified value of k.
For k 1, the maximum data value would be
returned.
Kth Smallest
Gives the smallest ranked data value for a
specified value of k.
For k 1, the minimum data value would be
returned.

18
Descriptive Statistics Functions

PERCENTILE.INC
Returns a value for which a desired percentile k
of the specified data_set falls below.
PERCENTILE.INC(data_set, k)
For example, for the MSFT data, the value for
which 95 of the data falls below is
PERCENTILE.INC(B4B27,0.95) 0.108
PERCENTILE.EXC
Excludes the value of k-th percentile from the
calculations
PERCENTILE.EXC (data_set, k)
For the MSFT data, the value for which 95 of the
data falls below is
PERCENTILE.EXC(B4B27,0.95) 0.135

19
Descriptive Statistics Functions (contd)

PERCENTRANK.INC
Returns the percentile of the data_set which
falls below a given value.
PERCENTRANK.INC(data_set, value)
For example, percent of the MSFT data falls below
the value 0.108, inclusive of 0.108 is
PERCENTRANK.INC(B4B27, 0.108) 0.95, or 95
PERCENTRANK.EXC
Calculates the same percentile, exclusive of the
value of k.
PERCENTRANK.EXC(data_set, value)
For example, percent of the MSFT data falls below
the value 0.135, exclusive of 0.135 is
PERCENTRANK.EXC(B4B27, 0.135) 0.95, or 95

20
Histograms

Histograms calculate the number of occurrences,
or frequency, with which values in a data set
fall into various intervals.
Choose the Histogram option from the Analysis
ToolPak list.

21
Histograms (contd)

The Input Range is the range of the data set.
The Bin Range is used to specify the location of
the bin values.
Bins are the intervals into which values can
fall they can be defined by the user or can be
evenly distributed among the data by Excel.
The Output Range is the location of the output,
or the frequency calculations for each bin.
The chart options include a simple Chart Output
(the actual histogram), Cumulative Percentage for
each bin value, and a Pareto organization of the
chart.

22
Figures 7.15 and 7.16
23
Figures 7.17 and 7.18

To create your own bin values, make a list of
upper bounds for each interval.

24
Figure 7.19
25
Histograms (contd)

To change the format of a Histogram
Click on the histogram to activate the Chart
Tools contextual tabs.
Use the commands listed on these tabs to change
the design, layout and format of the histogram.

26
Histograms (contd)

There are four basic shapes to a histogram
Symmetric has peaks and dips with equal
amplitude
A curve with only one peak is also symmetric
that is, there is a central high part and almost
equal lower parts to the left and right of this
peak.
Positively skewed has a peak on the left and
many lower points (stretching) to the right.
Negatively skewed has a peak on the right and
many lower points (stretching) to the left.
Multiple peaks imply that more than one source,
or population, of data is being evaluated.

27
Relationships in Data

Trend Curves
Regression

28
Data Relationships

Relationships in data are usually identified by
comparing two variables the dependent variable
and the independent variable.
The dependent variable is the variable we are
most interested in. By understanding its current
behavior we can better predict its future
behavior.
The independent variable is the variable we use
as the comparison in order to make this
prediction.

29
Trend Curves

Trend curves are used to graph and analyze these
relationships between data.
Trend curves graph the data with
The independent variable on the x-axis
The dependent variable on the y-axis
To add a trend curve to your chart
Click on the data points in an XY Scatter chart
to activate Chart Tools contextual tabs.
Click on the Chart Tools Layout gt Analysis gt
Trendline command.
Select a trend curves from the trendlines options
listed.

30
Trend Curves (contd)

There are six types of trend curves which Excel
can model
Exponential
Linear
Logarithmic
Polynomial
Power
Moving Average

31
Trend Curves (contd)

Double click on a trendline to activate the
Format Trendline dialog box.
We can modify
The type of the trendline by selecting one of the
options listed.
The trendlines name.
We can specify a period forward or backward for
which we want to predict the behavior of our
dependent variable.

32
Linear Trend Curves

Number of Units Produced each month and the
corresponding Monthly Plant Cost are recorded.
The company needs to estimate plant costs based
on the planned production amounts.
The dependent variable is therefore the Monthly
Plant Cost and the independent variable is the
Units Produced.

33
Figure 7.25

Begin this analysis by making an XY Scatter chart
of the data.

34
Figure 7.26

Right-click on any of the data points and choose
Add Trendline from the short-cut menu.
The Format Trendline dialog box appears.
Select Linear from the Types listed.
Select Display Equation on Chart checkbox.

35
Figure 7.27

The trendline and the equation are then added to
the chart.

36
Figure 7.28

Use the displayed equation to predict future
values.
Check the accuracy of the equation by calculating
the error from the known data.
Linear trends have the relationship y ax - b

37
Figure 7.29

Copy the formula for Predicted Cost to the rest
of the rows to calculate the predicted monthly
costs.

38
Exponential Trend Curves

Sales data for ten years is recorded.
We want to predict sales for the next few years.
The independent variable is Years and our
dependent variable is Sales.

39
Figure 7.31

Exponential trends have the following
relationship
y ae(bx) or
y aEXP(bx)
Build a XY Scatter chart of the data.
Right-click on a data point to add the trendline.
Choose the Exponential curve to fit the data.

40
Figures 7.32 and 7.33
41
Figure 7.34

We use the formula to predict sales values for
future years.
However, the Exponential trend curve has a
sharply increasing slope that may not be accurate
for many situations.

42
Power Trend Curves

We are given yearly Production values and yearly
Unit Cost for production.
We want to determine the relationship between
Unit Cost and Production in order to predict
future Unit Costs.

43
Figure 7.36

Power trends have the relationship y axb
Begin by creating the XY Scatter chart.
Right-click on a data point to add a trendline.
Choose a Power curve to fit the data.

44
Figures 7.37 and 7.38
45
Regression Analysis

We can use some regression analysis parameters to
ensure that the relationships we have chosen for
our data are good fits.
These parameters include
R-Squared value
Standard error
Slope
Intercept

46
R-Squared Value

The R-Squared value measures the amount of
influence the independent variable has on the
dependent variable.
The closer the R-Squared value is to 1, the
stronger the relationship is between the
independent and dependent variables.
If the R-Squared value is closer to 0, then there
may not be a relationship between these two
variables.

47
Figure 7.39

We fit a Linear trendline to the Monthly Plant
Cost per Units Produced chart.
The R-Squared value is 0.8137, which is fairly
close to 1, implying a good fit.

48
Figure 7.40

We fit an Exponential trendline to the Sales per
year chart.
The R-Squared value is 0.9828, which is fairly
close to 1, implying a sound fit.

49
Figure 7.41

We fit a Power trendline to the Unit Cost per
Cumulative Production chart.
The R-Squared value is 0.9062, which is fairly
close to 1, implying a good fit.

50
Figure 7.42

The RSQ Excel function can calculate the
R-squared value from a set of data.
RSQ(y_range, x_range)
Note that this function only works with Linear
trend curves.

51
Standard Error

The standard error measures the accuracy of any
predictions made.
It can be calculated in Excel using the STEYX
function
STEYX(y_range, x_range)
This function can also only be used for Linear
trend curves.

52
Slope and Intercept

Two Excel functions can be used with a linear
regression line of a collection of data.
SLOPE function
SLOPE(y_range, x_range)
INTERCEPT function
INTERCEPT(y_range, x_range)

53
Distributions

Many distributions have Excel functions
associated with them.
These functions are equivalent to using
distribution tables.
That is, given certain parameters of a set of
data for a particular distribution, you would
look at a distribution table to find the
corresponding area from the distribution curve.
Some common distributions are
Normal
Exponential
Uniform
Binomial
Poisson
Beta
Weibull

54
Normal Distribution

The parameters for this distribution are simply
the value we are interested in finding the
probability for, and the mean and standard
deviation of the set of data.
The function we use with the Normal distribution
is NORM.DIST
NORM.DIST(x, mean, std_dev, cumulative)

55
Normal Distribution (contd)

The cumulative parameter will be seen in many
Excel distribution functions.
This parameter can take the values True or False
to determine if you want the value returned from
the cumulative distribution function or the
probability density function, respectively.
The cumulative distribution function (cdf) will
find the probability that a value in the data set
is less than or equal to x.
The probability density function (pdf) will find
the probability that a value is exactly equal to
x.

56
Figure 7.45

Annual drug sales at a local drugstore are
Normally distributed with a mean of 40,000 and
standard deviation of 10,000.
The probability that the actual sales for the
year are 42,000 is 0.58, or 58.

57
Figure 7.46

What is the probability that annual sales will be
between 35,000 and 49,000?
To find this value, we will subtract the cdf
values for these two bounds.
NORM.DIST(49000, 40000, 10000, True)
NORM.DIST(35000, 40000, 10000, True)
This will return a 0.51 probability, or 51
chance.

58
Standard Normal Distribution

The Standard Normal distribution function is a
Normal distribution function with mean 0 and the
standard deviation 1.
The STANDARDIZE function will convert the x value
from a data set with mean not equal to 0 and a
standard deviation not equal to 1 into a value
which does assume a mean of 0 and a standard
deviation of 1.
STANDARDIZE(x, mean, std_dev)
The resulting standardized value is then used as
the main parameter in the NORM.S.DIST function
NORM.S.DIST(standardized_x, cumulative)

59
Figure 7.47

Consider the same example used previously to find
the probability that a drugstores annual sales
are 42,000.

60
Uniform Distribution

The Uniform distribution does not actually have a
corresponding Excel function.
A simple formula can also be used to model this
discrete distribution.
1 / (b a)
Given that a value x is Uniformly distributed
between a and b, we can use this formula to
determine the probability that x will take an
integer value in this interval.

61
Figure 7.48

Consider any values for a and b, then use the
formula to calculate the Uniform value.

62
Poisson Distribution

The Poisson distribution has only one parameter,
the distribution mean.
The function we use for this distribution is
POISSON.DIST
POISSON.DIST(x, mean, cumulative)
The value returned by the Poisson distribution is
the probability that the number events which
occur within a time interval is either between 0
and x (cdf), or equal to x (pdf).

63
Figure 7.49

For example, consider a bakery which serves an
average of 20 customers per hour.
Find the probability that at most 35 customers
will be served in the next two hours.

64
Exponential Distribution

The Exponential distribution has only one
parameter lambda 1 / mean of the data set.
The function we use for this distribution is
EXPON.DIST
EXPON.DIST(x, lambda, cumulative)
The Exponential distribution is commonly used for
modeling interarrival times.

65
Figure 7.50

Let us use the same example with the bakery data.
Arrival rate is said to be 20 customers per hour.
Interarrival mean, or the Exponential mean, is 1
/ arrival rate. Therefore, for this example, the
interarrival mean is 1/20 hours per customer
arrival.
To find the probability that a customer arrives
in 10 minutes, we would set
x 10/60 0.17 hours
lambda 1/(1/20) 20 hours
EXPON.DIST(0.17, 20, True)

66
Binomial Distribution

The Binomial distribution has the following
parameters the number of trials and the
probability of a success.
We are trying to determine the probability that
the number of successes is less than or equal to
(using cdf) or equal to (pdf) some x value.
The function we use for this distribution is
BINOM.DIST
BINOM.DIST(x, trials, prob_success, cumulative)

67
Figure 7.51

Suppose a survey shows that 40 percent of people
pay more attention to ads in the newspaper, and
60 percent pays more attention to ads on
television.
What is the probability that out of 100 people
surveyed, 50 of them respond more to ads on
television?

68
Beta Distribution

The Beta distribution has the following
parameters alpha, beta, A, and B.
Alpha and beta are determined from the data set
A and B are optional bounds on the x value for
which you want the Beta distribution value
The function we use for this distribution is
BETA.DIST
BETA.DIST(x, alpha, beta, A, B)
If A and B are omitted, then a standard
cumulative distribution is assumed and they are
given the values 0 and 1, respectively.

69
Figure 7.52

Determine the probability that a team can
complete a project in 10 days. Estimated total
time needed is1 to 2 weeks these estimates will
be the bound values, or the A and B parameters.
Use a mean and standard deviation of 12 and 3
days to compute the alpha and beta parameters.

70
Weibull Distribution

The Weibull distribution has two parameters
alpha and beta.
The function we use for this distribution is
WEIBULL.DIST
WEIBULL.DIST(x, alpha, beta, cumulative)
The Weibull distribution is most commonly used to
model reliability functions.

71
Figure 7.53

On average, a lightbulb will last 1,200 hours,
with a standard deviation of 100 hours. We can
use these values to calculate alpha and beta.
We can now use the WEIBULL distribution to
determine the probability that a lightbulb will
be reliable for 55 days 1,320 hours.

72
Inverse Functions

When we build simulation models, we need to
generate random numbers in Excel which are within
a given distribution.
To accomplish this we must use the inverse
function of the corresponding distribution
function.
An inverse function returns the inverse of the
cumulative probability function.
These functions are listed under the Formulas gt
Function Library gt More Functions drop-down menu
on the Ribbon.
Some of the inverse functions of more common
distributions are BETA.INV, BINOM.INV,
LOGNORM.INV, NORM.INV.

73
Figure 7.54

The format of the inverse functions is
DIST.INV(probability, distribution_parameters)
The probability parameter is a number between 0
and 1 associated with the given distribution.
We will use the RAND function as the value for
this parameter to generate a number between 0 and
1.

74
Summary

The Analysis ToolPak is an Excel Add-In that
includes statistical analysis techniques such as
Descriptive Statistics, Histograms, Exponential
Smoothing, Correlation, Covariance, Moving
Average, and others.
The Descriptive Statistics option provides a list
of statistical information about a data set,
including the mean, median, standard deviation,
and variance.
Histograms calculate the number of occurrences,
or frequency, which values in a data set fall
into various intervals.
Relationships in data are usually identified by
comparing the dependent variable and the
independent variable.
There are six basic trend curves that Excel can
model Exponential, Linear, Logarithmic,
Polynomial, Power, and Moving Average.
Some of the more common distributions that can be
recognized when performing a statistical analysis
of data are the Normal, Exponential, Uniform,
Binomial, Poisson, Beta, and Weibull
distributions.
Inverse distribution functions such as BETA.INV,
BIONOM.INV, LOGNORM.INV and NORM.INV are used in
simulation models to generate random numbers from
a specific distribution.

EIN 4905/ESI 6912 Decision Support Systems Excel PowerPoint PPT Presentation