Title: MBA 7020 Business Analysis Foundations Descriptive Statistics June 20, 2005
1 MBA 7020 Business Analysis Foundations
Descriptive Statistics, June 20, 2005
2 Agenda
Confidence Interval
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Variance and Standard Deviation
3. Measures of Association: Covariance and Correlation
3 Describing Data: Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Variance and Standard Deviation
3. Measures of Association: Covariance and Correlation
4 Mean
1. It is the Arithmetic Average of Data Values
2. The Most Common Measure of Central Tendency
3. Affected by Extreme Values (Outliers)
Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
[Figure: two number lines, one running from 0 to 10 with Mean = 5 and one running from 0 to 14 with Mean = 6.]
5 Median
- Important Measure of Central Tendency
- In an ordered array, the median is the middle number.
- If n is odd, the median is the middle number.
- If n is even, the median is the average of the 2 middle numbers.
- Not Affected by Extreme Values
[Figure: two number lines, one running from 0 to 10 and one running from 0 to 14; Median = 5 in both.]
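The contrast between the Mean and Median slides can be checked with a short Python sketch. The two data sets below are assumptions: the slides do not list the underlying values, so these were chosen only because they reproduce the figures shown (means of 5 and 6, a median of 5 in both cases).

```python
from statistics import mean, median

# Hypothetical data sets (not given on the slides).
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # largest value replaced by an extreme value

mean_a, median_a = mean(without_outlier), median(without_outlier)
mean_b, median_b = mean(with_outlier), median(with_outlier)

# The extreme value pulls the mean upward but leaves the median unchanged.
print(f"Without outlier: mean = {mean_a}, median = {median_a}")
print(f"With outlier:    mean = {mean_b}, median = {median_b}")
```

Only one value differs between the two lists, which isolates the effect of the outlier on each measure.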
6 Mode
- A Measure of Central Tendency
- Value that Occurs Most Often
- Not Affected by Extreme Values
- There May Not be a Mode
- There May be Several Modes
- Used for Either Numerical or Categorical Data
[Figure: two number lines, one running from 0 to 14 with Mode = 9 and one running from 0 to 6 with No Mode.]
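The three cases on this slide (no mode, one mode, several modes) can be sketched in Python with `statistics.multimode`. The helper name `describe_modes` and all three data sets are hypothetical, introduced only for illustration.

```python
from statistics import multimode

def describe_modes(values):
    """Return the list of modes, or [] when all distinct values tie for most frequent."""
    modes = multimode(values)
    # multimode returns every value that ties for the highest count. When all
    # distinct values tie (and there is more than one), report "no mode".
    if len(modes) > 1 and len(modes) == len(set(values)):
        return []
    return modes

no_mode = describe_modes([0, 1, 2, 3, 4, 5, 6])               # every value occurs once
one_mode = describe_modes([1, 3, 5, 9, 9, 9, 12])             # 9 occurs most often
several = describe_modes(["red", "blue", "red", "green", "blue"])  # categorical data

print(no_mode, one_mode, several)
```

The last call uses categorical labels, which works because the mode depends only on counts, not on arithmetic.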
8 Measures of Variability
Variation:
- Range / Percentiles
- Variance / Standard Deviation (Population, Sample)
- Coefficient of Variation
9 The Range
- Measure of Variation
- Difference Between Largest and Smallest Observations
- Range = $X_{\text{Largest}} - X_{\text{Smallest}}$
- Ignores How Data Are Distributed
[Figure: two data sets plotted on number lines from 7 to 12; in both, Range = 12 - 7 = 5.]
10 Percentile Scores
- Arrange data in ascending order.
- The middle number is the median.
- The number halfway to the median is the first quartile.
- The number halfway past the median is the 3rd quartile.
- A number with (no more than) 66% of the values less than it is the 66th percentile, and so forth.
11 Box Plot
[Figure: box plot marking Smallest, Q1, Median, Q3, and Largest.]
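The five values a box plot marks (Smallest, Q1, Median, Q3, Largest) can be computed as a sketch in Python. The data below are an assumption: they reuse the tick values 7 to 12 from the Range slide as if they were observations. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is one option among several, not necessarily the one the course uses.

```python
from statistics import median, quantiles

# Hypothetical data, sorted in ascending order as the Percentile slide requires.
data = sorted([7, 8, 9, 10, 11, 12])

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
data_range = max(data) - min(data)  # largest minus smallest observation

print("Five-number summary:", five_number_summary)
print("Range:", data_range)
```

The middle cut point returned by `quantiles` agrees with `statistics.median`, which matches the slide's statement that the middle number is the median.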
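12 Variance
- Important Measure of Variation
- Shows Variation About the Mean
- For the Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
- For the Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
For the Population use N in the denominator.
For the Sample use n - 1 in the denominator.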
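13 Standard Deviation
- Most Important Measure of Variation
- Shows Variation About the Mean
- For the Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
- For the Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
For the Population use N in the denominator.
For the Sample use n - 1 in the denominator.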
14 Sample Standard Deviation
For the Sample use n - 1 in the denominator.
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Data: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\frac{130}{7}} \approx 4.3095$
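As a check on the arithmetic, the n - 1 formula can be applied step by step to the eight data values listed on this slide. The sketch below computes the sum of squared deviations by hand and compares the result with Python's `statistics.stdev`, which also uses the n - 1 denominator.

```python
from math import sqrt
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = mean(data)  # 128 / 8 = 16

# Sum of squared deviations about the sample mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

s_manual = sqrt(sum_sq_dev / (n - 1))  # sample standard deviation by hand
s_library = stdev(data)                # same quantity from the standard library

print(f"n = {n}, mean = {x_bar}, sum of squared deviations = {sum_sq_dev}")
print(f"s = {s_manual:.4f}")
```

The squared deviations are 36, 16, 4, 1, 1, 4, 4, and 64, which sum to 130, so s = sqrt(130 / 7), approximately 4.3095.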
15 Comparing Standard Deviations
[Figure: three data sets plotted on number lines from 11 to 21.]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
17 Coefficient of Variation
- Measure of Relative Variation
- Always a Percentage (%)
- Shows Variation Relative to Mean
- Used to Compare 2 or More Groups
- Formula (for Sample): $CV = \left(\frac{s}{\bar{X}}\right) \cdot 100\%$
18 Comparing Coefficient of Variation
- Stock A: Average Price last year = 50, Standard Deviation = 5
- Stock B: Average Price last year = 100, Standard Deviation = 5
Coefficient of Variation: Stock A CV = 10%, Stock B CV = 5%
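The two stock figures can be reproduced with a small Python sketch of the sample coefficient of variation, taken here as the standard deviation divided by the mean, times 100. The function name `coefficient_of_variation` is hypothetical, introduced only for this illustration.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the coefficient of variation as a percentage: 100 * s / mean."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean_value

cv_a = coefficient_of_variation(std_dev=5, mean_value=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean_value=100)  # Stock B

print(f"Stock A CV = {cv_a}%")
print(f"Stock B CV = {cv_b}%")
```

Both stocks have the same standard deviation, but relative to its average price Stock A varies twice as much as Stock B.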
19 Shape
- Describes How Data Are Distributed
- Measures of Shape: Symmetric or Skewed
[Figure: three distribution shapes. Left-Skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-Skewed: Mode < Median < Mean.]
20 Agenda
Confidence Interval
Descriptive Summary Measures
21 Confidence Interval
- Sample Mean ± Margin of Error (MOE)
- Called a Confidence Interval
- To Compute Margin of Error, One of Two Conditions Must Be True:
- The Distribution of the Population of Incomes Must Be Normal, or
- The Distribution of Sample Means Must Be Normal.
22 A Side-Trip Before Constructing Confidence Intervals
- What is a Population Distribution?
- What is a Distribution of the Sample Mean?
- How Does Distribution of Sample Mean Differ From a Population Distribution?
- What is the Central Limit Theorem?
23 Estimating the Population Mean Income of Lexus Owners
Assume Small Population of Lexus Owners' Incomes (N = 200)
24 Distribution of N = 200 Incomes (Population Mean )
[Figure: distribution of the 200 incomes; horizontal axis from 75 to 325.]
25 Constructing a Distribution of Samples of Size 5 from N = 200 Owners
26 Distribution of Sample Mean Incomes (Column 7)
Distribution of Sample Means Near Normal!
27 Central Limit Theorem
- Even if Distribution of Population is Not Normal, Distribution of Sample Mean Will Be Near Normal Provided You Select Sample of Five or Ten or Greater From the Population.
- For Sample Sizes of 30 or More, Distribution of the Sample Mean Will Be Approximately Normal, with
- mean of sample means = population mean, and
- standard error = population standard deviation / sqrt(n)
- Thus Can Use Expression: Sample Mean ± Margin of Error
28 Why Does Central Limit Theorem Work?
- As Sample Size Increases:
- Most Sample Means will be Close to Population Mean,
- Some Sample Means will be Either Relatively Far Above or Below Population Mean.
- A Few Sample Means will be Either Very Far Above or Below Population Mean.
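The behaviour described by the Central Limit Theorem can be explored with a small simulation. This is a sketch under stated assumptions. The population is simulated (a right-skewed set of 200 hypothetical incomes, not the Lexus data from the slides). Samples are drawn with replacement so that the standard error formula, population standard deviation / sqrt(n), applies without a finite-population correction. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(2005)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population of 200 incomes (simulated values).
population = [75 + rng.expovariate(1 / 50) for _ in range(200)]
mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation (N in the denominator)

n = 30               # size of each sample
num_samples = 2000   # number of repeated samples

# Draw repeated samples with replacement and record each sample mean.
sample_means = [mean(rng.choices(population, k=n)) for _ in range(num_samples)]

mean_of_means = mean(sample_means)  # should be close to mu
observed_se = pstdev(sample_means)  # spread of the sample means
theoretical_se = sigma / sqrt(n)    # standard error from the formula

print(f"Population mean {mu:.1f}, mean of sample means {mean_of_means:.1f}")
print(f"Theoretical SE {theoretical_se:.2f}, observed SE {observed_se:.2f}")
```

Plotting a histogram of `sample_means` would show most values close to the population mean and only a few far from it, even though the population itself is skewed.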
29 Impact of Side-Trip on MOE
- Determine Confidence, or Reliability, Factor.
- Distribution of Sample Mean Normal from Central Limit Theorem.
- Use a Normal-Like Table to Obtain Confidence Factor.
- Determine Spread in Sample Means (Without Taking Repeated Samples)
30 Drawing Conclusions about a Population Mean Using a Sample Mean
1. Select Simple Random Sample
2. Compute Sample Mean and Std. Dev.
3. For n < 10, Is Sample Bell-Shaped? For n > 10, CLT Ensures Dist. of Sample Mean Is Normal
4. Draw Conclusion about Population Mean
31 Federal Aid Problem
- Suppose a census tract with 5000 families is
eligible for aid under program HR-247 if average
income of families of 4 is between 7500 and
8500 (those lower than 7500 are eligible in a
different program). A random sample of 12
families yields data on the next page.
32 Federal Aid Study Calculations
Representative Sample
7,300 7,700 8,100 8,400 7,800 8,300 8,500
7,600 7,400 7,800 8,300 8,600
33 Estimated Standard Error
- Measures Variation Among the Sample Means If We Took Repeated Samples.
- But We Only Have One Sample! How Can We Compute Estimated Standard Error?
- Based on Constructing Distribution of Sample Mean Slide, Will Estimated Standard Error Be Smaller or Larger Than Sample Standard Deviation (s)?
- Estimated Std. Error ______ than s.
- Estimated Standard Error Expression: $s / \sqrt{n}$
34 Confidence Factor for MOE
Can Use t-Table Provided Distribution of Sample
Mean is Normal
35 95% Confidence Interval
- Interpretation of Confidence Interval
- 95% Confident that Interval 7,983 ± 280 Contains Unknown Population (Not Sample) Mean Income.
- If We Selected 1,000 Samples of Size 12 and Constructed 1,000 Confidence Intervals, about 950 Would Contain Unknown Population Mean and 50 Would Not.
- So Is Tract Eligible for Aid???
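The Federal Aid interval can be retraced from the 12 sampled incomes as a sketch in Python: sample mean, sample standard deviation, estimated standard error, then margin of error. The confidence factor `t_crit = 2.201` is an assumption, taken as the two-sided 95% value for 11 degrees of freedom from a standard t-table. The slides do not state which value was used.

```python
from math import sqrt
from statistics import mean, stdev

# The 12 family incomes from the Federal Aid Study Calculations slide.
incomes = [7300, 7700, 8100, 8400, 7800, 8300, 8500,
           7600, 7400, 7800, 8300, 8600]

n = len(incomes)
x_bar = mean(incomes)    # sample mean
s = stdev(incomes)       # sample standard deviation (n - 1 denominator)
std_error = s / sqrt(n)  # estimated standard error of the sample mean

# Assumption: t-table value for 95% confidence with n - 1 = 11 degrees of freedom.
t_crit = 2.201

moe = t_crit * std_error                 # margin of error
lower, upper = x_bar - moe, x_bar + moe  # confidence interval endpoints

print(f"Sample mean = {x_bar:.0f}, s = {s:.1f}, std. error = {std_error:.1f}")
print(f"95% CI: {x_bar:.0f} ± {moe:.0f}  ({lower:.0f} to {upper:.0f})")
```

Under these assumptions the interval runs from roughly 7,703 to 8,263, which falls entirely inside the 7,500 to 8,500 eligibility band described in the problem.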
36 Sample Means versus Sample Proportion
Mean:
- Income/Loss
- Time to Complete Loan Papers
- Number of Fat Calories in Burger
- Breaking Strength of Cellular Phone Housing
Proportion of:
- Americans Who Believe that Japan is #1 Economic Power
- Circuit Boards with One or More Failed Solder Connections
- African-Americans Who Pass CPA
Means and Proportions Not the Same!!!!
37 Similarities and Differences Between Sample Means and Proportions
- Sample Means
- Computed from Data that Are Measured.
- Estimate Population Means.
- Sample Proportions
- Computed from Data that Are Counted.
- Estimate Population Proportions.
38 Drawing Conclusions about a Population Proportion From a Sample Proportion
1. Select Simple Random Sample
2. Compute Sample Proportion
3. Check for Normality (Table 7.8)
4. Draw Conclusion About Population Proportion, p
39 Japan Business Survey
Is Japan the Foremost Economic Power Today?
- N = 200 Californians
- Yes = 116
- No = 84
40 90% Confidence Interval on P
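A 90% interval for the survey proportion can be sketched in Python from the counts on the Japan Business Survey slide (116 "Yes" out of 200). The method is an assumption, since the slide's own expression is not shown: this uses the common normal-approximation interval, sample proportion ± z * sqrt(p(1 - p) / n), with z = 1.645 for 90% confidence.

```python
from math import sqrt

n = 200    # respondents surveyed
yes = 116  # respondents answering "Yes"

p_hat = yes / n                            # sample proportion
std_error = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat

# Assumption: normal-table value for a two-sided 90% confidence interval.
z_crit = 1.645

moe = z_crit * std_error                 # margin of error
lower, upper = p_hat - moe, p_hat + moe  # confidence interval endpoints

print(f"Sample proportion = {p_hat:.2f}")
print(f"90% CI: {p_hat:.3f} ± {moe:.3f}  ({lower:.3f} to {upper:.3f})")
```

Under these assumptions the margin of error is about 0.057, so the interval lies above 0.50, consistent with a majority of the population answering "Yes".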