
Measures of Central Tendency

MARE 250 Dr. Jason Turner

Centracidal Tendencies

A measure of central tendency indicates where along the measurement scale the sample or population is located.
It can be determined via various measures.
The three most important: Mean, Median, Mode

Mean Girls

Mean: the most commonly used measure of center
The sum of the observations divided by the number of observations
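The definition above can be sketched in a few lines of Python. The sample values and the names `observations` and `mean` are made up for illustration.

```python
# Hypothetical sample: six made-up measurements.
observations = [12.0, 15.0, 11.0, 14.0, 13.0, 19.0]


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


sample_mean = mean(observations)
print(sample_mean)  # 84.0 / 6 = 14.0
```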

The Median

"As we were driving, we saw a sign that said

"Watch for Rocks." Martha said it should read

"Watch for Pretty Rocks." I told her she should

write in her suggestion to the highway

department, but she started saying it was a joke

- just to get out of writing a simple letter! And

I thought I was lazy!" - Jack Handey

The median is typically defined as the middle measurement in an ordered set of data.
It separates the bottom 50% of the data from the top 50%.
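A minimal sketch of finding the median, assuming the common convention (not stated on the slide) of averaging the two middle values when the number of observations is even. The sample values are made up.

```python
def median(values):
    """Middle measurement of the ordered data."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # Even count: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 3, 9, 1, 5]          # ordered: 1 3 5 7 9
even_sample = [7, 3, 9, 1, 5, 11]     # ordered: 1 3 5 7 9 11

print(median(odd_sample))   # 5
print(median(even_sample))  # 6.0
```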

The Mode

Oh, no way - where? Holy crap, he's with a

girl! But he's the guy from Depeche Mode!

That's impossible! Come on, he's in Depeche

Mode! - The Monarch

The mode is typically defined as the most frequently occurring measurement in a set of data.
The mode is useful if the distribution is skewed or bimodal (having two very pronounced values around which data are concentrated).
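One way to find the mode is to count how often each value occurs. The sketch below returns every value tied for the highest count, so a bimodal sample reports both of its peaks. The function name `modes` and the sample values are illustrative only.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most frequently, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


unimodal = [2, 3, 3, 4, 5]
bimodal = [1, 2, 2, 2, 5, 8, 8, 8, 9]

print(modes(unimodal))  # [3]
print(modes(bimodal))   # [2, 8]
```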

You are so totally skewed!

The mean is sensitive to extreme (very large or small) observations and the median is not.
Therefore you can determine how skewed your data are by looking at the relationship between median and mean.

Mean is Greater than the Median: typically right (positively) skewed
Mean and Median are Equal: typically symmetric
Mean is Less Than the Median: typically left (negatively) skewed
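The comparison can be sketched as a small helper. This is a rule of thumb, not a formal test of skewness, and the labels, function name, and sample values are assumptions made for illustration.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule-of-thumb skew direction from the mean/median relationship."""
    m, md = mean(values), median(values)
    if m > md:
        return "right-skewed"      # extreme large values pull the mean up
    if m < md:
        return "left-skewed"       # extreme small values pull the mean down
    return "roughly symmetric"


right = [1, 2, 3, 4, 20]       # mean 6, median 3
left = [1, 17, 18, 19, 20]     # mean 15, median 18
symmetric = [1, 2, 3, 4, 5]    # mean 3, median 3

print(skew_direction(right))      # right-skewed
print(skew_direction(left))       # left-skewed
print(skew_direction(symmetric))  # roughly symmetric
```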

Resistance Measures

A resistant measure is not sensitive to the influence of a few extreme observations.
Median: a resistant measure of center
Mean: not resistant
The resistance of the mean can be improved by using trimmed means: a specified percentage of the smallest and largest observations are removed before computing the mean.
We will do something like this later when exploring the data and evaluating outliers (and their effects upon the mean).
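A minimal sketch of a trimmed mean, assuming the stated percentage is removed from each end of the ordered data and that the number removed is rounded down. The function name and sample values are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)       # observations cut per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 96]      # 96 is an extreme observation

print(trimmed_mean(data, 0.0))   # 15.0 (ordinary mean, pulled up by 96)
print(trimmed_mean(data, 0.10))  # 6.5  (drops 2 and 96)
```

With 10% trimming the result sits much closer to the bulk of the data, which is what "improving the resistance of the mean" refers to.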

How To on Computer

On Minitab: Your data must be in a single column.
Go to the 'Stat' menu, and select 'Basic stats', then 'Display descriptive stats'.

Select your data column in the 'variables' box.

The output will generally go to the session

window, or if you select 'graphical summary' in

the 'graphs' options, it will be given in a

separate window. This will give you a number of

basic descriptive stats, though not the mode.

Measures of Dispersion and Variability


Please Disperse!

Alright everyone, disperse immediately. We are

prepared to use force a-- what, what? We're not

prepared, Eddie? Someone call 911! - Chief Wiggum

Measure of Dispersion of the Data - an indication of the spread of measurements around the center of the distribution
Two of the most frequently used: Range, Standard Deviation

The Range

Range - the difference between the highest and lowest values in the observations
This is useful, but may be misleading when the data have one or more outliers (single measurements that are exceptionally large or small relative to the other data).
It is not relative to the central location.
Range = Max - Min
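The sketch below computes the range and shows how a single outlier can dominate it. The sample values are made up.

```python
def data_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


typical = [4, 6, 5, 7, 6]
with_outlier = typical + [40]    # one exceptionally large measurement

print(data_range(typical))       # 3
print(data_range(with_outlier))  # 36
```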

The Variance

Variance - the average of the squared deviations from the mean
The most widely used measure of spread, and one that will be used often in various statistical applications

The Variance

Degrees of Freedom - the quantity (n - 1)
Used instead of n to provide an unbiased estimate of the population variance
As the sample size (n) increases (and n approaches N), the values of the population and sample variance become more similar.
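A minimal sketch of the variance calculation, with a flag (an illustrative choice, not from the slides) to switch between dividing by n - 1 for a sample and by n for a whole population. The data values are made up.

```python
def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by the degrees of freedom (n - 1);
    sample=False divides by n, treating the data as the whole population.
    """
    n = len(values)
    center = sum(values) / n
    sum_of_squares = sum((x - center) ** 2 for x in values)
    return sum_of_squares / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]         # mean 5, sum of squares 32

print(variance(data, sample=False))     # 4.0 (32 / 8)
print(variance(data))                   # 32 / 7, about 4.57
```

The two results differ only in the divisor, so the gap between them shrinks as n grows.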

Standard Deviation

Standard Deviation: the positive square root of the variance
Indicates how far (on average) the observations in the sample are from the mean of the sample
The more variation in a data set, the larger its standard deviation
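As a sketch, the standard deviation can be computed from the sample variance using Python's standard `statistics` module. The two made-up samples share the same mean (10) but differ in spread.

```python
import math
from statistics import variance   # sample variance (divides by n - 1)


def standard_deviation(values):
    """Positive square root of the sample variance."""
    return math.sqrt(variance(values))


tight = [9, 10, 10, 11]     # little variation around 10
spread = [1, 5, 15, 19]     # much more variation around 10

print(standard_deviation(tight))
print(standard_deviation(spread))
```

The second sample yields the larger standard deviation, matching the statement that more variation means a larger value.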

Quartiles

The median divides the data into 2 equal parts: bottom 50%, top 50%
Quartiles divide the data into quarters: 4 equal parts
A dataset has 3 quartiles:
Q1 is the number that divides the bottom 25% from the top 75%
Q2 is the median: divides the bottom 50% from the top 50%
Q3 is the number that divides the bottom 75% from the top 25%
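A sketch of computing the three quartiles with `statistics.quantiles` from the Python standard library. Software packages use slightly different quartile conventions, so the default method used here is an assumption and results may not match other tools exactly. The data values are made up.

```python
from statistics import quantiles

# Hypothetical sample of 11 ordered measurements.
data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 24]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4)

print(q1, q2, q3)  # 7.0 11.0 18.0
```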


Interquartile Range

Interquartile Range (IQR): the difference between the first and third quartiles
IQR = Q3 - Q1
The IQR gives you the range of the middle 50% of the data
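The IQR follows directly from the quartiles. This sketch again relies on the default quartile method of `statistics.quantiles`, which is an assumption. The data values are made up.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 24]   # Q1 = 7, Q3 = 18

print(iqr(data))  # 11.0
```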

Outlier, Outlier

Outliers: observations that fall well outside the overall pattern of the data
Require special attention
May be the result of:
Measurement or recording error
Observation from a different population
Unusual extreme observation

Pants on Fire

Must deal with outliers (Yes, really!)
If the outlier is an error, it can be deleted; otherwise it is a judgment call.
Quartiles and the IQR can be used to identify potential outliers.

The Outer Limits

Lower and Upper Limits
The lower limit is the number that lies 1.5 IQRs below the first quartile
Lower Limit = Q1 - 1.5 × IQR
The upper limit is the number that lies 1.5 IQRs above the third quartile
Upper Limit = Q3 + 1.5 × IQR

The Outer Limits

If a value is outside the Outer Limits of a dataset, it is an Outlier
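The limits can be sketched as follows: compute the quartiles, derive the lower and upper limits at 1.5 IQRs beyond Q1 and Q3, and flag anything outside them. The function names and data are illustrative, and the quartile method is the `statistics.quantiles` default (an assumption).

```python
from statistics import quantiles


def outer_limits(values):
    """Return (lower limit, upper limit) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def potential_outliers(values):
    """Values lying outside the lower and upper limits."""
    lower, upper = outer_limits(values)
    return [x for x in values if x < lower or x > upper]


data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 60]   # Q1 = 7, Q3 = 18, IQR = 11

print(outer_limits(data))        # (-9.5, 34.5)
print(potential_outliers(data))  # [60]
```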

Five-Number Summary

5-Number Summary: Min, Q1, Q2, Q3, Max
Written in increasing order
Provides information on center and variation
Used to construct boxplots
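A short sketch that assembles the five-number summary in increasing order. The function name and data are hypothetical, and the quartiles use the default method of `statistics.quantiles`.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in increasing order."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 24]

print(five_number_summary(data))  # (3, 7.0, 11.0, 18.0, 24)
```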

Boxplots

Boxplot (box-and-whisker design): based on the 5-number summary; provides a graphic display of the center and variation

[Figure: boxplot with Min, Q1, Q2, Q3, and Max labeled, on an axis from 0 to 70]

Boxplots

Modified Boxplot: includes outliers

[Figure: modified boxplot with a potential outlier marked, on an axis from 0 to 70]

Note that Min and Max are determined after outliers are removed!


Boxplots

Boxplots summarize information about the shape,

dispersion, and center of your data. They can

also help you spot outliers. The left edge of the

box represents the first quartile (Q1), while the

right edge represents the third quartile (Q3).

Thus the box portion of the plot represents the

interquartile range (IQR), or the middle 50% of

the observations.

[Figure: boxplot with Min, Q1, Q2, Q3, and Max labeled, on an axis from 0 to 70]

Boxplots

The line drawn through the box represents the median of the data.
The lines extending from the box are called whiskers. The whiskers extend outward to indicate the lowest and highest values in the data set (excluding outliers).
Extreme values, or outliers, are represented by dots. A value is considered an outlier if it is outside of the box (greater than Q3 or less than Q1) by more than 1.5 times the IQR.
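The whisker rule can be sketched without any plotting: the whiskers end at the lowest and highest values that are not outliers, and the outliers are listed separately (they would be drawn as dots). The function name, dictionary keys, and data are illustrative, and the quartile method is the `statistics.quantiles` default.

```python
from statistics import quantiles


def modified_boxplot_parts(values):
    """Box edges, median, whisker ends, and outliers for a modified boxplot."""
    q1, q2, q3 = quantiles(values, n=4)
    step = 1.5 * (q3 - q1)
    # Values within 1.5 IQRs of the box are kept for the whiskers.
    inside = [x for x in values if q1 - step <= x <= q3 + step]
    return {
        "box": (q1, q3),
        "median": q2,
        "whiskers": (min(inside), max(inside)),   # excludes outliers
        "outliers": sorted(x for x in values if x not in inside),
    }


data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 60]
parts = modified_boxplot_parts(data)

print(parts["whiskers"])  # (3, 21)
print(parts["outliers"])  # [60]
```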

[Figure: modified boxplot with a potential outlier marked, on an axis from 0 to 70]

Boxplots

Use the boxplot to assess the symmetry of the data.
If the data are fairly symmetric, the median line will be roughly in the middle of the IQR box and the whiskers will be similar in length.
If the data are skewed, the median may not fall in the middle of the IQR box, and one whisker will likely be noticeably longer than the other.
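The same visual check can be approximated numerically. The sketch below reports where the median falls inside the box (0.5 means centered) and the length of each whisker. For simplicity it assumes the sample has no outliers, so the whiskers reach the minimum and maximum. The names and data are made up.

```python
from statistics import quantiles


def boxplot_shape(values):
    """Median position within the box and the two whisker lengths."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)
    return {
        "median_position": (q2 - q1) / (q3 - q1),  # 0.5 = middle of the box
        "lower_whisker": q1 - ordered[0],
        "upper_whisker": ordered[-1] - q3,
    }


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
skewed = [1, 2, 2, 3, 3, 4, 5, 7, 10, 15, 25]

print(boxplot_shape(symmetric))  # median centered, whiskers equal
print(boxplot_shape(skewed))     # median near Q1, upper whisker much longer
```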