STATISTICS A BRIEF INTRODUCTION - PowerPoint PPT Presentation

Provided by: matc7
Transcript and Presenter's Notes

1
STATISTICS A BRIEF INTRODUCTION
2
Why Learn About Statistics?
  • Statistics provides tools that are used in:
  • Quality control
  • Research
  • Measurements
  • Sports

3
In This Course
  • We will use some of these tools in the lab, so we
    will now introduce some:
  • Ideas
  • Vocabulary
  • A few calculations

4
Variation
  • There is variation in the natural world
  • People vary
  • Measurements vary
  • Plants vary
  • Weather varies

5
Variation Cont
  • Variation among organisms is the basis of natural
    selection and evolution

6
Example
  • 100 people take a drug and 75 of them get better
  • 100 people don't take the drug, but 68 get better
    without it
  • Did the drug help?

7
Variability Is A Problem
  • There is variation in people's response to the
    illness
  • There is variation in people's response to the
    drug
  • So it's difficult to figure out if the drug
    helped
  • Statistics helps with this type of problem

8
Statistics
  • Provides mathematical tools to help arrive at
    meaningful conclusions in the presence of
    variability

9
Statistics Cont
  • Might help researchers decide if a drug is
    helpful or not
  • This is a more advanced application of statistics
    than we will get into

10
Statistics Cont
  • Let's begin with some basic vocabulary

11
Definitions
  • Population: the entire group of events, objects,
    results, or individuals, all of which share some
    unifying characteristic

12
Populations
  • Examples
  • All of a person's red blood cells
  • All the enzyme molecules in a test tube
  • All the college graduates in the U.S.

13
Sample
  • Sample: a portion of the whole population that
    represents the whole population

14
Sample Cont
  • Example: It is virtually impossible to measure
    the level of hemoglobin in every cell of a
    patient
  • Rather, take a sample of the patient's blood and
    measure the hemoglobin level

15
Population And Sample
  • [Figure: diagram labeled Population and Sample]
16
More About Samples
  • Representative sample: a sample that truly
    represents the variability in the population
    (a good sample)

17
Representative Sample
  • A sample is random if all members of the
    population have an equal chance of being drawn
  • A sample is independent if the choice of one
    member does not influence the choice of another
  • Samples need to be taken randomly and
    independently in order to be representative
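
A minimal Python sketch of drawing a random sample. The population of 1,000 numbered individuals and the sample size of 20 are assumptions made up for illustration; they are not from the slides. `random.sample` gives every member an equal chance of being drawn and never draws the same member twice.

```python
import random

# Hypothetical population: 1,000 individuals, identified by number.
population = list(range(1, 1001))

# Fixed seed only so the sketch is repeatable; omit it in real use.
rng = random.Random(42)

# Draw 20 distinct individuals; each has an equal chance of selection.
sample = rng.sample(population, k=20)

print(sorted(sample))
```

In practice the hard part is usually building the numbered list of the whole population, not the draw itself.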

18
Sampling
  • How we take a sample is critical and often
    complex
  • If a sample is not taken correctly, it will not be
    representative

19
Example
  • How would you sample a field of corn?
  • Think about how to get a good sample

20
Variables
  • Variables: characteristics of a population (or a
    sample) that can be observed or measured
  • Called variables because they can vary among
    individuals

21
Variables
  • Examples
  • Blood hemoglobin levels
  • Activity of enzymes
  • Test scores of students

22
Variables Cont
  • A population or sample can have many variables
    that can be studied
  • Example: the same population of six-year-old
    children can be studied for:
  • Height
  • Shoe size
  • Reading level
  • Etc.

23
Data
  • Data: observations of a variable (singular is
    datum)
  • May or may not be numerical
  • Examples
  • Heights of all the children in a sample
    (numerical)
  • Lengths of insects (numerical)
  • Pictures of mouse kidney cells (not numerical)

24
Always Uncertainty
  • Even if you take a sample correctly, there is
    uncertainty when you use a sample to represent
    the whole population
  • Various samples from the same population are
    unlikely to be identical
  • So we need to be careful about drawing
    conclusions about a population based on a
    sample; there is always some uncertainty

25
Sample Size
  • If a sample is drawn correctly, then, the larger
    the sample, the more likely it is to accurately
    reflect the entire population
  • If it is not done correctly, then a bigger sample
    may not be any better
  • How does this apply to the corn field?
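
The two ideas above (samples from the same population differ, and larger samples tend to reflect the population better) can be sketched with a small simulation. Everything here is an assumption for illustration: the population is 10,000 made-up plant heights, and the sample sizes of 5 and 100 are arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 plant heights in cm (made-up values).
population = [rng.gauss(200, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)


def spread_of_sample_means(sample_size, repeats=200):
    """Draw many random samples and report how much their means vary."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)


small = spread_of_sample_means(5)
large = spread_of_sample_means(100)

print(f"population mean: {population_mean:.1f} cm")
print(f"spread of sample means, n=5:   {small:.2f} cm")
print(f"spread of sample means, n=100: {large:.2f} cm")
```

The means of the larger samples cluster more tightly around the population mean. This only holds because the samples are drawn randomly; a badly drawn sample does not improve by getting bigger.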

26
Inferential Statistics
  • A branch of statistics
  • We won't talk about it much
  • Deals with tools to handle the uncertainty of
    using a sample to represent a population
  • Helps with problems like the drug study,
    mentioned earlier

27
Descriptive Statistics
  • Chapter 11 in your textbook
  • Descriptive statistics is one area within
    statistics
  • This is the type of statistics we will use

28
Descriptive Statistics
  • Provides tools to DESCRIBE, organize and
    interpret variability in our observations of the
    natural world

29
Example Problem
  • In a quality control setting, 15 vials of product
    from a batch are tested. What is the sample?
    What is the population?
  • In an experiment, the effect of a carcinogenic
    compound was tested on 2000 lab rats. What is
    the sample? What is the population?

30
Example Problem Cont
  • In a clinical study, a new drug was tested on
    fifty patients. What is the sample? What is the
    population?

31
Answers
  • 15 vials, the sample, were tested for QC. The
    population is all the vials in the batch.
  • The sample is the rats that were tested. The
    population is probably all lab rats.
  • The sample is the 50 patients tested in the
    trial. The population is all patients with the
    same condition.

32
Example Problem
  • An advertisement says that 2 out of 3 doctors
    recommend Brand X.
  • What is the sample? What is the population?
  • Is the sample representative?
  • Does this statement ensure that Brand X is better
    than competitors?

33
Answer
  • Many abuses of statistics relate to poor
    sampling. The population of interest is all
    doctors. No way to know what the sample is. The
    sample could have included only relatives of
    employees at Brand X headquarters, or only
    doctors in a certain area. Therefore the
    statement does not ensure that the majority of
    doctors recommend Brand X. It certainly does not
    ensure that Brand X is best.

34
Describing Data Sets
  • Draw a sample from a population
  • Measure values for a particular variable
  • Result is a data set

35
Data Sets
  • Individuals vary, therefore the data set has
    variation
  • Data without organization is like letters that
    aren't arranged into words

36
Data Sets Cont
  • Numerical data can be arranged in ways that are
    meaningful or that are confusing or deceptive

37
Descriptive Statistics
  • Provides tools to organize, summarize, and
    describe data in meaningful ways
  • Example
  • The exam scores for a class are the data set
  • What is the variable of interest?
  • We can summarize the data with the class
    average. What does this tell you?

38
Descriptive Statistics Cont
  • A measure that describes a data set, such as the
    average, is sometimes called a statistic
  • Average gives information about the center of the
    data

39
Median And Mode
  • Two other statistics that give information about
    the center of a set of data
  • Median is the middle value
  • Mode is most frequent value

40
Measures Of Central Tendency
  • Measures that describe the center of a data set
    are called Measures of Central Tendency
  • Mean, median, and the mode

41
Hypothetical Data Set
  • 2 5 6 7 8 3 9 3 10 4 7 4 6 11 9
  • The simplest way to organize them is to put them
    in order
  • 2 3 3 4 4 5 6 6 7 7 8 9 9 10 11
  • By inspection they center around 6 or 7

42
Mean
  • Mean is basically the same as the average
  • Add all the numbers together and divide by the
    number of values
  • 2 3 3 4 4 5 6 6 7 7 8 9 9 10 11
  • What is the mean for this data set?

43
Nomenclature
  • Mean = 6.3, written $\bar{X}$ (read "X bar")
  • The observations are called $X_1$, $X_2$, etc.
  • There are 15 observations in this example, so the
    last one is $X_{15}$
  • Mean: $\bar{X} = \frac{\sum X_i}{n}$
  • Where n = number of values
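
The mean calculation can be checked with a short Python sketch. The variable names are illustrative; the data set is the one from the slides.

```python
import statistics

# Hypothetical data set from the slides.
data = [2, 5, 6, 7, 8, 3, 9, 3, 10, 4, 7, 4, 6, 11, 9]

ordered = sorted(data)           # put the values in order
n = len(data)                    # number of values
mean_by_hand = sum(data) / n     # add all the values, divide by n
mean_value = statistics.mean(data)  # same result from the standard library

print(ordered)
print(n, round(mean_by_hand, 1))
```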

44
Example
  • Data set
  • 2 3 3 4 5 6 7 8 9
  • What is the mode?
  • What is the median?
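
One way to check your answers is a short Python sketch using the standard `statistics` module. The variable names are illustrative.

```python
import statistics

# Data set from the slide above.
data = [2, 3, 3, 4, 5, 6, 7, 8, 9]

median_value = statistics.median(data)  # middle value of the ordered data
mode_value = statistics.mode(data)      # most frequent value

print("median:", median_value)
print("mode:", mode_value)
```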

45
Mean Of A Population Versus The Mean Of A Sample
  • Statisticians distinguish between the mean of a
    sample and the mean of a population
  • The sample mean is $\bar{X}$
  • The population mean is $\mu$
  • It is rare to know the population mean, so the
    sample mean is used to represent it

46
Dispersion
  • Data sets A and B both have the same average
  • A: 4 5 5 5 6 6
  • B: 1 2 4 7 8 9
  • But they are not the same
  • A is more clumped around the central value
  • B is more dispersed, or spread out

47
Measures Of Dispersion
  • Measures of central tendency do not describe how
    dispersed a data set is
  • Measures of dispersion do; they describe how much
    the values in a data set vary from one another

48
Measures Of Dispersion
  • Common measures of dispersion are
  • Range
  • Variance
  • Standard deviation
  • Coefficient of variation

49
Calculations Of Dispersion
  • Measures of dispersion, like measures of central
    tendency, are calculated
  • Range is the difference between the lowest and
    highest values in a data set

50
Calculations Of Dispersion Cont
  • Example
  • 2 3 3 4 4 5 6 6 7 7 8 9 9 10 11
  • Range = 11 - 2 = 9, or 2 to 11
  • Range is not particularly informative because it
    is based only on two values from the data set
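
A small Python sketch of the range calculation, applied to the 15-value data set and to data sets A and B from the dispersion slide. The function name `value_range` is made up for this example.

```python
def value_range(values):
    """Range: difference between the highest and lowest values."""
    return max(values) - min(values)


data = [2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9, 10, 11]
set_a = [4, 5, 5, 5, 6, 6]
set_b = [1, 2, 4, 7, 8, 9]

print("range of data:", value_range(data))

# A and B have the same mean but very different ranges.
print("means:", sum(set_a) / len(set_a), sum(set_b) / len(set_b))
print("ranges:", value_range(set_a), value_range(set_b))
```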

51
Calculating Variance And Standard Deviation
  • Variance and standard deviation are measures of
    the average amount by which each observation
    varies from the mean
  • Example
  • 4cm 5cm 6cm 7cm 7cm 7cm 9cm 11cm
  • This is a data set, the lengths of 8 insects

52
Calculating Variance And Standard Deviation
  • 4cm 5cm 6cm 7cm 7cm 7cm 9cm 11cm
  • The mean is 7 cm
  • How much do they vary from one another?
  • Intuitively, we might look at how much each point
    varies from the mean
  • This is called the deviation

53
Calculation Of Deviations From Mean
  • 4cm 5cm 6cm 7cm 7cm 7cm 9cm 11cm
  • Value - Mean    Deviation (in cm)
  • (4 - 7)         -3
  • (5 - 7)         -2
  • (6 - 7)         -1
  • (7 - 7)          0
  • (7 - 7)          0
  • (7 - 7)          0
  • (9 - 7)          2
  • (11 - 7)         4

54
Calculation Of Deviations From Mean Cont
  • Value - Mean    Deviation (in cm)
  • (4 - 7)         -3
  • (5 - 7)         -2
  • (6 - 7)         -1
  • (7 - 7)          0
  • (7 - 7)          0
  • (7 - 7)          0
  • (9 - 7)          2
  • (11 - 7)         4
  • Sum of deviations = 0

55
Calculation Of Deviations From Mean Cont
  • Sum of the deviations from the mean is always
    zero
  • Therefore, we cannot use the average deviation to
    describe the dispersion in the data set
  • Therefore, mathematicians decided to square each
    deviation so that no value is negative

56
Calculation Of Deviations From Mean Cont
  • Value - Mean    Deviation (in cm)    Squared Deviation
  • (4 - 7)         -3                    9 cm²
  • (5 - 7)         -2                    4 cm²
  • (6 - 7)         -1                    1 cm²
  • (7 - 7)          0                    0
  • (7 - 7)          0                    0
  • (7 - 7)          0                    0
  • (9 - 7)          2                    4 cm²
  • (11 - 7)         4                   16 cm²
  • Total squared deviation (sum of squares) = 34 cm²
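
The deviation table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic shown in the slides; the variable names are illustrative.

```python
# Insect lengths in cm, from the slides.
lengths_cm = [4, 5, 6, 7, 7, 7, 9, 11]

mean_cm = sum(lengths_cm) / len(lengths_cm)

# Deviation of each value from the mean (value - mean).
deviations = [x - mean_cm for x in lengths_cm]

# Squaring removes the signs, so the terms no longer cancel out.
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

print("mean:", mean_cm)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("sum of squares:", sum_of_squares)
```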

57
Variance
  • Total squared deviation (sum of squares) divided
    by the number of measurements
  • $\frac{34\ \mathrm{cm}^2}{8} = 4.25\ \mathrm{cm}^2$

58
Standard Deviation
  • Square root of the variance
  • $\sqrt{4.25\ \mathrm{cm}^2} \approx 2.06\ \mathrm{cm}$
  • Note that the SD has the same units as the data
  • Note also that the larger the variance and SD,
    the more dispersed are the data
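
A short Python check of the variance and SD for the insect data. It uses `statistics.pvariance` and `statistics.pstdev`, which divide by the number of measurements, as the slides do here. The variable names are illustrative.

```python
import statistics

# Insect lengths in cm, from the slides.
lengths_cm = [4, 5, 6, 7, 7, 7, 9, 11]

# Sum of squares divided by the number of measurements (n).
variance_cm2 = statistics.pvariance(lengths_cm)

# Square root of the variance; same units as the data (cm).
sd_cm = statistics.pstdev(lengths_cm)

print(f"variance: {variance_cm2:.2f} cm^2")
print(f"SD: {sd_cm:.2f} cm")
```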

59
Variance And SD Of Population Vs Sample
  • Statisticians distinguish between the mean and SD
    of a population and a sample
  • The variance of a population is called sigma
    squared, $\sigma^2$
  • Variance of a sample is $S^2$

60
Variance And SD Of Population Vs Sample Cont
  • The standard deviation of a population is called
    sigma, $\sigma$
  • Standard deviation of a sample is S or SD

61
Standard Deviation Of A Sample
$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
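
A sketch of the sample SD in Python, dividing the sum of squares by n - 1. The function name `sample_sd` is made up for this example, and treating the insect lengths as a sample is an assumption made only to show the difference from dividing by n.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: divide the sum of squares by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))


# Insect lengths in cm, treated here as a sample.
lengths_cm = [4, 5, 6, 7, 7, 7, 9, 11]

sd_sample = sample_sd(lengths_cm)            # divides by n - 1
sd_divide_by_n = statistics.pstdev(lengths_cm)  # divides by n

print(f"dividing by n - 1: {sd_sample:.2f} cm")
print(f"dividing by n:     {sd_divide_by_n:.2f} cm")
```

The standard library's `statistics.stdev` performs the same n - 1 calculation.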
62
Example Problem
  • A biotechnology company sells cultures of E.
    coli. The bacteria are grown in batches that are
    freeze dried and packaged into vials. Each vial
    is expected to have 200 mg of bacteria. A QC
    technician tests a sample of vials from each
    batch and reports the mean weight and SD.

63
Example Problem Cont
  • Batch Q-21 has a mean weight of 200 mg and an SD
    of 12 mg. Batch P-34 has a mean weight of 200 mg
    and an SD of 4 mg. Which batch appears to have
    been packaged in a more controlled fashion?

64
Answer
  • The SD can be interpreted as an indication of
    consistency. The SD of the weights of Batch P-34
    is lower than that of Batch Q-21. Therefore, the
    weights of the vials in Batch P-34 are less
    dispersed than those in Batch Q-21, and Batch
    P-34 appears to have been better controlled.
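
The slides list the coefficient of variation among the measures of dispersion but do not define it. Assuming the usual definition (SD divided by the mean, expressed as a percentage), the two batches could be compared as in this sketch. The function name is made up for the example.

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: SD relative to the mean (assumed definition)."""
    return sd / mean * 100


# Mean and SD in mg for each batch, from the example problem.
cv_q21 = coefficient_of_variation(mean=200, sd=12)
cv_p34 = coefficient_of_variation(mean=200, sd=4)

print(f"Batch Q-21: CV = {cv_q21:.1f}%")
print(f"Batch P-34: CV = {cv_p34:.1f}%")
```

Because both batches have the same mean, the CV leads to the same conclusion as the SD here; it becomes more useful when the means differ.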

65
Frequency Distributions
  • So far, we have talked about calculations to
    describe data sets
  • Now we will talk about graphical methods

66
The Weights Of 175 Field Mice
  • (in grams)
  • 19 22 20 24 22 19 27 20 21 22 20 22 24 24 21 25
    19 21 20 23 25 22 19 17 20 20 21 25 21 22 27 22
    19 22 23 22 25 22 24 23 20 21 22 23 21 24 19 21 22
    22 25 22 23 20 23 22 22 26 21 24 23 21 25 20 23 20
    21 24 23 18 20 23 21 22 22 25 21 23 22 24 20 21
    23 21 19 21 24 20 22 23 20 22 19 22 24 20 25 21 22
    22 24 21 22 23 25 21 19 19 21 23 22 22 24 21 23 22
    23 28 20 23 26 21 22 24 20 21 23 20 22 23 21 19
    20 26 22 20 21 22 23 24 20 21 23 22 24 21 23 22 24
    21 22 24 20 22 21 23 26 21 22 23 24 21 23 20 20
    21 25 22 20 22 21 21 23 22

67
The Weights Of 175 Field Mice Cont
  • This table of raw data is hard to interpret
  • Begin by making a frequency table

68
Frequency Distribution Table Of The Weights Of Field Mice
  • Weight (in grams)    Frequency
  • 17                    1
  • 18                    1
  • 19                   11
  • 20                   25
  • 21                   34
  • 22                   40
  • 23                   27
  • 24                   19
  • 25                   10
  • 26                    4
  • 27                    2
  • 28                    1
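
Tallying a frequency table by hand is tedious, so here is a minimal Python sketch using `collections.Counter`. To keep it short it uses only the first 15 weights from the table of raw data, so its counts are for that subset, not for all the mice.

```python
from collections import Counter

# First 15 weights (in grams) from the table of raw data.
weights_g = [19, 22, 20, 24, 22, 19, 27, 20, 21, 22, 20, 22, 24, 24, 21]

# Counter maps each weight to the number of times it occurs.
frequency = Counter(weights_g)

print("Weight (g)  Frequency")
for weight in sorted(frequency):
    print(f"{weight:>10}  {frequency[weight]:>9}")
```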

69
Frequency Table
  • Tells us that most mice have weights in the
    middle of the range, and a few are lighter or
    heavier
  • The word distribution refers to a pattern of
    variation for a given variable

70
Frequency Table Cont
  • It is important to be aware of patterns, or
    distributions, that emerge when data are
    organized by frequency
  • The frequency distribution can be illustrated as
    a frequency histogram

71
Frequency Histogram
  • X axis is units of measurement, in this example,
    weight in grams
  • Y axis is the frequency of a particular value
  • For example, 11 mice weighed 19 g
  • The values for these 11 mice are illustrated as a
    bar

72
Frequency Histogram Cont
  • Note that when the mouse data were collected, a
    mouse recorded as 19 grams actually weighed
    between 18.5 g and 19.4 g.
  • Therefore the bar spans an interval of 1 gram

73
First Four Bars
  • [Figure: frequency histogram showing the first
    four bars, for 17, 18, 19 and 20 g. X axis:
    weights in grams. Y axis: frequency.]
74
Constructing A Frequency Histogram
  • Divide the range of the data into intervals
  • It is simplest to make each interval (class) the
    same width
  • There is no set rule as to how many intervals to
    have
  • For example, length data might be 0-9 cm, 10-19
    cm, 20-29 cm and so on

75
Constructing A Frequency Histogram Cont
  • Count the number of observations that are in each
    interval
  • Make a frequency table with each interval and the
    frequency of values in that interval
  • Label the axes of a graph with the intervals on
    the X axis and the frequency on the Y axis

76
Constructing A Frequency Histogram Cont
  • Draw in bars where the height of a bar
    corresponds to the frequency of the value
  • Center the bars above the midpoint of the class
    interval
  • For example, if the interval is 0-9 cm, then the
    bar should be centered at 4.5 cm
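
The construction steps can be sketched in Python. The lengths below are made-up whole-number values for illustration, the function name is hypothetical, and the bars are drawn sideways as rows of `#` characters rather than as a real graph.

```python
from collections import Counter


def frequency_histogram(values, start, width):
    """Count whole-number values in equal-width classes from `start`."""
    # Class index 0 covers start..start+width-1, index 1 the next class, etc.
    counts = Counter((v - start) // width for v in values)
    rows = []
    for k in range(max(counts) + 1):
        low = start + k * width
        high = low + width - 1
        rows.append((low, high, counts.get(k, 0)))
    return rows


# Hypothetical lengths in cm (not from the slides).
lengths_cm = [3, 8, 12, 15, 17, 21, 22, 24, 25, 28, 31, 36, 14, 19, 27]

rows = frequency_histogram(lengths_cm, start=0, width=10)
for low, high, count in rows:
    # One '#' per observation; the bar length is the frequency.
    print(f"{low:>2}-{high:<2} cm | {'#' * count}")
```

The sketch assumes every value is at least `start`; a real plotting library would also center each bar on the midpoint of its class.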

77
Normal Frequency Distribution
  • If the weights of very many lab mice were
    measured, the frequency distribution would
    likely look like a bell shape, also called the
    normal distribution

78
Normal Distribution
  • [Figure: bell-shaped normal curve. X axis:
    weight. Y axis: frequency.]
79
Normal Distribution
  • Very important
  • Examples
  • Heights of humans
  • If you measure the same thing over and over, the
    measurements will tend to have this distribution

80
Calculations And Graphical Methods
  • The calculations and the graphical methods are
    related
  • The center of the peak of a normal curve is the
    mean, the median and the mode
  • Values are evenly spread out on either side of
    that high point

81
Calculations And Graphical Methods Cont
  • The width of the normal curve is related to the
    SD
  • The more dispersed the data, the higher the SD
    and the wider the normal curve
  • The exact relationship is in the text; we will
    not go into it this semester

82
Example Problem
  • A technician customarily performs a certain
    assay. The results of 8 typical assays are:
  • 32.0 mg 28.9 mg 23.4 mg 30.7 mg
  • 23.6 mg 21.5 mg 29.8 mg 27.4 mg
  • If the technician obtains a value of 18.1 mg,
    should he be concerned? Base your answer on
    estimation.
  • Perform statistical calculations to see if the
    value is outside the range of the mean plus or
    minus two SDs.

83
Answer
  • The average appears to be in the mid-twenties,
    and the values vary around it by about 5 mg.
    Therefore, 18.1 mg appears a bit low.
  • Mean = 27.16 mg, SD = 3.87 mg. The mean minus
    2 SD is 19.4 mg, so 18.1 mg is outside the
    range and should be investigated
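
The calculation in the answer can be reproduced with this Python sketch. It uses `statistics.stdev`, which divides by n - 1 (the sample SD); the variable names are illustrative.

```python
import statistics

# Results of the 8 typical assays, in mg.
assays_mg = [32.0, 28.9, 23.4, 30.7, 23.6, 21.5, 29.8, 27.4]

mean_mg = statistics.mean(assays_mg)
sd_mg = statistics.stdev(assays_mg)   # sample SD (divides by n - 1)

# Range of the mean plus or minus two SDs.
lower_mg = mean_mg - 2 * sd_mg
upper_mg = mean_mg + 2 * sd_mg

new_value_mg = 18.1
outside_range = not (lower_mg <= new_value_mg <= upper_mg)

print(f"mean = {mean_mg:.2f} mg, SD = {sd_mg:.2f} mg")
print(f"mean +/- 2 SD: {lower_mg:.1f} mg to {upper_mg:.1f} mg")
print("18.1 mg outside the range:", outside_range)
```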