What is statistics PowerPoint PPT Presentation

Title: What is statistics


1
What is statistics
  • Statistical training is necessary and important
    for many reasons. In almost any area of work, you
    must be able to read, interpret, and apply the
    results of a statistical analysis of research
    data.
  • What is Statistics all about? Statistics
    involves the collection, organization,
    interpretation and presentation of data.
  • Studying statistics will help you understand
    the information and reach correct conclusions.

2
Example 1.1: The nutrition chart (for sandwiches) on the
McDonald's menu
3
Chapter 1 Population and Sample
  • An experimental unit (or subject) is the smallest
    entity that is of interest in a statistical
    study.
  • A variable is any characteristic that can be
    measured on each experimental unit in a
    statistical study.
  • An observation is a value that the variable
    assumes for a single unit.
  • The collection of observations assumed by the
    variables in the study is called a data set.
  • For Example 1.1:
  • Experimental unit: a McDonald's menu item
  • Variables: calories, protein, carbohydrates,
    total fat, saturated fat,
    cholesterol, and sodium
  • Population: all the items on the McDonald's menu

4
  • The population is the collection of all objects
    or items that are of interest in a statistical
    study. The individual objects in the population
    are the experimental units or subjects.
  • A sample is a finite portion (subset) of the
    population that is used to study the
    characteristics of concern in the population. The
    number of objects in the sample is called the
    sample size.
  • Bias is a systematic tendency of the sample to
    misrepresent the population.
  • A simple random sample (s.r.s.) of size n consists
    of n elements chosen from the population in such
    a way that all samples of that size have the same
    chance of being selected.

5
For example (lottery sampling):
  • 100 balls are mixed thoroughly in a bag. Draw 10
    balls randomly from the bag. Try this twice:
    once with replacement and once without.
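
The lottery draw can be imitated in a few lines of Python. This is only a sketch: numbering the balls 1 to 100 and fixing the random seed are assumptions made for illustration, not part of the slide.

```python
import random

# Hypothetical setup: label the 100 balls 1..100 (the slide does not number them).
balls = list(range(1, 101))
rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable

# Without replacement: a drawn ball stays out of the bag, so no ball can repeat.
without_replacement = rng.sample(balls, k=10)

# With replacement: each ball goes back before the next draw, so repeats are possible.
with_replacement = rng.choices(balls, k=10)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Drawing without replacement with `sample` gives every set of 10 balls the same chance of being chosen, which is the defining property of a simple random sample.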

6
  • Example 1.2 Now we are interested in the heights
    of MSU students. We measured the heights (and/or
    weight, gender) of all the students and recorded
    them as X1, X2, ... Randomly select 50
    students and measure their heights. Explain the
    concepts above.
  • Population: all MSU students
  • Experimental unit: an MSU student
  • Variables: height (and/or weight, gender)
  • Observations: any one of X1, X2, ...
  • Data set: X1, X2, ...
  • Sample: the 50 selected students

7
  • A census is a sample consisting of the entire
    population.
  • Why don't we always do a census?
  • Time: a census is time consuming.
  • Cost: a census costs more.
  • Inaccessible population units: for example,
    studying all the sunfish in Lake Michigan.
  • Destructive testing: the test destroys the unit.

8
Statistics terms
  • experimental unit, subject
  • variable
  • population
  • sample
  • census
  • s.r.s. (simple random sample)

9
Exercise 1.1
  • A stock market investor is interested in oil
    stocks. She collects last year's price/earnings
    ratios on ten randomly selected oil stocks. The
    ratios are X1, X2, ..., X10.
  • What is our population?
  • What is the variable? Give one observation.
  • What is our sample and sample size?
  • What is our data set?

10
Exercise 1.2
  • We want to know the average height of 2nd grade
    students in East Lansing Public Schools. Instead
    of measuring all 2nd grade students, we randomly
    sample 60 of them. The heights are
    X1, X2, ..., X60.
  • What is our population?
  • What is the variable? Give one observation.
  • What is our sample and sample size?
  • What is our data set?

11
Chapter 2 Univariate Data
Data set for seven undergraduate students (Table 2.1)
12
Data Sets
  • A univariate data set is a data set in which one
    measurement (variable) has been made on each
    experimental unit.
  • A bivariate data set is a data set in which two
    measurements (variables) have been made on each
    experimental unit.
  • A multivariate data set is a data set in which
    several measurements (variables) have been made
    on each experimental unit.

13
Types of Variables
  • A categorical variable (also called a qualitative
    variable) is a variable whose values are
    classifications or categories.
  • For example, Gender: male, female
  • Occupation: student, doctor,
    teacher, ...

14
  • A numerical variable (also called a quantitative
    or measurement variable) is a variable whose
    values are numbers obtained by a count or
    measurement. For example: weight, height.
  • 1. A discrete variable is a numerical variable
    that can assume a finite number, or at most a
    countably infinite number, of values.
  • 2. A continuous variable is a numerical
    variable that can take any number on an interval
    of the real number line. For example, height,
    weight.

15
Remark
  • Coding a categorical variable does not make it
    numerical.
  • For example, Gender:
  • Male -- 0, Female -- 1

16
Types of Data
  • Numerical (Quantitative)
      • Discrete: Can only take on certain values in
        an interval.
      • Continuous: Can take on infinitely many values
        in an interval.
  • Categorical (Qualitative)
17
Exercise 2.1 Classify the following as
categorical or numerical (discrete or
continuous).
  • a. Age of freshmen at MSU
  • b. Faculty rank
  • c. Weight of newborn babies
  • d. Murder rate in a major city
  • e. Number of children in a family
  • f. Brand of television set

18
Displaying a categorical variable
  • frequency table
  • bar chart (bar graph)
  • pie chart

19
Frequency table
  • A table that lists the different categories of
    categorical data and the corresponding
    frequencies (or relative frequencies) with which
    they occur.
  • A class is one of the categories into which
    the qualitative data are classified.
  • Class frequency is the number of observations in
    each class.
  • Class relative frequency is the class frequency
    (the number of observations in the class) divided
    by the total number of observations.
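
The definitions above can be turned into a small frequency table. The sketch below uses a made-up list of category labels (A, B, C) purely for illustration; it is not data from the slides.

```python
from collections import Counter

# Made-up categorical observations, for illustration only (not from the slides).
observations = ["A", "B", "A", "C", "A", "B", "A", "B"]

frequency = Counter(observations)   # class -> class frequency
total = sum(frequency.values())     # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = {cls: count / total for cls, count in frequency.items()}

print(f"{'class':<8}{'freq':>6}{'rel. freq':>11}")
for cls, count in frequency.most_common():
    print(f"{cls:<8}{count:>6}{relative[cls]:>11.3f}")
```

The relative frequencies always add up to 1, which is a quick check that no observation was dropped.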

20
Example 2.2 Time/CNN telephone poll of 500 adult
Americans: "Has the amount of crime in your
community increased in the past 5 years?"

21
Bar Chart
  • A bar chart is a picture consisting of a horizontal
    and a vertical axis, with rectangles that represent
    the frequency (or relative frequency) of the
    categories of a variable.

22
Bar chart of CNN telephone poll
23
Pie Chart
  • A circle or pie is divided into pieces
    corresponding to the categories of the variable
    so that the size of the slice is proportional to
    the relative frequency of the category.

24
Pie Chart
25
Exercise 2.3 A sample of 100 MSU students was asked
for their class level. Here are the data:
Fr 12, So 24, Jr 32, Sr 24, Gr 8.
Draw a bar chart of the sample data.
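
One possible way to check a hand-drawn chart for this exercise is a rough text bar chart. The counts below are the ones given in the exercise; the scale of one `#` per two students is an arbitrary choice made here to keep the lines short.

```python
# Class-level counts as given in Exercise 2.3.
counts = {"Fr": 12, "So": 24, "Jr": 32, "Sr": 24, "Gr": 8}
total = sum(counts.values())  # 100 students in the sample

# Relative frequency of each class level.
relative = {level: n / total for level, n in counts.items()}

# Text bar chart: one '#' per 2 students (assumed scale, for display only).
for level, n in counts.items():
    print(f"{level:<3}{'#' * (n // 2):<17}{n:>3}  ({relative[level]:.2f})")
```

Because the sample size is 100, each count is also the percentage of the sample in that class level.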
26
Displaying a numerical variable
  • Example 2.3 The cholesterol levels from a sample
    of 62 subjects from the Framingham Heart Study:
  • 393 353 334 336 327 300 300 308 283 285
    270 270 272
  • 278 278 263 264 267 267 267 268 254 254
    254 256 256
  • 258 240 243 246 247 248 230 230 230 230
    231 232 232
  • 232 234 234 236 236 238 220 225 225 226
    210 211 212
  • 215 216 217 218 200 202 192 198 184 167

27
Displaying a numerical variable
  • dot plot (OK for small data sets)
  • stem-and-leaf plot
  • grouped frequency table
  • histogram

28
Example 2.4
  • A psychologist wishes to test a new method to
    improve rote memorization by college students. A
    sample of 20 college students were taught by this
    method and then asked to memorize a list of 100
    word phrases. The following numbers of correct
    word phrases were recorded for the 20 students.
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70

29
Dotplot
84 59 82 78 74 96 44 76 85 66 77 91
62 54 72 65 84 38 76 70
  • Distribution of a variable specifies the distinct
    values that the variable assumes and how often
    these values occur.
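
As a rough illustration, the distribution of the 20 scores from Example 2.4 can be tabulated and drawn as a text dot plot. This is a sketch of one way to do it, not the plot from the slide.

```python
from collections import Counter

# The 20 scores from Example 2.4.
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

distribution = Counter(scores)  # distinct value -> how often it occurs

# Text dot plot: one row per distinct value, one dot per occurrence.
for value in sorted(distribution):
    print(f"{value:>3} {'.' * distribution[value]}")
```

Only 76 and 84 occur twice; every other score occurs once.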

30
Stem and leaf Plot
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 1. Choose the leading digit as the stem and
    the trailing digit or digits as the leaves.

31
Stem and leaf Plot.
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 2. List the stems in a column and record the
    leaves.
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8 4
  • 9

32
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 3. Fill in the stem-and-leaf plot
  • 3 8
  • 4 4
  • 5 9 4
  • 6 6 2 5
  • 7 8 4 6 7 2 6 0
  • 8 4 2 5 4
  • 9 6 1

33
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 4. Reorder the leaves of each stem
  • 3 8
  • 4 4
  • 5 4 9
  • 6 2 5 6
  • 7 0 2 4 6 6 7 8
  • 8 2 4 4 5
  • 9 1 6
  • Step 5. Indicate the units for the stem and leaf
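
The five steps can be sketched in Python for the 20 scores of Example 2.4. The function name `stem_and_leaf` is a hypothetical helper introduced here; it assumes two-digit whole numbers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# The 20 scores from Example 2.4.
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted list of units digits (leaves)."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # Step 1: split into stem and leaf
        plot[stem].append(leaf)      # Steps 2-3: record the leaf next to its stem
    # Step 4: order the stems and the leaves within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(map(str, leaves)))
print("Stem unit: 10, leaf unit: 1")  # Step 5: state the units
```

A stem with no observations would simply be missing from this output; here every stem from 3 to 9 has at least one leaf.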

34
Example 2.5
  • The following is the concentration of mercury in
    25 lake trout caught in a major lake
  • 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2
    3.7
  • 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3
  • 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4
  • Exercise: create a stem-and-leaf plot for these data.

35
Stem and leaf plot
  • 1 4 7 8 9
  • 2 2 2 3 4 6 6 7 8 9
  • 3 0 0 0 1 2 3 4 4 5 6 7 8
  • Unit of leaf: 0.1

36
Double-stem stem and leaf plot
  • 1 4
  • 1 8 9 7
  • 2 2 2 3 4
  • 2 6 8 7 6 9
  • 3 4 0 2 0 3 0 4 1
  • 3 8 7 6 5

37
Ordered double-stem stem and leaf plot
  • 1 4
  • 1 7 8 9
  • 2 2 2 3 4
  • 2 6 6 7 8 9
  • 3 0 0 0 1 2 3 4 4
  • 3 5 6 7 8
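
A double-stem plot for the mercury data of Example 2.5 can be sketched as follows. The splitting rule used here (leaves 0 to 4 on the first row of a stem, leaves 5 to 9 on the second) is inferred from the plots on the slides, and `double_stem` is a hypothetical helper name.

```python
# Mercury concentrations of the 25 lake trout from Example 2.5.
mercury = [2.2, 3.4, 3.0, 2.6, 3.8, 1.8, 2.8, 3.2, 3.7,
           1.4, 2.7, 3.6, 1.9, 2.2, 3.0, 3.3, 2.3,
           1.7, 2.6, 3.5, 3.0, 2.9, 3.4, 3.1, 2.4]

def double_stem(values):
    """Map (stem, half) to sorted leaves; half 0 holds leaves 0-4, half 1 holds 5-9."""
    rows = {}
    for v in values:
        tenths = round(v * 10)           # work in whole tenths to avoid float issues
        stem, leaf = divmod(tenths, 10)
        half = 0 if leaf <= 4 else 1     # assumed splitting rule
        rows.setdefault((stem, half), []).append(leaf)
    return {key: sorted(leaves) for key, leaves in sorted(rows.items())}

rows = double_stem(mercury)
for (stem, _half), leaves in rows.items():
    print(stem, "|", " ".join(map(str, leaves)))
print("Unit of leaf: 0.1")
```

Splitting each stem in two spreads the 25 observations over six rows instead of three, which makes the shape of the distribution easier to see.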