Title: What is statistics
1What is statistics
- Statistical training is necessary and important
for many reasons. In almost any area of work, you
must be able to read, interpret, and apply the
results of a statistical analysis of research
data.
- What is statistics all about? Statistics
involves the collection, organization,
interpretation, and presentation of data.
- Studying statistics will help you understand
the information and reach correct conclusions.
2Example 1.1 The nutrition chart (for sandwiches) on
the McDonald's menu
3Chapter 1 Population and Sample
- An experimental unit (or subject) is the smallest
entity that is of interest in a statistical study.
- A variable is any characteristic that can be
measured on each experimental unit in a statistical study.
- An observation is a value that the variable
assumes for a single unit.
- The collection of observations assumed by the
variables in the study is called a data set.
- For Example 1.1:
- Experimental unit: a McDonald's menu item
- Variables: calories, protein, carbohydrates,
total fat, saturated fat, cholesterol, and sodium
- Population: all the items on the McDonald's menu
4- The population is the collection of all objects
or items that are of interest in a statistical
study. The individual objects in the population
are the experimental units or subjects.
- A sample is a finite portion (subset) of the
population that is used to study the
characteristics of concern in the population. The
number of objects in the sample is called the sample size.
- Bias is a systematic tendency of the sample to
misrepresent the population.
- A simple random sample (s.r.s.) of size n consists
of n elements chosen from the population in such
a way that all samples of that size have the same
chance of being selected.
5For example (lottery sampling):
- 100 balls are mixed thoroughly in a bag. Draw 10
balls at random from the bag. Try it twice: once
with replacement and once without replacement.
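
The lottery draw above can be imitated on a computer. The following is a minimal sketch, assuming the balls are numbered 1 to 100; the function name `lottery_draws` and the `seed` parameter are illustrative choices, not part of the original slides.

```python
import random

def lottery_draws(n_balls=100, n_draws=10, seed=None):
    """Draw n_draws balls from a bag numbered 1..n_balls, both ways."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    bag = list(range(1, n_balls + 1))
    # Without replacement: a drawn ball is not put back, so no repeats.
    # Every subset of size n_draws is equally likely (a simple random sample).
    without_replacement = rng.sample(bag, n_draws)
    # With replacement: each draw is made from the full bag, so repeats can occur.
    with_replacement = rng.choices(bag, k=n_draws)
    return without_replacement, with_replacement

without_replacement, with_replacement = lottery_draws(seed=2024)
print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Running it several times with different seeds shows that the draw without replacement never repeats a ball, while the draw with replacement sometimes does.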
6- Example 1.2 We are interested in the heights
of MSU students. Suppose the heights (and/or
weight, gender) of all the students are recorded
as X1, X2, .... We randomly select 50
students and measure their heights. Explain the
concepts above.
- Population: all MSU students
- Experimental unit: an MSU student
- Variables: height (and/or weight, gender)
- Observations: any one of X1, X2, ...
- Data set: X1, X2, ...
- Sample: the 50 selected students
7- A census is a sample consisting of the entire
population.
- Why don't we always do a census?
- Time: a census is time consuming.
- Cost: a census costs more.
- Inaccessible population units: for example, studying all the
sunfish in Lake Michigan.
- Destructive testing: taking the measurement destroys the unit.
8Statistics terms
- experimental unit, subject
- variable
- population
- sample
- census
- s.r.s. (simple random sample)
9Exercise 1.1
- A stock market investor is interested in oil
stocks. She collects last year's price/earnings
ratios on ten randomly selected oil stocks. The
ratios are X1, X2, ..., X10.
- What is our population?
- What is the variable? Give one observation.
- What is our sample and sample size?
- What is our data set?
10Exercise 1.2
- We want to know the average height of 2nd grade
students in East Lansing public schools. Instead
of measuring all 2nd grade students, we randomly
sample 60 of them. The heights
are X1, X2, ..., X60.
- What is our population?
- What is the variable? Give one observation.
- What is our sample and sample size?
- What is our data set?
11Chapter 2 Univariate Data
Data set for seven undergraduate students (Table 2.1)
12Data Sets
- A univariate data set is a data set in which one
measurement (variable) has been made on each
experimental unit.
- A bivariate data set is a data set in which two
measurements (variables) have been made on each
experimental unit.
- A multivariate data set is a data set in which
several measurements (variables) have been made
on each experimental unit.
13Types of Variables
- A categorical variable (also called a qualitative
variable) is a variable whose values are
classifications or categories.
- For example, Gender: male, female
- Occupation: student, doctor, teacher, ...
14- A numerical variable (also called a quantitative
or measurement variable) is a variable whose values
are numbers obtained by a count or measurement.
For example: weight, height.
- 1. A discrete variable is a numerical variable
that can assume a finite number, or at most a
countably infinite number, of values. For example,
the number of children in a family.
- 2. A continuous variable is a numerical
variable that can take any value in an interval
of the real number line. For example: height,
weight.
15Remark
- Coding a categorical variable with numbers does not make it
numerical.
- For example, Gender: Male -> 0, Female -> 1
16Types of Data
- Numerical (Quantitative)
  - Discrete: can only take on certain values in an interval.
  - Continuous: can take on infinitely many values in an interval.
- Categorical (Qualitative)
17Exercise 2.1 Classify the following as
categorical or numerical (discrete or
continuous).
- a. Age of freshmen at MSU
- b. Faculty rank
- c. Weight of newborn babies
- d. Murder rate in a major city
- e. Number of children in a family
- f. Brand of television set
18Displaying a categorical variable
- frequency table
- bar chart (bar graph)
- pie chart
19Frequency table
- A frequency table lists the different categories of
categorical data and the corresponding
frequencies (or relative frequencies) with which
they occur.
- A class is one of the categories into which
the qualitative data are classified.
- Class frequency is the number of observations in
each class.
- Class relative frequency is the class frequency
divided by the total number of observations:
(observations in the class) / (total number of observations).
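
The definitions above can be turned into a short computation. This is a sketch only; it uses the class-level counts given in Exercise 2.3 (Fr 12, So 24, Jr 32, Sr 24, Gr 8) as raw observations, and the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

def frequency_table(observations):
    """Map each class to (class frequency, class relative frequency)."""
    counts = Counter(observations)   # class frequency
    total = sum(counts.values())     # total number of observations
    return {cls: (freq, freq / total) for cls, freq in counts.items()}

# One label per sampled student, rebuilt from the counts in Exercise 2.3.
levels = ["Fr"] * 12 + ["So"] * 24 + ["Jr"] * 32 + ["Sr"] * 24 + ["Gr"] * 8

table = frequency_table(levels)
print("Class  Freq  Rel. freq")
for cls, (freq, rel) in table.items():
    print(f"{cls:<6} {freq:>4}  {rel:.2f}")
```

The relative frequencies always add up to 1, which is a quick check on any frequency table.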
20Example 2.2 Time/CNN telephone poll of 500 adult
Americans: "Has the amount of crime in your
community increased in the past 5 years?"
21Bar Chart
- A bar chart is a picture consisting of horizontal and
vertical axes with rectangles whose heights represent the
frequency (or relative frequency) of the categories
of a variable.
22Bar chart of CNN telephone poll
23Pie Chart
- A circle or pie is divided into pieces
corresponding to the categories of the variable
so that the size of the slice is proportional to
the relative frequency of the category.
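
Because a full circle is 360 degrees, "proportional to the relative frequency" means each slice spans (relative frequency) x 360 degrees. A small sketch of that arithmetic, again using the counts from Exercise 2.3; the function name `pie_slice_angles` is hypothetical.

```python
def pie_slice_angles(frequencies):
    """Return the central angle (in degrees) of each pie slice."""
    total = sum(frequencies.values())
    # angle = relative frequency * 360
    return {cls: 360 * freq / total for cls, freq in frequencies.items()}

counts = {"Fr": 12, "So": 24, "Jr": 32, "Sr": 24, "Gr": 8}
angles = pie_slice_angles(counts)
for cls, angle in angles.items():
    print(f"{cls}: {angle:.1f} degrees")
```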
24Pie Chart
25Exercise 2.3 A sample of 100 MSU students was
taken to record their class level. Here are
the data: Fr 12, So 24, Jr 32, Sr 24, Gr 8. Draw
a bar chart of the sample data.
26Displaying a numerical variable
- Example 2.3 The cholesterol levels from a sample
of 62 subjects from the Framingham Heart Study
- 393 353 334 336 327 300 300 308 283 285 270 270 272
- 278 278 263 264 267 267 267 268 254 254 254 256 256
- 258 240 243 246 247 248 230 230 230 230 231 232 232
- 232 234 234 236 236 238 220 225 225 226 210 211 212
- 215 216 217 218 200 202 192 198 184 167
27Displaying a numerical variable
- dot plot (suitable for a small data set)
- stem-and-leaf plot
- grouped frequency table
- histogram
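
As an illustration of the last two displays, here is a sketch of a grouped frequency table and a text histogram for the cholesterol data of Example 2.3. The class width of 50 and the starting point of 150 are assumptions made for this sketch; the slides do not specify the classes.

```python
# Cholesterol levels of the 62 subjects in Example 2.3.
cholesterol = [
    393, 353, 334, 336, 327, 300, 300, 308, 283, 285, 270, 270, 272,
    278, 278, 263, 264, 267, 267, 267, 268, 254, 254, 254, 256, 256,
    258, 240, 243, 246, 247, 248, 230, 230, 230, 230, 231, 232, 232,
    232, 234, 234, 236, 236, 238, 220, 225, 225, 226, 210, 211, 212,
    215, 216, 217, 218, 200, 202, 192, 198, 184, 167,
]

START, WIDTH = 150, 50  # assumed class limits: 150-199, 200-249, ...

def grouped_frequency(values, start, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

table = grouped_frequency(cholesterol, START, WIDTH)
for lower, freq in table.items():
    # One star per observation gives a sideways histogram.
    print(f"{lower}-{lower + WIDTH - 1}  {freq:>2}  {'*' * freq}")
```

A different class width would change the shape of the histogram, so in practice a few widths are usually tried.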
28Example 2.4
- A psychologist wishes to test a new method to
improve rote memorization by college students. A
sample of 20 college students was taught by this
method and then asked to memorize a list of 100
word phrases. The following numbers of correct
word phrases were recorded for the 20 students.
- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
29Dotplot
- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- The distribution of a variable specifies the distinct
values that the variable assumes and how often
these values occur.
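
The distribution of the Example 2.4 scores can be tabulated directly, and a rough text dot plot follows from it. This is a minimal sketch; the function names `distribution` and `text_dotplot` are illustrative, and the plot lists only the values that occur rather than a full number line.

```python
from collections import Counter

# Number of correct word phrases for the 20 students in Example 2.4.
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

def distribution(values):
    """Distinct values and how often each occurs, in increasing order."""
    return dict(sorted(Counter(values).items()))

def text_dotplot(values):
    """One line per distinct value, one dot per occurrence."""
    return "\n".join(f"{value:>3} {'.' * freq}"
                     for value, freq in distribution(values).items())

dist = distribution(scores)
print(text_dotplot(scores))
```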
30Stem and leaf Plot
- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 1. Choose the leading digit as the stem and the
trailing digit or digits as the leaves.
31Stem and leaf Plot
- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 2. List the stems in a column and record the
leaves, starting with the first observation (84).
- 3 |
- 4 |
- 5 |
- 6 |
- 7 |
- 8 | 4
- 9 |
32- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 3. Fill in the stem and leaf plot.
- 3 | 8
- 4 | 4
- 5 | 9 4
- 6 | 6 2 5
- 7 | 8 4 6 7 2 6 0
- 8 | 4 2 5 4
- 9 | 6 1
33- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 4. Reorder the leaves of each stem.
- 3 | 8
- 4 | 4
- 5 | 4 9
- 6 | 2 5 6
- 7 | 0 2 4 6 6 7 8
- 8 | 2 4 4 5
- 9 | 1 6
- Step 5. Indicate the units for the stems and leaves
(here the stem unit is 10 and the leaf unit is 1).
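
The five steps can be sketched in code as follows. This is one possible implementation under the assumption that the data are whole numbers with the last digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

def stem_and_leaf(values):
    """Return {stem: sorted leaves}; stem = tens and above, leaf = last digit."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # Step 1: split into stem and leaf
        rows[stem].append(leaf)      # Steps 2-3: record the leaf on its stem
    # Step 4: order the leaves; include every stem in the range, even if empty.
    return {s: sorted(rows.get(s, [])) for s in range(min(rows), max(rows) + 1)}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Stem unit: 10, leaf unit: 1")  # Step 5: state the units
```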
34Example 2.5
- The following are the concentrations of mercury in
25 lake trout caught in a major lake.
- 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2 3.7
- 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3
- 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4
- Exercise: create a stem-and-leaf plot for these data.
35Stem and leaf plot
- 1 | 4 7 8 9
- 2 | 2 2 3 4 6 6 7 8 9
- 3 | 0 0 0 1 2 3 4 4 5 6 7 8
- Leaf unit: 0.1
36Double-stem stem and leaf plot
- 1 | 4
- 1 | 8 9 7
- 2 | 2 2 3 4
- 2 | 6 8 7 6 9
- 3 | 4 0 2 0 3 0 4 1
- 3 | 8 7 6 5
37Ordered double-stem stem and leaf plot
- 1 | 4
- 1 | 7 8 9
- 2 | 2 2 3 4
- 2 | 5-9 leaves: 6 6 7 8 9
- 3 | 0 0 0 1 2 3 4 4
- 3 | 5 6 7 8
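
A double-stem plot splits each stem into two rows: leaves 0 to 4 on the first and leaves 5 to 9 on the second. The sketch below applies this to the mercury data of Example 2.5, assuming one decimal place so that the leaf unit is 0.1; the function name `double_stem_and_leaf` is illustrative. Note that 2.2 occurs twice in the data, so the lower row for stem 2 carries the leaves 2 2 3 4.

```python
# Mercury concentrations in the 25 lake trout of Example 2.5.
mercury = [2.2, 3.4, 3.0, 2.6, 3.8, 1.8, 2.8, 3.2, 3.7,
           1.4, 2.7, 3.6, 1.9, 2.2, 3.0, 3.3, 2.3,
           1.7, 2.6, 3.5, 3.0, 2.9, 3.4, 3.1, 2.4]

def double_stem_and_leaf(values):
    """Return {(stem, is_upper_half): sorted leaves} with leaf unit 0.1."""
    rows = {}
    for v in values:
        # round() guards against floating-point error such as 1.4 * 10.
        stem, leaf = divmod(round(v * 10), 10)
        rows.setdefault((stem, leaf >= 5), []).append(leaf)
    return {key: sorted(leaves) for key, leaves in sorted(rows.items())}

plot = double_stem_and_leaf(mercury)
for (stem, _), leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Leaf unit: 0.1")
```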