Title: Introduction to Statistics
1Chapter 1
- Introduction to Statistics
- 1.1 An Overview of Statistics
- 1.2 Data Classification
- 1.3 Experimental Design
2Section 1.1 and 1.2
- An Overview of Statistics
- Classifying Data
- Critical Thinking
3What is Statistics?
- Statistics
- The science of collecting, organizing, analyzing,
and interpreting data in order to make decisions.
4Definitions
- Population: The collection of all outcomes,
responses, measurements, or counts that are of
interest.
- Sample: The collection of data from a subset of
the population.
- Census: The collection of data from every member
of the population.
5Example: Identify the population, and whether a
census or sample would be done.
- 1. HCC is doing a study on how many credit hours
an HCC student is taking.
- 2. HCC is doing a study on how many hours a week an
HCC student is working.
- 3. A fashion magazine gathers data on the price of
women's jeans.
6What is Data?
- Data
- The responses, counts, measurements, or
observations that have been collected.
- Data can be classified as one of 2 types:
1. Qualitative Data
2. Quantitative Data
Larson/Farber 4th ed.
7Qualitative Data
- Qualitative Data: Consists of non-numeric,
categorical attributes or labels
- Major
- Place of birth
- Eye color
- Common statistic calculated: percentages
8Quantitative Data
- Quantitative Data: Numerical measurements or
counts.
- Temperature
- Weight of a letter
- Age
- Common statistic calculated: averages
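The pairing of percentages with qualitative data and averages with quantitative data can be sketched in a few lines of Python. The eye colors and ages below are made-up values for illustration only, and `category_percentages` is a hypothetical helper name, not something from the text.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, invented for this illustration.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]  # qualitative
ages = [19, 22, 20, 35, 27, 21]                                    # quantitative

def category_percentages(values):
    """Return the percentage of observations falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {label: 100 * count / total for label, count in counts.items()}

# Qualitative data: summarize with percentages per category.
print(category_percentages(eye_colors))

# Quantitative data: summarize with an average.
print(mean(ages))
```

Averaging the eye colors would be meaningless, which is one practical reason to classify data before summarizing it.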
9Quantitative Data: Discrete vs. Continuous
- Discrete data: a finite or countable number of
possible data values (0, 1, 2, 3, 4, ...).
- Ex: Number of classes a student is taking
- Continuous data: an infinite number of possible data
values on a continuous scale.
- Ex: Weight of a baby
11Parameters and Statistics
- Parameter
- A number that describes some characteristic of an
entire population.
- Ex: Average age of all people in the United States
- Statistic
- A number that describes some characteristic from a
sample.
- Ex: Average age of people from a sample of three
states
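The distinction can be made concrete with a small Python sketch. The population here is randomly generated and purely hypothetical; the point is only that the parameter is computed from every member, while the statistic is computed from a sample and will usually differ a little from the parameter.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 ages (invented data, not real census figures).
population_ages = [random.randint(0, 90) for _ in range(10_000)]

# Parameter: describes the entire population.
parameter = mean(population_ages)

# Statistic: describes a sample of 100 members drawn from that population.
sample_ages = random.sample(population_ages, 100)
statistic = mean(sample_ages)

print(f"Population mean age (parameter): {parameter:.2f}")
print(f"Sample mean age (statistic):     {statistic:.2f}")
```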
12Example: Parameters vs. Statistics
Decide whether the numerical value describes a
population parameter or a sample statistic.
- The average credit load of all HCC full-time
students is 14.2 credit hours.
- A sample of 300 HCC full-time students
showed the average work hours a week is 18.3
hours.
- A Gallup poll of 1012 adults nationwide showed
34% owned a handgun.
13White House 2008 Republican Nomination
Pew Research Center for the People & the Press survey conducted by Princeton Survey Research Associates International. Dec. 19-30, 2007. N=471 registered voters nationwide who are Republicans or lean Republican. MoE ±5.
"I'm going to read you the names of some Republican presidential candidates. Which one of the following Republican candidates would be your first choice for president: [see below]?" If unsure: "Just as of today, would you say you lean toward [see below]?" (Names were rotated.)
Candidate        Percent
John McCain      22
Rudy Giuliani    20
Mike Huckabee    17
Mitt Romney      12
Fred Thompson    9
Ron Paul         4
Duncan Hunter    1
Other (vol.)     1
None (vol.)      2
Unsure           12
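As a rough check on the reported margin of error, the common textbook approximation for a proportion can be sketched in Python. This assumes a simple random sample and a 95% confidence level; the pollster's own method is not given in the slide and may include adjustments this sketch ignores. The function name `margin_of_error` is illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error, in percentage points, for a proportion.

    Assumes a simple random sample of size n and a 95% confidence
    level (z = 1.96). p = 0.5 is the most conservative choice because
    it maximizes p * (1 - p).
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(471)
print(f"Approximate margin of error for n = 471: {moe:.1f} percentage points")
```

Under these assumptions the result is about 4.5 percentage points, which is in line with the rounded figure of 5 reported with the poll.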
14Branches of Statistics
- Descriptive Statistics: Involves organizing,
summarizing, and displaying data. Describes the
important characteristics of the data, e.g.,
tables, charts, averages, percentages.
- Inferential Statistics: Involves using sample
data to draw conclusions or make inferences about
an entire population.
15Example: Descriptive and Inferential Statistics
- Decide which part of the study represents the
descriptive branch of statistics. What
conclusions might be drawn from the study using
inferential statistics?
A sample of Illinois adults showed that 22.7% of
those with a high school diploma were obese, and
16.7% of college graduates were obese. (Source:
Illinois BRFSS, 2004)
16Example: Descriptive and Inferential Statistics
- Decide which part of the study represents the
descriptive branch of statistics. What
conclusions might be drawn from the study using
inferential statistics?
A sample of 471 registered Republicans showed
that 22% would pick John McCain as the Republican
nominee for president. (Margin of error: ±5%.)
(Source: USA Today/CNN poll)
17Uses of Statistics
- Almost all fields of study benefit from the
application of statistical methods.
- Statistics often lead to change.
18Misuses of Statistics
- Bad Samples
- Small Samples
- Misleading Graphs
- Pictographs
- Loaded Questions
- Correlation vs. Causality
- Self Interest Study
19Misuse Bad Samples
- Samples must be unbiased and fairly represent the
entire population.
- If the data is not collected appropriately, the
data may be completely useless. Garbage in,
garbage out.
- Voluntary response sample: Respondents
themselves decide whether to be included in the
sample.
- Ex. Online surveys
- Ex. Ratemyprofessor.com
20Misuse Misleading Graphs
21CNN/USA Today Gallup poll on Terri Schiavo (March
2005)
22CNN/USA Today Gallup poll on Terri Schiavo
(March 2005) Reprinted
23Misuse Pictographs
24Misuse Loaded Questions
- Should the President have the line item veto to
eliminate waste? (97% said yes)
- Should the President have the line item
veto? (57% said yes)
25Misuse Loaded Questions
26Misuse Correlation does not imply Cause and
Effect
28Misuses Self Interest and Deliberate Distortions
30Section 1.3
31Designing a Statistical Study
- What is it you want to study?
- What is the population to gather data from?
- Collect data. If you use a sample, it must be
representative of the population.
- Descriptive Statistics: organize, present,
summarize data
- Inferential Statistics: draw conclusions about
the population based on sample data
32Things to Consider with Samples
- The sample must be unbiased and fairly represent
the entire population.
- If the data is not collected appropriately, the
data may be completely useless. Garbage in,
garbage out.
- Want the maximum information at the minimum cost.
What sample size is needed?
33Methods of Collecting Data
- Observational study
- Survey
- Experiment
- Simulation
34Methods of Collecting Data
- Observational study
- A researcher observes or measures characteristics
of interest of part of a population but does not
change any existing conditions.
- Experiment
- A treatment is applied to part of a population
and responses are observed.
35Methods of Collecting Data
- Survey
- An investigation of one or more characteristics
of a population, usually by asking people
questions.
- Commonly done by interview, mail, or telephone.
- Simulation
- Uses a mathematical or physical model to
reproduce the conditions of a situation or
process. Often involves the use of computers.
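A very small computer simulation might look like the following Python sketch. The dice scenario is a made-up illustration, not an example from the text: a random number generator stands in for physically rolling two dice many times, and the simulated result is compared with the known exact probability of 1/6.

```python
import random

def estimate_probability_of_seven(trials, rng):
    """Estimate P(sum of two fair dice == 7) by simulating the rolls."""
    hits = 0
    for _ in range(trials):
        # Each randint call models one roll of a fair six-sided die.
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

rng = random.Random(42)  # fixed seed so the sketch is repeatable
estimate = estimate_probability_of_seven(100_000, rng)
print(f"Simulated: {estimate:.3f}, exact: {1 / 6:.3f}")
```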
36Example: Methods of Data Collection
- Consider the following studies. Which method of
data collection would you use to collect data for
each study?
- A study of salaries of NFL players.
- A study of the emergency response times during a
terrorist attack.
- A study of whether changing teaching techniques
improves FCAT scores.
- A study of whether Tampa residents support a mass
transit system.
37Sampling Techniques
- Random versus Non-Random Samples
- Convenience Samples
- Simple Random Samples
- Systematic Samples
- Cluster Samples
38Random and Non-Random Sampling
- Random Sampling
- Every member of the population has an equal
chance of being selected.
- Non-Random Sampling
- Some members of the population have no chance of
being picked. Often leads to biased samples.
39Convenience Samples
- Data is collected that is readily available and
easy to get.
- Self-selected surveys or voluntary response
surveys (online surveys, magazine surveys,
1-800-Verdict, Ratemyprofessor.com)
40Simple Random Sample
- A random sample where every member of the
population and every group of the same size has
an equal chance of being selected.
- Usually involves using a random number generator.
41Simple Random Sampling
- Number each element of the population from 1 to
N.
- Use a random number generator (table, calculator,
computer) to randomly select a sample of size
n.
- TI-83/84: randInt(1,N,n), or
- Table 1 in text. Pick a random start.
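On a computer, the same procedure can be sketched in Python. The values of N and n are arbitrary choices for illustration. Note one difference from the calculator approach: `random.sample` draws without replacement, so no member is picked twice, whereas a list of independent random integers can contain repeats that have to be discarded.

```python
import random

N = 500  # population size (hypothetical)
n = 10   # desired sample size (hypothetical)

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Members are numbered 1..N; draw n distinct labels, each equally likely.
selected = rng.sample(range(1, N + 1), n)

print(sorted(selected))
```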
42Systematic Sampling
- Choose a starting value at random. Then
choose every kth member of the population.
- Example: Select every 3rd patient who enters the
ER.
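A minimal Python sketch of the idea, assuming the population is available as an ordered list. The patient numbers are placeholders and `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then every kth member."""
    start = rng.randrange(k)  # random index in 0..k-1
    return population[start::k]

rng = random.Random(0)        # fixed seed so the sketch is repeatable
patients = list(range(1, 31)) # hypothetical patients numbered in arrival order
sample = systematic_sample(patients, 3, rng)

print(sample)
```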
43Stratified Sampling
- Divide a population into at least 2 different
subgroups (strata) whose members share the same
characteristics (age, gender, ethnicity, income,
etc.) and select a random sample from each group.
- Advantage: More information
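One way to sketch this in Python is to group the records by stratum and then draw a simple random sample inside each group. The student records, the class-year strata, and the helper name `stratified_sample` are all made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, per_stratum, rng):
    """Draw a simple random sample of up to per_stratum records from every stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)  # group records by stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical students tagged with a class year (the stratum).
years = ["freshman", "sophomore"] * 10
students = [(f"student{i}", year) for i, year in enumerate(years)]

rng = random.Random(5)  # fixed seed so the sketch is repeatable
sample = stratified_sample(students, lambda s: s[1], 3, rng)
print(sample)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of the whole population does not promise.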
44Cluster Sampling
- Divide the population into many like subgroups
(clusters), randomly select some of those
clusters, and then select all of the members of
those clusters to be in the sample.
- Advantage: Useful for geographically separated
populations
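A short Python sketch of cluster sampling, assuming the population is already organized as a mapping from cluster name to members. The dormitory names, residents, and the helper name `cluster_sample` are invented for illustration.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

# Hypothetical clusters: residents grouped by dormitory.
dorms = {
    "Dorm A": ["a1", "a2", "a3"],
    "Dorm B": ["b1", "b2"],
    "Dorm C": ["c1", "c2", "c3", "c4"],
    "Dorm D": ["d1", "d2", "d3"],
}

rng = random.Random(11)  # fixed seed so the sketch is repeatable
chosen, sample = cluster_sample(dorms, 2, rng)
print(chosen, sample)
```

Unlike stratified sampling, only some groups are visited, but everyone inside a visited group is included.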
45Sources of Error in Sampling
- Sampling Error
- The expected difference between a sample result
and the true population result (e.g.,
margin of error).
- Non-Sampling Error
- Sample data is incorrectly gathered, collected,
or recorded.
- Selection Bias: bad sample
- Response Bias: bad data (incorrect responses,
inaccurate measurements, etc.)
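Sampling error can be illustrated with a small simulation. This Python sketch uses a randomly generated, hypothetical population and measures how far sample means typically land from the true population mean for two sample sizes. It models sampling error only; non-sampling errors such as selection or response bias are not reduced by taking a larger sample.

```python
import random
from statistics import mean

def typical_sampling_error(population, n, repeats, rng):
    """Average absolute gap between a sample mean and the population mean."""
    true_mean = mean(population)
    errors = [
        abs(mean(rng.sample(population, n)) - true_mean)
        for _ in range(repeats)
    ]
    return mean(errors)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
population = [rng.uniform(0, 100) for _ in range(5000)]  # invented data

small = typical_sampling_error(population, 25, 200, rng)
large = typical_sampling_error(population, 400, 200, rng)

print(f"Typical error with n = 25:  {small:.2f}")
print(f"Typical error with n = 400: {large:.2f}")
```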