
Chapter 1

- Introduction to Statistics

- 1.1 An Overview of Statistics
- 1.2 Data Classification
- 1.3 Experimental Design

Section 1.1 and 1.2

- An Overview of Statistics
- Classifying Data
- Critical Thinking

What is Statistics?

- Statistics
- The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

Definitions

Population: The collection of all outcomes, responses, measurements, or counts that are of interest.

Sample: The collection of data from a subset of the population.

Census: The collection of data from every member of the population.

Example: Identify the population, and whether a census or a sample would be done.

- 1. HCC is doing a study on how many credit hours an HCC student is taking.
- 2. HCC is doing a study on how many hours a week an HCC student is working.
- 3. A fashion magazine gathers data on the price of women's jeans.

What is Data?

- Data
- The responses, counts, measurements, or observations that have been collected.
- Data can be classified as one of two types:

1. Qualitative Data
2. Quantitative Data


Larson/Farber 4th ed.

Qualitative Data

- Qualitative data: Consists of non-numeric, categorical attributes or labels.

Examples:
- Major
- Place of birth
- Eye color

Common statistic calculated: percentages
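Because qualitative data are labels rather than numbers, averaging them is meaningless; the usual summary is the percentage of observations in each category. A minimal Python sketch, assuming a small made-up list of eye-color responses (hypothetical data, not from any study):

```python
from collections import Counter

# Hypothetical eye-color responses (illustrative data, not from the source)
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

def category_percentages(values):
    """Return {category: percent of all observations} for qualitative data."""
    counts = Counter(values)          # number of observations in each category
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

percentages = category_percentages(eye_colors)
for category, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {pct:.1f}%")
```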

Quantitative Data

- Quantitative data: Numerical measurements or counts.

Examples:
- Temperature
- Weight of a letter
- Age

Common statistic calculated: averages
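For quantitative data, the average (arithmetic mean) is the sum of the values divided by how many values there are. The short Python sketch below assumes five hypothetical letter weights in ounces and computes the average both by hand and with the standard library's `statistics.mean`:

```python
from statistics import mean

# Hypothetical letter weights in ounces (illustrative data, not from the source)
letter_weights = [0.5, 1.0, 1.2, 0.8, 1.5]

# Average (arithmetic mean): sum of the values divided by the number of values
average_by_hand = sum(letter_weights) / len(letter_weights)
average_builtin = mean(letter_weights)  # standard-library equivalent

print(f"Average weight: {average_builtin:.2f} oz")  # prints "Average weight: 1.00 oz"
```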


Quantitative Data: Discrete vs. Continuous

- Discrete data: a finite or countable number of possible data values, such as 0, 1, 2, 3, 4, ...
- Ex.: Number of classes a student is taking
- Continuous data: an infinite number of possible data values on a continuous scale
- Ex.: Weight of a baby


Parameters and Statistics

- Parameter
- A number that describes some characteristic of an entire population.
- Ex.: Average age of all people in the United States
- Statistic
- A number that describes some characteristic of a sample.
- Ex.: Average age of people from a sample of three states
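To make the distinction concrete, the Python sketch below simulates a hypothetical population of 10,000 ages (random numbers, not real census data), computes the population mean (a parameter), and then computes the mean of a random sample of 100 (a statistic). The two values usually differ slightly, which is why a statistic is only an estimate of the corresponding parameter.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the run is reproducible

# Hypothetical population: ages of 10,000 people (simulated, not real data)
population_ages = [rng.randint(0, 90) for _ in range(10_000)]

# Parameter: describes the ENTIRE population (usually unknown in practice)
population_mean_age = mean(population_ages)

# Statistic: describes a SAMPLE drawn from that population
sample_ages = rng.sample(population_ages, k=100)
sample_mean_age = mean(sample_ages)

print(f"Parameter (population mean age): {population_mean_age:.2f}")
print(f"Statistic (sample mean age):     {sample_mean_age:.2f}")
```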

- Ex.: Parameters vs. Statistics

Decide whether the numerical value describes a population parameter or a sample statistic.

- The average credit load of all HCC full-time students is 14.2 credit hours.
- A sample of 300 HCC full-time students showed that the average number of hours worked per week is 18.3 hours.
- A Gallup poll of 1,012 adults nationwide showed that 34% owned a handgun.

White House 2008: Republican Nomination

Pew Research Center for the People & the Press survey conducted by Princeton Survey Research Associates International. Dec. 19-30, 2007. N = 471 registered voters nationwide who are Republicans or lean Republican. MoE ± 5.

"I'm going to read you the names of some Republican presidential candidates. Which one of the following Republican candidates would be your first choice for president [see below]?" If unsure: "Just as of today, would you say you lean toward [see below]?" (Names were rotated)

Candidate          Percent
John McCain        22
Rudy Giuliani      20
Mike Huckabee      17
Mitt Romney        12
Fred Thompson       9
Ron Paul            4
Duncan Hunter       1
Other (vol.)        1
None (vol.)         2
Unsure             12

Branches of Statistics

Descriptive Statistics: Involves organizing, summarizing, and displaying data. Describes the important characteristics of the data, e.g., tables, charts, averages, percentages.

Inferential Statistics: Involves using sample data to draw conclusions or make inferences about an entire population.

Example: Descriptive and Inferential Statistics

- Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

A sample of Illinois adults showed that 22.7% of those with a high school diploma were obese, and 16.7% of college graduates were obese. (Source: Illinois BRFSS, 2004)

Example: Descriptive and Inferential Statistics

- Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

A sample of 471 registered Republicans showed that 22% would pick John McCain as the Republican nominee for president (margin of error: ±5 percentage points). (Source: USA Today/CNN poll)


Uses of Statistics

- Almost all fields of study benefit from the application of statistical methods.
- Statistics often lead to change.

Misuses of Statistics

- Bad Samples
- Small Samples
- Misleading Graphs
- Pictographs
- Loaded Questions
- Correlation ≠ Causality
- Self-Interest Study

Misuse: Bad Samples

- Samples must be unbiased and fairly represent the entire population.
- If the data is not collected appropriately, the data may be completely useless: "garbage in, garbage out."
- Voluntary response sample: Respondents themselves decide whether to be included in the sample.
- Ex.: Online surveys
- Ex.: Ratemyprofessor.com

Misuse: Misleading Graphs

[Figure: CNN/USA Today/Gallup poll on Terri Schiavo (March 2005)]

[Figure: CNN/USA Today/Gallup poll on Terri Schiavo (March 2005), reprinted]

Misuse: Pictographs

Misuse: Loaded Questions

Should the President have the line-item veto to eliminate waste? (97% said yes)

Should the President have the line-item veto? (57% said yes)


Misuse: Correlation Does Not Imply Cause and Effect


Misuse: Self-Interest and Deliberate Distortions


Section 1.3

- Experimental Design

Designing a Statistical Study

- What is it you want to study?
- What is the population to gather data from?
- Collect data. If you use a sample, it must be representative of the population.
- Descriptive statistics: organize, present, and summarize the data.
- Inferential statistics: draw conclusions about the population based on sample data.

Things to Consider with Samples

- The sample must be unbiased and fairly represent the entire population.
- If the data is not collected appropriately, the data may be completely useless: "garbage in, garbage out."
- Aim for the maximum information at the minimum cost.

What sample size is needed?


Methods of Collecting Data

- Observational study
- Survey
- Experiment
- Simulation


Methods of Collecting Data

- Observational study
- A researcher observes or measures characteristics of interest of part of a population but does not change any existing conditions.
- Experiment
- A treatment is applied to part of a population and responses are observed.

Methods of Collecting Data

- Survey
- An investigation of one or more characteristics of a population, usually by asking people questions.
- Commonly done by interview, mail, or telephone.
- Simulation
- Uses a mathematical or physical model to reproduce the conditions of a situation or process. Often involves the use of computers.
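As a small illustration of a simulation, the Python sketch below uses a mathematical model of a fair six-sided die (an assumption chosen for simplicity) to estimate the probability of rolling at least one six in four rolls, and compares the estimate with the exact value 1 - (5/6)^4 ≈ 0.518. The function name and the number of trials are illustrative choices, not part of the original material.

```python
import random

def simulate_at_least_one_six(trials, rolls_per_trial=4, seed=1):
    """Estimate P(at least one six in `rolls_per_trial` rolls) by repeated simulation."""
    rng = random.Random(seed)  # seeded for reproducibility
    successes = 0
    for _ in range(trials):
        # Model one trial: roll a fair die `rolls_per_trial` times
        if any(rng.randint(1, 6) == 6 for _ in range(rolls_per_trial)):
            successes += 1
    return successes / trials

estimate = simulate_at_least_one_six(trials=100_000)
exact = 1 - (5 / 6) ** 4  # closed-form answer for comparison
print(f"Simulated: {estimate:.3f}   Exact: {exact:.3f}")
```

With more trials, the simulated value tends to get closer to the exact probability.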

Example: Methods of Data Collection

- Consider the following studies. Which method of data collection would you use to collect data for each study?

- A study of salaries of NFL players.
- A study of the emergency response times during a terrorist attack.
- A study of whether changing teaching techniques improves FCAT scores.
- A study of whether Tampa residents support a mass transit system.

Sampling Techniques

- Random versus Non-Random Samples
- Convenience Samples
- Simple Random Samples
- Systematic Samples
- Stratified Samples
- Cluster Samples


Random and Non-Random Sampling

- Random Sampling
- Every member of the population has an equal chance of being selected.
- Non-Random Sampling
- Some members of the population have no chance of being picked. Often leads to biased samples.


Convenience Samples

- Data is collected that is readily available and easy to get.
- Self-selected surveys or voluntary response surveys (online surveys, magazine surveys, 1-800-Verdict, Ratemyprofessor.com)

Simple Random Sample

- A random sample where every member of the population and every group of the same size has an equal chance of being selected.
- Usually involves using a random number generator.

Simple Random Sampling

- Number each element of the population from 1 to N.
- Use a random number generator (table, calculator, computer) to randomly select a sample of size n.
- TI-83/84: randInt(1,N,n), or
- Table 1 in text. Pick a random start.
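The same procedure can be sketched in Python. The numbers 1 to N stand for the population members; `random.sample` draws without replacement, so no member is selected twice. (A calculator's randInt can return repeated numbers, which would have to be skipped.) The function name `simple_random_sample` and the roster size of 500 are hypothetical.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Pick n distinct labels from 1..N so every size-n group is equally likely."""
    if not 0 <= n <= N:
        raise ValueError("sample size n must be between 0 and N")
    rng = random.Random(seed)
    # random.sample draws WITHOUT replacement, so no member is chosen twice
    return sorted(rng.sample(range(1, N + 1), n))

# Example: number 500 students 1..500 (hypothetical roster) and select 10 of them
selected = simple_random_sample(N=500, n=10, seed=42)
print(selected)
```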

Systematic Sampling

- Choose a starting value at random. Then choose every kth member of the population.
- Example: Select every 3rd patient who enters the ER.
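A minimal Python sketch of systematic sampling, assuming the population is already available as an ordered list (here, 30 hypothetical ER arrivals numbered in order) and that the random start is chosen among the first k members, a common convention:

```python
import random

def systematic_sample(population, k, seed=None):
    """Choose a random start among the first k members, then take every kth member."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting index 0..k-1
    return population[start::k]   # every kth member from the start

# Hypothetical ER arrivals numbered 1..30; select every 3rd patient
patients = list(range(1, 31))
chosen = systematic_sample(patients, k=3, seed=7)
print(chosen)
```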

Stratified Sampling

- Divide a population into at least 2 different subgroups (strata) so that members of each subgroup share a similar characteristic (age, gender, ethnicity, income, etc.), and then select a random sample from each subgroup.
- Advantage: every subgroup is represented in the sample, giving more information about each one.
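A minimal Python sketch of stratified sampling. It assumes each member is tagged with its stratum and takes the same number of members from every stratum (equal allocation); sampling in proportion to stratum size is another common choice. The student records and age groups are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, n_per_stratum, seed=None):
    """Group members into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    sample = {}
    for name, group in strata.items():
        # Take n_per_stratum from each group (or the whole group if it is smaller)
        sample[name] = rng.sample(group, min(n_per_stratum, len(group)))
    return sample

# Hypothetical students tagged with an age group
students = [("s1", "18-24"), ("s2", "25-34"), ("s3", "18-24"), ("s4", "35+"),
            ("s5", "25-34"), ("s6", "18-24"), ("s7", "35+"), ("s8", "25-34")]
result = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=2, seed=3)
for group, picked in result.items():
    print(group, [name for name, _ in picked])
```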

Cluster Sampling

- Divide the population into many similar subgroups (clusters), randomly select some of those clusters, and then include all of the members of those clusters in the sample.
- Advantage: practical for geographically separated populations.
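The Python sketch below shows how cluster sampling differs from stratified sampling: instead of sampling some members from every group, it randomly picks some whole groups and keeps all of their members. The neighborhood clusters and household labels are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then include EVERY member of each chosen cluster."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), num_clusters)  # random clusters
    return {name: list(clusters[name]) for name in chosen_names}

# Hypothetical clusters: households grouped by neighborhood
neighborhoods = {
    "North": ["h1", "h2", "h3"],
    "South": ["h4", "h5"],
    "East": ["h6", "h7", "h8"],
    "West": ["h9", "h10"],
}
sample = cluster_sample(neighborhoods, num_clusters=2, seed=11)
for name, households in sample.items():
    print(name, households)
```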

Sources of Error in Sampling

- Sampling Error
- The expected difference between a sample result and the true population result (e.g., margin of error).

- Non-Sampling Error
- Sample data is incorrectly gathered, collected, or recorded.
- Selection bias: bad sample
- Response bias: bad data (incorrect responses, inaccurate measurements, etc.)
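Sampling error can be seen by simulation. The Python sketch below assumes a hypothetical population of 10,000 voters in which exactly 22% favor a candidate (the parameter), repeatedly draws simple random samples of 471, and reports the average distance between the sample proportion and 22%. It models sampling error only; non-sampling errors such as biased selection or incorrect responses are not simulated.

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical population of 10,000 voters in which exactly 22% favor a candidate
population = [1] * 2_200 + [0] * 7_800
true_p = sum(population) / len(population)  # parameter: 0.22

def sample_proportion(n):
    """Draw a simple random sample of size n and return the sample proportion (a statistic)."""
    return sum(rng.sample(population, n)) / n

# Repeat the sampling many times; each sample result misses the parameter by some amount
errors = [abs(sample_proportion(471) - true_p) for _ in range(1_000)]
average_error = sum(errors) / len(errors)
print(f"Average sampling error: {average_error:.3f} ({average_error:.1%})")
```

Rerunning with a larger sample size should show a smaller average error, which is one reason larger samples give more precise estimates.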