Chapter 1 Statistical Thinking

- What is statistics?
- Why do we study statistics?

Statistical Thinking

- the science of collecting, organizing, and analyzing data
- the mathematics of the collection, organization, and interpretation of numerical data
- the branch of mathematics which is the study of the methods of collecting and analyzing data
- a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters

Statistical Thinking

- Statistics is a discipline concerned with
- designing experiments and other data collection,
- summarizing information to aid understanding,
- drawing conclusions from data, and
- estimating the present or predicting the future.

Statistical Thinking

- "I like to think of statistics as the science of learning from data...." (Jon Kettenring, ASA President, 1997)
- Steps of statistical analysis involve
- collecting information (Data Collection)
- evaluating the information (Data Analysis)
- drawing conclusions (Statistical Inference)

Statistical Thinking

- What type of information?
- A test group's favorite amount of sweetness in a blend of fruit juices
- The number of men and women hired by a city government
- The velocity of a burning gas on the sun's surface
- Clinical trials to investigate the effectiveness of new treatments
- Field experiments to evaluate irrigation methods
- Measurements of water quality

Statistical Thinking

- Problems
- Is a new treatment for heart disease more effective than a standard one?
- Is using a high-octane gas beneficial to car performance?
- Does reading an article in statistics improve students' statistics grades?

Statistical Thinking

- Is a new treatment for heart disease more effective than a standard one?
- Pick, say, 100 heart patients
- Divide them into two groups, 50 in each group
- Group 1: New treatment
- Group 2: Standard treatment

Statistical Thinking

- Results
- 40 out of 50 of Group 1 patients improved
- 30 out of 50 of Group 2 patients improved
- Conclusion: New treatment is more effective!

Statistical Thinking

- How do you divide the patients?
- Have you controlled other factors (fitness level, lifestyle, age, etc.)?
- How do you decide who gets what treatment?

Ethical issues?

Statistical Thinking

- Comparing Test Scores
- Select 10 students and give them a journal article in statistics.
- Test their knowledge about the article and record their scores.
- Repeat the test after they take STT 231.

Statistical Thinking

- Result
- 8 out of the 10 students improved their scores.
- Question: Can we conclude that reading the article has improved students' knowledge about statistics?

Statistical Thinking

- Look at worst case scenarios
- Under the assumption that the new treatment is no better than the standard one, what is the chance that 80% of the patients benefit from this treatment?
- Under the assumption that STT 231 brings no benefit, how likely is it that we see 80% of the students improve their scores?
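
One way to put a number on the first question is sketched below. It assumes (this is not stated in the slides) that the 70 improvements observed in total (40 + 30) would have happened whichever treatment a patient received, so only the random split into two groups of 50 decides how many improved patients land in Group 1. The function name and its parameters are illustrative.

```python
from math import comb

def chance_group1_at_least(k_obs, improved_total, group_size, n_total):
    """Chance that Group 1 contains at least k_obs improved patients when
    improvement is unrelated to treatment (assumption: the total number of
    improved patients is fixed and patients are split at random)."""
    not_improved = n_total - improved_total
    all_splits = comb(n_total, group_size)
    max_k = min(group_size, improved_total)
    favorable = 0
    for k in range(k_obs, max_k + 1):
        # choose k improved and (group_size - k) non-improved patients for Group 1
        favorable += comb(improved_total, k) * comb(not_improved, group_size - k)
    return favorable / all_splits

# 100 patients, 70 improved overall, Group 1 (50 patients) had 40 improve
p = chance_group1_at_least(40, 70, 50, 100)
print(f"Chance of 40 or more improvements in Group 1 by luck alone: {p:.3f}")
```

Under these assumptions a small value says that a 40-versus-30 split would rarely arise from the random division alone.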

Statistical Thinking

- Need a model to answer these questions!!
- If STT 231 is not beneficial, then each student's score may go up or down with a 50% chance.
- This is equivalent to flipping a coin
- 50% chance you get Heads
- 50% chance you get Tails

Statistical Thinking

- Comparing pre- and post-test scores for 10 students is equivalent to
- flipping a coin 10 times and calculating the chance of observing 8 H
- Relevant Questions
- Will the chance of observing H 80% of the time depend on the number of students involved in the experiment?
- Will this chance go up, down, or remain the same if you repeat the experiment with 200 students?
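
The relevant questions above can be explored with a short calculation. This is a minimal sketch under the coin-flip model, assuming every student improves independently with chance 0.5; the function names are illustrative.

```python
from math import comb

def prob_exactly(k, n, p=0.5):
    """Chance of exactly k heads in n independent flips with P(head) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p=0.5):
    """Chance of k or more heads in n independent flips."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))

for n in (10, 50, 200):
    k = (8 * n) // 10  # 80% of the flips
    print(f"n={n:3d}: exactly {k} heads -> {prob_exactly(k, n):.3g}, "
          f"{k} or more heads -> {prob_at_least(k, n):.3g}")
```

For 10 flips the chance of exactly 8 heads is 45/1024, about 0.044. The chance of seeing heads 80% of the time shrinks quickly as the number of flips grows, so the same 80% rate among 200 students would be far harder to attribute to luck.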

Statistical Thinking

- Suppose the chance of 8 improvements in 10 trials is 4.4%. What does this mean?
- If STT 231 is not beneficial, then there is a 4.4% chance that we will observe 8 out of 10 students' scores improve.
- There is little hope that 8 students' scores will improve just by CHANCE
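
The 4.4 figure is consistent with the coin-flip model. As a worked check, assume the 10 students behave like independent fair coin flips, and let X (a symbol introduced here) be the number of students whose scores improve:

```latex
\[
P(X = 8) \;=\; \binom{10}{8}\left(\tfrac{1}{2}\right)^{8}\left(\tfrac{1}{2}\right)^{2}
\;=\; \frac{45}{1024} \;\approx\; 0.044
\]
```

That is, about 4.4% when read as a percentage.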

Statistical Thinking

- Suppose the chance of 8 improvements in 10 trials is 4.4%.
- We observed 8 students' scores out of 10 improve.
- What does this mean?

Statistical Thinking

- Course is highly effective
- Course is ineffective and we observed an unlikely event.
- We do not know which one!

Statistical Thinking

- Suppose there is only a small chance that an event happens by CHANCE.
- Then this is strong evidence that the change we observe did not happen by CHANCE.
- Hence there is strong evidence that some factor is responsible for this change.

Statistical Thinking

- The course is highly effective!!
- Reasoning: What we observed is very unlikely if the course were ineffective. Hence the course is effective.
- An 80% rate of score improvement is unlikely to be achieved if the course were ineffective.

Statistical Thinking

- Some Remarks
- For questions that involve uncertainty
- Carefully formulate the question you want to answer (Modeling)
- Collect Data
- Summarize, analyze and present data
- Draw Conclusions. Conclusions always include uncertainty
- Support your conclusions by quantifying how confident you are about your conclusions.

Chapter 2 A Design Example

- The Polio Vaccine Case
- Caused by a virus
- Especially deadly in children
- Big problem during the first half of the 20th century
- Develop vaccine to fight the disease
- Jonas Salk (1950)

A Design Example

- Problem with vaccines
- Are they safe?
- Are they effective?
- Undertake a large-scale trial to answer these questions

A Design Example

- Case 1: A Simple Study
- Distribute the vaccine widely (under the assumption it is safe)
- Decrease in the number of polio cases after the vaccine provides evidence that the vaccine is effective
- Problem?

A Design Example

- Problems
- Lack of control group
- Is the decrease in the number of polio cases due to the vaccine or other factors?
- How reliable is the assumption that the vaccine is safe?

A Design Example

- Case 2: Adding a Control Group
- Have two groups
- Control group: gets salt solution
- Treatment group: gets the actual vaccine

A Design Example

- Example (Observed Control Study)
- Control group: all 1st and 3rd grade children
- Treatment group: all 2nd graders
- Assumption
- Age difference between control and treatment groups was felt to be unimportant

A Design Example

- Potential Problems
- Parents of 2nd graders may not agree to vaccinating their kids
- Parents of sicker kids are most likely to accept the vaccine
- More educated parents tend to accept the vaccine
- Parents of sick 1st and 3rd graders may object that their kids are not getting treatment

A Design Example

- Difficulty in diagnosing polio
- Extreme cases of polio are easy to diagnose
- Less severe cases of polio have symptoms similar to other common illnesses

A Design Example

- Potential Problems
- Physicians are aware of who has received the vaccine and who has not
- A less severe case of polio in a 2nd grader (who has received the vaccine) may be wrongly diagnosed as another illness
- A less severe case in a 1st or 3rd grader will most likely be diagnosed as polio

A Design Example

- Case 3: Randomization, Placebo Control, Double Blindness
- Random assignment of control and treatment groups
- Select a child
- Flip a coin
- H: Treatment Group
- T: Control Group
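
The coin-flip assignment above can be sketched in a few lines. This is only an illustration: the child identifiers, the function name, and the seed are hypothetical, and a software random number generator stands in for the physical coin.

```python
import random

def assign_groups(children, seed=None):
    """Assign each child to treatment (heads) or control (tails) by a fair coin flip."""
    rng = random.Random(seed)  # seed is optional; fixing it makes the split repeatable
    treatment, control = [], []
    for child in children:
        if rng.random() < 0.5:   # "heads"
            treatment.append(child)
        else:                    # "tails"
            control.append(child)
    return treatment, control

children = [f"child_{i}" for i in range(1, 11)]  # hypothetical IDs
treatment, control = assign_groups(children, seed=42)
print("Treatment group:", treatment)
print("Control group:  ", control)
```

Because the coin decides, neither parents nor physicians influence who gets the vaccine. Note that independent flips need not produce groups of equal size.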

A Design Example

- Placebo Control
- Kids in the control group receive salt solution
- Double Blind
- Neither the child, nor the parents, nor the doctors/nurses who make the diagnosis of polio know whether a kid receives the vaccine or the placebo

A Design Example

- Summary
- In designing experiments
- Introduce some sort of control group
- Use randomization to avoid bias in selection and assignment of subjects for the study
- Double blind experiments give protection against biases, both intentional and unintentional

A Design Example

- Perform the experiment on a large number of subjects (millions of kids in the polio case)
- Repeat the experiment several times before making definitive conclusions

A Design Example

- Basic Principles of Experimental Designs
- Randomization
- Blocking (Treatment/Control Groups)
- Replication