Chapter 10 Categorical Data



Slides: 21

Provided by: RCLit


Transcript and Presenter's Notes


1
Chapter 10 Categorical Data

10.1 - 10.3 only

2
Chapter 10 Categorical Data

Inference on one proportion π
Sample size n, number of successes y
Sample proportion: $\hat{\pi} = y/n$, or the adjusted estimator $\tilde{\pi} = (y + 2)/(n + 4)$

The second equation is called the Wilson
estimator (1927).
3
Confidence Intervals
Example 10.1: Five-year survival rate for cancer
patients, n = 870, y = 330.
Or, without the 4 and 2, i.e., based on y/n directly.
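As a rough check of Example 10.1, the sketch below computes a 95% interval both ways. It assumes the usual large-sample form, estimate ± z·sqrt(estimate·(1 − estimate)/size), with (y + 2)/(n + 4) as the adjusted estimate. The slide's own interval formulas did not survive in this transcript, so this form and the function name `proportion_ci` are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(y, n, conf=0.95, adjust=True):
    """Large-sample CI for a proportion.

    adjust=True adds 2 successes and 4 trials before computing the interval
    (the "4 and 2" adjustment); adjust=False uses y/n directly.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    if adjust:
        y, n = y + 2, n + 4
    p = y / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

adjusted = proportion_ci(330, 870, adjust=True)
plain = proportion_ci(330, 870, adjust=False)
print("adjusted:", adjusted)
print("plain:   ", plain)
```

With n this large the two intervals are nearly identical, both roughly 0.35 to 0.41. The adjustment matters more for small n or for y near 0 or n.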
4
Confidence interval for the extremes

When y = 0, the 100(1 − α)% C.I. for π is $(0,\ 1 - (\alpha/2)^{1/n})$
(p.457).
Example: a sample of 1000 parts with no defects gives a 95% C.I.
for the defect rate of $(0,\ 1 - (0.025)^{0.001}) = (0,\ 0.0037)$.
When y = n, the 100(1 − α)% C.I. for π is $((\alpha/2)^{1/n},\ 1)$
(p.457).
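The two boundary formulas can be sketched in a few lines. The helper name `extreme_ci` is illustrative, and the second case is taken here to mean that all n trials are successes, which is the situation the interval ((α/2)^(1/n), 1) describes.

```python
def extreme_ci(y, n, alpha=0.05):
    """CI for a proportion at the boundaries: y == 0 or y == n."""
    bound = (alpha / 2) ** (1 / n)
    if y == 0:
        return 0.0, 1 - bound
    if y == n:
        return bound, 1.0
    raise ValueError("use only when y is 0 or n")

# 1000 sampled parts, none defective
low, high = extreme_ci(0, 1000)
print(round(high, 4))  # 0.0037
```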
Sample size for a given error margin E with 100(1 − α)% confidence:
$n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2}$
The most conservative size is obtained by letting π = 0.5. (p.458)

5
Example

In most popularity polls, the accuracy is usually
set at ±3%, which means a 95% C.I. is $\hat{\pi} \pm 0.03$.
What is the required sample size?
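A minimal sketch of the calculation, assuming the standard large-sample formula n = z²·π(1 − π)/E² with the conservative choice π = 0.5. The function name `sample_size` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, conf=0.95, pi=0.5):
    """Smallest n whose large-sample margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * pi * (1 - pi) / margin**2)

n_poll = sample_size(0.03)
print(n_poll)  # 1068
```

This is why national polls commonly interview a little over a thousand people.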

The key is that a random sample can be collected.
6
Success stories of polls
1992 US Presidential election predictions
Source: a newspaper, a few days before the
election.
7
More on polls
Source: USA Today, Nov. 5 (Election Day morning)
In both 2000 and 2004, the candidates (Bush vs. Gore,
Bush vs. Kerry) were too close to call (within ±3%). The
actual results showed the same.
It is difficult to reduce ±3% by sample size
alone. From mathematics to practice: obtaining a truly random
sample, voters changing their minds, voters not revealing their true preference.
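To see why sample size alone is an expensive way to tighten a poll, the sketch below tabulates the conservative sample size (π = 0.5, 95% confidence) for three margins. It assumes the standard large-sample formula n = z²·0.25/E², so n grows with 1/E².

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value

# Conservative sample size for each margin of error
sizes = {e: ceil(z**2 * 0.25 / e**2) for e in (0.03, 0.02, 0.01)}
for margin, n in sizes.items():
    print(f"±{margin:.0%}: n = {n}")
```

Cutting the margin from ±3% to ±1% needs about nine times as many respondents, and it does nothing for the non-sampling problems listed above.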
8
The next two elections, 2000 (Bush vs. Gore) and
2004 (Bush vs. Kerry), were too close to call
before the election. The final results confirmed
this. Now the 2008 election.

This map was drawn by the New York Times 1 to 3
days before the election. All the state
projections were correct. Toss-up states were
extremely close.
It also predicted that Obama would get 52% ± 2% and
McCain 41% ± 2%, with 7% undecided.
The actual result was Obama 52.5% and McCain 46%.
The total number of votes was 124,471,000.

9
Hypothesis Testing with One Proportion (p.458)
This is a large-sample result. We need
$n\pi_0(1-\pi_0) \ge 5$.
10
Example 10.4

The failure rate for cars at the inspection station is 30%.
Is the failure rate for SUVs higher?
n = 150, y = 60 failures.

Conclusion: There is strong evidence (p = 0.0035)
that the failure rate for SUVs was indeed higher.
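Example 10.4 can be checked with a short sketch. It assumes the usual large-sample statistic z = (y/n − π0)/sqrt(π0(1 − π0)/n) and a one-sided alternative (SUV rate higher than 30%); the slide's own formula is missing from this transcript, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_ztest(y, n, pi0):
    """One-sided (upper-tail) large-sample z-test for a single proportion."""
    p_hat = y / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Large-sample condition: n * pi0 * (1 - pi0) should be at least 5
condition = 150 * 0.30 * 0.70

z, p_value = one_prop_ztest(60, 150, 0.30)
print(f"condition = {condition:.1f}, z = {z:.2f}, p = {p_value:.4f}")
```

This gives z of about 2.67 and a one-sided p-value near 0.004, in line with the slide's conclusion of strong evidence.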
11
Sample size in hypothesis testing: the second
part of the key question

The natural cure rate of a disease is 50%, and a drug has been
invented. We want to conduct a clinical trial to
determine whether this drug is effective. How
many patients should I recruit for this clinical
trial?
You tell your boss:
No statistical decision is 100% correct.
If the drug is only marginally effective, say with a cure
rate of 0.5001, the effect would be very difficult to
detect.
Any reasonable person will agree.

12
Key Concept in Statistical Decision: natural cure
rate 50%
Where does the "no" jump to "yes"?
13
There is no 100% correct statistical decision.
Risk: the risk of making a wrong decision. Accidental
death rate = $10^{-6}$/day in the USA.
How many patients should we recruit in the
beginning?
14
What you need to ask your boss

What risk can you take of making a wrong claim (claiming
an ineffective drug is effective)?
What do you consider a good drug, one that needs
to be detected with high probability?
Let the first answer be α = 0.05.
Let the second answer be: if the cure rate
becomes larger than 0.6 (π1), I want at least 0.9
(1 − β) probability of detecting it.
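With those two answers the sample size can be sketched as follows. The solution slides are not reproduced in this transcript, so the formula here is an assumption: the standard normal-approximation result for a one-sided test of one proportion, n = [(z_α·sqrt(π0(1 − π0)) + z_β·sqrt(π1(1 − π1)))/(π1 − π0)]². The function name `trial_size` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def trial_size(pi0, pi1, alpha=0.05, power=0.90):
    """Patients needed so a one-sided level-alpha test of H0: pi = pi0
    detects a true rate pi1 with the requested power (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # critical value for the type I risk
    z_b = NormalDist().inv_cdf(power)      # quantile for the type II risk
    num = z_a * sqrt(pi0 * (1 - pi0)) + z_b * sqrt(pi1 * (1 - pi1))
    return ceil((num / (pi1 - pi0)) ** 2)

n_patients = trial_size(0.5, 0.6)
print(n_patients)  # 211
```

Under these assumptions roughly 211 patients are needed. A drug with a cure rate of 0.5001 would need an astronomically larger trial, which is the point of the boss conversation above.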

15
The solution (1)
16
The solution (2)
17
The solution (3)
18
Two-proportion inference (p.465)
19
Hypothesis Testing with Two Proportions (p.466)
This is a large-sample result. We need both
$n_i\pi_i(1-\pi_i) \ge 5$.
20
Example 10.6

Two English teaching methods (computer,
traditional) are measured by the success rate in passing
exams. Determine whether there is a difference, using
hypothesis testing and a confidence interval. Use
α = 0.05.
Data: n1 = 125, passing 94 (computer); n2 = 175,
passing 113 (traditional).

Conclusion: We can say that computer teaching
is better (p < 0.05), and a 95% C.I. for the difference in passing
rates is 0.106 ± 0.104 = (0.002,
0.210). Note: the book used a one-sided test. Here
a two-sided test is used.
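The numbers in Example 10.6 can be reproduced with a short sketch. It assumes the unpooled standard error sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2) for both the two-sided test and the interval; the slide's formulas are missing from this transcript, so that choice and the function name are assumptions. A pooled standard error gives a slightly larger p-value that is still just under 0.05.

```python
from math import sqrt
from statistics import NormalDist

def two_prop(y1, n1, y2, n2, conf=0.95):
    """Two-sided large-sample comparison of two proportions (unpooled SE)."""
    p1, p2 = y1 / n1, y2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half = NormalDist().inv_cdf(1 - (1 - conf) / 2) * se
    return diff, half, p_value

# computer: 94 of 125 pass; traditional: 113 of 175 pass
diff, half, p_value = two_prop(94, 125, 113, 175)
print(f"difference = {diff:.3f} ± {half:.3f}, two-sided p = {p_value:.3f}")
```

The interval's lower end sits barely above zero, so the evidence for a difference is real but not overwhelming.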
