Chapter 10 Categorical Data - PowerPoint PPT Presentation

Slides: 21
Provided by: RCLit

Transcript and Presenter's Notes


1
Chapter 10 Categorical Data
  • 10.1 - 10.3 only

2
Chapter 10 Categorical Data
  • Inference on one proportion π
  • Sample size n, number of successes y
  • Sample proportion: $\hat{\pi} = y/n$ or $\tilde{\pi} = (y + 2)/(n + 4)$

The second equation is called the Wilson
estimator (1927).
3
Confidence Intervals
Example 10.1: Five-year survival rate for cancer
patients, n = 870, y = 330.
Or, without the 4 and 2, i.e., using the plain sample proportion y/n.
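The two intervals for Example 10.1 can be sketched in Python. This is a minimal sketch, not the slide's own computation: it assumes the usual large-sample (normal-approximation) interval, and it assumes "the 4 and 2" means adding 2 successes and 4 trials, an adjustment that is tied to the 95% level. The function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(y, n, conf=0.95, adjusted=True):
    """Large-sample CI for a proportion.

    adjusted=True adds 2 successes and 4 trials before computing the
    interval (assumed meaning of "the 4 and 2"; appropriate for 95%).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    if adjusted:
        y, n = y + 2, n + 4
    p = y / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Example 10.1: n = 870 patients, y = 330 five-year survivors
print("with the 4 and 2:   ", proportion_ci(330, 870, adjusted=True))
print("without the 4 and 2:", proportion_ci(330, 870, adjusted=False))
```

With n this large the two intervals nearly coincide, both running from roughly 0.35 to 0.41; the adjustment matters mainly for small n or for proportions near 0 or 1.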
4
Confidence interval for the extremes
  • When y = 0, the (1-α)100% C.I. for π is (0, 1-(α/2)^(1/n))
    (p.457)
  • Example: 1000 sampled parts with no defects. The 95% CI
    for the defect rate is (0, 1-(0.025)^(0.001)) = (0, 0.0037)
  • When y = n, the (1-α)100% C.I. for π is ((α/2)^(1/n), 1)
    (p.457)
  • Sample size for a given error margin E with (1-α) confidence:
    $n = z_{\alpha/2}^2 \, \pi (1 - \pi) / E^2$
  • The most conservative size is obtained by letting π = 0.5. (p.458)
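The extreme-case intervals above are easy to check numerically. The sketch below assumes the second case refers to all n trials being successes; the function names are hypothetical.

```python
def ci_zero_successes(n, alpha=0.05):
    """(1 - alpha) CI for the proportion when y = 0 out of n."""
    return 0.0, 1 - (alpha / 2) ** (1 / n)


def ci_all_successes(n, alpha=0.05):
    """(1 - alpha) CI for the proportion when y = n out of n (assumed case)."""
    return (alpha / 2) ** (1 / n), 1.0


# 1000 sampled parts, none defective: 95% CI for the defect rate
lower, upper = ci_zero_successes(1000)
print(f"({lower:.4f}, {upper:.4f})")
```

The upper bound rounds to 0.0037, matching the example: even with zero defects in 1000 parts, a defect rate of about 0.4% cannot be ruled out.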

5
Example
  • In most popularity polls, the accuracy is usually
    set at ±3%, which means a 95% CI is $\hat{\pi} \pm 0.03$.
    What is the required sample size?
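One way to answer the question is with the standard large-sample formula n = z² π(1 - π) / E², taking the most conservative value π = 0.5. This is a minimal sketch under that assumption; `poll_sample_size` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist


def poll_sample_size(margin, conf=0.95, pi=0.5):
    """Smallest n whose large-sample CI half-width is at most `margin`.

    pi = 0.5 is the most conservative choice because pi * (1 - pi)
    is largest there.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * pi * (1 - pi) / margin ** 2)


print(poll_sample_size(0.03))  # margin of ±3% at 95% confidence
```

This gives 1068 respondents. Because n grows with 1/E², halving the margin to ±1.5% would roughly quadruple the required sample.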

The key is that a random sample can be collected.
6
Success stories of polls
1992 US Presidential election predictions
Source: a newspaper, a few days before the
election.
7
More on polls
Source: USA Today, Nov. 5 (Election Day morning)
In both 2000 and 2004, the candidates (Bush vs Gore,
Bush vs Kerry) were too close to call (within ±3%). The
actual results showed the same.
It is difficult to reduce the ±3% margin by sample size
alone. From mathematics to practice: obtaining a random
sample, voters changing their minds, voters not revealing their minds.
8
The next two elections, 2000 (Bush vs Gore) and
2004 (Bush vs Kerry) were too close to call
before the election. The final results confirmed
this fact. Now the 2008 election.
  • This map was drawn by the New York Times 1 to 3
    days before the election. All the state
    projections were correct. Toss-up states were
    extremely close.
  • It also predicted that Obama would get 52% ± 2% and
    McCain 41% ± 2%, with 7% undecided.
  • The actual result was Obama 52.5% and McCain 46%.
  • The total number of votes was 124,471,000.

9
Hypothesis Testing with One Proportion (p.458)
This is a large-sample result. We need
n π0 (1 - π0) ≥ 5.
10
Example 10.4
  • The car failure rate at the inspection station is 30%.
    Is the SUV failure rate higher?
  • n = 150, y = 60 failures.

Conclusion: There is strong evidence (p = 0.0035)
that the failure rate for SUVs was indeed higher.
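The test in Example 10.4 can be sketched as a one-sided large-sample z-test. This assumes the usual statistic z = (π̂ - π0) / sqrt(π0(1 - π0)/n), with the standard error computed under the null hypothesis; the function name is hypothetical, and the slide's own working may differ in rounding.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(y, n, pi0):
    """Upper-tail large-sample z-test of H0: pi = pi0 vs Ha: pi > pi0."""
    p_hat = y / n
    se0 = sqrt(pi0 * (1 - pi0) / n)      # standard error under H0
    z = (p_hat - pi0) / se0
    p_value = 1 - NormalDist().cdf(z)    # one-sided p-value
    return z, p_value


# Example 10.4: 60 of 150 SUVs fail; the overall failure rate is 30%
z, p_value = one_proportion_z_test(60, 150, 0.30)
print(f"z = {z:.3f}, one-sided p = {p_value:.4f}")
```

Under these assumptions z is about 2.67 and the one-sided p-value is a little under 0.004, consistent with the strong evidence reported above. The large-sample condition also holds, since 150 × 0.3 × 0.7 = 31.5.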
11
Sample size in hypothesis testing: the second
part of the key question
  • The natural cure rate of a disease is 50%. A drug is
    invented. We want to conduct a clinical trial and
    determine whether this drug is effective. How
    many patients should I recruit for this clinical
    trial?
  • You tell your boss:
  • There is no 100% correct statistical decision.
  • If the drug is only marginally effective, say with a cure
    rate of 0.5001, it would be very difficult to
    detect.
  • Any reasonable person will agree.

12
Key Concept in Statistical Decision: Natural cure
rate 50%
Where does the no jump to yes?
13
There is no 100% correct statistical decision.
Risk: the risk of making a wrong decision. Accidental
death rate = 10^-6/day in the USA.
How many patients should we recruit in the
beginning?
14
What you need to ask your boss
  • What risk can you take of making a wrong claim (claiming
    an ineffective drug is effective)?
  • What do you consider a good drug, one that needs
    to be detected with high probability?
  • Let the first answer be α = 0.05.
  • Let the second answer be: if the cure rate
    becomes larger than 0.6 (p1), I want at least 0.9
    (1-β) probability of detecting it.
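With those two answers in hand, a sample size can be computed. The sketch below is not taken from the slides; it assumes the standard normal-approximation formula for a one-sided, one-proportion test, n = [(z_α sqrt(π0(1 - π0)) + z_β sqrt(π1(1 - π1))) / (π1 - π0)]², with π0 = 0.5, π1 = 0.6, α = 0.05 and power 1 - β = 0.9. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_one_proportion(pi0, pi1, alpha=0.05, power=0.90):
    """Patients needed for a one-sided test of H0: pi = pi0
    to detect pi = pi1 with the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # controls the wrong-claim risk
    z_beta = NormalDist().inv_cdf(power)       # controls the detection probability
    numerator = (z_alpha * sqrt(pi0 * (1 - pi0))
                 + z_beta * sqrt(pi1 * (1 - pi1)))
    return ceil((numerator / (pi1 - pi0)) ** 2)


# Natural cure rate 0.5; a drug with cure rate 0.6 should be detected
print(sample_size_one_proportion(0.5, 0.6))
```

Under these assumptions about 211 patients are needed. Shrinking the detectable improvement (say π1 = 0.55) drives the required n up sharply, which is why a drug with cure rate 0.5001 is practically undetectable.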

15
The solution (1)
16
The solution (2)
17
The solution (3)
18
Two-proportion inference (p.465)
19
Hypothesis Testing with Two Proportions (p.466)
This is a large-sample result. We need both
n_i π_i (1 - π_i) ≥ 5.
20
Example 10.6
  • Two English teaching methods (computer,
    traditional) are measured by the success rate in passing
    exams. Find whether there is a difference by
    hypothesis testing and a confidence interval. Use α =
    0.05.
  • Data: n1 = 125, 94 passing (computer); n2 = 175,
    113 passing (traditional).

Conclusion: We can say that the computer teaching
method is better (p ≈ 0.05), and a 95% CI for the passing
rate difference is 0.106 ± 0.104 = (0.002,
0.210). Note: The book used a one-sided test. Here
a two-sided test is used.
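The figures in Example 10.6 can be reproduced with a short sketch. It assumes the usual large-sample approach: a pooled standard error for the two-sided test statistic and an unpooled standard error for the confidence interval. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_inference(y1, n1, y2, n2, conf=0.95):
    """Two-sided z-test of H0: pi1 = pi2 plus a CI for pi1 - pi2."""
    p1, p2 = y1 / n1, y2 / n2
    diff = p1 - p2

    # Test: pooled estimate, since H0 says both groups share one proportion
    pooled = (y1 + y2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Interval: unpooled standard error
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half_width = NormalDist().inv_cdf(1 - (1 - conf) / 2) * se
    return z, p_value, (diff - half_width, diff + half_width)


# Example 10.6: computer 94/125 passing, traditional 113/175 passing
z, p_value, ci = two_proportion_inference(94, 125, 113, 175)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the two-sided p-value falls just below 0.05 and the interval is about 0.106 ± 0.104, so the lower limit sits only slightly above zero. The evidence for the computer method is therefore borderline at the 5% level.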