# CHAPTER 6.1 SUMMARIZING POSSIBLE OUTCOMES AND THEIR PROBABILITIES

Slides: 90
Provided by: msue9
Learn more at: http://www.stt.msu.edu
Transcript and Presenter's Notes


1
CHAPTER 6.1 SUMMARIZING POSSIBLE OUTCOMES AND
THEIR PROBABILITIES
• DEFINITION: A RANDOM VARIABLE IS A NUMERICAL
MEASUREMENT OF THE OUTCOME OF A RANDOM PHENOMENON
(EXPERIMENT).
• DEFINITION: A DISCRETE RANDOM VARIABLE X TAKES
ITS VALUES FROM A COUNTABLE SET, FOR EXAMPLE,
$N = \{0, 1, 2, 3, 4, 5, 6, 7, \ldots\}$.
• DEFINITION: THE PROBABILITY DISTRIBUTION OF A
DISCRETE RANDOM VARIABLE IS A FUNCTION $P(x)$ SUCH THAT
$0 \le P(x) \le 1$ FOR ALL OUTCOMES $x$, AND $\sum_x P(x) = 1$.

2
MEAN OF A DISCRETE PROBABILITY DISTRIBUTION
• THE MEAN OF A PROBABILITY DISTRIBUTION FOR A
DISCRETE RANDOM VARIABLE IS GIVEN BY $\mu = \sum_x x \, P(x)$.
• IN WORDS, TO GET THE MEAN OF A DISCRETE
PROBABILITY DISTRIBUTION, MULTIPLY EACH POSSIBLE
VALUE OF THE RANDOM VARIABLE BY ITS PROBABILITY,
AND THEN ADD ALL THESE PRODUCTS.

3
EXAMPLE: NUMBER OF HOME RUNS IN A GAME FOR THE BOSTON
RED SOX

| NUMBER OF HOME RUNS | PROBABILITY |
| --- | --- |
| 0 | 0.23 |
| 1 | 0.38 |
| 2 | 0.22 |
| 3 | 0.13 |
| 4 | 0.03 |
| 5 | 0.01 |
| 6 OR MORE | 0.00 |
| SUM | 1.00 |

4
• (1) WHAT IS THE EXPECTED (MEAN) NUMBER OF HOME
RUNS FOR A BOSTON RED SOX BASEBALL GAME?
• (2) INTERPRET WHAT THIS MEAN (EXPECTED VALUE)
MEANS.
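
The rule "multiply each value by its probability, then add the products" can be sketched in a few lines of Python. This is only an illustration: the dictionary `home_runs` and the helper `discrete_mean` are names introduced here, and the "6 or more" row is left out because its listed probability is 0.00.

```python
# Home-run distribution from the table above (value -> probability).
# The "6 OR MORE" row has probability 0.00, so it is omitted here.
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}


def discrete_mean(dist):
    """Multiply each value by its probability and add the products."""
    return sum(x * p for x, p in dist.items())


total_probability = sum(home_runs.values())
mean_home_runs = discrete_mean(home_runs)

print(f"total probability = {total_probability:.2f}")
print(f"expected home runs per game = {mean_home_runs:.2f}")
```

One way to read the result: over a long run of games, the average number of home runs per game would settle near this value, even though no single game can produce a fractional count.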

5
PROBABILITY FOR CONTINUOUS RANDOM VARIABLE
• DEFINITION: A CONTINUOUS RANDOM VARIABLE HAS
POSSIBLE VALUES THAT FORM AN INTERVAL, THAT IS, IT
TAKES ITS VALUES FROM AN INTERVAL, FOR EXAMPLE,
(2, 5).
• DEFINITION: THE PROBABILITY DISTRIBUTION OF A
CONTINUOUS RANDOM VARIABLE IS SPECIFIED BY A
CURVE THAT DETERMINES THE PROBABILITY THAT THE
RANDOM VARIABLE FALLS IN ANY PARTICULAR INTERVAL
OF VALUES.

6
REMARKS
• EACH INTERVAL HAS PROBABILITY BETWEEN 0 AND 1.
THIS IS THE AREA UNDER THE CURVE, ABOVE THAT
INTERVAL.
• THE INTERVAL CONTAINING ALL POSSIBLE VALUES HAS
PROBABILITY EQUAL TO 1, SO THE TOTAL AREA UNDER
THE CURVE EQUALS 1.
• ILLUSTRATIVE PICTURES

7
CHAPTER 6.2 FINDING PROBABILITIES FOR BELL-SHAPED
DISTRIBUTIONS: THE NORMAL DISTRIBUTION
• THE NORMAL DISTRIBUTION IS VERY COMMONLY USED FOR
CONTINUOUS RANDOM VARIABLES. IT IS CHARACTERIZED
BY A PARTICULAR SYMMETRIC, BELL-SHAPED CURVE
WITH TWO PARAMETERS: THE MEAN $\mu$ AND THE STANDARD
DEVIATION $\sigma$.
• NOTATION: $N(\mu, \sigma)$
• ILLUSTRATIVE PICTURES

8
THE NORMAL DISTRIBUTION IS ALSO THE MODEL FOR A
POPULATION DISTRIBUTION
• THE POPULATION DISTRIBUTION OF A RANDOM VARIABLE
X IS OFTEN MODELED BY A BELL-SHAPED CURVE WITH
THE PROPERTY THAT THE PROPORTION OF THE
POPULATION FOR WHICH X IS BETWEEN a AND b IS THE
AREA UNDER THE CURVE BETWEEN a AND b.
• ILLUSTRATIVE PICTURE

9
THE EMPIRICAL OR 68-95-99.7 RULE
• THE EMPIRICAL RULE STATES THAT FOR AN
APPROXIMATELY BELL-SHAPED DISTRIBUTION, ABOUT
68% OF OBSERVATIONS (VALUES) FALL WITHIN ONE
STANDARD DEVIATION OF THE MEAN, 95% OF THE VALUES
FALL WITHIN TWO STANDARD DEVIATIONS OF THE MEAN, AND
99.7% OF VALUES FALL WITHIN THREE STANDARD
DEVIATIONS OF THE MEAN.
• ILLUSTRATIVE PICTURE

10
FINDING PROBABILITIES FOR CONTINUOUS RANDOM
VARIABLES USING THE STANDARD NORMAL DISTRIBUTION
TABLE
• DEFINITION: THE STANDARD NORMAL DISTRIBUTION IS
THE NORMAL DISTRIBUTION WITH MEAN 0 AND
STANDARD DEVIATION 1. IT IS THE DISTRIBUTION OF
NORMAL Z-SCORES.
• DEFINITION: THE Z-SCORE FOR A VALUE x OF A
RANDOM VARIABLE IS THE NUMBER OF STANDARD
DEVIATIONS THAT x FALLS FROM THE MEAN. IT IS
CALCULATED AS $z = \dfrac{x - \mu}{\sigma}$.

11
CLASS EXAMPLE 1
• IN A STANDARD NORMAL MODEL, WHAT PERCENT OF
POPULATION IS IN EACH REGION? DRAW A PICTURE IN
EACH CASE.
• (A) Z < 0.83 (B) Z > 0.83 (C) 0.1 < Z < 0.9
• SOLUTION
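
The slides use a printed standard normal table. As a rough cross-check, the same areas can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names below are illustrative only.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)

# (A) area to the left of 0.83
p_a = Z.cdf(0.83)
# (B) area to the right of 0.83 (complement of A)
p_b = 1 - Z.cdf(0.83)
# (C) area between 0.1 and 0.9 (difference of two left-tail areas)
p_c = Z.cdf(0.9) - Z.cdf(0.1)

for label, prob in (("A", p_a), ("B", p_b), ("C", p_c)):
    print(f"({label}) {100 * prob:.2f}% of the population")
```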

12
CLASS EXAMPLE 2
• IN A STANDARD NORMAL MODEL, FIND THE VALUE OF Z
THAT CUTS OFF
• (A) THE LOWEST 75% OF THE POPULATION
• (B) THE HIGHEST 20% OF THE POPULATION (THAT IS, THE VALUE
ABOVE THE LOWEST 80%)
• SOLUTION
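
Going from an area back to a z value is the inverse lookup in the table. A minimal sketch using `NormalDist.inv_cdf` from the Python standard library; the variable names are introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# (A) z with 75% of the population below it
z_lowest_75 = Z.inv_cdf(0.75)
# (B) z with 20% above it, which is the same as 80% below it
z_highest_20 = Z.inv_cdf(1 - 0.20)

print(f"(A) z = {z_lowest_75:.3f}")
print(f"(B) z = {z_highest_20:.3f}")
```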

13
CLASS EXAMPLE 3
• SUPPOSE THAT WE MODEL SAT SCORES, Y, BY AN N(500,
100) DISTRIBUTION.
• (A) WHAT PERCENTAGE OF SAT SCORES FALL BETWEEN
450 AND 600?
• (B) FOR WHAT SAT VALUE b ARE 10% OF SAT SCORES
GREATER THAN b?
• SOLUTION
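
A possible way to check both parts numerically, assuming that N(500, 100) means mean 500 and standard deviation 100 (not variance 100). The z-scores are computed explicitly so the link to the standard normal table stays visible; all names are illustrative.

```python
from statistics import NormalDist

MEAN, SD = 500, 100  # assumption: N(500, 100) gives the standard deviation
Z = NormalDist()


def z_score(x, mean=MEAN, sd=SD):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / sd


# (A) share of scores between 450 and 600
share_between = Z.cdf(z_score(600)) - Z.cdf(z_score(450))

# (B) score b with 10% of scores above it, i.e. 90% below it
cutoff_top_10 = MEAN + Z.inv_cdf(0.90) * SD

print(f"(A) {100 * share_between:.1f}% of scores fall between 450 and 600")
print(f"(B) b is about {cutoff_top_10:.0f}")
```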

14
CHAPTER 6.3 PROBABILITY MODELS FOR OBSERVATIONS
WITH TWO POSSIBLE OUTCOMES
• BERNOULLI TRIAL
• A RANDOM EXPERIMENT WITH TWO COMPLEMENTARY
EVENTS, SUCCESS (S) AND FAILURE (F) IS CALLED A
BERNOULLI TRIAL.
• $P(\text{SUCCESS}) = p$
• $P(\text{FAILURE}) = q = 1 - p$

15
EXAMPLES
• TOSSING A COIN 20 TIMES
• SUCCESS = HEADS WITH $p = 0.5$ AND FAILURE = TAILS
WITH $q = 1 - p = 0.5$
• TAKING A MULTIPLE CHOICE EXAM UNPREPARED.
• SUCCESS = CORRECT ANSWER
• FAILURE = WRONG ANSWER
• $p = 0.2$, $q = 1 - p = 1 - 0.2 = 0.8$

16
PRODUCTS COMING OUT OF A PRODUCTION LINE
• SUCCESS = DEFECTIVE ITEMS
• FAILURE = NON-DEFECTIVE ITEMS
• ROLLING A DIE 10 TIMES
• SUCCESS = GETTING A 6, $p = 1/6$
• FAILURE = NOT GETTING A 6, $q = 5/6$

17
AN OFFER FROM A BANK FOR A CREDIT CARD WITH HIGH
INTEREST RATE
• SUCCESS = DECLINE, FAILURE = ACCEPT
• HAVING HEALTH INSURANCE
• SUCCESS = HAVE, FAILURE = NOT HAVE
• A REFERENDUM WHETHER TO RECALL AN UNFAITHFUL
GOVERNOR FROM OFFICE
• SUCCESS = VOTE YES, FAILURE = VOTE NO

18
GEOMETRIC PROBABILITY MODEL
• QUESTION: HOW LONG WILL IT TAKE TO ACHIEVE THE
FIRST SUCCESS IN A SERIES OF BERNOULLI TRIALS?
• THE MODEL THAT TELLS US THIS PROBABILITY (THAT
IS, THE PROBABILITY THAT THE FIRST SUCCESS OCCURS ON A
GIVEN TRIAL) IS CALLED THE GEOMETRIC PROBABILITY MODEL.

19
CONDITIONS
• THE FOLLOWING CONDITIONS MUST HOLD BEFORE USING
THE GEOMETRIC PROBABILITY MODEL.
• (1) THE TRIALS MUST BE BERNOULLI, THAT IS, THE
RANDOM EXPERIMENT MUST HAVE TWO COMPLEMENTARY
OUTCOMES: SUCCESS AND FAILURE.
• (2) THE TRIALS MUST BE INDEPENDENT OF ONE
ANOTHER.
• (3) THE PROBABILITY OF SUCCESS IS THE SAME FOR
EACH TRIAL.

20
GEOMETRIC PROBABILITY MODEL FOR BERNOULLI TRIALS
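• LET $p$ = PROBABILITY OF SUCCESS
• AND $q = 1 - p$ = PROBABILITY OF FAILURE
• $X$ = NUMBER OF TRIALS UNTIL THE FIRST SUCCESS
OCCURS
• THEN $P(X = x) = q^{x-1} p$ FOR $x = 1, 2, 3, \ldots$, AND THE
EXPECTED VALUE IS $E(X) = 1/p$.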

21
EXAMPLE
• ASSUME THAT 13% OF PEOPLE ARE LEFT-HANDED. IF WE
SELECT 5 PEOPLE AT RANDOM, FIND THE PROBABILITY
OF EACH OUTCOME DESCRIBED BELOW.
• (1) THE FIRST LEFTY IS THE FIFTH PERSON CHOSEN?
• 0.0745
• (2) THE FIRST LEFTY IS THE SECOND OR THIRD
PERSON.
• 0.211
• (3) IF WE KEEP PICKING PEOPLE UNTIL WE FIND A
LEFTY, HOW LONG DO WE EXPECT IT WILL TAKE?
• 7.69 PEOPLE
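
The three answers above can be reproduced with the standard geometric formulas, $P(X = x) = q^{x-1} p$ and $E(X) = 1/p$. The sketch below is illustrative; `geometric_pmf` is a helper name introduced here, not something from the slides.

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) = (1 - p)**(x - 1) * p."""
    if x < 1:
        raise ValueError("trial number must be at least 1")
    return (1 - p) ** (x - 1) * p


p_lefty = 0.13

# (1) first lefty is the fifth person: four non-lefties, then a lefty
first_on_fifth = geometric_pmf(5, p_lefty)
# (2) first lefty is the second OR third person: add the two cases
second_or_third = geometric_pmf(2, p_lefty) + geometric_pmf(3, p_lefty)
# (3) expected number of people until the first lefty
expected_trials = 1 / p_lefty

print(f"(1) {first_on_fifth:.4f}")
print(f"(2) {second_or_third:.3f}")
print(f"(3) {expected_trials:.2f} people")
```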

22
EXAMPLE
• AN OLYMPIC ARCHER IS ABLE TO HIT THE BULLS-EYE
80% OF THE TIME. ASSUME EACH SHOT IS INDEPENDENT
OF THE OTHERS. IF SHE SHOOTS 6 ARROWS, WHAT'S THE
PROBABILITY THAT
• (1) HER FIRST BULLS-EYE COMES ON THE THIRD
ARROW? ANS: 0.032
• (2) HER FIRST BULLS-EYE COMES ON THE FOURTH OR
FIFTH ARROW? ANS: 0.00768
• (3) IF SHE KEEPS SHOOTING ARROWS UNTIL SHE HITS THE
BULLS-EYE, HOW LONG DO YOU EXPECT IT WILL TAKE?
ANS: 1.25 SHOTS

23
BINOMIAL PROBABILITY MODEL FOR BERNOULLI TRIALS
• QUESTION: WHAT IS THE NUMBER OF SUCCESSES IN A
SPECIFIED NUMBER OF TRIALS?
• THE BINOMIAL PROBABILITY MODEL ANSWERS THIS
QUESTION, THAT IS, IT GIVES THE PROBABILITY OF EXACTLY k
SUCCESSES IN n TRIALS.
• CONDITIONS: SAME AS THOSE FOR THE GEOMETRIC
PROBABILITY MODEL

24
BINOMIAL PROBABILITY MODEL
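• LET $n$ = NUMBER OF TRIALS
• $p$ = PROBABILITY OF SUCCESS
• $q = 1 - p$ = PROBABILITY OF FAILURE
• $X$ = NUMBER OF SUCCESSES IN $n$ TRIALS
• THEN $P(X = k) = \binom{n}{k} p^k q^{n-k}$, WHERE $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.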

25
$n! = n(n-1)(n-2)(n-3) \cdots 3 \cdot 2 \cdot 1$

26
EXAMPLES
• COMPUTE
• (1) 3! (2) 4! (3) 5! (4) 6!
• COMPUTE

27
EXAMPLE
• ASSUME THAT 13% OF PEOPLE ARE LEFT-HANDED. IF WE
SELECT 5 PEOPLE AT RANDOM, FIND THE PROBABILITY
OF EACH OUTCOME BELOW.
• (1) THERE ARE EXACTLY 3 LEFTIES IN THE GROUP.
• 0.0166
• (2) THERE ARE AT LEAST 3 LEFTIES IN THE GROUP.
• 0.0179
• (3) THERE ARE NO MORE THAN 3 LEFTIES IN THE
GROUP. 0.9987
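
A short sketch of how these binomial answers could be checked, using the standard formula $P(X = k) = \binom{n}{k} p^k q^{n-k}$ and `math.comb` (Python 3.8 or later). The helper `binomial_pmf` is a name introduced here for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.13  # 5 people, 13% left-handed

# (1) exactly 3 lefties
exactly_3 = binomial_pmf(3, n, p)
# (2) at least 3 lefties: add the cases k = 3, 4, 5
at_least_3 = sum(binomial_pmf(k, n, p) for k in range(3, n + 1))
# (3) no more than 3 lefties: add the cases k = 0, 1, 2, 3
at_most_3 = sum(binomial_pmf(k, n, p) for k in range(0, 4))

print(f"(1) {exactly_3:.4f}")
print(f"(2) {at_least_3:.4f}")
print(f"(3) {at_most_3:.4f}")
```

"At least" and "at most" questions are sums of "exactly" probabilities, which is why one helper is enough for all three parts.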

28
EXAMPLE
• AN OLYMPIC ARCHER IS ABLE TO HIT THE BULLS-EYE
80% OF THE TIME. ASSUME EACH SHOT IS INDEPENDENT
OF THE OTHERS. IF SHE SHOOTS 6 ARROWS, WHAT'S THE
PROBABILITY THAT
• (1) SHE GETS EXACTLY 4 BULLS-EYES? 0.246
• (2) SHE GETS AT LEAST 4 BULLS-EYES? 0.901
• (3) SHE GETS AT MOST 4 BULLS-EYES? 0.345
• (4) SHE MISSES THE BULLS-EYE AT LEAST ONCE? 0.738
• (5) HOW MANY BULLS-EYES DO YOU EXPECT HER TO
GET? 4.8 BULLS-EYES
• (6) WITH WHAT STANDARD DEVIATION? 0.98

29
THE NORMAL MODEL TO THE RESCUE OF BINOMIAL MODEL
• IF n, THE FIXED NUMBER OF TRIALS, IS LARGE,
• THAT IS, $np \ge 10$ AND $nq \ge 10$,
• THEN THE BINOMIAL CUMULATIVE PROBABILITIES
CAN BE APPROXIMATED BY THE NORMAL PROBABILITIES
WITH THE SAME MEAN OR EXPECTED VALUE $np$
• AND THE SAME STANDARD DEVIATION $\sqrt{npq}$.

30
EXAMPLE
• THE TENNESSEE RED CROSS COLLECTED BLOOD FROM 32,000
DONORS. WHAT IS THE PROBABILITY THAT THEY HAD AT
LEAST 1850 DONORS OF THE O-NEGATIVE BLOOD GROUP?
THE PROBABILITY OF SOMEONE HAVING AN O-NEGATIVE
BLOOD TYPE IS 0.06.
• SOLUTION: LET X BE THE NUMBER OF DONORS OF THE O-NEGATIVE
BLOOD GROUP. THEN THE QUESTION CAN BE FORMULATED
MATHEMATICALLY AS $P(X \ge 1850)$.
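
One way the normal approximation might be carried out for this example, with mean $np$ and standard deviation $\sqrt{npq}$. This is a sketch under two assumptions that the slide does not spell out: donors are treated as independent Bernoulli trials, and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist

n, p = 32_000, 0.06
q = 1 - p

# Check that n is "large": at least 10 expected successes and failures.
large_enough = n * p >= 10 and n * q >= 10

mean = n * p          # expected number of O-negative donors
sd = sqrt(n * p * q)  # standard deviation of that count

# Normal model with the same mean and standard deviation as the binomial.
approx = NormalDist(mu=mean, sigma=sd)
p_at_least_1850 = 1 - approx.cdf(1850)

print(f"mean = {mean:.0f}, sd = {sd:.2f}")
print(f"P(X >= 1850) is approximately {p_at_least_1850:.3f}")
```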

31
CHAPTER 6.4 HOW LIKELY ARE THE POSSIBLE VALUES OF
A STATISTIC?
• REMINDER: A STATISTIC IS A NUMERICAL SUMMARY OF
SAMPLE DATA. SOME EXAMPLES ARE THE SAMPLE
PROPORTION AND THE SAMPLE MEAN.
• DEFINITION: THE SAMPLING DISTRIBUTION OF A
STATISTIC IS THE PROBABILITY DISTRIBUTION THAT
SPECIFIES PROBABILITIES FOR THE POSSIBLE
VALUES THE STATISTIC CAN TAKE.

32
SAMPLING DISTRIBUTION MODELS FOR PROPORTIONS AND
MEANS
• SAMPLING DISTRIBUTION MODEL FOR A PROPORTION
• PROBLEM FORMULATION: SUPPOSE THAT p IS AN UNKNOWN
PROPORTION OF ELEMENTS OF A CERTAIN TYPE S IN A
POPULATION.
• EXAMPLES
• PROPORTION OF LEFT-HANDED PEOPLE
• PROPORTION OF HIGH SCHOOL STUDENTS WHO ARE
FAILING A READING TEST
• PROPORTION OF VOTERS WHO WILL VOTE FOR MR. X.

33
ESTIMATION OF p
• TO ESTIMATE p, WE SELECT A SIMPLE RANDOM SAMPLE
(SRS) OF SIZE, SAY, $n = 1000$, AND COMPUTE THE
SAMPLE PROPORTION.
• SUPPOSE THE NUMBER OF THE TYPE WE ARE INTERESTED
IN, IN THIS SAMPLE OF $n = 1000$, IS $x = 437$. THEN
THE SAMPLE PROPORTION P(HAT)
• IS COMPUTED USING THE FORMULA $\hat{p} = x/n$.

34
IN THE EXAMPLE ABOVE, $\hat{p} = 437/1000 = 0.437$.
35
WHAT IS THE ERROR OF ESTIMATION?
• THAT IS, WHAT IS $|\hat{p} - p|$?
• WHAT MODEL CAN HELP US FIND THE BEST
ESTIMATE OF THE TRUE PROPORTION p?
• LET'S START THE ANALYSIS BY FIRST ANSWERING THE
SECOND QUESTION.

36
APPROACH
• SUPPOSE THAT WE TAKE A SECOND SAMPLE OF SIZE 1000
AND COMPUTE P(HAT). THE NEW ESTIMATE WILL MOST LIKELY
BE DIFFERENT FROM 0.437. NOW, TAKE A THIRD
SAMPLE, A FOURTH SAMPLE, AND SO ON UNTIL THE TWO THOUSANDTH
(2000TH) SAMPLE, EACH OF SIZE 1000. WE WILL OBTAIN TWO
THOUSAND P(HATS) THAT VARY FROM SAMPLE TO SAMPLE, AS
ILLUSTRATED IN THE TABLE BELOW.

37
TABLE OF 2000 SAMPLES, EACH OF SIZE $n = 1000$, AND
THEIR CORRESPONDING P(HATS)
[TABLE COLUMNS: SAMPLES OF SIZE n, P(HATS)]

38
WHAT DO WE DO WITH THE DATA FOR P(HATS)?
• WE CONSTRUCT A HISTOGRAM OF THESE 2000 P(HATS).

[HISTOGRAM: NUMBER OF SAMPLES (VERTICAL AXIS) AGAINST P(HATS) (HORIZONTAL AXIS), WITH p MARKED]
39
WHAT WE OBSERVE FROM THE HISTOGRAM
• THE HISTOGRAM ABOVE IS AN EXAMPLE OF WHAT WE
WOULD GET IF WE COULD SEE ALL THE PROPORTIONS
FROM ALL POSSIBLE SAMPLES. THAT DISTRIBUTION HAS
A SPECIAL NAME. IT IS CALLED THE SAMPLING
DISTRIBUTION OF THE PROPORTIONS.
• OBSERVE THAT THE HISTOGRAM IS UNIMODAL, ROUGHLY
SYMMETRIC, AND CENTERED AT p, WHICH IS THE
TRUE PROPORTION.
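
The repeated-sampling experiment described on the last few slides can be imitated by simulation. In the sketch below the true proportion is set to 0.437 purely as a hypothetical value (in practice p is unknown), and the function name, seed, and variable names are all introduced here for illustration.

```python
import random
from statistics import mean, pstdev


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's P(HAT)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


TRUE_P = 0.437  # hypothetical "true" proportion, assumed for the simulation
p_hats = simulate_sample_proportions(TRUE_P, n=1000, num_samples=2000)

center = mean(p_hats)    # should sit close to the true proportion
spread = pstdev(p_hats)  # should be close to sqrt(p * q / n)

print(f"center of the 2000 P(HATS) = {center:.4f}")
print(f"spread of the 2000 P(HATS) = {spread:.4f}")
```

A histogram of `p_hats` would show the unimodal, roughly symmetric shape described above.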

40
WHAT DOES THE SHAPE OF THE HISTOGRAM REMIND US
ABOUT A MODEL THAT MAY JUST BE THE RIGHT ONE FOR
SAMPLE PROPORTIONS?
• ANSWER: IT IS AMAZING AND FORTUNATE THAT A NORMAL
MODEL IS JUST THE RIGHT ONE FOR THE HISTOGRAMS OF
SAMPLE PROPORTIONS.
• HOW GOOD IS THE NORMAL MODEL?
• IT IS GOOD IF THE FOLLOWING ASSUMPTIONS AND
CONDITIONS HOLD.

41
ASSUMPTIONS AND CONDITIONS
• ASSUMPTIONS
• INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST
BE INDEPENDENT OF EACH OTHER.
• SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE, n, MUST
BE LARGE ENOUGH.
• REMARK: ASSUMPTIONS ARE HARD, OFTEN IMPOSSIBLE,
TO CHECK. THAT'S WHY WE ASSUME THEM. FORTUNATELY, SOME
CONDITIONS MAY PROVIDE INFORMATION ABOUT THE
ASSUMPTIONS.

42
CONDITIONS
• RANDOMIZATION CONDITION: THE DATA VALUES MUST BE
SAMPLED RANDOMLY. IF POSSIBLE, USE A SIMPLE RANDOM
SAMPLING DESIGN TO SAMPLE THE POPULATION OF
INTEREST.
• 10% CONDITION: THE SAMPLE SIZE, n, MUST BE NO
LARGER THAN 10% OF THE POPULATION OF INTEREST.
• SUCCESS/FAILURE CONDITION: THE SAMPLE SIZE HAS TO
BE BIG ENOUGH SO THAT WE EXPECT AT LEAST 10
SUCCESSES AND AT LEAST 10 FAILURES. THAT IS,
$np \ge 10$ AND $nq \ge 10$.

43
THE CENTRAL LIMIT THEOREM FOR THE SAMPLING
DISTRIBUTION OF A PROPORTION
• FOR A LARGE SAMPLE SIZE n, THE SAMPLING
DISTRIBUTION OF P(HAT) IS APPROXIMATELY $N\left(p, \sqrt{pq/n}\right)$.
• THAT IS, P(HAT) IS NORMAL WITH MEAN $p$ AND STANDARD
DEVIATION $\sqrt{pq/n}$, WHERE $q = 1 - p$.

44
EXAMPLE 1
• ASSUME THAT 30% OF STUDENTS AT A UNIVERSITY WEAR
CONTACT LENSES.
• (A) WE RANDOMLY PICK 100 STUDENTS. LET P(HAT)
REPRESENT THE PROPORTION OF STUDENTS IN THIS
SAMPLE WHO WEAR CONTACTS. WHAT'S THE APPROPRIATE
MODEL FOR THE DISTRIBUTION OF P(HAT)? SPECIFY THE
NAME OF THE DISTRIBUTION, THE MEAN, AND THE
STANDARD DEVIATION. BE SURE TO VERIFY THAT THE
CONDITIONS ARE MET.
• (B) WHAT'S THE APPROXIMATE PROBABILITY THAT MORE
THAN ONE THIRD OF THIS SAMPLE WEAR CONTACTS?
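
A sketch of how both parts might be worked numerically, using the normal model for a sample proportion with mean $p$ and standard deviation $\sqrt{pq/n}$. Only the success/failure condition is checked in code; randomization and the 10% condition have to be argued from how the sample was drawn. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 100
q = 1 - p

# Success/failure condition: at least 10 expected successes and failures.
conditions_ok = n * p >= 10 and n * q >= 10

# (A) normal model for P(HAT): mean p, standard deviation sqrt(p * q / n)
sd_p_hat = sqrt(p * q / n)
model = NormalDist(mu=p, sigma=sd_p_hat)

# (B) probability that more than one third of the sample wear contacts
p_more_than_third = 1 - model.cdf(1 / 3)

print(f"(A) N({p}, {sd_p_hat:.4f}), conditions met: {conditions_ok}")
print(f"(B) approximately {p_more_than_third:.3f}")
```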

45
SOLUTION TO EXAMPLE 1
46
EXAMPLE 2
• INFORMATION ON A PACKET OF SEEDS CLAIMS THAT THE
GERMINATION RATE IS 92%. WHAT'S THE PROBABILITY
THAT MORE THAN 95% OF THE 160 SEEDS IN THE PACKET
WILL GERMINATE? BE SURE TO DISCUSS YOUR
ASSUMPTIONS AND CHECK THE CONDITIONS THAT SUPPORT
YOUR MODEL.
• SOLUTION

47
CHAPTER 6.5 AND 6.6 SAMPLING DISTRIBUTION OF THE
SAMPLE MEAN
APPROACH FOR ESTIMATING THE POPULATION MEAN $\mu$:
SAME AS FOR THE SAMPLING DISTRIBUTION FOR
PROPORTIONS ILLUSTRATED ABOVE
48
ASSUMPTIONS AND CONDITIONS
• ASSUMPTIONS
• INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST
BE INDEPENDENT OF EACH OTHER.
• SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE MUST BE
SUFFICIENTLY LARGE.
• REMARK: WE CANNOT CHECK THESE DIRECTLY, BUT WE
CAN THINK ABOUT WHETHER THE INDEPENDENCE
ASSUMPTION IS PLAUSIBLE.

49
CONDITIONS
• RANDOMIZATION CONDITION: THE DATA VALUES MUST BE
SAMPLED RANDOMLY, OR THE CONCEPT OF A SAMPLING
DISTRIBUTION MAKES NO SENSE. IF POSSIBLE, USE A
SIMPLE RANDOM SAMPLING DESIGN TO OBTAIN THE
SAMPLE.
• 10% CONDITION: WHEN THE SAMPLE IS DRAWN WITHOUT
REPLACEMENT (AS IS USUALLY THE CASE), THE SAMPLE
SIZE, n, SHOULD BE NO MORE THAN 10% OF THE
POPULATION.
• LARGE ENOUGH SAMPLE CONDITION: IF THE POPULATION
IS UNIMODAL AND SYMMETRIC, EVEN A FAIRLY SMALL
SAMPLE IS OKAY. IF THE POPULATION IS STRONGLY
SKEWED, IT CAN TAKE A PRETTY LARGE SAMPLE TO
ALLOW USE OF A NORMAL MODEL TO DESCRIBE THE
DISTRIBUTION OF SAMPLE MEANS.

50
CENTRAL LIMIT THEOREM FOR THE SAMPLING
DISTRIBUTION FOR MEANS
• FOR A LARGE ENOUGH SAMPLE SIZE, n, THE SAMPLING
DISTRIBUTION OF THE SAMPLE MEAN IS
APPROXIMATELY $N\left(\mu, \sigma/\sqrt{n}\right)$.
• THAT IS, NORMAL WITH MEAN $\mu$ AND STANDARD DEVIATION $\sigma/\sqrt{n}$.

51
EXAMPLE 3
• SUPPOSE THE MEAN ADULT WEIGHT, $\mu$, IS 175
POUNDS WITH STANDARD DEVIATION, $\sigma$, OF 25
POUNDS. AN ELEVATOR HAS A WEIGHT LIMIT OF 10
PERSONS OR 2000 POUNDS. WHAT IS THE PROBABILITY
THAT 10 PEOPLE WHO GET ON THE ELEVATOR OVERLOAD
ITS WEIGHT LIMIT?
• SOLUTION
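
One possible route to the answer: a total above 2000 pounds for 10 people is the same event as a sample mean above 200 pounds, so the normal model for the sample mean, with standard deviation $\sigma/\sqrt{n}$, can be applied. The sketch assumes the 10 riders behave like an independent random sample and that adult weights are close enough to symmetric for the normal model to be reasonable at n = 10. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 175, 25      # population mean and standard deviation (pounds)
n, limit = 10, 2000      # number of riders and total weight limit

# Total weight > 2000 lb  <=>  mean weight > 2000 / 10 = 200 lb.
mean_threshold = limit / n

# Sampling distribution of the sample mean: N(mu, sigma / sqrt(n)).
sd_mean = sigma / sqrt(n)
mean_model = NormalDist(mu=mu, sigma=sd_mean)

z = (mean_threshold - mu) / sd_mean
p_overload = 1 - mean_model.cdf(mean_threshold)

print(f"sd of the sample mean = {sd_mean:.2f} lb, z = {z:.2f}")
print(f"P(overload) is approximately {p_overload:.5f}")
```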

52
EXAMPLE 4
• STATISTICS FROM CORNELL'S NORTHEAST REGIONAL
CLIMATE CENTER INDICATE THAT ITHACA, NY, GETS AN
AVERAGE OF 35.4 INCHES OF RAIN EACH YEAR, WITH A
STANDARD DEVIATION OF 4.2 INCHES. ASSUME THAT A
NORMAL MODEL APPLIES.
• (A) DURING WHAT PERCENTAGE OF YEARS DOES ITHACA
GET MORE THAN 40 INCHES OF RAIN?
• (B) LESS THAN HOW MUCH RAIN FALLS IN THE DRIEST
20% OF ALL YEARS?
• (C) A CORNELL UNIVERSITY STUDENT IS IN ITHACA FOR
4 YEARS. LET y(bar) REPRESENT THE MEAN AMOUNT OF
RAIN FOR THOSE 4 YEARS. DESCRIBE THE SAMPLING
DISTRIBUTION MODEL OF THIS SAMPLE MEAN, y(bar).
• (D) WHAT'S THE PROBABILITY THAT THOSE 4 YEARS
AVERAGE LESS THAN 30 INCHES OF RAIN?

53
SOLUTION TO EXAMPLE 4
54
CHAPTER 7.1 AND 7.2
• CONFIDENCE INTERVALS FOR PROPORTIONS
• ESTIMATION
• POINT ESTIMATION PRODUCES A NUMBER (AN ESTIMATE)
WHICH IS BELIEVED TO BE CLOSE TO THE VALUE OF
THE UNKNOWN PARAMETER.
• FOR EXAMPLE, A CONCLUSION MAY BE THAT THE PROPORTION
p OF LEFT-HANDED STUDENTS AT MSU IS APPROXIMATELY
0.46.

55
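SOME POINT ESTIMATORS

| PARAMETER | ESTIMATOR |
| --- | --- |
| PROPORTION $p$ | SAMPLE PROPORTION $\hat{p}$ |
| MEAN $\mu$ | SAMPLE MEAN $\bar{y}$ |
| STANDARD DEVIATION $\sigma$ | SAMPLE STANDARD DEVIATION $s$ |

56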
INTERVAL ESTIMATION
• PRODUCES AN INTERVAL THAT CONTAINS THE ESTIMATED
PARAMETER WITH A PRESCRIBED CONFIDENCE.
• A CONFIDENCE INTERVAL OFTEN HAS THE FORM ESTIMATE ± MARGIN OF ERROR.

57
DEFINITION
• GIVEN A CONFIDENCE LEVEL C, THE CRITICAL VALUE
IS THE NUMBER $z^*$ SO THAT THE AREA UNDER THE
PROPER CURVE AND BETWEEN $-z^*$ AND $z^*$
IS C (IN DECIMALS).

58
SOME CRITICAL VALUES FOR THE STANDARD NORMAL
DISTRIBUTION

| C% CONFIDENCE LEVEL | CRITICAL VALUE |
| --- | --- |
| 80% | 1.282 |
| 90% | 1.645 |
| 95% | 1.960 |
| 98% | 2.326 |
| 99% | 2.576 |

59
WHAT DOES C% CONFIDENCE REALLY MEAN?
• FORMALLY, WHAT WE MEAN IS THAT C% OF SAMPLES OF
THIS SIZE WILL PRODUCE CONFIDENCE INTERVALS THAT
CAPTURE THE TRUE PROPORTION.
• C% CONFIDENCE MEANS THAT ON AVERAGE, IN C OUT OF
100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE
TRUE VALUE OF THE ESTIMATED PARAMETER.
• E.G., 95% CONFIDENCE MEANS THAT ON AVERAGE,
IN 95 OUT OF 100 ESTIMATIONS, THE INTERVAL WILL
CONTAIN THE TRUE VALUE OF THE ESTIMATED PARAMETER.

60
CONFIDENCE INTERVAL FOR PROPORTION P
ONE-PROPORTION Z-INTERVAL
• ASSUMPTIONS AND CONDITIONS
• RANDOMIZATION CONDITION
• 10% CONDITION
• SAMPLE SIZE ASSUMPTION OR SUCCESS/FAILURE
CONDITION
• INDEPENDENCE ASSUMPTION
• NOTE: PROPER RANDOMIZATION CAN HELP ENSURE
INDEPENDENCE.

61
CONSTRUCTING CONFIDENCE INTERVALS
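ESTIMATOR: SAMPLE PROPORTION $\hat{p}$
STANDARD ERROR: $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$, WHERE $\hat{q} = 1 - \hat{p}$
C% MARGIN OF ERROR: $ME = z^* \cdot SE(\hat{p})$
C% CONFIDENCE INTERVAL: $\hat{p} \pm ME$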

62
SAMPLE SIZE NEEDED TO PRODUCE A CONFIDENCE
INTERVAL WITH A GIVEN MARGIN OF ERROR, ME

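• $ME = z^* \sqrt{\hat{p}\hat{q}/n}$
• SOLVING FOR n GIVES $n = \dfrac{(z^*)^2 \, \hat{p}\hat{q}}{ME^2}$
• WHERE $\hat{p}$ IS A REASONABLE
GUESS. IF WE CANNOT MAKE A GUESS, WE TAKE
$\hat{p} = \hat{q} = 0.5$.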
63
EXAMPLE 1
• A MAY 2002 GALLUP POLL FOUND THAT ONLY 8% OF A
RANDOM SAMPLE OF 1012 ADULTS APPROVED OF ATTEMPTS
TO CLONE A HUMAN.
• FIND THE MARGIN OF ERROR FOR THIS POLL IF WE WANT
95% CONFIDENCE IN OUR ESTIMATE OF THE PERCENT OF
AMERICAN ADULTS WHO APPROVE OF CLONING HUMANS.
• EXPLAIN WHAT THAT MARGIN OF ERROR MEANS.
• IF WE ONLY NEED TO BE 90% CONFIDENT, WILL THE
MARGIN OF ERROR BE LARGER OR SMALLER? EXPLAIN.
• FIND THAT MARGIN OF ERROR.
• IN GENERAL, IF ALL OTHER ASPECTS OF THE SITUATION
REMAIN THE SAME, WOULD SMALLER SAMPLES PRODUCE
SMALLER OR LARGER MARGINS OF ERROR?

64
SOLUTION
65
EXAMPLE 2
• DIRECT MAIL ADVERTISERS SEND SOLICITATIONS
(a.k.a. junk mail) TO THOUSANDS OF POTENTIAL
CUSTOMERS IN THE HOPE THAT SOME WILL BUY THE
COMPANY'S PRODUCT. THE RESPONSE RATE IS USUALLY
QUITE LOW. SUPPOSE A COMPANY WANTS TO TEST THE
RESPONSE TO A NEW FLYER, AND SENDS IT TO 1000
PEOPLE RANDOMLY SELECTED FROM THEIR MAILING LIST
OF OVER 200,000 PEOPLE. THEY GET ORDERS FROM 123
OF THE RECIPIENTS.
• CREATE A 90% CONFIDENCE INTERVAL FOR THE
PERCENTAGE OF PEOPLE THE COMPANY CONTACTS WHO MAY
BUY SOMETHING.
• EXPLAIN WHAT THIS INTERVAL MEANS.
• EXPLAIN WHAT 90% CONFIDENCE MEANS.
• THE COMPANY MUST DECIDE WHETHER TO NOW DO A MASS
MAILING. THE MAILING WON'T BE COST-EFFECTIVE
UNLESS IT PRODUCES AT LEAST A 5% RETURN. WHAT
DOES YOUR CONFIDENCE INTERVAL SUGGEST? EXPLAIN.
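
The interval for this example could be computed along the following lines, using the usual one-proportion z-interval: estimate ± z* times the standard error $\sqrt{\hat{p}\hat{q}/n}$. The function name is introduced here for illustration, and the sketch covers only the arithmetic, not the written interpretation the questions ask for.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence):
    """Return (low, high) for a one-proportion z-interval."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# 123 orders from 1000 randomly selected recipients, 90% confidence.
low, high = one_proportion_z_interval(123, 1000, 0.90)
print(f"90% confidence interval: {100 * low:.1f}% to {100 * high:.1f}%")

# The mass mailing needs at least a 5% return to be cost-effective.
clears_five_percent = low > 0.05
print(f"whole interval above 5%: {clears_five_percent}")
```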

66
SOLUTION
67
EXAMPLE 3
• IN 1998 A SAN DIEGO REPRODUCTIVE CLINIC REPORTED
49 BIRTHS TO 207 WOMEN UNDER THE AGE OF 40 WHO
HAD PREVIOUSLY BEEN UNABLE TO CONCEIVE.
• FIND A 90% CONFIDENCE INTERVAL FOR THE SUCCESS
RATE AT THIS CLINIC.
• INTERPRET YOUR INTERVAL IN THIS CONTEXT.
• EXPLAIN WHAT 90% CONFIDENCE MEANS.
• WOULD IT BE MISLEADING FOR THE CLINIC TO
ADVERTISE A 25% SUCCESS RATE? EXPLAIN.
• THE CLINIC WANTS TO CUT THE STATED MARGIN OF
ERROR IN HALF. HOW MANY PATIENTS' RESULTS MUST BE
USED?
• DO YOU HAVE ANY CONCERNS ABOUT THIS SAMPLE?
EXPLAIN.

68
SOLUTION
69
CHAPTER 7.3 AND 7.4 CONFIDENCE INTERVALS TO
ESTIMATE A POPULATION MEAN
• NOTES TO BE TAKEN IN CLASS

70
CHAPTER 8 TESTING HYPOTHESES ABOUT PROPORTIONS
• PROBLEM
• SUPPOSE WE TOSSED A COIN 100 TIMES AND WE
OBTAINED 38 HEADS AND 62 TAILS. IS THE COIN
BIASED?
• THERE IS NO WAY TO SAY YES OR NO WITH 100%
CERTAINTY. BUT WE MAY EVALUATE THE STRENGTH OF
SUPPORT FOR THE HYPOTHESIS THAT THE COIN IS
BIASED.

71
TESTING
• HYPOTHESES
• NULL HYPOTHESIS
• ESTABLISHED FACT
• A STATEMENT THAT WE EXPECT DATA TO CONTRADICT
• NO CHANGE OF PARAMETERS.
• ALTERNATIVE HYPOTHESIS
• NEW CONJECTURE
• YOUR CLAIM
• A STATEMENT THAT NEEDS STRONG SUPPORT FROM DATA
BEFORE WE CAN CLAIM IT
• CHANGE OF PARAMETERS

72
IN OUR PROBLEM: $H_0: p = 0.5$ (THE COIN IS FAIR) VERSUS $H_A: p < 0.5$
73
EXAMPLE
• WRITE THE NULL AND ALTERNATIVE HYPOTHESES YOU
WOULD USE TO TEST EACH OF THE FOLLOWING
SITUATIONS.
• (A) IN THE 1950s ONLY ABOUT 40% OF HIGH SCHOOL
GRADUATES WENT ON TO COLLEGE. HAS THE PERCENTAGE
CHANGED?
• (B) 20% OF CARS OF A CERTAIN MODEL HAVE NEEDED
COSTLY TRANSMISSION WORK AFTER BEING DRIVEN
BETWEEN 50,000 AND 100,000 MILES. THE
MANUFACTURER HOPES THAT REDESIGN OF A
TRANSMISSION COMPONENT HAS SOLVED THIS PROBLEM.
• (C) WE FIELD TEST A NEW FLAVOR SOFT DRINK,
PLANNING TO MARKET IT ONLY IF WE ARE SURE THAT
OVER 60% OF THE PEOPLE LIKE THE FLAVOR.

74
ATTITUDE
• ASSUME THAT THE NULL HYPOTHESIS
• IS TRUE AND UPHOLD IT,
• UNLESS DATA STRONGLY SPEAKS
• AGAINST IT.

75
TEST MECHANICS
• FROM DATA, COMPUTE THE VALUE OF A PROPER TEST
STATISTIC, THAT IS, THE Z-STATISTIC.
• IF IT IS FAR FROM WHAT IS EXPECTED UNDER THE
NULL HYPOTHESIS ASSUMPTION, THEN WE REJECT THE
NULL HYPOTHESIS.

76
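COMPUTATION OF THE Z-STATISTIC OR PROPER TEST
STATISTIC
• $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, WHERE $p_0$ IS THE PROPORTION UNDER THE NULL HYPOTHESIS AND $q_0 = 1 - p_0$.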
77
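CONSIDERING THE EXAMPLE AT THE BEGINNING
• $\hat{p} = 0.38$, $p_0 = 0.5$, $n = 100$, SO $z = \dfrac{0.38 - 0.5}{\sqrt{0.5 \cdot 0.5 / 100}} = -2.4$.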
78
THE P-VALUE AND ITS COMPUTATION
• THE P-VALUE IS THE PROBABILITY THAT, IF THE NULL
HYPOTHESIS IS CORRECT, THE TEST STATISTIC TAKES THE
OBSERVED VALUE OR A MORE EXTREME ONE.
• THE P-VALUE MEASURES THE STRENGTH OF EVIDENCE
AGAINST THE NULL HYPOTHESIS. THE SMALLER THE
P-VALUE, THE STRONGER THE EVIDENCE AGAINST THE NULL
HYPOTHESIS.

79
THE WAY THE ALTERNATIVE HYPOTHESIS IS WRITTEN IS
HELPFUL IN COMPUTING THE P-VALUE
[FIGURE: NORMAL CURVE]

80
IN OUR EXAMPLE,
• P-VALUE = $P(z < -2.4) = 0.0082$
• INTERPRETATION: IF THE COIN IS FAIR, THEN THE
PROBABILITY OF OBSERVING 38 OR FEWER HEADS IN 100
TOSSES IS 0.0082.
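
The z-statistic and P-value for the coin example can be reproduced in a few lines. This sketch uses the lower-tail probability, matching the P(z < -2.4) shown above, and the usual one-proportion z-statistic with the null value 0.5 in the standard deviation. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

heads, tosses = 38, 100
p0 = 0.5                      # proportion of heads if the coin is fair

p_hat = heads / tosses
sd_under_null = sqrt(p0 * (1 - p0) / tosses)
z = (p_hat - p0) / sd_under_null

# Lower-tail P-value: probability of a z this small or smaller.
p_value = NormalDist().cdf(z)

alpha = 0.05
reject_null = p_value < alpha

print(f"z = {z:.1f}, P-value = {p_value:.4f}")
print(f"reject the null hypothesis at the {alpha} level: {reject_null}")
```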

81
CONCLUSION, GIVEN SIGNIFICANCE LEVEL $\alpha = 0.05$
• WE REJECT THE NULL HYPOTHESIS IF THE P-VALUE IS
LESS THAN THE SIGNIFICANCE LEVEL OR ALPHA LEVEL.
• WE FAIL TO REJECT THE NULL HYPOTHESIS (I.E., WE
RETAIN THE NULL HYPOTHESIS) IF THE P-VALUE IS
GREATER THAN THE SIGNIFICANCE LEVEL OR ALPHA
LEVEL.

82
ASSUMPTIONS AND CONDITIONS
• RANDOMIZATION
• INDEPENDENT OBSERVATIONS
• 10% CONDITION
• SUCCESS/FAILURE CONDITION

83
EXAMPLE 1
• THE NATIONAL CENTER FOR EDUCATION STATISTICS
MONITORS MANY ASPECTS OF ELEMENTARY AND SECONDARY
EDUCATION NATIONWIDE. THEIR 1996 NUMBERS ARE
OFTEN USED AS A BASELINE TO ASSESS CHANGES. IN
1996, 31% OF STUDENTS REPORTED THAT THEIR MOTHERS
HAD GRADUATED FROM COLLEGE. IN 2000, RESPONSES
FROM 8368 STUDENTS FOUND THAT THIS FIGURE HAD
GROWN TO 32%. IS THIS EVIDENCE OF A CHANGE IN
EDUCATION LEVEL AMONG MOTHERS?

84
EXAMPLE 1 CONT'D
• (A) WRITE APPROPRIATE HYPOTHESES.
• (B) CHECK THE ASSUMPTIONS AND CONDITIONS.
• (C) PERFORM THE TEST AND FIND THE P-VALUE.
• (D) STATE YOUR CONCLUSION.
• (E) DO YOU THINK THIS DIFFERENCE IS MEANINGFUL?
EXPLAIN.

85
SOLUTION
86
EXAMPLE 2
• IN THE 1980s IT WAS GENERALLY BELIEVED THAT
CONGENITAL ABNORMALITIES AFFECTED ABOUT 5% OF THE
NATION'S CHILDREN. SOME PEOPLE BELIEVE THAT THE
INCREASE IN THE NUMBER OF CHEMICALS IN THE
ENVIRONMENT HAS LED TO AN INCREASE IN THE
INCIDENCE OF ABNORMALITIES. A RECENT STUDY
EXAMINED 384 CHILDREN AND FOUND THAT 46 OF THEM
SHOWED SIGNS OF AN ABNORMALITY. IS THIS STRONG
EVIDENCE THAT THE RISK HAS INCREASED? (WE
CONSIDER A P-VALUE OF AROUND 5% TO REPRESENT
STRONG EVIDENCE.)

87
EXAMPLE 2 CONT'D
• (A) WRITE APPROPRIATE HYPOTHESES.
• (B) CHECK THE NECESSARY ASSUMPTIONS.
• (C) PERFORM THE MECHANICS OF THE TEST. WHAT IS
THE P-VALUE?
• (D) EXPLAIN CAREFULLY WHAT THE P-VALUE MEANS IN
THIS CONTEXT.
• (E) WHAT'S YOUR CONCLUSION?
• (F) DO ENVIRONMENTAL CHEMICALS CAUSE CONGENITAL
ABNORMALITIES?

88
SOLUTION
89
CHAPTER 8 CONT'D: TESTING HYPOTHESES ABOUT MEANS
• NOTES TO BE TAKEN IN CLASS