Title: STAT131 Week 4 Lecture 2 Modelling Variation: Introduction to modelling and GOF
1STAT131 Week 4 Lecture 2 Modelling Variation
Introduction to modelling and GOF
- Anne Porter
- alp_at_uow.edu.au
2Activity: Let's play Beat the Butcher
- Morning radio 6am-7am, weekdays
- Contestant telephones in to play
- Contestant has to say stop before the gong rings to win the meat
- Radio personality reads out the meat items (2 slices of scotch fillet, 3 kg mince, and so on) until the gong is reached
3The list
Let's play: all stand, I'll read, and you sit when you have enough meat. The last ones standing before the gong win.
- 1) Three kilos scotch fillet
- 2) 1 chicken
- 3) 3 kilos of sausages
- 4) 12 chicken kebabs
- 5) 12 lamb kebabs
- 6) 3 livers
- 7) 1 kg bacon
- 8) lamb chops
- 9) 2 kg salmon rissoles
4How might you increase your chances of winning?
What information would be useful before you play again?
5How might you increase your chances of winning?
What information would be useful before you play again?
- What is the maximum and minimum number of items ever read out?
- What is the voice pattern over the gonged items?
- What is the average number of items read out before the gong?
- What is the frequency of gongs over time for each item?
6Frequency distribution of the number of items
before the gong
What is a more informative way of presenting
the data, so that we can choose the best point
at which to stop?
7Relative frequency table
What is a better way of presenting this
information so it is easier to use?
8Cumulative frequency
What will be the median number of items before
the gong?
9Cumulative frequency
What will be the median number of items before
the gong?
It is the $(n+1)/2$th value: with $n = 100$ games, the 50.5th value, which is 8
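The cumulative frequency table on this slide is not reproduced in the transcript, so the following is only a minimal sketch of reading a median off cumulative frequencies. The `items` and `freq` values are hypothetical (made up for illustration, chosen to total 100 games), and the helper name `value_at_position` is likewise an assumption.

```python
from itertools import accumulate

# HYPOTHETICAL frequency table (not the lecture's data):
# number of items read before the gong, and how many games ended there.
items = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
freq = [1, 2, 5, 8, 10, 14, 22, 20, 12, 6]

n = sum(freq)                      # total number of games
cum_freq = list(accumulate(freq))  # running total of games


def value_at_position(k):
    """Return the k-th smallest observation (1-indexed) from the cumulative frequencies."""
    for x, c in zip(items, cum_freq):
        if k <= c:
            return x
    raise ValueError("position is beyond the end of the data")


# The median sits at position (n + 1) / 2. For even n this falls between two
# observations, so average the ones on either side (the 50th and 51st when n = 100).
lower = value_at_position((n + 1) // 2)
upper = value_at_position((n + 2) // 2)
median = (lower + upper) / 2

print(n, cum_freq, median)
```

With these made-up frequencies the cumulative count passes 50.5 inside the cell for 8 items, so the sketch also returns a median of 8.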
10Frequency distribution of the number of items
before the gong
What is the average number of items read before
the gong?
11What do we do to calculate the mean number of
items before the gong?
12What do we do to calculate the mean number of
items before the gong?
Multiply each number of items by its frequency,
add the products to get the total number of items
read before the gong, and divide by the number of
games played
13Calculate the mean
14Calculate the mean
784/100 = 7.84 items before the gong
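The recipe for the mean (multiply, add, divide) can be sketched in a few lines. The frequency table below is hypothetical, since the lecture's table is not in the transcript, so it will not reproduce the 784 total; only the final line uses the lecture's own figures.

```python
# HYPOTHETICAL frequency table (not the lecture's data).
items = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
freq = [1, 2, 5, 8, 10, 14, 22, 20, 12, 6]

# Multiply each number of items by its frequency and add the products.
total_items = sum(x * f for x, f in zip(items, freq))

# Divide by the number of games played.
games = sum(freq)
mean_items = total_items / games

# The lecture's own totals: 784 items over 100 games.
lecture_mean = 784 / 100

print(total_items, games, mean_items, lecture_mean)
```

For the made-up table the total is 768 items over 100 games, a mean of 7.68; the lecture's totals give 7.84.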
15Will your stopping strategy be the same for this
set of data?
Why not?
16Will your stopping strategy be the same for this
set of data?
Why not?
For these values of x we have a much smaller
spread
17In the long run what should be the probability of
stopping at each number if stopping at random?
18P(X = x) and number expected for each item for the
random stopping model
Does it appear that the data fit the
random stopping model? Why so?
19P(X = x) and number expected for each item for the
random stopping model
Does it appear that the data fit the
random stopping model? Why so?
Number expected differs from number observed.
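Under a random stopping model every cell is equally likely, so the expected count is the same in each cell. A minimal sketch follows, assuming the 10 cells are the item counts 2 through 11. That range is an inference, not stated in the transcript, although it agrees with the 10 cells and the E(X) = 6.5 quoted later in the lecture. The 100 games come from the mean calculation (784/100).

```python
from fractions import Fraction

# ASSUMPTION: the 10 cells are the item counts 2, 3, ..., 11.
cells = list(range(2, 12))
n_games = 100  # number of games, as in the lecture's 784/100

# Random stopping: each cell has the same probability 1/g.
p = Fraction(1, len(cells))
prob = {x: p for x in cells}

# Number expected in each cell = number of games * P(X = x).
expected = {x: float(n_games * p) for x in cells}

for x in cells:
    print(x, float(prob[x]), expected[x])
```

Each cell gets probability 0.1 and an expected count of 10 games, which is the column the observed frequencies are compared against.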
20Bar chart: compare observed and expected
frequencies
21Measuring the difference between O and E
How do we measure (compare, calculate) the
difference between observed and expected?
22P(X = x) and number expected for each item for the
random stopping model
How might we calculate the difference between
observed and expected?
If the data fits will this be big or small?
23P(X = x) and number expected for each item for the
random stopping model
How might we calculate the difference between
observed and expected?
If the data fits will this be big or small?
small
24Calculating $\chi^2$
25Calculating $\chi^2$
26Calculating $\chi^2$
27Model Fit Using $\chi^2$
- Calculate $\chi^2 = \sum (O - E)^2 / E$
- And see if it is too large for the data to be considered to fit the model
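The statistic on these slides appears to be the Pearson chi-square goodness-of-fit statistic, $\chi^2 = \sum (O - E)^2 / E$. The symbol did not survive in this transcript, but the later comparison against a critical value of 16.919 on 9 degrees of freedom matches it. The sketch below computes it cell by cell; the observed counts are hypothetical, so the result will not equal the lecture's 65.6.

```python
# HYPOTHETICAL observed counts for 10 cells (not the lecture's data).
observed = [1, 2, 5, 8, 10, 14, 22, 20, 12, 6]
# Expected counts under random stopping: 100 games spread evenly over 10 cells.
expected = [10.0] * 10


def chi_square(observed, expected):
    """Return (statistic, per-cell contributions) for sum of (O - E)^2 / E."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), contributions


chi2, contributions = chi_square(observed, expected)

# The largest contributions show which cells fit worst.
for cell, c in enumerate(contributions, start=1):
    print(cell, round(c, 2))
print("chi-square:", round(chi2, 2))
```

If the data fit the model, observed and expected are close in every cell and the statistic is small; large per-cell contributions point to where the fit breaks down.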
28Model Fit Informal: Is $\chi^2$ too big?
- If $\chi^2 > d + 2\sqrt{2d}$
- where $d = g - p - 1$
- g is the number of cells
- p is the number of parameters estimated from the data
- then there is evidence the data do not fit the model
29Model Fit Informal: Is $\chi^2$ too big?
- If $\chi^2 > d + 2\sqrt{2d}$
- where $d = g - p - 1$
- g is the number of cells
- p is the number of parameters estimated from the data
- then there is evidence the data do not fit the model
For our example $g = 10$ cells, therefore $d = 10 - 0 - 1 = 9$
and $d + 2\sqrt{2d} = 9 + 2\sqrt{18} \approx 17.49$.
Decision: as $\chi^2 = 65.6 > 17.49$, there
is evidence that the data do not fit the random
stopping model
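The threshold of 17.49 on this slide is consistent with the rule of thumb $d + 2\sqrt{2d}$, that is, the mean of a chi-square distribution ($d$) plus two standard deviations ($\sqrt{2d}$), because $9 + 2\sqrt{18} \approx 17.49$. The slide's own formula is not legible in the transcript, so treat that reading as an assumption. A minimal sketch of the informal check, with a hypothetical function name:

```python
import math


def informal_threshold(g, p):
    """Return (d, threshold) with d = g - p - 1 and threshold = d + 2*sqrt(2d).

    ASSUMPTION: the 'mean plus two standard deviations' rule is inferred
    from the 17.49 quoted in the lecture, not read from the slide itself.
    """
    d = g - p - 1
    return d, d + 2 * math.sqrt(2 * d)


# Lecture example: g = 10 cells, p = 0 parameters estimated from the data.
d, threshold = informal_threshold(10, 0)

chi2_lecture = 65.6  # calculated value quoted in the lecture
lack_of_fit = chi2_lecture > threshold

print(d, round(threshold, 2), lack_of_fit)
```

For the lecture's numbers this gives d = 9 and a threshold of about 17.49, so 65.6 is flagged as too big.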
30Model Fit Formal
- Decision: if calculated $\chi^2$ > critical value of $\chi^2$ (tables), then there is evidence of lack of fit
- $\alpha = 0.05$ (typical, and the value we will use)
- df = number of cells - number of estimated parameters - 1
- df = 10 - 0 - 1 = 9
31Model Fit Formal
- Decision: as calculated $\chi^2 = 65.6$ > critical value of 16.919 found in the tables, there is evidence of lack of fit between the data and the random stopping model.
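The formal decision is a table lookup followed by a comparison. The sketch below stands in for the printed tables with a small dictionary of upper 5% chi-square critical values for 1 to 10 degrees of freedom; the dictionary and function names are hypothetical, and only the df = 9 entry (16.919) appears in the lecture itself.

```python
# Upper 5% critical values of the chi-square distribution, df = 1..10,
# as found in standard statistical tables.
CRITICAL_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def formal_decision(chi2, cells, estimated_params=0):
    """Return (df, critical value, True if there is evidence of lack of fit)."""
    df = cells - estimated_params - 1
    critical = CRITICAL_05[df]
    return df, critical, chi2 > critical


# Lecture example: calculated value 65.6, 10 cells, no estimated parameters.
df, critical, lack_of_fit = formal_decision(65.6, cells=10)
print(df, critical, lack_of_fit)
```

With the lecture's figures the lookup returns df = 9 and 16.919, and since 65.6 exceeds it the random stopping model is rejected at the 0.05 level.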
32Lack of fit
Looking at the table we can see most lack of fit
occurs for items 2, 3, 8 and 9. Lots of meat
before the gong
33Sampling Distributions
- We will explore how these types of sampling distributions, such as $\chi^2$, are generated in our lecture on sampling distributions.
- We will also explore how we choose a value of $\alpha$
- We will look at using the data to estimate parameters later
34Model fit approaches
- Use a bar chart to compare observed and expected frequencies
- Compare observed and expected frequencies
- Calculate $\chi^2$ and use it
- Informally
- Formally
- $\chi^2$ assumes that the expected count in each cell is $\geq 5$
- If not, combine cells. Other literature uses other rules; there is a debate over this.
- (Check the Utts & Heckard (2004) definition)
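One way to "combine cells" is to merge neighbouring cells until every merged cell has an expected count of at least 5. The sketch below is one possible merging scheme, not a prescribed procedure: it assumes the rule means "at least 5" (the comparison symbol is missing from the transcript), and the function name and the example counts are hypothetical.

```python
def combine_cells(observed, expected, minimum=5.0):
    """Merge adjacent cells left to right until each expected count >= minimum.

    Any leftover cells at the end that still fall short are folded into the
    last merged cell. Returns (merged_observed, merged_expected).
    """
    merged_obs, merged_exp = [], []
    run_obs, run_exp, pending = 0, 0.0, False
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        pending = True
        if run_exp >= minimum:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs, run_exp, pending = 0, 0.0, False
    if pending:
        if merged_exp:
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


# HYPOTHETICAL counts with small expected values in the tails.
obs = [3, 6, 12, 9, 4, 1]
exp = [2.0, 4.0, 12.0, 11.0, 4.5, 1.5]
new_obs, new_exp = combine_cells(obs, exp)
print(new_obs, new_exp)
```

After merging, g in the degrees-of-freedom formula is the number of merged cells (4 in this made-up example, down from 6), so combining cells also lowers d.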
35Mean (expected value, E(X)) for the random
stopping model
36Expected value for the random stopping model is?
E(X) = 6.5
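The expected value of a discrete model is the sum of each value times its probability. The sketch below checks the 6.5 under the assumption that the random stopping model puts probability 1/10 on each of the item counts 2 through 11. That range is inferred from the 10 cells and the stated answer, and is not given explicitly in the transcript.

```python
from fractions import Fraction

# ASSUMPTION: the model's 10 equally likely values are 2, 3, ..., 11.
values = range(2, 12)
p = Fraction(1, 10)  # P(X = x) for every x under random stopping

# E(X) = sum over x of x * P(X = x)
expected_value = sum(x * p for x in values)

print(expected_value, float(expected_value))
```

The sum 2 + 3 + ... + 11 is 65, and 65/10 gives 6.5, which is lower than the sample mean of 7.84 from the observed games.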
37Spread of the Population Model
We will leave calculation of these till a little
later on a simpler example
38What have we been doing?
- We have been looking at the centre, spread, outliers and shape of samples of data
- With a view to improving decision making.
- Why are we concerned with looking at models?
39Describing characteristics of Data
- We collect data on samples
- Time in seconds until two species of flies released together mate
- The number of lost articles found in a large municipal office
- The average carbohydrate content per 100 g serve in a sample of different species
- The number of items of meat read before the gong
40Improving our decisions
- Looking at
- The shape of the distribution
- Centre
- Spread
- Whether or not the data fit some model
- May even look at outliers, points not fitting the
model
41Describing Batches of Data
- Comparing midterm marks from the different versions of the test.
- Are the papers completed in a similar manner?
42What we are really looking at is NOT
- The mating behaviour of these particular flies
- Past lost articles
- Or last year's exam papers
- Or the last 100 games of Beat the Butcher
We are interested in them because they
may suggest a model for the characteristics of
the data in general. This involves Probability
Models. We shall continue to explore probability
models in future lectures.