F73DB3 CATEGORICAL DATA ANALYSIS - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: F73DB3 CATEGORICAL DATA ANALYSIS


1
hwu
  • F73DB3 CATEGORICAL DATA ANALYSIS
  • Workbook
  • Contents page
  • Preface
  • Aims
  • Summary
  • Content/structure/syllabus
  • plus other information
  • Background computing (R)

2
hwu
  • Examples
  • Single classifications (1-13)
  • Two-way classifications (14-27)
  • Three-way classifications (28-32)

3
hwu
Example 1 Eye colours
Colour A B C D
Frequency observed 89 66 60 85
4
hwu
Example 2 Prussian cavalry deaths
  • Numbers killed in each unit in each year
  • - frequency table

Number killed 0 1 2 3 4 ≥5 Total
Frequency observed 144 91 32 11 2 0 280
5
hwu
Example 2 Prussian cavalry deaths
(b) Numbers killed in each unit in each year
raw data 0 0 1 0 0 2 0 0 0 0 . . . . . . . .
. . . . . . . . . . . . . . . 0 0 0 2 0 1 0 1 2
0 1 . . . . . . . . . . . . . . . . . . . . . .
. .0 .. .. 3 0 0 1 0 0 2 1 0 0 1 0 0 1 0 0 1 1
2 0 1 0 1 1
6
hwu
Example 2 Prussian cavalry deaths
(c) Total numbers killed each year
1875 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
3 5 7 9 10 18 6 14 11 9 5 11 15 6 11 17 12 15 8 4
7
hwu
Example 4 Political views
Category 1 (very L) 2 3 4 (centre) 5 6 7 (very R) Don't Know Total
Frequency observed 46 179 196 559 232 150 35 93 1490
8
hwu
Example 7 Vehicle repair visits
Number of visits 0 1 2 3 4 5 ≥6 Total
Frequency observed 295 190 53 5 5 2 0 550
9
hwu
Example 15 Patients in clinical trial
Drug Placebo Total
Side-effects 15 4 19
No side-effects 35 46 81
Total 50 50 100
10
hwu
  • 1 INTRODUCTION
  • Data are counts/frequencies (not measurements)
  • Categories (explanatory variable)
  • Distribution in the cells (response)
  • Frequency distribution
  • Single classifications
  • Two-way classifications

11
hwu
Illustration 1.1
                               B: Cause of death
                               Cancer   Other
A: Smoking status  Smoker          30      20
                   Not smoker      15      35
12
hwu
  • Data may arise as
  • Bernoulli/binomial data (2 outcomes)
  • Multinomial data (more than 2 outcomes)
  • Poisson data
  • Negative binomial data (the version with range x = 0, 1, 2, ...)

13
hwu
2 POISSON PROCESS AND ASSOCIATED
DISTRIBUTIONS
14
hwu
2.1 Bernoulli trials and related distributions
Number of successes: binomial distribution
Time before kth success: negative binomial distribution
Time to first success: geometric distribution
Conditional distribution of success times
15
hwu
2.2 Poisson process and related distributions
(Diagram: events of a Poisson process marked along a time axis)
16
hwu
Poisson process with rate λ: the number of events in a time interval of length t, Nt, has a Poisson distribution with mean λt
17
hwu
Poisson process with rate λ: the inter-event time, T, has an exponential distribution with parameter λ (mean 1/λ)
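As a quick numerical illustration of these two facts, here is a minimal R sketch (not from the workbook; the rate lambda, horizon t.end and simulation size nsim are made-up values):

lambda <- 2; t.end <- 10; nsim <- 10000
counts <- rpois(nsim, lambda * t.end)   # N_t ~ Poisson(lambda * t)
c(mean(counts), lambda * t.end)         # sample mean vs theoretical mean lambda*t
gaps <- rexp(nsim, rate = lambda)       # inter-event times T ~ Exp(lambda)
c(mean(gaps), 1 / lambda)               # sample mean vs theoretical mean 1/lambda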
18
hwu
Conditional distribution of number of events:
given n events in time (0, t), how many occur in time (0, s), s < t?
19
hwu
Conditional distribution of number of events:
given n events in time (0, t), how many occur in time (0, s), s < t?
Answer: Ns | Nt = n ~ B(n, s/t)
20
hwu
Splitting into subprocesses
(Diagram: events of a Poisson process split into subprocesses along a time axis)
21
hwu
Realisation of a Poisson process (plot: events against time)
22
hwu
X ~ Pn(λ), Y ~ Pn(µ), X, Y independent; then we know X + Y ~ Pn(λ + µ).
Given X + Y = n, what is the distribution of X?
23
hwu
X ~ Pn(λ), Y ~ Pn(µ), X, Y independent; then we know X + Y ~ Pn(λ + µ).
Given X + Y = n, what is the distribution of X?
Answer: X | X + Y = n ~ B(n, p) where p = λ/(λ + µ)
24
hwu
2.3 Inference for the Poisson distribution
Ni, i = 1, 2, ..., r, i.i.d. Pn(λ); N = ΣNi
25
hwu
CI for λ
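For example, an exact CI for λ can be obtained in R with poisson.test (a minimal sketch, not part of the workbook; it uses the cavalry data of Example 2, treating the 280 unit-years as the exposure):

deaths <- c(144, 91, 32, 11, 2, 0)       # frequencies of 0, 1, 2, 3, 4, >=5 deaths
total  <- sum(deaths * 0:5)              # 196 deaths in all
units  <- sum(deaths)                    # 280 unit-years
total / units                            # point estimate of lambda, 0.7
poisson.test(total, T = units)           # exact 95% CI for lambda per unit-year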
26
hwu
2.4 Dispersion and LR tests for Poisson data
Homogeneity hypothesis H0: the Ni's are i.i.d. Pn(λ) (for some unknown λ)
Dispersion statistic: Σ(Ni − M)²/M (M = sample mean)
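A minimal R sketch of the dispersion test (the counts Ni below are made-up values; the statistic is referred to chi-square on r − 1 degrees of freedom):

Ni <- c(12, 15, 9, 14, 10)                          # hypothetical Poisson counts
M  <- mean(Ni)
D  <- sum((Ni - M)^2) / M                           # dispersion statistic
pchisq(D, df = length(Ni) - 1, lower.tail = FALSE)  # approximate P-value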
27
hwu
Likelihood ratio statistic Y² (form for calculation: see p18)
28
hwu
3 SINGLE CLASSIFICATIONS
Binary classifications:
(a) N1, N2 independent Poisson, with Ni ~ Pn(λi), or
(b) fixed sample size, N1 + N2 = n, with N1 ~ B(n, p1) where p1 = λ1/(λ1 + λ2)
29
hwu
Qualitative categories:
(a) N1, N2, ..., Nr independent Poisson, with Ni ~ Pn(λi), or
(b) fixed sample size n, with joint multinomial distribution Mn(n, p)
30
hwu
Testing goodness of fit: H0: pi = πi, i = 1, 2, ..., r
X² = Σ (Ni − mi)²/mi, where mi = n πi
This is the (Pearson) chi-square statistic
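In R the test can be run directly with chisq.test; the sketch below uses the eye-colour counts of Example 1 with equal hypothesised probabilities (the equal-probability H0 is an assumption made here purely for illustration):

obs <- c(89, 66, 60, 85)                 # Example 1 frequencies
chisq.test(obs, p = rep(1/4, 4))         # Pearson X^2 against H0: pi = 1/4 for all i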
31
hwu
The statistic often appears as X² = Σ (O − E)²/E
32
hwu
33
hwu
An alternative statistic is the LR statistic Y² = 2 Σ O log(O/E)
34
hwu
Sparse data/small expected frequencies:
ensure mi ≥ 1 for all cells, and mi ≥ 5 for at least about 80% of the cells;
if not, combine adjacent cells sensibly
35
hwu
Goodness-of-fit tests for frequency distributions - a very well-known application of the X² statistic (see Illustration 3.4, p 22/23)
36
hwu
Residuals (standardised)
37
hwu
Residuals (standardised)
simpler version
38
hwu
MAJOR ILLUSTRATION 1 Publish and be modelled
Number of papers per author:    1    2    3   4   5  6  7  8  9  10  11
Number of authors:           1062  263  120  50  22  7  6  2  0   1   1
Model:
39
hwu
MAJOR ILLUSTRATION 2 Birds in hedges
Hedge type i:           A     B     C     D     E     F     G
Hedge length (m) li: 2320  2460  2455  2805  2335  2645  2099
Number of pairs ni:    14    16    14    26    15    40    71
Model: Ni ~ Pn(λ li)
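A minimal R sketch of fitting this model (assuming a common density λ of pairs per metre of hedge, carried by an offset; offsets are discussed further in sections 6.2 and 8.2):

len     <- c(2320, 2460, 2455, 2805, 2335, 2645, 2099)
n.pairs <- c(14, 16, 14, 26, 15, 40, 71)
birds.mod <- glm(n.pairs ~ 1 + offset(log(len)), family = poisson)
exp(coef(birds.mod))                     # estimated pairs per metre of hedge
summary(birds.mod)$deviance              # residual deviance: tests a common density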
40
hwu
4 TWO-WAY CLASSIFICATIONS
Example 14 Numbers of mice bearing tumours
in treated and control groups
Treated Control Total
Tumours 4 5 9
No tumours 12 74 86
Total 16 79 95
41
hwu
Example 15 Patients in clinical trial
Drug Placebo Total
Side-effects 15 4 19
No side-effects 35 46 81
Total 50 50 100
42
hwu
Patients in clinical trial take 2
Drug Placebo Total
Side-effects 15 15 30
No side-effects 35 35 70
Total 50 50 100
43
hwu
  • 4.1 Factors and responses
  • F × R tables
  • R × F, R × R
  • (F × F ?)
  • Qualitative, ordered, quantitative
  • Analysis the same - interpretation may be different

44
hwu
  • A two-way table is often called a contingency table (especially in the R × R case).

45
hwu
Notation (2 × 2 case, easily extended)
              Exposed   Not exposed   Total
Disease          n11        n12        n1.
No disease       n21        n22        n2.
Total            n.1        n.2        n.. = n
46
hwu
Three possibilities:
One overall sample, each subject classified according to 2 attributes - this is R × R
Retrospective study
Prospective study (use of treated and control groups; drug and placebo etc.)
47
hwu
4.2 Distribution theory and tests for r × s tables
(a) R × R case
(a1) Nij ~ Pn(λij), independent
or, with fixed table total,
(a2) condition on n = ΣΣ nij: N | n ~ Mn(n, p), where N = (Nij), p = (pij)
48
hwu
(b) F × R case:
condition on the observed marginal totals n.j = Σi nij for the s categories of F
(i.e. condition on n and the n.j)
⇒ s independent multinomials
49
hwu
Usual hypotheses
(a1) Nij ~ Pn(λij), independent
H0: variables/responses are independent: λij = λi λj / λ for all i, j
(a2) Multinomial data (table total fixed)
H0: variables/responses are independent:
P(row i and column j) = P(row i) × P(column j)
50
hwu
(b) Condition on n and the n.j (fixed column totals):
Nij ~ Bi(n.j, pij), j = 1, 2, ..., s, independent
H0: response is homogeneous (pij = pi for all j),
i.e. the response has the same distribution for all levels of the factor
51
hwu
Tests of H0: the χ² (Pearson) statistic
X² = ΣΣ (nij − mij)²/mij, where mij = ni. × n.j / n, as before
52
hwu
Tests of H0: the χ² (Pearson) statistic
X² = ΣΣ (nij − mij)²/mij, where mij = ni. × n.j / n, as before
53
hwu
OR test based on the LR statistic Y²
Illustration: tonsils data, see p27
In R:
Pearson/X²: read the data in using matrix, then use chisq.test
LR/Y²: calculate it directly (or get it from the results of fitting a log-linear model - see later)
54
hwu
4.3 The 2 × 2 table
Statistical tests: (a) Using Pearson's χ²
Drug Placebo Total
Side-effects 15 4 19
No side-effects 35 46 81
Total 50 50 100
55
hwu
where mij = ni. × n.j / n
56
hwu
Yates (continuity) correction: subtract 0.5 from |O − E| before squaring it
Performing the test in R:
n.pat <- matrix(c(15, 35, 4, 46), 2, 2)
chisq.test(n.pat)
57
hwu
(b) Using deviance/LR statistic Y²
(c) Comparing binomial probabilities
(d) Fisher's exact test
58
hwu
Drug Placebo Total
Side-effects 15 4 (= N) 19
No side-effects 35 46 81
Total 50 50 100
59
hwu
Under a random allocation,
one-sided P-value = P(N ≤ 4) = 0.0047
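A minimal R sketch reproducing this calculation (N is the number of side-effect patients allocated to placebo; fisher.test and phyper are base R):

n.pat <- matrix(c(15, 35, 4, 46), 2, 2)
fisher.test(n.pat, alternative = "greater")   # one-sided exact test, P about 0.0047
phyper(4, 50, 50, 19)                         # P(N <= 4) computed directly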
60
hwu
4.4 Log odds, combining and collapsing tables, interactions
In the 2 × 2 table, the H0 independence condition is equivalent to λ11 λ22 = λ12 λ21.
Let φ = log(λ11 λ22 / λ12 λ21). Then we have H0: φ = 0.
φ is the log odds ratio.
61
hwu
The φ = 0 hypothesis is often called the "no association" hypothesis.
62
hwu
The odds ratio is λ11 λ22 / λ12 λ21.
The sample equivalent is n11 n22 / (n12 n21).
63
hwu
The odds ratio (or log odds ratio) provides a measure of association for the factors in the table.
No association ⇔ odds ratio = 1 ⇔ log odds ratio = 0
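A minimal R sketch for the patients data (the large-sample standard error and the resulting interval are standard results, not taken from the slides):

n.pat  <- matrix(c(15, 35, 4, 46), 2, 2)
or.hat <- (n.pat[1, 1] * n.pat[2, 2]) / (n.pat[1, 2] * n.pat[2, 1])  # sample odds ratio, about 4.93
log.or <- log(or.hat)                          # sample log odds ratio
se     <- sqrt(sum(1 / n.pat))                 # large-sample standard error
log.or + c(-1.96, 1.96) * se                   # approximate 95% CI for the log odds ratio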
64
hwu
Don't combine heterogeneous tables!
65
hwu
Interaction An interaction exists between two
factors when the effect of one factor is
different at different levels of another factor.
66
hwu
67
hwu
68
hwu
5 INTRODUCTION TO GENERALISED LINEAR MODELS (GLMs)
Normal linear model: Y|x ~ N, with E[Y|x] = α + βx,
or E[Y|x] = β0 + β1x1 + β2x2 + ... + βrxr = α + β'x,
i.e. E[Y|x] = µ(x) = α + β'x
69
hwu
We are explaining µ(x) using a linear predictor (a linear function of the explanatory data).
Generalised linear model: now we set g(µ(x)) = α + β'x for some function g.
We explain g(µ(x)) using a linear function of the explanatory data, where g is called the link function.
70
hwu
e.g. modelling a Poisson mean λ we use a log link: g(λ) = log λ.
We use a linear predictor to explain log λ rather than λ itself; the model is
Y|x ~ Pn with mean λx, with log λx = α + βx (or log λx = α + β'x).
This is a log-linear model.
71
hwu
An example is a trend model, in which we use log λi = α + βi.
Another example is a cyclic model, in which we use log λi = β0 + β1 cos θi + β2 sin θi.
72
hwu
6 MODELS FOR SINGLE CLASSIFICATIONS
6.1 Single classifications - trend models
Data: numbers in r categories
Model: Ni, i = 1, 2, ..., r, independent Pn(λi)
73
hwu
Basic case: H0: the λi's are equal v H1: the λi's follow a trend
Let Xj be the category of observation j; under H0, P(Xj = i) = 1/r
Test based on ... (see Illustration 6.1)
74
hwu
A more general model: Ni independent Pn(λi) with log λi = α + βi
- a log-linear model
75
hwu
It is a linear regression model for log λi and a non-linear regression model for λi.
It is a generalised linear model.
Here the link between the parameter we are estimating and the linear estimator is the log function - it is a log link.
76
hwu
Fitting in R: Example 13, stressful events data
> n <- c(15, 11, ..., 1, 4)
> r <- length(n)
> i <- 1:r

77
hwu
> n <- c(15, 11, ..., 1, 4)        # response vector
> r <- length(n)
> i <- 1:r                          # explanatory vector
# model
> stress <- glm(n ~ i, family = poisson)
78
hwu
> summary(stress)
Call: glm(formula = n ~ i, family = poisson)
                                    <- model being fitted
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.9886 -0.9631  0.1737  0.5131  2.0362
                                    <- summary information on the residuals
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.80316    0.14816  18.920  < 2e-16
i           -0.08377    0.01680  -4.986 6.15e-07
                                    <- information on the fitted parameters
79
hwu
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 50.843 on 17 degrees of freedom
Residual deviance: 24.570 on 16 degrees of freedom
                                    <- deviances (Y² statistics)
AIC: 95.825
Number of Fisher Scoring iterations: 4
80
hwu
Fitted mean is exp(2.80316 − 0.08377 i)
e.g. for date 6, i = 6 and fitted mean is exp(2.30054) = 9.980
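The fitted means can also be read off the fitted model object; a short sketch (stress is the model fitted above):

fitted(stress)[6]                                                 # fitted mean for date 6, about 9.98
predict(stress, newdata = data.frame(i = 6), type = "response")   # the same value via predict()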
81
hwu
Fitted model
82
hwu
Test of H0: no trend ⇒ the null fit, all fitted values equal (to the observed mean): Y² = 50.84 (χ² on 17 df)
The trend model ⇒ fitted values exp(2.80316 − 0.08377 i): Y² = 24.57 (χ² on 16 df)
Crude 95% CI for slope is −0.084 ± 2(0.0168), i.e. −0.084 ± 0.034
83
hwu
The lower the value of the residual deviance,
the better in general is the fit of the model.
84
hwu
Basic residuals
85
hwu
6.2 Taking into account a deterministic denominator: using an offset for the exposure
See the Gompertz model example (p 40, data in Example 26)
Model: Nx ~ Pn(λx) where E[Nx] = λx = Ex b θ^x,
i.e. log λx = log Ex + c + dx (with c = log b, d = log θ)
86
hwu
We include a term offset(log(E)) in the formula for the linear predictor in R:
model <- glm(n.deaths ~ age + offset(log(exposure)), family = poisson)
Fitted value is the estimate of the expected response per unit of exposure (i.e. per unit of the offset E)
87
hwu
7 LOGISTIC REGRESSION
  • for modelling proportions
  • we have a binary response for each item
  • and a quantitative explanatory variable

for example: dependence of the proportion of insects killed in a chamber on the concentration of a chemical present; we want to predict the proportion killed from the concentration
88
hwu
  • for example, dependence of the proportion of:
  • women who smoke - on age
  • metal bars on test which fail - on pressure applied
  • policies which give rise to claims - on sum insured

Model: successes at value xi of the explanatory variable: Ni ~ Bi(ni, pi)
89
hwu
We use a glm: we do not predict pi directly; we predict a function of pi, called the logit of pi.
The logit function is given by logit(p) = log(p/(1 − p)).
It is the log odds.
90
See Illustration 7.1 p 43 proportion v dose
91
logit(proportion) v dose
92
hwu
This leads to the logistic regression model: logit(pi) = a + b xi
c.f. the log-linear model Ni ~ Poisson(λi) with log λi = a + b xi
93
hwu
We are using a logit link.
We use a linear predictor to explain logit(p) rather than p itself.
94
hwu
The method based on the use of this model is
called logistic regression
95
hwu
Data:
explanatory variable value   successes   group size   observed proportion
x1                           n11         n1           n11/n1
x2                           n21         n2           n21/n2
...
xs                           ns1         ns           ns1/ns
96
hwu
In R we declare the proportion of successes as the response and include the group sizes as a set of weights:
drug.mod1 <- glm(propdead ~ dose, weights = groupsize, family = binomial)
# explanatory vector is dose
# note the family declaration
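Once fitted, the model returns proportions on the response scale; a short sketch (the new dose values below are made up for illustration):

fitted(drug.mod1)                                  # fitted proportions at the observed doses
new.dose <- data.frame(dose = c(1.5, 2.5))         # hypothetical doses
predict(drug.mod1, newdata = new.dose, type = "response")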
97
hwu
RHS of model can be extended if required to include additional explanatory variables and factors,
e.g. mod3 <- glm(mat3 ~ age + socialclass + gender, ...)
98
hwu
drug.mod1: see output p44
Coefficients very highly significant (***)
Null deviance 298 on 9 df; residual deviance 17.2 on 8 df
But: residual v fitted plot and fitted v observed proportions plot ...
99
hwu
100
hwu
101
hwu
model with a quadratic term (dose²)
102
hwu
8 MODELS FOR TWO-WAY AND THREE-WAY CLASSIFICATIONS
8.1 Log-linear models for two-way classifications
Nij ~ Pn(λij), i = 1, 2, ..., r; j = 1, 2, ..., s
H0: variables are independent: λij = λi λj / λ
103
hwu
⇒ log λij = log λi + log λj − log λ
            (row effect + column effect − overall effect)
104
hwu
We explain log λij in terms of additive effects: log λij = µ + αi + βj
Fitted values are the expected frequencies.
The fitting process gives us the value of Y² = −2 log Λ
hwu
Fitting a log-linear model
Nij ~ Pn(λij), independent, with log λij = µ + αi + βj
Declare the response vector (the cell frequencies) and the row/column codes as factors, then use
> name <- glm(...)
106
hwu
Tonsils data (Example 16)
n.tonsils <- c(19, 497, 29, 560, 24, 269)
rc <- factor(c(1, 2, 1, 2, 1, 2))
cc <- factor(c(1, 1, 2, 2, 3, 3))
tonsils.mod1 <- glm(n.tonsils ~ rc + cc, family = poisson)
Call: glm(formula = n.tonsils ~ rc + cc, family = poisson)
Deviance Residuals:
       1        2        3        4        5        6
-1.54915  0.34153 -0.24416  0.05645  2.11018 -0.53736
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.27998    0.12287  26.696  < 2e-16
rc2          2.91326    0.12094  24.087  < 2e-16
cc2          0.13232    0.06030   2.195   0.0282
cc3         -0.56593    0.07315  -7.737 1.02e-14
---
    Null deviance: 1487.217 on 5 degrees of freedom
Residual deviance:    7.321 on 2 degrees of freedom
                                    <- Y² = −2 log Λ
108
hwu
The fit of the independent attributes model is
not good
109
hwu
Patients data (Example 15)
  • > n.patients <- c(15, 4, 35, 46)
  • > rc <- factor(c(1, 1, 2, 2))
  • > cc <- factor(c(1, 2, 1, 2))
  • > pat.mod1 <- glm(n.patients ~ rc + cc, family = poisson)

110
Call: glm(formula = n.patients ~ rc + cc, family = poisson)
Deviance Residuals:
      1       2       3       4
 1.6440 -2.0199 -0.8850  0.8457
Coefficients:
             Estimate Std. Error  z value Pr(>|z|)
(Intercept) 2.251e+00  2.502e-01    8.996  < 2e-16
rc2         1.450e+00  2.549e-01    5.689 1.28e-08
cc2         2.184e-10  2.000e-01 1.09e-09        1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 49.6661 on 3 degrees of freedom
Residual deviance:  8.2812 on 1 degrees of freedom
AIC: 33.172
111
hwu
Fitted coefficients: coef(pat.mod1)
 (Intercept)          rc2          cc2
2.251292e+00 1.450010e+00 2.183513e-10
Fitted values: fitted(pat.mod1)
   1    2    3    4
 9.5  9.5 40.5 40.5
112
hwu
Estimates are µ = 2.251292 (intercept), α2 = 1.450010 (rc2), β2 ≈ 0 (cc2)
Linear predictor for cells (1,1) and (1,2) is 2.251292; exp(2.251292) = 9.5
Linear predictor for cells (2,1) and (2,2) is 2.251292 + 1.450010 = 3.701302; exp(3.701302) = 40.5
113
hwu
Residual deviance 8.2812 on 1 degree of freedom
⇒ Y² for testing the model, i.e. for testing H0: response is homogeneous /
column distributions are the same / no association between response and treatment group.
The lower the value of the residual deviance, the better in general is the fit of the model.
Here the fit of the additive model is very poor (we have of course already concluded
that there is an association; P-value about 1%).
114
hwu
8.2 Two-way classifications - taking into account a deterministic denominator
See the grouse data (Illustration 8.3 p50, data in Example 25)
Model: Nij ~ Pn(λij) where E[Nij] = λij = Eij exp(µ + αi + βj)
log(E[Nij]/Eij) = µ + αi + βj, i.e. log λij = log Eij + µ + αi + βj
115
hwu
We include a term offset(log(E)) in the formula for the linear predictor.
Fitted value is the estimate of the expected response per unit of exposure (i.e. per unit of the offset E)
116
hwu
8.3 Log-linear models for three-way classifications
Each subject classified according to 3 factors/variables with r, s, t levels respectively
Nijk ~ Pn(λijk) with
log λijk = µ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk
r × s × t parameters
117
hwu
Recall: interaction
A model with two factors and an interaction (no longer additive) is
log λij = µ + αi + βj + (αβ)ij
118
hwu
8.4 Hierarchic log-linear models
Interpretation!
Range of possible models/dependencies. From:
1. Complete independence: model formula A + B + C
   link: log λijk = µ + αi + βj + γk
   notation: A, B, C; df = rst − r − s − t + 2
119
hwu
... through:
2. One interaction (B and C, say): model formula A + B*C
   link: log λijk = µ + αi + βj + γk + (βγ)jk
   notation: A, BC; df = rst − r − st + 1
120
hwu
... to:
5. All possible interactions: model formula A*B*C; notation: ABC; df = 0
121
hwu
Model selection by backward
elimination or forward
selection through the hierarchy of
models containing all 3 variables
122
hwu
saturated
ABC
AB, AC, BC
AB, AC    AB, BC    AC, BC
AB, C     A, BC     AC, B
A, B, C
independence
123
hwu
Our models can include: mean (intercept), factor effects, 2-way interactions, 3-way interaction
124
hwu
Illustration 8.4 Models for lizards data (Example 29)
liz <- array(c(32, 86, 11, 35, 61, 73, 41, 70), dim = c(2, 2, 2))
n.liz <- as.vector(liz)
s <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))   # species
d <- factor(c(1, 1, 2, 2, 1, 1, 2, 2))   # diameter of perch
h <- factor(c(1, 2, 1, 2, 1, 2, 1, 2))   # height of perch
125
hwu
Forward selection
liz.mod1 <- glm(n.liz ~ s + d + h, family = poisson)
liz.mod2 <- glm(n.liz ~ s*d + h, family = poisson)
liz.mod3 <- glm(n.liz ~ s + d*h, family = poisson)
liz.mod4 <- glm(n.liz ~ s*h + d, family = poisson)
liz.mod5 <- glm(n.liz ~ s*d + s*h, family = poisson)
liz.mod6 <- glm(n.liz ~ s*d + d*h, family = poisson)
126
hwu
Forward selection
liz.mod1 <- glm(n.liz ~ s + d + h, family = poisson)      # residual deviance 25.04 on 4 df
liz.mod2 <- glm(n.liz ~ s*d + h, family = poisson)        # residual deviance 12.43 on 3 df
liz.mod5 <- glm(n.liz ~ s*d + s*h, family = poisson)
liz.mod6 <- glm(n.liz ~ s*d + d*h, family = poisson)
127
hwu
Forward selection
liz.mod1 <- glm(n.liz ~ s + d + h, family = poisson)
liz.mod2 <- glm(n.liz ~ s*d + h, family = poisson)
liz.mod5 <- glm(n.liz ~ s*d + s*h, family = poisson)      # residual deviance 2.03 on 2 df
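The drops in deviance between these nested models can be read off with anova, each difference being referred to chi-square on the difference in degrees of freedom (a short sketch, using the models fitted above):

anova(liz.mod1, liz.mod2, liz.mod5, test = "Chisq")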
128
hwu
> summary(liz.mod5)
Call: glm(formula = n.liz ~ s * d + s * h, family = poisson)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.4320     0.1601  21.436  < 2e-16
s2            0.5895     0.1970   2.992 0.002769
d2           -0.9420     0.1738  -5.420 5.97e-08
h2            1.0346     0.1775   5.827 5.63e-09
s2:d2         0.7537     0.2161   3.488 0.000486
s2:h2        -0.6967     0.2198  -3.170 0.001526
    Null deviance: 98.5830 on 7 degrees of freedom
Residual deviance:  2.0256 on 2 degrees of freedom
129
hwu
130
hwu
131
hwu
FIN
132
hwu
MAJOR ILLUSTRATION 1
Number of papers per author:    1    2    3   4   5  6  7  8  9  10  11
Number of authors:           1062  263  120  50  22  7  6  2  0   1   1
Model:
133
hwu
134
hwu
135
hwu
136
hwu
MAJOR ILLUSTRATION 2
Hedge type i:           A     B     C     D     E     F     G
Hedge length (m) li: 2320  2460  2455  2805  2335  2645  2099
Number of pairs ni:    14    16    14    26    15    40    71
Model: Ni ~ Pn(λ li)
137
hwu
138
hwu
Cyclic models
139
hwu
Model: Ni independent Pn(λi) with log λi = β0 + β1 cos θi + β2 sin θi
Explanatory variable: the category/month i has been transformed into an angle θi
140
hwu
It is another example of a
non-linear regression model for Poisson
responses. It is a generalised
linear model.
141
hwu
Fitting in R:
> n <- c(40, 34, ..., 33, 38)          # response vector
> r <- length(n)
> i <- 1:r
> th <- 2*pi*i/r                       # explanatory vector
# model
> leuk <- glm(n ~ cos(th) + sin(th), family = poisson)
142
hwu
Fitted mean is exp(β0 + β1 cos θi + β2 sin θi), evaluated at the fitted coefficients
143
hwu
Fitted model
144
hwu
F73DB3 CDA Data from class
Male Female
Cinema often 22 21
Not often 20 12

145
hwu
Male Female
Cinema often 22 21 43
Not often 20 12 32
42 33 75
146
Male Female
Cinema often 22 21 43
Not often 20 12 32
42 33 75
P(often | male) = 22/42 = 0.524; P(often | female) = 21/33 = 0.636
Significant difference (on these numbers)? Is there an association between gender and cinema attendance?
147
hwu
Null hypothesis H0: no association between gender and cinema attendance; alternative: not H0
Under H0 we expect 42 × 43/75 = 24.08 in cell (1,1), etc.
148
hwu
> matcinema <- matrix(c(22, 20, 21, 12), 2, 2)
> chisq.test(matcinema)
        Pearson's Chi-squared test with Yates' continuity correction
data:  matcinema
X-squared = 0.5522, df = 1, p-value = 0.4574
> chisq.test(matcinema)$expected
      [,1]  [,2]
[1,] 24.08 18.92
[2,] 17.92 14.08
149
hwu
> matcinema <- matrix(c(22, 20, 21, 12), 2, 2)
> chisq.test(matcinema)
        Pearson's Chi-squared test with Yates' continuity correction
data:  matcinema
X-squared = 0.5522, df = 1, p-value = 0.4574
> chisq.test(matcinema)$expected
      [,1]  [,2]        <- null hypothesis can stand:
[1,] 24.08 18.92           no association between gender
[2,] 17.92 14.08           and cinema attendance
150
hwu
more students, same proportions
Male Female
Cinema often 110 105 215
Not often 100 60 160
210 165
P(often | male) = 110/210 = 0.524; P(often | female) = 105/165 = 0.636
Significant difference (on these numbers)?
151
hwu
> matcinema2 <- matrix(c(110, 100, 105, 60), 2, 2)
> chisq.test(matcinema2)
        Pearson's Chi-squared test with Yates' continuity correction
data:  matcinema2
152
hwu
> matcinema2 <- matrix(c(110, 100, 105, 60), 2, 2)
> chisq.test(matcinema2)
        Pearson's Chi-squared test with Yates' continuity correction
data:  matcinema2
X-squared = 4.3361, df = 1, p-value = 0.03731
> chisq.test(matcinema2)$expected
      [,1]  [,2]
[1,] 120.4  94.6
[2,]  89.6  70.4
153
hwu
> matcinema2 <- matrix(c(110, 100, 105, 60), 2, 2)
> chisq.test(matcinema2)
        Pearson's Chi-squared test with Yates' continuity correction
data:  matcinema2
X-squared = 4.3361, df = 1, p-value = 0.03731
> chisq.test(matcinema2)$expected
      [,1]  [,2]        <- null hypothesis is rejected:
[1,] 120.4  94.6           there IS an association between
[2,]  89.6  70.4           gender and cinema attendance
154
hwu
FIN