Categorical Data Analysis - PowerPoint PPT Presentation

Slides: 21
Provided by: jphil
Transcript and Presenter's Notes

1
Categorical Data Analysis

2
Categorical data arise whenever counts (as
opposed to measurements) are made. Subjects
(sample items) are classified as belonging to one
of a set of categories and the numbers in the
categories (the frequencies) are recorded.
3
Example: Eye colours. Eye colours of males
visiting an optician, in four categories.
Colour              A    B    C    D
Frequency observed  89   66   60   85
4
Example: Tonsils. Relationship between nasal
carrier status for Streptococcus pyogenes and
size of tonsils among 1398 children aged 0-15
years.
              Normal  Enlarged  Much enlarged  Total
Carriers          19        29             24     72
Non-carriers     497       560            269   1326
Total            516       589            293   1398
5
Example: Prussian cavalry deaths. Numbers of
cavalry soldiers killed by horse kicks in each of
14 units of the Prussian army over a 20-year
period (1875-1894).
Number killed        0    1    2    3    4   >=5  Total
Frequency observed  144   91   32   11    2    0    280
6
Often we wish to decide whether the categorical
variables follow some well-known distribution.
A chi-squared test provides a method of
testing the hypothesis that a data set follows a
particular distribution.
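As a hedged illustration of what "follows a well-known distribution" could mean, the horse-kick counts are often compared with a Poisson distribution. That choice is an assumption here (the slides do not name the distribution), and the variable names are illustrative. The sketch estimates the mean from the data and computes the expected frequencies for each category.

```r
# Horse-kick data from the Prussian cavalry example.
killed   <- 0:5                          # the last category stands for ">= 5"
observed <- c(144, 91, 32, 11, 2, 0)
n        <- sum(observed)                # 280 unit-years

# Assumption: a Poisson model, with its mean estimated from the data.
# The ">= 5" category has zero count, so treating it as 5 does not matter here.
lambda_hat <- sum(killed * observed) / n

# Probabilities for 0..4 deaths, plus the upper tail for ">= 5".
probs    <- c(dpois(0:4, lambda_hat),
              ppois(4, lambda_hat, lower.tail = FALSE))
expected <- n * probs

print(round(expected, 1))
```

The expected frequencies sum to the total of 280, so they can be compared with the observed frequencies category by category.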
7
The chi-squared test works by summing the
quantity
(Observed - Expected)^2 / Expected
over all the categories.
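The sum can be computed by hand in R. The sketch below uses the eye-colour frequencies and assumes, purely for illustration, a null hypothesis that the four colours are equally likely; the slides do not state which hypothesis was intended.

```r
# Observed eye-colour frequencies from the example.
observed <- c(A = 89, B = 66, C = 60, D = 85)

# Assumption: under the null hypothesis all four colours are equally likely.
expected <- rep(sum(observed) / length(observed), length(observed))

# Sum of (Observed - Expected)^2 / Expected over the categories.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
# (no parameters were estimated from the data).
df      <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(statistic = chi_sq, df = df, p.value = p_value))
```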
8
The built-in chi-squared test in R is fairly
limited: it copes well with testing whether
there is a significant relationship between nasal
carrier status for Streptococcus pyogenes and
size of tonsils among the 1398 children aged 0-15
years (the second example), but gives us a
problem with the other two.
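For the tonsils table, a minimal sketch using R's chisq.test function might look like the following. The object and dimension names are illustrative; the counts are those of the second example.

```r
# Tonsils example: rows are carrier status, columns are tonsil size.
tonsils <- matrix(c( 19,  29,  24,
                    497, 560, 269),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(status = c("Carriers", "Non-carriers"),
                                  size   = c("Normal", "Enlarged",
                                             "Much enlarged")))

# Chi-squared test of independence between carrier status and tonsil size.
result <- chisq.test(tonsils)

print(result$expected)   # expected counts under independence
print(result)            # statistic, degrees of freedom and p-value
```

The test has (2 - 1) x (3 - 1) = 2 degrees of freedom, since the table has two rows and three columns.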
9
Consider now data from the Standard and Poor's 500,
an index of 500 of the largest, most actively
traded stocks on the New York Stock Exchange.
These data are available in R as sp500.R from
the module website.
10
(No Transcript)
11
Technique: To look at any one of the variables in
a data frame such as sp500, the $ sign is
helpful. Without attaching the data, typing
adjclose on its own does not work, because R
cannot find the variable.
12
Instead use
> sp500$adjclose
or
> plot(sp500$adjclose)
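The module's sp500.R file is not reproduced here, so the following sketch uses a small made-up data frame with the same column name to show how $ picks out one variable. The prices and the day column are invented for illustration only.

```r
# Hypothetical stand-in for the sp500 data frame (values are made up).
sp500 <- data.frame(day      = 1:5,
                    adjclose = c(100.0, 101.5, 100.8, 102.3, 103.1))

# The column is not visible as a free-standing object...
found_directly <- exists("adjclose")

# ...but it can be reached through the data frame with $.
prices <- sp500$adjclose
print(prices)

# plot(sp500$adjclose) would then draw the values against their index.
```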
13
(No Transcript)
14
We are interested in the distribution of the
change in returns from day to day. We suspect
that the day-to-day changes in the log of the
adjusted closing price may follow a normal
distribution.
15
These are placed in an R vector d by using the
command
> d <- diff(log(sp500$adjclose))
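To see what this command produces, here is a small sketch on made-up prices (the real data are in sp500.R, which is not reproduced here). Each element of d is the log of one day's price minus the log of the previous day's, which equals the log of the ratio of the two prices.

```r
# Made-up prices standing in for sp500$adjclose.
prices <- c(100.0, 101.5, 100.8, 102.3, 103.1)

# Differences of successive logs: one fewer value than there are prices.
d <- diff(log(prices))

# The same quantity written as the log of successive price ratios.
log_ratios <- log(prices[-1] / prices[-length(prices)])

print(d)
```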
16
(No Transcript)
17
The chisq.test function that is pre-defined in R
is not powerful enough on its own to test the
values of d to see if they conform to a normal
distribution, so a program is written instead.
18
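We wish to test whether a normal distribution
with the same mean and standard deviation as d
will look similar to this histogram. Calculate,
for example, the approximate expected number
between -0.04 and -0.02 by multiplying the number
of values in d by the probability that the fitted
normal distribution assigns to that interval.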
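A minimal sketch of that calculation is given below. Because the S&P 500 data are not reproduced here, d is replaced by simulated values; the sample size, the standard deviation of 0.012 and the variable names are all assumptions for illustration, not the module's actual figures.

```r
# Simulated stand-in for d (assumption: the real d comes from sp500.R).
set.seed(1)
d <- rnorm(2000, mean = 0, sd = 0.012)

# Fit a normal distribution with the same mean and standard deviation as d.
m <- mean(d)
s <- sd(d)

# Probability that the fitted normal lies between -0.04 and -0.02.
prob <- pnorm(-0.02, mean = m, sd = s) - pnorm(-0.04, mean = m, sd = s)

# Expected number in the interval, and the number actually observed.
expected <- length(d) * prob
observed <- sum(d > -0.04 & d <= -0.02)

print(c(expected = expected, observed = observed))
```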
19
This can be repeated and made more sophisticated,
with more than four comparisons, by writing a
program. The one considered has 100 comparisons.
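The program itself is not included in the transcript. The sketch below is one possible version, not the author's: it assumes 100 intervals chosen to have equal probability under the fitted normal distribution, and again uses simulated values in place of d.

```r
# Simulated stand-in for d (assumption: the real d comes from sp500.R).
set.seed(1)
d <- rnorm(2000, mean = 0, sd = 0.012)

k <- 100                       # number of intervals (comparisons)
m <- mean(d)
s <- sd(d)

# Interval boundaries: equal-probability cut points of the fitted normal,
# with open-ended first and last intervals.
breaks <- c(-Inf, qnorm((1:(k - 1)) / k, mean = m, sd = s), Inf)

# Observed counts per interval; expected counts are equal by construction.
observed <- tabulate(cut(d, breaks = breaks, labels = FALSE), nbins = k)
expected <- rep(length(d) / k, k)

# Chi-squared statistic; two degrees of freedom are lost because the
# mean and standard deviation were estimated from the data.
chi_sq  <- sum((observed - expected)^2 / expected)
df      <- k - 1 - 2
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(statistic = chi_sq, df = df, p.value = p_value))
```

Equal-probability intervals keep every expected count the same (20 here), which avoids very small expected counts in the tails; intervals of equal width, as in the -0.04 to -0.02 example, are an alternative.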
20
(No Transcript)