Title: Lecture 3: Chi-Square, correlation and your dissertation proposal
1. Lecture 3: Chi-Square, correlation and your dissertation proposal
- Non-parametric data: the Chi-Square test
- Statistical correlation and regression: parametric and non-parametric tests
- Break
- Regression in SPSS
- Writing a dissertation proposal when you plan to use statistics
- Exercises, assessment and assistance
2. Non-parametric statistics
- Non-parametric statistics in human geography
- Different types of non-parametric test
  - 1 sample
  - 2 independent samples
  - 2 tied (paired) samples
  - 3 or more samples
3. The Chi-Square test
- Most versatile test in social science
- Can be used to examine nominal data, ordinal data and interval/ratio data in groups
- There are no assumptions about independent or paired observations
4. Theory of Chi-Square
- The test examines the difference between observed counts and expected values
- Suppose we wanted to examine the difference between age groups in our sample and people in those groups in the UK? Or perhaps the difference in age groups between two or three samples?
- Chi-Square can examine these differences
5. The Chi-Square Equation
- $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
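The equation can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square` and the counts are made up for the example and do not come from the lecture data.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for four age groups, with equal expected counts.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

statistic = chi_square(observed, expected)
print(statistic)  # 1 + 1 + 0 + 0 = 2.0
```

Only the first two groups differ from their expected counts, so they are the only cells that contribute to the total.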
6. One-way Chi-Square test
- Examines whether there is a difference between one sample and a population
- We can assume either that the expected counts will be equal between categories or that we know the proportions
- But, before we do the test, we have to cross-tabulate the data
7. The Cross-tabulation
8. The expected counts
- Expected counts relate to either equal proportions or previously known proportions (e.g. from a population)
- These are then compared to observed counts and the difference is calculated
- A significance level is selected and the null hypothesis is either rejected or retained
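The two ways of obtaining expected counts can be sketched as follows. The function names, the sample counts and the population shares are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
def expected_equal(observed):
    """Expected counts if every category were equally likely."""
    total = sum(observed)
    k = len(observed)
    return [total / k] * k


def expected_from_proportions(observed, proportions):
    """Expected counts from known proportions (e.g. taken from a population)."""
    if len(observed) != len(proportions):
        raise ValueError("need one proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    total = sum(observed)
    return [total * p for p in proportions]


# Made-up sample of 100 people in four age groups.
observed = [18, 30, 32, 20]
# Hypothetical population shares for the same four groups.
population_share = [0.25, 0.25, 0.30, 0.20]

expected_known = expected_from_proportions(observed, population_share)
differences = [o - e for o, e in zip(observed, expected_known)]
print(expected_equal(observed))
print(expected_known)
print(differences)
```

With equal proportions every group is expected to hold 25 people; with the hypothetical population shares the expected counts become roughly 25, 25, 30 and 20.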
9. The Contingency Table
10. The test result
- Chi-Square is calculated by summing, over every cell, the squared difference between observed and expected counts divided by the expected count
- Assessed as for other statistical tests
- $\chi^2 = 7.1$ ($p < 0.05$)
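One way to assess a one-way result by hand is to compare it with the standard critical value at the 0.05 level, where the degrees of freedom are the number of categories minus one. The sketch below uses the usual tabulated critical values; the slide does not state how many categories lie behind the 7.1, so the `df` values used here are assumptions for illustration only.

```python
# Standard chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def is_significant(statistic, df):
    """True if the statistic exceeds the 0.05 critical value for this df."""
    if df not in CRITICAL_0_05:
        raise ValueError("no critical value stored for df = %d" % df)
    return statistic > CRITICAL_0_05[df]


# Assumption: three categories, so df = 3 - 1 = 2.
print(is_significant(7.1, df=2))  # True, because 7.1 > 5.991
# The same statistic with four categories (df = 3) would fall short of 7.815.
print(is_significant(7.1, df=3))  # False
```

The same statistic can be significant or not depending on the number of categories, which is why the degrees of freedom should be reported alongside it.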
11. Two-way Chi-Square tests
- Very often, we want to compare a sample not with a population but with another sample, or to compare three or more samples
- Two-way Chi-Square allows us to do this easily
- Again, we cross-tabulate the data
12. The Contingency table
13. Two-way analysis
- Chi-Square calculates the expected value for each cell by multiplying its row total by its column total and dividing by the grand total
- Expected values represent the number which, given the sample sizes and distribution, we would expect to see in each cell
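The row-total times column-total rule can be sketched as below. The table is invented (two samples in rows, three age groups in columns) and the function names are illustrative only.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Made-up counts: rows are two samples, columns are three age groups.
table = [[10, 20, 30],
         [20, 20, 20]]

expected = expected_counts(table)
statistic, df = chi_square_two_way(table)
print(expected)       # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(statistic, df)  # about 5.33 with 2 degrees of freedom
```

For a two-way table the degrees of freedom are (rows - 1) multiplied by (columns - 1).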
14. The Chi-Square result
- Chi-Square gives the result and we evaluate the test with the use of significance tests
- $\chi^2 = 21.7$ ($p < 0.05$)
- But we can only state that there is a difference, not what the difference is. For example, does our sample from the north have more older people in it?
- We must examine the relative proportions of the contingency table to find this out
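Examining relative proportions usually means converting each row of the contingency table to percentages. A minimal sketch, with an invented table and hypothetical row labels ("north" and "south"):

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Made-up counts: columns are young, middle and older age groups.
labels = ["north", "south"]
table = [[10, 20, 30],
         [20, 20, 20]]

percentages = row_percentages(table)
for label, row in zip(labels, percentages):
    print(label, [round(p, 1) for p in row])
```

In this invented example half of the "north" sample falls in the oldest group against a third of the "south" sample, which is the kind of pattern the test alone cannot tell you.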
15. The expected counts problem
- Chi-Square has the stipulation that no more than 20% of the expected counts in an analysis may be under 5. If there are more than this, the test is invalid
- So, how can we get around this problem?
16. Recoding variables
- We can aggregate suitable categories of a variable to make the number of groups smaller
- Aggregating only works with ordinal data
- This reduces the number of groups and makes expected counts below 5 less likely
- We can also use this to make interval/ratio data into groups
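Both kinds of recoding can be sketched in Python. The age bands, category labels and counts below are hypothetical; in practice the bands should follow from your research questions.

```python
from collections import Counter


def recode_age(age):
    """Place an exact age (interval/ratio data) into a hypothetical age band."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"


def merge_categories(counts, groups):
    """Add up the counts of the old categories listed under each new label."""
    return {new: sum(counts[old] for old in olds) for new, olds in groups.items()}


# Made-up exact ages turned into three groups.
ages = [19, 24, 29, 35, 41, 45, 58, 62, 67, 73]
banded = Counter(recode_age(a) for a in ages)
print(dict(banded))

# Made-up ordinal age groups, some with very small counts,
# merged with their neighbours to give three larger groups.
detailed = {"16-24": 3, "25-34": 12, "35-44": 15, "45-54": 11, "55-64": 4, "65+": 2}
merged = merge_categories(detailed, {
    "16-34": ["16-24", "25-34"],
    "35-54": ["35-44", "45-54"],
    "55+": ["55-64", "65+"],
})
print(merged)
```

Only neighbouring categories are merged here, which keeps the order of the groups meaningful.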
17. Chi-Square Qualifications
- You should have no fewer than 20 cases
- As stated above, not more than 20% of cells should have expected values under 5
- You should not necessarily ignore a contingency table, even if the Chi-Square test is invalid
- Remember, above all, that Chi-Square is a test of difference, not correlation
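The two numerical qualifications can be checked before running the test. The sketch below is one possible way to write that check; the function name, its parameters and both tables are invented for illustration.

```python
def chi_square_is_valid(observed, expected, min_cases=20, max_small_share=0.20):
    """Check the two rules of thumb: enough cases, few small expected counts."""
    n_cases = sum(sum(row) for row in observed)
    cells = [e for row in expected for e in row]
    small_share = sum(1 for e in cells if e < 5) / len(cells)
    return n_cases >= min_cases and small_share <= max_small_share


# Made-up table with 120 cases and no expected count under 5.
observed_large = [[10, 20, 30], [20, 20, 20]]
expected_large = [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(chi_square_is_valid(observed_large, expected_large))  # True

# Made-up table with only 15 cases; three of four expected counts are under 5.
observed_small = [[2, 3], [4, 6]]
expected_small = [[2.0, 3.0], [4.0, 6.0]]
print(chi_square_is_valid(observed_small, expected_small))  # False
```

A `False` result does not mean the contingency table is useless, only that the Chi-Square statistic should not be relied on.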
18. Statistical correlation: relationships among variables
- Relationships are concerned with the extent to which variable A is related to B
- This is termed correlation
- Correlation does not necessarily imply causation, but merely a possible relationship
- There are parametric and non-parametric tests of correlation
19. Types of correlation
- Perfect positive correlation: +1
- Perfect negative correlation: -1
- Linear relationship
- No correlation: 0
- Non-linear relationship
20. Parametric correlation: Pearson's r
- Assumes your data are on interval/ratio scales AND are normally distributed
- Measured on a scale from -1 to +1
- This result shows the strength of the relationship
- The test must be judged by its significance (as for other parametric tests, p < 0.05 or p > 0.05)
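Pearson's r can be computed directly from its textbook definition. The sketch below does only that, with invented data; it does not compute the significance level, which SPSS reports alongside r.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariation of x and y scaled by their individual variation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y rises, or falls, in exact step with x.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])
r_negative = pearson_r(x, [10, 8, 6, 4, 2])
print(r_positive, r_negative)  # close to 1.0 and -1.0
```

Real data rarely fall exactly on a line, so values between these two extremes are the norm.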
21. Non-parametric correlation: Spearman's rs
- Assumes ordinal data, or interval/ratio data that are not normally distributed
- Data are ranked for the test
- Measured as for Pearson's
- Significance as for Pearson's
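Because the data are ranked first, Spearman's rs can be sketched as Pearson's r applied to the ranks, with tied values sharing the average of their ranks. The function names and data are illustrative only, and significance is again not computed.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from the textbook definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's rs: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rs = spearman_rs(x, y)
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(rs)                               # close to 1.0
```

The relationship here is curved, yet rs is still 1 because the ranks of x and y agree perfectly.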
22. From correlation to explanation: regression analysis
- Regression seeks to examine the nature of the relationship between one or more independent variables and a dependent variable
- It is concerned with prediction, not just correlation
- To predict, there is an equation which describes the line of best fit between variables
23. The Line of best fit
- The line of best fit is a straight line fitted through the data points you observe
- Can be expressed by
- $Y = mx + c$
- Where
- Y = dependent variable
- c = constant (intercept)
- m = slope (gradient)
- x = independent variable
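The slides do not say how the line is fitted; the sketch below assumes ordinary least squares, the usual method for linear regression. The data points are invented so that they lie exactly on a line.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept c for the line Y = m*x + c."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up points lying exactly on Y = 2x + 1.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
m, c = fit_line(x, y)
print(m, c)  # 2.0 and 1.0
```

With real data the points scatter around the line, and the fitted m and c are the values that make the squared vertical distances to the line as small as possible.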
24. Predicting using the regression equation
- You can use the equation to predict levels of Y for given levels of X
- This is often of use when looking at different outcome situations
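Prediction is simply a matter of substituting a value of X into the equation. In this sketch the slope and intercept are hypothetical stand-ins for coefficients you would read from your own regression output.

```python
def predict(x, m, c):
    """Predicted Y for a given X using Y = m*x + c."""
    return m * x + c


# Hypothetical coefficients, as if taken from a regression output.
m, c = 2.0, 1.0

# Compare the predicted outcome under three different levels of X.
predictions = {x: predict(x, m, c) for x in (0, 5, 10)}
print(predictions)  # {0: 1.0, 5: 11.0, 10: 21.0}
```

Predictions are most trustworthy for values of X inside the range you actually observed.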
25. Interpreting regression results
- $R^2$: the goodness of fit that the model offers, expressed as a percentage
- F: the significance of the model
- The regression coefficients and associated p values
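$R^2$ can be sketched as the share of the variation in Y that the fitted line accounts for. The data are invented, and the slope and intercept used are the least-squares values for those four points; F and the p values are not computed here.

```python
def r_squared(x, y, m, c):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((b - (m * a + c)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_residual / ss_total


# Made-up points; their least-squares line is Y = 1.9x + 0.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
r2 = r_squared(x, y, m=1.9, c=0.0)
print(round(100 * r2, 1))  # 96.3 (per cent)
```

Here the line accounts for about 96 per cent of the variation in Y, leaving about 4 per cent unexplained.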
26. Regression assumptions
- Your data
  - Are measured on interval/ratio scales
  - Are normally distributed
  - And are therefore parametric and...
  - Have a linear relationship
- You can use other techniques for non-linear regression and regression with nominal/ordinal variables
27. Is any of this relevant to me?
- YES - you have to write a dissertation proposal
- Saying you will analyse the data "using appropriate methods" is not enough
- You will get a far higher mark if you follow these simple steps in the next two months when preparing your proposal
28. Writing your Dissertation Proposal: key points
- Do you need to use a questionnaire/other quantitative instrument?
- If yes, what key questions are you posing?
- ALWAYS relate these questions to your plans for analysis
- How will you analyse these collected data to meet your aims and objectives?
29. Writing your proposal
- Methodology: quantitative/qualitative?
- Questionnaire: type closed/open/both?
- Questions: yes/no, frequency, categorical, multiple response?
- Data this will yield: parametric/non-parametric?
- Analysis types: description, differences, relationships?
- Analysis tools: parametric/non-parametric?
30. Example of this process
31. A final word
- Think carefully about your questionnaire - can you meet the objectives you have set yourself?
- Do you need to use every statistical test?
- Assessments (all 3) due in on 6 May
- Where can you get help?
  - Friday 14th March, 9-11am
  - Monday 28th April, 11am-1pm
  - E-mail S.W.Barr_at_exeter.ac.uk