Title: Overview
1Predictive Tests
2Overview
- Introduction
- Some theoretical issues
- The failings of human intuitions in prediction
- Issues in formal prediction
- Inference from class membership: the individual
versus group problem (and its only solution)
- Some predictive tests
- Some wider implications: nonlinear predictions in
science and psychometrics
3Predictive Tests
- Many tests are used to make predictions: of
levels of achievement or success, of likelihood
of recidivism, or of diagnostic category
- Two kinds of predictions
- Categorical: predict which category this subject
will fall into (diagnosis, occupation)
- Numerical: predict the value of a relevant
numerical variable (GPA, economic return to company)
4The failings of human intuition
- We have already seen many ways in which humans
succumb to errors in numerical reasoning
- Kahneman & Tversky asked subjects about areas of
graduate specialization: base rate estimates,
estimates (from a description) of similarity to
other students in each field, and predictive
estimates (also from a description)
5Results
- Similarity and prediction correlate at 0.97
- Similarity and base rates correlate at -0.65
- What does this result remind you of?
- What do these subjects need to be taught?
6Six errors discussed by Kahneman & Tversky
- Representativeness error: assumes predictions are
not different from assessments of similarity
- Insufficient regression error: people fail to
take into account that when predictive validity
is less than perfect, correlations between
predictors and performance are < 1, so predictions
should be less extreme than the predictor scores
- Central tendency error: subjects making judgments
tend to avoid extremes, and compress their
judgments into a smaller range than the
phenomenon being judged
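The insufficient-regression point can be made concrete with standard scores. The following is a minimal sketch, assuming a single linear predictor and standardized variables; the function name and the example validities are illustrative and do not come from the slides. For standardized scores the least-squares prediction is the validity times the predictor score, so the weaker the validity, the closer the prediction should sit to the mean.

```python
def regressed_prediction(z_predictor: float, validity: float) -> float:
    """Least-squares prediction of a standardized outcome.

    For standardized variables the best linear prediction is
    z_outcome = validity * z_predictor, where validity is the
    predictor-outcome correlation (between -1 and 1).
    """
    if not -1.0 <= validity <= 1.0:
        raise ValueError("validity must be a correlation in [-1, 1]")
    return validity * z_predictor


# A predictor score 2 SDs above the mean, at three hypothetical validities.
for r in (1.0, 0.5, 0.0):
    z = regressed_prediction(2.0, r)
    print(f"validity = {r:.1f} -> predicted outcome z = {z:.1f}")
```

Only with perfect validity does a score 2 SDs above the mean justify predicting an outcome 2 SDs above the mean.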
7Six errors discussed by Kahneman & Tversky (continued)
- Discounting of prior probabilities: human
predictors will throw out base rate information
for almost any reason
- Overweighting of coherence: there is greater
confidence in predictions based on consistent
input than on inconsistent input with the same
average (i.e. two B's are judged better than an
A and a C for predicting a B average)
- Overweighting of extremes: confidence in judgment
is over-weighted at extremes, especially positive
extremes (J-shaped confidence function)
8What do we need to make good predictions?
- We need three pieces of information
- 1.) Base rates
- 2.) Relevant predictors in the individual case
- 3.) Bounds on accuracy (cutting scores)
- Kahneman & Tversky's experimental evidence
(previous slides) shows that subjects usually fail
to weight any of these three properly
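One way to see how the base rate and the accuracy of an individual predictor combine is Bayes' rule. The sketch below is illustrative only: the function name, the 80% hit rate, and the 20% false-alarm rate are assumptions chosen for the example, not figures from the slides.

```python
def posterior_probability(base_rate: float, hit_rate: float,
                          false_alarm_rate: float) -> float:
    """P(category | positive sign) by Bayes' rule.

    base_rate:        P(category) in the population
    hit_rate:         P(positive sign | category)
    false_alarm_rate: P(positive sign | not category)
    """
    true_positives = base_rate * hit_rate
    false_positives = (1.0 - base_rate) * false_alarm_rate
    total = true_positives + false_positives
    if total == 0.0:
        raise ValueError("the sign never occurs under these inputs")
    return true_positives / total


# Same predictor (hypothetical accuracy), three different base rates.
for base_rate in (0.50, 0.10, 0.01):
    p = posterior_probability(base_rate, hit_rate=0.80, false_alarm_rate=0.20)
    print(f"base rate {base_rate:.2f} -> P(category | sign) = {p:.3f}")
```

The same sign supports very different predictions as the base rate falls, which is why discarding base rates is costly.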
9What can we infer from class membership?
- Some commentators have suggested that inference
from class membership is inherently fallacious
- e.g. 25% of first-degree relatives of those
diagnosed with malignant melanoma (skin cancer)
will also develop melanoma
- I am a first-degree relative of a person
diagnosed with melanoma, so I take my odds of
developing the disease to be 25%
- Critics of the inference say: No, it is either 0%
(I don't develop the disease) or 100% (I do),
i.e. group probabilities don't apply to
individuals
10Do group probabilities apply to individuals?
- Meehl's response: "If nothing is rationally
inferable from membership in a class, no
empirical prediction is ever possible"
- The argument is a re-statement of the necessity
of inference: even in the case of predicting
individual behavior from that individual's data,
we need to consider the pattern over past data
- Moreover, the claim of 'certainty' is
philosophical, not real: in the absence of knowing
which group you are in, there is only probability,
not knowledge
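One way to read this response: when group membership is all you know, forecasting the group rate has a lower expected error than forecasting either "certain" outcome. The sketch below uses the 25% melanoma figure from the previous slide; scoring forecasts by squared error (the Brier score) is an assumption of this example, not something the slides specify.

```python
def expected_squared_error(forecast: float, group_rate: float) -> float:
    """Expected (forecast - outcome)^2 when the outcome is 1 with
    probability group_rate and 0 otherwise."""
    return (group_rate * (1.0 - forecast) ** 2
            + (1.0 - group_rate) * forecast ** 2)


group_rate = 0.25  # share of the class that develops the disease
for forecast in (0.0, 0.25, 1.0):
    err = expected_squared_error(forecast, group_rate)
    print(f"forecast {forecast:.2f} -> expected squared error {err:.4f}")
```

Forecasting 0% or 100% for every member of the class does worse on average than forecasting 25%, even though each individual outcome is indeed 0 or 1.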
11Some Predictive Tests: Standardized admission
tests
- Thanks to Lily Tsui for these GRE slides
- Scholastic Aptitude Tests (SAT, GREs) are highly
reliable tests developed to painstaking
psychometric standards
- The general GRE has four sections: verbal
(including reading comprehension), quantitative
(including chart comprehension), analytical, and
a random test section
- The subject test has 215 multiple choice
questions
- On psychology: 40% experimental/natural science,
43% social science, 17% general
- The test is timed and corrected for guessing
12Sample Verbal Questions
- Analogies
- ETERNAL : END
- a. precursory : beginning
- b. grammatical : sentence
- c. implausible : credibility
- d. invaluable : worth
- e. frenetic : movement
13Sample Verbal Questions
- Sentence Completions
- Museums, which house many paintings and
sculptures, are good places for students of
_____.
- a. art
- b. science
- c. religion
- d. dichotomy
- e. democracy
14Sample Verbal Questions
- Antonyms
- MALADROIT
- a. ill-willed
- b. dexterous
- c. cowardly
- d. enduring
- e. sluggish
15Sample Quantitative Questions
- Quantitative Comparison
- Column A: y - 6; Column B: -3
- If y > 2
- a. the quantity in column A is always greater
- b. the quantity in column B is always greater
- c. the quantities are always equal
- d. It cannot be determined from the information
given
16Sample Quantitative Questions
- Problem Solving
- The sum of x distinct integers greater than zero
is less than 75. What is the greatest possible
value of x?
- a. 8
- b. 9
- c. 10
- d. 11
- e. 12
17Sample Analytical Questions
- A pastry shop will feature 5 desserts (V, W, X, Y,
Z), served Monday through Friday, one dessert a
day, subject to the following restrictions
- Y must be served before V.
- X and Y must be served on consecutive days.
- Z may not be the second dessert to be served.
18Reliability
- Within-test reliability is 0.9
- Test-retest reliability is not so good: repeat
test takers for both tests show an average score
gain of 20-30 points
- This may move a student by a large amount: more
than 10 percentiles
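To get a feel for what a 20-30 point gain means in percentile terms, the sketch below converts scores to percentiles under a normal curve. The mean of 500 and SD of 100 are placeholder assumptions for illustration, not official test norms, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_shift(score, gain, mean=500.0, sd=100.0):
    """Return (percentile before, percentile after) for a score gain.

    ASSUMPTION: scores are normally distributed with the given mean and
    SD; the defaults are placeholders, not official norms.
    """
    scale = NormalDist(mu=mean, sigma=sd)
    before = 100.0 * scale.cdf(score)
    after = 100.0 * scale.cdf(score + gain)
    return before, after


# A test taker who starts at the assumed mean and gains 20 or 30 points.
for gain in (20, 30):
    before, after = percentile_shift(500, gain)
    print(f"+{gain} points: percentile {before:.1f} -> {after:.1f}")
```

Under these assumed norms the shift is largest near the middle of the distribution and smaller in the tails.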
19Predictive Validity
- In one meta-analysis, Sternberg and Williams
point out that empirical validities of the
GRE vary somewhat by field
- Correlations between various combinations of
GRE scores and grad school performance are only
between 0.25 and 0.35, and only marginally better
(0.4) if you include undergraduate grades
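Squaring a correlation gives the proportion of variance in the outcome that the predictor accounts for, which puts these validities in perspective. A small sketch (the helper name is illustrative):

```python
def variance_explained(r: float) -> float:
    """Proportion of outcome variance accounted for by a correlation r."""
    return r * r


for label, r in (("GRE, low end", 0.25),
                 ("GRE, high end", 0.35),
                 ("GRE plus undergraduate grades", 0.40)):
    print(f"{label}: r = {r:.2f}, variance explained = {variance_explained(r):.1%}")
```

Even the best combination quoted above leaves well over four fifths of the variance in graduate performance unaccounted for.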
20Correlations of GRE Scores
21Construct Validity
- Is the GRE getting at anything related to
graduate school?
- What about motivation, creativity, devotion,
conscientiousness, and other aspects that make a
successful graduate student?
- Some complaints
- Graduate assignments require that students
develop research skills, but the GRE does not
test this
- The GRE is timed but real life is rarely timed
- The GRE is individualised but real work usually
involves collaboration
22Why is the GRE so popular?
- Because it is in the public eye
- Since average scores for admissions on tests such
as the GRE are published, there is pressure on
schools to keep the average scores of the
students that they accept high so that they can
remain competitive with other institutions in
the public eye
- One strength of the GRE is that there are specific
regression equations by college, i.e. future
performance at a particular college can be
predicted independently
- Because there is relatively little variation in
applicants' reference letters and undergraduate
GPA, GRE scores are one of the main sources of the
variation that is needed to rank applicants
23Some Predictive Tests: The SAT
- SAT: r = 0.4 with university GPA
- By comparison, high school grades: r = 0.48
- Together, r = 0.55
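How much two predictors add to each other depends on how strongly they correlate with one another. The sketch below uses the standard two-predictor multiple correlation formula with the two validities quoted above (0.4 and 0.48). The slides do not give the correlation between SAT scores and high school grades, so the intercorrelation values tried here are hypothetical.

```python
from math import sqrt


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of an outcome with two predictors.

    r1, r2: each predictor's correlation with the outcome
    r12:    correlation between the two predictors
    Assumes the three values form a consistent correlation matrix.
    """
    if abs(r12) >= 1.0:
        raise ValueError("r12 must be strictly between -1 and 1")
    r_squared = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    return sqrt(r_squared)


# Validities from the slide; predictor intercorrelations are hypothetical.
for r12 in (0.0, 0.3, 0.6):
    print(f"r12 = {r12:.1f} -> combined R = {multiple_r(0.40, 0.48, r12):.2f}")
```

Within this range, the more the two predictors overlap, the less the second one adds beyond the first.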
24Can you beat the standards?
- Notwithstanding the huge industry waiting to take
money from anxious high school students, studying
for the SAT doesn't help much
- SAT coaching increases scores by about 15 points,
which is 0.15 SDs
- Repeat testing increases it a little less, about
12 points or 0.12 SDs
25Some Predictive Tests: Professional tests
- Professional school tests (MCAT, LSAT)
- MCAT: reliability r in the low .80s
- LSAT: reliability r > 0.9
- There is relatively little evidence of validity
- They predict performance about as well as
undergraduate GPA alone: r = 0.25-0.3
26Some Predictive Tests: The Strong Interest
Inventory
- The Strong (1927) Interest Inventory
(Strong-Campbell, 1981): a widely used test of
interests as predictors of professional aptitude
- Empirically constructed with concurrent validity,
comparing each vocational group to the overall
average
- Has 325 items, 162 scales covering 85 occupations
- Reliability is high
- 0.9 test-retest over weeks; 0.6-0.7 over years,
unless they were old (25 years!) at first test,
then 0.8 even after 20 years
- Does not predict success or satisfaction in a
profession
- Does predict likelihood of entering and remaining
in a profession: chances of 50% that a person
will end up in a profession most strongly
predicted (A score), and only 12% that he will
end in one least predicted (C score)
27Prediction in scientific psychology
- Prediction and scientific explanation are related
- We admire Newton's laws precisely because they
are accurate in predicting real phenomena
- Many cognitive models in psychology are purely
descriptive: they fail to make an effort to
predict how a person will perform on unseen
stimuli
- There are many ways to do so, if you have
sufficient variation in predictors: multiple
regression, neural networks, 'cheap' methods
(e.g. best single predictor)
28What is a linear relation?
- Things are linearly related if they change in
direct proportion to each other: when one goes up
or down at a constant rate, so does the other
- Things are non-linearly related if changes in one
are not mirrored by analogous changes in the
other
- Many biological systems are non-linear
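A linear measure of association can miss a relation that is perfectly lawful but non-linear. The sketch below computes Pearson's r by hand for two made-up data sets: one where y is a linear function of x, and one where y is the square of x. The data and function name are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # changes at a constant rate
quadratic = [x * x for x in xs]    # fully determined by x, but not linear

print(f"linear relation:    r = {pearson_r(xs, linear):.2f}")
print(f"quadratic relation: r = {pearson_r(xs, quadratic):.2f}")
```

In the second case y is completely determined by x, yet the linear correlation is zero because the rising and falling halves cancel.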
29Example: Predicting lexical decision RTs
- Lexical decision (time to decide if a string is
a word or not) is a simple task to perform
- Many well-specified variables can be calculated
for words: frequency, similarity to other words,
frequency of components
- This allows for predictive testing: how well can
we predict how long it will take (average
reaction time, RT) to reach a decision about
wordness?
- We used 35 predictors, and a non-linear method of
combining them (genetic programming) to predict
average RTs
31Some lessons about scientific prediction
- Models can 'cheat' by using variance in the input
data set that does not transfer to unseen data:
you must test your predictions on unseen data
- Some models that are very good may be very good
precisely because they are very good at using
this 'within-set' variation
- Very simple (3-variable) non-linear models may do
as well as or better than much more complex
models, especially linear models, and may exclude
highly-correlated variables
- Different measures of successful prediction may
yield quite different results (e.g. test
correlation versus 0.5 SD correlation)
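The first lesson can be illustrated with a toy example. This is a sketch on synthetic data, not the genetic programming analysis described in these slides: one model memorises the outcomes of the items it was fitted on, the other fits a straight line to a single predictor. All names and numbers are made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, items):
    """Average squared prediction error of model over (id, x, y) items."""
    return sum((model(i, x) - y) ** 2 for i, x, y in items) / len(items)


# Synthetic data (made up for illustration): y = 2x + noise.
rng = random.Random(0)
items = []
for item_id in range(100):
    x = rng.uniform(0, 10)
    items.append((item_id, x, 2 * x + rng.gauss(0, 1)))
train, unseen = items[:50], items[50:]

# Model A "cheats": it memorises each training item's outcome by its ID
# and falls back to the training mean for items it has never seen.
lookup = {i: y for i, _, y in train}
train_mean = sum(y for _, _, y in train) / len(train)


def memoriser(item_id, x):
    return lookup.get(item_id, train_mean)


# Model B uses only the predictor x.
slope, intercept = fit_line([x for _, x, _ in train], [y for _, _, y in train])


def line(item_id, x):
    return intercept + slope * x


errors = {
    name: (mean_squared_error(model, train), mean_squared_error(model, unseen))
    for name, model in (("memoriser", memoriser), ("line", line))
}
for name, (seen_mse, unseen_mse) in errors.items():
    print(f"{name}: MSE on fitted data {seen_mse:.2f}, on unseen data {unseen_mse:.2f}")
```

The memoriser is perfect on the data it was fitted to and poor on new items, which is the sense in which within-set performance can mislead.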
32Prediction in psychometrics
- A test was designed to measure the construct of
geekiness: the extent to which a person is a
geek.
- This test was validated against a self-rating on
a Likert scale.
- The test consisted of 76 questions.
- We split the data into two parts: a validation
(development) set and a test set
- The validation set contained 59 subjects.
- The test set contained 30 subjects.
33Prediction in psychometrics
Method                Development Set   Test Set
Summed score          0.54              0.59
Multiple regression   0.70              0.20
GP                    0.89              0.56
- The estimate produced by non-linear means (GP,
genetic programming) is about as good at
predicting scores on the unseen test set as
using the summed score.
- However, the GP equation used a non-linear
combination of responses to only 12 of the 76
test questions in its prediction!
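The drop from the development set to the test set (often called shrinkage) can be read straight off the table above. The sketch below only re-arranges the figures already reported; the dictionary layout and function name are illustrative.

```python
# (development set, test set) values copied from the table above.
results = {
    "Summed score": (0.54, 0.59),
    "Multiple regression": (0.70, 0.20),
    "GP": (0.89, 0.56),
}


def shrinkage(development: float, test: float) -> float:
    """Drop in performance from the development set to the test set."""
    return round(development - test, 2)


for method, (dev, test) in results.items():
    print(f"{method}: {dev:.2f} -> {test:.2f} (shrinkage {shrinkage(dev, test):+.2f})")
```

Multiple regression loses the most on unseen data, while the summed score, which has no fitted weights, does not shrink at all.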
34Prediction in psychometrics
- The take-home message: linear assumptions may be
very limiting
- More predictive power may sometimes (perhaps
often) be obtained by dropping the assumption
of linear relations between predictors and the
quality to be predicted