Psychometrics: An introduction presentation

Title: Psychometrics: An introduction

1
Psychometrics: An introduction
2
Overview

A brief history of psychometrics
The main types of tests
The 10 most common tests
Why psychometrics? Clinical versus actuarial
judgment

3
A brief history

Testing for proficiency dates back to 2200 B.C.,
when the Chinese emperor used grueling tests to
assess fitness for office

4
Francis Galton

Modern psychometrics dates to Sir Francis Galton
(1822-1911), Charles Darwin's cousin

Interested in (in fact, obsessed with)
individual differences and their distribution
1884-1890: Tested 17,000 individuals (!) on
height, weight, sizes of accessible body parts,
behavior, hand strength, visual acuity, reaction time (RT), etc.
Demonstrated that objective tests could provide
meaningful scores
Invented correlation. The first regression line plotted
the average diameter of (sweet pea) seeds against the average
diameter of their parents

5
Regression to the mean

Galton also popularized the idea of regression to
the mean: extreme values, when measured again, tend to be
less extreme

Francis Galton (1886). Regression Towards
Mediocrity in Hereditary Stature. Journal of the
Anthropological Institute, 15, 246-263.
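Why extreme scores drift back can be seen in a minimal simulation. The sketch below assumes a simple true-score-plus-error model (normal distributions, equal true-score and error variance; all parameter values are illustrative choices, not Galton's data): each simulated person is measured twice with fresh error, and the people who scored highest the first time are followed up on the second measurement.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, true_sd=1.0, error_sd=1.0,
                                top_frac=0.10, seed=1):
    """Toy model: observed score = stable true score + independent error.

    Returns (mean of the top scorers on test 1, mean of the SAME people on test 2).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, true_sd) for _ in range(n)]
    # Each measurement adds fresh, independent error to the same true score.
    test1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    test2 = [t + rng.gauss(0, error_sd) for t in true_scores]

    # Select the people with the most extreme (highest) first scores.
    k = int(n * top_frac)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    first = statistics.fmean(test1[i] for i in top)
    second = statistics.fmean(test2[i] for i in top)
    return first, second


first, second = simulate_regression_to_mean()
print(f"Top 10% on test 1 averaged {first:.2f}; on the retest they averaged {second:.2f}")
```

Nothing about the simulated people changes between tests: the top group was selected partly for lucky error, and that luck does not repeat. With equal true and error variance, the retest mean lands roughly halfway back toward the population mean.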
6
Regression to the mean

So: second albums by great bands tend to be worse
than first albums; second novels by successful
first novelists tend to be worse than first
novels; sports teams who excelled in one game or
season tend to do worse in the next game/season;
geniuses have children who are less brilliant than
they are; etc.
WHY?

7
Regression to the mean

I had the most satisfying Eureka experience of
my career while attempting to teach flight
instructors that praise is more effective than
punishment for promoting skill-learning. When I
had finished my enthusiastic speech, one of the
most seasoned instructors in the audience raised
his hand and made his own short speech, which
began by conceding that positive reinforcement
might be good for the birds, but went on to deny
that it was optimal for flight cadets. He said,
"On many occasions I have praised flight cadets
for clean execution of some aerobatic maneuver,
and in general when they try it again, they do
worse. On the other hand, I have often screamed
at cadets for bad execution, and in general they
do better the next time. So please don't tell us
that reinforcement works and punishment does not,
because the opposite is the case." This was a
joyous moment, in which I understood an important
truth about the world: because we tend to reward
others when they do well and punish them when
they do badly, and because there is regression to
the mean, it is part of the human condition that
we are statistically punished for rewarding
others and rewarded for punishing them. I
immediately arranged a demonstration in which
each participant tossed two coins at a target
behind his back, without any feedback. We
measured the distances from the target and could
see that those who had done best the first time
had mostly deteriorated on their second try, and
vice versa. But I knew that this demonstration
would not undo the effects of lifelong exposure
to a perverse contingency.
Daniel Kahneman (in his Nobel acceptance speech)
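Kahneman's point can be reproduced in a few lines. The sketch below is a hypothetical simulation, not his data: each cadet's performance on a trial is a fixed skill plus random noise, and an imaginary instructor praises any trial above a threshold and scolds any trial below one. Crucially, in this model the feedback has no effect at all on the next trial; the threshold and sample sizes are invented.

```python
import random
import statistics


def instructor_illusion(n_cadets=2_000, n_trials=20, threshold=1.5, seed=7):
    """Performance = skill + noise; feedback is recorded but changes nothing.

    Returns the average change from one trial to the next, split by whether
    the first trial drew praise or a scolding. All numbers are illustrative.
    """
    rng = random.Random(seed)
    change_after_praise, change_after_scolding = [], []
    for _ in range(n_cadets):
        skill = rng.gauss(0, 1)
        scores = [skill + rng.gauss(0, 1) for _ in range(n_trials)]
        for this_trial, next_trial in zip(scores, scores[1:]):
            if this_trial > threshold:        # very good trial: instructor praises
                change_after_praise.append(next_trial - this_trial)
            elif this_trial < -threshold:     # very bad trial: instructor scolds
                change_after_scolding.append(next_trial - this_trial)
    return (statistics.fmean(change_after_praise),
            statistics.fmean(change_after_scolding))


after_praise, after_scolding = instructor_illusion()
print(f"Average change after praise:   {after_praise:+.2f}")
print(f"Average change after scolding: {after_scolding:+.2f}")
```

Performance drops after praise and improves after scolding even though, by construction, neither has any causal effect: the "perverse contingency" comes from regression to the mean alone.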

8
James Cattell

James Cattell (studied with Wundt and Galton)
first used the term "mental test" in 1890

His tests were in the "brass instruments"
tradition of Galton:
mostly motor and acuity tests
Co-founded Psychological Review (1894)

9
Clark Wissler

Clark Wissler (Cattell's student) did the first
basic validational research, examining the
relation between the old mental test scores and
academic achievement

His results were largely discouraging
He had only bright college students in his
sample
Why is this a problem?
Wissler became an anthropologist with a strong
environmentalist bias.
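One standard answer is restriction of range: if everyone in the sample scores high on the predictor, little variance is left for it to correlate with. The simulation below is an assumed illustration, not Wissler's data: test scores and grades are bivariate normal with a population correlation of 0.5 (an invented value), and only people more than one standard deviation above the mean on the test are "admitted".

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def range_restriction_demo(n=20_000, rho=0.5, cutoff=1.0, seed=3):
    """Compare the test-grade correlation in everyone vs. only high scorers.

    Test scores and grades are standard bivariate normal with population
    correlation `rho` (an illustrative assumption).
    """
    rng = random.Random(seed)
    tests, grades = [], []
    for _ in range(n):
        t = rng.gauss(0, 1)
        g = rho * t + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        tests.append(t)
        grades.append(g)

    full_r = pearson(tests, grades)
    # Keep only the "bright college students": test score above the cutoff.
    admitted = [(t, g) for t, g in zip(tests, grades) if t > cutoff]
    restricted_r = pearson([t for t, _ in admitted], [g for _, g in admitted])
    return full_r, restricted_r


full_r, restricted_r = range_restriction_demo()
print(f"r in full population: {full_r:.2f}; r among admitted students: {restricted_r:.2f}")
```

With this cutoff the observed correlation falls to roughly half its population value, so a genuinely useful test can look nearly worthless in a restricted sample.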

10
Alfred Binet

Goodenough (1949): The Galtonian approach was
like inferring the nature of genius from the
nature of stupidity, or the qualities of water
from those of hydrogen and oxygen.

Alfred Binet (1905) introduced the first modern
intelligence test, which directly tested higher
psychological processes (real abilities:
practical judgments)
e.g., picture naming, rhyme production, weight
ordering, question answering, word definition
Also motivated the IQ (Stern, 1914): mental age
divided by chronological age
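The ratio IQ is simple arithmetic, sketched below. The multiplication by 100 is the later convention usually credited to Terman's 1916 revision; Stern's original quotient was the bare ratio. The function name is illustrative.

```python
def ratio_iq(mental_age_years: float, chronological_age_years: float) -> float:
    """Ratio IQ = 100 * mental age / chronological age."""
    if chronological_age_years <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_years / chronological_age_years


# An 8-year-old who performs like a typical 10-year-old:
print(ratio_iq(10, 8))  # 125.0
# An 8-year-old who performs like a typical 6-year-old:
print(ratio_iq(6, 8))   # 75.0
```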

11
The rise of psychometrics

Lewis Terman (1916) produced a major revision of
Binet's scale
Robert Yerkes (1919) convinced the US government
to test 1.75 million army recruits
Post-WWI: factor analysis emerged, making other
aptitude and personality tests possible

12
What is a psychometric test?

A test is a standardized procedure for sampling
behavior and describing it using scores or
categories
Most tests are predictive of some non-test
behavior of interest (or what would be the
point?)
Most tests are norm-referenced: they describe
the behavior in terms of norms, test results
gathered from a large group of subjects (the
standardization sample)
Some tests are criterion-referenced: the
objective is to see if the subject can attain
some pre-specified criterion.
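The two ways of describing a score can be sketched side by side. In the hypothetical example below, a raw score is described norm-referenced (as a z-score and percentile rank against an invented ten-person standardization sample) and criterion-referenced (as pass/fail against an invented cutoff). Percentile rank is computed here as the percentage scoring strictly below, which is one of several conventions.

```python
import statistics
from typing import Sequence, Tuple


def norm_referenced(raw: float, standardization_sample: Sequence[float]) -> Tuple[float, float]:
    """Describe a raw score relative to the standardization sample.

    Returns (z-score, percentile rank). The z-score uses the sample
    standard deviation; percentile rank counts scores strictly below `raw`.
    """
    mean = statistics.fmean(standardization_sample)
    sd = statistics.stdev(standardization_sample)
    z = (raw - mean) / sd
    below = sum(1 for s in standardization_sample if s < raw)
    return z, 100 * below / len(standardization_sample)


def criterion_referenced(raw: float, criterion: float) -> bool:
    """Has the test-taker attained a pre-specified criterion?"""
    return raw >= criterion


norms = [12, 15, 18, 20, 21, 22, 24, 25, 27, 30]  # invented standardization sample
z, pct = norm_referenced(25, norms)
print(f"Raw score 25: z = {z:.2f}, percentile rank = {pct:.0f}")
print("Meets a criterion of 24?", criterion_referenced(25, 24))
```

The same raw score gets two different kinds of meaning: "better than 70% of the norm group" versus "met the standard".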

13
The main types of tests

Intelligence tests: assess intelligence
Aptitude tests: assess capability
Achievement tests: assess degree of
accomplishment
Creativity tests: assess capacity for novelty
Personality tests: assess traits
Interest inventories: assess preferences for
activities
Behavioral tests: measure behaviors and their
antecedents/consequences
Neuropsychological tests: measure cognitive,
sensory, perceptual, or motor functions

14
The 10 most commonly used tests

1.) Wechsler Intelligence Scale for Children
(WISC)
2.) Bender Visual-Motor Gestalt Test
3.) Wechsler Adult Intelligence Scale (WAIS)
4.) Minnesota Multiphasic Personality Inventory
(MMPI)
5.) Rorschach Ink Blot Test
6.) Thematic Apperception Test (TAT)
7.) Sentence Completion
8.) Goodenough Draw-A-Person Test
9.) House-Tree-Person Test
10.) Stanford-Binet Intelligence Scale
From Brown and McGuire, 1976

15
Clinical versus actuarial judgment

Clinical judgment: reaching a decision by
processing information in one's head
Actuarial judgment: reaching a decision without
employing human judgment, using
empirically established relations between data
and the event of interest
Actuarial: ad. L. actuārius, a keeper
of accounts
Note that some of the data in an actuarial
judgment may be qualitative clinical
observations, allowing a mixture of methods

16
Clinical versus actuarial judgment

Paul Meehl (1954) first addressed the question:
which is better?

His ground rules for comparison:
Both methods should draw from the same data set
(this was relaxed by others, with no changes in
results)
Cross-validation should be required, to avoid
using variation specific to the data set
There should be explicit prediction of success,
recidivism, or recovery
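The reason for the cross-validation rule is that any rule fitted to one data set partly fits its noise. The sketch below is an illustrative simulation (all sizes are assumptions): from 50 candidate items that are pure noise, it picks the one that correlates best with the criterion in a derivation sample, then checks the same item in a fresh sample, and averages over many replications.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def capitalize_on_chance(n_people=40, n_items=50, n_reps=200, seed=11):
    """Average validity of the 'best' item: derivation sample vs. fresh sample.

    Every item is pure noise by construction, so its true validity is zero.
    """
    rng = random.Random(seed)

    def draw_sample():
        criterion = [rng.gauss(0, 1) for _ in range(n_people)]
        items = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_items)]
        return items, criterion

    derivation_rs, cross_validation_rs = [], []
    for _ in range(n_reps):
        train_items, train_crit = draw_sample()
        test_items, test_crit = draw_sample()
        # "Build the rule" on the derivation sample: keep the best-looking item.
        best = max(range(n_items), key=lambda j: pearson(train_items[j], train_crit))
        derivation_rs.append(pearson(train_items[best], train_crit))
        # Then check that same item on people it was not chosen from.
        cross_validation_rs.append(pearson(test_items[best], test_crit))
    return sum(derivation_rs) / n_reps, sum(cross_validation_rs) / n_reps


r_derivation, r_cross = capitalize_on_chance()
print(f"mean r where chosen: {r_derivation:.2f}; mean r in new samples: {r_cross:.2f}")
```

The selected item looks moderately valid where it was chosen and has essentially zero validity in new data; only the cross-validated figure is an honest estimate.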

17
Meehl (1954) Results

He looked at between 16 and 20 studies (depending
on inclusion criteria)
"It is clear that the dogmatic, complacent
assertion sometimes heard from clinicians that
'naturally' clinical prediction, being based on
'real understanding', is superior, is simply not
justified by the facts to date."
In all but one case, predictions made by
actuarial means were equal to or better than
clinical methods
In a later paper, he changed his mind about the
one!

18
Thirty years later...

Review and reflection indicate that no more than
5% of what was written in the 1954 book entitled,
Clinical Versus Statistical Prediction needs to
be retracted 30 years later. If anything, these
retractions would result in the book's being more
actuarial than it was.
There is no controversy in social science that
shows such a large body of qualitatively diverse
studies coming out so uniformly as this one.
Paul Meehl, 1986 (Causes and Effects of My
Disturbing Little Book)

19
In 1989

After eliminating studies that might be biased
against clinicians, by 1989 there were
approximately 100 studies that pitted actuarial
against clinical methods
In virtually every one of these studies, the
actuarial method has equaled or surpassed the
clinical method, sometimes substantially
Dawes, Faust, and Meehl, 1989 (in your course pack)

20
Example: Goldberg's Rule

Goldberg's Rule (1965) gives a simple formula for
diagnosing psychosis versus neurosis from MMPI
scale scores (we will see these scales later)
It was derived by looking at gold-standard
discharge diagnoses
It was compared to 29 judges on 861 profiles from
7 settings
Judges got an average of 62% correct
The best judge got 67% correct
Goldberg's Rule got 70% correct, and exceeded
judges in every one of the 7 settings
Additional training didn't help the judges do
better (and note also that the judges knew and
could have used Goldberg's Rule!)
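For concreteness, here is the form of Goldberg's rule as it is commonly cited: add the T-scores on the MMPI L, Pa, and Sc scales, subtract those on Hy and Pt, and call the profile psychotic if the total reaches 45. Sources differ on how the boundary value itself is treated, so the `>=` below is an assumption; the function names and the two profiles are invented for illustration.

```python
def goldberg_index(L: float, Pa: float, Sc: float, Hy: float, Pt: float) -> float:
    """Goldberg index from MMPI T-scores: (L + Pa + Sc) - (Hy + Pt)."""
    return (L + Pa + Sc) - (Hy + Pt)


def goldberg_rule(L, Pa, Sc, Hy, Pt, cutoff=45.0):
    """Classify a profile mechanically, with no clinician in the loop.

    Treating an index exactly equal to the cutoff as psychotic is an assumption.
    """
    return "psychotic" if goldberg_index(L, Pa, Sc, Hy, Pt) >= cutoff else "neurotic"


# Two invented profiles:
print(goldberg_rule(L=50, Pa=70, Sc=75, Hy=60, Pt=65))  # index 70 -> psychotic
print(goldberg_rule(L=45, Pa=55, Sc=60, Hy=75, Pt=70))  # index 15 -> neurotic
```

A rule this crude, applied identically to every profile, is what outperformed the judges.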

21
Where are clinicians' strengths? I

i.) Theory-mediated judgments
If the predictor knows the relevant causal
influences, can measure them, and has a model
specific enough to take him/her from theory to
fact
However, are there any reasons to doubt this
potential advantage?

22
Where are clinicians' strengths? II

ii.) Ability to use rare events
If the predictor knows that the current case is
an exception to the statistical trend, s/he can
use that information to over-ride the trend
It is also possible to build these into actuarial
methods
Why is it very difficult in practice?
Why might we worry about clinicians' ability to
incorporate rare events into prediction?

23
Where are clinicians' strengths? III

iii.) Able to detect complex predictive cues
- Human beings are still (for now) masters at
recognizing some complex configurations, such as
facial expressions.

24
Where are clinicians' strengths? IV

iv.) Able to re-weight utilities in real-time
- For ethical, legal, humanitarian, or financial
reasons, we might decide to do things differently
than usual in particular cases.

25
Where are actuarial strengths? I

i.) Immunity from fatigue, forgetfulness,
hang-overs, hostility, prejudice, ignorance,
false association, over-confidence, bias,
heart-ache, and random fluctuations in judgment.

26
Where are actuarial strengths? II

ii.) Consistency and proper weighting
- Variables are weighted the same way every
time, according to their actual demonstrable
contributions to the criterion of interest
- Perhaps more importantly, irrelevant variables
are properly given zero weight

27
Where are actuarial strengths? III

iii.) Feedback and base rates built into the
system
- Clinicians rarely know how they are doing
because they don't get immediate feedback and
because they have imperfect memory
- Actuarial records constitute perfect memories
of how things came out in similar cases and can
include a larger and wider sample than a single
human or a small group of humans can ever hope to
see
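Why built-in base rates matter can be shown with Bayes' theorem. In the sketch below (a sensitivity and specificity of 0.9 are invented for illustration), the same positive result means very different things depending on how common the condition is, which is information an actuarial record supplies automatically.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              base_rate: float) -> float:
    """P(condition | positive result), by Bayes' theorem."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


for base_rate in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(0.9, 0.9, base_rate)
    print(f"base rate {base_rate:.0%}: P(condition | positive) = {ppv:.2f}")
```

With a 1% base rate, more than nine out of ten positive results are false alarms, even from a test that is "90% accurate" in both directions.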

28
Where are actuarial strengths? IV

iv.) Not overly sensitive to optimal weightings
- Even simplistic actuarial judgments often beat
human judgments
- Simple linear weightings often do better than
humans
v.) Optimal (non-linear) weightings are
possible.
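The insensitivity to exact weights can be illustrated by comparing a composite built with the true optimal weights against a crude unit-weighted sum, in which every standardized predictor counts equally. The simulation below is a sketch under invented assumptions: three independent standardized predictors with true weights 0.5, 0.3, and 0.2, plus unit-variance noise in the outcome.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def unit_vs_optimal(n=20_000, weights=(0.5, 0.3, 0.2), seed=5):
    """Validity of optimally weighted vs. equally weighted predictor composites."""
    rng = random.Random(seed)
    optimal, unit, criterion = [], [], []
    for _ in range(n):
        xs = [rng.gauss(0, 1) for _ in weights]         # standardized predictors
        signal = sum(w * x for w, x in zip(weights, xs))
        criterion.append(signal + rng.gauss(0, 1))      # outcome = signal + noise
        optimal.append(signal)                          # true (optimal) weights
        unit.append(sum(xs))                            # every predictor weighted 1
    return pearson(optimal, criterion), pearson(unit, criterion)


r_optimal, r_unit = unit_vs_optimal()
print(f"optimal weights r = {r_optimal:.2f}; unit weights r = {r_unit:.2f}")
```

In this setup the crude rule keeps over 90% of the optimal validity. In practice, fitted weights must also be estimated from a finite sample, which can erode their advantage further on cross-validation.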

29
The power of non-linearity

Linear relations are those that say that X goes
up by the same amount for each equal-sized
increment in Y, e.g.
$P = aX + bY + c$
Such equations are represented graphically by a
straight line relating X and Y (or, with more
variables, by a flat plane or hyperplane)
Non-linear relations are those that say that X
goes up by different amounts for each equal-sized
increment in Y (there are many, many such
equations)
Such equations are represented graphically by a
non-straight line relating X and Y, either
because the line breaks or because it curves
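A minimal case where non-linearity matters: suppose the outcome depends only on the product of two predictors (an interaction), P = XY plus a little noise. The simulation below uses invented distributions (X and Y uniform on [-1, 1], noise SD 0.1). Because X and Y are each uncorrelated with P in the population, every linear combination aX + bY + c is too, yet the single non-linear term XY predicts P almost perfectly.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def interaction_demo(n=10_000, noise_sd=0.1, seed=2):
    """Correlate P with X, with Y, and with the product X*Y."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [rng.uniform(-1, 1) for _ in range(n)]
    # Outcome driven entirely by the X*Y interaction (an illustrative assumption).
    ps = [x * y + rng.gauss(0, noise_sd) for x, y in zip(xs, ys)]
    products = [x * y for x, y in zip(xs, ys)]
    return pearson(xs, ps), pearson(ys, ps), pearson(products, ps)


r_x, r_y, r_xy = interaction_demo()
print(f"r(X, P) = {r_x:.2f}, r(Y, P) = {r_y:.2f}, r(XY, P) = {r_xy:.2f}")
```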

30
The power of non-linearity

Westbury, C., Buchanan, L., Sanderson, M.,
Rhemtulla, M., & Phillips, L. (2003). Using
genetic programming to discover non-linear
variable interactions. Behavior Research Methods,
Instruments, & Computers, 35(2), 202-216.
We used computational means to discover
non-linear weightings for a test (constructed for
PSYCO 431) which looked at the construct of
geekiness: the extent to which a person is a
geek.
This test was validated against a self-rating on
a Likert scale.
The test consisted of 76 questions.
The validation set contained 59 subjects
The test set contained 30 subjects.

31
The power of non-linearity (and the need for
cross-validation)

The non-linear estimate was about as good at
predicting scores on unseen tests as the (gold
standard) summed validation score around which
the test had been designed
It blew away the linear regression (0.56 versus
0.20)
The non-linear combination used responses to
only 12 of the 76 test questions in its
prediction.
