Can we compare people to each other using
ipsative measures?
  • Prof Dave Bartram
  • Research Director
  • SHL Group plc
  • 25th Biennial Conference of the Society for
    Multivariate Analysis in the Behavioural Sciences
  • 2 July 2006

  • Both predictors (e.g. personality test items) and
    criteria (e.g. line manager ratings on
    competencies) can be constructed using either
    Likert item formats (normative) or forced-choice
    formats (ipsative).
  • What are the differences?
  • What are their relative advantages and
    disadvantages?
  • Generally argued that
  • Likert formats are easier to analyse but are subject to
    halo and response bias effects.
  • Ipsative formats control response bias but can pose
    problems for analysis due to constraints on

Normative OPQ32
Ipsative OPQ32
What is ipsative measurement?
  • Two methods: forced choice or ipsatization
  • Forced choice
  • Choose between items loading on different scales, or
  • Allocate a fixed number of points between scales.
  • Ipsatization
  • Subtract the average score across scales from each
    scale score. This removes one degree of freedom
    and locates each profile about the same mean.
  • Can also go further and equate score variance
    across people, so that all profiles are equally
    variable across scales (this is not often done).
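
The ipsatization step described above can be sketched in a few lines. This is a minimal illustration, not the procedure used for any SHL instrument: the function name `ipsatize`, the `equate_variance` flag and the two example profiles are hypothetical.

```python
from statistics import mean, pstdev

def ipsatize(profile, equate_variance=False):
    """Centre one person's scale scores on that person's own mean.

    With equate_variance=True the centred scores are also divided by the
    person's own SD across scales, so every profile has the same spread.
    """
    m = mean(profile)
    centred = [x - m for x in profile]
    if equate_variance:
        sd = pstdev(profile)
        if sd == 0:
            raise ValueError("flat profile: spread cannot be equated")
        centred = [x / sd for x in centred]
    return centred

# Hypothetical raw scores for two people on four scales.
person_a = [30, 24, 18, 28]
person_b = [50, 44, 38, 48]  # same shape as person_a, 20 points higher

print(ipsatize(person_a))
print(ipsatize(person_b))
```

Both people end up with the same ipsatized profile, because the 20-point difference in overall level is exactly what the subtraction removes. Each ipsatized profile sums to zero, which is the lost degree of freedom.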

Score ipsatization
Forced choice is not always ipsative
  • Some instruments use forced choice for pairs
    of items from the same scale
  • A1: I prefer to be alone (extraversion, -ve)
  • A2: I like to spend time with other people
    (extraversion, +ve)
  • Versus
  • B1: I prefer to be alone (extraversion, -ve)
  • B2: I often feel anxious (neuroticism, +ve)
  • A is not ipsative, B is.

Alternate ipsative forced choice formats
  • Pairs of items - dyads
  • I prefer to be alone (Extraversion, -ve)
  • I often feel anxious (Neuroticism, +ve)
  • Triplets - triads
  • I prefer to be alone (Extraversion, -ve)
  • I often feel anxious (Neuroticism, +ve)
  • I try to help others (Agreeable, +ve)
  • Quads - tetrads
  • I prefer to be alone (Extraversion, -ve)
  • I often feel anxious (Neuroticism, +ve)
  • I try to help others (Agreeable, +ve)
  • I am creative (Openness, +ve)

Scoring a quad
Some ipsative maths
  • Scores across all scales sum to a
    constant (by definition)
  • Lose one degree of freedom. For k scales, df = k - 1
  • Scale scores tend to correlate negatively
  • High scores on one scale → lower scores on others
  • Average scale intercorrelation constrained
  • k = 2 scales, r = -1.0
  • k = 4 scales, r = -.33
  • k = 16 scales, r = -.07
  • k = 32 scales, r = -.03
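
The four values listed above follow from a standard result: when k scale scores sum to a constant and the scales have equal variances, the variance of the sum is zero, which forces the average intercorrelation to be -1/(k - 1). A small sketch, assuming equal scale variances (the function name is hypothetical):

```python
def mean_ipsative_intercorrelation(k):
    """Average inter-scale correlation implied by a constant total score.

    Assumes all k scales have equal variance. Var(sum) = 0 then gives
    k + k * (k - 1) * r_bar = 0, i.e. r_bar = -1 / (k - 1).
    """
    if k < 2:
        raise ValueError("need at least two scales")
    return -1.0 / (k - 1)

for k in (2, 4, 16, 32):
    print(k, round(mean_ipsative_intercorrelation(k), 2))
```

The constraint shrinks quickly as the number of scales grows, which is the basis for the argument that instruments with many scales are only weakly affected.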

Classical test theory
Test theory for ipsative
  • For ipsative forced-choice pairs, c is a constant
    equal to the number of items measuring each scale
    and k is the number of scales.
  • ck = total number of items
  • ck/2 = total score (one point is given for each
    pair of items).
  • If T = true score, e = error, X = observed score
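
The item and score counts above can be checked with a toy scoring routine. This is a sketch under stated assumptions: a complete, balanced design in which every pair of scales is matched equally often, and random choices standing in for a respondent's preferences. The function name and the values k = 4, c = 6 are hypothetical.

```python
import random
from itertools import combinations

def score_forced_choice_pairs(k, c, rng):
    """Score a balanced forced-choice pair questionnaire.

    k scales, c items per scale. Each of the k*(k-1)/2 scale pairings is
    repeated c/(k-1) times, so every scale contributes exactly c items.
    """
    if c % (k - 1) != 0:
        raise ValueError("c must be a multiple of k - 1 for a balanced design")
    repeats = c // (k - 1)
    pairs = [p for p in combinations(range(k), 2) for _ in range(repeats)]
    scores = [0] * k
    for a, b in pairs:
        chosen = rng.choice((a, b))  # random stand-in for the respondent's choice
        scores[chosen] += 1          # one point per pair, to the preferred scale
    return scores, len(pairs)

rng = random.Random(0)
scores, n_pairs = score_forced_choice_pairs(k=4, c=6, rng=rng)
print(scores, sum(scores), n_pairs)
```

Whatever choices are made, the scale scores add up to ck/2 (here 12), so knowing k - 1 of the scores fixes the last one.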

Meade, 2004
Test theory for forced choice ipsative
  • Meade (2004) argues that the observed score on
    any ipsative scale is a function of the true
    score and error on that scale minus some function
    of the true scores and errors on all the other
    scales.
  • This needs modifying in terms of whether the
    ipsative design is complete or incomplete (i.e.
    does not contain all possible scale pairings).
  • So long as the design is balanced, however, this
    should not have any differential biasing effects
    on scales.
  • It does however explain why scale scores are
    negatively correlated.
  • Also explains why error terms are correlated in

Alternate models
  • Normative
  • $X_i = t_i + e_{rand} + \text{faking} + \text{central tendency}$
  • Ipsative
  • $X_i = t_i + e_{rand} - f(t_j + e_j),\ j \neq i$

SEM model for an ipsative quad
<< Not included >>
Example OPQ32i quad 1 (UK English)
Ipsative Controversies
Ipsativity and self-referencing.
  • The key feature of ipsative measurement is that
    it requires people to make comparisons between
    trait strengths of different scales.
  • It is often called self-referenced measurement
    because of this.
  • This is a misnomer, as one can argue that all
    multivariate self-report measures are
    self-referenced.
  • However, as a consequence it is argued that one
    cannot compare people's scores on
    ipsative scales.
  • I will argue that with large numbers of scales
    (20 or more) the constraints that scores on
    scales place on the absolute values each can have
    are not substantive and have minimal impact in

Percent raw score point change in score on Scale i
when score changes one raw score point on Scale j
< 5
DISC: 33.3%
OPQ32: 3.2%
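
The two percentages above are what one would get if a one-point change on Scale j were spread evenly over the remaining k - 1 scales. A minimal sketch under that assumption, taking DISC to have 4 scales and OPQ32 to have 32 (the function name is hypothetical):

```python
def percent_change_on_other_scale(k):
    """Percent of a raw score point that one other scale must absorb.

    Assumes a one-point change on Scale j is offset evenly across the
    remaining k - 1 scales so that the total stays constant.
    """
    if k < 2:
        raise ValueError("need at least two scales")
    return 100.0 / (k - 1)

for name, k in (("DISC", 4), ("OPQ32", 32)):
    print(name, round(percent_change_on_other_scale(k), 1))
```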
Construct Validity
  • Meaning of the scale is a comparison
  • Correlation matrix is constrained
  • Average correlation
  • Are not all scales understood by comparison with
    other traits?
  • When rating an item people compare themselves
    both against others and against themselves
    whether the format is ipsative or normative

Scale Intercorrelations - OPQ32
Reliability - Issues
  • Reliability requires interval measure. Some claim
    that there are inflated results from ipsative
  • Reliability can be (a little) distorted with
    ipsative measures (Tenopyr, 1998).
  • Reliability is conserved but can be depressed
    (Bartram, 1996; Karpatschof & Elkjaer, 2000).
  • Bartram (1996) derived an equation for the reliability
    of ipsative data by showing that reliability is
    reduced as a direct function of the range
    restriction associated with the loss of 1 df.
  • OPQ32 uses 208 items for normative and 416 for
    ipsative to ensure equal reliabilities.

Normative-Ipsative equivalence
  • N = 488 training delegates
  • For ipsative, median alpha = 0.86
  • For normative, median alpha = 0.83
  • Alternate form scale reliabilities for
    OPQ32n-OPQ32i: median = 0.71
  • These correlations are lower than internal
    consistency reliabilities for the two versions,
    or test-retest reliabilities for the OPQ32n
  • Corrected for attenuation, true-score
    correlations have median = 0.83.
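
The correction for attenuation mentioned above is the classical formula: the observed correlation divided by the square root of the product of the two reliabilities. The sketch below applies it to the three medians on this slide. That is only an approximation, since the reported 0.83 is presumably the median of per-scale corrections rather than a correction of the medians; the function name is hypothetical.

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)

# Medians from the slide: alternate-form r, ipsative alpha, normative alpha.
estimate = disattenuate(0.71, 0.86, 0.83)
print(round(estimate, 2))
```

Applied to the medians this gives roughly 0.84, in line with the reported median of 0.83.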

Big 5 equivalence (n = 488)
Profile similarities (k = 32, n = 488)
What determines the profile similarity?
  • The normative profile average deviations were
    correlated with the profile similarity
  • The correlation is r = 0.51 (n = 488).
  • The similarity between a person's normative and
    ipsative profiles is higher for people with more
    differentiated normative profiles.
  • Correlation between ipsative consistency scores
    and the similarity between normative and ipsative
    profiles
  • The correlation is r = 0.52 (n = 488).
  • People with a more consistent pattern of
    responding to the forced-choice format are likely
    to have a similar normative profile and that is
    likely to be relatively well differentiated.

Scale dependencies in normative and ipsative
  • Likert ratings for a 12-item 1-5 rating scale
    have a theoretical range from 12 to 60.
  • In practice the score obtained is constrained by
    scores on other scales, as scales are correlated
  • For the normative OPQ32 average R for predicting
    scale i from all other scales except scale i is
  • For ipsative this is 1.0 by definition.
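
The "1.0 by definition" point can be illustrated without fitting a regression: when every profile sums to the same constant, any one scale is an exact linear function of the others. The sketch below uses hypothetical simulated data (random scores, ipsatized per person), not OPQ32 data.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)
k, n = 8, 200  # hypothetical number of scales and respondents

# Simulated normative scores, ipsatized so each profile sums to zero.
people = []
for _ in range(n):
    raw = [rng.gauss(0, 1) for _ in range(k)]
    m = sum(raw) / k
    people.append([x - m for x in raw])

# Scale 0 is recovered exactly as minus the sum of the other scales.
target = [p[0] for p in people]
predicted = [-sum(p[1:]) for p in people]
r = pearson(target, predicted)
print(round(r, 6))
```

The correlation between the scale and its prediction from the remaining scales is 1 up to floating-point error, which is what a multiple R of 1.0 means here.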

Normative bias
  • Positive normative bias represents a shift of the
    profile to the right (average greater than sten
    5.5)
  • Negative normative bias represents a shift of the
    profile to the left (average less than sten 5.5)
  • For OPQ32n, SD of standardized average scores
    across scales = 0.27 (n = 242)
  • For OPQ32i, by definition, SD = 0.

Distribution of average normative scale z-scores
Is normative bias related to personality?
  • Prediction of normative bias (stepwise)
  • Using normative scale residuals (n): R = 0.76 (0.75
    adjusted)
  • Using ipsative scales (i): R = 0.53 (0.50 adjusted)
  • Normative residual predictor and ipsative scale
    predictor correlate r = 0.62
  • People who have positive normative bias are
  • More
  • (n & i) Achieving, Controlling, Optimistic
  • (n) Evaluative, Conscientious
  • (i) Caring, Detail Conscious
  • Less
  • (n & i) Worrying
  • (n) Decisive, Competitive, Variety seeking,
    Modest, Independent, Conventional

N = 242 in all cases
  • So long as number of scales is large you can
    compare people across scales.
  • Normative and ipsative versions are not parallel
    forms of the same test, they provide
    qualitatively different but highly correlated
  • Most people have similar ipsative and normative
    profiles both in shape and location.
  • Some people will have moderate or large score
    differences across forms, especially if their
    profiles are flat or if they are showing strong
    response bias on the normative version
  • Where there is a difference between forms, which
    is correct or are they both correct?

Criterion validity of ipsative measures
Criterion Validity
  • Constraints on scales will have an impact on
    external correlations
  • Lower average inter-scale correlation should
    optimise additive effect of scale variances
    (increasing multiple Rs)
  • Greer and Dunlap (1997) found that Type I error
    rates were well preserved and power was nearly
    equivalent in a Monte Carlo study of ANOVA.
  • Get similar validities for individual scales for
    normative and ipsative

Other research
  • Jackson, Wroblewski & Ashton (2000) compared
    single-stimulus and forced-choice formats for
    integrity-related personality items.
  • Those simulating applying for a job gave 1 SD
    better scores when using the single-stimulus format
    instrument and this led to lower validity
  • Shift in mean only one third for forced-choice
    format and validity maintained.
  • Martin, Bowen & Hunt (2002) show that ipsative OPQ
    is more resistant to faking instructions than
    normative
  • No differences between faking and honest group
    for ipsative, but large differences for normative.

Other research Christiansen et al (2005)
  • Both FC and normative format susceptible to
    distortion, but FC more robust with applicants
  • For validity re supervisor ratings, distortion
    had more deleterious effect on validity of
    normative, with some evidence for enhancement of
    validity of FC format
  • High ability individuals tend to be better at
    distorting FC format instruments than those of
    lower ability.
  • Triad harder to distort than dyad format.
  • NB OPQ32 uses tetrad format

SHL Research Meta analysis of normative vs
ipsative validity data
  • 19 studies (n = 3,241) drawn from meta-analysis of
    29 validity studies (Bartram, 2005)
  • Predictors included both normative and ipsative
    forms of OPQ personality tests
  • Compare studies using Likert format (OPQ32n, OPQ
    CM5.2 and CCSQ 5.2) with those using
    forced-choice format (OPQ32i, OPQ CM4.2, CCSQ
    7.2) where the criterion measures were the same
    (IMC or CCCI).
  • Criteria included the mixed item format
    (normative-ipsative) Inventory of Management
    Competencies (IMC).
  • Compare the validities of Likert ratings with the
    ipsative choices made by the same line managers
    using IMC
  • Control over candidates, instrument, items and

Normative part of normative-ipsative IMC
Ipsative part of normative-ipsative IMC
Ipsative vs normative predictor
Ipsative vs Normative IMC criteria
Summary of results
  • For comparison of predictors
  • Ipsative: k = 13, n = 2,348, mean ρ = 0.268
  • Normative: k = 4, n = 409, mean ρ = 0.223
  • For comparison of criterion measures k = 9,
  • Ipsative: mean ρ = 0.315
  • Normative: mean ρ = 0.189

  • Ipsative scales are not identical to normative
  • However, with more scales (k > 20) results are very
  • Both have advantages and disadvantages
  • As predictors both have good validity, but
    ipsative has better differentiation and is more
    resistant to distortion
  • Choice depends on application and likely sources
    of error/bias
  • We can enhance validity by using forced-choice
    formats to reduce halo effects for criterion

We should not argue against the use of a
methodology that provides real practical benefits
just because we do not understand its
psychometric complexities.
  • Like the bumble bee, rather than using theory to
    prove that it cannot fly, we should reflect on
    practice and try to understand how it does.

Thank you
  • Email for copies

  • Baron, H. (1996). Strengths and limitations of
    ipsative instruments. Journal of Occupational and
    Organizational Psychology, 69, 49-56.
  • Baron, H. (2002). Working with ipsative measures.
    Paper presented at the 17th annual conference of
    the Society for Industrial and Organizational
    Psychology, April 2002, Toronto, Canada.
  • Bartram, D. (1996). The relationship between
    ipsatized and normative measures of personality.
    Journal of Occupational and Organizational
    Psychology, 69, 25-39.
  • Bartram, D. (2005). The Great Eight competencies:
    A criterion-centric approach to validation.
    Journal of Applied Psychology, 90, 1185-1203.
  • Christiansen, N., Burns, G.N., & Montgomery, G.E.
    (2005). Reconsidering forced-choice item formats
    for applicant personality assessment. Human
    Performance, 18, 267-307.
  • Closs, S. J. (1996). On the factoring and
    interpretation of ipsative data. Journal of
    Occupational and Organizational Psychology, 69,
  • Converse, P.D., Oswald, F.L., Imus, A., Hedricks,
    C., Roy, R., & Butera, H. (undated ms). Comparing
    yourself with many people or comparing yourself
    on many traits: Effects of personality test
    format on faking, criterion-related validity and
    test-taker reactions.
  • Converse, P.D., Oswald, F.L., Imus, A., Hedricks,
    C., Roy, R., & Butera, H. (undated ms). Forcing
    choices in personality measurement: Benefits and
  • Jackson, D.N., Wroblewski, V.R., & Ashton, M.C.
    (2000). The impact of faking on employment tests:
    Does forced-choice offer a solution? Human
    Performance, 13, 371-388.
  • Karpatschof, B., & Elkjaer, H. K. (2000). Yet the
    bumblebee flies: The reliability of ipsative
    scores examined by empirical data and a
    simulation study. Research Report no. 1.
    Department of Psychology, University of Copenhagen.
  • King, L.M., Hunter, J.E., & Schmidt, F.L. (1980).
    Halo in a multidimensional forced-choice
    performance evaluation scale. Journal of Applied
    Psychology, 65, 507-516.
  • Martin, B.A., Bowen, C-C., & Hunt, S.T. (2002).
    How effective are people at faking on personality
    questionnaires? Personality and Individual
    Differences, 32, 247-256.
  • Matthews, G., & Oddy, K. (1997). Ipsative and
    normative scales in adjectival measurement of
    personality: Problems of bias and discrepancy.
    International Journal of Selection and
    Assessment, 5, 169-182.
  • Meade, A. (2004). Psychometric problems and
    issues involved with creating and using ipsative
    measures for selection. Journal of Occupational
    and Organizational Psychology, 77, 531-552.
  • McCloy, R.A. (2005). A silk purse from a sow's
    ear: Retrieving normative information from
    multidimensional forced-choice items.
    Organizational Research Methods, 8(2), 222-248.
  • Saville, P., & Willson, E. (1991). The
    reliability and validity of normative and
    ipsative approaches in the measurement of
    personality. Journal of Occupational and
    Organizational Psychology, 64, 219-238.
  • SHL (1993a). Inventory of Management
    Competencies: Manual and User's Guide. Thames
    Ditton, England: SHL Group plc.
  • SHL (1993b). OPQ Concept Model: Manual and User's
    Guide. Thames Ditton, England: SHL Group plc.
  • SHL (1999). OPQ32 Manual and User's Guide.
    Thames Ditton, England: SHL Group plc.