Title: Validity and Reliability


1
A bunch of stuff you need to know
Becky and Danny
2
Counterbalancing
  • Why you need to counterbalance:
  • To avoid order effects: some items may influence
    other items
  • To avoid fatigue effects: subjects get tired and
    performance on later items suffers
  • To avoid practice effects: subjects learn how to
    do the task and performance on later items
    improves
3
Counterbalancing
2 item counterbalance
4
Counterbalancing
3 item counterbalance
5
Counterbalancing
4 item counterbalance
6
Counterbalancing
  • X > 4 item counterbalance (see the sketch below):
  • 1) Create a simple Latin square
  • 2) Randomize the rows
  • 3) Randomize the columns
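
A minimal Python sketch of this three-step recipe (the function name and the cyclic-square construction are illustrative choices, not from the slides):

    import random

    def counterbalanced_orders(items, seed=None):
        rng = random.Random(seed)
        n = len(items)
        # 1) Simple cyclic Latin square: row i is the item list shifted by i.
        square = [[(i + j) % n for j in range(n)] for i in range(n)]
        # 2) Randomize the rows.
        rng.shuffle(square)
        # 3) Randomize the columns (one permutation applied to every row).
        cols = list(range(n))
        rng.shuffle(cols)
        square = [[row[c] for c in cols] for row in square]
        return [[items[k] for k in row] for row in square]

    # Each row is one subject's presentation order; every item still
    # appears exactly once per row and once per column position.
    for order in counterbalanced_orders(["A", "B", "C", "D", "E"], seed=1):
        print(order)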
7
Counterbalancing
X >>>> 4 item counterbalance: randomize items
8
Simpson's Paradox
9
Simpson's Paradox
10
Simpson's Paradox
(figure: 59 vs. 41)
11
Simpson's Paradox
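
These slides were figures; as a numeric stand-in, the reversal can be shown in a few lines of Python (the counts are illustrative, patterned on the well-known kidney-stone example, not taken from the slides):

    # The treatment wins in BOTH subgroups...
    groups = {
        # subgroup: (treatment successes, treatment n, control successes, control n)
        "mild":   (81, 87, 234, 270),
        "severe": (192, 263, 55, 80),
    }
    tot = [0, 0, 0, 0]
    for name, (ts, tn, cs, cn) in groups.items():
        print(f"{name}: treatment {ts/tn:.0%} vs control {cs/cn:.0%}")
        tot = [tot[0] + ts, tot[1] + tn, tot[2] + cs, tot[3] + cn]
    # ...yet loses once the subgroups are pooled, because the treatment
    # was given mostly to the severe (hard) cases.
    print(f"pooled: treatment {tot[0]/tot[1]:.0%} vs control {tot[2]/tot[3]:.0%}")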
12
Interactions
  • Definitions of interactions:
  • The whole is greater than the sum of its parts
  • The relationship between the variables is
    multiplicative instead of additive
  • The effectiveness of one intervention is
    contingent upon another intervention
13
Interactions
  • Why are interactions important?
  • Null effects can't get published; the interaction
    solves that
  • Interactions are usually more interesting than
    main effects
  • Like Simpson's paradox, interactions can mask an
    effect

14
Interactions
              Yes    No
      No       0      3
      Yes      5    -20
15
Interactions
                      Valium
                     No    Yes
  Alcohol   No        0     10
            Yes       5    100
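
Reading the cells as effect scores (the orientation of the reconstructed table is an assumption), the interaction is the difference between the simple effects; a quick check in Python:

    # Cell means from the 2 x 2 above (orientation assumed).
    neither, valium_only, alcohol_only, both = 0, 10, 5, 100
    effect_without_valium = alcohol_only - neither   # 5 - 0   = 5
    effect_with_valium = both - valium_only          # 100 - 10 = 90
    print(effect_with_valium - effect_without_valium)  # 85: far from additive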
16
Interactions
                            Gender
                          Male   Female
  Movie  Scorpion King     10     -10
         Ya Ya Sisterhood -10      10
17
Validity
  • Does the translation from concept to
    operationalization accurately represent the
    underlying concept?
  • Does it measure what you think it measures?
  • This is more familiarly called Construct
    Validity.

18
Types of Construct Validity
  • Translation validity (Trochim's term)
    • Face validity
    • Content validity
  • Criterion-related validity
    • Predictive validity
    • Concurrent validity
    • Convergent validity
    • Discriminant validity

19
Translation validity
  • Is the operationalization a good reflection of
    the construct?
  • This approach is definitional in nature:
  • it assumes you have a good, detailed definition
    of the construct,
  • and that you can check the operationalization
    against it.

20
Face Validity
  • "On its face," does it seem like a good
    translation of the construct?
  • Weak version: if you read it, does it appear to
    ask questions directed at the concept?
  • Strong version: if experts in that domain assess
    it, do they conclude it measures that domain?

21
Content Validity
  • Check the operationalization against the relevant
    content domain for the construct.
  • Assumes that a well-defined concept is being
    operationalized, which may not be true.
  • For example, a depression measure should cover
    the checklist of depression symptoms.

22
Criterion-Related Validity
  • Check the performance of the operationalization
    against some criterion.
  • Content validity differs in that the criteria are
    the construct definition itself -- it is a direct
    comparison.
  • In criterion-related validity, a prediction is
    made about how the operationalization will
    perform based on our theory of the construct.

23
Predictive Validity
  • Assess the operationalization's ability to
    predict something it should theoretically be able
    to predict.
  • A high correlation would provide evidence for
    predictive validity -- it would show that our
    measure can correctly predict something that we
    theoretically think it should be able to predict.

24
Concurrent Validity
  • Assess the operationalization's ability to
    distinguish between groups that it should
    theoretically be able to distinguish between.
  • As in any discriminating test, the results are
    more powerful if you are able to show that you
    can discriminate between two groups that are very
    similar.

25
Convergent Validity
  • Examine the degree to which the
    operationalization is similar to (converges on)
    other operationalizations that it theoretically
    should be similar to.
  • To show the convergent validity of a test of
    arithmetic skills, one might correlate the scores
    on a test with scores on other tests that purport
    to measure basic math ability, where high
    correlations would be evidence of convergent
    validity.

26
Discriminant Validity
  • Examine the degree to which the
    operationalization is not similar to (diverges
    from) other operationalizations that it
    theoretically should not be similar to.
  • To show the discriminant validity of a test of
    arithmetic skills, we might correlate the scores
    on the test with scores on tests of verbal
    ability, where low correlations would be evidence
    of discriminant validity.
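
A minimal Python sketch of both checks, using made-up scores for a hypothetical new arithmetic test:

    import numpy as np

    new_math   = np.array([12, 15, 9, 20, 14, 7, 18, 11])   # the new test
    other_math = np.array([13, 16, 8, 19, 15, 6, 17, 12])   # same construct
    verbal     = np.array([10, 14, 9, 12, 15, 11, 10, 13])  # different construct

    # Convergent evidence: high correlation with the other math measure.
    print(np.corrcoef(new_math, other_math)[0, 1])
    # Discriminant evidence: low correlation with the verbal measure.
    print(np.corrcoef(new_math, verbal)[0, 1])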

27
Threats to Construct Validity
  • From the discussion in Cook and Campbell (Cook,
    T.D. and Campbell, D.T., Quasi-Experimentation:
    Design and Analysis Issues for Field Settings).
  • Inadequate Preoperational Explication of
    Constructs
  • Mono-Operation Bias
  • Mono-Method Bias
  • Interaction of Different Treatments
  • Interaction of Testing and Treatment
  • Restricted Generalizability Across Constructs
  • Confounding Constructs and Levels of Constructs

28
Inadequate Preoperational Explication of
Constructs
  • You didn't do a good enough job of defining
    (operationally) what you mean by the construct.
  • Avoid this by:
  • Thinking through the concepts better
  • Using methods (e.g., concept mapping) to
    articulate your concepts
  • Getting experts to critique your
    operationalizations

29
Mono-Operation Bias
  • Pertains to the independent variable, cause,
    program, or treatment in your study, not to
    measures or outcomes.
  • If you only use a single version of a program in
    a single place at a single point in time, you may
    not be capturing the full breadth of the concept
    of the program.
  • Solution: try to implement multiple versions of
    your program.

30
Mono-Method Bias
  • Refers to your measures or observations.
  • With only a single version of a self-esteem
    measure, you can't provide much evidence that
    you're really measuring self-esteem.
  • Solution: try to implement multiple measures of
    key constructs and try to demonstrate (perhaps
    through a pilot or side study) that the measures
    you use behave as you theoretically expect them
    to.

31
Interaction of Different Treatments
  • Changes in the behaviors of interest may not be
    due to the experimental manipulation, but to an
    interaction of the experimental manipulation
    with other interventions.

32
Interaction of Testing and Treatment
  • Testing or measurement itself may make the groups
    more sensitive or receptive to treatment.
  • If it does, then the testing is in effect a part
    of the treatment; it's inseparable from the
    effect of the treatment.
  • This is a labeling issue (and, hence, a concern
    of construct validity) because you want to use
    the label "treatment" to refer to the treatment
    alone, but in fact it includes the testing.

33
Restricted Generalizability Across Constructs
  • The "unintended consequences" treat to construct
    validity
  • You do a study and conclude that Treatment X is
    effective. In fact, Treatment X does cause a
    reduction in symptoms, but what you failed to
    anticipate was the drastic negative consequences
    of the side effects of the treatment.
  • When you say that Treatment X is effective, you
    have defined "effective" as referring only to the
    directly targeted symptom.

34
Confounding Constructs and Levels of Constructs
  • If your manipulation does not work, it may not be
    that it fails at every level, only that it fails
    at the level you tested.
  • For example, peer pressure may not work if only 2
    people are applying pressure, but may work fine
    if 4 people are applying pressure.

35
The "Social" Threats to Construct Validity
  • Hypothesis Guessing
  • Evaluation Apprehension
  • Experimenter Expectancies

36
Hypothesis Guessing
  • Participants may try to figure out what the study
    is about. They "guess" at what the real purpose
    of the study is.
  • They are likely to base their behavior on what
    they guess, not just on your manipulation.
  • If change in the DV could be due to how they
    think they are supposed to behave, then the
    change cannot be completely attributed to the
    manipulation.
  • It is this labeling issue that makes this a
    construct validity threat.

37
Evaluation Apprehension
  • Some people may be anxious about being evaluated
    and consequently perform poorly.
  • Or, because of wanting to look good (social
    desirability), they may try to perform better
    (e.g., unusual prosocial behavior).
  • In both cases, the apprehension becomes
    confounded with the treatment itself and you have
    to be careful about how you label the outcomes.

38
Experimenter Expectancies
  • The researcher can bias the results of a study in
    countless ways, both consciously and
    unconsciously.
  • Sometimes the researcher can communicate what the
    desired outcome for a study might be (and
    participant desire to "look good" leads them to
    react that way).
  • The researcher might look pleased when
    participants give a desired answer.
  • If this is what causes the response, it would be
    wrong to label the response as a manipulation
    effect.

39
Reliability
  • Means "repeatability" or "consistency".
  • A measure is considered reliable if it would give
    us the same result over and over again (assuming
    that what we are measuring isn't changing!).
  • There are four general classes of reliability
    estimates, each of which estimates reliability in
    a different way.

40
Reliability (continued)
  • Inter-Rater or Inter-Observer Reliability
  • Test-Retest Reliability
  • Parallel-Forms Reliability
  • Internal Consistency Reliability

41
Inter-Rater or Inter-Observer Reliability
  • Used to assess the degree to which different
    raters/observers give consistent estimates of the
    same phenomenon.
  • Establish reliability on pilot data or a
    subsample of data and retest often throughout.
  • For categorical data a chi-square (χ²) test can
    be used, and for continuous data a correlation
    coefficient (r) can be calculated.
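
A minimal illustration with made-up ratings; for brevity the categorical check is shown as a raw agreement rate rather than the chi-square test the slide mentions:

    import numpy as np

    rater_a = np.array([1, 0, 1, 1, 0, 1, 0, 0])    # categorical codes
    rater_b = np.array([1, 0, 1, 0, 0, 1, 0, 1])
    print((rater_a == rater_b).mean())              # proportion agreement

    scores_a = np.array([3.1, 4.0, 2.2, 5.0, 3.8])  # continuous ratings
    scores_b = np.array([2.9, 4.2, 2.0, 4.7, 4.0])
    print(np.corrcoef(scores_a, scores_b)[0, 1])    # Pearson r between raters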

42
Test-Retest Reliability
  • Used to assess the consistency of a measure from
    one time to another.
  • This approach assumes that there is no
    substantial change in the construct being
    measured between the two occasions.
  • The amount of time allowed between measures is
    critical.
  • The shorter the time gap, the higher the
    correlation; the longer the time gap, the lower
    the correlation.
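
In code this is just a correlation across the two occasions (scores are illustrative):

    import numpy as np

    time1 = np.array([10, 14, 9, 16, 12, 8])   # same people, occasion 1
    time2 = np.array([11, 13, 10, 15, 12, 9])  # same people, occasion 2
    print(np.corrcoef(time1, time2)[0, 1])     # test-retest reliability estimate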

43
Parallel-Forms Reliability
  • Used to assess the consistency of the results of
    two tests constructed in the same way from the
    same content domain.
  • Create a large set of questions that address the
    same construct and then randomly divide the
    questions into two sets and administer both
    instruments to the same sample of people. 
  • The correlation between the two parallel forms is
    the estimate of reliability. 
  • One major problem with this approach is that you
    have to be able to generate lots of items that
    reflect the same construct. 

44
Parallel-Forms and Split Half Reliability
  • The parallel forms approach is very similar to
    the split-half reliability described below. 
  • The major difference is that parallel forms are
    constructed so that the two forms can be used
    independent of each other and considered
    equivalent measures. 
  • With split-half reliability we have an instrument
    that we wish to use as a single measurement
    instrument and only develop randomly split halves
    for purposes of estimating reliability.
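
A sketch of the random split-half procedure with made-up item scores; the final Spearman-Brown step (the standard way to project a half-test correlation up to full length, though not mentioned on the slide) is included:

    import numpy as np

    items = np.array([   # rows = respondents, columns = 6 items
        [4, 5, 4, 3, 4, 5],
        [2, 1, 2, 2, 1, 2],
        [5, 5, 4, 5, 5, 4],
        [3, 2, 3, 3, 2, 3],
        [1, 2, 1, 2, 1, 1],
    ])
    rng = np.random.default_rng(0)
    cols = rng.permutation(items.shape[1])      # randomly split the items
    half1 = items[:, cols[:3]].sum(axis=1)
    half2 = items[:, cols[3:]].sum(axis=1)
    r = np.corrcoef(half1, half2)[0, 1]         # correlation of the halves
    print(2 * r / (1 + r))                      # Spearman-Brown full-length estimate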

45
Internal Consistency Reliability
  • Used to assess the consistency of results across
    items within a test.
  • In effect we judge the reliability of the
    instrument by estimating how well the items that
    reflect the same construct yield similar
    results. 
  • We are looking at how consistent the results are
    for different items for the same construct within
    the measure.

46
Kinds of Internal Reliability
  • Average Inter-item Correlation
  • Average Item-total Correlation
  • Split-Half Reliability
  • Cronbach's Alpha (α), sketched below
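
Of these, Cronbach's alpha is the most commonly reported; a minimal implementation of its standard formula, with illustrative scores:

    import numpy as np

    def cronbach_alpha(items):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars / total_var)

    items = np.array([   # rows = respondents, columns = 4 items
        [4, 5, 4, 4],
        [2, 1, 2, 1],
        [5, 5, 4, 5],
        [3, 2, 3, 3],
        [1, 2, 1, 2],
    ])
    print(cronbach_alpha(items))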

47
Pragmatics
  • Gricean Maxims:
  • Quality: the speaker is assumed to tell the truth.
  • Quantity: speakers won't burden hearers with
    already-known info; obvious inferences will be
    made.
  • Relation: the speaker will only talk about things
    relevant to the interaction.
  • Manner: speakers will be brief, orderly, clear,
    and unambiguous.

48
Pragmatics
  • Examples of where this breaks down:
  • Piagetian conservation tasks
  • Representativeness: the Linda problem
  • Dilution effect: nondiagnostic information
  • Implanted memories: cooperative vs. adversarial
    sources
  • Mutual exclusivity

49
Pragmatics
  • Examples of where this breaks down:
  • Framing effects
  • Inconsistent responses due to pragmatics: the
    part-whole problem
  • Conventional implicatures: "all" vs. "each and
    every"

50
Manipulation Checks
  • Have them.
  • Lots of them.

51
Validity and Reliability
  • Graduate Methods
  • Becky Ray
  • Winter, 2003
  • For further reference see
    http://trochim.human.cornell.edu/kb/