Construction and Analysis of Tests
1
ITEM ANALYSIS
2
CONSTRUCTION AND ANALYSIS OF TESTS
3
Characteristics of a Good Test
  • Validity
  • It refers to the appropriateness or truthfulness of a tool. A tool is
    valid if it measures what it is supposed to measure.
  • Reliability
  • It refers to the trustworthiness or consistency of measurement of a
    tool, whatever it measures.

4
  • Objectivity
  • Refers to the absence of subjective bias in the interpretation of
    responses obtained by a tool.
  • Economy
  • The test should be simple and quick to administer, saving money and
    time.

5
  • Practicability or Feasibility
  • The test should not require special infrastructure like a dark room,
    one-way see-through room, etc.

6
Decision to gather evidence
↓
Decision to allocate resources
↓
Content analysis and test blueprint
↓
Item writing
↓
Item review 1
↓
Planning item scoring
↓
Production of trial tests
↓
Trials
↓
Item review 2
↓
Amendment (revise/replace/discard)
↓
More items needed?
↓ No
Assembly of final tests
7
Trial Test
  • It involves time and resources
  • Prepare content analysis and blue print
  • Review each item before trial testing

8
Content Analysis
  • Which area of the curriculum is selected?
  • Are there significant sections in the content?
  • Are there significant subdivisions in the content?
  • Which representative areas should be included?

9
Blue Print
  • Title
  • Fundamental purpose
  • The aspects of the curriculum covered
  • For whom the test is constructed
  • Time, date, who will administer and who will score
  • Weightage for recall, comprehension and reflective thinking

10
Blue Print

Content       Recall     Comprehension  Critical thinking  Total
PROSE         2 items    2 items        5 items            9
POETRY        2 items    4 items        5 items            11
GRAMMAR       2 items    4 items        12 items           18
CRITICISM     4 items    2 items        --                 6
COMPARISONS   4 items    2 items        --                 6
TOTAL         14 items   14 items       22 items           50
11
Item Specification

Content       Recall                Comprehension         Critical thinking                                      Total
PROSE         Items 2, 5            Items 12, 23          Items 28, 31, 32, 40, 50                               9
POETRY        Items 6, 10           Items 13, 14, 16, 17  Items 33, 36, 37, 38, 39                               11
GRAMMAR       Items 1, 7            Items 18, 19, 20, 21  Items 21, 29, 30, 41, 42, 43, 44, 45, 46, 47, 48, 49   18
CRITICISM     Items 3, 4, 8, 9      Items 34, 35          --                                                     6
COMPARISONS   Items 11, 15, 22, 25  Items 26, 27          --                                                     6
TOTAL         14 items              14 items              22 items                                               50
12
Scoring Key

Item:  1  2  3  4  5  6  7  8  9  10
Key:   2  5  1  2  3  4  1  4  5  3
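A minimal sketch of mechanical scoring against this key; the function and the response format are illustrative assumptions, not part of the deck.

```python
# Scoring key from the slide above: item number -> correct option.
KEY = {1: 2, 2: 5, 3: 1, 4: 2, 5: 3, 6: 4, 7: 1, 8: 4, 9: 5, 10: 3}

def score(responses: dict[int, int]) -> int:
    """Award one mark per response that matches the scoring key."""
    return sum(1 for item, choice in responses.items() if KEY.get(item) == choice)

# Example: a pupil answering items 1-3 with options 2, 4, 1 earns 2 marks.
print(score({1: 2, 2: 4, 3: 1}))  # -> 2
```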
13
Item Review 1
  • Dependable inferences can be made about the choice of the content
  • All important parts of the curriculum are addressed
  • Achievement over the whole range is assessed

14
How to review?
  • Is the item clear in expression?
  • Are the items expressed in the simplest possible language?
  • Are there unintended clues to the correct answer?
  • Is the format reasonably consistent?
  • Is there a single, clearly correct answer for each item?
  • Is the type of item appropriate to the information required?
  • Are there enough items to provide adequate coverage of the behaviour
    to be assessed?

15
Purpose of the Trial Test
  • Establishes the difficulty of each item
  • Identifies the distracters which do not appear plausible
  • Suggests the number of items to be included in the final test
  • Establishes the contribution of each item to the discrimination
    between low- and high-achieving candidates
  • Checks the adequacy of the administration instructions and identifies
    misconceptions held by the students through analysis of their responses

16
Choosing a Sample
  • A sample of 100 to 150 students of varied abilities may be selected
  • Approximately equal numbers of male and female students
  • Judgment sampling technique: target group

17
Tryout of the Test
  • The test is administered to a representative sample chosen from the
    target population for whom the test is intended, and is scored. This
    pilot study is useful for the following:
  • To identify weak or defective items and to reveal needed improvements.
  • To determine the difficulty level and discriminating power of each
    individual item, so that a selection of items may be made.

18
  • To provide data needed to determine an appropriate time limit for the
    final test.
  • To standardize the instructions and procedures.
  • To know how to organize the items.
  • To decide the proper format.

19
Scoring of the Trial Test
  • Needs training
  • Not according to the scorers' judgment
  • Refer to the scoring key
  • Mechanical scoring is recommended to maintain accuracy

20
Scores in the Matrix
Item GEET RAI RAJU RANI SURI POO RITA JOE CATH RUTH Total
1 1 1 1 1 1 0 1 0 0 1 7
2 1 0 0 1 0 1 0 0 0 0 3
3 1 1 1 1 1 1 1 1 0 0 8
4 1 1 1 0 1 0 1 1 1 0 7
5 1 1 1 1 1 1 1 1 1 1 10
6 1 1 0 0 1 1 0 0 1 0 5
7 1 1 1 1 0 1 0 1 0 0 6
8 1 0 1 0 0 0 0 1 0 0 3
9 1 0 0 0 0 1 0 0 0 0 2
10 1 1 1 1 1 0 1 0 0 0 6
Total 10 7 7 6 6 6 5 5 3 2 57
21
Arranging Pupils
  • After scoring the trial test, individuals are placed in order from
    high to low according to their total scores, as in the sketch below
    and the table on the next slide.
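A minimal Python sketch of this arrangement, using the 0/1 score matrix from the trial test above (rows are items, columns are pupils); the stable sort reproduces the ordering shown on the next slide.

```python
import numpy as np

# Trial-test matrix from slide 20: one row per item, one column per pupil
# (GEET, RAI, RAJU, RANI, SURI, POO, RITA, JOE, CATH, RUTH).
scores = np.array([
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 1],   # item 1
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],   # item 2
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],   # item 3
    [1, 1, 1, 0, 1, 0, 1, 1, 1, 0],   # item 4
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],   # item 5
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],   # item 6
    [1, 1, 1, 1, 0, 1, 0, 1, 0, 0],   # item 7
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 0],   # item 8
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],   # item 9
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],   # item 10
])

pupil_order = np.argsort(-scores.sum(axis=0), kind="stable")  # pupils, high to low
item_order = np.argsort(-scores.sum(axis=1), kind="stable")   # items, easiest first
arranged = scores[np.ix_(item_order, pupil_order)]            # table on the next slide
```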

22
Arranging Pupils' Scores
Item GEET RAI RAJU RANI SURI POO RITA JOE CATH RUTH Total
5 1 1 1 1 1 1 1 1 1 1 10
3 1 1 1 1 1 1 1 1 0 0 8
1 1 1 1 1 1 0 1 0 0 1 7
4 1 1 1 0 1 0 1 1 1 0 7
7 1 1 1 1 0 1 0 1 0 0 6
10 1 1 1 1 1 0 1 0 0 0 6
6 1 1 0 0 1 1 0 0 1 0 5
2 1 0 0 1 0 1 0 0 0 0 3
8 1 0 1 0 0 0 0 1 0 0 3
9 1 0 0 0 0 1 0 0 0 0 2
Total 10 7 7 6 6 6 5 5 3 2 57
23
Indices of difficulty and discriminating power of items
  • The top 27% of pupils constitute the high achieving group and the
    bottom 27% constitute the low achieving group.
  • The indices of discriminating power and difficulty level are computed
    for each item of the test using the following formulae.

24
Analysis of an Item

Pupil:    1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18
Response: 0  1  0  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1
25
  • Discriminating power = (Ph - Pl) / U
  • Difficulty level = (Ph + Pl) / U
  • Ph = the number of pupils in the high achieving group who answered
    the item correctly.
  • Pl = the number of pupils in the low achieving group who answered
    the item correctly.
  • U = the total number of pupils in both groups.
  • (A Python sketch of these formulae follows.)
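A minimal sketch of the two formulae above; the function name and argument names are illustrative assumptions, not part of the deck.

```python
def item_indices(ph: int, pl: int, u: int) -> tuple[float, float]:
    """Difficulty level and discriminating power of a single item.

    ph: correct answers in the high achieving group
    pl: correct answers in the low achieving group
    u:  total number of pupils in both groups
    """
    difficulty = (ph + pl) / u        # difficulty level
    discrimination = (ph - pl) / u    # discriminating power
    return difficulty, discrimination
```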

26
Types of Discriminators
  • Positive Discriminator
  • Negative Discriminator
  • Non Discriminator

27
Graphical Analysis of Scores
  • Acceptable correct-answer response pattern.
  • Non-acceptable correct-answer response pattern.

28
Criteria for Selection

Discriminating Power                       Difficulty Level
.4 and above: Excellent item               Between .4 and .6: Average difficulty
Between .3 and .4: Good item               Between .2 and .4: Difficult item
Between .2 and .3: Average item            Between .6 and .8: Easy item
Between .1 and .2: Requires improvement    Between .8 and 1: Very easy item
Less than .1: Item to be dropped           Between 0 and .2: Very difficult item
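A hedged Python rendering of the table above; how ties at the cut points (.2, .4, etc.) are resolved is an assumption, since the deck leaves the overlapping boundaries unspecified.

```python
def classify_dp(dp: float) -> str:
    """Label a discriminating-power value using the table above."""
    if dp >= 0.4: return "Excellent item"
    if dp >= 0.3: return "Good item"
    if dp >= 0.2: return "Average item"
    if dp >= 0.1: return "Requires improvement"
    return "Item to be dropped"

def classify_dl(dl: float) -> str:
    """Label a difficulty-level value using the table above."""
    if dl >= 0.8: return "Very easy item"
    if dl > 0.6:  return "Easy item"
    if dl >= 0.4: return "Average difficulty"
    if dl >= 0.2: return "Difficult item"
    return "Very difficult item"
```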
29
Is this a good item?
  • Compute the difficulty and discrimination indices for an item
    administered to 263 pupils, where 74 pupils answered the item
    correctly, and 32 pupils in the upper group and 23 pupils in the
    lower group passed the item.
  • Is this a good item? (A worked sketch follows.)
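One hedged reading of this exercise, assuming the top/bottom 27% rule from slide 23 and the count-based formulae from slide 25 (27% of 263 is about 71 pupils per group):

```python
u = 2 * round(0.27 * 263)   # 71 pupils per group, 142 in both groups
dl = (32 + 23) / u          # difficulty level ~ 0.39: "Difficult item"
dp = (32 - 23) / u          # discriminating power ~ 0.06: below .1, "Item to be dropped"
```

Under these assumptions the item is on the hard side and discriminates poorly, so it would be revised or dropped.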

30
Is this a good item?
  • Compute the difficulty and discrimination indices of a test item
    administered to 84 pupils, if 52 test takers answered the item
    correctly, 20 in the upper group and 12 in the lower group.
  • Is this a good item? (A worked sketch follows.)
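The same hedged reading applied here (27% of 84 is about 23 pupils per group):

```python
u = 2 * round(0.27 * 84)    # 23 pupils per group, 46 in both groups
dl = (20 + 12) / u          # difficulty level ~ 0.70: "Easy item"
dp = (20 - 12) / u          # discriminating power ~ 0.17: "Requires improvement"
```

Under these assumptions the item is on the easy side and its discrimination needs improvement.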

31
Selection of Items
  • Based on the calculated values of item discrimination and difficulty,
    appropriate items are chosen for the final form of the standardized
    test.
  • Arrange the items in increasing order of difficulty.

32
Assembly of the test in the final form
  • Based upon discriminating power, items are first chosen; among the
    items so chosen, those with the proper difficulty level are finally
    selected for the final form.
  • Care should be taken to see that at least 50% of the items are of
    average difficulty, 25% are easy, 20% difficult and 5% very difficult.

33
  • A detailed scoring scheme is also to be prepared so as to ensure
    objective evaluation of pupil responses.
  • Appropriate instructions/procedures for administering the test also
    have to be developed and incorporated suitably in the test.

34
Advantages of Item Analysis
  • A powerful technique to improve instruction.
  • Helpful for guidance.
  • Provides valid measures of instructional objectives.
  • Gives clues to the nature of misunderstandings and suggests
    remediation.

35
Reliability
  • Stability and trustworthiness of measurement is called reliability.
  • It should be free from error.
  • (E.g.) the Stanford-Binet I.Q.: the score is a good estimate of the
    child's mental ability.

36
Methods of determining Reliability
  • Four procedures for computing the reliability coefficient:
  • Test-retest method
  • Alternative or parallel form
  • Split-half technique
  • Rational equivalence

37
Test-Retest Method
  • Repetition of the test is the simplest method of determining
    agreement between two sets of scores.
  • The test is given and repeated on the same group, and the correlation
    is computed between the first and second sets of scores, as in the
    sketch below.
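A minimal sketch of that correlation; the two score vectors are made-up illustrations, not data from the deck.

```python
import numpy as np

first = np.array([10, 7, 7, 6, 6, 6, 5, 5, 3, 2])  # first administration (hypothetical)
second = np.array([9, 8, 7, 7, 6, 5, 6, 4, 3, 3])  # retest of the same pupils (hypothetical)
r_tt = np.corrcoef(first, second)[0, 1]            # test-retest reliability coefficient
print(round(r_tt, 2))
```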

38
Defects in the Test-Retest Method
  • If the test is repeated immediately, many subjects will recall their
    first answers, which tends to increase their scores.
  • Practice and confidence induced by familiarity also affect scores.
  • If the interval is longer (e.g., six months), growth changes will
    affect the retest.
  • Because of these defects, test-retest is generally less useful than
    the other methods.

39
Alternative or Parallel Form Method
  • When alternative or parallel forms of a test can be constructed, the
    correlation between form A and form B may be taken as a measure of
    the self-correlation of the test.
  • The alternative form method is satisfactory when sufficient time has
    intervened between the administrations of the two forms to weaken or
    eliminate memory and practice effects.

40
  • When form B of a test follows form A closely, scores on the second
    form of the test will often be increased because of familiarity.
  • If such increases are approximately constant (3 to 5 points), the
    reliability coefficient of the test will not be affected, since the
    paired A and B scores maintain the same relative positions in the
    two distributions.

41
  • In drawing up alternative test forms, care must be exercised to match
    test materials for content, difficulty and form.
  • When alternative forms are virtually identical, reliability will be
    estimated too high; otherwise reliability will be estimated too low.
  • An interval of at least two to four weeks should be allowed between
    administrations of the test.

42
The Split-Half Method
  • In this method the test is first divided into two equivalent halves
    and the correlation is found for these half-tests.
  • From the reliability of the half-test, the self-correlation of the
    whole test is then estimated by the Spearman-Brown prophecy formula,
    sketched below.
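A minimal sketch, assuming a pupils-by-items 0/1 score matrix and an odd-even split; the Spearman-Brown prophecy formula for the full-length test is r_full = 2 * r_half / (1 + r_half).

```python
import numpy as np

def split_half_reliability(scores: np.ndarray) -> float:
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd = scores[:, 0::2].sum(axis=1)      # each pupil's total on odd-numbered items
    even = scores[:, 1::2].sum(axis=1)     # each pupil's total on even-numbered items
    r_half = np.corrcoef(odd, even)[0, 1]  # correlation between the two halves
    return 2 * r_half / (1 + r_half)       # Spearman-Brown prophecy formula
```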

43
  • The split half method is regarded by many as the
    best of the methods for measuring test
    reliability.

44
  • Advantage
  • All data for computing reliability are obtained on one occasion, so
    that variations brought about by differences between the two testing
    situations are eliminated.

45
  • How to divide?
  • Alternate statements (odd-even split)
  • All the items are of equal difficulty

46
Method of Rational Equivalence
  • This method represents an attempt to get an estimate of the
    reliability of a test free from the objections raised against the
    methods outlined above.
  • Two forms of a test are equivalent when the items a and A, b and B,
    c and C, etc. are interchangeable and when the inter-item
    correlations are the same for both forms. (A sketch of the associated
    Kuder-Richardson formula follows.)
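The method of rational equivalence is usually operationalized through the Kuder-Richardson formulas; a minimal sketch of KR-20 for dichotomously scored (0/1) items, assuming a pupils-by-items numpy matrix:

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """Kuder-Richardson formula 20: rational-equivalence reliability."""
    k = scores.shape[1]                         # number of items
    p = scores.mean(axis=0)                     # proportion correct per item
    q = 1.0 - p                                 # proportion incorrect per item
    var_total = scores.sum(axis=1).var(ddof=1)  # variance of pupils' total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)
```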

47
Errors
  • Chance Error
  • Many psychological factors affect the reliability coefficient of a
    test: fluctuations in interest and attention, shifts in emotional
    attitude, and differential effects of memory and practice.
  • Environmental factors such as distractions, noise, interruptions,
    scoring errors, etc. also contribute. All of these are called chance
    errors, or errors of measurement.
  • The scores may go up or down from the true value.

48
  • Constant Errors
  • Constant errors work in only one direction. A constant error raises
    or lowers all of the scores on a test but does not affect the
    reliability coefficient.
  • Such errors are more easily avoided or corrected than chance errors,
    for example by subtracting two points from a retest score to allow
    for practice.

49
Validity
  • The validity of a test, or of any measuring instrument, depends upon
    the fidelity with which it measures what it purports to measure.
  • A test is valid when the performances which it measures correspond to
    the same performances as otherwise independently measured or
    objectively defined.

50
Difference between Reliability and Validity
  • Suppose that a clock is set forward 20 minutes. If the clock is a
    good timepiece, the time it tells will be reliable (consistent) but
    will not be valid as judged by standard time.
  • Validity is a relative term.

51
  • A test is valid for a particular purpose or in a particular
    situation; it is not generally valid.

52
  • Content Validity
  • This requires content analysis. Validity is inferred by subject
    experts after going through the test items and giving their opinion
    on the extent to which the test items form a fair representative
    sample of the universe of items that could be framed from the content
    areas being tested.

53
  • Construct Validity
  • This is the functional aspect of content validity.
  • Suppose the test is to measure the creative writing of students; then
    the items should cover creative expression only.
  • A well-known test of creative expression and the newly constructed
    creative expression test are both administered to a group of students
    for whom the test is meant.
  • The correlation coefficient computed between the scores from the two
    tests is an index of the validity of the newly constructed test.

54
  • Predictive Validity
  • It is concerned with the relation of test scores to some measure of
    future performance.
  • If scores on a spelling test help us to differentiate between pupils
    who will succeed and pupils who will fail in a stenography course,
    then we can infer that the spelling test has predictive validity as
    far as stenography is concerned.
  • This type of validity is mainly useful in evaluating aptitude tests.

55
Relations of Validity and Reliability
  • They refer to different aspects of test efficiency.
  • A reliable test is theoretically valid, but may be practically
    invalid, as judged by its correlations with various independent
    criteria.
  • A highly valid test cannot be unreliable, since its correlation with
    a criterion is limited by its own index of reliability.

56
  • Want to make the best choice?
  • Then
  • ANALYSE AND CHOOSE

58
THANKS FOR MAKING ME HAPPY