Title: IN THE NAME OF GOD TEST CONSTRUCTION WORKSHOP
1IN THE NAME OF GOD TEST CONSTRUCTION WORKSHOP
- J.KOOHPAYEHZADEH M.D , MPH
- Education development center
- Iran University of Medical Sciences
2- Tell me, I forget.
- Ask me, I remember.
- Involve me, I understand.
3Why Test?
- Testing is 50% of Teaching
4Well-defined educational objectives: a prerequisite for assessment
- Example for this session
- At the end of this session participants will be able:
- To name at least three differences between summative and formative assessment
- To make a list of at least three written AM
- To name the most effective AM to assess clinical skills
- To describe the most effective AM to assess attitudes
5 ???? ?????
6 ???
???????? ??? ???
???????? ????? ????
?????????? ??? ??????
???????? ???? ???? ???????? ??????? ???????? ??????? ???????? ??????? ???? ????? ??? ????? ???? ???????
???????? ???? ???? ?????? ?????? ????? ???? ????? ??? ????? ???? ???????
7Evaluating StudentsTests ARE Not the Only Way!
- Tests
- Projects
- Performance
- Participation
8??????????? Measurement
- ??????? ?? ????? ?????? ?? ??? ?? ?? ?? ????? ??
?? ????? ?? ???? ???????. - ?????? ???? ???? ??? ???? ?????? ??????
9???? Assessment
- ???????? ??????? ???? ????? ?? ?? ????? ?? ??????
10EVALUATION ????????
????? ???? ???? ?? ??? ?? ????? ????? ?????
????? ?? ???? ????? ???? ?? ????? ????? ????? ??
????? .?????? ??????? ???? ????????? ????? ?
????? ??????? ?? ????? ????? ????? ??????? ??
????? ?????? ???????? ??????? ? ?????? ???? ??
??????????? ???? ????? ?? ????? ?? ????? ??????
?????? ?????? ?? ?? ?????? ????? ???.
11????? ??????????? ????? ?? ??? ?? ?? ???.
????? Test
12???????
???? Why
????? ? How
?? ????? When
?? ???? ??? What
13?? ???? ??????? ????? When?
-
- ?? ????? ????? Summative
- ?? ??? ????? Formative
- ??? ?? ????? Pre-test
14??? ??????? ????????WHY?
- ????? ?? ???????
- ???? ????? ??????
- ???? ????? ????
- ????? ????????? ???????
- ?????? ??????
- ????? ????
- ??? ?????? ??????
15Why Evaluate Students?
- To help students improve
- To assess student learning
- To determine if the teacher is teaching
- Motivation tool
- To communicate with others such as parents
16Why Assess?
- To certify competence (S. A.)
- To assess the progress of learning
- To aid learning (F. A.)
- To diagnose learning problems (D. A.)
- To assess effectiveness of faculty teaching
- To assess effectiveness of educational program
17???? ???
- ??? ?????? ????????? ???? ???? ?? ???????? ?????
???? ?? ???? ???? ?? ?? ?????? ???? ?? ????? ????
?? ?????? ????? ???? ???? ?????? ??? ????? ??. - ??? ?? ?????? ????????? ???? ????? ?? ??????? ???
????? ?? ???? ?????? ?? ?? ???? ?? ???? ???
??????? ??? ?????? ?? ??? ????? ???.
18???? ???
- ????????
- ???? ?? ??? ???? ???? ???? ???????? ????? ?????
?????? ??? ?? ????????? ??? ????????. - ??? ?? ?? ?????? ?? ??? ?? ?? ?? ????? ??? ?????
???? ???? ??????? ?????? ???? ????? ?????? ??
????????? ?? ???? ?? ? ??? ???? ??? ??????.
19What?
????
?????
????
20What to Assess?
21What to Assess?
- Knowledge
- Relevant knowledge: objectives according to need to know, based on common clinical practice
- Test knowledge application and problem solving (interpretation, analysis, synthesis), not just facts
22What to Assess?
- Skills
- Clinical: Hx, PE, procedural skills
- Communication skills
- Critical reasoning skills: data interpretation and decision-making
23What to Assess?
- Attitude and behaviors
- Honest, has integrity, not rigid
- Responsible, punctual, regular, completes tasks
- Team player or leader
- Empathetic, patient advocate
- Effective communication skills
- Uses best current evidence
24What to Assess?
- Competence
- Problem solving
25What can be assessed by different AM?
- Factual knowledge
- Interpretations
- Problem-solving skills
- Ethical
- Clinical skills
- Emotional reactions
- Communication
26Who Should Assess?
- Faculty
- Self
- Peers
- Tutors
- Other team members
- Standardized patients, patients
- External and internal examiners
- Public, society,
27Where?
- Does: Work Place Assessment
- Shows how: Test Center/Skill Lab
- Knows how: Examination Hall
- Knows: Examination Hall
28?? ???? ?? ???????? ????????WHAT? ????
???????? ????????HOW?
?? ????
????? ???? (knowledge)
???? ? ????? ?????? Practice))
?????? ?? ??????? ?? ?? ???? ? Rating
Scale ????? Attitude)) ??????
?? ??????? ?? ?? ???? ? Rating Scale
29How to use assessment?
- Summative: usually undertaken at the end of a training programme; it determines whether the educational objectives have been successfully achieved.
- With summative assessment the student usually receives a grade or a mark (exam).
- Formative: testing that is part of the developmental or ongoing teaching/learning process. It should include delivery of feedback to the student.
30Summative - Examination
- What are the exams?
- For students: a difficult and unpleasant steeplechase to run on the way to a diploma
- For teachers: a less desirable teaching activity
- For the public: an important protection from incompetent doctors
31Summative assessment
- The reasons
- A statement of achievement: university degree (diploma)
- An entrance requirement to an educational institution
- A guide as to the wisdom of continuing with further study
- A certification of competence: public responsibility (licence)
- A determinant of programme effectiveness
32Formative assessment
- The reasons
- Information for the student about his/her achievement of educational objectives
- Repetitive progress measurement
- Discover weak points (teacher's support)
- Help the teacher to correct the programme
- The results should not be used in summative assessment
33Formative assessment
- Feedback
- Feedback
- Feedback
- Feedback
- Feedback
341. ???? ?????? (Knowledge) Cognitive Domain
2. ???? ????? Attitude Domain 3. ???? ??????
Psychomotor Domain
????? ???????? ???????
35?????? ??????? ? ????????? ?????? ?? ???? ????
?????? ??????????? ? ????????? ? ????? ?
??????????? ? ????? (?? ?????)?????? ??????? ?
????????? ?????? ?? ???? ???? ????? ???????? ??
???? ? ???? ?? ?? ???? ????? ?? ??????????
??????????? ??????? ? ????????? ?????? ?? ????
????? ???? ???? ?????????? ? ????? ??? ??
?????????? ? ????? ??? ???? ???????? ??? ????
?????? (????)
???? ???????
36General instructional objectives
GIO
- ???? ??????? ? ??????????? ?? ?????? ??? ?? ????
???? ???? ?? ???? ? ?????? ????? ?? ????? ?????
?? ?? ??? ???? ???. - ???????
- ?????? ??????? ???? ?????? ???? ?? ??? ????
?????. - ???? ??? ???? ?? ????? ????????? ?????
- ?????? ?? ???? ? ???? ????????? ? ????? ??? ?? ???
37?????? ????
?? ????? ?????? ??? ???? ?????? ???? ??????? ?
??????????? ??? ?? ???????? ?? ????? ????? ?? ??
??? ????????. ??????? ????? ?????? ?????? ????
???????.
38???????? ???? ???? S.O.B Specific
Observable Behaviors
- ????? ??? ???? ?????? ???? ? ?????? ?? ?????? ??
????? ????? ??? ???? ? ???? ??? ?? ??? ???? ?????
?? ??? ???? ???. - ???????? ????? ??????
- ????? ????? ?????
??? ?????? - ????
- ?? ????? ?? ?? ???? ??????? ? ?? ? ???
?????? ???? ???. - ????? ????? ????? ??
????? ???
39????? ????? ???? ????
- ?? ?? ?????????? ???? ???? ?? ????? (?????)
- ???? ?? ???? ?????? ???? ???????? ???? ( ?????)
- ?? ??? ??? ?? 2/1 ???????? ?? ??80 ????? ?????
??? -
-
?????
???
40ABCD model
- A (Audience)
- B (Behavior)
- C (Condition)
- D (Degree)
- Performance Agreement
41 42Stages of test development
- Conceptualization
- Construction
- Tryout
- Item analysis
- Revision
43Conceptualization
44Conceptualization
- What will it measure?
- What is the objective?
- Is there a need?
- Who will use it?
- Etc
45Test Construction Principles
- Adequate provision should be made for evaluating all the teacher's objectives of the instruction.
- The test should reflect the approximate proportion of emphasis in the course.
46Preparing the test
- The preliminary draft of the test should be prepared as early as possible.
- As a rule the test should include more than one type of item.
47Preparing the test, continued
- The content of the test should range from very easy to very difficult for the group being measured.
- The items in the test should be arranged in order of difficulty.
- The items should be so phrased that the content rather than the form of the statement will determine the answer.
48Preparing the test, continued
- A regular sequence in the pattern of response should be avoided.
- The directions to the pupils should be as clear, complete and concise as possible.
- One question should not provide the answer to another question.
49Tryout
- Tried on similar population to that of interest
- Standardized conditions
- 5-10 people for each item on the test, but the more the better
- Good items
- Determined by item analysis
50Item Analysis
- Process of determining which items are good
- Tools in item analysis
- Item difficulty index
- Item reliability index
- Item validity index
- Item discrimination index
51Item Difficulty Index
- Underlying assumption: every item should be failed or passed based on the testtaker's level of knowledge about the material
- Proportion of the total number of testtakers who got the item correct
- Symbol: p_n (n is the item number)
- Can calculate the average item difficulty on a test
- Optimal average item difficulty is the midpoint between 1.00 and chance success
- For true/false: (.50 + 1.00)/2 = 1.5/2 = .75
- For four-option multiple choice: ???
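The midpoint rule on this slide can be checked with a few lines of Python. This is only a sketch: the function name `optimal_difficulty` is hypothetical, and it assumes chance success equals one divided by the number of answer options.

```python
def optimal_difficulty(n_options: int) -> float:
    """Midpoint between chance success (1 / n_options) and 1.00."""
    chance = 1.0 / n_options  # assumption: blind guessing among equally likely options
    return (chance + 1.0) / 2.0


if __name__ == "__main__":
    for label, k in [("true/false", 2), ("four-option multiple choice", 4)]:
        print(f"{label}: optimal average difficulty = {optimal_difficulty(k):.3f}")
```

Under that assumption the true/false case reproduces the .75 worked out on the slide, and the same function answers the four-option question.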
52Item Reliability Index
- Internal consistency of a test
- The higher this index, the greater the internal consistency
- Use factor analysis
- Want to maximize internal consistency, so choose items with a high index
53Item Validity Index
- Indication of the degree to which a test measures what it is supposed to measure
- The higher the item validity index, the higher the criterion-related validity
- Want to maximize criterion-related validity, so choose items with a high index
54Item Discrimination Index
- Indicates how well an item discriminates between high scorers and low scorers
- Want high scorers to answer correctly and low scorers to answer incorrectly; otherwise throw out the item
- Symbolized by d
- The higher the value of d, the greater the number of high scorers answering correctly
- Negative d: low scorers are more likely than high scorers to answer correctly
55Characteristics of assessment Tools
56Reliability
- What is it?
- The same test given to the same person at the same time yields the same test result
- Should differentiate between well and ill
- Importance
- If the test's result changes, the test is not reliable
- If the test is unreliable, one cannot say whether the student passes or fails
57Reliability
- If an assessment is repeated with the same
trainees, they should get the same results
58Validity
- What is it?
- The degree to which a measurement instrument truly measures what it is intended to measure
- Importance
- If the assessment does not test what it is meant to test, the test is useless
- Reliability is a prerequisite for validity but is not sufficient by itself
59Validity
- Validity is the degree to which the inferences
based on scores are correct
60Standardization
- What is it?
- All students are tested on the same test items, patients, tasks according to the same criteria
- Importance
- So that no one gets easier or more difficult questions (Fairness)
61Feasibility
62Objectivity
- What is it?
- The level of agreement among independent assessors (experts) about the right answer to a certain question
- Importance
- Decreases intra-rater and inter-rater bias
63?????? Validity????? ??? ?? ????? ??????????? ??
??????????? ????? ???? ??? ?????? ???????
Reliability????? ???? ?? ????? ??????????? ??
??????????? ?? ??????????? Objectivity????
????? ??? ???????? ????? ?????? ????? ???? ?? ??
??????? ??? ???? ?? ?? ?? ????? ?????
??????????????? ???? Practicability????? ???
??????? ?? ?? ????? ?? ???? ?????? ????? ? ??
???? ?????????
???????? ?? ?????
64????? ???? ????? ? ??????
Validity- Reliability
validity reliability
validity- Reliability-
65????? ???? ?? Validity ????? ???? ???? ? ??
????? ???? ???? ????? ???????? ??? ???????? ??
????? ??? ???? ????? ??? ????? ? ??????
?????? ????? ??? ???? ????? ???? ?????? ??? ?
????? ?????? (?? ???? ?? ???? )????? ???? ??
Reliability ????? ????? ?? ??????? ???? ????
????? ? ????? ?????? ???????? ????
???????? ?? ?????
66 ????? ???? ?? Objectivity ?? ???? ????
??? ?? ?? ????? ???? ?? ????? ????? ????. ?????
??????? ? ???????? ?????? ?? ?????? ???? ??????
??????? ????.????? ???? ?? Practicability ?????
? ?? ????? ????????? ????. ??????? ????? ?????
???? ????? ? ???? ???? ???? ?????? ?? ????
??????? vc
???????? ?? ?????
67Metric characteristics of AM
- Validity: the degree to which a measurement instrument truly measures what it is intended to measure
- Reliability: an expression of precision, consistency and reproducibility. Ideally, measurements should be the same when repeated by the same student or made by different assessors.
- Relevance: the degree to which the assessment questions and educational objectives are in concordance
- Objectivity: the level of agreement among independent assessors (experts) about the right answer to a certain question
68Components of Good Test
- Validity
- Reliability
- Objectivity
- Discrimination
- Comprehensiveness
- Score-ability
69???? ?????? ?????(Table of specifications)
- ?? ???? ?????? ???
- 1- ??? ???? ?????? ?????? ???? ???
- 2- ??? ????? ???? ???? ??????
- (???? ? ????? ? ??????? ????? ? ??????..)
70????? ? ????? ?????? ??? ???? ? ???? ?????? ??????
0???? 0???? 1???? 2???? ??????? ???
1???? 1???? 1???? 2???? ???
0???? 1???? 1???? 1???? ??????? ?? ????????
71???? ?????? ?????
????? ?? ??????? 3. 2. 1. ??? ????? ??? ??? ??? ????? ??? ???
1. 2. 3. ????
1. 2. ??????
?????
?????
????????
????? ?? ??????? ????? ?? ???????
???? ??????? ???? ???????
72 ????? ???????? ?? ??? ????? ??
????? ??????? ??????? ????? ???? ??
?????(???) ????? ?? ??????? ?????
?? ???? (???? ????) ???? ??????? ?? ???
100 ???? ??????? ????? ?? ?????
????? ??????? ???????????? ??????? ????? ?????? ?? ???? ???? ?? 2???? ???? (36)
6 11 4 2 8 1. 2. 3.
50 100 36 ???
- ?? ?? ?????? ??? ?? 11100 11/0 4 ????
??????? ????? ??? ???? - ?? ????? 50 ????? ?? ??? ???? ???? ???? ???? ???
????? ??????? ????? ?? ??? ?? ?????? . - 11 100
- 50
36
6
73Thank you for your Time
- Any Questions or Comments?
74 1. ???? (Written) ???? MCQ ???
???? Essay 2. ????? (Oral)3. ????(Practical)
MSF OSCE DOPS Log Book Portfolio
MiniCEX
????? ???????
75What are assessment tools?
????? ?????? Assignments
76Student Assessment
- Direct Methods
- Real: group dynamics assessments, ward observations, lab observations, ward evaluation.
- Simulated: OSCE, OSPE, GOSPE, ICE, SCOPE.
- Indirect methods
- Written tests: MCQs, SEQs, MEQs, PMPs, long essay questions, questionnaires.
- Oral tests: unstructured and structured oral exams
- Practical tests: PETs, portfolios
77????? ???????? ??????
- ?????? ???? Extended response
- ??? ????? ? ????????
- ????? ???? Restricted response
- ???? ??????? ??????? ? ?????
78????? ???????? ????? ????
- ???? ???? ????? ???? ?????? (?????? ?? ????? ??
??? ????) - ?????
- ???? ?????
- ?????? (?????)
79????? ???????? ???? (objective)
- ????- ??? True- False
- ??? ????? matching ?
- ??? ???????? Multiple- choice
80- Action
- Professionalism Eval Form
- End-of-Rotation Eval
- 360 Evals
- Mini-CEX
- Critical Incident Reports
- Record Reviews
- Decision Making
- OSCE
- SP Exam
- Computer Simulated Patient
- Reasoning
- Oral Exam
- Essay
- MCQ
- Awareness
- Oral Exam
- Essay
- MCQ
ASSESSMENT TOOLS
Action, Decision Making, Reasoning, Awareness
DOES
Miller's Pyramid (Miller, 1990)
81How to assess Knowledge, Skills, Attitudes
Written Exams Clinical Exams Viva
Knowledge
Psychomotor skills - -
Attitude -
82True and False Items
- Make approximately half of the items true and half false.
- Do not lift statements directly from books.
- Use direct statements.
- Avoid words with general meanings such as large, great, many and few.
83True and False Items, Continued
- Whenever you use words such as no, never, always, may, should, all and only, be sure that they do not make the correct answers obvious.
- The question is usually false when all, always, none, never and other all-inclusive terms are used.
- The question is usually true when usually or sometimes is used.
84True and False Items, Continued
- Do not make the true statements consistently longer than the false statements.
- Avoid negative statements.
85Matching
- The number of possible responses should exceed the number of questions.
- Have 5-7 items to be matched.
- Directions should tell if a response can be used more than once.
86Recall Tests(Completion, Listing)
- Use direct questions whenever possible.
- Make sentence-completion items as specific as possible.
- In simple recall items place the blanks near or at the end of the statement.
- Construct the item so there is only one correct response.
87Recall Tests, Continued
- Design enumeration items to call for specific facts.
- In fill-in-the-blank items, have all the blanks the same length.
- Do not leave too many blanks in the statements.
88Essay Tests
- Before writing the question, know exactly what mental process of the student you want to bring out.
- Start essay questions with
- compare,
- contrast,
- give the reasons for,
- present the arguments for and against,
- give original examples of,
- explain how or why.
89Essay Tests, Continued
- Use clear, precise questions.
- Do not have too many questions for the time available.
- Make a list of all pertinent points that should be covered in the student's answer for each question. Use these when grading.
90????? ?? ????? ???????? ????
- ?????? ?? ?? ????? ??? ???? ????
- 1- ????- ???
- 2- ????????
- 3- ???????????
- 4- ????? ????
- 5- ??????
- ?????? ?? ???? ?? ????? ???? ???.
- ?????? ?? ?? ????? ?????? ???? ????? ?? ????? ??
???? ????.
91MCQ
??? ????
????? ?? ????
???? ??????? Distractor
???? ???? Key
92????? ???????? ??? ????????
- ???? ????? ????
- ?????? ????? ????
- ????
93?????? Millman ?? ???? MCQ
9421 ????? Millman ?? ???? MCQ
- 1- ???? ???? ????? ???? ? ?????? ?? ?? ??????.
- 2- ?? Item ???? ?? ?? ????? ????? ???? ( ??? ???
???? ?????) - 3- ?? ??? ??????? ???? ?? ???? ???????????
??????? ???. ?? ???? ????? ??? ??? ??? ???? ????
?? ????? ??? ?? ?? ???? ???? ????? ???.
9521 ????? Millman ?? ???? MCQ
- 4- ???? ????? ???? ????? ????? ??? ?? ???? ???
????? ?? ???? ????? ???????? ???? ????? ?????
???? ????. ????? ?? ??? ???? ??????????? ????? ??
?????? ????. - 5- ?????? ???? ???? ?????? ??? ?? ?? ?????
??????? ? ????? ??????? ???. (?? ??????? ??? ??
?? ???? ?????? ???? ????? ????) - 6- ?? ???? ???????? ?? ??? ???? ?????? ???????.
???? ??? ??? ?? ???? ????? ??????????? ?????
?????? ???? ?????? ???.
9621 ????? Millman ?? ???? MCQ
- 7- ?????????? ????? ???????? ???? ????? ????.
- 8- ?? ?? ????? ?? ???? ??? ?? ???? ???? ?????
???? ???. - 9- ??????????? ?? ????? ????? ?? ???????? ???????
??? ??? ????? ????? ???? ????? ????.
9721 ????? Millman ?? ???? MCQ
- 10- ??????? ??????? ???? ????? ? ???? ???? ????
(?? ????? ?? ???? ????? ??? ? ??? ????? ??
??????????? ?????). - 11- ???? ???????? ?? ??? ????? ???? ? ???? ?????
???? ????? ?? ???? ????? ???? ???? ??? ???? ?????
??? ??? ???????? ??? ??? ??? ?????. - 12- ????? ?? ??? ??? ????? ?????? ??? ? ???????
????? ?????.
9821 ????? Millman ?? ???? MCQ
- 13- ???? ? ???????? ???? ?? ??? ????? ???????
????? ?????? ? ??? ??????? ? ???? ????. - 14- ?? ????? ???? ???? ?? ?????? ??????? ???????
??????? ???. - (?????? ???? ?? ? ? ? ???? ???? ????? ?? ??????
?? ???? ? ?????)
9921 ????? Millman ?? ???? MCQ
- 15- ????? ?? ????? ????? 4 ????? ????? ?????.
- 16- ?? ???????? ??????? ?? ????? ????? ??? ????
? ????? ????? ???? ??????? ???. - 17- ?? ???????? ??? ????? ???? ??????? ???.
- 18- ?? ???? ???? ???? ???????? ?? ???? ?? ?????
???? ???? ??????? ???
10021 ????? Millman ?? ???? MCQ
- 19- ???????? ????? ???? ?????? ?? ?? ????? ?? ??
????? ?????. - 20- ?? ??????? ????? ? ??? ??? ?????? ????
??????? ???. - 21- ?? ???? ???? ?? ??? ? ??? ?? ?????? ??
?????? ????? ?????? ?? ????? ???? ? ??? ?? ?? ???
????? ? ?????? ????? ?? ?? ?????? ????.
101????? ?? ??? 21 ?????
- 22- ??????? ?? ????? ??? ???? ??????? ????? ????
???. - 23- ??????? ?? ?????? ???? ?? ???? ??????? ???.
- 24- ????? ???? ????? ?????? ??? TEXT ??????? ???.
102Thank you for your Time
- Any Questions or Comments?
103Bloom's Taxonomy
Cognitive: Knowledge-Recall, Comprehension, Application, Analysis, Synthesis, Evaluation
Affective: Receive/Attend, Respond, Valuing, Synthesizing, Characterized by internal values
Psychomotor: Perception of sense, Preparatory Adjustment, Guided Response, Complex overt Response, Adaptation, Origination
104??? ????????
???????? ??? ?????
????? ????? ??? ????? ? ?????
????? ? ????? ????? ? ????? ????? ? ????? ??? ??????
?????? ?????? ?????? ?????? ??? ???
??? ??? ??? ??? ??? ??? ????
???? ???? ???? ???? ???? ????
?????? ?????? ????? ???? ??????
105 ???? ???????? ?????? Recognition Recall
??? ???? ??????. ????
??????. ????? ??????. ???????? ????
??????. ???? ??????. ?????
??????. ????? ??????. ????? ?? ???
??? ???? ?????? (Knowledge)
Tax-1
106??? ????? ???
????? ??? ??? ?????
????? ????? ??? ???? ?????
???? ????? ???? ????? ???? ????? ??? ?????
????? ????? ????? ????? ??? ??????
?????? ?????? ?????? ?????? ??????
?????? ?????? ????? ???? ?????
107 ????? ??????? ?????? ?????? ???????
??????? ???????????????? ??????? ? ??????
?????????? ???? ?? ?????? ????.???? ??????-
??? ??????- ????? ???????. ?????? ??????- ??????
??????. ????? ??????- ?????? ??????. ??????
??????- ????? ??????. ???? ??????- ?????
??????.????? ?? ??? ??? ???? ??????
Interpretation for Application
Tax-2
108???? ???
??????? ?????
????? ??? ???? ???
?????? ? ?????
?????? ?????? ????? ???? ????
109 ??????? ????? ?????? ????????
???? Problem Solving???? ???????
?????? ????? ?????? ?????? ?????? ?????-
????????? ???????- ????? ???????????- ?????
??? ???????- ??? ???? ????? ??
??? ??? ???? ?????? Evaluation
Tax-3
110Thank you for your Time
- Any Questions or Comments?
111 ???? ?????? ???? ??? ?? ????? ??????????
????? ???? ?? ????? ????? ?????? ???? ???
?? ???? ???????? ????? ?????? ????? ???????
???????? ???? ????? ???? ??????
????? ???????
?????? ?? ???? ?????
M.P.L. Minimum Pass Level
112Item Analysis
- Main purpose of item analysis is to improve the test
- Analyze items to identify
- Potential mistakes in scoring
- Ambiguous/tricky items
- Alternatives that do not work well
- Problems with time limits
113Criterion-Referenced and Norm-Referenced TESTS
????? ???????
???????? ??????(?????) ???????? ?????? (??????)
114TYPES OF TESTS BY PURPOSE
1. Norm-referenced Tests
a. Discrimination most important aspect
b. Easy items eliminated
2. Criterion-referenced Tests
a. Discrimination not of critical importance
b. Items not altered or eliminated due to difficulty
115Criterion- Referenced
??? ?? ??????? ????? ???????? ???? ??? ??????? ??
??? ????? ???? ? ??????????? ??? ????? ?????? ?
???? ?????? ?? ??? ?????? ?????? ?? ????? ??
?????? ????? ?? ?? ???????? ????? ??? ?????
???????. ??? ??? ????? ???? ???????? ????? ? ???
????? ????????? ?????? ????. ???? ????? ?????
??????? ?????? ????? ???????? ?????
116Norm- Referenced
????? ???? ???? ?? ???? ????????? ?? ?? ??????
???????. ?????? ????? ????? ??????? ? ?? ?? ????
?? ????? ??? ??? ???? ????????? ?????
??????. ??? ??? ????? ???? ???????? ????? ?
?????? ?????? ????. ???? ????? ????? ?????????
117????? ?????? ??????? ?? ???????? ??????Norm
Reference
118ITEM ANALYSIS
- an Assessment tool
- has 3 parts
- 1. Item Difficulty
- 2. Item Discrimination
- 3. Distractor Analysis
119 1. ????? ???? ?? ?? ?? ????????? 2. ???? ????
????????? ?????? ??????? 3. ????? ??????? ???? ?
????? 4. ?????? ???? ? ???? ?????? ???? ??
????? 5. ?????? ???? ? ???? ????? ???? ??
????? 6. ??????? ??????? ???????
????? ????? ? ????? ???????
120???? ????? ?????
????? ????? ???? ???????? ????? ????? ????? 2/11/73 ????? ????? ???? ??????? ???? ?? ?? ????? ??? ???? ???? ??????? ?????? ???? ???- 55/0 ?- 61/0 ?- 49/0 ?- 23/0
???? ???? ? ? ? ??? ??????
10 10 2 0 0 0 3 3 5 2 0 5 25 ???? 25 ????? ???? ?????? 35 ???? ????3/0
121Tests of individual differences
- Two groups of individuals
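- Upper group: the 27% of highest scorers
- Lower group: the 27% of lowest scorers
- U = number of upper-group individuals who got the item right
- L = number of lower-group individuals who got the item right
- Item difficulty index: $p = \frac{U + L}{2n}$, where n is the number of individuals in each group
- Item discrimination index: $d = \frac{U - L}{n}$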
122Example cont.
- 60 students who took the test.
- Item 14 Among 16 upper scorers, 5 have the item
right. Among 16 lower scorers, only 1 has the
item right.
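The Item 14 example can be worked through in a short Python sketch. It assumes the usual upper/lower-group formulas, p = (U + L) / (2n) and d = (U - L) / n, with n the size of each 27% group; these formulas and the function name `item_indices` are assumptions for illustration, not taken verbatim from the slide.

```python
def item_indices(upper_correct: int, lower_correct: int, group_size: int):
    """Return (difficulty p, discrimination d) for one item.

    Assumes equally sized upper and lower groups of `group_size` examinees.
    """
    p = (upper_correct + lower_correct) / (2 * group_size)
    d = (upper_correct - lower_correct) / group_size
    return p, d


n_students = 60
group_size = round(0.27 * n_students)  # 27% of 60 is 16.2, rounded to 16 as on the slide
p, d = item_indices(upper_correct=5, lower_correct=1, group_size=group_size)

if __name__ == "__main__":
    print(f"group size = {group_size}, p = {p:.2f}, d = {d:.2f}")
```

Under these assumptions Item 14 comes out as a hard item (few examinees answer it correctly) with modest discrimination.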
123Guidelines for p
- Consider the purpose of the test
- p should be low for selection tests that will select a small % of examinees (e.g., scholarships)
- p should be high if the test is assessing need for remedial education
- p should be around .5 if testing a broad range of abilities
- In a MC test, p should depend on the number of options
124Guidelines for d
- D is ideally 1, but it never really is 1 in practice
- D > .30 is usually assumed acceptable
- D and p are interdependent, so if p is extreme, D may be lower than .30
125Item Validity
- Validity of an item w.r.t. an external criterion y
- Point biserial correlation: $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}}{S_Y}\sqrt{\frac{N_1}{N - N_1}}$
- $\bar{Y}_1$: mean criterion score for those who got the item right
- $\bar{Y}$: mean criterion score for everyone
- $S_Y$: SD of criterion score
- $N_1$: number of examinees who got the item right
- $N$: total number of examinees
126Example
- A new translation of S-B IQ test administered to 60 Turkish students
- Turkish version of WISC-R is also administered to check validity
- Item 12: 18 got it right. Mean WISC-R for those who got it right is 106. Mean WISC-R for all is 97. SD of WISC-R is 14.
- What is the validity of item 12?
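A possible way to answer the Item 12 question is sketched below in Python. It assumes the point-biserial form built from the quantities listed on the previous slide, r = ((mean of those correct - mean of all) / SD) * sqrt(N1 / (N - N1)); the function name `item_validity` is hypothetical.

```python
import math


def item_validity(mean_correct: float, mean_all: float, sd_all: float,
                  n_correct: int, n_total: int) -> float:
    """Point-biserial correlation between an item (right/wrong) and a criterion score."""
    standardized_gap = (mean_correct - mean_all) / sd_all
    return standardized_gap * math.sqrt(n_correct / (n_total - n_correct))


# Values from the example: 60 students, 18 correct, means 106 and 97, SD 14.
r_item12 = item_validity(mean_correct=106, mean_all=97, sd_all=14,
                         n_correct=18, n_total=60)

if __name__ == "__main__":
    print(f"item 12 validity (point biserial) = {r_item12:.2f}")
```

With these inputs the result is roughly 0.42, which suggests the item relates positively to the WISC-R criterion.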
127ITEM ANALYSIS
- Difficulty Index
- Level of difficulty of an exam or a question
- 0 = Difficult, 1 = Easy
- Discrimination Index
- AKA Discriminant Index
- Ability of question to discriminate between
- Students who know the information
- Students who DO NOT know the information
128ITEM ANALYSIS
- Difficulty Index (D): 0 - 1
- Computed from the Top 1/3 Scores and the Bottom 1/3 Scores
- $D = \frac{N_{correct,top} + N_{correct,bottom}}{N_{top} + N_{bottom}}$
129ITEM ANALYSIS
- Difficulty (D): 0 - 1
- 0 = Hard, 0.5 = Moderate, 1.0 = Easy
130ITEM ANALYSIS
- Example
- 30 students in class
- 5 of Top 10 scorers got the item correct
- 3 of Bottom 10 scorers got the item correct
- D = (5 correct + 3 correct) / (10 + 10) = 8/20 = .4 (Moderate Difficulty)
131Item Difficulty
- Defined as the proportion of people who get the item correct
- Symbolized by p
- p = (# who were correct) / (# who responded)
- Difficulty should be greater than the percent who could get the item correct by chance
132?????? ???? ?????? ?????
- ????????? ???? ???? ????? ????????? ????
???? ???? - ????? ????? ???? ???? ????? ????? ????
????? - ???? ?????? ???? ?????? ?????
- 25
- 1010
- 7
- 20
- 35
- ?? ?????? ???? ?????? ?? ????? ?????? (?? 100
???????) ???? ?? ????? ?????? ???
100?
???? ?????? ????? P
100?
???? ?????? ????? P
100?
133????? ???? ?????? ?????
- ???? ??????? ????? ???? ?? ?? ????? ?????? ??
????? ?????? ???? ?? ????? ????? ??? - (P-1) ? P ??????? ?????
- 0(0-1) ?0
- 0(1-1) ?1
- (P-1) ? P ??????? ?????
- (5/0-1) ? 5/0
- 25/05/0 ?5/0
- ?? ?????? ?? ???? ?????? ???? ??????? ?? ???
????? ?????? ????????? ???? ????? ?? ???? ??????
???? ?? 1 ???? ? ?? ??? ????? ? ?? 5/0 ?????
????. - ???? ?????? ?????
- 0.3-0.7
134ITEM ANALYSIS
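- Discrimination Index (P): 0 - 1
- (AKA Discriminant Index)
- Computed from the Top 1/3 Scores and the Bottom 1/3 Scores
- $P = \frac{N_{correct,top} - N_{correct,bottom}}{\frac{1}{2} N}$, where N is the number of students in the two groups combined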
135ITEM ANALYSIS
- Discrimination Index
- 0 = No discrimination, 0.5 = Moderate, 1.0 = Excellent
- (-) Something is wrong
136ITEM ANALYSIS
- Example
- 30 students in class
- 10 of Top 10 scorers got the item correct
- 2 of Bottom 10 scorers got the item correct
- D = (10 correct - 2 correct) / ((10 + 10)/2) = 8/10 = .8 (Good Discrimination)
137?????? ???? ???? ?????
- ???? ????? ?? ?? ????? ????? ?? ????? ??? ????
??? ? ???? ????? ??????? ???? ?????? - ????????? ???? ???? ?????- ????????? ???? ????
???? - ????? ????? ?? ???? (???? ?? ?????)
- ???? ???? ???? ?
- 2-5
- 10
- 3
- 10
- 3/0
???? ???? ????? d
???? ???? ????? d
138????? ???? ???? ?????
- ?? ??? ???? ???? ?????? ????? ??? ???? ?? ?????
????? ? ?? ?????? ???? ?????? ???? ??? ???? ??
???? ???. - ?? ????? ???????? ??? ?? ????? ?????? ????? ??
????? ???? ?????? ????? ? ???? ???? ?????? ???.
139Things to Remember about D Index
D Index: Interpretation
Maximum D (100): all students in upper group got item right and none in lower group got it right
Zero D (0): equal numbers in both groups got item right
Negative D (-75): more students in lower group than upper group got item right
Zero or Negative D: discard or vastly improve item before using again on a test
140D Index Rule of Thumb for Classroom Tests
D Index: Interpretation
> 40: excellent discrimination
25 to 39: acceptable discrimination
< 25: poor discrimination
141Summary of Standards of Acceptance
Item Difficulty (P): 30 - 90
Item Discrimination (by D): 25 and above
142Difficulty Index
- Below 0.3: too difficult
- 0.3 to 0.7: acceptable
- 0.5 to 0.6: recommended
- Above 0.7: too easy
143Ideal Difficulty by Format
- Five-response multiple-choice: 70
- Four-response multiple-choice: 74
- Three-response multiple-choice: 77
- True-false (two-response multiple-choice): 85
144Discrimination Index
- Below 0.15: throw out
- 0.15 to 0.25: to check
- 0.25 to 0.35: good
- Above 0.35: excellent
145Be aware
- Very easy or very difficult test items have little discrimination
- Items of moderate difficulty (60% to 80% answering correctly) generally are more discriminating.
146Point-biserial correlation
- Used to correlate a dichotomous variable with a continuous variable
- In testing, used to correlate a person's performance on an item (correct, incorrect) with their total test score
- Used as an index of item discrimination
- The point biserial ranges from -1.00 to +1.00
- The higher, the better. As a general rule, > 0.20 is desirable
147Point-biserial formula
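- $r_{pb} = \frac{\bar{X}_{correct} - \bar{X}_{incorrect}}{SD_{test}} \sqrt{IF\,(1 - IF)}$
- $IF$: IF for item (item facility, the proportion who got the item correct)
- $1 - IF$: 1 minus IF for item
- $\bar{X}_{correct}$: mean on the test for people who got item correct
- $\bar{X}_{incorrect}$: mean on the test for people who got item incorrect
- $SD_{test}$: standard deviation for test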
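The quantities named on this slide combine into the point-biserial coefficient. The Python sketch below assumes the form r = ((mean correct - mean incorrect) / SD) * sqrt(IF * (1 - IF)) with the population standard deviation, and checks it against an ordinary Pearson correlation. The response data and function names are hypothetical, made up only for illustration.

```python
import math
from statistics import mean, pstdev


def point_biserial(item, total):
    """Point-biserial from group means, population SD and item facility (IF)."""
    correct = [t for i, t in zip(item, total) if i == 1]
    incorrect = [t for i, t in zip(item, total) if i == 0]
    item_facility = len(correct) / len(item)
    gap = (mean(correct) - mean(incorrect)) / pstdev(total)
    return gap * math.sqrt(item_facility * (1 - item_facility))


def pearson(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical data: 1 = item answered correctly, paired with each total test score.
item_scores = [1, 1, 1, 0, 1, 0, 0, 1]
total_scores = [9, 8, 7, 6, 6, 4, 3, 7]

if __name__ == "__main__":
    print(f"point biserial = {point_biserial(item_scores, total_scores):.3f}")
    print(f"pearson check  = {pearson(item_scores, total_scores):.3f}")
```

The two functions should agree, because the point-biserial is the Pearson correlation applied to a 0/1 variable.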
148What is the reliability of the exam
- Kuder- Richardson 20
- Kuder-Richardson 21
- Cronbach alpha
149What is the reliability of the exam
- Range: 0 - 1
- Higher value indicates a stronger relationship between test items and test
- Lower value indicates a weaker relationship between test items and test
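As an illustration of one of the coefficients named above, the following Python sketch computes Kuder-Richardson 20 for right/wrong items, assuming the standard form KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores) with the population variance. The response matrix is hypothetical, not workshop data.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_items = len(responses[0])
    n_people = len(responses)
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / pvariance(totals))


# Hypothetical 6 examinees x 5 items (1 = correct, 0 = incorrect).
hypothetical_responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(f"KR-20 = {kr20(hypothetical_responses):.3f}")
```

For this made-up matrix the coefficient is about 0.83, toward the upper end of the 0 to 1 range described on the slide.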
150Guided Practice
Student Raw score Item 1 Item 2 Item 3 Item 4 Item 5
A 8 a b a d e
B 6 c b e c e
C 6 a c e c b
D 4 a b e a c
E 2 c a b d c
F 8 a b c c e
G 10 a b a c e
H 6 a b c d e
I 8 a c a c e
J 4 a c a d b
151Difficulty Factor
- Item 1: .8
- Item 2: .6
- Item 3: .4
- What does it mean?
- Item 1: .8, may be too easy
- Item 2: .6, good
- Item 3: .4, good
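The difficulty factors above can be reproduced from the Guided Practice table with a short Python sketch. The answer key for items 1 to 3 (a, b, a) is an inference: it is taken from student G, who has the top raw score, and it happens to reproduce the .8, .6 and .4 on this slide. The slides do not state the key, so treat it as an assumption.

```python
# Answers to items 1-5 for each student, copied from the Guided Practice table.
responses = {
    "A": "abade", "B": "cbece", "C": "acecb", "D": "abeac", "E": "cabdc",
    "F": "abcce", "G": "abace", "H": "abcde", "I": "acace", "J": "acadb",
}

# Assumed key for items 1-3 only (inferred from student G, not given in the slides).
assumed_key = "aba"


def difficulty_factors(responses, key):
    """Proportion of students answering each keyed item correctly."""
    n = len(responses)
    return [
        sum(answers[i] == correct for answers in responses.values()) / n
        for i, correct in enumerate(key)
    ]


factors = difficulty_factors(responses, assumed_key)

if __name__ == "__main__":
    for number, value in enumerate(factors, start=1):
        print(f"Item {number}: {value:.1f}")
```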
152What does it mean?
- Kuder 20
- Item 1: .88
- Item 2: .63
- Item 3: .40
- Item 4: .76
- Item 5: .89
- Item 3 may not relate as well
- Overall the test is reliable
153More Practice
Item Difficulty Discrimination Reliability
1 .28 .40 .80
2 .30 .68 .76
3 .80 .78 .70
4 .10 -1.00 .20
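One way to approach the practice table is to screen each item against the acceptance standards quoted earlier in the workshop (item difficulty 30 to 90, discrimination 25 and above). The Python sketch below assumes those figures are percentages and converts them to proportions; the thresholds-as-proportions, the dictionary layout and the function name `review` are assumptions for illustration.

```python
# Rows of the More Practice table: item number, difficulty, discrimination, reliability.
items = [
    {"item": 1, "difficulty": 0.28, "discrimination": 0.40, "reliability": 0.80},
    {"item": 2, "difficulty": 0.30, "discrimination": 0.68, "reliability": 0.76},
    {"item": 3, "difficulty": 0.80, "discrimination": 0.78, "reliability": 0.70},
    {"item": 4, "difficulty": 0.10, "discrimination": -1.00, "reliability": 0.20},
]

# Assumed thresholds, expressed as proportions instead of percentages.
MIN_DIFFICULTY, MAX_DIFFICULTY, MIN_DISCRIMINATION = 0.30, 0.90, 0.25


def review(item):
    """Return a list of reasons why an item falls outside the assumed standards."""
    problems = []
    if not MIN_DIFFICULTY <= item["difficulty"] <= MAX_DIFFICULTY:
        problems.append("difficulty outside 0.30-0.90")
    if item["discrimination"] < MIN_DISCRIMINATION:
        problems.append("discrimination below 0.25")
    return problems


flagged = {it["item"]: review(it) for it in items if review(it)}

if __name__ == "__main__":
    for number, problems in flagged.items():
        print(f"Item {number}: {', '.join(problems)}")
```

Under these assumed cut-offs, item 4 stands out: very few examinees answer it correctly and its negative discrimination means low scorers do better on it than high scorers.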
154 P = (RU + RL) / T × 100
Difficulty Index P. RU ????? ????????? ???? ??
???? ???? ???????? RL ????? ????????? ????
????? (??? ????) ?? ???? ???? ????????.T ?????
?? ????????? ???? ???? (????) ? ???? ????? (???
????) ?? ?? ????? ???? ????????.?? ????
???? ?????? ???? ???? ????? ??????? ???.????
????? ???????????? ???? ?????? 50 ???????.????
????? ????? ?? ????? ??? ?????? ???? ?????? ????
??????.
?????? ???? ??????
155 ????? ????????? ???? ? ??????? ?? ?? ??????
???? ??????.
$DI = \frac{R_U - R_L}{\frac{1}{2} T}$
????? ????? ?? 50 ???? ????
????????? ??????? ??? ?? ????????? ???? ?? ?????
???? ???? ???????????? ??? ????? ?????? ?? ??
?? ???? ???? ???? ????????.
?????? ???? ?????
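The upper-group and lower-group formulas on the last two slides can be sketched as follows, assuming P = 100 x (RU + RL) / T for the difficulty index and DI = (RU - RL) / (T / 2) for the discrimination index. The function names and the counts (8 correct in the upper group, 4 in the lower group, 20 examinees in total) are hypothetical values for illustration.

```python
def difficulty_index(ru: int, rl: int, t: int) -> float:
    """Percentage of the upper and lower groups combined who answered correctly."""
    return 100 * (ru + rl) / t


def discrimination_index(ru: int, rl: int, t: int) -> float:
    """Difference in correct answers between groups, divided by half the total."""
    return (ru - rl) / (t / 2)


# Hypothetical counts: 10 examinees in each group, 20 in total.
ru, rl, t = 8, 4, 20
p = difficulty_index(ru, rl, t)
di = discrimination_index(ru, rl, t)
print(f"P = {p:.0f}, DI = {di:.2f}")
```

With these counts the item has a difficulty of 60 and a discrimination index of 0.40.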
156?????? ?????????? ????? ???? ???????
???? ?????? ????????? ???? ??????
100 ????? ?? 70 60 50
30 ????? ??? ???? ???? ???? ?????
(????- ?????) ???? ????? 2/1 ????? ??
350/ 0/25 150/ 0/0 ???? ??? ?????
??? ??? ???? ??? ???? (??
?????? ????)
157????? ?????? ??????? ?? ????????
??????Criterion Reference
158Criterion-referenced tests
- Two groups of individuals
- U = Upper group (above criterion)
- L = Lower group
Upper group individuals who got the item right
Lower group individuals who got the item right
item difficulty index
item discrimination index
159Example
- A test of mastery of Istanbul geography. Outcome is that 60 individuals are masters and 20 failed the test.
- Item 3: 45 masters and 10 who failed got the item right.
- What are the item difficulty and item discrimination indices?
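One way to work this example is sketched below. The formulas themselves did not survive on the preceding slide, so the code assumes a commonly used pair for criterion-referenced items: difficulty = (upper right + lower right) / (U + L), and discrimination = (upper right / U) - (lower right / L). The variable names are illustrative.

```python
# Group sizes and Item 3 results from the example.
upper_n, lower_n = 60, 20          # masters, non-masters
upper_right, lower_right = 45, 10  # got Item 3 right in each group

# ASSUMED formulas (not recoverable from the slide text).
item_difficulty = (upper_right + lower_right) / (upper_n + lower_n)
item_discrimination = upper_right / upper_n - lower_right / lower_n

print(f"Item difficulty index:     {item_difficulty:.4f}")
print(f"Item discrimination index: {item_discrimination:.2f}")
```

Under these assumptions, about 69% of all examinees answered Item 3 correctly, and masters outperformed non-masters on it by 25 percentage points.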
160 ??? ????? ??????? ????? ?? ???? ???? ??? ?? ??
?? ???? - ?? ??? ??? ?????? ????? ???? ??? ?????
?? ???? ????.- ???? ?????? ?? ??? ?????? ????
?????? ????- ??????? ????? ???? ? ?? ????? ????
??????? ???? ?? ????? ?? ??? ??? ????? (???
?????? ???? ????? ????)- ???? ????? ??????? ??
??? ??????? ?? Pretest, Post test ? ?????? ?????
???? ??????? ??????.
????? ?????? ??????? ?? ???????? ??????
Criterion Reference
1615 5 4 4 3 3 2 2 1 1 ????? ????
??? ? ??? ? ??? ? ??? ? ??? ? ??? Post test ?Pre test ??? ?????
- - - - - ?. ?
- - - - ?. ?
- - - - - ?. ?
- - - - - ?. ?
- - - - - ?. ?
- - - - - - ?. ?
S = Sensitivity to Instructional Effects
$S = \frac{R_a - R_b}{T}$
- $R_a$ = number of examinees who answered the item correctly after instruction
- $R_b$ = number of examinees who answered the item correctly before instruction
- $T$ = total number of examinees who answered the item both times
162???? S ???? ?????? ????? ? ???????? ?????? ?????
?? ???.???????? ?? ?? ???? S ??? ? ?? ???? ??
???? ???? ???? ?? ???? ????? ????? ?????? ???.
163????? ???????? ?????? ? ???????
- ???? ??????? ????
2/4 - ???? ??????
- ????? ???? ????? ????
1-6 - ????? ??? ????? ??????? ??????? ???? ? ?????
???? ???? 8/2-3/5 - ???? ????
- ????? ???? ?????
???? 1-6 -
164????? ????? ??? ???????
- ?? ????? ??????? ???? ????? ?? ??? ?? ???? ????
?? ?? ??? ??? ???. - ????? ??????? ???? ????? ???? ?? ??? ?? ????? ???
?? ??? ??? ???.
165Thank you for your Time
- Any Questions or Comments?
166Two issues in using instruments...
1. Validity: the degree to which the instrument measures what it purports to measure
2. Reliability: the degree to which the instrument consistently measures what it purports to measure
167Types of reliability...
1. Stability
2. Equivalence
3. Internal consistency
1681. Stability (test-retest): the degree to which two scores on the same instrument are consistent over time
1692. Equivalence (equivalent forms): the degree to which identical instruments (except for the actual items included) yield identical scores
1703. Internal consistency (split-half reliability with Spearman-Brown correction formula, Kuder-Richardson and Cronbach's alpha reliabilities, scorer/rater reliability): the degree to which one instrument yields consistent results
171RELIABILITY
- TEST-RETEST
- (COEFFICIENT OF STABILITY)
- PARALLEL FORM
- (COEFFICIENT OF EQUIVALENCE)
- INTERNAL CONSISTENCY
172INTERNAL CONSISTENCY
- SPLIT-HALF METHOD
- SPEARMAN-BROWN PROPHECY FORMULA
- KUDER-RICHARDSON METHOD
- COEFFICIENT ALPHA
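The slides name the Spearman-Brown prophecy formula without writing it out. As a minimal sketch, the standard correction for a split-half correlation is r_full = 2 x r_half / (1 + r_half), which estimates the reliability of the full-length test from the correlation between its two halves. The function name and the value 0.60 are hypothetical.

```python
def spearman_brown(r_half: float) -> float:
    """Step a split-half correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


# Hypothetical correlation between the odd-item and even-item halves of a test.
r_half = 0.60
r_full = spearman_brown(r_half)
print(f"Full-test reliability estimate: {r_full:.2f}")
```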
173KR20
- $KR20 = \frac{K}{K-1} \times \frac{S_x^2 - \sum pq}{S_x^2}$
- $K$ = number of trials or items
- $S_x^2$ = variance of scores
- $p$ = proportion answering the item right
- $q$ = proportion answering the item wrong
- $\sum pq$ = sum of the $pq$ products for all $K$ items
174KR20 Example
Item   p     q     pq
1      .50   .50   .25
2      .25   .75   .1875
3      .80   .20   .16
4      .90   .10   .09
If Mean = 2.45 and SD = 1.2, what is KR20?
$\sum pq = 0.6875$
$KR20 = \frac{4}{3} \times \frac{1.44 - 0.6875}{1.44} = .70$
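The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch that takes the item p values and the SD of 1.2 from the slide; the function name `kr20` is illustrative.

```python
def kr20(p_values, variance):
    """Kuder-Richardson 20 from item p values and the variance of total scores."""
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)  # q = 1 - p for each item
    return (k / (k - 1)) * (variance - sum_pq) / variance


p_values = [0.50, 0.25, 0.80, 0.90]  # proportion answering each item right
sd = 1.2                             # standard deviation of total scores
reliability = kr20(p_values, sd ** 2)
print(f"KR20 = {reliability:.2f}")
```

The result rounds to .70, matching the value on the slide.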
175KR21
- If we assume all test items are equally difficult, KR20 can be simplified to KR21
- $KR21 = \frac{(K \times S^2) - \mathrm{Mean} \times (K - \mathrm{Mean})}{(K-1) \times S^2}$
- $K$ = number of trials or items
- $S^2$ = variance of test
- $\mathrm{Mean}$ = mean of test
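As an illustration, KR21 can be computed from the summary statistics given in the KR20 example (K = 4, Mean = 2.45, SD = 1.2). Reusing those numbers is an assumption made here for comparison only: the items in that example are not equally difficult, so KR21 comes out lower than the KR20 value of .70. The function name `kr21` is illustrative.

```python
def kr21(k: int, mean: float, variance: float) -> float:
    """Kuder-Richardson 21: needs only the item count, test mean and variance."""
    return (k * variance - mean * (k - mean)) / ((k - 1) * variance)


k, mean_score, sd = 4, 2.45, 1.2
estimate = kr21(k, mean_score, sd ** 2)
print(f"KR21 = {estimate:.2f}")
```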
176RELIABILITY OF ORAL TESTS
- ???? ???? ?? ????? ?????? ?????
- ???? ?????? ???? ???? ?? ???? ??? ?? ????? ?????
- ??????? INTERRATER RELIABILITY CO. .6
- ??? ??? ???? ? ??? ???? ????
- ??? ?????? ? ??????? ???? ???? ???? ????? ?????
??
177- RELIABILITY OF
- CRITERION REFERENCED
- LINDMAN AND MERENDA
178Rule of Thumb for Acceptable Reliability Coefficients for Classroom Tests
Reliability Coefficient   Interpretation
.70 or higher             acceptable reliability
179???????? ??? ???????
- Types of Validity
- Face
- Content
- Predictive
- Concurrent
- Construct
- Item validity
- Sampling validity
Determined by expert judgment
Blueprinting
180Types of validity...
1. Content validity
2. Criterion-related validity
3. Construct validity
1811. Content validity: the degree to which an instrument measures an intended content area
1823. Construct validity: a series of studies validate that the instrument really measures what it purports to measure
183Forms of content validity
Sampling validity: does the instrument reflect the total content area?
Item validity: are the items included on the instrument relevant to the measurement of the intended content area?
1842. Criterion-related validity: an individual takes two forms of an instrument, which are then correlated to discriminate between those individuals who possess a certain characteristic and those who do not
185Forms of criterion-related validity
Concurrent validity: the degree to which scores on one test correlate with scores on another test when both tests are administered in the same time frame
Predictive validity: the degree to which a test can predict how well an individual will do in a future situation
186Types of Validity
- 1. Content Validity
- Face Validity
- Sampling Validity (content validity)
- 2. Empirical Validity
- Concurrent Validity
- Predictive Validity
- 3. Construct Validity
188Item discrimination
- How well does the item separate those who know the material from those who do not?
- In LXR, measured by the point-biserial (rpb) correlation (ranges from -1 to 1).
- rpb is the correlation between item and exam performance.
189Item discrimination
- A positive rpb means that those scoring higher on the exam were more likely to answer the item correctly. (better discrimination)
- A negative rpb means that high scorers on the exam answered the item wrong more frequently than low scorers. (poor discrimination)
- A desirable rpb correlation is 0.20 or higher.
190Evaluation of Distractors
- Distractors are designed to fool those who do not know the material. Those who do not know the answer guess among the choices.
- Distractors should be equally popular.
- (Expected number per distractor = number who answered the item wrong / number of distractors)
- Distractors ideally have a low or negative rpb.
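The "equally popular" rule can be checked by comparing each distractor's observed count with the expected count. The sketch below uses the response counts from LXR Example 4 later in this deck and treats option B as the keyed answer, which is an assumption inferred from its positive rpb. The variable names are illustrative.

```python
# Response counts per option, from LXR Example 4.
counts = {"A": 11, "B": 43, "C": 3, "D": 22, "E": 8}
assumed_key = "B"  # ASSUMPTION: keyed answer inferred from the positive rpb

distractors = {opt: n for opt, n in counts.items() if opt != assumed_key}
wrong_total = sum(distractors.values())

# Expected count if everyone who missed the item guessed evenly among distractors.
expected = wrong_total / len(distractors)

# Positive = chosen more often than expected, negative = less often.
deviation = {opt: n - expected for opt, n in distractors.items()}

print(f"Expected per distractor: {expected:.1f}")
for opt, diff in deviation.items():
    print(f"Option {opt}: observed {distractors[opt]}, deviation {diff:+.1f}")
```

Here option D draws twice the expected number of students and option C far fewer, so those two alternatives are the ones to review.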
191LXR Example 1 (* = correct answer)
                        A*     B      C      D      E
N                       86     0      0      1      0
%                       99     0      0      1      0
Avg % Correct on Exam   85.3   0      0      82.0   0
rpb                     .06    ---    ---    -.06   ---
Very easy item. Would probably review the alternatives to make sure they are not ambiguous and do not provide clues that they are wrong.
192LXR Example 2 (* = correct answer)
                        A      B      C*     D      E
N                       0      21     65     2      0
%                       0      24     74     2      0
Avg % Correct on Exam   0      80.7   87.2   78.7   0
rpb                     ---    -.33   .36    -.13   ---
Three of the alternatives are not functioning well; would review them.
193LXR Example 3 (* = correct answer)
                        A      B      C      D      E
N                       3      1      15     5      66
%                       3      1      17     6      76
Avg % Correct on Exam   83.0   80.0   83.4   82.2   86.8
rpb                     -.07   -.09   -.15   -.12   .23
Probably a miskeyed item. The correct answer is likely option E.
194LXR Example 4 (* = correct answer)
                        A      B*     C      D      E
N                       11     43     3      22     8
%                       13     49     3      25     9
Avg % Correct on Exam   81.5   87.4   82.3   84.5   82.4
rpb                     -.24   .35    -.09   -.08   -.15
Relatively hard item with good discrimination. Would review alternatives C and D to see why they attract a relatively low and a relatively high number of students, respectively.
195LXR Example 5 (* = correct answer)
                        A      B*     C      D      E
N                       3      60     1      5      18
%                       3      69     1      6      21
Avg % Correct on Exam   83.0   85.3   80.0   82.2   86.8
rpb                     -.07   .002   -.09   -.12   .13
Poor discrimination for correct choice B. Choice E actually does a better job discriminating. Would review item for proper keying, ambiguous wording, proper wording of alternatives, etc. This item needs revision.