Computer-Assisted Coding of Text (CASCOT)


1
Computer-Assisted Coding of Text (CASCOT)
  • Software demonstration
  • Rob Jones and Peter Elias

2
Structure of presentation
  • Background: manual text coding
  • Development of software: history, aims
  • CASCOT demonstration

3
Coding text to a classification
  • Coding is the process of assigning the range
    of all possible answers to a predefined set of
    categories.
  • The full set of categories is termed a
    classification. Examples are:
  • SOC 2000 (Standard Occupational Classification
    2000)
  • SIC 92 (Standard Industrial Classification 1992)
  • A classification has three parts: the structure,
    the index and the classification rules.

4
Text responses in surveys
  • Q: What is your job title?
  • Q: Briefly describe your duties.
  • Q: What does the organisation you work for
    mainly make or do?

5
Manual coding procedures
  • Manual methods:
  • code books
  • temporary labour
  • query resolution systems.
  • No standardised approach; major variations in
    quality of coding between institutions,
    companies, etc.
  • Time-consuming and expensive.

6
Development of software
  • CASOC: Pascal/C text coding software for DOS,
    1993-2001.
  • CASCOT: Java text coding software for any
    operating system.
  • CASOC was an ad hoc development, funded from
    sales revenue.
  • CASCOT funded by ESRC.

7
Occupational coding in practice
  • Quality of coding reflects quality of text
    available for coding.
  • Need rules which specify how to deal with
    problems such as ambiguous job titles (e.g.
    engineer, teacher).
  • Need to be aware that machine coding of text can
    introduce bias.
  • Need to establish a trade-off between accuracy
    and cost.

8
Cascot
  • Cascot will provide:
  • A list of recommendations.
  • For each recommendation: code, title, best
    matching index entry, and certainty score.
  • Certainty score:
  • Approximates the probability that the recommended
    code is correct.
  • This is represented by a number in the range
    0-100.
  • People are never 100% right. A computer cannot
    be 100% right either.
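
As a rough illustration only (the slides do not describe Cascot's internal data structures), one row of the recommendations list could be modelled like this in Python. The class name, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """One hypothetical row of the recommendations list."""
    code: str         # classification code (made-up values below)
    title: str        # title of the group the code belongs to
    index_entry: str  # best matching index entry within that group
    score: int        # certainty score, 0-100

    def __post_init__(self):
        # The slides state that the score lies in the range 0-100.
        if not 0 <= self.score <= 100:
            raise ValueError("certainty score must be in the range 0-100")


def best(recommendations):
    """Return the recommendation with the highest certainty score."""
    return max(recommendations, key=lambda r: r.score)


# Made-up values for illustration; not real Cascot output.
recs = [
    Recommendation("A1", "Group A", "engineer, civil", 88),
    Recommendation("B2", "Group B", "engineer, chemical", 41),
]
print(best(recs).code)
```

Treating the score as an approximate probability (score / 100) is what makes it usable later as a cut-off for deciding when a human should review a code.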

10
Text Input Area
11
civil engineer
Type job title
Press enter, or click Code button
12
Recommendations Table
13
Code
Score
Group Title
Index Entry
Recommendations Table
14
Classification Structure
15
Index Entries
16
Output
17
Best recommendation selected automatically.
Select another by clicking a different line.
18
Structure and index entry list will change.
19
And output has changed.
20
Change selection via structure
21
Index entry list will change.
22
And output has changed.
23
Reading from a file
  • Instead of typing every job title in, we can read
    job titles from a file.
  • File must be in an acceptable format.

24
Reading from a file.
  • Simplest file: each line is a job title.
  • But how do we know which job title is for which
    person? (Solution: use a delimited file.)

25
Reading from a file.
  • Tab-delimited file.
  • Each line: Person ID, TAB, Job Title

26
Reading from a file.
  • Comma-delimited file.
  • Each line: Person ID, comma, Job Title
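
To make the two delimited layouts concrete, here is a minimal Python sketch of reading "Person ID, delimiter, Job Title" lines. It is not Cascot's own reader; the function name and the sample IDs are assumptions, and in-memory strings stand in for real files.

```python
import csv
import io


def read_job_titles(lines, delimiter="\t"):
    """Yield (person_id, job_title) pairs from a delimited text source."""
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield row[0].strip(), row[1].strip()


# In-memory stand-ins for a tab-delimited and a comma-delimited input file.
tab_file = io.StringIO("001\tcivil engineer\n002\tteacher\n")
comma_file = io.StringIO("001,civil engineer\n002,teacher\n")

print(list(read_job_titles(tab_file)))
print(list(read_job_titles(comma_file, delimiter=",")))
```

Using the `csv` module rather than a plain `split` means a job title that itself contains a comma can still be read, provided it is quoted in the file.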

27
Recording codes from Cascot
  • Rather than having to copy the code produced by
    Cascot we can have Cascot record the codes to a
    file.
  • Open an Output File.
  • One line written for each piece of text coded.

28
Output Items
  • After coding we have the following facts:
  • The text that was coded.
  • The code it was given.
  • The title for that code.
  • The best matching index entry within that code.
  • The score Cascot assigned the match.
  • Each of these facts is an Output Item.
  • We can choose which we wish to output (on the
    screen or to a file).
  • Can also output items from the input file.
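
The idea of choosing Output Items can be sketched as follows in Python. This is an illustration under assumptions, not Cascot's file format: the item names, the tab delimiter, and the sample record are all hypothetical.

```python
import csv
import io

# Hypothetical names for the output items listed above.
AVAILABLE_ITEMS = ["text", "code", "title", "index_entry", "score"]


def write_output(results, items, out, header=True):
    """Write one tab-delimited line per coded text, keeping only `items`."""
    unknown = [i for i in items if i not in AVAILABLE_ITEMS]
    if unknown:
        raise ValueError(f"unknown output items: {unknown}")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    if header:
        writer.writerow(items)  # optional first row of column titles
    for result in results:
        writer.writerow([result[i] for i in items])


# Made-up record for illustration; not real Cascot output.
results = [{
    "text": "civil engineer",
    "code": "A1",
    "title": "Group A",
    "index_entry": "engineer, civil",
    "score": 88,
}]
out = io.StringIO()
write_output(results, ["text", "code", "score"], out)
print(out.getvalue())
```

Keeping the list of selected items separate from the coded results means the same results can be written with a different set of columns without recoding anything.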

29
Example Using Files
Input file (tab delimited).
30
Example Using Files.
31
Step 1: Open Input File.
32
Select file, click open.
33
Confirm / Select File Format.
34
Choose selection options.
35
Click ok.
36
Input File Details
First job title coded
37
Step 2: Choose Output Items.
38
Step 2: Choose Output Items.
Click Edit.
40
Available Items
41
Current Items
42
Current Output
43
To add score click Add
44
Then, click OK
45
Step 3: Open Output File.
47
Select file, or type in name for new file.
Click Save
MyOutputFile.txt
48
You will be asked if you wish to make the first
output row be column titles.
49
Output File Details
50
Select the preferred recommendation.
Or navigate to the correct code.
Once you are happy with the code - click 'Accept'
51
The next job title appears. (Automatically
read from file after Accept)
52
Select the preferred recommendation.
Or navigate to the correct code.
Once you are happy with the code - click 'Accept'
53
If you don't know the code, or wish to defer
coding to a more expert coder, click 'No
Conclusion'.
54
Output set to zeros.
55
The 'No Conclusion' output is not final until you
click 'Accept'.
56
Example Using Files
Input file (tab delimited).
57
Example Using Files
Output file (Output items: Input Record, Code,
Title, Score)
58
A fully automated run.
  • Rather than clicking 'Accept' to agree to the
    best recommendation every time, we can automate
    the process.
  • But how good is this?
  • Example follows:
  • Random sample of real data.
  • 1200 unique job titles.
  • Coded automatically, sorted by score.

59-71
(Slide images with no transcript: selected pages of
the automatically coded sample, with pages skipped
in between.)
72
Semi-automatic coding.
  • Job titles with high certainty scores: usually
    right.
  • Humans agree with Cascot for high scores.
  • Job titles with low certainty scores: often
    wrong.
  • We need human intelligence to decide the correct
    code when we have low certainty.
  • Automatically agree to high scores, but have a
    human decision for low scores.
  • What score threshold?
  • A small study by IER, University of Warwick, shows
    manual coders are happy with a threshold of
    70-75 (some with 60).
  • Balance between time (money) and quality.
  • Best practice: sort the input file alphabetically
    by job title.
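
The sorting advice can be sketched in a few lines of Python. The slides do not say why sorting helps; presumably it places identical and similar titles next to each other for the human coder. The function name and sample records are hypothetical.

```python
def sort_by_job_title(records):
    """Return (person_id, job_title) records sorted alphabetically by title.

    Comparison ignores case, and records with equal titles keep their
    original relative order because Python's sort is stable.
    """
    return sorted(records, key=lambda rec: rec[1].casefold())


records = [
    ("003", "teacher"),
    ("001", "Civil engineer"),
    ("002", "civil engineer"),
    ("004", "Accountant"),
]
for person_id, title in sort_by_job_title(records):
    print(person_id, title)
```

Because the person ID travels with each job title, the sorted output can still be matched back to the original respondents.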

73
Automated and Assisted Modes
  • Both require input and output files.
  • Threshold level: a certainty score.
  • Assisted mode:
  • score below threshold: user is prompted.
  • Fully Automatic mode:
  • score below threshold: no code/zeros written.
  • Set automation using Options > Automation from
    the menu bar.
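
The difference between the two modes comes down to what happens when a score falls below the threshold. The following Python sketch shows that decision only; it is not Cascot's implementation. The function name, the mode strings, the `ask_user` callback, and the `"0"` placeholder for "no code" are assumptions.

```python
NO_CODE = "0"  # placeholder; the slides only say "no code/zeros" is written


def decide(code, score, threshold, mode, ask_user=None):
    """Return the code to record for one job title.

    Scores at or above the threshold are accepted automatically.
    Below the threshold, assisted mode asks a human and fully
    automatic mode records no code.
    """
    if score >= threshold:
        return code
    if mode == "assisted":
        if ask_user is None:
            raise ValueError("assisted mode needs an ask_user callback")
        return ask_user(code, score)
    if mode == "automatic":
        return NO_CODE
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up codes and scores for illustration.
print(decide("A1", 80, 70, "automatic"))
print(decide("A1", 50, 70, "automatic"))
print(decide("A1", 50, 70, "assisted", ask_user=lambda code, score: "B2"))
```

A higher threshold sends more job titles to a human (more time and cost, better quality); a lower one accepts more machine codes unreviewed.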

74
Using additional information to aid coding.
  • Ambiguous job title.
  • When coding manually: look at other questions.
  • E.g. Q: Briefly describe your duties.
  • Do the same with Cascot.
  • But: the data must be present in the input file.
  • Best if the input file is a delimited file.

75
Teacher is ambiguous.
Click View Record Button
76
This Information can be used to determine that we
want Secondary Teacher
77
Click X to close.
78
Now select Secondary Teachers
79
And Accept to move on.