Factiva Intelligent Indexing - PowerPoint PPT Presentation

About This Presentation
Title:

Factiva Intelligent Indexing


Slides: 20
Provided by: CaponG

Transcript and Presenter's Notes

Title: Factiva Intelligent Indexing


1
Factiva Intelligent Indexing
  • SLA 2004

2
Agenda
  • Factiva Intelligent Indexing
  • Application of Factiva Intelligent Indexing
  • Pros and Cons
  • Quality Control

3
Factiva Intelligent Indexing
  • Factiva Taxonomy
  • 320,000 companies
  • 760 industries
  • 450 news subjects
  • 370 regions
  • 22 languages

4
FII Structure
  • One universal taxonomy
  • Building blocks
  • Inclusive hierarchy
  • Polyarchy
  • Synonyms and alias names
  • Full descriptions
  • Variable depth and breadth

5
Polyarchy
  • Internet/Online services
      • E-commerce
      • Internet browsers
      • Internet portals
      • Internet search engines
      • Internet service providers
      • etc.
  • Computers
      • Computer hardware
      • Computer services
      • Computer stores
      • Networking
      • Semiconductors
      • Software
          • Applications software
          • GroupWare
          • Intelligent agents
          • Internet browsers
          • etc.
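
The point of the slide is that one term ("Internet browsers") sits under more than one broader term. A minimal sketch of that idea follows, assuming a simple parent-link table. The table itself is hypothetical, and the link from Software to Computers is an assumption about how the slide was nested.

```python
# Hypothetical parent links; in a polyarchy a term may have several parents.
PARENTS = {
    "Internet browsers": ["Internet/Online services", "Software"],
    "Internet portals": ["Internet/Online services"],
    "Applications software": ["Software"],
    "Software": ["Computers"],  # assumed nesting, not confirmed by the slide
}

def ancestors(term, parents=PARENTS):
    """Return every broader term reachable from `term` through any parent."""
    found = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(sorted(ancestors("Internet browsers")))
# ['Computers', 'Internet/Online services', 'Software']
```

An article coded "Internet browsers" would then be retrievable through either branch, which is what an inclusive hierarchy combined with polyarchy is meant to provide.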

6
Factiva Intelligent Indexing Codes
  • Company Codes
  • Industry Codes
  • Subject Codes
  • Region Codes

7
FII Application
  • Code mapping
  • Entity extraction
  • Rule-based system
  • Linguistic analysis software
  • Manual review

8
Code Mapping
  • Most information providers supply some form of
    metadata, which is matched to the relevant Factiva
    indexing terms.
  • Advantages
      • Easy and quick
      • Efficient use of existing data
  • Disadvantages
      • Mismatches between coding schemes
      • Different interpretations of the same concepts
      • Variable quality: which sources do you trust?
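
Code mapping can be pictured as a lookup from a provider's own tags to target codes. The sketch below assumes a hand-maintained table. The provider names, tags, and code values are all invented for illustration and are not real Factiva codes.

```python
# Hypothetical mapping table: (provider, provider tag) -> target code.
CODE_MAP = {
    ("wire_a", "M&A"): "SUBJ_ACQUISITIONS",
    ("wire_a", "Tech"): "IND_COMPUTERS",
    ("wire_b", "mergers"): "SUBJ_ACQUISITIONS",
}

def map_codes(provider, tags, code_map=CODE_MAP):
    """Translate provider tags into target codes; collect tags with no match."""
    mapped, unmapped = [], []
    for tag in tags:
        code = code_map.get((provider, tag))
        if code is None:
            unmapped.append(tag)      # a mismatch between coding schemes
        elif code not in mapped:
            mapped.append(code)
    return mapped, unmapped

codes, leftovers = map_codes("wire_a", ["M&A", "Tech", "Lifestyle"])
print(codes, leftovers)
```

Keying the table on the provider as well as the tag reflects the "which sources do you trust?" concern: the same tag from two providers need not map to the same code.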

9
Entity extraction
  • This tool finds company names, which are then
    compared to our controlled vocabulary.
  • Advantages
      • Consistent
      • Precise
  • Disadvantages
      • Ambiguous names
      • High maintenance costs
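
A rough sketch of matching names in text against a controlled vocabulary, including the ambiguous-name problem listed above. The vocabulary, company codes, and function names are hypothetical. A production extractor would use far richer linguistic cues than exact alias matching.

```python
import re

# Hypothetical controlled vocabulary: company code -> known names and aliases.
VOCABULARY = {
    "CO_ACME_CORP": ["Acme Corporation", "Acme"],
    "CO_ACME_FOODS": ["Acme Foods", "Acme"],
    "CO_GLOBEX": ["Globex Inc", "Globex"],
}

def build_alias_index(vocabulary):
    """Invert the vocabulary: lowercased alias -> set of company codes."""
    index = {}
    for code, names in vocabulary.items():
        for name in names:
            index.setdefault(name.lower(), set()).add(code)
    return index

def extract_companies(text, vocabulary=VOCABULARY):
    """Return (resolved codes, ambiguous aliases) found in `text`."""
    index = build_alias_index(vocabulary)
    resolved, ambiguous = set(), {}
    remaining = text.lower()
    # Try longer aliases first so "Acme Foods" wins over the bare "Acme".
    for alias in sorted(index, key=len, reverse=True):
        pattern = r"\b" + re.escape(alias) + r"\b"
        if re.search(pattern, remaining):
            codes = index[alias]
            if len(codes) == 1:
                resolved |= codes
            else:
                ambiguous[alias] = sorted(codes)  # needs a human or more context
            remaining = re.sub(pattern, " ", remaining)
    return resolved, ambiguous
```

For a sentence such as "Globex Inc agreed to buy Acme Foods, while Acme declined to comment.", the two full names resolve cleanly, while the bare "Acme" is reported as ambiguous between two codes.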

10
Symbology Snapshot

11
Rule-based system
  • Sets of IF-THEN statements established by
    editors, information architects, or
    subject-matter experts.
  • Advantages
      • Good at highly formulaic content
      • Precise
  • Disadvantages
      • Need thousands of rules for a complete system
      • Maintenance of the rules themselves becomes VERY
        expensive!
      • Only captures explicit concepts
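
As an illustration of the IF-THEN idea, the sketch below assigns a code when all of a rule's trigger phrases appear in the text. The rule format, phrases, and code names are assumptions made for this example, not the actual rule language used.

```python
# Each hypothetical rule: IF all trigger phrases occur THEN assign the code.
RULES = [
    {"if_all": ["quarterly", "earnings"], "then": "SUBJ_EARNINGS"},
    {"if_all": ["agreed to acquire"], "then": "SUBJ_ACQUISITIONS"},
    {"if_all": ["profit warning"], "then": "SUBJ_EARNINGS"},
]

def apply_rules(text, rules=RULES):
    """Return the codes of every rule whose phrases all appear in `text`."""
    lowered = text.lower()
    codes = []
    for rule in rules:
        if all(phrase in lowered for phrase in rule["if_all"]):
            if rule["then"] not in codes:
                codes.append(rule["then"])
    return codes

print(apply_rules("Globex reports quarterly earnings and issues a profit warning"))
print(apply_rules("Shares fell after the company said it would make less money"))
```

The second call returns an empty list even though the story is about earnings, which is the "only captures explicit concepts" limitation: every new phrasing needs another rule.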

12
Example

13
Linguistics-based categorization
  • This tool is currently employed across all
    English, French, German and Spanish language
    publications. A combination of linguistic
    analysis and statistical algorithms allows new
    content to be compared to example data and coded
    appropriately.
  • Advantages
      • Scales to millions of documents, thousands of
        categories, multiple languages
      • Copes well with change
      • Fits editorial workflow
      • Good fine-tuning tools for editorial control
      • Codes implicit as well as explicit concepts
  • Disadvantages
      • Training time and cost
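
The slide does not name the algorithms used, so the following is only a stand-in for "compare new content to example data": it builds one word-count profile per code from a tiny, invented training set and ranks codes by cosine similarity. All texts, code names, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set: (example text, code).
TRAINING = [
    ("bank raises interest rates to curb inflation", "SUBJ_MONETARY_POLICY"),
    ("central bank holds rates as inflation eases", "SUBJ_MONETARY_POLICY"),
    ("chipmaker unveils faster processor for laptops", "IND_SEMICONDUCTORS"),
    ("new processor chips boost laptop performance", "IND_SEMICONDUCTORS"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Sum word counts per code to form one profile vector per code."""
    profiles = {}
    for text, code in examples:
        profiles.setdefault(code, Counter()).update(tokenize(text))
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text, profiles):
    """Return (code, score) pairs, best match first."""
    doc = Counter(tokenize(text))
    scores = [(code, cosine(doc, profile)) for code, profile in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

profiles = train(TRAINING)
ranked = classify("inflation worries push the bank to lift rates", profiles)
print(ranked[0][0])
```

The test sentence never says "monetary policy", yet it ranks that code first because it resembles the examples. That is the sense in which a trained approach can code implicit concepts, at the price of building and maintaining the example data.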

14
Editorial Control
  • Set relevance levels
  • Maintain training set
  • Stop words - correlation and multiple meanings
      • Example: "Chechnya" was added as a stop word to
        the industries model, as it was triggering the
        freelance journalist code (because so many of
        them were dying there)
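
The two editorial levers named here, relevance levels and stop words, can be sketched as settings applied before and after scoring. The settings layout, term weights, and code name below are hypothetical; the weights simply mimic a model that has learned a spurious link between "Chechnya" and a journalism code.

```python
# Hypothetical per-model settings an editor might maintain.
MODEL_SETTINGS = {
    "industries": {"stop_words": {"chechnya"}, "min_relevance": 0.5},
}

# Hypothetical learned term weights for one code in the industries model.
TERM_WEIGHTS = {
    "IND_FREELANCE_JOURNALISM": {"freelance": 0.4, "journalist": 0.4, "chechnya": 0.6},
}

def assign_codes(text, model="industries", settings=MODEL_SETTINGS, weights=TERM_WEIGHTS):
    """Score each code on the words left after stop-word removal."""
    config = settings[model]
    words = {w.strip('.,"').lower() for w in text.split()}
    words -= config["stop_words"]          # editor-controlled stop words
    assigned = []
    for code, term_weights in weights.items():
        score = sum(weight for term, weight in term_weights.items() if term in words)
        if score >= config["min_relevance"]:   # editor-controlled relevance level
            assigned.append(code)
    return assigned

print(assign_codes("Troops advance in Chechnya."))
print(assign_codes("Freelance journalist wins award"))
```

With the stop word in place, the first story no longer triggers the journalism code, while a story that is really about a freelance journalist still clears the relevance level.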

15
Manual coding
  • About 200 editors spread across main time zones
  • Advantages
      • Humans easily grasp the gist of the story
      • Cope well with exceptions
      • Visible/Controllable
  • Disadvantages
      • Very resource-intensive
      • Expensive
      • Slow
      • Inconsistent (subjective and temporal)
      • Not scalable

16
Review process
  • Lists reviewed every three months: redefinitions,
    new codes, expansions, changes
  • Market research/customer feedback and behavior
  • Changes to parent schemes/standards
  • Editorial/Quality control feedback
  • Internal coding forum
  • 45-day notice period

17
Quality control
  • Sampling by editors
  • Scoring for precision and recall
  • Analysis by source, language, code, editor etc.
  • Feedback to editors and systems
  • Corrective action
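
Scoring a sampled article for precision and recall comes down to comparing the codes that were assigned with the codes a reviewing editor says should have been assigned. A minimal sketch, with hypothetical function and code names:

```python
def precision_recall(assigned, expected):
    """Precision: share of assigned codes that are correct.
    Recall: share of expected codes that were actually assigned."""
    assigned, expected = set(assigned), set(expected)
    true_positives = len(assigned & expected)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall

# Three codes were assigned; the reviewer expected four, two of which match.
p, r = precision_recall(["A", "B", "C"], ["A", "B", "D", "E"])
print(round(p, 2), r)
```

Averaging these scores per source, language, code, or editor would give the breakdowns listed on the slide.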

18
Results
  • Three million articles coded a month
  • All receive a level of autocoding
  • Seventy-nine percent automation: more than two
    million articles are auto-coded with no further
    manual review
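
As a quick check of the figures on this slide, assuming the 79 percent applies to the full three million articles a month:

```python
articles_per_month = 3_000_000
automation_rate = 0.79

# Articles coded with no manual review, and the remainder that editors see.
auto_coded = round(articles_per_month * automation_rate)
manually_reviewed = articles_per_month - auto_coded

print(auto_coded, manually_reviewed)  # 2370000 630000
```

That is consistent with the "more than two million" stated on the slide.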

19
Recap
  • Factiva's taxonomy is Factiva Intelligent
    Indexing
  • Factiva uses a hybrid methodology for application
  • Factiva has a coding team for governance and
    maintenance
  • End result: Factiva Intelligent Indexing
    leverages our editorial strengths, combining
    human experience and expertise with the latest
    automation software to implement a completely
    flexible and granular indexing system across all
    of our content.