CSA Branding 101 October 2004 Presented By Giving Tree Group - PowerPoint PPT Presentation


Transcript and Presenter's Notes


1
Automated Indexing Implementation: Managing
Expectations, by Craig Emerson, CSA
  • The CSA landscape
  • 6 offices, all using Cuadra's STAR production
    software
  • 59 A&I databases in Social Sciences, Natural
    Science, Technology, Arts & Humanities
  • 700,000 abstracts indexed each year
  • 100 editorial staff (editors and editorial
    assistants)
  • 18 thesauri

2
  • Focus on CSA's head office in Bethesda
  • 300k records/yr; 6 thesauri; natural science
  • journals, conferences, reports, books
  • The MAI program must assign specific vocabulary
    rather than general concepts that may or may not
    match our existing thesauri.
  • Rule-based systems
  • Text analysis (lexical analysis)
  • Marching orders: cost reduction rather than
    increased productivity
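
To make the "rule-based" idea above concrete, here is a minimal sketch of a rule base that maps text patterns to specific thesaurus terms. The patterns, the terms, and the sample sentence are all invented for illustration; they do not come from a CSA thesaurus or from any commercial MAI package.

```python
import re

# Hypothetical rules: each maps a text pattern to one controlled thesaurus term.
# Neither the patterns nor the terms are taken from a real CSA thesaurus.
RULES = [
    (re.compile(r"\bmantle plumes?\b", re.IGNORECASE), "Mantle plumes"),
    (re.compile(r"\bsea[- ]surface temperatures?\b", re.IGNORECASE), "Sea surface temperature"),
    (re.compile(r"\bzooplankton\b", re.IGNORECASE), "Zooplankton"),
]


def suggest_terms(text):
    """Return the thesaurus terms whose rule pattern occurs in the text."""
    return [term for pattern, term in RULES if pattern.search(text)]


# Invented sample sentence, used only to exercise the rules.
abstract = "Compositional streaks are vertically continuous within the mantle plume."
print(suggest_terms(abstract))
```

Because every rule ends in an existing thesaurus term, the output can only ever contain controlled vocabulary, which is the constraint the slide places on the MAI program.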

3
Objective: Use Machine-Assisted Indexing to
increase Production Efficiency in the Bethesda Office
  • What does Production Efficiency really mean?
  • if it just reflects cost savings, we'd outsource
    to India
  • Focus must include:
  • increasing the quality and utility of existing
    products (e.g. consistency of indexing, thesaurus
    maintenance, etc.)
  • identification and development of new A&I products

4
  • But pragmatism dictates a short-term goal of
    more-for-less
  • Setting my opportunity costs
  • target: freelancer costs of US$200k per year
  • MAI packages are $40,000-$250,000 (often per
    server)
  • annual maintenance costs between $5,000 and $20,000
  • IT support and maintenance: ½ head for 6
    thesauri and 3 major authority files
  • whatever we invested, we wanted to make it back
    within 2-3 years
  • reduce freelancer costs by about $35,000 each year
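
The figures above can be turned into a rough payback estimate. The sketch below assumes a constant saving of 35,000 per year and counts only the package price and annual maintenance; it ignores IT staff time, and the function name is ours, not part of any MAI product.

```python
def payback_years(package_cost, annual_maintenance, annual_savings):
    """Years needed for net annual savings to cover the up-front package cost."""
    net = annual_savings - annual_maintenance
    if net <= 0:
        return None  # the investment is never recovered
    return package_cost / net


# Low and high ends of the package and maintenance ranges quoted on the slide.
for package_cost in (40_000, 250_000):
    for maintenance in (5_000, 20_000):
        years = payback_years(package_cost, maintenance, 35_000)
        print(f"package {package_cost:>7,} maintenance {maintenance:>6,}: {years:.1f} years")
```

Comparing the printed values with the 2-3 year target shows how sensitive the payback period is to where a package falls in the quoted price range.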

5
  • Savings are not uniform over 3 years
  • You may actually need to spend more in the first
    year (time taken to rule-build)
  • Rather than a 17.5% cost reduction per year, you
    may be faced with a 35% cost reduction in year 2
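
One way to read the 35% figure: if the first year yields no net saving but the two-year cumulative target is unchanged, the whole reduction has to land in year 2. The sketch below assumes exactly that (a zero first year and a two-year horizon); both are our assumptions, not stated on the slide.

```python
def catch_up_rate(target_rate, years, achieved):
    """Percentage reduction needed in the final year to hit the cumulative target."""
    total_target = target_rate * years
    return total_target - sum(achieved)


rate = catch_up_rate(17.5, 2, [0.0])  # year 1 assumed to deliver no saving
print(rate)                            # 35.0
print(200_000 * rate / 100)            # 70000.0, against a 200k freelancer budget
```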

6
35% isn't daunting given MAI propaganda: "Buy
now! Productivity will increase x-fold!"
  • Such claims are attainable but problematic to
    validate
  • Time savings: database (thesaurus)-specific
  • Rule building eats some of the time savings (1st
    year)
  • Even with high accuracy, indexing still requires
    checking
  • Give editors more time and they'll spend it
    polishing abstracts

7
  • Set Productivity Goals Regardless of Theoretical
    Constraints
  • largest hurdle is staff management, not MAI
    technology
  • initially, staff are worried that database quality
    will suffer and that their jobs may disappear
  • editors realize extra time allows them to do
    additional indexing, or other editorial jobs
    (mission creep)
  • Whether you want time savings to translate into
    bigger and better databases, or a cost reduction,
    you must set unambiguous expectations in
    stone: "by the end of the year, your indexing
    quota will double, and by the end of the
    following year, triple."

8
Estimated productivity = f(thesaurus structure,
rule-builders' ability, IT support, document type,
quality of source text, howler acceptance factor)
  • Are thesaurus terms close to natural language?
    If so, 70-80% accuracy within a year.

9
  • Rule-builders' Ability
  • Management Constraints
  • 1 editor per rule base (thesaurus, authority
    file)
  • 20-50% of the editor's time for rule-building
  • Editors' Limits
  • Several databases have a single editor, who is the
    de facto rule builder. If a logic maven, 70-80%
    accuracy within a year.

10
  • IT Support
  • Full-time IT person for 1-2 months (rule bases)
  • ½ time person for maintenance through 12 months
  • ¼ time person beyond 12 months
  • Software constraints
  • likely to push software beyond limits
  • unique requirements probably not provided by
    software
  • identification of software limitations unknown to
    provider, beta version or not
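
The IT support estimates above add up to a simple person-month budget. This sketch assumes a single setup period followed immediately by half-time maintenance through month 12 and quarter-time afterwards; the slide does not say whether the setup period applies per rule base, so treat the totals as a lower bound.

```python
def it_person_months(setup_months, horizon_months):
    """Total IT effort in person-months over the planning horizon."""
    total = 0.0
    for month in range(1, horizon_months + 1):
        if month <= setup_months:
            total += 1.0   # full-time during setup
        elif month <= 12:
            total += 0.5   # half-time maintenance through month 12
        else:
            total += 0.25  # quarter-time beyond month 12
    return total


print(it_person_months(1, 12))  # 6.5
print(it_person_months(2, 12))  # 7.0
print(it_person_months(2, 24))  # 10.0
```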

11
Importance of document type
"We therefore conclude that narrow (less than
50 kilometres wide) compositional streaks, as
well as the larger-scale bilateral zonation, are
vertically continuous over tens to hundreds of
kilometres within the plume."

"Melvin Anthony stormed to third with a posing
routine that is setting new standards and may
force his contemporaries to re-think their
smooch-strut-and-jiggle approach."

12
Importance of text choice for MAI
  • Article Title (Non-English)
  • Abstract (Non-English)
  • Author keywords
  • Source Title
  • Conference Title
  • Special notes

Note: Full text isn't great for rule-based
systems. It is better for concept-oriented MAI,
where thesaurus matching isn't as important.

13
  • The Howler Factor
  • Web translation tools are grist for the humour
    mill
  • Do you need to eyeball every index term?
  • Important for long-term estimates (1 year away)
  • If yes, then budget for at least 30-60
    seconds per record
  • If not, then the critical statistic is the
    percentage of records you're willing to
    release unvetted
  • Consider that howlers will be out-of-context
    thesaurus terms
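
The vetting budget above is easy to quantify. The sketch below applies the 30-60 seconds per record to the 300k records per year quoted earlier for Bethesda; the `unvetted_fraction` parameter is a hypothetical knob for the share of records released without checking.

```python
def vetting_hours(records_per_year, seconds_per_record, unvetted_fraction=0.0):
    """Editor hours per year spent eyeballing MAI-assigned index terms."""
    vetted = records_per_year * (1.0 - unvetted_fraction)
    return vetted * seconds_per_record / 3600.0


print(vetting_hours(300_000, 30))       # 2500.0
print(vetting_hours(300_000, 60))       # 5000.0
print(vetting_hours(300_000, 30, 0.5))  # 1250.0
```

Even at the low end, checking every record costs thousands of editor hours a year, which is why the acceptable unvetted percentage matters for long-term estimates.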

14
(No Transcript)
15
  • The remaining issue is validation of results
  • You'll achieve your efficiency goal: editors will
    double their output, but at what quality cost?
  • Design some simple tests
  • Editors should re-index material they've already
    indexed, again in a blind study
  • Randomly choose 50 records indexed manually, and
    50 indexed with the help of MAI. Rank the 100
    records with respect to indexing in a single
    blind study. Any pattern?
  • As above, but with MAI-only
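
The 50/50 blind comparison described above could be set up along these lines. This is a minimal sketch: the record ids are placeholders, the reviewer's ranking is simulated by the shuffled order, and comparing mean ranks is only one simple way to look for a pattern.

```python
import random


def build_blind_sample(manual_ids, mai_ids, n=50, seed=0):
    """Draw n records from each pool and shuffle them together."""
    rng = random.Random(seed)
    sample = [(rid, "manual") for rid in rng.sample(manual_ids, n)]
    sample += [(rid, "mai") for rid in rng.sample(mai_ids, n)]
    rng.shuffle(sample)
    return sample  # show reviewers the ids only; keep the labels hidden


def mean_rank_by_group(ranked_ids, labels):
    """ranked_ids: record ids ordered from best to worst by the reviewer."""
    totals, counts = {}, {}
    for rank, rid in enumerate(ranked_ids, start=1):
        group = labels[rid]
        totals[group] = totals.get(group, 0) + rank
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Placeholder id pools standing in for manually indexed and MAI-assisted records.
sample = build_blind_sample(list(range(1000)), list(range(1000, 2000)))
labels = dict(sample)
ranked_ids = [rid for rid, _ in sample]  # stand-in for the reviewer's ranking
print(mean_rank_by_group(ranked_ids, labels))
```

With 100 ranked records the overall mean rank is 50.5, so two group means that both sit near 50.5 suggest no pattern, while a clear gap between them is worth investigating.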

16
  • So what happened with the Bethesda
    implementation?
  • started the process 10 months ago
  • MAI accuracy: 25-75% (x̄ = 63%)
  • increase in indexing rate: 20-300% (x̄ = 50%)
  • no change in quality in records indexed with
    MAI + manual
  • Significantly lower quality in MAI-only indexing
  • not as a result of howlers
  • caused by addition of more general terminology

17
  • Unanticipated problems
  • software much more limited than anticipated
  • "Craig: The requirements for Java changed with
    the new software, but they didn't tell us about
    it. How can we go about installing v.1.4.2_06 of
    the Java JRE in Apollo? Would it break anything
    else that relies on Java? Francis"
  • internal IT support much less than anticipated
  • Unanticipated benefits
  • thesaurus maintenance has improved; quality of
    indexing has increased (editors focus on
    difficult terminology)
  • editors have become invigorated: they live for
    arguments, and MAI rule-building provides a feast
  • new projects have developed: reindexing
    backfiles; custom indexes for cluster databases;
    ability to index document types not previously
    considered because of volume

18
  • In Summary
  • Be clear on the goal: increased production or cost
    savings
  • Lock in IT support
  • You'll be able to integrate immediately
  • Assume you'll need 1 year to realize a
    significant benefit and 2 years to optimize the
    benefit
  • Conduct a pilot study to estimate productivity
    gains; otherwise assume a 2-3x increase in
    indexing rate (higher if thesaurus matching isn't
    an issue)
  • Determine efficiencies on an ongoing basis and
    track editors' time as never before