Automatic Eurovoc indexing: an Experiment in the Czech Parliament

1
Automatic Eurovoc indexing: an Experiment in the
Czech Parliament
  • Anna Lhotská, Václav Sklenár
  • Office of the Chamber of Deputies, Parliament of
    the Czech Republic

2
History of the Czech version
  • 1993 - preliminary translation of the second
    version into Czech
  • 1995 - Czech version - edition 3.0
  • 07/2003 - Czech version - edition 4.0
  • 07/2004 - Czech version - edition 4.1
  • The Czech Eurovoc is fully compatible with other
    official language versions

3
Application of Eurovoc in the Information System
of Parliament
  • library database - aRL
  • database of petitions - Lotus Notes
  • intellectual indexing of parliamentary documents
  • multilingual searching in parliamentary
    documentation

4
Manual indexing/1
  • all parliamentary documents that are publicly
    accessible in full text in electronic form via the
    Internet are indexed intellectually with Eurovoc
    terms; at present this amounts to 3,500 documents
  • retrospective indexing of older materials
    continues; a great number of older documents
    still remain unindexed

5
Manual indexing/2
  • document types - bills, budgets, agreements,
    agendas, parliamentary questions
  • classification - 127 Eurovoc microthesauri
  • indexing - descriptors from Eurovoc -
    multilingual searching
  • indexing - descriptors from complementary
    thesaurus - searching in Czech language only

6
Automatic indexing tool/1
  • SELECTION of terms from document text
  • STOP-WORDS LIST (negative dictionary)
  • LEMMATIZER - set of rules for grammatical
    alterations
  • COMPARISON of a set of basic-form words from a
    text with Eurovoc terms
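
The four steps on this slide can be sketched roughly as follows. This is only an illustrative toy, not the tool used in the Parliament: the stop-word list, the suffix rules, the Eurovoc term set, and all function names are hypothetical, and English stands in for Czech to keep the example readable.

```python
import re

# Hypothetical toy data; the real tool works on Czech text and the full Eurovoc.
STOP_WORDS = {"the", "of", "and", "a", "in", "on", "for"}
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", "")]  # stand-in for lemmatizer rules
EUROVOC_TERMS = {"tax", "budget", "agriculture"}


def select_terms(text):
    """SELECTION: split the document text into lowercase word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stop_words(tokens):
    """STOP-WORDS LIST: drop tokens found in the negative dictionary."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(token):
    """LEMMATIZER: apply the first matching suffix rule to get a basic form."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + replacement
    return token


def match_terms(text):
    """COMPARISON: intersect the set of basic-form words with Eurovoc terms."""
    tokens = remove_stop_words(select_terms(text))
    lemmas = {lemmatize(t) for t in tokens}
    return sorted(lemmas & EUROVOC_TERMS)
```

Calling `match_terms("The budgets and taxes of the agriculture sector")` on this toy data yields the three matching terms in alphabetical order. A rule-based lemmatizer for Czech would need a far larger rule set than the three suffix rules assumed here.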

7
Automatic indexing tool/2
  • MULTIWORD EXPRESSIONS recognition
  • NON-DESCRIPTOR/DESCRIPTOR transformation
  • ABSOLUTE FREQUENCY WEIGHTING
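
A minimal sketch of how these three steps might fit together, assuming the text has already been reduced to basic-form words. The term tables, the non-descriptor mapping, and the function names are hypothetical examples rather than real Eurovoc data or the actual tool, and only two-word expressions are handled.

```python
from collections import Counter

# Hypothetical toy data, not taken from the real Eurovoc thesaurus.
MULTIWORD_TERMS = {("income", "tax"), ("state", "budget")}
NON_DESCRIPTOR_TO_DESCRIPTOR = {"levy": "tax"}
DESCRIPTORS = {"tax", "income tax", "state budget"}


def find_terms(lemmas):
    """MULTIWORD EXPRESSIONS: scan left to right, preferring two-word terms."""
    found = []
    i = 0
    while i < len(lemmas):
        pair = tuple(lemmas[i:i + 2])
        if pair in MULTIWORD_TERMS:
            found.append(" ".join(pair))
            i += 2
        else:
            found.append(lemmas[i])
            i += 1
    return found


def to_descriptor(term):
    """NON-DESCRIPTOR/DESCRIPTOR: map a non-preferred term to its descriptor."""
    return NON_DESCRIPTOR_TO_DESCRIPTOR.get(term, term)


def weight_descriptors(lemmas):
    """ABSOLUTE FREQUENCY WEIGHTING: count occurrences of each descriptor."""
    counts = Counter(to_descriptor(t) for t in find_terms(lemmas))
    return {d: n for d, n in counts.items() if d in DESCRIPTORS}
```

With the input `["income", "tax", "levy", "tax", "state", "budget", "income", "tax"]`, the sketch counts "income tax" twice, "tax" twice (once via the non-descriptor "levy"), and "state budget" once. Raw counts like these ignore document length and where a term appears in the text.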

8
Limitations of the automatic indexing tool
  • insufficient term weighting system
  • term location within the text (non-structured
    texts)
  • insufficient recognition of multiword expressions
  • lack of automatic non-descriptor/descriptor
    proposals

9
Thank you for your attention
  • Anna Lhotská, lhotska_at_psp.cz
  • Václav Sklenár, sklenar_at_psp.cz