Multilingual and cross-lingual news topic tracking

Automatic Eurovoc Indexing: Results and Evaluations. Bruno Pouliquen, Lang Tech group, JRC, European Commission, Ispra, Italy. http://www.jrc.cec.eu.int/langtech

1
Automatic Eurovoc Indexing: Results and Evaluations
Bruno Pouliquen
Lang Tech group, JRC, European Commission, Ispra, Italy
http://www.jrc.cec.eu.int/langtech
Addressing the Language Barrier Problem in the Enlarged EU: Automating Eurovoc Descriptor Assignment
2
Contents
  • Viewing the results
    • Browser
    • Exports
    • Validation interface
  • Evaluation method
    • Test set
    • Precision/Recall
    • Evaluation interface
    • Results

3
Browser
  • For a given text:
    • The original text
    • The pre-processed text
    • Keywords in the text (associates)
    • Eurovoc descriptors manually assigned
    • Eurovoc descriptors assigned automatically
    • With context
    • Access to parallel texts

4
Browser example
COMMISSION DECISION of 8 September 1997 on the temporary suspension of imports of pistachios and certain products derived from pistachios originating in or consigned from Iran (Text with EEA relevance) (97/613/EC)
THE COMMISSION OF THE EUROPEAN COMMUNITIES,
Having regard to the Treaty establishing the European Community,
Having regard to Council Directive 93/43/EEC of 14 June 1993 on the hygiene of foodstuffs (1), and in particular Article 10 thereof,
Whereas pistachios originating in or consigned from Iran are in many cases contaminated with excessive levels of Aflatoxin B1;
Whereas the Scientific Committee for Food has noted that Aflatoxin B1, even at extremely low doses, causes cancer of the liver and in addition it is genotoxic;
Whereas this constitutes a serious threat to public health within the Community and it is imperative to adopt urgently protective measures at Community level;
Whereas, in the absence, at this time, of sanitary guarantees from the Iranian authorities, it is necessary to suspend imports of pistachios and certain products derived from pistachios originating in or consigned from Iran
5
Pre-processed text
6
Keywords (associates) in text
Keywords occurring in the text
Histogram
7
Eurovoc descriptors assigned
Descriptors assigned manually
Descriptors assigned automatically
8
Browser online demo
  • Resolution on human rights in Ethiopia
  • Commission Decision (import of meat products)

9
Export of the results
  • XML file containing the assignment

<assignment>
<descriptor ID="1006020102000000" COSINE="0.20" OKAPI="8.83">PRESIDENCY OF THE EC COUNCIL</descriptor>
<descriptor ID="1016030000000000" COSINE="0.17" OKAPI="9.08">EUROPEAN UNION</descriptor>
<descriptor ID="1006040100000000" COSINE="0.15" OKAPI="9.63">PRESIDENT</descriptor>
<descriptor ID="2826020000000000" COSINE="0.14" OKAPI="7.82">SOCIAL POLICY</descriptor>
<descriptor ID="1011020102000000" COSINE="0.14" OKAPI="8.22">PRINCIPLE OF SUBSIDIARITY</descriptor>
...
</assignment>
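
A minimal sketch of how such an export could be read in Python. It assumes the file is well-formed XML with an assignment root element and descriptor elements carrying ID, COSINE and OKAPI attributes, as on the slide. The helper name read_assignment and the decision to sort by the Okapi score are illustrative assumptions, not part of the presented system.

```python
import xml.etree.ElementTree as ET

# Two descriptors copied from the slide; the remaining ones are omitted here.
EXPORT = """<assignment>
<descriptor ID="1006020102000000" COSINE="0.20" OKAPI="8.83">PRESIDENCY OF THE EC COUNCIL</descriptor>
<descriptor ID="1016030000000000" COSINE="0.17" OKAPI="9.08">EUROPEAN UNION</descriptor>
</assignment>"""


def read_assignment(xml_text):
    """Return (id, label, cosine, okapi) tuples, one per descriptor.

    Hypothetical helper: the attribute names follow the slide, everything
    else is an assumption made for illustration.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for d in root.iter("descriptor"):
        label = (d.text or "").strip()
        rows.append((d.get("ID"), label, float(d.get("COSINE")), float(d.get("OKAPI"))))
    return rows


rows = read_assignment(EXPORT)

# Sorting by Okapi score is an illustrative choice only.
for desc_id, label, cosine, okapi in sorted(rows, key=lambda r: r[3], reverse=True):
    print(desc_id, label, cosine, okapi)
```

Keeping the descriptor ID as a string preserves the full 16-digit code exactly as it appears in the export.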
10
Validation interface: overview
This text was previously indexed with this descriptor
"Financial Instrument for Fisheries Guidance" is in the text
"fish", "fisherman", "fishery", "conservation" and "fishery_resources" are in the text
11
Validation interface: example of good assignment
12
Validation interface: example of bad assignment on a small text
Strangely, Austria was manually assigned
UN convention was manually assigned
13
Validation interface: other example of bad assignment...
14
Evaluation method
  • A test set is built
    • Not used for training
    • Should be representative
  • After training, we automatically compare the
    manually assigned descriptors with the
    automatically assigned ones
    • Depending on the rank (number of descriptors)
    • Depending on the various parameters and formulae
  • Use precision/recall
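
The comparison described above can be sketched as follows for a single test document. This is an illustration under stated assumptions, not the system's actual code: the function name precision_recall_at_rank and the descriptor labels are made up, and averaging over the whole test set is left out.

```python
def precision_recall_at_rank(manual, automatic, rank):
    """Compare the top `rank` automatically assigned descriptors
    with the manually assigned ones for one document.

    `automatic` is assumed to be ordered from best to worst score.
    """
    top = automatic[:rank]
    manual_set = set(manual)
    hits = sum(1 for d in top if d in manual_set)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(manual_set) if manual_set else 0.0
    return precision, recall


# Hypothetical descriptor labels for one test document (not real data).
manual = ["import restriction", "Iran", "public health", "nut"]
automatic = ["Iran", "foodstuff", "public health", "trade policy", "import restriction"]

for rank in (1, 3, 5):
    p, r = precision_recall_at_rank(manual, automatic, rank)
    print(rank, round(p, 3), round(r, 3))
```

As the rank grows, more automatic descriptors are considered, so recall can only stay the same or rise while precision typically falls.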

15
Evaluation results
  • Use precision/recall (here English)

Rank  Precision  Recall  Prec RT  Rec RT  F1-measure
1     76.418     14.560  82.890   15.886  24.45
2     68.617     25.800  76.463   28.051  37.49
3     61.820     34.531  71.011   37.823  44.31
4     57.114     42.279  67.509   45.850  48.58
5     51.489     47.496  63.209   51.279  49.41
6     47.015     51.843  59.427   55.776  49.31
7     43.364     55.687  56.130   59.453  48.75
8     40.027     58.660  53.147   62.378  47.58
9     36.879     60.782  50.325   64.738  45.90
The F1-measure combines precision and recall (harmonic mean)
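
As a small check, the harmonic mean F1 = 2PR / (P + R) can be recomputed from the Precision and Recall columns of the English table. The sketch below uses three rows copied from that table; the function name f1 is an illustrative choice. The recomputed values agree with the reported F1-measure column to within about 0.01, which suggests the reported figures were truncated or derived from unrounded inputs.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (rank, precision, recall, reported F1) copied from the English results table.
ROWS = [
    (1, 76.418, 14.560, 24.45),
    (5, 51.489, 47.496, 49.41),
    (9, 36.879, 60.782, 45.90),
]

for rank, p, r, reported in ROWS:
    print(rank, round(f1(p, r), 3), reported)
```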
16
Evaluation interface
Graph showing precision/recall/F-measure
depending on the number of descriptors
17
Results across languages
With pre-processing (French: only stop words)
Without pre-processing
18
Validation: expert judgment
  • Expert judgment on automatic assignment
    • G: good descriptor
    • NT: good, but a narrower term (NT) would have been better
    • BT: good, but its broader term (BT) would have been better
    • ?: unknown / not possible to make a judgment in the time available
    • B: clearly bad
    • S: semantically related, but wrong

19
Manual Evaluation of the Assignment
20
Manual Evaluation
Manual Evaluation - Overview
Manual Evaluation of manual assignment
21
Manual Evaluation of Automatic Assignment
  • Correct descriptors compared to benchmark of
    manual assignment
  • English 65 / 78 83
  • Spanish 69 / 87 80

22
Online demo
  • Validation list of texts to be validated
  • Some examples
  • In French
  • Already validated in Spanish
  • With the manual assignment