Title: Multilingual Cataloguing of Product Information of Specific Domains: Case Mkbeem System
1 Multilingual Cataloguing of Product Information of Specific Domains: Case Mkbeem System
- Aarno Lehtola, Jarno Tenni and Tuula Käpylä
- VTT Information Technology
- Contents
  - Motivation
  - Mkbeem in a nutshell
  - Multilingual Cataloguing Tool
  - Meaning extraction
  - Experiences of test users
  - Future
  - DEMO
2 Online Language Challenges for eCommerce
Native English speakers comprise less than 9% of the world population.
"If I'm selling to you, I speak your language. If I'm buying, dann müssen Sie Deutsch sprechen [then you must speak German]." (Willy Brandt)
Ref: Global Reach, http://www.glreach.com/
3 An Answer: MKBEEM and Multilingual eCommerce Mediation
[Diagram labels: MKBEEM Mediation System; Monolingual CP/SP; CP/SP User; CP/SP eCom Service; Customer; Multilingual cataloguing: write once, publish many; Customer-language information retrieval and trading; Transactions with contract adaptation]
- Language adaptation via automatic HL translation and interpretation
- Natural dialogues combining HL navigation
- Harmonised ontologies enabling localised views to products and trading contracts
- Generic solutions proved by trials in Finnish, French and English in the domains of travel and mail-order sales
- More information: www.mkbeem.com
- EC FP 5 IST/HLT project in 2000-2002, budget 4,9 M
- Goal: Develop intelligent knowledge-based key components (HLP, KRR) for applications in multilingual eCommerce
4 Generic Architecture of Mkbeem
[Architecture diagram. Labels recoverable from the figure: Customer; Content/Service Provider (CP); User Interface; CP Interface; CP E-Commerce platform; CP Information System; Human Language Processing Server; Domain Ontology Server; Trading Ontology Server; Rational Agent; MKBEEM Manager System; Agent Manager; Manager Interface; several Agent boxes on the User and CP sides]
5 Mkbeem: Bridging Languages via Language-Neutral Ontologies
[Diagram labels: Meaning extraction, Machine translation, Dialogue processing, ...; Product Model; Material Ontology; Colour Ontology; Multilingual Product Data]
Extracting Product Properties
- Input: "Toppatakki. Muhkea malli, olkapäissä vahvikkeet. Painonapeilla kiinnitetty huppu, jossa joustava nyöri. Vetoketjun alla suojalista. Kaksi kannellista taskua..."
- Result: "Toppatakki. Muhkea malli..." "Quilted jacket. Puffy model with reinforcements on the shoulder..." jacket(X,quilted_jacket), model(X,puffy), part(X,Y,sleeves), property(Y,Z,reinforcement)...
User Information Request Proc.
- Request: A brown jacket made of natural material
- Ontological Formula in CARIN: (c_colour)(X), (r_name)(X,brown), (c_product)(Y), (r_name)(Y,jacket), (c_material)(Z), (r_name)(Z,nat_mat).
- Reply: 14 products found 1. Beige winterjacket of wool 2. Ochre quilted jacket of cotton ... Any further requirements?
- Follow-up request: One with a hood
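The brown-jacket request on this slide can be illustrated with a minimal sketch of ontology-based matching. Everything in it is an assumption made for illustration: the tiny child-to-parent ontologies, the three catalogue records and the function names are hypothetical and are not taken from the Mkbeem system. It only shows why a request for "brown" and "natural material" can return a beige wool jacket.

```python
# Hypothetical mini-ontologies: each concept maps to its more general parent.
ONTOLOGIES = {
    "colour": {"beige": "brown", "ochre": "brown", "navy": "blue"},
    "material": {"wool": "nat_mat", "cotton": "nat_mat", "polyester": "synth_mat"},
    "product": {"winterjacket": "jacket", "quilted_jacket": "jacket"},
}

# Hypothetical catalogue records, already reduced to language-neutral concepts.
CATALOGUE = [
    {"id": 1, "product": "winterjacket", "colour": "beige", "material": "wool"},
    {"id": 2, "product": "quilted_jacket", "colour": "ochre", "material": "cotton"},
    {"id": 3, "product": "quilted_jacket", "colour": "navy", "material": "polyester"},
]

# The request "A brown jacket made of natural material" as slot constraints.
QUERY = {"product": "jacket", "colour": "brown", "material": "nat_mat"}


def subsumes(ontology, general, specific):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = ontology.get(specific)
    return False


def search(query, catalogue):
    """Keep the items whose every slot value falls under the requested concept."""
    return [
        item
        for item in catalogue
        if all(subsumes(ONTOLOGIES[slot], wanted, item[slot])
               for slot, wanted in query.items())
    ]


matches = search(QUERY, CATALOGUE)
```

Here `matches` holds the beige wool and ochre cotton jackets, while the navy polyester one is rejected on both colour and material.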
6 Mkbeem Multilingual Cataloguing Tool
- Starting point
  - The new product belongs to the supported product domains
  - A textual product description in one of the supported languages and a photograph are available
- Basic functionalities
  - Text checking
  - Property extraction
  - Product Categorisation
  - Machine Translation
  - NL Query Processing
- Technical key challenge
  - Formalising the relationship between ontologies and HL, and
  - Extracting the meaning of input HL texts with respect to the provided ontologies into the form of Ontological Formulas
7 Meaning Extraction Example in Clothing Domain
- Long skirt with cargo pockets
- Jupe longue avec des poches battle-dress
- Pitkä hame, jossa reisitaskut
Concept Bindings
- (c_MKBEEM81007clothingProduct)(H6641),
- (r_name)(H6641,H6989),
- (c_MKBEEM83383property)(H6552),
- (r_name)(H6552,H6889),
- (c_MKBEEM81011part)(H6730),
- (r_name)(H6730,H7295),
Linguistic Dependencies
- (l_dependency)(H6989,adjAttr,H6889),
- (l_dependency)(H6989,prepAttr,H7295),
Lexical info
- (l_constituent)(H6889,0,long,en,long,adj,nom,sg,property),
- (l_constituent)(H6989,1,skirt,en,skirt,noun,nom,sg,product),
- (l_constituent)(H7295,4,cargopockets,en,cargopocket,noun,nom,pl,prodpart)
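To make the three layers of the formula easier to see, the following sketch parses the skirt example into atoms and groups them. It assumes that every atom has the shape `(predicate)(arg1,arg2,...)` as printed on the slide; the function names and the grouping keys are hypothetical.

```python
import re

# Assumption: an atom is "(predicate)(comma-separated arguments)".
ATOM = re.compile(r"\((\w+)\)\(([^)]*)\)")

FORMULA = """
(c_MKBEEM81007clothingProduct)(H6641), (r_name)(H6641,H6989),
(c_MKBEEM83383property)(H6552), (r_name)(H6552,H6889),
(c_MKBEEM81011part)(H6730), (r_name)(H6730,H7295),
(l_dependency)(H6989,adjAttr,H6889),
(l_dependency)(H6989,prepAttr,H7295),
(l_constituent)(H6889,0,long,en,long,adj,nom,sg,property),
(l_constituent)(H6989,1,skirt,en,skirt,noun,nom,sg,product),
(l_constituent)(H7295,4,cargopockets,en,cargopocket,noun,nom,pl,prodpart)
"""


def parse_formula(text):
    """Split a formula into (predicate, argument-tuple) pairs."""
    return [
        (pred, tuple(arg.strip() for arg in args.split(",")))
        for pred, args in ATOM.findall(text)
    ]


def group_atoms(atoms):
    """Sort atoms into the three layers named on the slide."""
    groups = {"concept_bindings": [], "dependencies": [], "lexical_info": []}
    for pred, args in atoms:
        if pred == "l_dependency":
            groups["dependencies"].append((pred, args))
        elif pred == "l_constituent":
            groups["lexical_info"].append((pred, args))
        else:  # c_* concept atoms and r_* role atoms
            groups["concept_bindings"].append((pred, args))
    return groups


atoms = parse_formula(FORMULA)
groups = group_atoms(atoms)
```

The `H...` identifiers link the layers: `H6989` is named by the product concept, is the head of both dependencies, and is the constituent carrying the lemma `skirt`.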
8 VTT's implementation of HLP Services in Mkbeem
Functions
- checkText: HL string -> OK or correction
- extractMeaning: HL string -> Ontologic Formula
- translateText: HL string -> Translated string
[Diagram labels: Linguistic Services; Meaning Extractor; Unifier Text Correction S/W; Webtran MT System; Webtran Dependency Parser; Verification; Concept Bindings; Linguistic Ontology; Cone Onto S/W Inference; KBs]
ALEs for MT (965 between Finnish, French and English)
9 Augmented Lexical Entries
- Augmented Lexical Entries (ALE) rules (see MT Summit 99)
  - Bilingual or multilingual non-directed entries representing phrase and sentence structures and possibly their translation relations.
  - Both surface form entries and generalised rules
  - Possible to declare multidirectional entries
  - Declarative and intuitive formalism, to be used by translators
  - Uniform way of representing phenomena on different levels of language
  - Designed to be suitable for automated or machine supported language modelling (see SMC 99 paper on learning translation grammars)
  - Can be viewed as a forest of partial dependency parse trees
  - A close relationship to the corresponding conceptual structures can be obtained (concept bindings to ontologies)
- Lexicon
  - All the allowed words
  - Monolingual and bilingual entries
10 Meaning Extraction: A Product Ontology with ALEs Embedded
11 Syntax of ALEs
- augmented_lexical_entry entry_name pattern.. opt_message opt_repair
- entry_name name . number_index
- name hierarchical_name_w_dots_betw_parts
- pattern opt_language_id constituent_def..
- opt_message e message string_w_opt_binding
- opt_repair e repair string_w_opt_binding
- constituent_def constituent_def
- constituent_def constituent_def..
- constituent_def < constituent_def.. >
- constituent_def opt_regent_mark opt_lexeme opt_binding opt_feature_constraint
- opt_language_id e ISO_std_lang_identifier ISO_std_lang_identifier
- ISO_std_lang_identifier ee en fi fr se Å
- opt_regent_mark e
- opt_lexeme e lexeme tag name
- opt_binding e binding
- opt_feature_constraint e feature..
- binding ( variable_name ) ()
- feature feature_value property_type binding
12 Examples of ALEs - 1/3
footwear.word.27
  se allväderskänga
  fi jokasäänkenkä
  en all weather shoe
- Basic word correspondence definition
price.tax.4
  se inkl. moms tag_price(X)
  fi sis. alv tag_price(X)
  en incl. VAT tag_price(X)
- Specific idiom correspondence
cloth.material.composition
  fi (A)clothProd tag_percentage(X) (B)textileMaterial ptv
  fr (A)clothProd en tag_percentage(X) (B)textileMaterial
  en (A)clothProd of tag_percentage(X) (B)textileMaterial
  se (A)clothProd av tag_percentage(X) (B)textileMaterial
- Generalised ALE, e.g. "shirt of 100% cotton"
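The non-directed nature of these entries can be sketched as follows. This is not Webtran's implementation: the dictionary layout, the `match` and `translate` helpers and the rule that a `tag_...(X)` slot accepts any single token are all assumptions made for illustration. The sketch only shows how one entry can serve translation in either direction by binding a variable on the source side and re-emitting it on the target side.

```python
import re

# Hypothetical in-memory form of two entries from the slide:
# entry name -> language -> list of pattern tokens.
ALES = {
    "footwear.word.27": {
        "se": ["allväderskänga"],
        "fi": ["jokasäänkenkä"],
        "en": ["all", "weather", "shoe"],
    },
    "price.tax.4": {
        "se": ["inkl.", "moms", "tag_price(X)"],
        "fi": ["sis.", "alv", "tag_price(X)"],
        "en": ["incl.", "VAT", "tag_price(X)"],
    },
}

# Assumption: a slot such as tag_price(X) matches exactly one token.
SLOT = re.compile(r"^tag_\w+\((\w+)\)$")


def match(pattern, tokens):
    """Return the variable bindings if tokens fit the pattern, else None."""
    if len(pattern) != len(tokens):
        return None
    bindings = {}
    for pat, tok in zip(pattern, tokens):
        slot = SLOT.match(pat)
        if slot:
            bindings[slot.group(1)] = tok
        elif pat.lower() != tok.lower():
            return None
    return bindings


def translate(text, src, dst):
    """Translate a phrase with the first entry whose src pattern matches."""
    tokens = text.split()
    for entry in ALES.values():
        if src not in entry or dst not in entry:
            continue
        bindings = match(entry[src], tokens)
        if bindings is None:
            continue
        out = []
        for pat in entry[dst]:
            slot = SLOT.match(pat)
            out.append(bindings[slot.group(1)] if slot else pat)
        return " ".join(out)
    return None  # no entry covers the phrase
```

For example, `translate("inkl. moms 199", "se", "fi")` uses `price.tax.4` from Swedish to Finnish, and `translate("all weather shoe", "en", "se")` uses `footwear.word.27` in the opposite direction without a separate rule.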
13 Examples of ALEs - 2/3
cloth.property.1
  se (A)adj clothProp gender(B) number(B) (B)noun clothProd
  fi (A)adj clothProp case(B) number(B) (B)noun clothProd
  en (A)adj clothProp (B)noun clothProd
- Semantical and grammatical restrictions, e.g. agreement in "miellyttävä pusero" or "miellyttävää puseroa" (comfortable blouse)
property.expr.1
  se (A)adj prop gender() number()
  fi (A)adj prop number() case()
  en (A)adj prop
property.expr.2
  property.expr.2 tag_comma property.expr.3
property.expr.3
  property.expr.1 conjAND property.expr.1
- An iterative phrase; note the tree flattening
cloth.property.2
  se property.exprclothProp (B)cloth
  fi property.exprclothProp (B)cloth
  en property.exprclothProp (B)cloth
14 Examples of ALEs - 3/3
- Negative Instances - Correction ALEs
correct.ellos.3
  se kardborrstängning(A)
  se kardborreförslutning(A)
  se kardborrknäppning(A)
  se kardborreknäppning(A)
  message Use the correct synonym kardborrestängning instead of word(A)
  repair kardborrestängning(A)
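A correction entry of this kind can be read as "if any of these negative instances occurs, report the message and substitute the repair". The sketch below illustrates that reading under stated assumptions: the dictionary layout, the `check_text` name and the token-by-token matching are hypothetical and do not reproduce the actual text-correction software.

```python
# Hypothetical in-memory form of the correction entry shown on the slide.
CORRECTION_ALES = [
    {
        "name": "correct.ellos.3",
        "lang": "se",
        "negative": {
            "kardborrstängning",
            "kardborreförslutning",
            "kardborrknäppning",
            "kardborreknäppning",
        },
        "message": "Use the correct synonym kardborrestängning instead of {word}",
        "repair": "kardborrestängning",
    },
]


def check_text(text, lang):
    """Return (repaired text, list of messages); an empty list means OK."""
    messages = []
    repaired = []
    for token in text.split():
        replacement = token
        for ale in CORRECTION_ALES:
            if ale["lang"] == lang and token.lower() in ale["negative"]:
                # word(A) in the entry is bound to the offending token.
                messages.append(ale["message"].format(word=token))
                replacement = ale["repair"]
                break
        repaired.append(replacement)
    return " ".join(repaired), messages


fixed, notes = check_text("jacka med kardborrknäppning", "se")
```

Text that already uses the approved term passes through unchanged with an empty message list, which corresponds to the "OK" outcome of text checking.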
15 Meaning Extraction Process
Input phrase
-> Syntactico-semantic analysis
-> Set of approved lexical-semantic graphs with concepts identified
-> Inference of CARIN formulas
-> Set of CARIN formulas
16 Meaning Extraction Process Example
Input phrase
- musta hame, jossa halkio ja taskut
- une jupe noire avec fente et poches
- a black skirt with split and pockets
-> Syntactico-semantic analysis
Set of approved lexical-semantic graphs with concepts identified
- (c_MKBEEM81098colour)(H1017), (r_name)(H1017,H641),
- (c_MKBEEM84731clothingProduct)(H984), (r_name)(H984,H684),
- (c_MKBEEM81011part)(H951), (r_name)(H951,H813),
- (c_MKBEEM81011part)(H918), (r_name)(H918,H899),
- (l_dependency)(H684,adjAttr,H641),
- (l_dependency)(H684,prepAttr,H813),
- (l_dependency)(H684,prepAttr,H899),
- (l_constituent)(H641,0,musta,fi,colour,musta,adj,nom,sg),
- (l_constituent)(H684,1,hame,fi,product,hame,noun,nom,sg),
- (l_constituent)(H727,2,tag_comma,fi),
- (l_constituent)(H770,3,jossa,fi,jossa,pron,ine,sg),
- (l_constituent)(H813,4,halkio,fi,prodpart,halkio,noun,nom,sg),
- (l_constituent)(H856,5,ja,fi,conj,ja,coord_c),
- (l_constituent)(H899,6,taskut,fi,prodpart,taskut,noun,nom,pl)
-> Inference of CARIN formulas
Set of CARIN formulas
- (c_product)(H1606), (r_name)(H1606,skirt),
- (c_colour)(H1573), (r_name)(H1573,black),
- (c_part)(H1540), (r_name)(H1540,split),
- (c_part)(H1506), (r_name)(H1506,pocket)
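The last step of the Finnish example, going from the lexical-semantic graph to a language-neutral CARIN formula, can be approximated with two table lookups. This is a deliberately simplified sketch: the real system performs inference against ontology servers, whereas the two mapping tables below are hypothetical stand-ins, and the sketch reuses the graph's node identifiers instead of generating fresh variables as the slide does.

```python
# Concept bindings taken from the slide: (concept, node, node holding the name).
GRAPH_CONCEPTS = [
    ("c_MKBEEM81098colour", "H1017", "H641"),
    ("c_MKBEEM84731clothingProduct", "H984", "H684"),
    ("c_MKBEEM81011part", "H951", "H813"),
    ("c_MKBEEM81011part", "H918", "H899"),
]

# Lemma of each name node, read from the l_constituent atoms on the slide.
GRAPH_LEMMAS = {"H641": "musta", "H684": "hame", "H813": "halkio", "H899": "taskut"}

# Hypothetical lookup tables standing in for ontology inference.
GENERIC_CONCEPT = {
    "c_MKBEEM81098colour": "c_colour",
    "c_MKBEEM84731clothingProduct": "c_product",
    "c_MKBEEM81011part": "c_part",
}
NEUTRAL_NAME = {"musta": "black", "hame": "skirt", "halkio": "split", "taskut": "pocket"}


def infer_carin(concepts, lemmas):
    """Emit one concept atom and one r_name atom per bound concept."""
    atoms = []
    for concept, node, name_node in concepts:
        atoms.append(f"({GENERIC_CONCEPT[concept]})({node})")
        atoms.append(f"(r_name)({node},{NEUTRAL_NAME[lemmas[name_node]]})")
    return ", ".join(atoms)


formula = infer_carin(GRAPH_CONCEPTS, GRAPH_LEMMAS)
```

The result contains the same four concept/name pairs as the slide's CARIN formula (colour black, product skirt, parts split and pocket), with the linguistic dependency and constituent atoms dropped.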
17 Cataloguing Tool Testing by End-Users
- Goals
  - Proof of concept (Swiss army knife of a cataloguer)
  - Usability in real working environment
- Ellos' test group consisted of 8 persons (translators, cataloguers and call-centre workers)
  - familiar with Internet: 5 yes, 1 almost yes, 2 yes at home
  - languages used: 8 Finnish, 6 English, 4 Swedish, 1 French
  - familiar w. catalogue maintenance: 6 yes, 2 no
- Schedule
  - Short training and preliminary interviews on August 30, 2002
  - Interviews of experiences and summary of the results ready by October 14, 2002
18 ... Trial experiences of the Ellos test group
- Cataloguing tool considered to be useful
  - cataloguing process as a whole was seen as an easy and efficient way of producing and classifying product information
  - each of the main features was considered good
  - very important: semi-automatic translation into target languages
  - property extraction and inference with colours and materials seen as important in bringing value-adding services to customers
  - helps in producing consistent and uniform information
  - can make the working process faster and reduce the amount of manual, repeated routine procedures
  - KB management tools considered suitable to their task
- Reported difficulty
  - occasionally long response times bored users, who e.g. repeated their queries
  - e.g. an "hourglass" or provision of partial results could bring quick help
  - will be eventually solved by continued product development
19 MT Part (Webtran) in Production Use at Ellos since 2000
[Diagram labels: Ellos Sweden; Ellos Finland; Mac QuarkXPress (twice); Catalogue author; Cataloguer; SourceDB; LocalisedDB; Swedish; Finnish; Automatic Sw -> Fi Translation; Language Modeller; PC Server; Webtran Machine Translation Software]
About 2000 translated catalogue pages and 10000-15000 product descriptions per year
Benchmark by CSC Inc. reports over 30% time savings after one year of use
Language technology solutions need to be embedded into business processes and IT infrastructure
20 Work Needed for Adding Domain and Languages
- Marginal cost of adding a new domain or a new language is reasonable with respect to the added value gained
- Based on experiences from modelling the vacation cottage domain to the system (fi, fr, en) we have estimated that introducing a comparable new domain would require
  - semantic lexicon: 2 man-months
  - translation and meaning extraction rules: 1 man-month
  - product models: 2-4 man-weeks
- We also estimate that adding a language to a pre-existing domain would need
  - semantic lexicon: 1-2 man-months
  - translation and meaning extraction rules: 2-4 man-weeks
  - product models: 1 man-week
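The estimates above can be summed into overall ranges. The slide mixes man-months and man-weeks without giving a conversion, so the sketch below assumes 4 man-weeks per man-month; that factor and all names in the code are assumptions, and the totals change if a different factor is used.

```python
WEEKS_PER_MONTH = 4  # assumption: not stated on the slide

# (low, high, unit) per work item, copied from the slide.
NEW_DOMAIN = {
    "semantic lexicon": (2, 2, "month"),
    "translation and meaning extraction rules": (1, 1, "month"),
    "product models": (2, 4, "week"),
}
NEW_LANGUAGE = {
    "semantic lexicon": (1, 2, "month"),
    "translation and meaning extraction rules": (2, 4, "week"),
    "product models": (1, 1, "week"),
}


def to_months(value, unit):
    """Convert an estimate to man-months using the assumed factor."""
    return value / WEEKS_PER_MONTH if unit == "week" else float(value)


def total_range(estimate):
    """Sum the low and high ends of every item, in man-months."""
    low = sum(to_months(lo, unit) for lo, _, unit in estimate.values())
    high = sum(to_months(hi, unit) for _, hi, unit in estimate.values())
    return low, high


domain_total = total_range(NEW_DOMAIN)
language_total = total_range(NEW_LANGUAGE)
```

Under this assumption a comparable new domain comes to 3.5 to 4 man-months and an additional language to 1.75 to 3.25 man-months.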
21 Future Development Recommendations
- Further development could focus on the following issues
  - information request processing dialogues
  - question answering capabilities (e.g. qualitative questions about the goods selection)
  - proper way of handling null queries (e.g. graceful relaxation of the search constraints based on the ontology models and the actual goods selection)
  - new languages to the system: Russian, Norwegian, Estonian, German ...
  - user-friendlier ways for the acquisition and maintenance of language models and product models (knowledge acquisition bottleneck): machine learning
  - special requirements of mobile terminals (e.g. automatic text abstraction)
22 DEMO