Description: TALP. Linguistic and Logical Tools for an Advanced Interactive Speech System in Spanish ... (HORA-SALIDA) CIUDAD-ORIGEN: Guadalajara. CIUDAD-DESTINO: Cáceres ...

Learn more at: https://www.cs.upc.edu

1
Linguistic and Logical Tools for an Advanced
Interactive Speech System in Spanish
J. Álvarez, V. Arranz, N. Castell and M. Civit
TALP Research Centre, UPC, Barcelona
2
Contents
  • Introduction
  • Corpora Construction
  • System Architecture
  • Understanding Module
  • Input Problems and Solutions Adopted
  • Language Processing
  • Morphology
  • Syntax
  • Semantic Extraction
  • Dialogue Manager
  • Conclusions

3
Introduction
  • Increasing need for more natural HMI
  • Development of a dialogue system
    • spontaneous speech
    • restricted domain: railway information
    • rather user-friendly communication exchange
    • language of application: Spanish
  • Other related systems
    • ATIS, TRAINS, LIMSI ARISE, TRINDI, ...

4
Corpora Construction
  • Project objective: none available in Spanish
  • Two different corpora developed
    • human-human
    • human-machine (Wizard of Oz technique)
  • 150 different situations
  • an open scenario
  • total of 227 dialogues

5
System Architecture
6
Understanding Module
7
Input Problems (1)
  • Recognition errors
    • Excess of information
      • U: sábado treinta de octubre (Saturday, October 30)
      • R: un tren que o sábado treinta de octubre (a train that or...)
    • Erroneous recognition
      • U: gracias (thank you)
      • R: sí pero ellos (yes but they)

8
Input Problems (2)
  • Grammar errors
    • Lack of prep+det contractions: de el → del
    • Wrong use of indefinite determiner: un de octubre → uno de octubre (1 October)
    • Wrong orthographical transcriptions: qué/que, a/ha, ... (what/that, to/has, ...)
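
The first two grammar errors above are regular enough to repair with pattern substitution before parsing. The following is a minimal sketch, not the method used in the project: the function name `repair`, the month list and the assumption that the recogniser output is lower-case text are all illustrative. The qué/que and a/ha confusions need context to resolve and are deliberately left out.

```python
import re

# Assumption: recogniser output is lower-case plain text, as in the slide examples.
MONTHS = ("enero|febrero|marzo|abril|mayo|junio|julio|agosto|"
          "septiembre|octubre|noviembre|diciembre")

# Hypothetical (pattern, replacement) pairs derived from the two slide examples.
REPAIRS = [
    # Missing prep+det contraction: "de el" -> "del"
    (re.compile(r"\bde el\b"), "del"),
    # Indefinite determiner used as a day number: "un de octubre" -> "uno de octubre"
    (re.compile(rf"\bun (de (?:{MONTHS}))\b"), r"uno \1"),
]


def repair(text: str) -> str:
    """Apply each repair pattern in order and return the corrected text."""
    for pattern, replacement in REPAIRS:
        text = pattern.sub(replacement, text)
    return text


print(repair("el tren de el sábado un de octubre"))
```

The word-boundary anchors keep the first pattern from firing inside words such as "donde" or before the pronoun "él".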

9
Input Problems (3)
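  • Problems caused by spontaneous speech
    • Syntactic disfluencies
      • U: a ver los horarios de los trenes que van de
        Teruel a Barcelona el este próximo viernes y que
        vayan de Barcelona a Teruel el próximo que
        vuelvan de Barcelona a Teruel el próximo domingo
        (let's see the timetables of the trains that go from
        Teruel to Barcelona the this coming Friday and that
        go from Barcelona to Teruel the next that
        come back from Barcelona to Teruel next Sunday)
    • Lexical disfluencies, pauses, noises, ...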

10
Solutions
  • Adapting the recogniser to the domain
  • Adapting the recogniser to spontaneous speech
  • Adapting the understanding module
  • Closing the entry channel

11
Tools
  • MACO: Morphological Analyzer Corpus Oriented
    [Carmona et al., 98]
  • RELAX: Relaxation Labelling Based Tagger
    [Padró, 97]
  • TACAT: Tagged Corpus Analyzer Tool
    [Castellón et al., 98]
  • PRE: Production Rule Environment [Turmo, 99]

12
Example
  • User turn
  • Me gustaría información sobre trenes de
    Guadalajara a Cáceres para la primera semana de
    agosto
  • (I would like some information about trains
    from Guadalajara to Caceres for the first week of
    August)

13
Language Processing
Transcription
14
Morphology (1)
  • MACO
    • contains knowledge organised into classes and
      inflection paradigms
    • uses a task/domain lexicon: less ambiguity and
      better execution time
    • provides all possible labels per word
  • RELAX
    • disambiguates the labels obtained
    • is constraint-based, with relaxation labelling

15
Morphology (2)
me yo PP1CSO00
gustaría gustar VMCP1S0
información información NCFS000
sobre sobre SPS00
trenes tren NCMP000
de de SPS00
Guadalajara guadalajara NP000C0
a a SPS00
Cáceres cáceres NP000C0
para para SPS00
la la TDFS0
primera primero MOFS00
semana semana NCFS000
de de SPS00
agosto agosto NCMS000
. . Fp
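
The tagger output above is a whitespace-separated stream of word form, lemma and tag. As a rough illustration of how a downstream step might consume it, the sketch below groups the stream into triples. The names `Token` and `parse_triples` are hypothetical and not part of MACO or RELAX; reading a tag that starts with `NP` as a proper noun is an assumption based on the two city names in the example.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str   # word as it appears in the utterance
    lemma: str  # lemma assigned by the analyser
    tag: str    # morphosyntactic tag selected by the tagger


# The tagged user turn from the slide, copied as a single stream.
TAGGED = (
    "me yo PP1CSO00 gustaría gustar VMCP1S0 información información NCFS000 "
    "sobre sobre SPS00 trenes tren NCMP000 de de SPS00 "
    "Guadalajara guadalajara NP000C0 a a SPS00 Cáceres cáceres NP000C0 "
    "para para SPS00 la la TDFS0 primera primero MOFS00 "
    "semana semana NCFS000 de de SPS00 agosto agosto NCMS000 . . Fp"
)


def parse_triples(stream: str) -> List[Token]:
    """Group a whitespace-separated stream into (form, lemma, tag) tokens."""
    fields = stream.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected a multiple of three fields")
    return [Token(*fields[i:i + 3]) for i in range(0, len(fields), 3)]


tokens = parse_triples(TAGGED)
# Assumption: tags beginning with "NP" mark proper nouns.
proper_nouns = [t.form for t in tokens if t.tag.startswith("NP")]
print(proper_nouns)
```
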
16
Syntax (1)
  • TACAT
  • shallow parser
  • context-free grammar adapted for the domain
  • rules re-written for dates, timetables and proper
    names
  • bottom-up strategy
  • this adaptation helps semantic searches

17
Syntax (2)
pos=>S
pos=>patons
  pos=>pp1cso00, forma=>"Me", lema=>"yo"
pos=>grup-verb
  pos=>vmcp3s0, forma=>"gustaría", lema=>"gustar"
pos=>sn
  pos=>ncfs000, forma=>"información", lema=>"información"
pos=>grup-sp
  pos=>sps00, forma=>"sobre", lema=>"sobre"
pos=>sn
  pos=>ncmp000, forma=>"trenes", lema=>"tren"
pos=>grup-sp
  pos=>sps00, forma=>"de", lema=>"de"
pos=>sn
  pos=>np000c0, forma=>"Guadalajara", lema=>"Guadalajara"
pos=>grup-sp
  pos=>sps00, forma=>"a", lema=>"a"
pos=>sn
  pos=>np000c0, forma=>"Cáceres", lema=>"Cáceres"
pos=>grup-sp
.........
18
Semantic Extraction (1)
Aim: generation of semantic frames
19
Semantic Extraction (2)
  • System implemented in PRE
  • PRE
    • production rule environment
    • very flexible and robust
    • rule conditions contain the syntactic patterns and
      lexical items to search for
    • priority, score and control make it possible to
      specify rule application, the location of the
      concept to extract, ...

20
Semantic Extraction (3)
(rule CiudadOrigen3 ruleset CiudadOrigen
  priority 10 score 0,_,1,0
  control forever ending Postrule
  (InputSentence tree <a>tree_matching(
      pos=>grup-sp
        lema=> de|desde
        pos=> np000c0, forma=>?forma
  ))
  ->
  (?_ Print(CiudadOrigen,?forma))
  (?_ REM(CiudadOrigen,X,a)))
21
Understanding Module
22
Dialogue Manager (1)
  • Implemented using YAYA [Álvarez, 00]
  • Reasoning engine combines
    • frames from the understanding module, with
    • facts from the dialogue history, and with
    • axioms
  • in order to generate
    • reaction facts from the system
  • Output based on frames
    • for the natural language generator (content)
    • for the recogniser (Speech Act prediction)

23
Dialogue Manager (2)
Output Frame
Sentence to generate: De Guadalajara a Cáceres,
¿qué día desea viajar? (From Guadalajara to
Cáceres, when do you wish to travel?)
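
The step from an incoming frame to a system question can be pictured as slot filling: merge the new frame with what the dialogue history already holds, find the first slot still missing, and ask for it. The sketch below is a plain Python stand-in for that idea, not YAYA and not the project's reasoning engine. The slot names other than CiudadOrigen, the question templates and the functions `react` and `render` are all hypothetical.

```python
# Hypothetical slot names and question templates, for illustration only.
REQUIRED_SLOTS = ["CiudadOrigen", "CiudadDestino", "DiaSalida"]

QUESTIONS = {
    "CiudadOrigen": "¿desde qué ciudad desea salir?",
    "CiudadDestino": "¿a qué ciudad desea viajar?",
    "DiaSalida": "¿qué día desea viajar?",
}


def react(frame, history):
    """Combine the new frame with the history and decide the next system act."""
    known = {**history, **frame}
    for slot in REQUIRED_SLOTS:
        if slot not in known:
            return {"act": "ask", "slot": slot, "known": known}
    return {"act": "answer", "known": known}


def render(reaction):
    """Turn an 'ask' reaction into a sentence, echoing the route if it is known."""
    if reaction["act"] != "ask":
        return ""
    known = reaction["known"]
    prefix = ""
    if "CiudadOrigen" in known and "CiudadDestino" in known:
        prefix = f"De {known['CiudadOrigen']} a {known['CiudadDestino']}, "
    return prefix + QUESTIONS[reaction["slot"]]


reaction = react({"CiudadOrigen": "Guadalajara", "CiudadDestino": "Cáceres"}, {})
print(render(reaction))
```

Echoing the known route in the question gives the user a chance to correct a recognition error before the dialogue moves on.
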
24
Conclusions
  • Corpus development: valuable resource
  • Adaptation of general NLP tools for
    • domain
    • spontaneous speech dialogue
  • Development of new tools
    • semantic extraction (use of PRE): flexible and
      robust
    • dialogue manager (use of YAYA): fast to develop,
      easy to modify
  • Challenge: processing in real time
