Title: A Newbie Experience of Dialogue System Construction Using the Ravenclaw Framework
1A Newbie Experience of Dialogue System
Construction Using the Ravenclaw Framework
2Introduction
- Do you know?
- Arthur Chan actually takes classes at CMU!
- Course he took this year
- Project Course Dialogue Systems
- The course required the use of Ravenclaw/Olympus
- A journal was kept on what I learned in the process
- Requested by gang members such as Dan and Thomas
3Speaker's Bio
- Mainly a speech recognition guy
- i.e. the part that transforms speech to text
- Not very experienced in dialogue systems
- Only worked on directed dialogue systems
- Speechwork 6.5
- i.e. an all-in-one dialogue system and speech recognizer
- Dialogues are modularized
- E.g. Digits, Alphabets, ZipCode
4What did we do this year?
- 3 systems by 3 groups
- RoadFinder: Aaron, Dave and Wen
- ICSLPInfo: Arthur, Lingyun and Rohit
- Extension of Vera: Mohit, Kaimin and ?
- The actual situation
- Dave did most of the stunts
- Each group had one person just to take care of development kick-start and system issues
- The mailing list became the means of collaboration
5New Development
- Sphinx3_Engine
- With Sphinx 3.6 RCI
- With Powerful Wideband Models (CALO) and Narrowband Models (Communicator)
- LM Training Scripts
- With tools newly built in Project L (CMU-Cambridge LM Toolkit V3)
- IAX_Server
- Allows systems to be used with an Asterisk server(?)
6This talk
- Mini Case Study of ICSLPInfo
- Try to learn what information we could give to users for a conference
- The type of information is unknown
- Two perspectives
- From a new user perspective
- From a developer perspective
7The New User's Perspective
- Generally, as a new user, is it easy to learn Ravenclaw?
- Related Questions
- Do I hate Dan? (Forever? Or even for a moment?)
- Is it scary to use Ravenclaw?
- What do we know/not know at a certain stage?
- What is the general comment on the software?
8The Developer's Perspective
- From a developer's standpoint, what are the issues of development?
- Issues in speech recognition?
- Issues in dialogue system development?
- Issues in general application development?
- Issues in multi-developer development?
- When should we work on SR/RP/DS/BE?
9The Development Process
- Stage 0: Planning, drawing diagrams and stuff
- Stage 1: Making some existing systems run
- Stage 2: Making simple systems run
- Making SR work without the backend
- Making the backend work without the SR
- Stage 3: Making the first end-to-end system run
- (Not covered today) Stage 4: Final adjustment and final demo
10Stage 0: Planning (2-3 weeks)
- Major issue
- The type of useful information could be unknown
- Author?
- Session?
- Title?
- Venue?
- We actually didn't know what was most useful at Stage 0
11Stage 1: Making some existing systems run (1 month)
- Wide variety of pre-built systems using Ravenclaw
- Path 1: Starting from ConvertProj
- ConvertProj is a very simple project
- Path 2: Starting from RoomLine
- Path 3: Starting from scratch
- Path 1 was chosen first so that everyone could get an initial system
12Note in Stage 1
- Not everyone had an easy time getting the initial setup running (1-2 weeks)
- Forgot to install ActivePerl and miscellaneous tools
- At the beginning, didn't know where to debug
- The synthesizer turned out not to be pre-built (1-2 weeks)
- Speech recognizer was not running yet
- Didn't know why at that point.
13If we start from ConvertProj
- How do we write the first system then?
- ConvertProj is very simple but we didn't know what it does
- We didn't understand how Phoenix/Ravenclaw works
- Rohit: "Let us start from Roomline then."
- Turned out to be a very good idea
- Why?
- Roomline is complicated but the learner can learn from the code
- There are also a couple of patterns that could be reused, e.g. for-loop, if-then-else
14Note
- We had already got hold of the description of the Ravenclaw Agent Description Language
- Not a tutorial, no examples
- We didn't know how to start based on it
- That's why a template was needed
- We ended up tracing the whole Roomline system
15Stage 2a: Making a system with working SR
- Our biggest problem: name recognition
- Recognizing 1000 names
- Many of them are Asian names
- No training data
- Dave hadn't built the LM building script yet
- The type of information is not yet set
- Should we handle names?
16Stage 2a: Making a system with working SR (cont.)
- Our first bootstrapping system
- Use Sphinx3_Engine CALO model
- Probably the strongest SR we could use
- Use Roomline language model
- Just tweak the grammar a little bit
- Add a lot of compound words into classes
- Also, only session chairs (180 names) are in the grammar
17The First System (No BE)
[Figure: dialogue task tree of the first system. Node labels: icslpinfo, Reset DateTime, Welcome, Logout, Task, Request Satisfied, Inform Logout, HMIHY]
18Note at Stage 2a
- Finally got something running
- But the system did nothing
- We were still very vague on
- how messages are passed in Galaxy and
- how results are transferred from SR to RP to DM
19Stage 2b: Making the backend work without the SR
- The backend is finally built at this stage
- The backend/DM/RP is working and text console mode is working
- DM now gives the abstract when asked about the author
- But this time, SR fails because
- the grammar accepts too much
- the Roomline LM was used.
20Note at Stage 2b
- Another difficult issue shows up
- SR/RP/DM are very tightly coupled with each other
- Other problems
- Occasionally, is shown in the prompts
- Because some prompts weren't filled in
- Good part
- The first type of information we will handle is finally decided
- This constrains SR
- We start to feel time is running short
21Stage 3: Making the first end-to-end system run
- Speech Recognition
- Retrain LM using faked corpora
- Significantly trimmed down the number of authors to recognize (from 200 to 30)
- Still, only a few author names are easily recognized.
- The lucky ones
- Alan Black
- Arthur Toth
- Julia Hirschberg
- Andrew Rosenberg
- (Alex is not very happy about this. His name is confused with context key)
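The "faked corpora" step above can be sketched roughly as follows. This is a minimal illustration, not the scripts used in the project: the sentence templates are hypothetical, only the four author names come from the slide, and the `<s> ... </s>` sentence markers are an assumption about the input format expected by the LM toolkit.

```python
import itertools

# Hypothetical sentence templates; the real ICSLPInfo templates are not given in the talk.
TEMPLATES = [
    "papers by {author}",
    "what did {author} write",
    "i am looking for papers written by {author}",
]

# Author names taken from the slide; the trimmed list had about 30 names.
AUTHORS = ["alan black", "arthur toth", "julia hirschberg", "andrew rosenberg"]


def fake_corpus(templates, authors):
    """Return one sentence per (template, author) pair, wrapped in <s> ... </s>."""
    return [
        f"<s> {template.format(author=author)} </s>"
        for template, author in itertools.product(templates, authors)
    ]


corpus = fake_corpus(TEMPLATES, AUTHORS)
for sentence in corpus[:3]:
    print(sentence)
```

The resulting lines would be written to a text file and passed to the LM training scripts; trimming the author list keeps the corpus, and so the recognizer's search space, small.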
22Note at this point
- Started to realize that SR couldn't be improved quickly
- The problems of DM start to be glaring
- No disambiguation
- When multiple results are returned, there is no strategy to handle them.
- Also, SR kept failing to recognize things in the grammar.
- A lot of GARBAGE is recognized
- See a lot On Alan Black
23DM
- Allow disambiguation using author name and session name
- Take care of different scenarios of results
- If there are no results,
- Say sorry and restart.
- If there is one result
- Present the details of the paper,
- Then ask whether to present the abstract of the paper
- If there are 5 or fewer results
- Tell the user the number of papers found
- Then ask whether to present the summary of the papers.
- (List of titles of the papers)
- If there are more than 5 results
- Say sorry
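The result-handling strategy on this slide can be sketched in plain Python. This is only an illustration of the branching logic, not Ravenclaw code: the function name, the action labels, and the prompt wording are all hypothetical, and only the thresholds (0, 1, up to 5, more than 5) come from the slide.

```python
MAX_LISTED = 5  # threshold from the slide


def respond(papers):
    """Pick a dialogue action and a prompt from the number of papers found.

    `papers` is a list of dicts with hypothetical keys "title" and "author".
    """
    n = len(papers)
    if n == 0:
        # No results: apologize and restart the dialogue.
        return ("sorry_restart", "Sorry, I found no papers. Let's start over.")
    if n == 1:
        # One result: present the details, then offer the abstract.
        paper = papers[0]
        return (
            "offer_abstract",
            f"I found '{paper['title']}' by {paper['author']}. "
            "Do you want to hear the abstract?",
        )
    if n <= MAX_LISTED:
        # A handful of results: report the count, then offer the list of titles.
        return ("offer_titles", f"I found {n} papers. Do you want to hear the titles?")
    # Too many results to read out.
    return ("too_many", "Sorry, too many papers matched.")


print(respond([]))
```

Keeping the thresholds in one function makes it easy to see that every result count maps to exactly one branch.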
24Other small things We Hacked Out
- Confidence of The Recognizer
- Audio Server is hacked such that
- We are always confident about the results.
- Annoying restarting issue
- Commented out the restarting routine on Windows
25Backend and NLG
- Backend
- (maybe for this demo only)
- SQL-based
- Could do author-search and session-name-search
- NLG
- Fill in all sorts of prompts
- A lot of Implicit Confirmation and Explicit Confirmation are missing
- That caused a lot of in the system
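An SQL-based backend with author search and session-name search might look roughly like the sketch below. The table layout, column names, and rows are placeholders invented for illustration; the talk does not describe the actual schema or database engine, and `sqlite3` is used here only because it ships with Python.

```python
import sqlite3


def make_db():
    """Build an in-memory database with a hypothetical one-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (title TEXT, author TEXT, session TEXT)")
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?, ?)",
        [
            # Placeholder rows, not real conference data.
            ("Placeholder Paper A", "author one", "session x"),
            ("Placeholder Paper B", "author one", "session y"),
            ("Placeholder Paper C", "author two", "session x"),
        ],
    )
    return conn


def search_by_author(conn, author):
    """Return the titles of all papers by the given author, sorted by title."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE author = ? ORDER BY title", (author,)
    )
    return [row[0] for row in rows]


def search_by_session(conn, session):
    """Return the titles of all papers in the given session, sorted by title."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE session = ? ORDER BY title", (session,)
    )
    return [row[0] for row in rows]


conn = make_db()
print(search_by_author(conn, "author one"))
print(search_by_session(conn, "session x"))
```

The length of the returned list is what the DM would branch on when deciding how to present the results.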
26Demo
- Scenario
- A user wants to know information about the papers written by
- Alan Black
- Julia Hirschberg and
- Andrew Rosenberg
- What it shows
- How bad recognition is taken care of now.
- What happens when one answer or multiple answers are returned.
27Note
- Rohit Kummar and Lingyun Gao actually hold the latest and greatest system.
- This system only shows how we built up from ground zero.
28Summary: 3 Difficult Issues in the Task
- 1, Tight coupling of SR/RP/DM
- When one part is right, others could fail
- 2, SR issues
- The SR task could be affected by different constraints.
- The first system is hard to get up and running
- Compounded with 1
- 3, Lack of documentation in DM
- The current documentation base is not strong enough
- Read-and-implement approach doesn't work yet
- Some concepts are difficult to understand
- Say COMPLETE/SUCCEED/FAILED
- GRAMMAR_MAPPING
29Lessons learned
- Iteratively develop the system by bootstrapping each part with simple systems
- This would greatly reduce the pain of coupling
- SR issue
- The first system could be completed with some smaller grammars first
- In some tasks, SR shouldn't be the focus at a certain point.
- Aligned with common observation
- DM Development
- A good working template is necessary
- What we need: for-loop, if-then-else templates
30The bright side 1: birthday gift for Dave
- Once understood, pretty easy to program
- E.g. birthday celebration system
- Sample Dialogue
- S: Do you want to know what's going on?
- U: Yes (or No)
- S: No matter whether you say yes or no, I will have to tell you. Begin message.
- Hmm-hm. Today is Mr. David Huggins-Daines's birthday. Because everyone is too shy to sing the birthday song for him, me, Frank, will have to sing it. Here you go. Happy Birthday to you, Happy Birthday to you. Happy Birthday to David. Happy Birthday to you. This message is brought to you by
- End message
31Bright Side 2
- Compared to a directed dialogue system, the current system could give unexpected results.
- Why?
- Several sub-systems of the dialogue system are working together
- Built-in libraries
- Grounding
- Focuses
- Developer-defined libraries
- It is delightful to use it in general
32Bright Side 3
- Source code has consistent coding style
- Development problems will mainly stem from
- 1, Lack of automatic regression test
- 2, Lack of central manager
- Not a bad thing in dialogue system if
developer/system 1
33Conclusion
- Summarized how the end-to-end system of ICSLPInfo was first developed
- Discussed several issues including
- Coupling of systems
- SR
- DM development
- Overall speaking
- Thrilled when getting the system running and working