Title: Pragmatic Influences on Sentence Planning and Surface Realization: Implications for Evaluation
1Pragmatic Influences on Sentence Planning and Surface Realization: Implications for Evaluation
- Amanda Stent
- Stony Brook University
2Things I've Noticed
- Different basic approaches
- Text-to-text vs. KR-to-text
- Corpus linguistics vs. systems engineering vs. empirical research
- Evaluation vs. experimentation
3Three Possibilities
- Shared evaluation resources
- Data
- Metrics
- Tools
- Shared evaluation task(s)
- Shouldn't be unnecessarily limiting
- Shared evaluation framework
- Encompasses/organizes both tasks and resources
4Evaluation Framework
- Three dimensions
- Discourse type
- Summaries, explanations.
- Application
- Tutoring, Q/A, dialog.
- Generation task
- Proposal is for a Wiki on generation evaluation
McKeown, Walker, Green, Viethen, Gatt
McKeown, Walker, Reiter, Rus et al., Byron et al.
Cf. Mellish and Scott, Paris et al.
5Evaluation Framework
6Uses of Framework
- Facilitate discussion
- What is generation?
- Where do certain generation tasks take place?
- Set up shared workspace
- Wiki
- Focus choice of shared tasks
- Choose initial shared tasks from discourse/application/task triples where there is already data and/or multiple implementations
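One way to picture the selection rule above is as a filter over framework cells. The sketch below is only an illustration: the `FrameworkCell` type, its field names, and the three sample entries are hypothetical and do not describe any real inventory of data or systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkCell:
    """One discourse/application/task triple (hypothetical structure)."""
    discourse_type: str    # e.g. "summary", "explanation"
    application: str       # e.g. "tutoring", "Q/A", "dialog"
    generation_task: str   # e.g. "sentence planning"
    has_data: bool         # is shared data already available?
    implementations: int   # number of known independent systems


def candidate_shared_tasks(cells):
    """Keep triples with existing data and/or multiple implementations."""
    return [c for c in cells if c.has_data or c.implementations > 1]


# Placeholder entries, invented purely for illustration.
cells = [
    FrameworkCell("summary", "Q/A", "content selection", True, 1),
    FrameworkCell("explanation", "tutoring", "sentence planning", False, 3),
    FrameworkCell("explanation", "dialog", "surface realization", False, 1),
]

for cell in candidate_shared_tasks(cells):
    print(cell.discourse_type, cell.application, cell.generation_task)
```

A table like this would also make understudied triples visible: they are the cells the filter rejects.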
7Generation in Context
- Context is used in several generation tasks/applications
- User modeling: content selection, sentence planning
- Topic/focus: RE generation, sentence planning, surface realization
- Style: surface realization, multimedia generation
KR is a huge issue: CALO, Rudnicky, Di Fabbrizio et al.
8Generation in Context
- Existing evaluation metrics measure (to some extent) fluency and adequacy (Stent et al. 05)
- Context affects both fluency and adequacy
- But existing automatic metrics (for surface realization) do not take context into account
- Cf. human evaluation methods like those used in (Walker et al. 02)
9Generation in Context -- Examples
- Parallelism/Awkwardness
- Italy's industrial wholesale sales index rose 13.2 in June from a year earlier
- The June increase compared with a rise of 10.5 in May from a year earlier
- The June increase compared with a rise in May of 10.5 from a year earlier (Zhong and Stent)
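The claim that automatic surface realization metrics ignore context can be made concrete with a small sketch. The code below is a simplified clipped n-gram precision, not any particular published metric, and treating the first variant of the sentence as the reference is an assumption made for illustration. The function `score_in_context` is hypothetical; it exists only to show that the preceding sentence has no way to influence the score.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of a candidate against one reference."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def score_in_context(candidate, reference, context):
    """Hypothetical wrapper: the metric has no use for the context."""
    del context  # deliberately ignored; this is the limitation illustrated
    return ngram_precision(candidate, reference)


# Assumption: the first variant is the reference, the second the candidate.
reference = "The June increase compared with a rise of 10.5 in May from a year earlier"
candidate = "The June increase compared with a rise in May of 10.5 from a year earlier"
context = "Italy's industrial wholesale sales index rose 13.2 in June from a year earlier"

with_context = score_in_context(candidate, reference, context)
without_context = score_in_context(candidate, reference, "")
print(with_context == without_context)
```

The score depends only on word overlap with the reference, so it comes out the same whether or not the preceding sentence sets up a parallel structure.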
10Generation in Context -- Examples
- Unwanted implications
- He didn't start it but Mohandas Gandhi certainly provided a recognizable beginning to non-violent civil disobedience as we know it today
- The mahatma instigated several campaigns of passive resistance against the British government in India
- The mahatma instigated several campaigns against the British government in India of passive resistance (Zhong and Stent)
- An explosion was reported in a shopping center in central Israel Monday, and paramedics said there were many casualties
- A detonation was described in a shopping eye in central Israel Monday, and paramedics said there were many injured parties. (Barzilay and Lee)
11Generation in Context -- Examples
- Unwarranted assertions, loss of meaning
- Three Israeli soldiers were killed when a Palestinian suicide bomber blew himself up at a West Bank Jewish settlement Sunday, while two Palestinian men were killed in a gunbattle with Israeli troops to the north in Nablus.
- Police spokeswoman said 3 people were killed by bomber at a West Bank Jewish settlement attack on Sunday. (Barzilay and Lee)
12Generation in Context -- Examples
- Unhelpful use of discourse cues
- Chanpen Thai has the best overall quality among the selected restaurants because it is a Thai restaurant. (Walker et al.)
- Poor choice of referring expression
- Sally gave John the box. She gave the doll to Sam, and she gave the box to him. (constructed based on examples seen in dialog systems)
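The referring expression problem in the constructed example can be sketched as a check over the discourse context. This is a deliberately naive illustration: the `Entity` type, the hand-annotated genders, and the gender-only compatibility test are all assumptions, and a real system would also weigh factors such as salience and recency.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    gender: str  # "f", "m", or "n"; hand-annotated for this sketch


# Small hypothetical pronoun table, not a complete inventory.
PRONOUN_GENDER = {"she": "f", "her": "f", "he": "m", "him": "m", "it": "n"}


def compatible_antecedents(pronoun, mentioned):
    """Previously mentioned entities the pronoun could refer to."""
    gender = PRONOUN_GENDER[pronoun.lower()]
    return [e for e in mentioned if e.gender == gender]


def is_ambiguous(pronoun, mentioned):
    """True if more than one mentioned entity matches the pronoun."""
    return len(compatible_antecedents(pronoun, mentioned)) > 1


# Entities mentioned before the final "him" in the example above.
mentioned = [
    Entity("Sally", "f"),
    Entity("John", "m"),
    Entity("box", "n"),
    Entity("doll", "n"),
    Entity("Sam", "m"),
]

print(is_ambiguous("him", mentioned))  # John and Sam both match
print(is_ambiguous("she", mentioned))  # only Sally matches
```

Even this crude check needs a record of what has been mentioned so far, which is the kind of context information a reference-only metric never sees.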
13Generation in Context -- Examples
- Gold standard / human data inadequate/disfluent/awkward
- The Pad-Thai was soooo not good and so kinda concluded that the other entree's we ordered wouldn't be well executed (Restaurant reviews on Yelp.com)
- a colleague of mine at work got some information over the computer network called internet (Switchboard)
- a person lives across the street from me brought her home from work because a coworker of hers had this dog appear on its front doorstep (Switchboard)
14What I Want I
- Shared resources
- Facilitate comparative evaluation
- The more KR info the better
- Shared framework
- Identify understudied areas
- Shared tasks
- Divide the load of human evaluation
- Facilitate comparative evaluation
15What I Want II
- Context
- Represent local and global context
- Info about representations less important than shared representations
- Sign me up for the virtual worlds!
- Users available
- Text-to-text and DB-to-text
- One-off and interactive
- (Partially) solve KR, context issues
- CONTROL for studies