Title: CS5545: Natural Language Generation


1
CS5545 Natural Language Generation
  • Background Reading: Reiter and Dale, Building
    Natural Language Generation Systems, chaps 1, 2

2
Words instead of Pictures
  • Natural Language Generation (NLG): generate
    English sentences to communicate data, instead of
    visualisations (or tables)
  • A research focus of Aberdeen CS Dept

3
Example: FoG
  • Produces textual weather reports in English and
    French
  • Input: graphical/numerical weather depiction
  • User: Environment Canada (Canadian Weather Service)

4
FoG Input
5
FoG Output
6
Why use words?
  • Many potential reasons
  • Media restrictions (eg, text messages)
  • Users not knowledgeable enough to interpret a
    graph correctly
  • Words also communicate background info, emphasis,
    interpretation, ...
  • People (in some cases) make better decisions from
    words than graphs

7
Too hard for 1/3 of patients
8
Easier for many people?
  • "I'm afraid to say that you have a 1 in 3 chance
    of dying from a heart attack before your 65th
    birthday if you carry on as you are. But if you
    stop smoking, take your medicine, and eat better,
    a fatal heart attack will be much less likely
    (only a 1 in 12 chance)."

9
Text vs Graph
  • Focus on key info (absolute risk, optimum risk)
  • Integrate with explanation (optimum risk means if
    you stop smoking, eat better, take medicine)
  • Add emphasis, perspective, spin (eg, "I'm
    afraid to say" indicates this is a serious
    problem)

10
Experiment: Decision Making
  • Showed 40 medical professionals (from junior
    nurses to senior doctors) data from a baby in
    neonatal ICU
  • Text summary or graphical depiction
  • Asked to make a treatment decision
  • Better decision when shown text
  • But said they preferred the graphic

11
Graphic Depiction
12
Text Summary
13
What is NLG?
  • NLG systems are computer systems which produce
    understandable and appropriate texts in English
    or other human languages
  • Input is data (raw, analysed)
  • Output is documents, reports, explanations, help
    messages, and other kinds of texts
  • Requires
  • Knowledge of language
  • Knowledge of the domain

14
Language Technology
[Diagram relating Meaning, Text, and Speech]
15
Aberdeen NLG Systems
  • STOP (smoking cessation letters)
  • SumTime (weather forecasts)
  • Ilex (museum description)
  • SkillSum (feedback on assessment)
  • StandUp (help children make puns)
  • BabyTalk (summary of patient data)

17
How do NLG Systems Work?
  • Usually three stages
  • Not including data analysis (eg segmentation)!
  • Document planning: decide on content and
    structure of text
  • Microplanning: decide how to linguistically
    express text (which words, sentences, etc to use)
  • Realisation: actually produce text, conforming to
    rules of grammar
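
The three stages can be read as three functions applied one after another. The sketch below is only an illustration of that data flow, not how any particular NLG system is built: the `Message` class, its fields, and the "water temperature reading" entry are hypothetical, and each stage is reduced to a single line.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One fact the system could express (hypothetical structure)."""
    subject: str
    verdict: str
    important: bool


def document_planning(messages):
    # Decide on content: keep only the messages worth saying, in input order.
    return [m for m in messages if m.important]


def microplanning(plan):
    # Decide how to express each message: here, one short clause per message.
    return [f"your {m.subject} was {m.verdict}" for m in plan]


def realisation(clauses):
    # Produce the final text: capitalise each clause and add a full stop.
    return " ".join(c[0].upper() + c[1:] + "." for c in clauses)


def generate(messages):
    return realisation(microplanning(document_planning(messages)))


messages = [
    Message("first ascent", "a bit rapid", True),
    Message("water temperature reading", "normal", False),  # made-up filler
    Message("second ascent", "fine", True),
]
text = generate(messages)
print(text)
```

Each stage only sees the output of the previous one, so a stage can be replaced by something more sophisticated without touching the others.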

18
Scuba example: input
  • Input: result of data interpretation
  • Trends: segmented data (as in pract 2)
  • Patterns: eg, rapid ascent, sawtooth, reverse
    dive profile, etc (as in pract 2)
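
One possible way to hold this interpreted input in code is sketched below. The class and field names are assumptions made for illustration, not the representation used in the practical. The 33m and 5 minute figures come from the scuba output example; the 3 minute start time is an assumed value.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A trend: one straight-line piece of the segmented depth profile."""
    start_min: float
    end_min: float
    start_depth_m: float
    end_depth_m: float

    @property
    def rate_m_per_min(self):
        # Positive when descending, negative when ascending.
        return (self.end_depth_m - self.start_depth_m) / (self.end_min - self.start_min)


@dataclass
class Pattern:
    """A named pattern detected over a time span of the dive."""
    name: str
    start_min: float
    end_min: float


@dataclass
class DiveInterpretation:
    """Everything the NLG stages receive: trends plus detected patterns."""
    segments: list = field(default_factory=list)
    patterns: list = field(default_factory=list)


# Ascent from 33m to the surface in 5 minutes (start time is an assumption).
first_ascent = Segment(start_min=3.0, end_min=8.0, start_depth_m=33.0, end_depth_m=0.0)
dive = DiveInterpretation(
    segments=[first_ascent],
    patterns=[Pattern("rapid ascent", 3.0, 8.0)],
)
print(first_ascent.rate_m_per_min)
```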

19
Scuba example: output
  • Your first ascent was a bit rapid: you ascended
    from 33m to the surface in 5 minutes, you should
    have taken more time to make this ascent. You
    also did not stop at 5m, we recommend that anyone
    diving beneath 12m should stop for 3 minutes at
    5m. Your second ascent was fine.

20
Document Planning
  • Content selection: Of the zillions of things I
    could say, which should I say?
  • Depends on what is important
  • Also depends on what is easy to say
  • Structure: How should I organise this content as
    a text?
  • What order do I say things in?
  • Rhetorical structure?

21
Scuba content
  • Probably focus on patterns indicating dangerous
    activities
  • Most important thing to mention
  • How much should we say about these?
  • Detail? Explanations?
  • Should we say anything for safe dives?
  • Maybe just acknowledge them?

22
Scuba structure
  • Order by time (first event first)
  • Or should we mention the most dangerous patterns
    first?
  • Linking words (cue phrases)
  • Also, but, because, ...
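
The content and structure questions for the scuba reports can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Event` class, the numeric danger scale, the danger values given to each event, and the rule for picking a cue phrase. It shows both candidate orderings side by side rather than recommending one.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str          # e.g. "rapid first ascent"
    start_min: float   # when the event began
    danger: int        # 0 = safe; higher = more dangerous (made-up scale)


def select_content(events, mention_safe=False):
    # Keep dangerous events; optionally acknowledge safe ones as well.
    return [e for e in events if e.danger > 0 or mention_safe]


def order_by_time(events):
    # First event first.
    return sorted(events, key=lambda e: e.start_min)


def order_by_danger(events):
    # Most dangerous first; ties fall back to time order.
    return sorted(events, key=lambda e: (-e.danger, e.start_min))


def add_cue_phrases(events):
    # Pair each event with a linking word: "also" when it is the same kind
    # (dangerous or safe) as the previous event, "but" when the kind changes.
    plan = []
    previous = None
    for e in events:
        if previous is None:
            cue = None
        elif (previous.danger > 0) == (e.danger > 0):
            cue = "also"
        else:
            cue = "but"
        plan.append((cue, e.name))
        previous = e
    return plan


events = [
    Event("missed stop at 5m", 7.0, 3),
    Event("fine second ascent", 30.0, 0),
    Event("rapid first ascent", 3.0, 2),
]
by_time = [e.name for e in order_by_time(select_content(events))]
by_danger = [e.name for e in order_by_danger(select_content(events))]
plan = add_cue_phrases(order_by_time(select_content(events, mention_safe=True)))
print(by_time, by_danger, plan)
```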

23
Microplanning
  • Lexical choice: Which words to use?
  • Aggregation: How should information be
    distributed across sentences and paras?
  • Reference: How should the text refer to objects
    and entities?

24
SCUBA microplanning
  • Lexical choice
  • A bit rapid vs too fast vs unwise vs ...
  • Ascended vs rose vs rose to surface vs ...
  • Aggregation: 1 sentence or 3 sentences?
  • Your first ascent was a bit rapid: you ascended
    from 33m to the surface in 5 minutes, it would
    have been better if you had taken more time to
    make this ascent.

25
Scuba Microplanning
  • Aggregation (continued)
  • Phrase merging
  • Your first ascent was fine. Your second ascent
    was fine vs
  • Your first and second ascents were fine.
  • Reference
  • Your ascent vs
  • Your first ascent vs
  • Your ascent from 33m at 3 min
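
Phrase merging and the choice of reference can be sketched as follows. This is a minimal illustration under assumed names (`aggregate`, `refer`, and the `(ordinal, verdict)` input format are all hypothetical); it merges ascents that share a verdict and picks the shortest reference that is still unambiguous.

```python
def ordinal_phrase(ordinals):
    # ["first"] -> "first"; ["first", "second"] -> "first and second";
    # ["first", "second", "third"] -> "first, second and third".
    if len(ordinals) == 1:
        return ordinals[0]
    return ", ".join(ordinals[:-1]) + " and " + ordinals[-1]


def aggregate(facts):
    """Merge facts that share a verdict into one sentence.

    `facts` is a list of (ordinal, verdict) pairs such as ("first", "fine").
    """
    grouped = {}  # verdict -> ordinals, in order of first appearance
    for ordinal, verdict in facts:
        grouped.setdefault(verdict, []).append(ordinal)
    sentences = []
    for verdict, ordinals in grouped.items():
        noun, verb = ("ascent", "was") if len(ordinals) == 1 else ("ascents", "were")
        sentences.append(f"Your {ordinal_phrase(ordinals)} {noun} {verb} {verdict}.")
    return sentences


def refer(ordinal, total_ascents):
    # With a single ascent "your ascent" is enough; otherwise add the ordinal.
    return "your ascent" if total_ascents == 1 else f"your {ordinal} ascent"


merged = aggregate([("first", "fine"), ("second", "fine")])
separate = aggregate([("first", "a bit rapid"), ("second", "fine")])
print(merged, separate, refer("first", 2))
```

Grouping by verdict can pull together ascents that were not adjacent in time, so a real system would need to decide whether that reordering is acceptable.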

26
Realisation
  • Grammars (linguistic): Form legal English
    sentences based on decisions made in previous
    stages
  • Obey sublanguage, genre constraints
  • Structure: Form legal HTML, RTF, or whatever
    output format is desired

27
Scuba Realisation
  • Simple linguistic processing
  • Capitalise first word of sentence
  • Subject-verb agreement
  • Your first ascent was fine
  • Your first and second ascents were fine
  • Structure
  • Inserting line breaks in text (pouring)
  • Add HTML markups, eg, <P>
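
The realisation steps listed for the scuba reports are simple enough to sketch with the Python standard library. The function names are hypothetical, and agreement is handled only for the past tense of "to be", which is all the two example sentences need.

```python
import html
import textwrap


def agree(subject_count):
    # Subject-verb agreement for the past tense of "to be".
    return "was" if subject_count == 1 else "were"


def realise_sentence(subject, subject_count, complement):
    sentence = f"{subject} {agree(subject_count)} {complement}."
    # Capitalise the first word without lower-casing the rest.
    return sentence[0].upper() + sentence[1:]


def pour(text, width=40):
    # Insert line breaks so that no line is longer than `width` characters.
    return textwrap.fill(text, width=width)


def as_html_paragraph(text):
    # Escape special characters, then wrap in HTML paragraph tags.
    return "<P>" + html.escape(text) + "</P>"


one = realise_sentence("your first ascent", 1, "fine")
two = realise_sentence("your first and second ascents", 2, "fine")
poured = pour(one + " " + two, width=40)
print(poured)
print(as_html_paragraph(one))
```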

28
Multimodal NLG
  • Speech output
  • Text and visualisations
  • Produce separately, OR
  • Tight integration
  • Eg, text refers to graphic, OR
  • graph has text annotations
  • Preliminary study suggests users prefer this for
    scuba reports (Sripada and Gao, 2007)

29
Scuba - graph
30
Scuba NLG text
  • Risky dive with some minor problems. Because your
    bottom time of 12.0min exceeds no-stop limit by
    4.0min this dive is risky. But you performed the
    ascent well. Your buoyancy control in the bottom
    zone was poor as indicated by saw tooth
    patterns.

31
Combined (Preferred)
Risky dive with some minor problems. Because your
bottom time of 12.0min exceeds no-stop limit by
4.0min this dive is risky. But you performed the
ascent well. Your buoyancy control in the bottom
zone was poor as indicated by saw tooth
patterns marked A on the depth-time profile.
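
The difference between the text-only and combined versions is a single phrase that points at the graph. A minimal template-based sketch of two of these sentences is given below. The function names are hypothetical, and the no-stop limit of 8.0 minutes is not stated on the slide; it is inferred from the bottom time of 12.0min exceeding the limit by 4.0min.

```python
def bottom_time_sentence(bottom_time_min, no_stop_limit_min):
    excess = bottom_time_min - no_stop_limit_min
    if excess <= 0:
        return None  # nothing risky to report
    return (f"Because your bottom time of {bottom_time_min:.1f}min exceeds "
            f"no-stop limit by {excess:.1f}min this dive is risky.")


def buoyancy_sentence(graph_label=None):
    sentence = ("Your buoyancy control in the bottom zone was poor "
                "as indicated by saw tooth patterns")
    if graph_label is not None:
        # Tight integration: the text points at an annotation on the graph.
        sentence += f" marked {graph_label} on the depth-time profile"
    return sentence + "."


risk = bottom_time_sentence(12.0, 8.0)  # 8.0 is inferred from 12.0 - 4.0
text_only = buoyancy_sentence()
combined = buoyancy_sentence(graph_label="A")
print(risk)
print(combined)
```
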
32
Building NLG Systems
  • Knowledge
  • Representations
  • Algorithms
  • Systems

33
Building NLG Systems: Knowledge
  • Need knowledge
  • Which patterns most important?
  • What order to use?
  • Which words to use?
  • When to merge phrases?
  • How to form plurals
  • Etc
  • Where does this come from?

34
Knowledge Sources
  • Imitate a corpus of human-written texts
  • Most straightforward; this is what we will focus on
  • Ask domain experts
  • Useful, but experts often not very good at
    explaining what they are doing
  • Experiments with users
  • Very nice in principle, but a lot of work

35
Scuba Corpus
  • See which patterns humans mention in the corpus,
    and have the system mention these
  • See the order used by humans, and have the system
    imitate it
  • etc
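
As a rough illustration of imitating a corpus, the sketch below counts how often human writers mention each pattern when it is present in the data, and keeps the patterns mentioned more than half the time. The corpus entries, the annotation format, and the 0.5 threshold are all made up for the example; a real corpus would need to be annotated by hand first.

```python
from collections import Counter

# Hypothetical annotated corpus: for each human-written report, the patterns
# present in the dive data and the patterns the writer actually mentioned.
# These entries are made up for illustration.
corpus = [
    {"present": ["rapid ascent", "sawtooth"], "mentioned": ["rapid ascent", "sawtooth"]},
    {"present": ["sawtooth", "reverse dive profile"], "mentioned": ["reverse dive profile"]},
    {"present": ["rapid ascent"], "mentioned": ["rapid ascent"]},
]


def mention_rates(corpus):
    # Fraction of reports mentioning each pattern, out of those where it occurs.
    present = Counter()
    mentioned = Counter()
    for report in corpus:
        present.update(report["present"])
        mentioned.update(report["mentioned"])
    return {p: mentioned[p] / present[p] for p in present}


def patterns_to_mention(corpus, threshold=0.5):
    # Keep patterns that humans mention more often than the threshold.
    rates = mention_rates(corpus)
    return sorted(p for p, r in rates.items() if r > threshold)


rates = mention_rates(corpus)
chosen = patterns_to_mention(corpus)
print(rates)
print(chosen)
```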

36
Building NLG Systems: Algorithms and
Representations
  • Various algorithms and representations have been
    designed for NLG tasks
  • Will discuss in later lectures
  • But often can simply code NLG systems
    straightforwardly in Java, without special
    algorithms
  • Knowledge is more important

37
Building NLG Systems: Systems
  • Ideally should be able to plug knowledge into
    downloadable systems
  • Unfortunately very little in the way of
    downloadable NLG systems
  • Mostly specialised stuff primarily of interest to
    academics, eg http://openccg.sourceforge.net/
  • I would like to improve the situation

38
Simplenlg package
  • Java class library
  • Currently handles realisation, some parts of
    microplanning
  • Not document planning
  • Coverage should expand over time

39
Aberdeen NLG group
  • 20 academic staff, researchers, PhD students
  • Leader: Prof Chris Mellish
  • (one of) the best NLG groups in the world
  • Always looking for good PhD students