WordsEye: From Text To Pictures

1
WordsEye: From Text To Pictures
The very humongous silver sphere is fifty feet
above the ground. The silver castle is in the
sphere. The castle is 80 feet wide. The ground is
black. The sky is partly cloudy.
2
Why is it hard to create 3D graphics?
3
The tools are complex
4
Too much detail
5
Involves training, artistic skill, and expense
6
Pictures from Language
  • No GUI bottlenecks - Just describe it!
  • Low entry barrier - no special skill or training
    required
  • Give up detailed direct manipulation for speed
    and economy of expression
  • Language expresses constraints
  • Bypass rigid, pre-defined paths of expression
    (dialogs, menus, etc) as defined by GUI
  • Objects vs. polygons: draw upon objects in
    pre-made 3D and 2D libraries
  • Enable novel applications in education, gaming,
    online communication, . . .
  • Using language is fun and stimulates imagination
  • Semantics
  • 3D scenes provide an intuitive representation of
    meaning by making explicit the contextual
    elements implicit in our mental models.

7
WordsEye Initial Version (with Richard Sproat)
  • Developed at AT&T Labs
  • Graphics: Mirai 3D animation system on Windows NT
  • Church Tagger, Collins Parser on Linux
  • WordNet (http://wordnet.princeton.edu/)
  • Viewpoint 3D model library
  • NLP (Linux) and depiction/graphics (Windows NT)
    communicate via sockets
  • WordsEye code in Common Lisp
  • SIGGRAPH paper (August 2001)

8
New Version (with Richard Sproat)
  • Rewrote software from scratch
  • Linux and CMUCL
  • Custom Parser/Tagger
  • OpenGL for 3D preview display
  • Radiance Renderer
  • ImageMagick, GIMP for 2D post-effects
  • Different subset of functionality
  • No verbs/poses yet
  • Web interface (www.wordseye.com)
  • Webserver and multiple backend text-to-scene
    servers
  • Gallery/Forum/E-Cards/PictureBooks/2D effects

9
A tiny grey manatee is in the aquarium. It is
facing right. The manatee is six inches below the
top of the aquarium. The ground is tile. There is
a large brick wall behind the aquarium.
10
A silver head of time is on the grassy ground.
The blossom is next to the head. The blossom is
in the ground. The green light is three feet
above the blossom. The yellow light is 3 feet
above the head. The large wasp is behind the
blossom. The wasp is facing the head.
11
The humongous white shiny bear is on the American
mountain range. The mountain range is 100 feet
tall. The ground is water. The sky is partly
cloudy. The airplane is 90 feet in front of the
nose of the bear. The airplane is facing right.
12
A microphone is in front of a clown. The
microphone is three feet above the ground. The
microphone is facing the clown. A brick wall is
behind the clown. The light is on the ground and
in front of the clown.
13
(No Transcript)
14
Mary uses the crossbow. She rides the horse by
the store. The store is under the large willow.
The small allosaurus is in front of the horse.
The dinosaur faces Mary. The gigantic teacup is
in front of the store. The gigantic mushroom is
in the teacup. The castle is to the right of the
store.
15
Web Interface preview mode
16
Web Interface rendered (raytraced)
17
WordsEye Overview
  • Linguistic Analysis
      • Parsing
      • Create dependency-tree representation
      • Anaphora resolution
  • Interpretation
      • Add implicit objects, relations
      • Resolve semantics and references
  • Depiction
      • Database of 3D objects, poses, textures
      • Depiction rules generate graphical constraints
      • Apply constraints to create scene

18
Linguistic Analysis
  • Tag part-of-speech
  • Parse
  • Generate semantic representation
  • WordNet-like dictionary for nouns
  • Anaphora resolution

19
Example: John said that the cat is on the table.
20
Parse tree for: John said that the cat was on the
table.
21
Nouns: Hierarchical Dictionary
22
WordNet problems
  • Inheritance conflates functional and lexical
    relations
      • Terrace is a plateau
      • Spoon is a container
      • Crossing Guard is a traffic cop
      • Bellybutton is a point
  • Lack of multiple inheritance between synsets
      • Princess is an aristocrat, but not a female
      • "ceramic-ware" is grouped under "utensil" and has
        "earthenware", etc. under it. But there are no
        dishes or plates under it because those are
        categorized elsewhere under "tableware"
  • Lacks relations other than ISA. Thesaurus vs
    dictionary.
      • Snowball made-of snow
      • Italian resident-of Italy
  • Cluttered with obscure words and word senses
      • Spoon as a type of golf club
  • Create our own dictionary to address these
    problems
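
The two gaps named above (no multiple inheritance, no relations other than ISA) can be pictured with a small graph of entries. The following is only a minimal sketch in Python; the `Entry` class, the `LEXICON` table and its contents are hypothetical illustrations and not WordsEye's actual dictionary format.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One noun entry: several IS-A parents plus other typed relations."""
    name: str
    isa: list = field(default_factory=list)        # allows multiple inheritance
    relations: dict = field(default_factory=dict)  # e.g. {"made-of": "snow"}

# Hypothetical toy lexicon built from the examples on the slide.
LEXICON = {e.name: e for e in [
    Entry("entity"),
    Entry("person", isa=["entity"]),
    Entry("female", isa=["person"]),
    Entry("aristocrat", isa=["person"]),
    Entry("princess", isa=["aristocrat", "female"]),
    Entry("snow", isa=["entity"]),
    Entry("snowball", isa=["entity"], relations={"made-of": "snow"}),
]}

def ancestors(name):
    """Return every concept reachable from `name` through IS-A links."""
    seen = set()
    stack = list(LEXICON[name].isa)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(LEXICON[parent].isa)
    return seen

def is_a(name, concept):
    """True if `name` is `concept` or inherits from it along any parent."""
    return concept == name or concept in ancestors(name)
```

With this layout, `is_a("princess", "female")` and `is_a("princess", "aristocrat")` both hold, and the made-of link for the snowball sits beside the IS-A links instead of being forced into them.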

23
Semantic Representation for: John said that the
blue cat was on the table.
  • 1. Object: mr-happy (John)
  • 2. Object: cat-vp39798 (cat)
  • 3. Object: table-vp6204 (table)
  • 4. Action: say
      • subject: <element 1>
      • direct-object: <elements 2,3,5,6>
      • tense: PAST
  • 5. Attribute: blue
      • object: <element 2>
  • 6. Spatial-Relation: on
      • figure: <element 2>
      • ground: <element 3>
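
The numbered elements above refer to one another by index, which maps naturally onto a flat list of records. A minimal sketch in Python follows; the `Element` class and the helper functions are hypothetical and only mirror the slide's listing, not WordsEye's internal Common Lisp structures.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    index: int
    kind: str    # "Object", "Action", "Attribute" or "Spatial-Relation"
    value: str
    slots: dict = field(default_factory=dict)  # slot name -> element index(es) or literal

# The six elements from the slide, with cross-references kept as indices.
ELEMENTS = [
    Element(1, "Object", "mr-happy"),
    Element(2, "Object", "cat-vp39798"),
    Element(3, "Object", "table-vp6204"),
    Element(4, "Action", "say",
            {"subject": 1, "direct-object": [2, 3, 5, 6], "tense": "PAST"}),
    Element(5, "Attribute", "blue", {"object": 2}),
    Element(6, "Spatial-Relation", "on", {"figure": 2, "ground": 3}),
]
BY_INDEX = {e.index: e for e in ELEMENTS}

def attributes_of(obj_index):
    """List the attribute values that point at the given object element."""
    return [e.value for e in ELEMENTS
            if e.kind == "Attribute" and e.slots.get("object") == obj_index]

def spatial_relations():
    """Return (figure, relation, ground) triples with indices resolved."""
    return [(BY_INDEX[e.slots["figure"]].value, e.value,
             BY_INDEX[e.slots["ground"]].value)
            for e in ELEMENTS if e.kind == "Spatial-Relation"]
```

Here `attributes_of(2)` collects the attributes attached to the cat, and `spatial_relations()` resolves the figure and ground indices of the "on" relation back to their objects.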

24
Anaphora resolution: The duck is in the sea. It
is upside down. The sea is shiny and transparent.
The ground is invisible. The apple is 3 inches
below the duck. It is in front of the duck. The
yellow illuminator is 3 feet above the apple. The
cyan illuminator is 6 inches to the left of it.
The magenta illuminator is 6 inches to the right
of it. It is partly cloudy.
25
Indexical Reference: Three dogs are on the table.
The first dog is blue. The first dog is 5 feet
tall. The second dog is red. The third dog is
purple.
26
Interpretation
  • Interpret semantic representation
  • Object selection
  • Resolve semantic relations/properties based on
    object types
  • Answer Who? What? When? Where? How?
  • Disambiguate/normalize relations and actions
  • Identify and resolve references to implicit
    objects

27
Object Selection: When object is missing or
doesn't exist . . .
28
Object attribute interpretation (modify versus
selection)
29
Semantic Interpretation of "Of"
30
Implicit objects & references
  • Mary rode by the store. Her motorcycle was red.
  • Verb resolution: Identify implicit vehicle
      • Functional properties of objects
  • Reference
      • Motorcycle matches the vehicle
      • Her matches with Mary
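
The two steps above (a verb that implies an unstated vehicle, then a later noun that fills that slot) can be sketched as table lookups. This is only an illustration in Python: the tables `FUNCTIONAL_PROPERTIES` and `IMPLICIT_ARGUMENT` and both function names are hypothetical, and the real system's rules are not shown in the slides.

```python
# Hypothetical functional properties of a few nouns.
FUNCTIONAL_PROPERTIES = {
    "motorcycle": {"vehicle"},
    "bicycle": {"vehicle"},
    "store": set(),
}

# Hypothetical table: verb -> functional property its implicit argument must have.
IMPLICIT_ARGUMENT = {"ride": "vehicle"}

def implicit_slot(verb, mentioned_nouns):
    """Return the property the verb needs if no mentioned noun supplies it."""
    needed = IMPLICIT_ARGUMENT.get(verb)
    if needed is None:
        return None
    for noun in mentioned_nouns:
        if needed in FUNCTIONAL_PROPERTIES.get(noun, set()):
            return None  # the sentence already names a suitable object
    return needed

def fills_slot(noun, slot):
    """True if a later noun can be identified with the implicit object."""
    return slot in FUNCTIONAL_PROPERTIES.get(noun, set())

# "Mary rode by the store." leaves a vehicle unstated ...
slot = implicit_slot("ride", ["store"])
# ... and "Her motorcycle was red." supplies a noun that matches it.
matched = fills_slot("motorcycle", slot)
```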

31
Implicit Reference: Mary rode by the store. Her
motorcycle was red.
32
Depiction
  • 3D object and image database
  • Graphical constraints
  • Spatial relations
  • Attributes
  • Posing
  • Shape/Topology changes
  • Depiction process

33
3D Object Database
  • 2,000 3D polygonal objects
  • Augmented with
  • Spatial tags (top surface, base, cup, push
    handle, wall, stem, enclosure)
  • Skeletons
  • Default size, orientation
  • Functional properties (vehicle, weapon . . .)
  • Placement/attribute conventions

34
2000 3D Objects
35
10,000 images and textures
36
3D Objects and Images tagged with semantic info
  • Spatial tags for 3D object regions
  • Object type (e.g. WordNet synset)
  • Is-a
  • represents
  • Object size
  • Object orientation (front, preferred supporting
    surface -- wall/top)
  • Compound object constituents
  • Other object properties (style, parts, etc.)

37
Spatial Tags
38
Spatial Tags
39
Spatial Tags
40
Spatial Tags
41
Stem in Cup: The daisy is in the test tube.
42
Enclosure and top surface: The bird is in the
bird cage. The bird cage is on the chair.
43
Spatial Relations
  • Relative positions
      • On, under, in, below, off, onto, over, above . . .
  • Distance
  • Sub-region positioning
      • Left, middle, corner, right, center, top, front,
        back
  • Orientation
      • facing (object, left, right, front, back, east,
        west . . .)
  • Time-of-day relations
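
A relative position with an optional distance, such as "on" or "three feet above", reduces to arithmetic on object extents. The sketch below is a simplification in Python that uses plain bounding boxes; the `Box` class and `place_above` are hypothetical names, and the actual system relies on spatial tags such as a top surface rather than the raw box.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: center plus size, y is up, units are feet."""
    cx: float
    cy: float
    cz: float
    width: float
    height: float
    depth: float

    @property
    def top(self):
        return self.cy + self.height / 2

def place_above(figure, ground, gap=0.0):
    """Center the figure over the ground object, with its base `gap` feet above the top."""
    return replace(figure, cx=ground.cx, cz=ground.cz,
                   cy=ground.top + gap + figure.height / 2)

# Hypothetical sizes chosen only for the example.
table = Box(0.0, 1.5, 0.0, width=4.0, height=3.0, depth=2.0)
lamp = Box(10.0, 10.0, 10.0, width=1.0, height=2.0, depth=1.0)

on_table = place_above(lamp, table)           # "The lamp is on the table."
hovering = place_above(lamp, table, gap=3.0)  # "The lamp is three feet above the table."
```

Treating "on" as "above with zero gap" keeps one code path for both relations; sub-region positioning and orientation would need further constraints that this sketch leaves out.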

44
Vertical vs. horizontal "on", distances,
directions: The couch is against the wood wall.
The window is on the wall. The window is next to
the couch. The door is 2 feet to the right of the
window. The man is next to the couch. The animal
wall is to the right of the wood wall. The animal
wall is in front of the wood wall. The animal
wall is facing left. The walls are on the huge
floor. The zebra skin coffee table is two feet in
front of the couch. The lamp is on the table. The
floor is shiny.
45
Attributes
  • Size
  • height, width, depth
  • Aspect ratio (flat, wide, thin . . .)
  • Surface attributes
  • Texture database
  • Color, Texture, Opacity, reflectivity
  • Applied to objects or textures themselves
  • Brightness (for lights)

46
Attributes: The orange battleship is on the brick
cow. The battleship is 3 feet long.
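
A size attribute such as "3 feet long" can be read as a scale factor applied to the object's default dimensions. The following is a minimal sketch in Python under the assumption that the object is scaled uniformly; the function name and the default battleship dimensions are hypothetical and not taken from the WordsEye object database.

```python
def scale_to(dims, axis, target):
    """Uniformly scale a {axis: size} dict so that dims[axis] equals target."""
    factor = target / dims[axis]
    return {name: size * factor for name, size in dims.items()}

# Hypothetical default size in feet, used only for illustration.
default_battleship = {"length": 768.0, "width": 96.0, "height": 128.0}

# "The battleship is 3 feet long."
resized = scale_to(default_battleship, "length", 3.0)
```

Scaling all axes by the same factor preserves the model's proportions; aspect-ratio attributes such as "flat" or "wide" would instead scale one axis independently.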
47
Time of day & cloudiness
48
Time of day & lighting
49
Poses (original version only -- not yet
implemented in web version)
  • Represent actions
  • Database of 500 human poses
  • Grips
  • Usage (specialized/generic)
  • Standalone
  • Merge poses (upper/lower body, hands)
  • Gives wide variety by mix-and-match
  • Dynamic posing/IK

50
Poses
51
Poses
52
Combined poses: Mary rides the bicycle. She plays
the trumpet.
53
The Broadway Boogie Woogie vase is on the Richard
Sproat coffee table. The table is in front of the
brick wall. The van Gogh picture is on the wall.
The Matisse sofa is next to the table. Mary is
sitting on the sofa. She is playing the violin.
She is wearing a straw hat.
54
Mary pushes the lawn mower. The lawnmower is 5
feet tall. The cat is 5 feet behind Mary. The cat
is 10 feet tall.
Dynamically defined poses using Inverse
Kinematics (IK)
55
Shape Changes (not implemented in web version)
  • Deformations
  • Facial expressions
  • Happy, angry, sad, confused . . . mixtures
  • Combined with poses
  • Topological changes
  • Slicing

56
Facial Expressions
57
The rose is in the vase. The vase is on the half
dog.
58
Depiction Process
  • Given a semantic representation
  • Generate graphical constraints
  • Handle implicit and conflicting constraints.
  • Generate 3d scene from constraints
  • Add environment, lights, camera
  • Render scene

59
Example: Generate constraints for kick
  • Case 1: No path or recipient; direct object is
    large
      • Pose: Actor in kick pose
      • Position: Actor directly behind direct object
      • Orientation: Actor facing direct object
  • Case 2: No path or recipient; direct object is
    small
      • Pose: Actor in kick pose
      • Position: Direct object above foot
  • Case 3: Path and recipient
      • Pose, relations . . . (some tentative)
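
The case analysis above behaves like a small rule that maps properties of the sentence to a list of graphical constraints. A minimal sketch in Python follows; the function name and the tuple encoding of constraints are hypothetical. Case 3 is elided on the slide, so the sketch declines to guess at it.

```python
def kick_constraints(object_is_large, has_path=False, has_recipient=False):
    """Return graphical constraints for "kick" following cases 1 and 2 of the slide."""
    if has_path or has_recipient:
        # Case 3 is only outlined on the slide, so nothing is invented here.
        raise NotImplementedError("path/recipient case is not specified")
    if object_is_large:
        # Case 1: the actor stands behind the object and faces it.
        return [
            ("pose", "actor", "kick"),
            ("position", "actor", "behind", "direct-object"),
            ("orientation", "actor", "facing", "direct-object"),
        ]
    # Case 2: a small object is placed above the kicking foot.
    return [
        ("pose", "actor", "kick"),
        ("position", "direct-object", "above", "foot"),
    ]

truck = kick_constraints(object_is_large=True)      # "John kicked the pickup truck"
football = kick_constraints(object_is_large=False)  # "John kicked the football"
```

How "large" is decided (for example by comparing the object's size with the actor's) is left to the caller, since the slide does not state a threshold.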

60
Some varieties of kick
Case 1: John kicked the pickup truck
Case 3: John kicked the ball to the cat on the
skateboard
Case 2: John kicked the football
61
Implicit Constraint. The vase is on the
nightstand. The lamp is next to the vase.
62
Figurative & Metaphorical Depiction
  • Textualization
  • Conventional Icons and emblems
  • Literalization
  • Characterization
  • Personification
  • Functionalization

63
Textualization: The cat is facing the wall.
64
Conventional Icons: The blue daisy is not in the
army boot.
65
Literalization: Life is a bowl of cherries.
66
Characterization: The policeman ran by the
parking meter
67
Functionalization: The hippo flies over the church
68
Future/Ongoing Work
  • Build/use scenario-based lexical resource
  • Word knowledge (dictionary)
  • Frame knowledge
  • For verbs and event nouns
  • Finer-grained representation of prepositions and
    spatial relations
  • Contextual knowledge
  • Default verb arguments
  • Default constituents and spatial relations in
    settings/environments
  • Decompose actions into poses and spatial
    relations
  • Learn contextual knowledge from corpora
  • Graphics/output support
  • Add dynamic posing of characters to depict
    actions
  • Handle more complex, natural text
  • Handle object parts
  • Add more 2D/3D content (including user uploadable
    3D objects)
  • Physics, animation, sound, and speech

69
FrameNet: Digital lexical resource
(http://framenet.icsi.berkeley.edu/)
  • 947 hierarchically defined frames
  • 10,000 lexical entries (verbs, nouns, adjectives)
  • Relations between frames (perspective-on,
    subframe, using, . . .)
  • Annotated sentences for each lexical unit

70
Lexical Units in Revenge Frame
71
Frame elements for avenge.v
72
Annotations for avenge.v
73
Relations between frames
74
Frame element mappings between frames
  • Core vs Peripheral
  • Inheritance
  • Renaming (e.g. agent -> helper)

75
Valence patterns for verb sell (Commerce_sell
frame) and two related frames
  • <LU-2986 "sell.v" Commerce_sell> patterns:
      (33 ((Seller Ext) (Goods Obj)))
      (11 ((Goods Ext)))
      (7 ((Seller Ext) (Goods Obj) (Buyer Dep(to))))
      (4 ((Seller Ext)))
      (2 ((Goods Ext) (Buyer Dep(to))))
  • <frame Commerce_buy> patterns:
      (91 ((Buyer Ext) (Goods Obj)))
      (27 ((Buyer Ext) (Goods Obj) (Seller Dep(from))))
      (11 ((Buyer Ext)))
      (2 ((Buyer Ext) (Goods Obj) (Seller Dep(at))))
      (2 ((Buyer Ext) (Seller Dep(from))))
      (2 ((Goods Obj)))
  • <frame Expensiveness> patterns:
      (17 ((Goods Ext) (Money Dep(NP))))
      (8 ((Goods Ext)))
      (4 ((Goods Ext) (Money Dep(between))))
      (4 ((Goods Ext) (Money Dep(from))))
      (2 ((Goods Ext) (Money Dep(under))))
      (1 ((Goods Ext) (Money Dep(just))))
      (1 ((Goods Ext) (Money Dep(NP)) (Seller Dep(from))))
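
If the leading number in each pattern is read as the count of annotated sentences showing that pattern (an assumption; the slide does not label the numbers), the patterns can be ranked by relative frequency to pick default argument realizations. A minimal sketch in Python, with hypothetical helper names and the sell.v figures copied from the slide:

```python
# (count, pattern) pairs for sell.v; each pattern is a tuple of
# (frame element, grammatical function) pairs.
SELL_PATTERNS = [
    (33, (("Seller", "Ext"), ("Goods", "Obj"))),
    (11, (("Goods", "Ext"),)),
    (7, (("Seller", "Ext"), ("Goods", "Obj"), ("Buyer", "Dep(to)"))),
    (4, (("Seller", "Ext"),)),
    (2, (("Goods", "Ext"), ("Buyer", "Dep(to)"))),
]

def relative_frequencies(patterns):
    """Convert raw counts into fractions of the total."""
    total = sum(count for count, _ in patterns)
    return [(count / total, pattern) for count, pattern in patterns]

def most_common(patterns):
    """Return the pattern with the highest count."""
    return max(patterns, key=lambda cp: cp[0])[1]

total_count = sum(count for count, _ in SELL_PATTERNS)
default_pattern = most_common(SELL_PATTERNS)
```

Under that reading, the transitive pattern with the Seller as external argument and the Goods as object accounts for 33 of the 57 listed sell.v annotations.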

76
Parsing and generating semantic relations using
FrameNet
  • NLP> (interpret-sentence "the boys on the beach said that the fish swam to island")
  • Parse:
      (S (NP (NP (DT "the") (NN2 (NNS "boys")))
             (PREPP (PREPP (IN "on") (NP (DT "the") (NN2 (NN "beach"))))))
         (VP (VP1 (VERB (VBD "said")))
             (COMP "that")
             (S (NP (DT "the") (NN2 (NN "fish")))
                (VP (VP1 (VERB (VBD "swam")))
                    (PREPP (PREPP (TO "to") (NP (NN2 (NN "island")))))))))
  • Word Dependency:
      ((<noun "boy" (Plural) ID18> (DEP <prep "on" ID19>))
       (<prep "on" ID19> (DEP <noun "beach" ID21>))
       (<verb "said" ID22> (SUBJECT <noun "boy" (Plural) ID18>)
                           (DIRECT-OBJECT <verb "swam" ID26>))
       (<verb "swam" ID26> (SUBJECT <noun "fish" ID25>)
                           (DEP <prep "to" ID27>))
       (<prep "to" ID27> (DEP <noun "island" ID28>)))
  • Frame Dependency:
      ((<relation CN-SPATIAL-RELATION-ON ID19>
          (FIGURE <noun "boy" (Plural) ID18>)
          (GROUND <noun "beach" ID21>))
       (<action "say.v" ID22>
          (<frame-element "Text" ID29> <action "swim.v" ID26>)
          (<frame-element "Author" ID30> <noun "boy" (Plural) ID18>))
       (<action "swim.v" ID26>
          (<frame-element "Self_mover" ID31> <noun "fish" ID25>)
          (<frame-element ("Goal") ID32> <prep "to" ID27>))
       (<prep "to" ID27> (DEP <noun "island" ID28>)))

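The bracketed parse shown for this sentence is an ordinary s-expression, so it can be loaded into nested lists and walked. The sketch below is a minimal reader in Python; `read_sexp` and `words` are hypothetical helper names, not part of WordsEye, which is written in Common Lisp and reads such forms natively.

```python
import re

# Tokens: a double-quoted string, a parenthesis, or a bare symbol.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def read_sexp(text):
    """Parse one s-expression into nested lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # consume the closing parenthesis
            return items
        return tok.strip('"')

    return parse()

def words(tree):
    """Collect the words of a [label, child, ...] tree from left to right."""
    out = []
    for child in tree[1:]:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out

PARSE = """
(S (NP (NP (DT "the") (NN2 (NNS "boys")))
       (PREPP (PREPP (IN "on") (NP (DT "the") (NN2 (NN "beach"))))))
   (VP (VP1 (VERB (VBD "said")))
       (COMP "that")
       (S (NP (DT "the") (NN2 (NN "fish")))
          (VP (VP1 (VERB (VBD "swam")))
              (PREPP (PREPP (TO "to") (NP (NN2 (NN "island")))))))))
"""

tree = read_sexp(PARSE)
sentence = " ".join(words(tree))
```

Reading the leaves back in order recovers the input sentence, which is a quick check that the bracketing is balanced.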
77
Acquiring contextual knowledge
  • Where does eating breakfast take place?
      • "Inferring the environment in a text-to-scene
        conversion system", Richard Sproat, K-CAP 2001
  • Default locations and spatial relations (by Gino
    Miceli)
      • Project Gutenberg corpus of online English prose
        (http://www.gutenberg.org/)
      • Use seed-object pairs to extract other pairs with
        equivalent spatial relations (e.g. cups are
        (typically) on tables, while books are on desks).
      • Leverage verb/preposition semantics as well as
        simple syntactic structure to identify spatial
        templates based on verb/preposition, particle
        plus intervening modifiers.
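
The seed-pair idea above can be sketched as two passes: first collect the text that links known pairs, then reuse that text as a template to find new pairs. The Python below is only an illustration under simplifying assumptions: the sentences are invented for the example (not taken from Project Gutenberg), the function names are hypothetical, and plain string templates stand in for the verb/preposition analysis the slide describes.

```python
import re
from collections import Counter

def templates_from_seeds(sentences, seeds):
    """Collect the text between each seed figure and ground, e.g. 'was on the'."""
    found = Counter()
    for sentence in sentences:
        for figure, ground in seeds:
            pattern = rf"\b{re.escape(figure)}\b (.+?) \b{re.escape(ground)}\b"
            match = re.search(pattern, sentence)
            if match:
                found[match.group(1)] += 1
    return found

def apply_templates(sentences, templates):
    """Find (figure, ground) pairs joined by any of the learned templates."""
    pairs = Counter()
    for sentence in sentences:
        for template in templates:
            pattern = rf"\b(\w+) {re.escape(template)} (\w+)\b"
            for match in re.finditer(pattern, sentence):
                pairs[(match.group(1), match.group(2))] += 1
    return pairs

# Invented toy corpus, for illustration only.
CORPUS = [
    "the cup was on the table",
    "a book lay on the desk",
    "the book was on the desk",
    "the lamp was on the nightstand",
]

templates = templates_from_seeds(CORPUS, seeds=[("cup", "table")])
pairs = apply_templates(CORPUS, templates)
```

On this toy corpus the single seed yields the template "was on the", which then picks up the book/desk and lamp/nightstand pairs; the sentence with "lay on the" is missed, which illustrates why the slide points to verb and preposition semantics rather than literal strings.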

78
Pragmatic Ambiguity: The lamp is next to the vase
on the nightstand . . .
79
Syntactic Ambiguity: Prepositional phrase
attachment
John looks at the cat on the skateboard.
80
Potential Applications
  • Online communications: electronic postcards,
    visual chat/IM, social networks
  • Gaming, virtual environments
  • Storytelling/comic books/art
  • Education (ESL, reading, disabled learning,
    graphics arts)
  • Graphics authoring/prototyping tool
  • Visual summarization and/or translation of text
  • Embedded in toys

81
Storytelling: The stagecoach is in front of the
old west hotel. Mary is next to the stagecoach.
She plays the guitar. Edward exercises in front
of the stagecoach. The large sunflower is to the
left of the stagecoach.
82
Scenes within scenes . . .
83
Greeting Cards
84
1st grade homework: The duck sat on a hen; the
hen sat on a pig...
85
Conclusion
  • New approach to scene generation
  • Low overhead (skill, training . . .)
  • Immediacy
  • Usable with minimal hardware: text or speech
    input device and display screen.
  • Work is ongoing
  • Available as experimental web service

86
Related Work
  • Adorni, Di Manzo, Giunchiglia, 1984
  • Put: Clay and Wilhelms, 1996
  • PAR: Badler et al., 2000
  • CarSim: Dupuy et al., 2000
  • SHRDLU: Winograd, 1972

87
Bloopers: John said the cat is on the table
88
Bloopers: Mary says the cat is blue.
89
Bloopers: John wears the axe. He plays the violin.
90
Bloopers: Happy John holds the red shark
91
Bloopers: Jack carried the television
92
Web Interface - Entry Page (www.wordseye.com)
  • Registration
  • Login
  • Learn more
  • Example pictures

93
Web Interface - Public Gallery
94
Web Interface - Add Comments to Picture
95
Web Interface - Link Pictures into Stories & Games
96
The tall granite mountain range is 300 feet
wide. The enormous umbrella is on the mountain
range. The gray elephant is under the
umbrella. The chicken cube is 6 feet to the right
of the gray elephant. The cube is 5 feet tall.
The cube is on the mountain range. A clown is on
the elephant. The large sewing machine is on the
cube. A die is on the clown. It is 3 feet tall.
97
(No Transcript)
98
(No Transcript)