1
Natural Language driven Image Generation
  • Prepared by: Shreya Agarwal
  • Guide: Mrs. Nirali Nanavati

2
Introduction
  • Natural Language driven Image Generation, as the
    name suggests, refers to the task of mapping a
    natural language text to a scene.
  • The general processes involved in achieving this
    task are:
  • Natural Language Understanding
  • Image Retrieval and Positioning

3
Natural Language Understanding
  • Natural Languages are those used by humans to
    communicate with each other on a daily basis.
    Example: English
  • Computers cannot understand Natural Language
    unless it is parsed and represented in a
    predefined template-like form.

4
Image Retrieval and Positioning
  • This part of the process involves retrieving
    images related to the text from a local database
    or the internet.
  • The final task is to position the images in a
    manner such that all elements are in their
    correct places in accordance with the natural
    language text.

5
Systems and Techniques
  • NALIG (NAtural Language driven Image Generation)
    [1]
  • Text-to-Picture Synthesis Tool [2]
  • WordsEye [3]
  • Carsim [4]
  • Suggested Technique

6
NALIG
  • Generates images of static scenes
  • Proposes a theory for equilibrium and stability
  • Based on descriptions in the form of the
    following phrase, parsed as sketched below:
  • <subject> <preposition> <object> <reference>
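
  A toy sketch (not NALIG's actual parser) of how a phrase of this form can
  be split into its roles; the preposition list and regular expression are
  assumptions made for illustration only:

    import re

    # Recognize phrases of the form <subject> <preposition> <object>;
    # the preposition list is illustrative, not taken from the paper.
    PREPOSITIONS = "on|under|above|near|in|behind"

    def parse_phrase(phrase):
        """Split a simple spatial phrase into (subject, preposition, object)."""
        pattern = rf"(?:the\s+)?(\w+)\s+({PREPOSITIONS})\s+(?:the\s+)?(\w+)"
        match = re.fullmatch(pattern, phrase.strip(), re.IGNORECASE)
        if match is None:
            raise ValueError(f"unsupported phrase: {phrase!r}")
        return match.groups()

    print(parse_phrase("the book on the table"))   # ('book', 'on', 'table')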

7
NALIG Object Taxonomy and Spatial Primitives
  • Defines primitive relationships.
    Example: H_SUPPORT(a, b)
  • Attributes like FLYING, REPOSITORY, etc. are
    associated with each object
  • Conditions like CANFLY are used.
    Example: the airplane on the desert vs. the
    airplane on the runway (see the sketch below)
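
  A toy sketch (not NALIG's rule engine) of how such attributes could drive
  the interpretation of "x on y"; the attribute assignments are invented for
  illustration:

    # Attribute sets per object; REPOSITORY marks a natural resting place.
    ATTRIBUTES = {
        "airplane": {"CANFLY"},
        "runway":   {"REPOSITORY"},
        "desert":   set(),
    }

    def interpret_on(obj, support):
        """Choose a depiction for '<obj> on <support>' from attributes."""
        if "REPOSITORY" in ATTRIBUTES.get(support, set()):
            return f"the {obj} rests normally on the {support}"
        if "CANFLY" in ATTRIBUTES.get(obj, set()):
            return f"the {obj} stands unusually on the {support}"
        return f"the {obj} simply lies on the {support}"

    print(interpret_on("airplane", "runway"))
    print(interpret_on("airplane", "desert"))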

8
NALIG Object Instantiation
  • All objects mentioned in the natural language
    text are instantiated.
  • If the existence of an object depends on another
    object, that object is also instantiated.
  • Such dependence is stored in the relation
    HAS(a, b), which defines the strict relationship.
  • Example: a branch blocking the window implies
    the tree to which the branch belongs.

9
NALIG Consistency Checking and Qualitative
Reasoning
  • Rules known as naïve statics are defined to
    check for equilibrium and stability. Examples:
  • The law of gravity is checked.
  • Space conditions (object positioning) are
    checked. Example: "The book is on the table."

10
NALIG Advantages
  • Successful for limited static scene generation
  • Checks equilibrium, space and stability
    conditions
  • Instantiates implied objects

11
NALIG Limitations
  • Works only for phrases of a predefined form; not
    suitable for full-blown natural language texts
  • Fails to construct dynamic scenes
  • Low success rate for complex scenes

12
Text-to-Picture Synthesis Tool
  • The technique has the following processes:
  • Selecting Keyphrases
  • Selecting Images
  • Picture Layout

13
Selecting Keyphrases
  • Uses keyword-based text summarization
  • Keywords and phrases are extracted based on
    lexico-syntactic rules
  • An unsupervised learning approach based on the
    TextRank algorithm is used
  • The stationary distribution of a random walk
    determines the relative importance of words, as
    sketched below.
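
  A compact sketch of TextRank-style keyword scoring: build a co-occurrence
  graph over words and approximate the stationary distribution of a random
  walk on it by power iteration (PageRank-style). The window size and
  damping factor below are common defaults, not values from the cited paper.

    import re
    from collections import defaultdict

    def textrank_keywords(text, window=2, damping=0.85, iterations=50):
        words = re.findall(r"[a-z]+", text.lower())
        # Undirected co-occurrence graph over a sliding window
        graph = defaultdict(set)
        for i, word in enumerate(words):
            for other in words[i + 1:i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
        # Power iteration with damping, as in PageRank
        scores = {w: 1.0 for w in graph}
        for _ in range(iterations):
            scores = {w: (1 - damping) + damping *
                         sum(scores[u] / len(graph[u]) for u in graph[w])
                      for w in graph}
        return sorted(scores, key=scores.get, reverse=True)

    text = "the red ball lies on the green table near the red box"
    print(textrank_keywords(text)[:5])   # highest-ranked candidate keywords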

14
Selecting Images
  • Two sources are used to search for images for
    the selected keyphrases:
  • Local database of images
  • Internet-based image search engine
  • The top 15 images are retrieved, and image
    processing is done to find the correct image.

15
Picture Layout
  • The technique aims to convey the gist of the
    text. Hence, a good layout is characterized by:
  • Minimum overlap
  • Centrality
  • Closeness
  • A randomized Monte Carlo algorithm is used to
    solve this highly non-convex optimization
    problem, as sketched below.
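
  A minimal sketch of the Monte Carlo idea: sample many random placements of
  the image bounding boxes and keep the cheapest one. The cost below mixes
  overlap and centrality; the weights and canvas size are illustrative.

    import random

    def overlap(a, b):
        """Overlapping area of two axis-aligned boxes given as (x, y, w, h)."""
        dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
        dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
        return max(dx, 0) * max(dy, 0)

    def cost(boxes, canvas=(100.0, 100.0)):
        total = sum(overlap(a, b) for i, a in enumerate(boxes)
                    for b in boxes[i + 1:])
        centrality = sum(abs(x + w / 2 - canvas[0] / 2) +
                         abs(y + h / 2 - canvas[1] / 2)
                         for x, y, w, h in boxes)
        return total + 0.1 * centrality

    def layout(sizes, samples=2000):
        """Return the lowest-cost random placement of the given (w, h) boxes."""
        best, best_cost = None, float("inf")
        for _ in range(samples):
            boxes = [(random.uniform(0, 100 - w), random.uniform(0, 100 - h),
                      w, h) for w, h in sizes]
            c = cost(boxes)
            if c < best_cost:
                best, best_cost = boxes, c
        return best

    print(layout([(30, 20), (25, 25), (10, 40)]))   # one box per image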

16
Advantages
  • Successfully conveys the gist of the natural
    language text
  • Searches for images online, thus delivering an
    output for every natural language input
  • Capable of processing complex sentences
  • Suited to representing action sequences

17
Limitations
  • Does not render a cohesive image
  • Does not work well without a reliable internet
    connection
  • Slower than other methods as it spends time on
    generating a TextRank graph and a co-occurrence
    matrix

18
WordsEye
  • This system generates a high quality 3D image
    from a natural language description.
  • It utilizes a large database of 3D models and
    poses.

19
WordsEye Linguistic Analysis
  • Utilizes a Part-of-Speech (POS) tagger and a
    statistical parser to generate a Dependency
    Representation of the input text.
  • For example, parsing "The daisy is in the test
    tube" yields the head-dependent links shown in
    the sketch below.
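
  As a rough modern stand-in for WordsEye's tagger and statistical parser,
  spaCy's pipeline produces a comparable dependency representation (this
  assumes the small English model: python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The daisy is in the test tube.")
    for token in doc:
        # each word, its dependency label, and its syntactic head
        print(f"{token.text:>6} --{token.dep_}--> {token.head.text}")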

20
WordsEye Linguistic Analysis
  • This Dependency Representation is then converted
    into a Semantic Representation.
  • It describes the entities in the scene and the
    relations between them.

21
WordsEye Semantic Representation
  • WordNet is used to find relations between
    different words (see the sketch below).
  • Personal names are mapped to male/female humanoid
    bodies.
  • Spatial prepositions are handled by semantic
    functions, which look at the dependents and
    generate the semantic representation accordingly.
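
  An illustration of the kind of WordNet lookup described above, using nltk
  as a stand-in for whatever interface WordsEye used (assumes the WordNet
  data is installed via nltk.download("wordnet")):

    from nltk.corpus import wordnet as wn

    daisy = wn.synsets("daisy")[0]
    # Hypernym path from the WordNet root ("entity") down to "daisy"
    path = daisy.hypernym_paths()[0]
    print(" -> ".join(synset.name().split(".")[0] for synset in path))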

22
WordsEye Depictors
  • Depictors are low-level graphical
    specifications used to construct scenes.
  • They control 3D object visibility, size,
    position, orientation, surface color and
    transparency.
  • They are also used to specify poses, control
    Inverse Kinematics (IK) and modify vertex
    displacements for facial expression.

23
WordsEye Models
  • Models are stored in the database and have the
    following associated information:
  • Skeletons
  • Shape Displacements
  • Parts
  • Color Parts
  • Opacity Parts
  • Default Size
  • Functional Properties
  • Spatial Tags

24
WordsEye Prepositions denote the layout
  • If we say "The daisy is in the test tube", the
    system finds the cup tag on the test tube and
    the stem tag on the daisy. Hence, it puts the
    stem into the cupped opening of the test tube,
    as sketched below.
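
  A toy sketch of spatial-tag matching; the tag names follow the example
  above, but the 3D coordinates are invented for illustration:

    # Named spatial tags with their positions in each model's local frame
    TAGS = {
        "test tube": {"cup":  (0.0, 0.8, 0.0)},   # cupped opening at the top
        "daisy":     {"stem": (0.0, -0.5, 0.0)},  # base of the stem
    }

    def place_in(figure, ground):
        """Translate the figure so its stem tag meets the ground's cup tag."""
        cup, stem = TAGS[ground]["cup"], TAGS[figure]["stem"]
        return tuple(c - s for c, s in zip(cup, stem))

    print(place_in("daisy", "test tube"))   # offset to apply to the daisy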

25
WordsEye Poses
  • Poses are used to depict a character in a
    configuration which suggests a particular action
    being performed.
  • They are categorized as:
  • Standalone pose
  • Specialized Usage pose
  • Generic Usage pose
  • Grip pose
  • Bodywear pose

26
WordsEye Pose examples
  • Specialized Usage pose (cycling)
  • Grip pose (holding a wine bottle)
  • Generic Usage pose (throwing a small object)

27
WordsEye Depiction Process
  • The depiction process converts the high-level
    semantic representation into low-level depictors.
  • It consists of the following tasks:
  • Convert semantic representation from the node
    structure to a list of typed semantic elements
    where all references have been resolved
  • Interpret the semantic representation
  • Assign depictors to each semantic element

28
WordsEye Depiction Process
  • Resolve implicit and conflicting constraints of
    depictors.
  • Read in the referenced 3D models
  • Apply each assigned depictor to incrementally
    build up the scene while maintaining constraints.
  • Add background environment, ground plane, lights.
  • Adjust the camera (automatically or by hand)
  • Render

29
WordsEye Depiction Rules
  • Many constraints and conditions are applied to
    generate a coherent scene.
  • Constraints are explicit and implicit.
  • Sentences which cannot be depicted are handled by
    using one of Textualization, Emblematization,
    Characterization, Conventional Icons or
    Literalization.

30
WordsEye Advantages
  • Generates high quality 3D models
  • The use of poses, grips, constraints and IK
    makes the picture coherent.
  • Depiction rules help in mapping linguistically
    analyzed text to exact depictors.
  • Semantic representation lets the depiction
    process truly understand what is being conveyed.

31
WordsEye Limitations
  • Works with high-quality 3D models and hence
    requires a lot of memory and a fast search
    algorithm.
  • Because of its restriction to its own database,
    the system does not guarantee an output for all
    natural language text inputs.

32
Carsim
  • Developed to convert text descriptions of road
    accidents into 3D scenes
  • Uses a two-tier architecture whose tiers
    communicate through a formal representation of
    the accident.

33
Carsim Formalism
  • The tabular structure generated after parsing the
    natural language text contains the following
    information (see the sketch below):
  • Location of the accident and configuration of
    the roads
  • List of road objects
  • Event chains for objects and their movements
  • Collision description
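
  A hedged sketch of this formal representation as Python dataclasses; the
  field names paraphrase the slide, not Carsim's original formalism:

    from dataclasses import dataclass, field

    @dataclass
    class Event:
        actor: str              # e.g. "car1"
        action: str             # e.g. "drive", "turn_left", "collide"

    @dataclass
    class Accident:
        location: str                                     # road configuration
        road_objects: list = field(default_factory=list)
        event_chains: list = field(default_factory=list)  # one chain per object
        collision: tuple = ()                             # (actor, victim)

    accident = Accident(
        location="crossroads",
        road_objects=["car1", "car2"],
        event_chains=[[Event("car1", "drive"), Event("car1", "collide")],
                      [Event("car2", "drive")]],
        collision=("car1", "car2"),
    )
    print(accident.collision)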

34
Carsim Information Extraction Module
  • Utilizes tokenization, part-of-speech tagging,
    sentence splitting, and detection of noun groups,
    named entities, non-recursive clauses and
    domain-specific multiwords for:
  • Detecting the participants
  • Marking the events
  • Detecting the roads

35
Carsim Scene Synthesis and Visualization
  • The previously generated template is taken as
    input.
  • Rule-based modules are used to check consistency
    of the scene.
  • A planner is used to generate vehicle
    trajectories.
  • A temporal module is used to assign time
    intervals to all segments of these trajectories.

36
Suggested Technique
  • This technique is a hybrid of the techniques we
    have seen so far along with a few additions.
  • It is a theoretical technique and has not been
    implemented yet.

37
Natural Language Understanding
  • Words of interest will be categorized into the
    following groups using a part-of-speech (POS)
    tagger and a named entity recognizer (NER), as
    sketched after this list.
  • OBJECT
  • STATE
  • SIZE
  • RELATIVITY
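
  A hypothetical sketch of this categorization using NLTK's POS tagger and
  simple rules; the lexicons and mapping rules are assumptions, since the
  talk does not spell them out (NER is omitted here for brevity):

    import nltk

    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    SIZE_WORDS = {"big", "small", "huge", "tiny", "large"}         # assumed
    RELATIVITY_WORDS = {"on", "under", "above", "near", "behind"}  # assumed

    def categorize(sentence):
        """Group words of interest into OBJECT, STATE, SIZE, RELATIVITY."""
        groups = {"OBJECT": [], "STATE": [], "SIZE": [], "RELATIVITY": []}
        for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence.lower())):
            if tag.startswith("NN"):
                groups["OBJECT"].append(word)
            elif tag.startswith("VB"):
                groups["STATE"].append(word)
            elif word in SIZE_WORDS:
                groups["SIZE"].append(word)
            elif word in RELATIVITY_WORDS:
                groups["RELATIVITY"].append(word)
        return groups

    print(categorize("A big dog sleeps under the tree"))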

38
The template and the co-relation matrix
  • A co-relation matrix specifies the position of
    each object in the scene with respect to every
    other object.
  • The template for each object in the list of
    objects to be instantiated contains the following
    information (see the sketch after this list).
  • Size
  • Co-ordinates
  • Image Location
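
  A sketch of the suggested template and co-relation matrix for the sentence
  "A big dog sleeps under the tree"; all field values are placeholders:

    objects = ["dog", "tree"]

    # Template entries; co-ordinates are filled in later by the
    # Position Determiner.
    template = {
        "dog":  {"size": (40, 30), "coords": None, "image": "images/dog.png"},
        "tree": {"size": (60, 90), "coords": None, "image": "images/tree.png"},
    }

    # corelation[i][j] holds the relation of objects[i] w.r.t. objects[j]
    corelation = [[None,    "under"],
                  ["above", None]]

    i, j = objects.index("dog"), objects.index("tree")
    print(f"the {objects[i]} is {corelation[i][j]} the {objects[j]}")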

39
Image Selection Module
  • This module finds images using two sources:
  • Internal database of images
  • Internet-based image search engines
  • The first 10 images are retrieved
  • Image processing is used to find the correct
    image
  • This image is stored in the database for future
    use

40
Position Determiner and Synthesis Module
  • The Position Determiner computes the co-ordinates
    of every image to be placed, based on the input
    template (which holds each image's size and
    location path).
  • The synthesis module resizes all images and
    places them at the co-ordinates in the template
    (supplied by the Position Determiner), as
    sketched below.
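
  A minimal sketch of the synthesis step using Pillow: resize each image to
  the size in its template entry and paste it at the computed co-ordinates.
  The template format follows the earlier sketch; file names are placeholders.

    from PIL import Image

    def synthesize(template, canvas_size=(800, 600), out_path="scene.png"):
        """Compose all template images onto one canvas and save it."""
        canvas = Image.new("RGBA", canvas_size, "white")
        for entry in template.values():
            image = Image.open(entry["image"]).convert("RGBA")
            image = image.resize(entry["size"])
            canvas.paste(image, entry["coords"], image)  # alpha as the mask
        canvas.save(out_path)

    # synthesize({"dog": {"size": (120, 90), "coords": (340, 400),
    #                     "image": "images/dog.png"}})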

41
Introducing Machine Learning
  • The aim is to finally make a computer think like
    a human. We can greatly enhance our system by
    using the techniques of Machine Learning.
  • The system can be made to learn objects through
    unsupervised learning (clustering), as sketched
    after this list.
  • The system can be feedback-controlled, letting
    the user point out meanings of terms (SIZE,
    RELATIVITY, STATE) not previously known.
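
  A hedged sketch of the clustering idea: group image feature vectors so
  the system can discover object categories without labels. The random
  features below stand in for real image descriptors:

    import numpy as np
    from sklearn.cluster import KMeans

    features = np.random.rand(100, 64)    # 100 images, 64-dim descriptors
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(features)
    print(labels[:10])                    # cluster id assigned to each image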

42
Advantages
  • Linguistic analysis is efficient since there is
    no statistical/rule-based parser being used.
  • Searching for images on the internet ensures
    that an image is generated for every natural
    language input.
  • Introducing machine learning makes the system
    coachable (with user feedback and instant
    adaptation).

43
Limitations
  • It might not generate coherent images for complex
    sentences since we do not make use of an advanced
    NLU technique.
  • It depends on internet availability for finding
    images not within its local database.

44
Summary
  • All the methods developed to date for tackling
    the problem have been explained.
  • A technique combining the strengths of the
    existing techniques with a few additions has
    been specified.
  • A lot of research is still required to make a
    computer achieve this task as simply as a human
    brain does.

45
References
  • [1] Giovanni Adorni, Mauro Di Manzo, Fausto
    Giunchiglia (University of Genoa), "Natural
    Language driven Image Generation", Proceedings
    of the 10th International Conference on
    Computational Linguistics (ACL/COLING 1984)
  • [2] Xiaojin Zhu, Andrew B. Goldberg, Mohamed
    Eldawy, Charles R. Dyer, Bradley Strock
    (University of Wisconsin, Madison), "A
    Text-to-Picture Synthesis System for Augmenting
    Communication"
  • [3] Bob Coyne, Richard Sproat (AT&T Labs
    Research), "WordsEye: An Automatic Text-to-Scene
    Conversion System", Proceedings of the 28th
    Annual Conference on Computer Graphics and
    Interactive Techniques (2001)
  • [4] Richard Johansson, David Williams, Pierre
    Nugues, "Converting Texts of Road Accidents into
    3D Scenes", Proceedings of the 2004 Workshop on
    Text Meaning and Interpretation

46
Thank You!