What's in a link between an image and a text?

1
What's in a link between an image and a text?
  • Dr Andrew Salway
  • FINAL VERSION
  • ILASH Seminar, 3 May 2002

2
Overview
  • Representing the semantic content of aesthetic
    and scientific visual artefacts in digital
    libraries
  • Extracting information from collateral texts
    produced by experts who analyse visual
    information
  • Integrating multimedia information in
    computational systems

3
Scientific and Aesthetic Disciplines
Experts Analysing Visual Information
Multimedia Databases
Image and Video Data
Collateral Text Corpus
Image and Video Processing
Natural Language Processing
Machine-level representation of complex visual
artefacts
Content Technologies
Query-retrieval, browsing, summarisation
Information Access in Digital Libraries
4
Research Questions
  • How do experts put images into words?
  • Special Languages
  • Selection and organisation of information
  • How to instantiate the link between visual and
    textual information in multimedia computing
    systems?

5
Analysing Aesthetic Images
  • Fine Art (Panofsky 1939)
  • pre-iconographic, iconographic, iconological
  • Film (Metz 1974)
  • physical, cinematic, diegetic, connotative,
    sub-textual
  • Dance (Adshead-Lansdale 1988)
  • description of movements, discernment of form,
    interpretation, evaluation

6
Fine Art collateral texts
  • Painting and caption, other texts

Turner, Joseph Mallord William, 1775-1851
The Goddess of Discord Choosing the Apple of
Contention in the Garden of the Hesperides
(exhibited 1806)
Discord chooses the apple that will eventually be
awarded by Paris to the goddess Aphrodite, leading
to the Trojan War. First seen in the British
Institution, this picture was shown again in
Turner's Gallery in 1808 when its classical
grandeur, based on the work of Nicolas Poussin,
must have formed a striking contrast to the English
pastoral landscapes also shown that year. Its own
background is based on Turner's experience of the
Alps in 1802.
7
Fine Art corpus analysis
  • Corpus
  • 804,939 words from Tate WWW-site (691,121 from
    painting captions and 113,818 from artist
    biographies)
  • Specialist terminology extracted, e.g.
    art movements (surrealism, cubism, pop art,
    abstract expressionism), techniques (brushwork,
    mezzotint), types of work (watercolour, pencil
    sketch), features of a painting (monochromatic,
    naturalistic)
  • Contrast between abstract art (colour, form,
    technique) and figurative art (mood, feeling,
    representation)

8
Cues for content
  • depict (295 occurrences) and convey (119
    occurrences)
  • this painting depicts a glass, two pears and a
    box
  • this work depicts a group struggling in a wind
  • this composition conveys the claustrophobia of
    the interior of an omnibus
  • an expressive use of colour and shape to convey
    the subject's mood
  • depict / convey (in earlier 305,913-word corpus)
  • pre-iconographical (56 / 0)
  • iconographical (41 / 9)
  • iconological (3 / 91)
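
The cue-verb counts above could be gathered with a simple pattern match over the captions. The following is a minimal sketch, not the method used in the study: the regular expressions and the function name count_cues are assumptions, and the sample captions are the four examples quoted on this slide.

```python
import re
from collections import Counter

# Hypothetical cue patterns: each matches the inflected forms of one verb.
CUES = {
    "depict": re.compile(r"\bdepict(?:s|ed|ing)?\b", re.IGNORECASE),
    "convey": re.compile(r"\bconvey(?:s|ed|ing)?\b", re.IGNORECASE),
}


def count_cues(captions):
    """Count occurrences of each cue verb across a list of caption strings."""
    counts = Counter()
    for caption in captions:
        for verb, pattern in CUES.items():
            counts[verb] += len(pattern.findall(caption))
    return counts


captions = [
    "this painting depicts a glass, two pears and a box",
    "this work depicts a group struggling in a wind",
    "this composition conveys the claustrophobia of the interior of an omnibus",
    "an expressive use of colour and shape to convey the subject's mood",
]
cue_counts = count_cues(captions)
```

Assigning each occurrence to a pre-iconographical, iconographical or iconological level would still need manual judgement or further rules; the sketch only locates the cues.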

9
Cues to chart the history of art
  • influence (403 occurrences) and inspire (442
    occurrences); about 80 passive
  • his paintings of the Thames were influenced by
    Whistler
  • where he was influenced by expressionism
  • this picture was inspired by a performance of
    Shakespeare's play Macbeth
  • Severini was inspired by modern machinery
  • 50 ARTIST influenced by ARTIST
  • 22 ARTIST influenced by MOVEMENT
  • 31 WORK inspired by PERSON / ENVIRONMENT / WORLD
    / WORK
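
Passive constructions such as those above lend themselves to pattern-based extraction. The sketch below is one possible approach under stated assumptions: the regular expression and the name extract_relation are illustrative, and it returns raw text spans only. Classifying a span as ARTIST, MOVEMENT or WORK would need term lists that are not shown here.

```python
import re

# Assumed pattern: SUBJECT was/were influenced|inspired by AGENT
PASSIVE_CUE = re.compile(
    r"(?P<subject>.+?)\s+(?:was|were)\s+"
    r"(?P<verb>influenced|inspired)\s+by\s+(?P<agent>.+)",
    re.IGNORECASE,
)


def extract_relation(sentence):
    """Return (subject, verb, agent) for a passive cue sentence, else None."""
    match = PASSIVE_CUE.search(sentence.strip())
    if match is None:
        return None
    return (
        match.group("subject"),
        match.group("verb").lower(),
        match.group("agent").rstrip("."),
    )


relation = extract_relation("Severini was inspired by modern machinery")
```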

10
Crime Scenes
  • I can see what appears to be a male laying in the
    prone position on the floor.
  • He is wearing a maroon striped shirt with white
    collar and cuffs, blue jeans, and has a pair of
    left and right training shoes which have become
    slightly dis-extended from the foot.
  • There appears to be a green tie down by his right
    hand and I can see a possible footwear impression
    in blood on his right hand.
  • Surrounding the body there are droplets of blood,
    footwear impression in blood and several pieces
    of broken glass and bottles.

11
Analysing moving images: Dance
Swan Lake, Matthew Bourne (1995).
12
Text Types and their Relationship to the Moving
Image
13
Text Types and their Relationship to the Moving
Image
14
Eliciting spoken commentaries
  • Five dance experts were each asked to Describe
    and then to Interpret five dance sequences as
    they watched them (20 minutes in total)
  • 11,300 words of description
  • 9,754 words of interpretation
  • There appear to be systematic contrasts between
    description and interpretation
  • Some resonance with literature on Protocol
    Analysis (Ericsson and Simon 1993) and studies of
    language production like Chafe (1980).

16
Descriptions
  • Utterances
  • Single words in rapid sequence to identify
    movements
  • Spatio-temporal details and relationships between
    dancers
  • Most frequent open-class words referred literally
    to dancers, their movements and space: woman,
    arm, leg, turn, jump, spin, arabesque, pirouette,
    left, right
  • Descriptions clustered on a Kohonen Map according
    to both dance and to expert
  • Cohesion by reference and lexical cohesion
    potentially useful for dance segmentation
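
A frequency list of open-class words, as mentioned above, can be approximated by counting tokens after filtering out closed-class words. This is only a sketch: the CLOSED_CLASS set is a tiny hypothetical stop list, and the utterances are invented to resemble the single-word descriptions reported on this slide.

```python
import re
from collections import Counter

# Tiny, hypothetical list of closed-class words; a real list would be larger.
CLOSED_CLASS = {"the", "a", "an", "and", "to", "of", "in", "on", "with",
                "her", "his", "she", "he", "then", "into"}


def frequent_open_class(utterances, n=5):
    """Return the n most frequent tokens that are not closed-class words."""
    words = []
    for utterance in utterances:
        tokens = re.findall(r"[a-z]+", utterance.lower())
        words.extend(t for t in tokens if t not in CLOSED_CLASS)
    return Counter(words).most_common(n)


# Invented utterances for illustration only.
utterances = ["turn", "jump", "turn to the left", "arm and leg", "turn"]
top_words = frequent_open_class(utterances)
```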

17
Interpretations
  • Most frequent open-class words referred
    non-literally to dancers, their movements and
    themes of the dance: swan, prince, wing, flight,
    ethereality
  • Longer utterances either referring to larger
    video intervals, or linking literal descriptions
    to interpreted meaning, conjoined by seems, as
    if, like, a sense of, suggest, appears to be
  • The stretching of the neck, like a swan
  • Aerial steps which could suggest flight
  • Moving faster as if something is driving him

18
KAB: Knowledge-rich video Annotation and Browsing
19
KAB: Knowledge-rich video Annotation and Browsing
  • Keyword indexing of video data using time-coded
    commentaries; annotations can be layered
  • User can query for intervals or browse between
    moving images and texts and can add own term
    lists and annotations
  • OO design implemented in Java with JMF
  • Limitations
  • (i) only one kind of collateral text
  • (ii) temporal association of text and video
    interval
  • (iii) keyword-based representation of content.
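
Keyword indexing over time-coded, layered annotations can be pictured as an inverted index from words to video intervals. The sketch below is written in Python for brevity and is not the KAB implementation (which the slide says was written in Java with JMF). The class names, the layer labels and the interval values in seconds are all assumptions, and the third annotation text is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float   # interval start in seconds (illustrative values)
    end: float     # interval end in seconds
    layer: str     # e.g. "description" or "interpretation"
    text: str


class KeywordIndex:
    """Inverted index from keyword to the annotations that contain it."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, annotation):
        words = {w.strip(".,;:!?") for w in annotation.text.lower().split()}
        for word in words - {""}:
            self._postings[word].append(annotation)

    def query(self, keyword, layer=None):
        """Return sorted (start, end) intervals, optionally for one layer."""
        hits = self._postings.get(keyword.lower(), [])
        return sorted((a.start, a.end) for a in hits
                      if layer is None or a.layer == layer)


index = KeywordIndex()
index.add(Annotation(231.0, 235.0, "description", "four dancers stand in a ring"))
index.add(Annotation(235.0, 240.0, "description", "a female dancer enters the ring"))
index.add(Annotation(231.0, 240.0, "interpretation", "a sense of enclosure"))
ring_intervals = index.query("ring")
```

The sketch shares limitation (iii) listed above: it matches surface keywords only and represents nothing about what the words mean.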

22
Audio Description
  • Audio description enhances TV and films for
    visually impaired viewers and is scripted before
    it is recorded; in effect, the story told by the
    moving image is retold in words
  • Describers follow guidelines which restrict the
    language they use, i.e. normally the present
    tense, simple sentences and few pronominal
    references
  • Describers are encouraged not to make inferences
    on behalf of their audience (i.e. little / no
    interpretation)
  • We are interested in applying information
    extraction technology to generate machine-level
    representations of video content from audio
    description scripts
  • TIWO (Television in Words), EPSRC GR/R67194/01

23
Audio Description Script
  • 11.43 Hanna passes Jan some banknotes.
  • 11.55 Laughing, Jan falls back into her seat as
    the jeep overtakes the line of the lorries.
  • 12.01 An explosion on the road ahead.
  • 12.08 The jeep has hit a mine.
  • 12.09 Hanna jumps from the lorry.
  • 12.20 Desperately she runs towards the mangled
    jeep.
  • 12.27 Soldiers try to stop her.
  • 12.31 She struggles with the soldier who grabs
    hold of her firmly.
  • 12.35 He lifts her bodily from the ground,
    holding her tightly in his arms.
  • (NB. Some cue information removed)
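
A script in this form can be turned into time-coded fragments with a small parser. The sketch below assumes that each cue reads as minutes.seconds and that a fragment lasts until the next cue begins; neither assumption is stated in the script format itself, and the function name parse_script is hypothetical.

```python
import re

# Assumed cue format: "MM.SS text", e.g. "12.01 An explosion on the road ahead."
CUE_LINE = re.compile(r"^(\d+)\.(\d{2})\s+(.*)$")


def parse_script(lines):
    """Return (start_seconds, end_seconds_or_None, text) for each cue line."""
    cues = []
    for line in lines:
        match = CUE_LINE.match(line.strip())
        if match:
            minutes, seconds, text = match.groups()
            cues.append((int(minutes) * 60 + int(seconds), text))
    intervals = []
    for i, (start, text) in enumerate(cues):
        # Assumption: a fragment ends where the next one starts.
        end = cues[i + 1][0] if i + 1 < len(cues) else None
        intervals.append((start, end, text))
    return intervals


script = [
    "11.43 Hanna passes Jan some banknotes.",
    "12.01 An explosion on the road ahead.",
    "12.08 The jeep has hit a mine.",
]
fragments = parse_script(script)
```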

24
Corpus Analysis
  • Audio Description Corpus
  • 70,856 words (12 movies, various genres)
  • Temporal information, which may be used to
  • align fragment with interval (aspectual verbs)
  • recover event-event relations (simultaneity and
    cause, using "as")
  • recover time period(s) of film dates /
    (?costumes and props?)
  • Other kinds of information
  • 50 most frequent verbs: 84% material processes
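
As a rough illustration of recovering simultaneity from "as", a sentence can be split at the conjunction into two event descriptions. This is a naive sketch with a hypothetical pattern and function name. It skips "as if" and "as though", and it cannot tell temporal "as" from causal "as", which the slide notes are both possible readings.

```python
import re

# Assumed pattern: MAIN-CLAUSE as SUBORDINATE-CLAUSE, excluding "as if/though".
AS_CLAUSE = re.compile(
    r"^(?P<main>.+?)\s+as\s+(?!if\b|though\b)(?P<sub>.+?)\.?$"
)


def simultaneous_events(sentence):
    """Return (main_event, co_occurring_event) or None if no 'as' clause."""
    match = AS_CLAUSE.match(sentence.strip())
    if match is None:
        return None
    return (match.group("main"), match.group("sub"))


events = simultaneous_events(
    "Laughing, Jan falls back into her seat as the jeep overtakes "
    "the line of the lorries."
)
```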

25
Exploiting Collateral Text
  • Experts in a number of fields (both aesthetic and
    scientific) appear to use special languages to
    articulate the semantic content of visual
    information
  • Text types select and organise information about
    still and moving images differently
  • A distinction between Description and
    Interpretation, which is apparent in theoretical
    frameworks, seems to be realised in experts'
    texts
  • The use of collateral text in digital libraries
    requires the integration of multimedia
    information

26
Integrating Multimedia Information: WHY?
  • Functionality for Digital Libraries
  • synchronised presentations
  • hypermedia browsing
  • multimedia corpora
  • cross-modal IR
  • information fusion
  • multimedia summarisation
  • information conversion.

27
Integrating Multimedia Information: HOW?
  • Image/Video-text data models
  • Statistical Image and Text Features
  • Multimedia Thesaurus
  • Intermediate Representations
  • The development of such systems may benefit
    from a computational framework in which to
    instantiate the image-text link

28
Issues for Instantiating the Image-Text Link
  • 1:many relationships

JOURNAL
TEXTBOOK
BIOGRAPHY
EXHIBITION CATALOGUE
CAPTION
29
Issues for Instantiating the Image-Text Link
  • Text fragments and image regions

a young mother is strolling with her little girl
dressed in white with a salmon-coloured sash

at the extreme right, appears a scandalously
hieratic-looking couple
30
Issues for Instantiating the Image-Text Link
  • Text fragments and video intervals

351 four dancers stand in a ring
355 a female dancer enters the ring
31
Issues for Instantiating the Image-Text Link
  • Choice of content representation scheme
  • Keywords
  • Propositions
  • Spatio-Temporal Logics
  • Causal Relationships
  • Mental States, e.g. for films
  • Mapping between levels of meaning, e.g. for
    interpretations
  • Maintaining multiple viewpoints

32
Issues for Instantiating the Image-Text Link
  • Typed Links?
  • Intuitively an image may, for example, illustrate
    or a text, whilst a text may describe or explain
    an image
  • May be a case for explicating the meaning of
    image-text links and considering precedents from
    the study of semantic networks (structural vs.
    assertional links) and from the development of
    hypertext systems (taxonomy of link types)
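
One way to make link meaning explicit is to give each link a type and to check its direction when it is created. The sketch below is an assumption-laden illustration, not a proposal from the talk: it uses only the three relations named above (illustrate, describe, explain), and the class names and identifiers are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    ILLUSTRATES = "illustrates"  # image -> text
    DESCRIBES = "describes"      # text -> image
    EXPLAINS = "explains"        # text -> image


# Which medium may act as the source of each link type (assumed constraint).
ALLOWED_SOURCE = {
    LinkType.ILLUSTRATES: "image",
    LinkType.DESCRIBES: "text",
    LinkType.EXPLAINS: "text",
}


@dataclass(frozen=True)
class Node:
    media: str   # "image" or "text"
    ident: str   # hypothetical identifier


@dataclass(frozen=True)
class TypedLink:
    source: Node
    target: Node
    link_type: LinkType

    def __post_init__(self):
        if self.source.media != ALLOWED_SOURCE[self.link_type]:
            raise ValueError(
                f"{self.link_type.value} link cannot start from {self.source.media}"
            )
        if self.source.media == self.target.media:
            raise ValueError("an image-text link must join an image and a text")


painting = Node("image", "painting-1")
caption = Node("text", "caption-1")
link = TypedLink(caption, painting, LinkType.DESCRIBES)
```

A fuller taxonomy, such as those developed for hypertext systems, would add more link types and might relax the direction check.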

33
Closing Remarks
  • Great potential to exploit collateral text in
    specialist digital libraries
  • The development of multimedia information systems
    may benefit from a theoretical framework for
    instantiating the image-text link
  • Potential for synergy between different
    disciplines concerned with this link;
    computational systems may help in understanding
    the relationship between vision, language and
    knowledge

34
What's in a link between an image and a text?
  • Dr Andrew Salway
  • Department of Computing, University of Surrey
  • ILASH Seminar, 3 May 2002

36
Semiotic-based Frameworks for Multimedia
  • Gonzalez
  • Purchase
  • Warner..

37
The Family of Images (Mitchell 1986, Iconology:
Image, Text, Ideology. University of Chicago Press)
Image: likeness, resemblance, similitude
Optical: mirrors, projections
Graphic: pictures, statues, designs
Perceptual: sense data, species, appearances
Mental: dreams, memories, ideas, fantasmata
Verbal: metaphors, descriptions
38
Vision and Language