What's in a link between an image and a text?


1
What's in a link between an image and a text?

Dr Andrew Salway
FINAL VERSION
ILASH Seminar, 3 May 2002

2
Overview

Representing the semantic content of aesthetic
and scientific visual artefacts in digital
libraries
Extracting information from collateral texts
produced by experts who analyse visual
information
Integrating multimedia information in
computational systems

3
Scientific and Aesthetic Disciplines
Experts Analysing Visual Information
Multimedia Databases
Image and Video Data
Collateral Text Corpus
Image and Video Processing
Natural Language Processing
Machine-level representation of complex visual
artefacts
Content Technologies
Query-retrieval, browsing, summarisation
Information Access in Digital Libraries
4
Research Questions

How do experts put images into words?
Special Languages
Selection and organisation of information
How to instantiate the link between visual and
textual information in multimedia computing
systems?

5
Analysing Aesthetic Images

Fine Art (Panofsky 1939)
pre-iconographic, iconographic, iconological
Film (Metz 1974)
physical, cinematic, diegetic, connotative,
sub-textual
Dance (Adshead-Lansdale 1988)
description of movements, discernment of form,
interpretation, evaluation

6
Fine Art: collateral texts

Painting and caption, other texts

Turner, Joseph Mallord William 1775-1851
The Goddess of Discord Choosing the Apple of
Contention in the Garden of the Hesperides
(exhibited 1806)
Discord chooses the apple that
will eventually be awarded by Paris to the
goddess Aphrodite, leading to the Trojan War.
First seen in the British Institution, this
picture was shown again in Turner's Gallery in
1808 when its classical grandeur, based on the
work of Nicolas Poussin, must have formed a
striking contrast to the English pastoral
landscapes also shown that year. Its own
background is based on Turner's experience of the
Alps in 1802.
7
Fine Art: corpus analysis

Corpus
804,939 words from Tate WWW-site (691,121 from
painting captions and 113,818 from artist
biographies)
Specialist terminology extracted, e.g.
art movements (surrealism, cubism, pop art,
abstract expressionism), techniques (brushwork,
mezzotint), types of work (watercolour, pencil
sketch), features of a painting (monochromatic,
naturalistic)
Contrast between abstract art (colour, form,
technique) and figurative art (mood, feeling,
representation)

8
Cues for content

depict (295 occurrences) and convey (119
occurrences)
this painting depicts a glass, two pears and a
box
this work depicts a group struggling in a wind
this composition conveys the claustrophobia of
the interior of an omnibus
an expressive use of colour and shape to convey
the subject's mood
depict / convey (in earlier 305,913-word corpus)
pre-iconographical (56 / 0)
iconographical (41 / 9)
iconological (3 / 91)
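
Cue counts of this kind can be approximated with a simple pattern match. The following is a minimal sketch in Python, not the procedure used in the study: the function name count_cues, the regular expression and the choice of sample captions are assumptions introduced here for illustration.

```python
import re
from collections import Counter

# Hypothetical cue pattern: "depict" and "convey" with common inflections.
CUE_PATTERN = re.compile(r"\b(depict|convey)(s|ed|ing)?\b", re.IGNORECASE)

def count_cues(captions):
    """Count occurrences of each cue verb (keyed by stem) across captions."""
    counts = Counter()
    for caption in captions:
        for match in CUE_PATTERN.finditer(caption):
            counts[match.group(1).lower()] += 1
    return counts

# Caption fragments quoted on the slide above.
sample = [
    "this painting depicts a glass, two pears and a box",
    "this work depicts a group struggling in a wind",
    "this composition conveys the claustrophobia of the interior of an omnibus",
]
cue_counts = count_cues(sample)
```

Assigning each hit to a level of meaning (pre-iconographical, iconographical, iconological) would still need human judgement or a further classifier; the sketch only locates the cues.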

9
Cues to chart the history of art

influence (403 occurrences) and inspire (442
occurrences); about 80 passive
his paintings of the Thames were influenced by
Whistler
where he was influenced by expressionism
this picture was inspired by a performance of
Shakespeare's play Macbeth
Severini was inspired by modern machinery
50 ARTIST influenced by ARTIST
22 ARTIST influenced by MOVEMENT
31 WORK inspired by PERSON / ENVIRONMENT / WORLD
/ WORK
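
Passive cue phrases like those above lend themselves to a simple template match. The sketch below is a hedged illustration in Python; the pattern PASSIVE_CUE and the function extract_relation are hypothetical names, and the sketch does not attempt to decide whether a matched phrase is an ARTIST, a MOVEMENT or a WORK, which would need a term list or similar resource.

```python
import re

# Hypothetical template for passive cues: "<target> was/were influenced|inspired by <source>".
PASSIVE_CUE = re.compile(
    r"(?P<target>.+?)\s+(?:was|were)\s+(?P<cue>influenced|inspired)\s+by\s+(?P<source>.+)",
    re.IGNORECASE,
)

def extract_relation(sentence):
    """Return (target, cue, source) for a passive cue phrase, or None if absent."""
    match = PASSIVE_CUE.search(sentence.strip())
    if match is None:
        return None
    return (match.group("target"), match.group("cue").lower(), match.group("source"))

# Fragments quoted on the slide above.
relations = [
    extract_relation("his paintings of the Thames were influenced by Whistler"),
    extract_relation("Severini was inspired by modern machinery"),
]
```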

10
Crime Scenes

I can see what appears to be a male laying in the
prone position on the floor.
He is wearing a maroon striped shirt with white
collar and cuffs, blue jeans, and has a pair of
left and right training shoes which have become
slightly dis-extended from the foot.
There appears to be a green tie down by his right
hand and I can see a possible footwear impression
in blood on his right hand.
Surrounding the body there are droplets of blood,
footwear impression in blood and several pieces
of broken glass and bottles.

11
Analysing moving images: Dance
Swan Lake, Matthew Bourne (1995).
12
Text Types and their Relationship to the Moving
Image
13
Text Types and their Relationship to the Moving
Image
14
Eliciting spoken commentaries

Five dance experts were each asked to Describe
and then to Interpret five dance sequences as
they watched them (20 minutes in total):
11,300 words of description
9,754 words of interpretation
There appear to be systematic contrasts between
description and interpretation
Some resonance with literature on Protocol
Analysis (Ericsson and Simon 1993) and studies of
language production like Chafe (1980).

16
Descriptions

Utterances
Single words in rapid sequence to identify
movements
Spatio-temporal details and relationships between
dancers
Most frequent open-class words referred literally
to dancers, their movements and space: woman,
arm, leg, turn, jump, spin, arabesque, pirouette,
left, right
Descriptions clustered on a Kohonen Map according
to both dance and expert
Cohesion by reference and lexical cohesion
potentially useful for dance segmentation

17
Interpretations

Most frequent open-class words referred
non-literally to dancers, their movements and
themes of the dance: swan, prince, wing, flight,
ethereality
Longer utterances either referring to larger
video intervals, or linking literal descriptions
to interpreted meaning, conjoined by seems, as
if, like, a sense of, suggest, appears to be
The stretching of the neck, like a swan
Aerial steps which could suggest flight
Moving faster as if something is driving him

18
KAB: Knowledge-rich video Annotation and Browsing
19
KAB: Knowledge-rich video Annotation and Browsing

Keyword indexing of video data using time-coded
commentaries; annotations can be layered
Users can query for intervals or browse between
moving images and texts, and can add their own
term lists and annotations
OO design implemented in Java with JMF
Limitations
(i) only one kind of collateral text
(ii) temporal association of text and video
interval
(iii) keyword-based representation of content.
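
The idea of keyword indexing over time-coded commentary can be sketched in a few lines. This is an illustrative Python sketch, not the KAB implementation (which the slide says was written in Java with JMF); the function names build_index and query, the (start, end, text) tuple layout and the time codes in seconds are all assumptions made up for the example.

```python
from collections import defaultdict

def build_index(annotations):
    """Map each lower-cased keyword to the (start, end) intervals whose text mentions it."""
    index = defaultdict(list)
    for start, end, text in annotations:
        # Strip surrounding punctuation before de-duplicating words per annotation.
        words = {w.strip(".,;:!?").lower() for w in text.split()}
        for word in words - {""}:
            index[word].append((start, end))
    return index

def query(index, keyword):
    """Return the sorted video intervals associated with a keyword."""
    return sorted(index.get(keyword.lower(), []))

# Commentary fragments from the talk; the interval boundaries are invented.
annotations = [
    (0, 4, "four dancers stand in a ring"),
    (4, 9, "a female dancer enters the ring"),
    (9, 15, "The stretching of the neck, like a swan"),
]
index = build_index(annotations)
ring_intervals = query(index, "ring")
```

The sketch shares limitation (iii): a query only succeeds when the exact keyword occurs in the commentary.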

22
Audio Description

Audio description enhances TV and films for
visually impaired viewers and is scripted before
it is recorded; in effect, the story told by the
moving image is retold in words
Describers follow guidelines which restrict the
language they use, i.e. normally the present
tense, simple sentences and few pronominal
references
Describers are encouraged not to make inferences
on behalf of their audience (i.e. little / no
interpretation)
We are interested in applying information
extraction technology to generate machine-level
representations of video content from audio
description scripts
TIWO (Television in Words), EPSRC GR/R67194/01

23
Audio Description Script

11.43 Hanna passes Jan some banknotes.
11.55 Laughing, Jan falls back into her seat as
the jeep overtakes the line of the lorries.
12.01 An explosion on the road ahead.
12.08 The jeep has hit a mine.
12.09 Hanna jumps from the lorry.
12.20 Desperately she runs towards the mangled
jeep.
12.27 Soldiers try to stop her.
12.31 She struggles with the soldier who grabs
hold of her firmly.
12.35 He lifts her bodily from the ground,
holding her tightly in his arms.
(NB. Some cue information removed)
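
A script in this layout can be turned into time-stamped records with a short parser. The sketch below is a minimal Python illustration under stated assumptions: that a leading code such as 12.01 means minutes and seconds, and that a line without a leading code continues the previous utterance. The names TIME_CODED_LINE and parse_script are hypothetical.

```python
import re

# Assumed layout: "<minutes>.<seconds> <utterance>", e.g. "12.01 An explosion ...".
TIME_CODED_LINE = re.compile(r"^(\d+)\.(\d{2})\s+(.*)$")

def parse_script(lines):
    """Return (seconds, text) pairs; uncoded lines are joined to the previous entry."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = TIME_CODED_LINE.match(line)
        if match:
            seconds = int(match.group(1)) * 60 + int(match.group(2))
            entries.append([seconds, match.group(3)])
        elif entries:
            entries[-1][1] += " " + line
    return [(seconds, text) for seconds, text in entries]

# Lines taken from the script excerpt above.
entries = parse_script([
    "12.01 An explosion on the road ahead.",
    "12.20 Desperately she runs towards the mangled",
    "jeep.",
])
```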

24
Corpus Analysis

Audio Description Corpus
70,856 words (12 movies, various genres)
Temporal information, which might be used to:
align fragment with interval (aspectual verbs)
recover event-event relations (simultaneity and
cause, using "as")
recover time period(s) of film: dates
(costumes and props?)
Other kinds of information
50 most frequent verbs: 84 material processes
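
As one illustration of recovering event-event relations, a sentence can be split at the conjunction "as" to propose two simultaneous events. This is a deliberately naive Python sketch, not a method reported in the talk: "as" has other uses (comparison, cause), so the output is only a candidate pair, and the name split_simultaneous is hypothetical.

```python
import re

# Split on the first free-standing "as" that has text on both sides.
AS_CONJUNCTION = re.compile(r"\s+as\s+", re.IGNORECASE)

def split_simultaneous(sentence):
    """Return (event_a, event_b) if the sentence contains a medial "as", else None."""
    parts = AS_CONJUNCTION.split(sentence.rstrip(". "), maxsplit=1)
    if len(parts) != 2:
        return None
    return (parts[0], parts[1])

# Sentence taken from the audio description script shown earlier in the talk.
pair = split_simultaneous(
    "Laughing, Jan falls back into her seat as the jeep overtakes the line of the lorries."
)
```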

25
Exploiting Collateral Text

Experts in a number of fields (both aesthetic and
scientific) appear to use special languages to
articulate the semantic content of visual
information
Text types select and organise information about
still and moving images differently
A distinction between Description and
Interpretation, which is apparent in theoretical
frameworks, seems to be realised in experts'
texts
The use of collateral text in digital libraries
requires the integration of multimedia
information

26
Integrating Multimedia Information: WHY?

Functionality for Digital Libraries
synchronised presentations
hypermedia browsing
multimedia corpora
cross-modal IR
information fusion
multimedia summarisation
information conversion.

27
Integrating Multimedia Information: HOW?

Image/Video-text data models
Statistical Image and Text Features
Multimedia Thesaurus
Intermediate Representations
The development of such systems may benefit
from a computational framework in which to
instantiate the image-text link

28
Issues for Instantiating the Image-Text Link

1:many relationships

JOURNAL
TEXTBOOK
BIOGRAPHY
EXHIBITION CATALOGUE
CAPTION
29
Issues for Instantiating the Image-Text Link

Text fragments and image regions

a young mother is strolling with her little girl
dressed in white with a salmon-coloured sash

at the extreme right, appears a scandalously
hieratic-looking couple
30
Issues for Instantiating the Image-Text Link

Text fragments and video intervals

351 four dancers stand in a ring
355 a female dancer enters the ring
31
Issues for Instantiating the Image-Text Link

Choice of content representation scheme
Keywords
Propositions
Spatio-Temporal Logics
Causal Relationships
Mental States, e.g. for films
Mapping between levels of meaning, e.g. for
interpretations
Maintaining multiple viewpoints

32
Issues for Instantiating the Image-Text Link

Typed Links?
Intuitively an image may, for example, illustrate
a text, whilst a text may describe or explain
an image
There may be a case for explicating the meaning of
image-text links and considering precedents from
the study of semantic networks (structural vs.
assertional links) and from the development of
hypertext systems (taxonomy of link types)
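
One way to make typed links concrete is to store the link type explicitly alongside the two endpoints. The Python sketch below is only an illustration of that idea: the three link types come from the examples on this slide, while the class names LinkType and TypedLink, the function links_of_type and the identifiers "painting-1" and "caption-1" are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class LinkType(Enum):
    ILLUSTRATES = "illustrates"  # image -> text
    DESCRIBES = "describes"      # text -> image
    EXPLAINS = "explains"        # text -> image

@dataclass(frozen=True)
class TypedLink:
    source: str          # identifier of the image, region, interval or text fragment
    target: str
    link_type: LinkType

def links_of_type(links, link_type):
    """Select the links carrying a given type."""
    return [link for link in links if link.link_type is link_type]

# Made-up identifiers for one painting and its caption.
links = [
    TypedLink("caption-1", "painting-1", LinkType.DESCRIBES),
    TypedLink("caption-1", "painting-1", LinkType.EXPLAINS),
    TypedLink("painting-1", "caption-1", LinkType.ILLUSTRATES),
]
describing = links_of_type(links, LinkType.DESCRIBES)
```

Keeping the type as data, rather than implying it from the kind of text, would let one image carry several differently typed links to the same text.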

33
Closing Remarks

Great potential to exploit collateral text in
specialist digital libraries
The development of multimedia information systems
may benefit from a theoretical framework for
instantiating the image-text link
Potential for synergy between different
disciplines concerned with this link:
computational systems may help in understanding
the relationship between vision, language and
knowledge

34
What's in a link between an image and a text?

Dr Andrew Salway
Department of Computing, University of Surrey
ILASH Seminar, 3 May 2002

36
Semiotic-based Frameworks for Multimedia

Gonzalez
Purchase
Warner ...

37
The Family of Images (Mitchell 1986, Iconology:
Image, Text, Ideology. University of Chicago Press)
Image: likeness, resemblance, similitude
Optical: mirrors, projections
Graphic: pictures, statues, designs
Perceptual: sense data, species, appearances
Mental: dreams, memories, ideas, fantasmata
Verbal: metaphors, descriptions
38
Vision and Language
