Title: What's in a link between an image and a text
1What's in a link between an image and a text?
- Dr Andrew Salway
- FINAL VERSION
- ILASH Seminar, 3 May 2002
2Overview
- Representing the semantic content of aesthetic and scientific visual artefacts in digital libraries
- Extracting information from collateral texts produced by experts who analyse visual information
- Integrating multimedia information in computational systems
3Scientific and Aesthetic Disciplines
Experts Analysing Visual Information
Multimedia Databases
Image and Video Data
Collateral Text Corpus
Image and Video Processing
Natural Language Processing
Machine-level representation of complex visual artefacts
Content Technologies
Query-retrieval, browsing, summarisation
Information Access in Digital Libraries
4Research Questions
- How do experts put images into words?
- Special Languages
- Selection and organisation of information
- How to instantiate the link between visual and textual information in multimedia computing systems?
5Analysing Aesthetic Images
- Fine Art (Panofsky 1939)
- pre-iconographic, iconographic, iconological
- Film (Metz 1974)
- physical, cinematic, diegetic, connotative, sub-textual
- Dance (Adshead-Lansdale 1988)
- description of movements, discernment of form, interpretation, evaluation
6Fine Art collateral texts
- Painting and caption, other texts
Turner, Joseph Mallord William 1775-1851
The Goddess of Discord Choosing the Apple of Contention in the Garden of the Hesperides (exhibited 1806)
Discord chooses the apple that will eventually be awarded by Paris to the goddess Aphrodite, leading to the Trojan War. First seen in the British Institution, this picture was shown again in Turner's Gallery in 1808 when its classical grandeur, based on the work of Nicolas Poussin, must have formed a striking contrast to the English pastoral landscapes also shown that year. Its own background is based on Turner's experience of the Alps in 1802.
7Fine Art corpus analysis
- Corpus
- 804,939 words from Tate WWW-site (691,121 from painting captions and 113,818 from artist biographies)
- Specialist terminology extracted, e.g. art movements (surrealism, cubism, pop art, abstract expressionism), techniques (brushwork, mezzotint), types of work (watercolour, pencil sketch), features of a painting (monochromatic, naturalistic)
- Contrast between abstract art (colour, form, technique) and figurative art (mood, feeling, representation)
8Cues for content
- depict (295 occurrences) and convey (119 occurrences)
- this painting depicts a glass, two pears and a box
- this work depicts a group struggling in a wind
- this composition conveys the claustrophobia of the interior of an omnibus
- an expressive use of colour and shape to convey the subject's mood
- depict / convey (in earlier 305,913 word corpus)
- pre-iconographical (56 / 0)
- iconographical (41 / 9)
- iconological (3 / 91)
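The cue counts on this slide could be gathered with a simple concordance-style tally. What follows is a minimal sketch, assuming captions are available as plain strings. The function name `count_cues`, the regular expressions and the choice of inflected forms are illustrative assumptions, not the method actually applied to the Tate corpus. The sample captions are the four examples quoted on the slide.

```python
import re
from collections import Counter

# Hypothetical cue lemmas, each mapped to a pattern covering its inflected forms.
CUE_PATTERNS = {
    "depict": re.compile(r"\bdepict(?:s|ed|ing)?\b", re.IGNORECASE),
    "convey": re.compile(r"\bconvey(?:s|ed|ing)?\b", re.IGNORECASE),
}

def count_cues(captions):
    """Return a Counter of cue-verb occurrences over an iterable of captions."""
    counts = Counter()
    for caption in captions:
        for lemma, pattern in CUE_PATTERNS.items():
            counts[lemma] += len(pattern.findall(caption))
    return counts

# The four example fragments quoted on the slide.
captions = [
    "this painting depicts a glass, two pears and a box",
    "this work depicts a group struggling in a wind",
    "this composition conveys the claustrophobia of the interior of an omnibus",
    "an expressive use of colour and shape to convey the subject's mood",
]
cue_counts = count_cues(captions)
```

Assigning each hit to a pre-iconographical, iconographical or iconological level would still need human judgement or a further classifier; the sketch only finds the cues.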
9Cues to chart the history of art
- influence (403 occurrences) and inspire (442 occurrences), about 80 passive
- his paintings of the Thames were influenced by Whistler
- where he was influenced by expressionism
- this picture was inspired by a performance of Shakespeare's play Macbeth
- Severini was inspired by modern machinery
- 50 ARTIST influenced by ARTIST
- 22 ARTIST influenced by MOVEMENT
- 31 WORK inspired by PERSON / ENVIRONMENT / WORLD / WORK
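Patterns such as ARTIST influenced by MOVEMENT suggest a template-matching step over passive constructions. The sketch below is one possible way to do this, assuming small gazetteers of artist and movement names. The gazetteers, the names `match_cue` and `classify`, and the `OTHER` label are all hypothetical, and the slide's WORK / PERSON / ENVIRONMENT categories are not modelled.

```python
import re

# Hypothetical gazetteers; a real system would need far larger lists.
ARTISTS = {"whistler", "severini", "turner"}
MOVEMENTS = {"expressionism", "cubism", "surrealism"}

# Passive cue: "was/were influenced by X" or "was/were inspired by X".
PASSIVE_CUE = re.compile(
    r"\b(?:was|were)\s+(influenced|inspired)\s+by\s+(.+)", re.IGNORECASE
)

def classify(phrase):
    """Assign a coarse semantic type to the agent phrase using the gazetteers."""
    words = {w.strip(".,").lower() for w in phrase.split()}
    if words & ARTISTS:
        return "ARTIST"
    if words & MOVEMENTS:
        return "MOVEMENT"
    return "OTHER"

def match_cue(sentence):
    """Return (cue verb, agent type) for a passive cue, or None if absent."""
    m = PASSIVE_CUE.search(sentence)
    if m is None:
        return None
    return m.group(1).lower(), classify(m.group(2))
```

For example, `match_cue("where he was influenced by expressionism")` yields the cue verb together with the type `MOVEMENT`, while a sentence without a passive cue returns `None`.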
10Crime Scenes
- I can see what appears to be a male laying in the prone position on the floor.
- He is wearing a maroon striped shirt with white collar and cuffs, blue jeans, and has a pair of left and right training shoes which have become slightly dis-extended from the foot.
- There appears to be a green tie down by his right hand and I can see a possible footwear impression in blood on his right hand.
- Surrounding the body there are droplets of blood, footwear impression in blood and several pieces of broken glass and bottles.
11Analysing moving images: Dance
Swan Lake, Matthew Bourne (1995).
12Text Types and their Relationship to the Moving
Image
13Text Types and their Relationship to the Moving
Image
14Eliciting spoken commentaries
- Five dance experts each asked to Describe then to Interpret five dance sequences as they watched them (20 minutes in total)
- 11,300 words of description
- 9,754 words of interpretation
- There appear to be systematic contrasts between description and interpretation
- Some resonance with literature on Protocol Analysis (Ericsson and Simon 1993) and studies of language production like Chafe (1980).
16Descriptions
- Utterances
- Single words in rapid sequence to identify movements
- Spatio-temporal details and relationships between dancers
- Most frequent open-class words referred literally to dancers, their movements and space: woman, arm, leg, turn, jump, spin, arabesque, pirouette, left, right
- Descriptions clustered on a Kohonen Map according both to dance and to expert
- Cohesion by reference and lexical cohesion potentially useful for dance segmentation
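The observation about most frequent open-class words rests on a frequency count that excludes function words. A minimal sketch of such a count follows, assuming transcribed utterances are plain strings. The closed-class list is a tiny hypothetical stand-in, and the sample utterances are invented for illustration using vocabulary from the slide; they are not taken from the experts' commentaries.

```python
import re
from collections import Counter

# A tiny, hypothetical closed-class (function word) list for illustration only.
CLOSED_CLASS = {
    "a", "an", "the", "and", "to", "of", "her", "his",
    "on", "with", "then", "into",
}

def open_class_frequencies(utterances):
    """Count tokens that are not in the closed-class list."""
    counts = Counter()
    for utterance in utterances:
        for token in re.findall(r"[a-z]+", utterance.lower()):
            if token not in CLOSED_CLASS:
                counts[token] += 1
    return counts

# Invented utterances, built from words the slide lists as frequent.
utterances = [
    "turn",
    "jump",
    "arabesque to the left",
    "turn with the right leg",
    "the woman lifts her arm",
]
freqs = open_class_frequencies(utterances)
```

Calling `freqs.most_common()` then gives a ranked word list that can be compared between description and interpretation transcripts.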
17Interpretations
- Most frequent open-class words referred non-literally to dancers, their movements and themes of the dance: swan, prince, wing, flight, ethereality
- Longer utterances either referring to larger video intervals, or linking literal descriptions to interpreted meaning, conjoined by seems, as if, like, a sense of, suggest, appears to be
- The stretching of the neck, like a swan
- Aerial steps which could suggest flight
- Moving faster as if something is driving him
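The linking expressions listed on this slide (seems, as if, like, a sense of, suggest, appears to be) could serve as cues for splitting an utterance into its literal and interpreted parts. The sketch below illustrates the idea under simple assumptions: the function name `split_on_cue` and the exact pattern are hypothetical, and the pattern will also fire on non-linking uses such as "like" used as a verb.

```python
import re

# Linking cues from the slide, with optional third-person -s where it applies.
CUE_RE = re.compile(
    r"\b(?:seems?|as if|like|a sense of|suggests?|appears? to be)\b",
    re.IGNORECASE,
)

def split_on_cue(utterance):
    """Split an utterance at its first linking cue.

    Returns (literal part, cue, interpreted part), or None when no cue occurs.
    """
    m = CUE_RE.search(utterance)
    if m is None:
        return None
    literal = utterance[: m.start()].strip(" ,")
    interpreted = utterance[m.end():].strip()
    return literal, m.group(0).lower(), interpreted
```

Applied to the slide's example "Moving faster as if something is driving him", the literal part is "Moving faster" and the interpreted part is "something is driving him".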
18KAB: Knowledge-rich video Annotation and Browsing
19KAB: Knowledge-rich video Annotation and Browsing
- Keyword indexing of video data using time-coded commentaries; annotations can be layered
- User can query for intervals or browse between moving images and texts, and can add own term lists and annotations
- OO design implemented in Java with JMF
- Limitations
- (i) only one kind of collateral text
- (ii) temporal association of text and video interval
- (iii) keyword-based representation of content.
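Keyword indexing over time-coded, layered annotations can be pictured as an inverted index from words to video intervals. The sketch below is written in Python for brevity and is not KAB's actual Java/JMF design. The class names, the layer labels, the timings in seconds and the third annotation are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: float  # interval start in seconds (assumed unit)
    end: float    # interval end in seconds
    text: str     # time-coded commentary fragment
    layer: str    # hypothetical layer label, e.g. "description"

class KeywordIndex:
    """Inverted index from lower-cased keywords to annotations."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, annotation):
        # Index each distinct word of the annotation text once.
        for keyword in set(annotation.text.lower().split()):
            self._index[keyword].append(annotation)

    def query(self, keyword, layer=None):
        """Return sorted (start, end) intervals, optionally for one layer."""
        hits = self._index.get(keyword.lower(), [])
        return sorted(
            (a.start, a.end) for a in hits if layer is None or a.layer == layer
        )

index = KeywordIndex()
index.add(Annotation(231.0, 235.0, "four dancers stand in a ring", "description"))
index.add(Annotation(235.0, 240.0, "a female dancer enters the ring", "description"))
index.add(Annotation(231.0, 240.0, "a sense of ritual", "interpretation"))
```

A query such as `index.query("ring")` returns the two description intervals. The sketch also shows limitation (iii) directly: only exact word matches are retrieved, so "dancer" does not find "dancers".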
22Audio Description
- Audio description enhances TV and films for visually impaired viewers and is scripted before it is recorded; in effect the story told by the moving image is retold in words
- Describers follow guidelines which restrict the language they use, i.e. normally the present tense, simple sentences and few pronominal references
- Describers are encouraged not to make inferences on behalf of their audience (i.e. little / no interpretation)
- We are interested in applying information extraction technology to generate machine-level representations of video content from audio description scripts
- TIWO (Television in Words), EPSRC GR/R67194/01
23Audio Description Script
- 11.43 Hanna passes Jan some banknotes.
- 11.55 Laughing, Jan falls back into her seat as the jeep overtakes the line of the lorries.
- 12.01 An explosion on the road ahead.
- 12.08 The jeep has hit a mine.
- 12.09 Hanna jumps from the lorry.
- 12.20 Desperately she runs towards the mangled jeep.
- 12.27 Soldiers try to stop her.
- 12.31 She struggles with the soldier who grabs hold of her firmly.
- 12.35 He lifts her bodily from the ground, holding her tightly in his arms.
- (NB. Some cue information removed)
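A script in this form can be turned into time-coded entries with a small parser. The sketch below assumes that each line starts with a `minutes.seconds` time code, which is a guess at the format since some cue information has been removed. The function names `parse_script` and `to_intervals` are hypothetical, and closing each interval at the start of the next description is a simplifying assumption.

```python
import re

# Assumed line format: "<minutes>.<seconds> <description>".
LINE_RE = re.compile(r"^(\d+)\.(\d{2})\s+(.*)$")

def parse_script(lines):
    """Return a list of (start time in seconds, description text)."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            minutes, seconds, text = m.groups()
            entries.append((int(minutes) * 60 + int(seconds), text))
    return entries

def to_intervals(entries):
    """Pair each description with an interval ending where the next begins.

    The final entry is dropped because its end time is unknown.
    """
    return [
        (start, next_start, text)
        for (start, text), (next_start, _) in zip(entries, entries[1:])
    ]

script = [
    "11.43 Hanna passes Jan some banknotes.",
    "11.55 Laughing, Jan falls back into her seat as the jeep overtakes the line of the lorries.",
    "12.01 An explosion on the road ahead.",
]
entries = parse_script(script)
intervals = to_intervals(entries)
```

The resulting intervals are the kind of text-to-video association that an indexing or browsing system could consume.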
24Corpus Analysis
- Audio Description Corpus
- 70,856 words (12 movies, various genres)
- Temporal information, which may be used to
- align fragment with interval (aspectual verbs)
- recover event-event relations (simultaneity and cause, using "as")
- recover time period(s) of film: dates / (?costumes and props?)
- Other kinds of information
- 50 most frequent verbs: 84 material processes
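The use of "as" to recover event-event relations can be illustrated with a very small pattern match. This is a sketch under strong assumptions: the function name `simultaneous_events` is hypothetical, the sentence is split at the first free-standing "as", and other uses of "as" (comparatives such as "as tall as", or "as" meaning "because") are not distinguished.

```python
import re

# Split at the first free-standing "as"; the trailing full stop is optional.
AS_RE = re.compile(r"^(?P<first>.+?)\s+as\s+(?P<second>.+?)\.?$")

def simultaneous_events(sentence):
    """Return (event clause, co-occurring event clause), or None if no "as"."""
    m = AS_RE.match(sentence.strip())
    if m is None:
        return None
    return m.group("first"), m.group("second")

pair = simultaneous_events(
    "Laughing, Jan falls back into her seat as the jeep overtakes the line of the lorries."
)
```

Here `pair` holds the two clauses on either side of "as", which a later step could record as overlapping events.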
25Exploiting Collateral Text
- Experts in a number of fields (both aesthetic and scientific) appear to use special languages to articulate the semantic content of visual information
- Text types select and organise information about still and moving images differently
- A distinction between Description and Interpretation, which is apparent in theoretical frameworks, seems to be realised in experts' texts
- The use of collateral text in digital libraries requires the integration of multimedia information
26Integrating Multimedia Information WHY?
- Functionality for Digital Libraries
- synchronised presentations
- hypermedia browsing
- multimedia corpora
- cross-modal IR
- information fusion
- multimedia summarisation
- information conversion.
27Integrating Multimedia Information HOW?
- Image/Video-text data models
- Statistical Image and Text Features
- Multimedia Thesaurus
- Intermediate Representations
- The development of such systems may benefit from a computational framework in which to instantiate the image-text link
28Issues for Instantiating the Image-Text Link
JOURNAL
TEXTBOOK
BIOGRAPHY
EXHIBITION CATALOGUE
CAPTION
29Issues for Instantiating the Image-Text Link
- Text fragments and image regions
a young mother is strolling with her little girl dressed in white with a salmon-coloured sash
at the extreme right, appears a scandalously hieratic-looking couple
30Issues for Instantiating the Image-Text Link
- Text fragments and video intervals
351 four dancers stand in a ring
355 a female dancer enters the ring
31Issues for Instantiating the Image-Text Link
- Choice of content representation scheme
- Keywords
- Propositions
- Spatio-Temporal Logics
- Causal Relationships
- Mental States, e.g. for films
- Mapping between levels of meaning, e.g. for interpretations
- Maintaining multiple viewpoints
32Issues for Instantiating the Image-Text Link
- Typed Links?
- Intuitively an image may, for example, illustrate a text, whilst a text may describe or explain an image
- May be a case for explicating the meaning of image-text links and considering precedents from the study of semantic networks (structural vs. assertional links) and from the development of hypertext systems (taxonomy of link types)
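One way to picture a typed image-text link is as a small data structure that records the link type together with its two endpoints. The sketch below is only an illustration of the idea raised on this slide: the three link types come from the slide's examples (illustrate, describe, explain), while the class names, fields and the bounding-box convention are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class LinkType(Enum):
    ILLUSTRATES = "illustrates"  # image -> text
    DESCRIBES = "describes"      # text -> image
    EXPLAINS = "explains"        # text -> image

@dataclass(frozen=True)
class TextFragment:
    document: str    # hypothetical document identifier
    start_char: int
    end_char: int

@dataclass(frozen=True)
class ImageRegion:
    image: str       # hypothetical image identifier
    # Assumed (x, y, width, height) box; None stands for the whole image.
    box: Optional[Tuple[int, int, int, int]] = None

@dataclass(frozen=True)
class TypedLink:
    link_type: LinkType
    text: TextFragment
    image: ImageRegion

    def direction(self):
        """Return (source medium, target medium) implied by the link type."""
        if self.link_type is LinkType.ILLUSTRATES:
            return ("image", "text")
        return ("text", "image")

link = TypedLink(
    LinkType.DESCRIBES,
    TextFragment("caption.txt", 0, 40),
    ImageRegion("painting.jpg"),
)
```

Keeping the type explicit means a browser or retrieval system can treat "describes" links differently from "explains" links, which an untyped association cannot support.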
33Closing Remarks
- Great potential to exploit collateral text in specialist digital libraries
- The development of multimedia information systems may benefit from a theoretical framework for instantiating the image-text link
- Potential for synergy between different disciplines concerned with this link; computational systems may help in understanding the relationship between vision, language and knowledge
34What's in a link between an image and a text?
- Dr Andrew Salway
- Department of Computing, University of Surrey
- ILASH Seminar, 3 May 2002
36Semiotic-based Frameworks for Multimedia
- Gonzalez
- Purchase
- Warner..
37The Family of Images (Mitchell 1986, Iconology: Image, Text, Ideology. Chicago Uni. Press)
Image: likeness, resemblance, similitude
Optical: mirrors, projections
Graphic: pictures, statues, designs
Perceptual: sense data, species, appearances
Mental: dreams, memories, ideas, fantasmata
Verbal: metaphors, descriptions
38Vision and Language