The Multimedia Semantic Web presentation

1
The Multimedia Semantic Web

Bill Grosky
Multimedia Information Systems Laboratory
University of Michigan-Dearborn
Dearborn, Michigan

2
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

3
CBR: Where Are We?

Development of feature-based techniques for
content-based retrieval is a mature area, at
least for images
CBR researchers should now concentrate on
extracting semantics from multimedia documents so
that retrievals using concept-based queries can
be tailored to individual users
The semantic gap
(Semi-)automated multimedia annotation

4
Multimedia Annotation

Multimedia annotations should be semantically
rich
Multiple semantics
A social theory based on how multimedia
information is used
This can be discovered by placing multimedia
information in a natural, context-rich environment

5
Context-Rich Environments

Structural context: the author's contribution
The document's author places semantically similar
pieces of information close to each other
The user can cluster together semantically similar
pieces of information
Dynamic context: the user's contribution
Short browsing sub-paths are semantically coherent

6
Context-Rich Environments

The web is a perfect example of a context-rich
environment
Develop multimedia annotations through
cross-modal techniques
Audio
Images
Text
Video

7
Semantic Web

This program overlaps another very important
current research topic, the semantic web
Web page annotations are the backbone of this
research effort
We have something very important to offer to this
area
Multimedia documents
Deriving multiple semantics for a single
document
Combining our efforts will enrich both
communities

8
Semantic Web

The Semantic Web is a new initiative to
transform the web into a structure that supports
more intelligent querying and browsing, both by
machines and by humans. This transformation is to
be supported through the generation and use of
metadata constructed via web annotation tools
using user-defined ontologies that can be related
to one another.
Somewhere on the web

9
Semantic Web
[Figure: Semantic Web architecture, with components End User,
Community Portal, Agents, Inference Engine, Ontology Construction
Tool, Ontology Articulation Toolkit, Ontologies, Web-Page Annotation
Tool, Annotated Web Pages, and Metadata Repository. Based on
www.semanticweb.org]
10
Semantic Web

"Plan a vacation within the next month," Bill
instructed his semantic web agent through his
handheld browser.
An agent retrieved Bill's vacation profile from
his travel agent, retrieved Bill's availability
from his calendar, checked availability of
airlines, hotels and restaurants, and made all
the necessary arrangements.

11
Semantic Web

Multimedia semantic web
Plan a vacation close to where [image]
is being exhibited.

12
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

13
Anglograms

Image object
Entire image
Some meaningful portion of an image
semcon
Point-based features
corner points
color histograms

14
Anglograms
Point feature map for shape
15
Anglograms
Point feature map for color
16
Anglograms
Voronoi diagram of n = 18 sites
17
Anglograms
18
Anglograms

Delaunay triangulation of a set of n points
O(n log n) algorithm
Invariance of Delaunay triangles of a set of
points to
translation
rotation
scaling

19
Anglograms

Spatial layout of point set
Anglogram
Computed by discretizing and counting the angles
of the Delaunay triangles
Which angles are counted?
O(max(n, number of bins)) algorithm
What is bin size?
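
The counting step on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes the Delaunay triangles are already available as vertex triples (for example from a library such as scipy.spatial.Delaunay), and the names triangle_angles, anglogram, bin_size and keep are hypothetical. The defaults (two largest angles, 5-degree bins) follow the settings reported later in the talk.

```python
import math

def triangle_angles(p, q, r):
    """Return the three interior angles (in degrees) of triangle pqr."""
    def angle_at(a, b, c):
        # Angle at vertex a between the rays a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        # Clamp to guard against rounding just outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return [angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q)]

def anglogram(triangles, bin_size=5, keep=2):
    """Histogram of the `keep` largest angles of each triangle.

    triangles: iterable of ((x, y), (x, y), (x, y)) vertex triples.
    bin_size:  bin width in degrees; 180 / bin_size bins are produced.
    """
    n_bins = int(180 / bin_size)
    hist = [0] * n_bins
    for p, q, r in triangles:
        largest = sorted(triangle_angles(p, q, r), reverse=True)[:keep]
        for a in largest:
            # Clamp the index so an angle at the upper edge stays in range.
            hist[min(int(a // bin_size), n_bins - 1)] += 1
    return hist

# One triangle contributes exactly `keep` counts to the histogram.
example = anglogram([((0, 0), (4, 0), (1, 2))])
```

Because only angles are counted, the histogram does not change when the point set is translated, rotated, or uniformly scaled, which is the invariance property noted for Delaunay triangles.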

20
(No Transcript)
21
Anglograms

Computation of color anglogram of an image
Divide the image evenly into M × N
non-overlapping blocks
Each individual block is abstracted as a unique
feature point labeled with its spatial location
and dominant colors

22
Anglograms

Computation of color anglogram of an image
Point feature map
Normalized feature points, after adjusting any
two neighboring feature points to a fixed
distance
Construct Delaunay triangulation for each set of
feature points labeled with identical color

23
Anglograms

Computation of color anglogram of an image
Compute anglogram based on each Delaunay
triangulation
Color anglogram for image
Concatenating all the anglograms together
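
The block abstraction and the final concatenation might look like the sketch below. It is only an approximation of the steps described: the mean of a block is used as a stand-in for its dominant value, block centers are placed on a unit grid so that neighboring feature points sit at a fixed distance, and the per-label anglograms are assumed to be computed elsewhere. The function names and the levels, max_value and n_bins parameters are hypothetical.

```python
def block_feature_points(channel, m, n, levels=10, max_value=1.0):
    """Map a 2-D channel (list of rows) to {quantized value: [block centers]}.

    The channel is split into m x n equal blocks. Each block becomes one
    feature point at its center on a unit grid, labeled with the
    quantized mean of the block.
    """
    height, width = len(channel), len(channel[0])
    bh, bw = height // m, width // n
    points = {}
    for bi in range(m):
        for bj in range(n):
            rows = channel[bi * bh:(bi + 1) * bh]
            vals = [v for row in rows for v in row[bj * bw:(bj + 1) * bw]]
            mean = sum(vals) / len(vals)
            label = min(int(mean / max_value * levels), levels - 1)
            points.setdefault(label, []).append((bj + 0.5, bi + 0.5))
    return points

def concatenate(histograms_by_label, levels=10, n_bins=36):
    """Concatenate per-label anglograms into one feature vector.

    Labels with no triangulation contribute an all-zero histogram.
    """
    vector = []
    for label in range(levels):
        vector.extend(histograms_by_label.get(label, [0] * n_bins))
    return vector

# Tiny 4 x 4 channel: left half dark, right half bright.
channel = [[0.05, 0.05, 0.95, 0.95]] * 4
points = block_feature_points(channel, 2, 2)
```

Each list in points is the point set that would be triangulated for one color label.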

24
Anglograms
Pyramid image
25
Anglograms
26
Anglograms
Hue component
27
Anglograms
Saturation component
28
Anglograms
Point feature map
29
Anglograms
Feature points of hue 2
30
Anglograms
Delaunay triangulation of hue 2
31
Anglograms
Delaunay triangulation of saturation 5
32
Anglograms
Anglogram of saturation 5
33
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

34
Finding Latent Semantics

We want to transform low-level features to a
higher level of meaning
Used for dimension reduction in QBIC
Searching in high-dimensional spaces
More importantly, it creates clusters of
co-occurring features
So-called concepts

35
Finding Latent Semantics

Latent Semantic Analysis (LSA) was introduced to
overcome a fundamental problem in textual
information retrieval
Users want to retrieve on the basis of conceptual
content
Individual words provide unreliable evidence
about conceptual meanings
Synonymy
Many ways to refer to the same object
Polysemy
Most words have more than one distinct meaning

36
Finding Latent Semantics

Searching for documents concerning automobiles
Users tend to use the keyword automobile
A statistical analysis determines that the
keywords automobile and car tend to co-occur
LSA will retrieve documents in which the keyword
car appears, even if the keyword automobile does
not

37
Finding Latent Semantics

Term-document association
It is assumed that there exists some underlying
latent semantic structure in the data that is
partially obscured by the randomness of term
choice
By semantic structure we mean the correlation
structure in which individual terms appear in
documents
Semantic implies only the fact that terms in a
document may be taken as referents to the
document itself or to its topic
Statistical techniques are used to estimate this
latent semantic structure, and to get rid of
obscuring noise

38
Finding Latent Semantics

Singular-value decomposition (SVD)
Take a large matrix of term-document association
Construct a semantic space wherein terms and
documents that are closely associated are placed
near to each other
SVD allows the arrangement of the space to reflect
the major associative patterns and ignore
smaller, less important influences
As a result, terms that did not actually appear
in a document may still end up close to the
document, if that is consistent with the major
patterns of association
Position in the space serves as the semantic
indexing
Retrieval proceeds by using the terms in a query
to identify a point in the semantic space, and
documents in its neighborhood are returned as
relevant results

39
Finding Latent Semantics

Term-document matrix
d documents
t terms
Represented by a t × d term-document matrix A
Each document is represented by a column
document vector
Each term is represented by a row
term vector
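
As a small illustration of this layout, the sketch below builds a count-based term-document matrix from tokenized documents. Raw counts are an assumption here (the slides do not say how entries are weighted), and the three toy documents are invented for the example.

```python
def term_document_matrix(documents):
    """Build a t x d count matrix A from a list of tokenized documents.

    Returns (terms, A), where A[i][j] is the number of times terms[i]
    occurs in documents[j]. Rows are term vectors and columns are
    document vectors.
    """
    terms = sorted({w for doc in documents for w in doc})
    index = {w: i for i, w in enumerate(terms)}
    A = [[0] * len(documents) for _ in terms]
    for j, doc in enumerate(documents):
        for w in doc:
            A[index[w]][j] += 1
    return terms, A

# Hypothetical toy collection with d = 3 documents.
docs = [["car", "engine"], ["automobile", "engine"], ["car", "car", "road"]]
terms, A = term_document_matrix(docs)
```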

40
Finding Latent Semantics
41
Finding Latent Semantics
42
Finding Latent Semantics

SVD is a dimension reduction technique
Reduced-rank approximation to both column space
and row space
Find a rank-k approximation to matrix A with
minimal change to that matrix for a given value
of k
This decomposition exists for any matrix A

43
Finding Latent Semantics

SVD of a term-document matrix A
A = U Σ V^T
A is t × d
U is a t × r matrix with orthonormal columns,
where r is rank(A)
The columns of U are a basis for the column space
of A
The columns of U are eigenvectors of the matrix
A A^T
Σ is an r × r diagonal matrix having the singular
values σ1 ≥ σ2 ≥ … ≥ σr of A in order along its
diagonal
Σ^2 holds the nonzero eigenvalues of A A^T and of
A^T A
V^T is an r × d matrix with orthonormal rows
The rows of V^T are a basis for the row space of
A
The columns of V are eigenvectors of the matrix
A^T A

44
Finding Latent Semantics
[Figure: matrix dimensions in A = U Σ V^T: A is t × d, U is t × r,
Σ is r × r, V^T is r × d]
45
Finding Latent Semantics

A special rank-k approximation, A_k
A_k = U_k Σ_k V_k^T
U_k
First k columns of U
Σ_k
First k diagonal values of Σ
V_k^T
First k rows of V^T
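
The truncation can be tried out with NumPy, as in the sketch below. It assumes NumPy is installed. The toy matrix, the function names, and the query fold-in (projecting the query with U_k and dividing by the singular values, then ranking by cosine similarity against the rows of V_k) are one common LSA convention chosen for illustration, not necessarily the one used in the talk.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return (U_k, s_k, Vt_k, A_k) for the rank-k truncation of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k
    return U_k, s_k, Vt_k, A_k

def query_scores(q, U_k, s_k, Vt_k):
    """Cosine similarity between a query term vector and every document
    in the k-dimensional space."""
    q_hat = (q @ U_k) / s_k          # fold the query into the space
    docs = Vt_k.T                    # one row per document
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return docs @ q_hat / norms

# Hypothetical 4-term, 3-document collection.
A = np.array([[1, 0, 0],    # automobile
              [1, 1, 0],    # car
              [1, 1, 0],    # engine
              [0, 0, 1]],   # flower
             dtype=float)
U_k, s_k, Vt_k, A_k = rank_k_approximation(A, k=2)
q = np.array([1.0, 0.0, 0.0, 0.0])   # query containing only "automobile"
scores = query_scores(q, U_k, s_k, Vt_k)
```

In this toy collection the second document contains car but not automobile, yet its score for the automobile query is close to 1, while the unrelated third document scores close to 0. This is the co-occurrence effect described earlier for the automobile and car example.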

46
Finding Latent Semantics
47
Finding Latent Semantics

Reduce the rank to 3

48
Finding Latent Semantics
Query
Score
49
Finding Latent Semantics
Query
Score
50
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

51
Using Text for Improved Image Search

10 sets of 5 similar images

52
Using Text for Improved Image Search

Color anglogram
Each image is divided into 64 non-overlapping
blocks
Extract average hue and average saturation values
of each block
Hue and saturation each quantized into 10 values
Generate Delaunay triangles for each hue value
and each saturation value
Count the two largest angles of each triangle and
quantize them into 36 bins of 5° each
Feature vector has 720 elements (20 triangulations
× 36 bins)

53
Using Text for Improved Image Search

Annotations
Extra 15 elements
Category positions
sky, sun, land, water, boat, grass, horse, rhino,
bird, human, pyramid, column, tower, sphinx,
snow
Each image annotated with appropriate keywords
and the area coverage of each of these keywords
e.g., sky (0.55), sun (0.15), water (0.30)
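
The 15 extra elements could be produced as in the sketch below, which places each keyword's area coverage at its category position. The category order is taken from the slide; the function name and the error handling for unknown keywords are assumptions, and how the annotation values are scaled relative to the 720 anglogram elements is not stated in the slides.

```python
CATEGORIES = ["sky", "sun", "land", "water", "boat", "grass", "horse",
              "rhino", "bird", "human", "pyramid", "column", "tower",
              "sphinx", "snow"]

def annotation_vector(coverage):
    """Return the 15-element annotation part of a feature vector.

    coverage: dict mapping a category keyword to the fraction of the
              image area it covers, e.g. {"sky": 0.55}.
    """
    unknown = set(coverage) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [coverage.get(c, 0.0) for c in CATEGORIES]

# The example annotation from the slide.
v = annotation_vector({"sky": 0.55, "sun": 0.15, "water": 0.30})
```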

54
Using Text for Improved Image Search
55
Using Text for Improved Image Search
56
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

57
Using Images for Improved Text Search

Using documents collected from news Web sites
News headlines are often used as URL anchors and
document titles
Topic can be represented easily and clearly by a
group of keywords in the headline
News web sites often have extensive coverage of
the same topic during certain period of time
News documents often include multimedia
components which are closely related to the topic

58
Using Images for Improved Text Search

Discover the semantic correlation between
keywords and image in the same document
A collection of 20 documents from cnn.com
4 semantic categories of 5 documents each
43 keywords
Select 1 image from each document
Color anglogram

59
Using Images for Improved Text Search
60
Using Images for Improved Text Search
61
Using Images for Improved Text Search

Integrated feature vector F = [f1, f2, …, f143]^T
Textual feature vector K = [k1, k2, …, k43]^T
Image feature vector I = [i1, i2, …, i100]^T
Feature-document matrix A = [F1, F2, …, F20]
A = U S V^T
U is 143 × 143, S is 143 × 20, and V is 20 × 20
k = 12
A_k = U_k S_k V_k^T
U_k is 143 × 12, S_k is 12 × 12, and V_k is 20 ×
12

62
Using Images for Improved Text Search

Each image is normalized to 192 × 128, and then
divided into 64 non-overlapping blocks
Extract average hue and saturation values of each
block
Hue and saturation each quantized into 10 values
Generate Delaunay triangles for each hue value
and each saturation value

63
Using Images for Improved Text Search

Count the two largest angles of each triangle and
quantize them into 36 bins of 5° each
Image feature vector has 720 elements
Feature-document matrix A is 763 × 20 (43 keywords
plus 720 image elements)
SVD
k = 12

64
Using Images for Improved Text Search
[Figure: retrieval results compared for keywords only; keywords
using LSA; image (global color histogram) and annotated keywords
using LSA; image (anglogram) and annotated keywords using LSA.
Improvement labels in the figure: 1, 3, and 21]
65
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

66
Web Page Structure

Genre detection
We do the following
Display web page in the program
Get tag hierarchy with area co-ordinates
Normalize the web page to size 512 × 512
Divide the page into 16 × 16 blocks
Calculate area covered by each tag in each block
considering the level of the tag in tag
hierarchy
For each feature tag get the center coordinates
of the blocks where it is covering maximum area
as compared with other tags on the same level
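
A rough sketch of the grid step follows. It assumes each tag's rendered area is available as an axis-aligned bounding box with a hierarchy level; the tuple layout and the function names are hypothetical. Tags are compared only against other tags on the same level, and when a tag has several boxes on one level their areas are summed per block.

```python
PAGE, GRID = 512, 16
CELL = PAGE // GRID   # each block is 32 x 32 pixels after normalization

def overlap(a0, a1, b0, b1):
    """Length of the overlap of the intervals [a0, a1] and [b0, b1]."""
    return max(0, min(a1, b1) - max(a0, b0))

def tag_feature_points(boxes, page_w, page_h):
    """boxes: list of (tag, level, x0, y0, x1, y1) in page coordinates.

    Returns {tag: [block centers]} for the blocks in which the tag
    covers more area than any other tag on the same hierarchy level.
    """
    sx, sy = PAGE / page_w, PAGE / page_h     # normalize to 512 x 512
    area = {}                                 # (level, row, col) -> {tag: area}
    for tag, level, x0, y0, x1, y1 in boxes:
        x0, x1, y0, y1 = x0 * sx, x1 * sx, y0 * sy, y1 * sy
        for row in range(GRID):
            for col in range(GRID):
                a = (overlap(x0, x1, col * CELL, (col + 1) * CELL) *
                     overlap(y0, y1, row * CELL, (row + 1) * CELL))
                if a > 0:
                    cell = area.setdefault((level, row, col), {})
                    cell[tag] = cell.get(tag, 0) + a
    points = {}
    for (level, row, col), by_tag in area.items():
        winner = max(by_tag, key=by_tag.get)
        points.setdefault(winner, []).append(
            ((col + 0.5) * CELL, (row + 0.5) * CELL))
    return points

# Hypothetical page: a table on the top half, an image on the bottom half.
layout = [("table", 1, 0, 0, 1024, 512), ("img", 1, 0, 512, 1024, 1024)]
points = tag_feature_points(layout, page_w=1024, page_h=1024)
```

The resulting block centers for one tag form a point set, which can then be triangulated and summarized as an anglogram in the same way as the image feature points.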

67
Web Page Structure
68
Web Page Structure
69
Web Page Structure

Histogram
36 bins, using the two largest angles
Tags independent of level
Try approach where tag on lower level overrides
upper-level tag

70
Web Page Structure

Sets of tags defined
Initially, a large set of feature tags (52) is
defined to ensure a powerful set of independent
features for the discrimination of web pages
A second set of tags (3) is defined based on
histograms created for initial set of tags so
that these tags will better differentiate web
pages

71
Web Page Structure

Experiment 1
Categories defined are
Detroit News
Times of India
Tribune India
Esakal
Amazon.com
Buy.com

72
Web Page Structure

Cluster category based on closest page

73
Web Page Structure

Experiment 2
Categories defined are
Newspaper environment
Detroit News
Times of India
Tribune India
Esakal
E-commerce environment
Amazon.com
Buy.com

74
Web Page Structure
75
Contents

Introduction
CBR: Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work
Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document
semantics

76
A Cross-Modal Theory of Linked Document Semantics

Environment
Suppose one has a linked set of multimedia
documents
Web
Content-based hypermedia
This provides a rich context for individual
chunks of information
The structure of individual multimedia documents
The link structure

77
A Cross-Modal Theory of Linked Document Semantics

Goal
Derive document semantics based on user browsing
behavior
The same document has multiple semantics
Different people see different meanings in the
same document
Over short browsing paths, an individual user's
wants and needs are uniform
The pages visited over these short paths exhibit
semantics in congruence with these wants and
needs

78
A Cross-Modal Theory of Linked Document Semantics

Questions
How can the semantics of a web page be derived
given a set of user browsing paths that end at
that page?
How can we characterize the semantics of a user
browsing path?
How can web page semantics help us in navigating
the web more efficiently?
How can our approach actually be implemented in
the real web world?

79
A Cross-Modal Theory of Linked Document Semantics

Our approach
We use actual browsing paths to find the latent
semantics of web pages
Textual features
Image features
Structural features
We hope to find general concepts comprising
various textual and image features which
frequently co-occur

80
A Cross-Modal Theory of Linked Document Semantics

We believe that a user's browsing path exhibits
semantic coherence
While the user's entire path exhibits multiple
semantics, especially pages far from each other
on the path, neighboring pages, especially the
portions close to the links taken, are
semantically close to each other

81
A Cross-Modal Theory of Linked Document Semantics

We would like to characterize the contiguous
sub-paths of a user's browsing path that exhibit
similar semantics and detect the semantic break
points along the path where the semantics
appreciably change
Collect these sub-paths into a multiset
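
One simple way to picture break-point detection is sketched below. The slides do not say how a semantic change is measured, so this sketch assumes each page already has a feature vector and declares a break wherever the cosine similarity between consecutive pages drops below a threshold. The threshold value, the page identifiers, and the function names are all hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def coherent_subpaths(path, vectors, threshold=0.5):
    """Split a browsing path at semantic break points.

    path:      list of page identifiers in visiting order.
    vectors:   dict mapping a page identifier to its feature vector.
    threshold: assumed cut-off; a break is declared between two
               consecutive pages whose similarity falls below it.
    """
    if not path:
        return []
    subpaths, current = [], [path[0]]
    for prev, page in zip(path, path[1:]):
        if cosine(vectors[prev], vectors[page]) < threshold:
            subpaths.append(current)
            current = [page]
        else:
            current.append(page)
    subpaths.append(current)
    return subpaths

# Hypothetical four-page path with a topic change between b and c.
vectors = {"a": [1, 0], "b": [1, 0.1], "c": [0, 1], "d": [0.1, 1]}
subpaths = coherent_subpaths(["a", "b", "c", "d"], vectors)

# A Counter over tuples serves as the multiset of sub-paths.
multiset = Counter(tuple(s) for s in subpaths)
```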

82
A Cross-Modal Theory of Linked Document Semantics

We categorize the semantics of each web page
based on a history of the semantically-coherent
browsing paths of all users which end at that
page
A browsing path will be represented by a
high-dimensional vector
The various positions of the vector correspond to
the presence of
textual keywords
image features (visual keywords)
structural features (structural keywords)

83
A Cross-Modal Theory of Linked Document Semantics

From the complete set of web pages under
consideration, we extract a set of textual,
visual, and structural keywords
For each multiset, M, of sub-paths that we are to
analyze, we form three matrices
term-path matrix
image-path matrix
structure-path matrix

84
A Cross-Modal Theory of Linked Document Semantics

The (i,j)th element of each of these matrices is
the strength of the presence of the ith keyword
along the jth browsing path
This strength is determined by
How many times this term occurs on the pages
along the path
How much time the user spends examining these
pages
How close each occurrence of the ith keyword is
to both the outgoing and incoming anchor
positions
How many times this browsing path occurs in M

85
A Cross-Modal Theory of Linked Document Semantics

These matrices may be concatenated together in
various ways to produce an overall keyword-path
matrix
Perform latent-semantic analysis to get concepts
A page is then represented by a set of concept
classes
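
Since the slides leave open how the matrices are concatenated, the sketch below shows only the simplest option: stacking the three keyword-path matrices row-wise so that every column still describes one browsing path. The per-modality weights and the function name are hypothetical additions for illustration.

```python
def stack_keyword_path(term_path, image_path, structure_path,
                       weights=(1.0, 1.0, 1.0)):
    """Stack three keyword-path matrices (lists of rows) vertically.

    All three must have one column per browsing path. The weights are
    an assumed way to balance the modalities before latent semantic
    analysis.
    """
    blocks = (term_path, image_path, structure_path)
    n_paths = {len(row) for block in blocks for row in block}
    if len(n_paths) != 1:
        raise ValueError("all matrices need the same number of path columns")
    return [[w * x for x in row]
            for w, block in zip(weights, blocks) for row in block]

# One textual, two visual, and one structural keyword over two paths.
combined = stack_keyword_path([[1, 2]], [[3, 4], [5, 6]], [[7, 8]])
```

The stacked matrix has one row per textual, visual, or structural keyword, and can be passed to an SVD routine to obtain the concepts.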

86
Conclusions

Researchers in CBR should now be concentrating on
extracting semantics from multimedia documents
The web is a perfect testbed for studying
(semi-)automated techniques for multimedia
annotation due to its contextual richness
CBR + Semantic Web = The Multimedia Semantic Web
Get Involved!!!
