The Multimedia Semantic Web - PowerPoint PPT Presentation

1
The Multimedia Semantic Web
  • Bill Grosky
  • Multimedia Information Systems Laboratory
  • University of Michigan-Dearborn
  • Dearborn, Michigan

2
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

3
CBR: Where Are We?
  • Development of feature-based techniques for
    content-based retrieval is a mature area, at
    least for images
  • CBR researchers should now concentrate on
    extracting semantics from multimedia documents so
    that retrievals using concept-based queries can
    be tailored to individual users
  • The semantic gap
  • (Semi-)automated multimedia annotation

4
Multimedia Annotation
  • Multimedia annotations should be semantically
    rich
  • Multiple semantics
  • A social theory based on how multimedia
    information is used
  • This can be discovered by placing multimedia
    information in a natural, context-rich environment

5
Context-Rich Environments
  • Structural context: the author's contribution
  • The document's author places semantically similar
    pieces of information close to each other
  • The user can cluster together semantically similar
    pieces of information
  • Dynamic context: the user's contribution
  • Short browsing sub-paths are semantically coherent

6
Context-Rich Environments
  • The Web is a perfect example of a context-rich
    environment
  • Develop multimedia annotations through
    cross-modal techniques
  • Audio
  • Images
  • Text
  • Video

7
Semantic Web
  • This program overlaps another very important
    current research topic, the semantic web
  • Web page annotations are the backbone of this
    research effort
  • We have something very important to offer to this
    area
  • Multimedia documents
  • Deriving multiple semantics for a single
    document
  • Combining our efforts will enrich both
    communities

8
Semantic Web
  • The Semantic Web is a new initiative to
    transform the web into a structure that supports
    more intelligent querying and browsing, both by
    machines and by humans. This transformation is to
    be supported through the generation and use of
    metadata constructed via web annotation tools
    using user-defined ontologies that can be related
    to one another.
  • Somewhere on the web

9
Semantic Web
[Diagram: Semantic Web architecture. Components: End User, Community Portal, Agents, Inference Engine, Ontology Articulation Toolkit, Ontology Construction Tool, Ontologies, Web-Page Annotation Tool, Annotated Web Pages, Metadata Repository. Based on www.semanticweb.org]
10
Semantic Web
  • Plan a vacation within the next month
  • Bill instructed his semantic web agent through
    his handheld browser.
  • An agent retrieved Bill's vacation profile from
    his travel agent, retrieved Bill's availability
    from his calendar, checked the availability of
    airlines, hotels, and restaurants, and made all
    the necessary arrangements.

11
Semantic Web
  • Multimedia semantic web
  • Plan a vacation close to where [image]
    is being exhibited.

12
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

13
Anglograms
  • Image object
  • Entire image
  • Some meaningful portion of an image
  • semcon
  • Point-based features
  • corner points
  • color histograms

14
Anglograms
Point feature map for shape
15
Anglograms
Point feature map for color
16
Anglograms
Voronoi diagram of n = 18 sites
17
Anglograms
18
Anglograms
  • Delaunay triangulation of a set of n points
  • O(n log n) algorithm
  • Invariance of Delaunay triangles of a set of
    points to
  • translation
  • rotation
  • scaling
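
The invariance claimed above can be checked numerically. The following is a minimal sketch, not taken from the slides: it applies a scaling, rotation, and translation to one triangle and compares the interior angles before and after. The helper names `triangle_angles` and `similarity_transform` are hypothetical and introduced only for this illustration.

```python
import math

def triangle_angles(p, q, r):
    """Return the three interior angles (degrees) of triangle pqr, sorted."""
    def angle_at(a, b, c):
        # Angle at vertex a between the rays a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return math.degrees(math.atan2(abs(cross), dot))
    return sorted([angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q)])

def similarity_transform(pt, scale, theta_deg, shift):
    """Scale a 2-D point, rotate it by theta_deg, then translate it."""
    t = math.radians(theta_deg)
    x, y = pt[0] * scale, pt[1] * scale
    return (x * math.cos(t) - y * math.sin(t) + shift[0],
            x * math.sin(t) + y * math.cos(t) + shift[1])

triangle = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
moved = [similarity_transform(p, 2.5, 37.0, (10.0, -3.0)) for p in triangle]

before = triangle_angles(*triangle)
after = triangle_angles(*moved)
print(before)
print(after)
```

Because the angles do not change, any histogram built from them is unaffected by these three transformations.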

19
Anglograms
  • Spatial layout of point set
  • Anglogram
  • Computed by discretizing and counting the angles
    of the Delaunay triangles
  • Which angles are counted?
  • O(max(n, #bins)) algorithm
  • What is bin size?
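
The counting step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the triangle list is assumed to come from a Delaunay triangulation computed elsewhere (for example the `simplices` attribute of `scipy.spatial.Delaunay`), and the defaults of counting the two largest angles in 5-degree bins are borrowed from the experiments described later in the deck. The function names are hypothetical.

```python
import math

def interior_angles(p, q, r):
    """Interior angles (degrees) of triangle pqr via the law of cosines."""
    a = math.dist(q, r)  # side opposite p
    b = math.dist(p, r)  # side opposite q
    c = math.dist(p, q)  # side opposite r
    def angle(opposite, s1, s2):
        cos_val = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        # Clamp to [-1, 1] to guard against rounding error.
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_val))))
    return [angle(a, b, c), angle(b, a, c), angle(c, a, b)]

def anglogram(points, triangles, bin_size=5.0, keep=2):
    """Histogram of the `keep` largest angles of every triangle."""
    n_bins = int(math.ceil(180.0 / bin_size))
    hist = [0] * n_bins
    for i, j, k in triangles:
        angles = sorted(interior_angles(points[i], points[j], points[k]),
                        reverse=True)
        for ang in angles[:keep]:
            hist[min(int(ang // bin_size), n_bins - 1)] += 1
    return hist

# One triangle, given as indices into the point list.
points = [(0.0, 0.0), (5.0, 0.0), (1.0, 3.0)]
triangles = [(0, 1, 2)]
hist = anglogram(points, triangles)
```

Each triangle contributes a constant number of counts, so the loop is linear in the number of triangles, plus the cost of allocating the bins.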

21
Anglograms
  • Computation of color anglogram of an image
  • Divide the image evenly into M × N
    non-overlapping blocks
  • Each individual block is abstracted as a unique
    feature point labeled with its spatial location
    and dominant colors
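
A minimal sketch of the block abstraction, assuming the image has already been reduced to a 2-D grid of quantized color labels and that each block is summarized by a single most frequent label (the slide says "dominant colors", so a real implementation may keep more than one). The function name `block_feature_points` is hypothetical.

```python
from collections import Counter

def block_feature_points(labels, m, n):
    """Split a 2-D grid of color labels into m x n blocks and return one
    (row_center, col_center, dominant_label) tuple per block."""
    height, width = len(labels), len(labels[0])
    bh, bw = height // m, width // n  # assumes the dimensions divide evenly
    points = []
    for bi in range(m):
        for bj in range(n):
            counts = Counter(
                labels[r][c]
                for r in range(bi * bh, (bi + 1) * bh)
                for c in range(bj * bw, (bj + 1) * bw)
            )
            dominant = counts.most_common(1)[0][0]
            points.append((bi * bh + bh / 2, bj * bw + bw / 2, dominant))
    return points

grid = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [3, 3, 2, 2],
    [3, 3, 2, 0],
]
points = block_feature_points(grid, 2, 2)
```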

22
Anglograms
  • Computation of color anglogram of an image
  • Point feature map
  • Normalized feature points, after adjusting any
    two neighboring feature points to a fixed
    distance
  • Construct Delaunay triangulation for each set of
    feature points labeled with identical color

23
Anglograms
  • Computation of color anglogram of an image
  • Compute anglogram based on each Delaunay
    triangulation
  • Color anglogram for image
  • Concatenating all the anglograms together

24
Anglograms
Pyramid image
25
Anglograms
26
Anglograms
Hue component
27
Anglograms
Saturation component
28
Anglograms
Point feature map
29
Anglograms
Feature points of hue 2
30
Anglograms
Delaunay triangulation of hue 2
31
Anglograms
Delaunay triangulation of saturation 5
32
Anglograms
Anglogram of saturation 5
33
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

34
Finding Latent Semantics
  • We want to transform low-level features to a
    higher level of meaning
  • Used for dimension reduction in QBIC
  • Searching in high-dimensional spaces
  • More importantly, it creates clusters of
    co-occurring features
  • So-called concepts

35
Finding Latent Semantics
  • Latent Semantic Analysis (LSA) was introduced to
    overcome a fundamental problem in textual
    information retrieval
  • Users want to retrieve on the basis of conceptual
    content
  • Individual words provide unreliable evidence
    about conceptual meanings
  • Synonymy
  • Many ways to refer to the same object
  • Polysemy
  • Most words have more than one distinct meaning

36
Finding Latent Semantics
  • Searching for documents concerning automobiles
  • Tend to use the key-word automobile
  • A statistical analysis determines that the
    key-words automobile and car tend to co-occur
  • LSA will also retrieve documents in which the key-word
    car appears, even if the key-word automobile does not

37
Finding Latent Semantics
  • Term-document association
  • It is assumed that there exists some underlying
    latent semantic structure in the data that is
    partially obscured by the randomness of term
    choice
  • By semantic structure we mean the correlation
    structure in which individual terms appear in
    documents
  • Semantic implies only the fact that terms in a
    document may be taken as referents to the
    document itself or to its topic
  • Statistical techniques are used to estimate this
    latent semantic structure, and to get rid of
    obscuring noise

38
Finding Latent Semantics
  • Singular-value decomposition (SVD)
  • Take a large matrix of term-document association
  • Construct a semantic space wherein terms and
    documents that are closely associated are placed
    near to each other
  • SVD allows the arrangement of the space to reflect
    the major associative patterns and ignore
    smaller, less important influences
  • As a result, terms that did not actually appear
    in a document may still end up close to the
    document, if that is consistent with the major
    patterns of association
  • Position in the space serves as the semantic
    indexing
  • Retrieval proceeds by using the terms in a query
    to identify a point in the semantic space, and
    documents in its neighborhood are returned as
    relevant results
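
The retrieval procedure above can be sketched with NumPy. This is an illustrative toy, not data from the talk: the five documents, the choice k = 2, and the common folding-in convention (query coordinates computed as q^T U_k Σ_k^{-1}, compared by cosine with the rows of V_k) are all assumptions made for this example.

```python
import numpy as np

terms = ["automobile", "car", "engine", "flower", "petal"]
docs = [
    "automobile engine",
    "car engine",
    "automobile car",
    "flower petal",
    "flower petal",
]
# Term-document matrix of raw counts: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

def lsa_scores(A, query_vec, k):
    """Cosine similarity between a query and every document in the rank-k space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T
    q_hat = (query_vec @ Uk) / sk  # fold the query into the k-dimensional space
    norms = np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_hat)
    return (Vk @ q_hat) / np.where(norms == 0, 1.0, norms)

query = np.array([1.0 if t == "automobile" else 0.0 for t in terms])
scores = lsa_scores(A, query, k=2)
print(np.round(scores, 3))
```

In this toy collection the second document ("car engine") scores as highly as the documents that contain the query term, while the flower documents score near zero, which is the behavior the automobile/car example describes.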

39
Finding Latent Semantics
  • Term-document matrix
  • d documents
  • t terms
  • Represented by a t × d term-document matrix A
  • Each document is represented by a column
  • document vector
  • Each term is represented by a row
  • term vector
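
As a small illustration of this layout, the sketch below builds a term-document matrix from three made-up documents. Using raw term counts as the entries is an assumption; the slides do not say which weighting is used. The function name `term_document_matrix` is hypothetical.

```python
def term_document_matrix(docs):
    """Return (terms, A) where A[i][j] counts occurrences of term i in document j."""
    terms = sorted({word for doc in docs for word in doc.lower().split()})
    row = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(docs) for _ in terms]
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[row[word]][j] += 1
    return terms, matrix

docs = ["car engine repair", "automobile engine", "car automobile dealer"]
terms, A = term_document_matrix(docs)

# Column j is the document vector of docs[j]; row i is the term vector of terms[i].
document_vector = [A_row[1] for A_row in A]
term_vector = A[terms.index("car")]
```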

40
Finding Latent Semantics
41
Finding Latent Semantics
42
Finding Latent Semantics
  • SVD is a dimension reduction technique
  • Reduced-rank approximation to both column space
    and row space
  • Find a rank-k approximation to matrix A with
    minimal change to that matrix for a given value
    of k
  • This decomposition exists for any matrix A

43
Finding Latent Semantics
  • SVD of a term-document matrix A
  • A = U Σ V^T
  • A is t × d
  • U is a t × r matrix with orthonormal columns, where r is
    rank(A)
  • The columns of U are a basis for the column space
    of A
  • The columns of U are eigenvectors of the matrix
    A A^T
  • Σ is an r × r diagonal matrix having the singular
    values σ_1 ≥ σ_2 ≥ … ≥ σ_r of A in order along its
    diagonal
  • Σ^2 is the matrix of nonzero eigenvalues of A A^T or A^T A
  • V^T is an r × d matrix with orthonormal rows
  • The rows of V^T are a basis for the row space of
    A
  • The columns of V are eigenvectors of the matrix
    A^T A
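
These properties can be checked numerically on a small matrix. The sketch below uses NumPy's `numpy.linalg.svd` on an arbitrary 5 × 3 example matrix (an assumption made for illustration, not data from the talk) and truncates the factors to r = rank(A).

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 0, 1],
              [1, 1, 0],
              [1, 0, 0]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.linalg.matrix_rank(A)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]  # keep only the r nonzero singular values

reconstructed = U @ np.diag(s) @ Vt
# Nonzero eigenvalues of A^T A, sorted in decreasing order.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1][:r]

print(np.allclose(reconstructed, A))      # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(r)))    # orthonormal columns of U
print(np.allclose(Vt @ Vt.T, np.eye(r)))  # orthonormal rows of V^T
print(np.allclose(s ** 2, eigvals))       # Sigma^2 holds the eigenvalues
```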

44
Finding Latent Semantics
[Figure: dimensions of the factors in A = U Σ V^T. A is t × d, U is t × r, Σ is r × r, V^T is r × d]
45
Finding Latent Semantics
  • A special rank-k approximation, Ak
  • A_k = U_k Σ_k V_k^T
  • U_k
  • First k columns of U
  • Σ_k
  • First k diagonal values of Σ
  • V_k^T
  • First k rows of V^T
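
A minimal NumPy sketch of this truncation follows. The 6 × 4 matrix is random placeholder data, and the function name `rank_k_approximation` is hypothetical. The last line compares the Frobenius-norm error with the discarded singular values; by the Eckart-Young theorem the two are equal, which is the sense in which A_k changes A minimally.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return (A_k, s): the truncated reconstruction and all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]  # first k columns / values / rows
    return Uk @ np.diag(sk) @ Vtk, s

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # placeholder data, not from the talk
k = 2
Ak, s = rank_k_approximation(A, k)

error = np.linalg.norm(A - Ak)           # Frobenius norm of the change
expected = np.sqrt(np.sum(s[k:] ** 2))   # energy in the discarded singular values
```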

46
Finding Latent Semantics
47
Finding Latent Semantics
  • Reduce the rank to 3

48
Finding Latent Semantics
Query
Score
49
Finding Latent Semantics
Query
Score
50
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

51
Using Text for Improved Image Search
  • 10 sets of 5 similar images

52
Using Text for Improved Image Search
  • Color anglogram
  • Each image is divided into 64 non-overlapping
    blocks
  • Extract average hue and average saturation values
    of each block
  • Hue and saturation each quantized into 10 values
  • Generate Delaunay triangles for each hue value
    and each saturation value
  • Count the two largest angles and quantize them into
    36 bins, each of 5°
  • Feature vector has 720 elements
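
The figure of 720 follows from the numbers above: 10 hue values plus 10 saturation values give 20 anglograms of 36 bins each. The sketch below makes the layout explicit. The ordering of the concatenation (all hue anglograms first, then all saturation anglograms) is an assumption made for illustration, and `feature_index` is a hypothetical helper.

```python
N_LEVELS = 10   # quantization levels per channel (from the slide)
N_BINS = 36     # angle bins of 5 degrees each (from the slide)
CHANNELS = ("hue", "saturation")

def feature_index(channel, level, angle_deg):
    """Position of one angle count inside the concatenated feature vector.
    Assumed ordering: hue anglograms first, then saturation anglograms."""
    bin_idx = min(int(angle_deg // 5), N_BINS - 1)
    return (CHANNELS.index(channel) * N_LEVELS + level) * N_BINS + bin_idx

vector_length = len(CHANNELS) * N_LEVELS * N_BINS
print(vector_length)
```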

53
Using Text for Improved Image Search
  • Annotations
  • Extra 15 elements
  • Category positions
  • sky, sun, land, water, boat, grass, horse, rhino,
    bird, human, pyramid, column, tower, sphinx,
    snow
  • Each image annotated with appropriate keywords
    and the area coverage of each of these keywords
  • e.g., sky (0.55), sun (0.15), water (0.30)
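
One way to picture the 15 extra elements is as a fixed-order vector of area coverages. The sketch below is hypothetical: it assumes the category order listed above and assumes the annotation elements are appended after the 720 anglogram elements.

```python
CATEGORIES = ["sky", "sun", "land", "water", "boat", "grass", "horse", "rhino",
              "bird", "human", "pyramid", "column", "tower", "sphinx", "snow"]

def annotation_vector(coverage):
    """Map {keyword: area coverage} to a 15-element vector in category order."""
    unknown = set(coverage) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    return [coverage.get(name, 0.0) for name in CATEGORIES]

def combined_vector(anglogram_features, coverage):
    """Append the annotation elements to the visual feature vector."""
    return list(anglogram_features) + annotation_vector(coverage)

example = {"sky": 0.55, "sun": 0.15, "water": 0.30}
vec = annotation_vector(example)
full = combined_vector([0] * 720, example)  # placeholder visual features
```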

54
Using Text for Improved Image Search
55
Using Text for Improved Image Search
56
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

57
Using Images for Improved Text Search
  • Using documents collected from news Web sites
  • News headlines are often used as URL anchors and
    document titles
  • Topic can be represented easily and clearly by a
    group of keywords in the headline
  • News web sites often have extensive coverage of
    the same topic during a certain period of time
  • News documents often include multimedia
    components which are closely related to the topic


58
Using Images for Improved Text Search
  • Discover the semantic correlation between
    keywords and images in the same document
  • A collection of 20 documents from cnn.com
  • 4 semantic categories of 5 documents each
  • 43 keywords
  • Select 1 image from each document
  • Color anglogram

59
Using Images for Improved Text Search
60
Using Images for Improved Text Search
61
Using Images for Improved Text Search
  • Integrated feature vector F = [f_1, f_2, …, f_143]^T
  • Textual feature vector K = [k_1, k_2, …, k_43]^T
  • Image feature vector I = [i_1, i_2, …, i_100]^T
  • Feature-document matrix A = [F_1, F_2, …, F_20]
  • A = U S V^T
  • U is 143 × 143, S is 143 × 20, and V is 20 × 20
  • k = 12
  • A_k = U_k S_k V_k^T
  • U_k is 143 × 12, S_k is 12 × 12, and V_k is 20 ×
    12
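
The matrix shapes listed above can be reproduced with NumPy's full SVD. In this sketch the feature values are random placeholders (not the experiment's data), and stacking the 43 textual elements above the 100 image elements is an assumption about the ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.random((43, 20))    # placeholder textual features, one column per document
I = rng.random((100, 20))   # placeholder image features, one column per document
A = np.vstack([K, I])       # integrated feature-document matrix, 143 x 20

U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((143, 20))
S[:20, :20] = np.diag(s)    # singular values on the diagonal of a 143 x 20 matrix

k = 12
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
Ak = Uk @ Sk @ Vk.T         # rank-12 approximation of A

print(U.shape, S.shape, Vt.T.shape)
print(Uk.shape, Sk.shape, Vk.shape)
```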

62
Using Images for Improved Text Search
  • Each image is normalized to 192 × 128, and then
    divided into 64 non-overlapping blocks
  • Extract average hue and saturation values of each
    block
  • Hue and saturation each quantized into 10 values
  • Generate Delaunay triangles for each hue value
    and each saturation value

63
Using Images for Improved Text Search
  • Count the two largest angles and quantize them into
    36 bins, each of 5°
  • Image feature vector has 720 elements
  • Feature-document matrix A is 763 × 20
  • SVD
  • k = 12

64
Using Images for Improved Text Search
[Figure: retrieval comparison across four methods: keywords only; keywords using LSA; image (global color histogram) + annotated keywords using LSA; image (anglogram) + annotated keywords using LSA. Improvement labels on the figure: 1, 3, 21]
65
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

66
Web Page Structure
  • Genre detection
  • We do the following
  • Display web page in the program
  • Get tag hierarchy with area co-ordinates
  • Normalize the web page to size 512 × 512
  • Divide the page into 16 × 16 blocks
  • Calculate area covered by each tag in each block
    considering the level of the tag in tag
    hierarchy
  • For each feature tag, get the center coordinates
    of the blocks in which it covers the maximum area
    compared with other tags on the same level
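
The block-assignment steps above can be sketched as follows, assuming each tag is given as a unique name, a hierarchy level, and a bounding box in normalized page coordinates. How the original program weighs the tag level is not specified in the slides; this sketch simply lets tags compete only against others on the same level. All names here are hypothetical.

```python
PAGE = 512
GRID = 16
BLOCK = PAGE // GRID  # 32 pixels per block side

def overlap_area(box, bi, bj):
    """Intersection area of a tag box (x0, y0, x1, y1) with block (row bi, col bj)."""
    bx0, by0 = bj * BLOCK, bi * BLOCK
    w = min(box[2], bx0 + BLOCK) - max(box[0], bx0)
    h = min(box[3], by0 + BLOCK) - max(box[1], by0)
    return max(w, 0) * max(h, 0)

def tag_feature_points(tags):
    """tags: list of (name, level, box). For each tag, return the (x, y) centers
    of the blocks where it covers more area than any other tag on its level."""
    points = {name: [] for name, _, _ in tags}
    for bi in range(GRID):
        for bj in range(GRID):
            best = {}  # level -> (area, name) of the winning tag in this block
            for name, level, box in tags:
                area = overlap_area(box, bi, bj)
                if area > 0 and area > best.get(level, (0, None))[0]:
                    best[level] = (area, name)
            for _, name in best.values():
                points[name].append((bj * BLOCK + BLOCK / 2, bi * BLOCK + BLOCK / 2))
    return points

tags = [
    ("table", 1, (0, 0, 256, 512)),
    ("img", 1, (256, 0, 512, 256)),
    ("p", 2, (0, 0, 512, 64)),
]
points = tag_feature_points(tags)
```

The resulting point sets, one per tag, are what a triangulation and angle histogram would then be computed from.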

67
Web Page Structure
68
Web Page Structure
69
Web Page Structure
  • Histogram
  • 36 bins, counting the two largest angles
  • Tags independent of level
  • Try approach where tag on lower level overrides
    upper-level tag

70
Web Page Structure
  • Sets of tags defined
  • Initially, a large set of feature tags (52) is
    defined to ensure a powerful set of independent
    features for the discrimination of web pages
  • A second set of tags (3) is defined based on
    histograms created for initial set of tags so
    that these tags will better differentiate web
    pages

71
Web Page Structure
  • Experiment 1
  • Categories defined are
  • Detroit News
  • Times of India
  • Tribune India
  • Esakal
  • Amazon.com
  • Buy.com

72
Web Page Structure
  • Cluster category based on closest page

73
Web Page Structure
  • Experiment 2
  • Categories defined are
  • Newspaper environment
  • Detroit News
  • Times of India
  • Tribune India
  • Esakal
  • E-commerce environment
  • Amazon.com
  • Buy.com

74
Web Page Structure
75
Contents
  • Introduction
  • CBR: Where are we?
  • Multimedia annotation
  • Context-rich environments
  • Semantic web
  • Our work
  • Anglograms
  • Finding latent semantics
  • Using text for improved image search
  • Using images for improved text search
  • Web page structure
  • A cross-modal theory of linked document
    semantics

76
A Cross-Modal Theory of Linked Document Semantics
  • Environment
  • Suppose one has a linked set of multimedia
    documents
  • Web
  • Content-based hypermedia
  • This provides a rich context for individual
    chunks of information
  • The structure of individual multimedia documents
  • The link structure

77
A Cross-Modal Theory of Linked Document Semantics
  • Goal
  • Derive document semantics based on user browsing
    behavior
  • The same document has multiple semantics
  • Different people see different meanings in the
    same document
  • Over short browsing paths, an individual user's
    wants and needs are uniform
  • The pages visited over these short paths exhibit
    semantics in congruence with these wants and
    needs

78
A Cross-Modal Theory of Linked Document Semantics
  • Questions
  • How can the semantics of a web page be derived
    given a set of user browsing paths that end at
    that page?
  • How can we characterize the semantics of a user
    browsing path?
  • How can web page semantics help us in navigating
    the web more efficiently?
  • How can our approach actually be implemented in
    the real web world?

79
A Cross-Modal Theory of Linked Document Semantics
  • Our approach
  • We use actual browsing paths to find the latent
    semantics of web pages
  • Textual features
  • Image features
  • Structural features
  • We hope to find general concepts comprising
    various textual and image features which
    frequently co-occur

80
A Cross-Modal Theory of Linked Document Semantics
  • We believe that a user's browsing path exhibits
    semantic coherence
  • While the user's entire path exhibits multiple
    semantics, especially pages far from each other
    on the path, neighboring pages, especially the
    portions close to the links taken, are
    semantically close to each other

81
A Cross-Modal Theory of Linked Document Semantics
  • We would like to characterize the contiguous
    sub-paths of a user's browsing path that exhibit
    similar semantics and detect the semantic break
    points along the path where the semantics
    appreciably change
  • Collect these sub-paths into a multiset
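
One possible way to detect such break points, shown purely as an illustration, is to compare the feature vectors of consecutive pages and cut the path wherever their cosine similarity drops below a threshold. The similarity measure, the threshold value, and the toy page vectors below are all assumptions; the slides do not specify how break points are found.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def coherent_subpaths(path, vectors, threshold=0.5):
    """Split a browsing path wherever consecutive pages are less similar
    than `threshold` (a hypothetical criterion for a semantic break point)."""
    if not path:
        return []
    subpaths, current = [], [path[0]]
    for prev, page in zip(path, path[1:]):
        if cosine(vectors[prev], vectors[page]) < threshold:
            subpaths.append(current)
            current = [page]
        else:
            current.append(page)
    subpaths.append(current)
    return subpaths

vectors = {"a": [1, 0, 0], "b": [1, 1, 0], "c": [0, 0, 1], "d": [0, 1, 1]}
subpaths = coherent_subpaths(["a", "b", "c", "d"], vectors)

# A Counter over tuples serves as the multiset of sub-paths.
multiset = Counter(tuple(sp) for sp in subpaths)
```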

82
A Cross-Modal Theory of Linked Document Semantics
  • We categorize the semantics of each web page
    based on a history of the semantically-coherent
    browsing paths of all users which end at that
    page
  • A browsing path will be represented by a
    high-dimensional vector
  • The various positions of the vector correspond to
    the presence of
  • textual keywords
  • image features (visual keywords)
  • structural features (structural keywords)

83
A Cross-Modal Theory of Linked Document Semantics
  • From the complete set of web pages under
    consideration, we extract a set of textual,
    visual, and structural keywords
  • For each multiset, M, of sub-paths that we are to
    analyze, we form three matrices
  • term-path matrix
  • image-path matrix
  • structure-path matrix

84
A Cross-Modal Theory of Linked Document Semantics
  • The (i,j)th element of each of these matrices is
    determined by
  • Strength of the presence of ith keyword along the
    jth browsing path
  • Determined by
  • How many times this term occurs on the pages
    along the path
  • How much time the user spends examining these
    pages
  • How close each occurrence of the ith keyword is
    to both the outgoing and incoming anchor
    positions
  • How many times this browsing path occurs in M
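
The slides name the factors but give no formula. The sketch below shows one hypothetical way to combine them, chosen only so that the value grows with the number of occurrences, the time spent, the closeness to an anchor, and the multiplicity of the path in M. The weighting is an assumption, not the authors' definition.

```python
def keyword_strength(occurrences, path_multiplicity):
    """Illustrative (i, j) matrix entry for one keyword along one browsing path.

    occurrences: list of (dwell_seconds, anchor_distance) pairs, one per
        occurrence of the keyword on the pages along the path, where
        anchor_distance is the distance to the nearest incoming or outgoing
        anchor position.
    path_multiplicity: how many times this path occurs in the multiset M.
    """
    total = 0.0
    for dwell_seconds, anchor_distance in occurrences:
        # Longer viewing time raises the weight; distance from an anchor lowers it.
        total += dwell_seconds / (1.0 + anchor_distance)
    return path_multiplicity * total

strength = keyword_strength([(10.0, 0.0), (20.0, 3.0)], path_multiplicity=2)
```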

85
A Cross-Modal Theory of Linked Document Semantics
  • These matrices may be concatenated together in
    various ways to produce an overall keyword-path
    matrix
  • Perform latent-semantic analysis to get concepts
  • A page is then represented by a set of concept
    classes
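
A minimal sketch of this last step, assuming the simplest concatenation (stacking the three matrices row-wise so that all keyword types share the same path columns) and using random placeholder values rather than real browsing data. The matrix sizes and k = 3 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths = 8
term_path = rng.random((5, n_paths))       # placeholder term-path matrix
image_path = rng.random((4, n_paths))      # placeholder image-path matrix
structure_path = rng.random((3, n_paths))  # placeholder structure-path matrix

# Overall keyword-path matrix: 12 keywords (rows) by 8 paths (columns).
keyword_path = np.vstack([term_path, image_path, structure_path])

k = 3  # number of latent concepts to keep
U, s, Vt = np.linalg.svd(keyword_path, full_matrices=False)
concept_loadings = U[:, :k]                   # keywords x concepts
path_coords = (np.diag(s[:k]) @ Vt[:k, :]).T  # paths x concepts
```

Each column of `concept_loadings` mixes textual, visual, and structural keywords, which is one way to read the "concepts" that the analysis is meant to produce.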

86
Conclusions
  • Researchers in CBR should now be concentrating on
    extracting semantics from multimedia documents
  • The web is a perfect testbed for studying
    (semi-)automated techniques for multimedia
    annotation due to its contextual richness
  • CBR + Semantic Web = The Multimedia Semantic Web
  • Get Involved!!!