Exploitation of knowledge in video recordings Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris Multim

Transcript and Presenter's Notes

1
Exploitation of knowledge in video recordings
Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris
Multimedia Knowledge Laboratory
CERTH-ITI

October 24, 2008
Thessaloniki, Greece

2
Evolution of Content

1-2 exabytes (millions of terabytes) of new
information produced worldwide annually
80 billion digital images are captured each
year
Over 1 billion images related to commercial
transactions are available through the Internet
This number is estimated to increase tenfold
in the next two years
4 000 new films are produced each year
300 000 films available worldwide
33 000 television stations and 43 000 radio
stations
100 billion hours of audiovisual content

Personal Content
Sport - News
Web Mobile
Movies
3
(No Transcript)
4
Need for annotation and metadata

The value of information depends on how easily
it can be found, retrieved, accessed, filtered or
managed in an active, personalized way

5
Video Analysis

Video analysis that exploits knowledge provides
significant advantages
Improved accuracy of semantics from video
Higher level concepts inferred through
exploitation of knowledge combined with video
processing
Knowledge about behavior, event detection
More efficient storage, access, retrieval,
dissemination of multimodal data because of the
(automatically generated) annotations

6
Video Analysis in JUMAS

7
Text-based indexing

Manual annotation
Straightforward
High/Semantic level
Efficient during content creation
Most commonly used
Necessary in a number of applications
- Time consuming
- Operator-application dependent
- Text-related problems (synonyms, etc.)

Annotation using captions and related text
Web, Video, Documents etc
Straightforward
High/Semantic level
Multimodal approach
- Text processing restrictions and limitations
- Captions must exist

8
Addressing the Semantic Gap

Semantic Gap for multimedia: to map automatically
generated numerical low-level features to higher-level,
human-understandable semantic concepts

<?xml version='1.0' encoding='ISO-8859-1' ?>
<Mpeg7 xmlns="...">
  <DescriptionUnit xsi:type="DescriptorCollectionType">
    <Descriptor xsi:type="DominantColorType">
      <SpatialCoherency>31</SpatialCoherency>
      <Value>
        <Percentage>31</Percentage>
        <Index>19 23 29 </Index>
        <ColorVariance>0 0 0 </ColorVariance>
      </Value>
    </Descriptor>
  </DescriptionUnit>
</Mpeg7>
This image contains a sky region and is a
holiday image
Dominant Color Descriptor of a sky region
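
To make the gap concrete, the sketch below reads a Dominant Color descriptor like the one above into plain numbers. It is a minimal illustration, not the presenters' tooling: the function name `parse_dominant_color` is hypothetical, and the sample document declares the standard XML Schema instance namespace for `xsi` and omits the MPEG-7 default namespace, since the slide does not show either.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, assumed for the xsi prefix.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Sample descriptor with the values shown on the slide.
SAMPLE = """
<Mpeg7 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DescriptionUnit xsi:type="DescriptorCollectionType">
    <Descriptor xsi:type="DominantColorType">
      <SpatialCoherency>31</SpatialCoherency>
      <Value>
        <Percentage>31</Percentage>
        <Index>19 23 29 </Index>
        <ColorVariance>0 0 0 </ColorVariance>
      </Value>
    </Descriptor>
  </DescriptionUnit>
</Mpeg7>
"""


def parse_dominant_color(xml_text):
    """Return one dict per dominant color found in the document."""
    root = ET.fromstring(xml_text)
    colors = []
    for descriptor in root.iter("Descriptor"):
        # Skip descriptors of any other type.
        if descriptor.get(f"{{{XSI}}}type") != "DominantColorType":
            continue
        coherency = int(descriptor.findtext("SpatialCoherency"))
        for value in descriptor.findall("Value"):
            colors.append({
                "spatial_coherency": coherency,
                "percentage": int(value.findtext("Percentage")),
                "index": [int(v) for v in value.findtext("Index").split()],
                "variance": [int(v) for v in value.findtext("ColorVariance").split()],
            })
    return colors


colors = parse_dominant_color(SAMPLE)
print(colors)
```

The output is only a list of numbers; nothing in it says "sky" or "holiday", which is exactly the gap the slide describes.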
9
Problem definition

Semantic image analysis: how to translate the
automatically extracted visual descriptions into
human-like conceptual ones
Low-level features provide cues for
strengthening or weakening evidence based on visual
similarity
Prior knowledge is needed to support semantic
disambiguation

10
Knowledge Extraction: A common view
Feature extraction: Text, Image analysis, Segmentation, SVMs
Evidence generation: Vehicle, Building
Reasoning: Fusion of annotations, Consistency checking
Higher-level concepts/events: Emergency scene
Classifiers fusion: Global vs. Local
Modalities fusion
Context: Ambulance
Multimedia content annotation tools
Training, (Statistical) Modeling
Domain, Multimedia content, Annotations, Algorithms - Features, Context
11
Knowledge from Video analysis

Semantics from video
Implicitly derived via machine learning methods
i.e. based on training
SVM, HMM, Neural Networks, Bayesian Networks
Training uses appropriate data, relevant to the
semantics that interest us
Training finds models that connect low level
features (e.g. motion trajectories) with
high-level annotations
These models are then applied to test data
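
The train-then-apply cycle described above can be sketched in a few lines. This is a deliberately simple stand-in: it uses a nearest-centroid classifier rather than the SVMs, HMMs, or networks named on the slide, and the two-dimensional features (mean speed, direction change) and the labels are hypothetical.

```python
from math import dist


def train(samples):
    """Learn one centroid per label from (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (mean speed, direction change) per trajectory.
training = [
    ([1.0, 0.1], "walking"),
    ([1.2, 0.2], "walking"),
    ([4.0, 0.3], "running"),
    ([4.4, 0.1], "running"),
]
model = train(training)

# Apply the learned model to unseen test trajectories.
print(predict(model, [1.1, 0.15]))
print(predict(model, [3.9, 0.2]))
```

Any of the learners listed on the slide would replace `train` and `predict` while keeping the same two-phase structure.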

12
Classification Results: aceMedia
Natural-Person 0.456798
Sailing-Boat 0.463645
Sand 0.476777
Building 0.415358
Pavement 0.454740
Road 0.503242
Body-Of-Water 0.489957
Cliff 0.472907
Cloud 0.757926
Mountain 0.512597
Sea 0.455338
Sky 0.658825
Stone 0.471733
Waterfall 0.500000
Wave 0.476669
Dried-Plant 0.494825
Dried-Plant-Snowed 0.476524
Foliage 0.497562
Grass 0.491781
Tree 0.447355
Trunk 0.493255
Snow 0.467218
Sunset 0.503164
Car 0.456347
Ground 0.454769
Lamp-Post 0.499387
Statue 0.501076
Segment's hypothesis set
13
Frame Region Concept Association

Region feature vector formed from local
descriptors
Individual SVM introduced for every defined local
concept, receiving as input the region feature
vector
Training identical to global concept training
case
Every region is evaluated by all trained SVMs, and the
segment's local concept hypothesis set is created

Segment's hypothesis set
Ground 0.89
Grass 0.44
Mountain 0.21
Boat 0.07
Smoke 0.41
Dirty-Water 0.18
Trunk 0.12
Foam 0.19
Debris 0.34
Mud 0.31
Water 0.42
Sky 0.22
Ashes 0.11
Subtitles 0.24
Flames 0.13
Vehicle 0.12
Building 0.25
Foliage 0.84
Person 0.32
Road 0.39
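
A hypothesis set of this kind is just a mapping from concept to score, so selecting the most plausible concepts for a region is a sort. The sketch below uses a few of the scores listed above; the function names and the 0.5 cut-off are assumptions for illustration, not values from the presentation.

```python
# A few concept scores taken from the hypothesis set above.
hypotheses = {
    "Ground": 0.89,
    "Grass": 0.44,
    "Water": 0.42,
    "Sky": 0.22,
    "Foliage": 0.84,
    "Person": 0.32,
}


def top_concepts(scores, k=3):
    """Return the k highest-scoring (concept, score) pairs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def confident_concepts(scores, threshold=0.5):
    """Keep only concepts whose score exceeds a (hypothetical) threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


print(top_concepts(hypotheses))
print(confident_concepts(hypotheses))
```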
14
Initial Region-Concept Association

Region feature vector formed from local
descriptors
Individual SVM introduced for every defined
concept, receiving as input the region feature
vector
Training identical to global training case
Every region is evaluated by all trained SVMs, and the
segment's concept hypothesis set is created

Segment's hypothesis set
Building 0.89
Roof 0.29
Grass 0.21
Tree 0.07
Stone 0.41
Ground 0.15
Dried-plant 0.12
Sky 0.19
Person 0.34
Trunk 0.31
Vegetation 0.42
Rock 0.22
Boat 0.11
Sand 0.44
Sea 0.13
Wave 0.12
15
Knowledge for Video analysis

Explicit Semantics from video
Based on previously known models
Explicitly defined models, rules, facts
Rules from preliminary scripts and standards
from similar cases
Explicit and implicit knowledge can be combined
with results from low-level video processing to
extract meaningful high-level knowledge

16
System Overview
17
Video analysis

Motion Analysis
Motion detection
Tracking
Detection of when motion occurs
Motion Segmentation
Object segmentation based on motion
characteristics
Generation of active regions
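
One common way to detect motion and derive an active region is frame differencing; the slide does not say which method the authors use, so the sketch below is an assumption. Frames are small grayscale grids given as nested lists, and the threshold of 20 is arbitrary.

```python
def motion_mask(previous, current, threshold=20):
    """Mark pixels whose intensity changed by more than the threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]


def active_region(mask):
    """Bounding box (top, left, bottom, right) of moving pixels, or None."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, moving in enumerate(row)
        if moving
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Two toy 4x4 frames: a bright object appears in column 2, rows 1-2.
frame_a = [[10] * 4 for _ in range(4)]
frame_b = [row[:] for row in frame_a]
frame_b[1][2] = 200
frame_b[2][2] = 200

mask = motion_mask(frame_a, frame_b)
print(active_region(mask))
```

Accumulating such masks over time gives a map of where motion occurs; when motion occurs follows from which frame pairs yield a non-empty mask.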

18
Activity Areas from motion analysis
19
Sub-activity Areas

After statistical processing for temporal
localization of motion and events

People walking towards each other
People leave together
People meet
20
Fight Sequence
21
Video Processing (1)

Pre-processing
Separate video from audio
Split video into frames
Noise removal via spatiotemporal filtering
Scene/shot detection
Shot: frames taken by a single camera
Detect transitions between frames
Uses only low-level information
Scene: story-telling unit
Uses higher-level knowledge, semantics
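
Shot detection from low-level information alone is often done by comparing intensity histograms of consecutive frames. The sketch below takes that approach as an assumption; the bin count, the 0.5 threshold, and the toy frames are illustrative.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a grayscale frame (nested lists)."""
    pixels = [p for row in frame for p in row]
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return [c / len(pixels) for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Indices i where frame i starts a new shot (hard cuts only)."""
    hists = [histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # Half the L1 distance between histograms lies in [0, 1].
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i])) / 2
        if diff > threshold:
            cuts.append(i)
    return cuts


dark_1 = [[10, 20], [15, 12]]
dark_2 = [[12, 18], [14, 11]]
bright = [[200, 210], [220, 230]]

cuts = shot_boundaries([dark_1, dark_2, bright, bright])
print(cuts)
```

Grouping shots into scenes cannot be done this way, since a scene is defined by the story rather than by pixel statistics.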

22
Video Processing (2)

Spatial segmentation
Spatial segmentation in images, video frames
Extracts object(s) based on color, texture
features
Motion segmentation
Groups pixels with similar motion
Spatiotemporal segmentation
Finds objects over several frames through
combination of motion, appearance features
Merges spatial and motion segmentation results

23
Knowledge in Video Analysis (1)

Low level features can be combined with
knowledge/rules for higher-level results
Spatiotemporally segmented objects can be used
for object recognition
Face/gesture recognition after training with
faces/gestures of significance
Motion in specific parts of a video (e.g. near the
court entrance, near the prisoner's seat) has
additional significance
Needs prior knowledge of which parts of the video
frames are important and why
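
The prior knowledge in question can be as simple as a table of named frame regions. The sketch below checks whether a detected motion box overlaps any of them; the zone coordinates are hypothetical, and boxes are given as (left, top, right, bottom) in pixels.

```python
# Hypothetical zones of interest in the camera frame.
ZONES = {
    "court entrance": (0, 0, 100, 50),
    "prisoner's seat": (200, 120, 260, 180),
}


def overlaps(a, b):
    """True if two (left, top, right, bottom) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def zones_hit(motion_box, zones=ZONES):
    """Names of the significant zones touched by a detected motion box."""
    return [name for name, zone in zones.items() if overlaps(motion_box, zone)]


print(zones_hit((90, 40, 130, 80)))
print(zones_hit((150, 60, 180, 100)))
```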

24
Knowledge in Video Analysis (2)

Knowledge structures can provide additional
information about the relations between different
low-level features
Interactions, e.g. two motions in opposite
directions or the relation between extracted gestures, may
carry meaning: people meeting, fighting,
pointing, gesticulating
Face recognition combined with prior knowledge
can show who is present when an event occurs
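
A rule over two motion tracks illustrates how such an interaction could be inferred. This is a sketch under stated assumptions: tracks are lists of at least two (x, y) positions sampled at the same times, the rule and the `meet_distance` value are hypothetical, and real events such as fighting would need far richer cues.

```python
from math import dist


def velocity(track):
    """Average displacement per step between first and last position."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    steps = len(track) - 1
    return ((x1 - x0) / steps, (y1 - y0) / steps)


def interaction(track_a, track_b, meet_distance=1.5):
    """Label a pair of tracks using a simple opposite-motion rule."""
    va, vb = velocity(track_a), velocity(track_b)
    # A negative dot product means the two motions point in opposing directions.
    opposite = va[0] * vb[0] + va[1] * vb[1] < 0
    start_gap = dist(track_a[0], track_b[0])
    end_gap = dist(track_a[-1], track_b[-1])
    if opposite and end_gap < start_gap:
        if end_gap <= meet_distance:
            return "people meet"
        return "people walking towards each other"
    return None


left = [(0, 0), (1, 0), (2, 0), (3, 0)]
right = [(10, 0), (9, 0), (8, 0), (7, 0)]
print(interaction(left, right))
```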

25
Conclusions

Combined use of video processing with knowledge
can lead to richer and more accurate high-level
descriptions of multimedia data
Can be used in many more applications than
currently, because the knowledge introduces
flexibility and adaptability to the system
The same algorithms and low-level features can
provide much more information when used in
combination with explicit and implicit knowledge

26
Thank you! CERTH-ITI / Multimedia Knowledge
Laboratory http://mklab.iti.gr
27
Video Analysis State of the Art

Spatiotemporal segmentation
Find spatiotemporally homogeneous objects i.e.
similar appearance and motion
Apply spatial segmentation on each frame
Match segmented objects in successive frames
using low-level features (e.g. similar color,
texture, continuous motion)
Use motion information to project the position of the
object in the current/next frames
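
The projection and matching steps can be sketched with a constant-velocity assumption and greedy nearest-neighbour matching. Both choices are assumptions made for illustration, as is the `max_distance` gate; the sketch matches on position only and ignores the color and texture cues mentioned above.

```python
from math import dist


def project(position, velocity):
    """Predict the next-frame position under constant velocity."""
    return (position[0] + velocity[0], position[1] + velocity[1])


def match_objects(tracked, detections, max_distance=5.0):
    """Greedily match each tracked object to the nearest free detection.

    tracked: name -> (position, velocity) from the current frame.
    detections: object centroids segmented in the next frame.
    """
    matches = {}
    free = list(detections)
    for name, (position, velocity) in tracked.items():
        if not free:
            break
        predicted = project(position, velocity)
        best = min(free, key=lambda d: dist(d, predicted))
        if dist(best, predicted) <= max_distance:
            matches[name] = best
            free.remove(best)
    return matches


tracked = {"A": ((10, 10), (2, 0)), "B": ((30, 10), (-2, 0))}
detections = [(28.5, 10), (12, 10.5)]
matches = match_objects(tracked, detections)
print(matches)
```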

28
Video Analysis State of the Art
29
Video Analysis State of the Art
30
Video Analysis State of the Art

Spatial segmentation
Spatial segmentation in images, video frames
Region-based: most methods group
similar features like color, texture, location,
based on homogeneity of intensity, texture,
position
Gradient/edge-based: detecting changes in the spatial
distribution of features, e.g. pixel illumination
Some methods combine region/edge information
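
A minimal region-based example is region growing on intensity: neighbouring pixels join a region while they stay close to the region's seed value. The sketch below assumes a grayscale image as nested lists, 4-connectivity, and an arbitrary tolerance of 10; practical methods also use color, texture, and position.

```python
from collections import deque


def segment(image, tolerance=10):
    """Label each pixel with a region id using seeded region growing."""
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new region from the first unlabeled pixel.
            seed = image[r][c]
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and labels[ny][nx] == -1
                        and abs(image[ny][nx] - seed) <= tolerance
                    ):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


image = [
    [10, 12, 200],
    [11, 13, 205],
    [90, 92, 203],
]
labels = segment(image)
for row in labels:
    print(row)
```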
