Exploitation of knowledge in video recordings, Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris

1
Exploitation of knowledge in video recordings
Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris
Multimedia Knowledge Laboratory
CERTH-ITI
  • October 24, 2008
  • Thessaloniki, Greece

2
Evolution of Content
  • 1-2 exabytes (millions of terabytes) of new
    information produced world-wide annually
  • 80 billion digital images are captured each
    year
  • Over 1 billion images related to commercial
    transactions are available through the Internet
  • This number is estimated to increase tenfold
    in the next two years
  • 4 000 new films are produced each year
  • 300 000 films are available world-wide
  • 33 000 television stations and 43 000 radio
    stations
  • 100 billion hours of audiovisual content

[Figure labels: Personal Content, Sport, News, Web, Mobile, Movies]
3
(No Transcript)
4
Need for annotation metadata
  • The value of information depends on how easily
    it can be found, retrieved, accessed, filtered or
    managed in an active, personalized way

5
Video Analysis
  • Video analysis that exploits knowledge provides
    significant advantages
  • Improved accuracy of semantics from video
  • Higher level concepts inferred through
    exploitation of knowledge combined with video
    processing
  • Knowledge about behavior, event detection
  • More efficient storage, access, retrieval,
    dissemination of multimodal data because of the
    (automatically generated) annotations

6
Video Analysis in JUMAS

7
Text-based indexing
  • Manual annotation
  • Straightforward
  • High/Semantic level
  • Efficient during content creation
  • Most commonly used
  • Necessary in a number of applications
  • - Time consuming
  • - Operator-application dependent
  • - Text related problems (synonyms etc.)
  • Annotation using captions and related text
  • Web, video, documents etc.
  • Straightforward
  • High/Semantic level
  • Multimodal approach
  • - Text processing restrictions and limitations
  • - Captions must exist

8
Addressing the Semantic Gap
  • Semantic gap for multimedia: how to map automatically
    generated numerical low-level features to higher-level,
    human-understandable semantic concepts

<?xml version='1.0' encoding='ISO-8859-1' ?>
<Mpeg7 xmlns="...">
  <DescriptionUnit xsi:type="DescriptorCollectionType">
    <Descriptor xsi:type="DominantColorType">
      <SpatialCoherency>31</SpatialCoherency>
      <Value>
        <Percentage>31</Percentage>
        <Index>19 23 29 </Index>
        <ColorVariance>0 0 0 </ColorVariance>
      </Value>
    </Descriptor>
  </DescriptionUnit>
</Mpeg7>
This image contains a sky region and is a
holiday image
Dominant Color Descriptor of a sky region
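
A minimal sketch of how the numerical fields of such a descriptor could be read programmatically, which makes the gap between these raw numbers and a concept like "sky" concrete. The sample string mirrors the slide; the `xsi` namespace declaration and the function name `read_dominant_color` are assumptions added so the snippet parses, and are not shown in the presentation.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard XML Schema instance namespace for the xsi prefix.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Sample modelled on the slide; the xmlns:xsi declaration is added here.
SAMPLE = f"""<Mpeg7 xmlns:xsi="{XSI}">
  <DescriptionUnit xsi:type="DescriptorCollectionType">
    <Descriptor xsi:type="DominantColorType">
      <SpatialCoherency>31</SpatialCoherency>
      <Value>
        <Percentage>31</Percentage>
        <Index>19 23 29</Index>
        <ColorVariance>0 0 0</ColorVariance>
      </Value>
    </Descriptor>
  </DescriptionUnit>
</Mpeg7>"""


def read_dominant_color(xml_text):
    """Return the low-level values of the first DominantColorType descriptor."""
    root = ET.fromstring(xml_text)
    type_attr = f"{{{XSI}}}type"  # ElementTree stores xsi:type under this key
    for desc in root.iter("Descriptor"):
        if desc.get(type_attr) == "DominantColorType":
            value = desc.find("Value")
            return {
                "spatial_coherency": int(desc.findtext("SpatialCoherency")),
                "percentage": int(value.findtext("Percentage")),
                "index": [int(v) for v in value.findtext("Index").split()],
                "color_variance": [
                    int(v) for v in value.findtext("ColorVariance").split()
                ],
            }
    return None


descriptor = read_dominant_color(SAMPLE)
print(descriptor)
```

Nothing in the returned dictionary says "sky" or "holiday"; that mapping is what the rest of the presentation addresses.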
9
Problem definition
  • Semantic image analysis: how to translate the
    automatically extracted visual descriptions into
    human-like conceptual ones
  • Low-level features provide cues that
    strengthen or weaken evidence based on visual
    similarity
  • Prior knowledge is needed to support semantic
    disambiguation

10
Knowledge Extraction: A common view
Feature extraction: Text, Image analysis, Segmentation, SVMs
Evidence generation: Vehicle, Building
Reasoning: Fusion of annotations, Consistency checking
Higher-level concepts/events: Emergency scene
Classifiers fusion, Global vs. Local, Modalities fusion, Context, Ambulance
Multimedia content annotation tools, Training, (Statistical) Modeling
Domain, Multimedia content, Annotations, Algorithms - Features, Context
11
Knowledge from Video analysis
  • Semantics from video
  • Implicitly derived via machine learning methods
    i.e. based on training
  • SVM, HMM, Neural Networks, Bayesian Networks
  • Training uses appropriate data, relevant to the
    semantics that interest us
  • Training finds models that connect low level
    features (e.g. motion trajectories) with
    high-level annotations
  • These models are then applied to test data
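
The train-then-apply loop described above can be sketched as follows. This is not one of the listed methods: a nearest-centroid classifier is swapped in as a simple stand-in for an SVM or HMM, and the two-dimensional features (say, mean speed and direction change of a trajectory) and the labels are hypothetical.

```python
from math import dist


def train(samples):
    """samples: list of (feature_vector, label). Returns label -> centroid."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical annotated training data: (low-level features, annotation).
training = [
    ((0.9, 0.1), "walking"), ((1.1, 0.2), "walking"),
    ((3.0, 0.1), "running"), ((3.4, 0.3), "running"),
]
model = train(training)

# The learned model is then applied to unseen (test) data.
print(predict(model, (1.0, 0.15)))
print(predict(model, (3.1, 0.2)))
```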

12
Classification Results: aceMedia
Natural-Person      0.456798
Sailing-Boat        0.463645
Sand                0.476777
Building            0.415358
Pavement            0.454740
Road                0.503242
Body-Of-Water       0.489957
Cliff               0.472907
Cloud               0.757926
Mountain            0.512597
Sea                 0.455338
Sky                 0.658825
Stone               0.471733
Waterfall           0.500000
Wave                0.476669
Dried-Plant         0.494825
Dried-Plant-Snowed  0.476524
Foliage             0.497562
Grass               0.491781
Tree                0.447355
Trunk               0.493255
Snow                0.467218
Sunset              0.503164
Car                 0.456347
Ground              0.454769
Lamp-Post           0.499387
Statue              0.501076
Segments hypothesis set
13
Frame Region Concept Association
  • Region feature vector formed from local
    descriptors
  • Individual SVM introduced for every defined local
    concept, receiving as input the region feature
    vector
  • Training identical to global concept training
    case
  • Every region is evaluated by all trained SVMs,
    creating the segment's local concept hypothesis set

Segments hypothesis set
Ground       0.89
Grass        0.44
Mountain     0.21
Boat         0.07
Smoke        0.41
Dirty-Water  0.18
Trunk        0.12
Foam         0.19
Debris       0.34
Mud          0.31
Water        0.42
Sky          0.22
Ashes        0.11
Subtitles    0.24
Flames       0.13
Vehicle      0.12
Building     0.25
Foliage      0.84
Person       0.32
Road         0.39
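
A minimal sketch of how a hypothesis set like the one above could be produced: one scorer per concept, each applied to the same region feature vector. The linear weights below are hypothetical stand-ins for trained SVMs, and the sigmoid that turns a margin into a 0-1 score is an assumption, not something the slides specify.

```python
from math import exp

# Hypothetical per-concept linear models: concept -> (weights, bias).
CONCEPT_MODELS = {
    "Sky": ([2.0, -1.0], 0.0),
    "Grass": ([-1.0, 2.0], 0.0),
    "Road": ([0.5, 0.5], -1.0),
}


def confidence(weights, bias, features):
    """Map the signed distance from the decision boundary to (0, 1)."""
    margin = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + exp(-margin))


def hypothesis_set(features):
    """Evaluate the region with every concept model, best score first."""
    scores = {
        concept: confidence(weights, bias, features)
        for concept, (weights, bias) in CONCEPT_MODELS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


region_features = [0.9, 0.1]  # hypothetical region feature vector
for concept, score in hypothesis_set(region_features):
    print(f"{concept} {score:.2f}")
```

Every concept receives a score, so the set keeps weak hypotheses as well; later stages can use context or knowledge to choose among them.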
14
Initial Region-Concept Association
  • Region feature vector formed from local
    descriptors
  • Individual SVM introduced for every defined
    concept, receiving as input the region feature
    vector
  • Training identical to global training case
  • Every region is evaluated by all trained SVMs,
    creating the segment's concept hypothesis set

Segments hypothesis set
Building     0.89
Roof         0.29
Grass        0.21
Tree         0.07
Stone        0.41
Ground       0.15
Dried-plant  0.12
Sky          0.19
Person       0.34
Trunk        0.31
Vegetation   0.42
Rock         0.22
Boat         0.11
Sand         0.44
Sea          0.13
Wave         0.12
15
Knowledge for Video analysis
  • Explicit Semantics from video
  • Based on previously known models
  • Explicitly defined models, rules, facts
  • Rules from preliminary scripts and standards
    from similar cases
  • Explicit and implicit knowledge can be combined
    with results from low-level video processing to
    extract meaningful high-level knowledge
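
As a rough illustration of combining explicit rules with detector output, the sketch below applies hand-written rules to a set of concept scores. The rules, the threshold and the Ambulance and Building scores are hypothetical; only the idea of explicitly defined rules comes from the slide.

```python
# Hypothetical explicit rules: (concepts that must be present, conclusion).
RULES = [
    ({"Ambulance", "Building"}, "Emergency scene"),
    ({"Flames", "Smoke"}, "Fire"),
]


def infer(detections, threshold=0.5):
    """Return the conclusions of all rules whose conditions are detected.

    detections: concept -> confidence from low-level video processing.
    """
    present = {concept for concept, score in detections.items() if score >= threshold}
    return [conclusion for required, conclusion in RULES if required <= present]


# Hypothetical detector output for one shot.
detections = {"Ambulance": 0.8, "Building": 0.7, "Flames": 0.13, "Smoke": 0.41}
print(infer(detections))
```

The first rule fires because both of its concepts pass the threshold; the second does not, since neither Flames nor Smoke is detected with enough confidence.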

16
System Overview
17
Video analysis
  • Motion Analysis
  • Motion detection
  • Tracking
  • Detection of when motion occurs
  • Motion Segmentation
  • Object segmentation based on motion
    characteristics
  • Generation of active regions
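
One simple way to detect motion and derive an active region is frame differencing, sketched below. This is an assumed textbook technique chosen for illustration, not necessarily the method used in the presented system; frames are plain lists of grayscale values and the threshold is arbitrary.

```python
def motion_mask(prev_frame, curr_frame, threshold=20):
    """Mark pixels whose intensity changed by more than the threshold."""
    return [
        [abs(curr - prev) > threshold for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def active_region(mask):
    """Bounding box (top, left, bottom, right) of moving pixels, or None."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, moving in enumerate(row)
        if moving
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


prev_frame = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
curr_frame = [[10, 10, 10, 10], [10, 200, 200, 10], [10, 10, 200, 10]]

mask = motion_mask(prev_frame, curr_frame)
print(active_region(mask))
```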

18
Activity Areas from motion analysis
19
Sub-activity Areas
  • After statistical processing for temporal
    localization of motion and events

People walking towards each other
People leave together
People meet
20
Fight Sequence
21
Video Processing (1)
  • Pre-processing
  • Separate video from audio
  • Split video into frames
  • Noise removal via spatiotemporal filtering
  • Scene/shot detection
  • Shot: frames taken by a single camera
  • Detect transitions between frames
  • Uses only low-level information
  • Scene: story-telling unit
  • Uses higher-level knowledge, semantics
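
Shot detection from low-level information alone can be sketched by comparing intensity histograms of consecutive frames. Histogram differencing is an assumed, commonly used technique picked for illustration; the bin count and threshold are arbitrary, and frames are small lists of 0-255 grayscale values.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a grayscale frame."""
    counts = [0] * bins
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
    total = sum(counts)
    return [count / total for count in counts]


def shot_boundaries(frames, threshold=0.5):
    """Indices i where frame i starts a new shot (large histogram change)."""
    hists = [histogram(frame) for frame in frames]
    cuts = []
    for i in range(1, len(hists)):
        difference = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if difference > threshold:
            cuts.append(i)
    return cuts


dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]
frames = [dark, dark, bright, bright]
print(shot_boundaries(frames))
```

Grouping the resulting shots into scenes is not attempted here, since that step needs higher-level knowledge rather than pixel statistics.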

22
Video Processing (2)
  • Spatial segmentation
  • Spatial segmentation in images, video frames
  • Extracts object(s) based on color, texture
    features
  • Motion segmentation
  • Groups pixels with similar motion
  • Spatiotemporal segmentation
  • Finds objects over several frames through
    combination of motion, appearance features
  • Merges spatial and motion segmentation results
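
Grouping pixels with similar motion can be illustrated with a flood fill over a grid of motion vectors. This is a minimal sketch assuming a per-pixel motion field is already available; the tolerance `tol` and the function name are hypothetical.

```python
from collections import deque


def segment_by_motion(flow, tol=0.5):
    """Label 4-connected pixels whose neighbouring motion vectors are similar.

    flow: 2D grid of (dx, dy) motion vectors. Returns a grid of integer labels.
    """
    height, width = len(flow), len(flow[0])
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for r in range(height):
        for c in range(width):
            if labels[r][c] != -1:
                continue
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] == -1:
                        (u1, v1), (u2, v2) = flow[y][x], flow[ny][nx]
                        if abs(u1 - u2) <= tol and abs(v1 - v2) <= tol:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            next_label += 1
    return labels


still, right = (0.0, 0.0), (2.0, 0.0)
flow = [[still, still, right],
        [still, right, right]]
labels = segment_by_motion(flow)
print(labels)
```

Replacing the motion vectors with color or texture features would give a spatial segmentation with the same loop, which is one way to see how the two results can be merged.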

23
Knowledge in Video Analysis (1)
  • Low level features can be combined with
    knowledge/rules for higher-level results
  • Spatiotemporally segmented objects can be used
    for object recognition
  • Face/gesture recognition after training with
    faces/gestures of significance
  • Motion in specific parts of a video (e.g. near the
    court entrance, near the prisoner's seat) has
    additional significance
  • Needs prior knowledge of which parts of the video
    frames are important and why

24
Knowledge in Video Analysis (2)
  • Knowledge structures can provide additional
    information about the relations between different
    low-level features
  • Interactions, e.g. two motions in opposite
    directions or the relation of extracted gestures,
    may mean something: people meeting, fighting,
    pointing, gesticulating
  • Face recognition combined with prior knowledge
    can show who is present when an event occurs
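
The idea that two motions in opposite directions may indicate people meeting can be sketched with two tracked trajectories. The decision rule, the labels and the distance threshold are hypothetical and only illustrate how a relation between low-level features is turned into an event.

```python
from math import dist


def velocity(track):
    """Average displacement per step along a trajectory of (x, y) points."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    steps = len(track) - 1
    return ((x1 - x0) / steps, (y1 - y0) / steps)


def interaction(track_a, track_b, meet_distance=1.0):
    """Classify the relation between two trajectories."""
    va, vb = velocity(track_a), velocity(track_b)
    # Negative dot product: the two motions point in opposing directions.
    opposite = va[0] * vb[0] + va[1] * vb[1] < 0
    start_gap = dist(track_a[0], track_b[0])
    end_gap = dist(track_a[-1], track_b[-1])
    if opposite and end_gap < start_gap:
        return "meet" if end_gap <= meet_distance else "approaching"
    return "no interaction detected"


person_a = [(0, 0), (1, 0), (2, 0)]
person_b = [(5, 0), (4, 0), (3, 0)]
person_c = [(0, 1), (1, 1), (2, 1)]

print(interaction(person_a, person_b))
print(interaction(person_a, person_c))
```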

25
Conclusions
  • Combined use of video processing with knowledge
    can lead to richer and more accurate high-level
    descriptions of multimedia data
  • Can be used in many more applications than at
    present, because knowledge makes the system
    more flexible and adaptable
  • The same algorithms and low-level features can
    provide much more information when used in
    combination with explicit and implicit knowledge

26
Thank you!
CERTH-ITI / Multimedia Knowledge Laboratory
http://mklab.iti.gr
27
Video Analysis State of the Art
  • Spatiotemporal segmentation
  • Find spatiotemporally homogeneous objects i.e.
    similar appearance and motion
  • Apply spatial segmentation on each frame
  • Match segmented objects in successive frames
    using low-level features (e.g. similar color,
    texture, continuous motion)
  • Use motion information to project the position
    of the object in the current/next frames
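
The last two steps can be sketched as follows: each tracked object's position is projected with its motion, then matched to the closest object segmented in the next frame. Greedy nearest-neighbour matching on position only is an assumption made for brevity; a real system would also compare color and texture, as the list says.

```python
from math import dist


def predict_position(position, velocity):
    """Project an object's position one frame ahead using its motion."""
    return (position[0] + velocity[0], position[1] + velocity[1])


def match_objects(tracked, detections, max_distance=2.0):
    """Greedily match tracked objects to detections in the next frame.

    tracked: object id -> (position, velocity).
    detections: list of (x, y) centroids of segmented objects.
    Returns object id -> detection index, or None when nothing is close enough.
    """
    matches = {}
    free = set(range(len(detections)))
    for obj_id, (position, velocity) in tracked.items():
        expected = predict_position(position, velocity)
        best = min(free, key=lambda i: dist(detections[i], expected), default=None)
        if best is not None and dist(detections[best], expected) <= max_distance:
            matches[obj_id] = best
            free.remove(best)
        else:
            matches[obj_id] = None
    return matches


tracked = {"A": ((0.0, 0.0), (1.0, 0.0)), "B": ((5.0, 5.0), (0.0, -1.0))}
detections = [(5.1, 4.0), (1.2, 0.1)]
print(match_objects(tracked, detections))
```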

28
Video Analysis State of the Art
29
Video Analysis State of the Art
30
Video Analysis State of the Art
  • Spatial segmentation
  • Spatial segmentation in images, video frames
  • Region based: most methods group similar
    features such as color, texture and location,
    based on homogeneity of intensity, texture and
    position
  • Gradient/edge based: detect changes in the spatial
    distribution of features, e.g. pixel illumination
  • Some methods combine region/edge information
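
A gradient/edge based detector can be sketched with simple forward differences on pixel intensity. The threshold and the use of the sum of absolute horizontal and vertical differences are assumptions made for illustration, not a specific published operator.

```python
def edge_map(image, threshold=50):
    """Mark pixels where the intensity changes sharply to the right or below."""
    height, width = len(image), len(image[0])
    edges = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Forward differences; the border reuses the pixel itself.
            gx = image[r][min(c + 1, width - 1)] - image[r][c]
            gy = image[min(r + 1, height - 1)][c] - image[r][c]
            edges[r][c] = abs(gx) + abs(gy) > threshold
    return edges


image = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
edges = edge_map(image)
print(edges)
```

The homogeneous left and right halves produce no edges; only the column next to the intensity jump is marked, which is the boundary a region-based method would find from the other side by growing the two halves.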