Title: A Motivating Scenario for Designing an Extensible Audio-Visual Description Language
1 A Motivating Scenario for Designing an Extensible Audio-Visual Description Language
Raphaël Troncy, Jean Carrive, Steffen Lalande and Jean-Philippe Poli
- Monday 25th of October, 2004
2 Description of the AV content
- Various uses / Different granularity
  - identification of the content creator and the content provider: Dublin Core metadata, VRA Core categories, TV-Anytime metadata
  - feature extraction from the video signal: storing and exchanging the results of automatic tools (MPEG-7)
  - structural decomposition into video segments corresponding to the logical structure of the program: time-code, spatial coordinates
  - semantic description of these segments: controlled vocabulary, thesaurus, free-text annotation
3 Description of the AV content (cultural heritage point of view)
- Segmentation
  - locate and date some events
- Description
  - type each segment with an AV genre
  - type each segment with a general theme
  - give hints on the production
  - describe the scene (who, when, where, what, ...)
→ needs a powerful description language
4 Motivating scenario
- Generic application for manually describing TV programs w.r.t.:
  - structural constraints: patterns represent the logical structure of a document
  - semantic constraints: the description of the content is machine-understandable
- Let us define the temporal structure of a Sports Magazine
5 MPEG-7, the natural candidate description language?
- ISO standard since December 2001
- Main components
  - Descriptors (Ds) and Description Schemes (DSs)
  - DDL (XML Schema extensions)
- Concerns all types of media
Part 5 - MDS
6 MPEG-7: an unsuitable description language for this scenario
- A non-extensible language
  - closed set of descriptors
- Exchange syntax rather than a real machine-processable multimedia description language
  - non-object-based data model
  - non-modular language (universal approach)
- No formal semantics provided
  - applications cannot access the meaning of the documents
→ is the DDL (XML Schema) at fault?
7 MPEG-7: an unsuitable description language for this scenario
- How to define new descriptors?
- How to define new description schemes?
- How to make the description machine-understandable?
→ the critical issue: how to reconcile object-oriented semantic expression with structural validation?
8 Our proposal: AVDL
- AVDL: a reduced yet extensible audio-visual description language
  - an object meta-model (an instance model specifies the vocabulary for, and the rules followed by, the descriptions)
  - an XML syntax
  - a semantics (close to Description Logics (DL) for the descriptors)
- Description Schemes
  - Descriptors
  - Properties
  - Structures
- Descriptions
  - valid instances w.r.t. description schemes
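The relationship between description schemes and descriptions can be sketched in a few lines of Python. This is only an illustration of the idea that a description is a valid instance of a scheme; it is not the AVDL meta-model or its .NET implementation. The class names `Property`, `Descriptor` and `DescriptionScheme`, and the `validate` method, are assumptions made for this sketch; the `Tracking` descriptor and its `nbDetection` property are borrowed from the XML examples later in the deck.

```python
from dataclasses import dataclass, field

# Hypothetical Python rendering of the notions named on the slide.
# None of these names come from the actual AVDL API.

@dataclass
class Property:
    name: str
    range_type: type          # e.g. int for a counter such as nbDetection

@dataclass
class Descriptor:
    name: str
    properties: list = field(default_factory=list)

@dataclass
class DescriptionScheme:
    descriptors: dict = field(default_factory=dict)

    def add(self, descriptor):
        self.descriptors[descriptor.name] = descriptor

    def validate(self, descriptor_name, values):
        """Return a list of error messages; an empty list means valid."""
        descriptor = self.descriptors.get(descriptor_name)
        if descriptor is None:
            return [f"unknown descriptor: {descriptor_name}"]
        declared = {p.name: p for p in descriptor.properties}
        errors = []
        for key, value in values.items():
            if key not in declared:
                errors.append(f"undeclared property: {key}")
            elif not isinstance(value, declared[key].range_type):
                errors.append(f"wrong type for property: {key}")
        return errors

scheme = DescriptionScheme()
scheme.add(Descriptor("Tracking", [Property("nbDetection", int)]))

ok = scheme.validate("Tracking", {"nbDetection": 1})
bad = scheme.validate("Tracking", {"nbDetection": "one", "colour": "red"})
```

Here `ok` is an empty list, while `bad` reports one type error and one undeclared property. The vocabulary lives in the scheme, so adding a descriptor does not require changing the validation code.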
9 The meta-class level
10 The class level
11 Location
12 Document, Content and Media
- Distinction
  - Document vs. Content vs. Media
  - Virtual content vs. physical content
- Media: a content abstraction for decomposition
  - audio tracks, subtitles
13 Defining Structures
- A structure defines how the descriptors may, and have to, be combined
  - allows control of the description
  - allows automatic completion of the descriptions
- AVDL provides some predefined structure models
  - containment: gives the list of the possible sub-segments of an AV segment (in space and in time)
  - regular expression: by analogy with a grammar, for temporal succession
- Other models are currently being studied: temporal constraints, etc.
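As a rough illustration of the regular-expression model, the sketch below encodes a temporal structure for a sports magazine as a small finite automaton, which is equivalent to the regular expression `Opening (Studio Report)+ Closing`. The segment names and the grammar itself are invented for this example and do not come from the slides. The same table supports both uses mentioned above: control (is the sequence valid?) and completion (which segments may come next?).

```python
# Hypothetical temporal grammar: Opening (Studio Report)+ Closing,
# written as a transition table. Segment names are illustrative only.
TRANSITIONS = {
    "start":  {"Opening": "opened"},
    "opened": {"Studio": "studio"},
    "studio": {"Report": "report"},
    "report": {"Studio": "studio", "Closing": "end"},
    "end":    {},
}
ACCEPTING = {"end"}

def run(segments):
    """Follow the transitions; return the final state, or None if stuck."""
    state = "start"
    for segment in segments:
        state = TRANSITIONS[state].get(segment)
        if state is None:
            return None
    return state

def is_valid(segments):
    """Control: the whole sequence must end in an accepting state."""
    return run(segments) in ACCEPTING

def allowed_next(segments):
    """Completion: segment types that may follow the given prefix."""
    state = run(segments)
    return [] if state is None else sorted(TRANSITIONS[state])

complete = is_valid(["Opening", "Studio", "Report", "Closing"])
truncated = is_valid(["Opening", "Studio", "Report"])
suggestions = allowed_next(["Opening", "Studio", "Report"])
```

A description tool could call `allowed_next` while the user is annotating, and `is_valid` once the description is finished.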
14 AVDL Implementation
- XML Serialization
  - Independent of any schema language
  - Uses XML Schema validation (mainly for datatypes)
- C#
  - Object inheritance
  - Use of .NET reflection
15 XML Serialization
[Diagram. Labels: avdl.xsd (Audio-Visual Description Language); ds-17.xsd; ds-17.xml (Description Schemes); d-162.xml (Descriptions); arrows labelled "partial control" (twice) and "transformation"]
16 XML Syntax (DS)
<Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
  <Property ref="id-p2"/>
  <Structure ref="id-s2"/>
  <DescriptionRelationship characterization="string">
    <Location type="TemporalInterval"/>
    <Media type="Media"/>
  </DescriptionRelationship>
</Descriptor>
<Property id="id-p2" name="nbDetection">
  <Domain descriptor="id-d2"/>
  <Range>
    <Primitive nameType="int"/>
  </Range>
</Property>
<Structure id="id-s2" name="TrackingStructure">
  <FormalModel>
    <Constraint type="temporal" validation="full" method="system" parser="XMLSchema">
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="Detection" type="DetectionType"/>
      </xsd:sequence>
    </Constraint>
  </FormalModel>
</Structure>
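In the description scheme above, a descriptor points to its properties and structures through `ref` attributes that match `id` attributes elsewhere in the file. The following Python sketch shows one way such references could be resolved with the standard library. It is an illustration only, not the AVDL parser: the `DescriptionScheme` wrapper element and the `resolve_descriptor` function are assumptions made here so that the fragment is a well-formed, self-contained document, and the fragment is shortened.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's example. The <DescriptionScheme> root
# element is an assumption added so the fragment parses on its own.
DS_XML = """
<DescriptionScheme xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
    <Property ref="id-p2"/>
    <Structure ref="id-s2"/>
  </Descriptor>
  <Property id="id-p2" name="nbDetection">
    <Domain descriptor="id-d2"/>
    <Range><Primitive nameType="int"/></Range>
  </Property>
  <Structure id="id-s2" name="TrackingStructure"/>
</DescriptionScheme>
""".strip()

def resolve_descriptor(root, descriptor_id):
    """Follow ref -> id links and return the names a descriptor uses."""
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    descriptor = by_id[descriptor_id]
    return {
        "name": descriptor.get("name"),
        "properties": [by_id[p.get("ref")].get("name")
                       for p in descriptor.findall("Property")],
        "structures": [by_id[s.get("ref")].get("name")
                       for s in descriptor.findall("Structure")],
    }

root = ET.fromstring(DS_XML)
summary = resolve_descriptor(root, "id-d2")
```

`summary` then holds the descriptor name `Tracking` together with the names of the referenced property (`nbDetection`) and structure (`TrackingStructure`).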
17 XML Syntax (Descriptions)
<Tracking type="LocatedDescriptorType" nbDetection="1">
  <DescriptionRelationship>
    <Location>
      <avdl:Begin timeRef="147329280"/><avdl:End timeRef="147329280"/>
    </Location>
    <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg"/>
  </DescriptionRelationship>
  <Structure constraintType="temporal">
    <Detection type="LocatedDescriptorType" nbFeature="1">
      <DescriptionRelationship>
        <Location>
          <avdl:Instant timeRef="147329280"/>
        </Location>
        <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg"
               frameHeight="288" frameWidth="352"/>
      </DescriptionRelationship>
      <Structure constraintType="spatial">
        <Feature xsi:type="FaceType">
          <DescriptionRelationship>
            <Location>
              <avdl:BoundingBox>
                <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
                <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
                <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
                <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
              </avdl:BoundingBox>
            </Location>
...
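The bounding-box corners above are given as rational coordinates (`numX`/`denX`, `numY`/`denY`). Assuming these are fractions of the frame width and height, which the example suggests since `denX` equals `frameWidth` and `denY` equals `frameHeight`, they can be mapped to pixel positions at any resolution. The Python sketch below works through that arithmetic for two of the corners; the function name `to_pixels` is hypothetical.

```python
from fractions import Fraction

def to_pixels(corner, frame_width, frame_height):
    """Scale a rational (numX/denX, numY/denY) corner to pixel coordinates.

    Assumes the fractions are relative to the frame size; this reading
    is inferred from the example, not stated in the slides.
    """
    x = Fraction(corner["numX"], corner["denX"]) * frame_width
    y = Fraction(corner["numY"], corner["denY"]) * frame_height
    return (round(x), round(y))

# Two corners copied from the description above.
corners = {
    "NE": {"numX": 92, "denX": 352, "numY": 217, "denY": 288},
    "SW": {"numX": 136, "denX": 352, "numY": 267, "denY": 288},
}

# At the native 352x288 resolution the numerators are recovered as-is.
native = {name: to_pixels(c, 352, 288) for name, c in corners.items()}
# native == {"NE": (92, 217), "SW": (136, 267)}

# At twice the resolution every coordinate doubles.
doubled = {name: to_pixels(c, 704, 576) for name, c in corners.items()}
# doubled == {"NE": (184, 434), "SW": (272, 534)}
```

Storing numerator and denominator separately keeps the location exact and independent of the resolution of any particular copy of the media.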
18 .NET implementation
[Diagram. Labels: ds-17.dll; read/write; Memory; .NET instantiation; parsing (twice); ds-17.xml (Description Schemes); d-162.xml (Descriptions)]
19 Two kinds of applications
- Static Description Schemes
  - DSs are known in advance
  - The developer uses generated libraries
- Dynamic Description Schemes
  - DSs are created by the application
  - Use of the dynamic instantiation mechanism (reflection) of .NET
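The dynamic case can be illustrated outside .NET as well. The sketch below uses Python's built-in `type()` to create a descriptor class at run time from a name and a property table, which is loosely analogous to instantiating types through reflection. It is not the AVDL implementation: `build_descriptor_class` is a hypothetical helper, and `Tracking` with its `nbDetection` property is taken from the XML examples.

```python
def build_descriptor_class(name, properties):
    """Create a class at run time from a descriptor name and {property: type}."""

    def __init__(self, **values):
        # Accept only declared properties, with values of the declared type.
        for key, value in values.items():
            if key not in properties:
                raise AttributeError(f"{name} has no property {key!r}")
            if not isinstance(value, properties[key]):
                raise TypeError(f"{key!r} expects {properties[key].__name__}")
            setattr(self, key, value)

    # type(name, bases, namespace) builds a new class object dynamically.
    return type(name, (object,), {"__init__": __init__,
                                  "properties": dict(properties)})

# The application defines the descriptor while running...
Tracking = build_descriptor_class("Tracking", {"nbDetection": int})
# ...and can instantiate it immediately, without generated code.
instance = Tracking(nbDetection=1)
```

In the static case the equivalent class would instead be written or generated ahead of time and shipped as a library.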
20 Carrying out the scenario
- Definition of new descriptors and properties
  - associating behavior with the corresponding classes
  - performing reasoning on the descriptions with the formal definitions in OWL
- Definition of logical and temporal structures
  - the description is controlled and validated by a grammar
21 Conclusion and Future Work
- AVDL: a reduced yet extensible Audio-Visual Description Language
  - descriptors, properties, structures
  - XML syntax and DL semantics
  - .NET implementation and APIs
- About structure validation
  - which constructors should be used? with which semantics?
- Trade-off: expressivity vs. computability
  - OWL Full is undecidable
  - constraint satisfaction problems can be complex