MPEG-7 Audio Overview - PowerPoint PPT Presentation

About This Presentation
Title:

MPEG-7 Audio Overview

Slides: 41
Provided by: jiao6
Learn more at: http://www.music.mcgill.ca
Transcript and Presenter's Notes

Title: MPEG-7 Audio Overview


1
MPEG-7 Audio Overview
  • Beinan Li
  • MUMT 611 Week 2
  • 2005. 1. 20

2
Content
  • MPEG-7 overview
  • What is MPEG-7?
  • Why?
  • Objectives and scope
  • Main elements and organization.
  • MPEG-7 Audio
  • Low-level features
  • High-level tools

3
What is MPEG-7
  • "Multimedia Content Description Interface"
  • ISO/IEC standard by MPEG (Moving Picture Experts
    Group)
  • Providing meta-data for multimedia
  • MPEG-1, -2, -4 make content available; MPEG-7
    makes content accessible, retrievable,
    filterable, and manageable (via device /
    computer).
  • Multiple degrees of interpretation of the
    information's meaning
  • Support as broad a range of applications as
    possible.
  • A compatible (with existing tech) and extensible
    standard.

4
Why MPEG-7
  • The value of information often depends on how
    easily it can be found, retrieved, accessed,
    filtered and managed.
  • Past: poverty of digital multimedia sources
    -> simplicity of the access mechanisms
  • Now: growing amount of audiovisual information
    -> identifying and managing it efficiently is
    becoming more difficult.
  • e.g. record only news about sport.

5
Why MPEG-7
  • For future multimedia services, content
    representation and description may have to be
    addressed jointly.
  • Many services dealing with content representation
    will have to deal first with content description
  • Non-described content may be useless
  • Need for access only to the content description
  • New original services (e.g. optimizing personal
    time)
  • Adaptation to networks and terminal capabilities

6
Application domains (incomplete)
  • Broadcast media selection (e.g., radio channel,
    TV channel).
  • Digital libraries (e.g., film, video, audio and
    radio archives).
  • E-Commerce (e.g., personalized advertising).
  • Education (e.g., repositories of multimedia
    courses, multimedia search for support material).
  • Home Entertainment (e.g., management of personal
    multimedia collections, including manipulation of
    content, e.g. karaoke).
  • Journalism (e.g. searching speeches of a certain
    politician using his name, his voice or his
    face).
  • Multimedia directory services (e.g. yellow pages,
    G.I.S).
  • Surveillance and remote sensing.

7
MPEG-7 Objectives
  • Standardize content-based description for various
    types of audiovisual information
  • Independent from media support (encoding and
    storage)
  • Different granularity
  • Low-level features: shape, size, key, tempo
    changes
  • High-level semantic info: a scene with a barking
    brown dog on the left and with the sound of
    passing cars in the background.
  • Meaningful in the context of the application
  • Same material -> different types of features and
    combinations
  • e.g. timbre vs. loudness

8
MPEG-7 Objectives
  • Information about the content
  • The form, e.g. the coding format used
  • Conditions for accessing the material, e.g.
    intellectual property rights / price
  • Classification, e.g. parental rating
  • Links to other relevant materials
  • The context (e.g. Olympic Games 1996, final of
    200 meter hurdles, men)
  • Information present in the content
  • Combination of low-level and high-level
    descriptors

9
Scope of the Standard
  • [Figure: processing chain]
10
An example of architecture
  • Pull (Client queries -> Descriptions repository
    -> Matched Ds)
  • Push (Filter descriptions -> Programmed actions)
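
The two access modes can be sketched in a few lines. This is only an illustration: the `Description` class, its `features` dictionary, and the `pull` / `push` helper names are hypothetical and do not come from the MPEG-7 standard.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """Hypothetical stand-in for an MPEG-7 description of one item."""
    content_id: str
    features: dict = field(default_factory=dict)


def pull(repository, predicate):
    """Pull: a client query runs against the repository and returns the matches."""
    return [d for d in repository if predicate(d)]


def push(stream, filter_fn, action):
    """Push: incoming descriptions are filtered; each match triggers an action."""
    return [action(d) for d in stream if filter_fn(d)]


repository = [
    Description("clip-1", {"genre": "news", "topic": "sport"}),
    Description("clip-2", {"genre": "news", "topic": "weather"}),
    Description("clip-3", {"genre": "film", "topic": "drama"}),
]


def is_sport_news(d):
    return d.features.get("genre") == "news" and d.features.get("topic") == "sport"


matched = pull(repository, is_sport_news)
recorded = push(repository, is_sport_news, lambda d: f"record {d.content_id}")
```

The same predicate serves both modes here, which mirrors the "record only news about sport" example: only the descriptions are inspected, never the content itself.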

11
Workplan
12
Where are the descriptions from?
  • Preservation of existing descriptive data (e.g.
    scripts) through the production/delivery
  • Generated automatically by capture devices (e.g.
    time or GPS location in a camera)
  • Extracted automatically or semi-automatically
    (i.e. with some human assistance)
  • Manually produced (e.g. for legacy material such
    as existing film archives)

13
Main Elements of MPEG-7
  • Description Tools (textual / binary)
  • Descriptors (D): define the syntax and the
    semantics of each feature (metadata element)
  • Description Schemes (DS): relationships between
    components
  • Description Definition Language (DDL)
  • Defines the syntax of the MPEG-7 Description Tools
  • Creation, extension and modification of DSs
  • System tools
  • Storage and transmission, synchronization of
    descriptions with content, multiplexing of
    descriptions, etc.

14
Main Elements of MPEG-7
  • Relationship among elements introduced above.

15
Description Tools
  • Creation and production processes (director,
    title)
  • Usage (broadcast schedule)
  • Storage features.
  • Structural information (spatial-temporal
    components)
  • Segmentations
  • Low level features (sound timbres, melody
    description)
  • Conceptual information (objects and events,
    interactions)
  • Navigation and access (summaries, variations)
  • Collections of objects.
  • User-content interactions (user preferences,
    usage history)

16
Organization of Description Tools
17
Descriptions (further)
  • MPEG-7 approaches the description of content from
    several viewpoints.
  • A set of methods and tools for the different
    viewpoints of the description (not a monolithic
    system)
  • Interrelated and can be combined in many ways.
  • Associated with the content itself (searching,
    filtering)
  • Location (document vs. stream)
  • physically located with the material
  • somewhere else on the globe (maybe not)
  • Interoperability with other metadata standards
    (XML)

18
Use of Description Tools
  • The description tools are presented on the basis
    of the functionality they provide.
  • In practice, they are combined into meaningful
    sets of description units.
  • Furthermore, each application will have to select
    a sub-set of descriptors and DSs.
  • Library of tools!
  • DDL can be used to handle specific needs of the
    application (like scripting in many current
    applications).

19
Major Functionalities
  • MPEG-7 Systems
  • MPEG-7 Description Definition Language
  • MPEG-7 Visual
  • MPEG-7 Audio
  • MPEG-7 Multimedia Description Schemes (D.T.)
  • Reference Software: the eXperimentation Model
    (test)
  • MPEG-7 Conformance (syntax checking)
  • MPEG-7 Extraction and use of descriptions
    (technical report)

20
MPEG-7 Audio
  • Audio provides structures, building upon some
    basic structures from the MDS, for describing
    audio content.
  • Low-level Descriptors
  • audio features that cut across many applications
  • High-level Description Tools
  • more specific to a set of applications.

21
Low-level Features
  • MPEG-7 Audio Framework
  • Two low-level descriptor types (for sample and
    segment)
  • Scalar (e.g. power or fundamental frequency)
  • Vector (e.g. spectra)
  • Hierarchical, consistent interface
  • Any descriptor inheriting from these types can be
    instantiated, describing a segment with a single
    summary value or a series of sampled values, as
    the application requires.
  • Scalable Series (hierarchical re-sampling)
  • Progressively down-sample the data contained in a
    series (Application-oriented)
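
A minimal sketch of hierarchical re-sampling, assuming the series is reduced by averaging runs of samples. The function names and the choice of the mean as the summary are illustrative assumptions; other summaries (min, max, variance) could be kept per level in the same way.

```python
def downsample_mean(series, ratio):
    """Replace each run of `ratio` samples with its mean (the last run may be shorter)."""
    out = []
    for i in range(0, len(series), ratio):
        chunk = series[i:i + ratio]
        out.append(sum(chunk) / len(chunk))
    return out


def scalable_series(series, ratio=2):
    """Return successively coarser versions of `series`, ending at one summary value."""
    if ratio < 2:
        raise ValueError("ratio must be at least 2")
    levels = [list(series)]
    while len(levels[-1]) > 1:
        levels.append(downsample_mean(levels[-1], ratio))
    return levels


levels = scalable_series([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
```

An application can then pick the level that matches its needs: the full series for detailed analysis, or the last level as a single summary value for the segment.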

22
Low-level Features (types)
  • Basic
  • Basic Spectral
  • Signal Parameters
  • Timbral Temporal
  • Timbral Spectral
  • Spectral Basis
  • MPEG-7 Silence Descriptor

23
Low-level Features (graph)
24
Low-level Features (details)
  • Basic (temporally sampled scalar values for
    general use)
  • AudioWaveform Descriptor
  • waveform envelope (for display purposes).
  • AudioPower Descriptor
  • temporally-smoothed instantaneous power
  • (quick summary of a signal)
  • Applicable to all kinds of signals
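
Both basic descriptors can be approximated per frame as follows. This is a sketch under assumptions: the frame length is a free parameter chosen here for illustration, the envelope is taken as the per-frame minimum and maximum, and the power as the per-frame mean of squared samples.

```python
def frames(samples, frame_len):
    """Split `samples` into consecutive non-overlapping frames."""
    if frame_len < 1:
        raise ValueError("frame_len must be positive")
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


def waveform_envelope(samples, frame_len):
    """(min, max) of each frame: enough to draw a compact waveform display."""
    return [(min(f), max(f)) for f in frames(samples, frame_len)]


def audio_power(samples, frame_len):
    """Mean of the squared samples in each frame: smoothed instantaneous power."""
    return [sum(x * x for x in f) / len(f) for f in frames(samples, frame_len)]


signal = [1.0, -1.0, 2.0, -2.0]
envelope = waveform_envelope(signal, 2)
power = audio_power(signal, 2)
```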

25
Low-level Features (details)
  • Basic Spectral (single time-frequency analysis
    of signal)
  • AudioSpectrumEnvelope (Base class)
  • the short-term power spectrum
  • (display, synthesize, general-purpose search)
  • AudioSpectrumCentroid
  • dominated by high or low frequencies?
  • AudioSpectrumSpread
  • is the power spectrum centered near the spectral
    centroid, or spread out over the spectrum?
  • pure-tone and noise-like sounds
  • AudioSpectrumFlatness (the presence of tonal
    components)
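
The three summary measures can be sketched from one power spectrum. The assumptions here are loud ones: frequencies are mapped to octaves relative to 1 kHz before averaging, spread is the power-weighted RMS deviation from the centroid, and flatness is the ratio of geometric to arithmetic mean over all bins (rather than per frequency band). Function names are illustrative.

```python
import math


def log_freq_centroid_and_spread(freqs_hz, powers, ref_hz=1000.0):
    """Power-weighted mean and RMS deviation on a log2(f / ref_hz) axis."""
    total = sum(powers)
    octaves = [math.log2(f / ref_hz) for f in freqs_hz]
    centroid = sum(o * p for o, p in zip(octaves, powers)) / total
    variance = sum((o - centroid) ** 2 * p for o, p in zip(octaves, powers)) / total
    return centroid, math.sqrt(variance)


def flatness(powers):
    """Geometric mean over arithmetic mean; all powers must be positive."""
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


centroid, spread = log_freq_centroid_and_spread([500.0, 1000.0, 2000.0], [1.0, 1.0, 1.0])
noise_like = flatness([1.0, 1.0, 1.0])
tone_like = flatness([1e-6, 1.0, 1e-6])
```

A flat spectrum gives a flatness of 1, while a spectrum dominated by one bin gives a value close to 0, which is the sense in which flatness signals the presence of tonal components.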

26
Low-level Features (details)
  • Signal Parameters (periodic or quasi-periodic
    signals)
  • AudioFundamentalFrequency
  • confidence measure, replacing pitch-tracking
  • AudioHarmonicity
  • distinction between sounds with a harmonic /
    inharmonic / non-harmonic spectrum
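
One common way to obtain a fundamental-frequency estimate together with a confidence value is normalized autocorrelation. The sketch below uses that technique purely as an illustration; it is not claimed to be the extraction method of the standard, and the function name and search-range parameters are assumptions.

```python
import math


def estimate_f0(samples, sample_rate, f_min, f_max):
    """Return (f0_hz, confidence) from the strongest normalized autocorrelation lag."""
    energy = sum(x * x for x in samples)
    if energy == 0:
        return 0.0, 0.0  # silent input: no pitch, no confidence
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(samples) - 1, int(sample_rate / f_min))
    best_lag, best_r = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag, max(0.0, best_r)


# 0.1 s of a 100 Hz sine sampled at 8 kHz
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
f0, confidence = estimate_f0(tone, 8000, 50, 400)
```

For a clearly periodic signal the confidence is close to 1; for noise-like input it drops, which gives a rough harmonic versus non-harmonic cue in the spirit of the descriptors above.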

27
Low-level Features (details)
  • Timbral Temporal (temporal characteristics of
    segments of sounds, musical timbre)
  • LogAttackTime
  • TemporalCentroid
  • where in time the energy of a signal is focused.
  • Useful when attack times are identical
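
Both temporal descriptors can be sketched from an amplitude envelope. The threshold fractions that mark the start and end of the attack are assumptions chosen for illustration, as are the function names.

```python
import math


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time in seconds: where the energy is focused."""
    total = sum(envelope)
    return sum(n * e for n, e in enumerate(envelope)) / (sample_rate * total)


def log_attack_time(envelope, sample_rate, start_frac=0.02, stop_frac=1.0):
    """log10 of the time between crossing start_frac and stop_frac of the peak."""
    peak = max(envelope)
    start = next(i for i, e in enumerate(envelope) if e >= start_frac * peak)
    stop = next(i for i, e in enumerate(envelope) if e >= stop_frac * peak)
    return math.log10(max(stop - start, 1) / sample_rate)


short_note = [0.0, 0.5, 1.0, 0.5, 0.0]
sustained_note = [0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0]

lat_short = log_attack_time(short_note, 10)
lat_sustained = log_attack_time(sustained_note, 10)
tc_short = temporal_centroid(short_note, 10)
tc_sustained = temporal_centroid(sustained_note, 10)
```

The two envelopes share the same attack, so their log attack times agree, but the sustained note has a later temporal centroid. This is the case where the centroid is the useful discriminator.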

28
Low-level Features (details)
  • Timbral Spectral (spectral features in a
    linear-frequency space)
  • SpectralCentroid
  • power-weighted average of the frequency of the
    bins in the linear power spectrum.
  • distinguishing musical instrument timbres
  • 4 Ds for harmonic regularly-spaced components of
    signals
  • HarmonicSpectralCentroid
  • HarmonicSpectralDeviation
  • HarmonicSpectralSpread
  • HarmonicSpectralVariation
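
The linear-frequency centroid described above reduces to a single weighted average. A minimal sketch, with an illustrative function name and made-up spectra:

```python
def spectral_centroid(freqs_hz, powers):
    """Power-weighted average of the bin frequencies, in Hz."""
    total = sum(powers)
    return sum(f * p for f, p in zip(freqs_hz, powers)) / total


bins = [100.0, 200.0, 300.0]
balanced = spectral_centroid(bins, [1.0, 2.0, 1.0])
bright = spectral_centroid(bins, [1.0, 1.0, 6.0])
```

A spectrum with more power in the upper bins yields a higher centroid, which is why the value is useful for telling brighter timbres from duller ones.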

29
Low-level Features (details)
  • Spectral Basis (low-dimensional projections of a
    spectral space to aid compactness and
    recognition)
  • AudioSpectrumBasis
  • a series of (time-varying / statistically
    independent) basis functions derived from the
    singular value decomposition of a normalized
    power spectrum.
  • AudioSpectrumProjection
  • low-dimensional features of a spectrum after
    projection upon a reduced-rank basis.
  • independent subspaces of a spectrum correlate
    strongly with different sound sources.
  • Provide more salience using less space.
  • With Sound Classification and Indexing
    Description Tools.
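
The basis and projection idea can be sketched for a rank-1 basis. To stay within the standard library, this example swaps the full singular value decomposition for power iteration, which only approximates the first right singular vector; the normalization of the spectrum is omitted, and all names are illustrative.

```python
import math


def dominant_basis(spectra, iterations=100):
    """Approximate the first right singular vector of `spectra` (frames x bins)."""
    n_bins = len(spectra[0])
    v = [1.0 / math.sqrt(n_bins)] * n_bins
    for _ in range(iterations):
        # One power-iteration step on X^T X: compute X v, then X^T (X v).
        xv = [sum(x * b for x, b in zip(row, v)) for row in spectra]
        w = [sum(row[j] * p for row, p in zip(spectra, xv)) for j in range(n_bins)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    return v


def project(spectra, basis):
    """Reduce each frame to one coefficient: its projection onto the basis vector."""
    return [sum(x * b for x, b in zip(row, basis)) for row in spectra]


# Three frames that are all scaled copies of the same spectral shape [3, 4]
spectra = [[3.0, 4.0], [6.0, 8.0], [1.5, 2.0]]
basis = dominant_basis(spectra)
coefficients = project(spectra, basis)
```

Each two-bin frame is now described by a single coefficient, which illustrates the "more salience using less space" point on a toy scale.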

30
Low-level Features (details)
  • Silence segment (no significant sound)
  • aid further segmentation of the audio stream, or
    as a hint not to process a segment
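
A simple way to mark silence is to threshold the per-frame power. The threshold value and frame length below are arbitrary assumptions for illustration, and a trailing partial frame is ignored.

```python
def silence_segments(samples, frame_len, threshold):
    """Return (first_frame, end_frame_exclusive) runs whose mean power is below threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        if power < threshold:
            if start is None:
                start = k  # a silent run begins here
        elif start is not None:
            segments.append((start, k))
            start = None
    if start is not None:
        segments.append((start, n_frames))
    return segments


audio = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
silent = silence_segments(audio, frame_len=2, threshold=0.01)
```

The resulting frame ranges can be used as cut points for further segmentation, or skipped entirely by later processing.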

31
High-level audio Description Tools (Ds and DSs)
  • Exchange some generality for descriptive
    richness
  • a smaller set of audio features (as compared to
    visual features) that may canonically represent a
    sound without domain-specific knowledge.
  • Audio Signature (DS)
  • Musical Instrument Timbre
  • Melody
  • General Sound Recognition and Indexing
  • Spoken Content

32
High-level audio Description Tools (details)
  • Audio Signature Description Scheme
  • SpectralFlatness Ds
  • a unique content identifier for the purpose of
    robust automatic identification
  • e.g. audio fingerprinting

33
High-level audio Description Tools (details)
  • Musical Instrument Timbre Description Tools
  • HarmonicInstrumentTimbre Ds
  • LogAttackTime Descriptor
  • PercussiveInstrumentTimbre Ds
  • SpectralCentroid Descriptor

34
High-level audio Description Tools (details)
  • Melody Description Tools
  • efficient, robust, and expressive melodic
    similarity matching.
  • MelodyContour Description Scheme
  • terse, efficient melody contour / rhythm
  • MelodySequence Description Scheme
  • verbose, complete, expressive melody / rhythm.
  • Interval encoding
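
The difference between a terse contour and a fuller interval encoding can be sketched as follows. The five-level quantization (0 for no change, 1 for a step of one or two semitones, 2 for three or more, with the sign giving direction) is an assumption about how the contour is coded, and the input as MIDI note numbers is a choice made for this example.

```python
def intervals(midi_pitches):
    """Signed semitone difference between each pair of consecutive notes."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]


def contour(midi_pitches):
    """Quantize each interval to one of five levels: -2, -1, 0, 1, 2."""
    levels = []
    for d in intervals(midi_pitches):
        if d == 0:
            level = 0
        elif abs(d) <= 2:
            level = 1
        else:
            level = 2
        levels.append(level if d >= 0 else -level)
    return levels


melody = [60, 60, 62, 67, 65, 60]
transposed = [p + 3 for p in melody]

melody_intervals = intervals(melody)
melody_contour = contour(melody)
```

Because both encodings depend only on differences between notes, a transposed melody produces the same result, which is what makes them suitable for similarity matching.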

35
High-level audio Description Tools (details)
  • General Sound Recognition and Indexing
    Description Tools
  • SoundModel Description Scheme
  • SoundClassificationModel Description Scheme
  • a set of SoundModel DS -> multi-way classifier
  • SoundModelStatePath Descriptor
  • indices to states generated by a SoundModel of a
    segment
  • immediately applied to sound effects
  • automatically index and segment sound tracks.
  • Low -> mid -> high level analyses
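
The relationship between the three tools can be sketched with toy models. This assumes each sound model behaves like a small discrete hidden Markov model, that classification picks the model whose best state path scores highest, and that the state path is the list of state indices along that path. The class, the probabilities, and the labels are all hypothetical.

```python
import math


class SoundModel:
    """A tiny discrete HMM; all probabilities must be non-zero."""

    def __init__(self, label, start, trans, emit):
        self.label, self.start, self.trans, self.emit = label, start, trans, emit

    def best_path(self, observations):
        """Viterbi search: return (log-probability, list of state indices)."""
        n = len(self.start)
        score = [math.log(self.start[s] * self.emit[s][observations[0]]) for s in range(n)]
        back = []
        for obs in observations[1:]:
            prev, score, pointers = score, [], []
            for s in range(n):
                best = max(range(n), key=lambda p: prev[p] + math.log(self.trans[p][s]))
                pointers.append(best)
                score.append(prev[best] + math.log(self.trans[best][s])
                             + math.log(self.emit[s][obs]))
            back.append(pointers)
        state = max(range(n), key=lambda s: score[s])
        path = [state]
        for pointers in reversed(back):  # follow back-pointers to the first frame
            state = pointers[state]
            path.append(state)
        return max(score), path[::-1]


def classify(models, observations):
    """Multi-way classifier: the label and state path of the best-scoring model."""
    scored = [(m.best_path(observations), m.label) for m in models]
    (log_prob, path), label = max(scored, key=lambda item: item[0][0])
    return label, path


# Observation symbols: 0 = quiet frame, 1 = loud frame
bark = SoundModel("bark", [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.1, 0.9]])
hum = SoundModel("hum", [1.0], [[1.0]], [[0.8, 0.2]])

label, state_path = classify([bark, hum], [0, 1, 1, 0])
```

The returned state path doubles as a segmentation: runs of the same state index mark stretches of the sound that the winning model treats alike.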

36
High-level audio Description Tools (details)
  • Spoken Content Description Tools
  • detailed description of words spoken within an
    audio stream.
  • indexing into and retrieval of an audio stream
  • indexing of multimedia objects annotated with
    speech.
  • Recall of audio/video data by memorable spoken
    events.
  • a character or person spoke a particular word
  • Spoken Document Retrieval
  • separate spoken documents
  • Annotated Media Retrieval
  • photograph retrieved using a spoken annotation
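
Indexing into an audio stream by spoken word can be sketched with an inverted index. The flat list of (time, speaker, word) tuples is a hypothetical simplification of what a speech recognizer would supply, and the helper names are illustrative.

```python
from collections import defaultdict


def build_index(transcript):
    """Map each lower-cased word to the (speaker, time) pairs where it was spoken."""
    index = defaultdict(list)
    for time_s, speaker, word in transcript:
        index[word.lower()].append((speaker, time_s))
    return index


def find(index, word, speaker=None):
    """Times at which `word` was spoken, optionally restricted to one speaker."""
    hits = index.get(word.lower(), [])
    return [t for s, t in hits if speaker is None or s == speaker]


transcript = [
    (0.5, "anchor", "Goal"),
    (3.2, "reporter", "goal"),
    (7.0, "anchor", "weather"),
]
index = build_index(transcript)

all_goals = find(index, "goal")
reporter_goals = find(index, "goal", speaker="reporter")
```

The second query corresponds to recalling the moment when a particular person spoke a particular word.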

37
Development
  • Currently under development
  • MPEG-7 Audio COR.1 (currently at DCOR1)
  • MPEG-7 Amendment 1 (currently at FPDAM1)
  • New Audio Description Tools specified (MPEG-7
    version 2)
  • Spoken Content
  • Audio Signal Quality
  • Audio Tempo
  • Currently Proposed tools
  • Low Level Descriptor for Audio Intensity
  • Low Level Descriptor for Audio Spectrum Envelope
    Evolution
  • Generic mechanism for data representation based
    on modulation decomposition
  • MPEG-7 Audio-specific binary representation of
    descriptors

38
MPEG-7 version 1 Schedule
  • Call for Proposals: October 1998
  • Evaluation: February 1999
  • First version of Working Draft (WD): December 1999
  • Committee Draft (CD): October 2000
  • Final Committee Draft (FCD): February 2001
  • Final Draft International Standard (FDIS): July 2001
  • International Standard (IS): September 2001

39
MPEG-7 work plan
  • See Annex A of MPEG-7 Overview (version 9):
    http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm

40
Annotated Link Page / References
  • http://www.music.mcgill.ca/damonli/611/611_w2.htm