Title: A Framework of Multimedia Data Mining and Knowledge Management
1 A Framework of Multimedia Data Mining and
Knowledge Management
- PI: Cheng-Chang Lu
- Students: Qiyu Zhang, Mingming Lu
- Collaborating Group: Arvind Bansal
2 Data Mining and Knowledge Management
- Processing multimedia objects
- Defining and extracting features
- Feature dimension reduction
- Multimedia data retrieval
- Knowledge representation and management
3 Current Tasks
- Off-line data training
- Segment images in batch mode
- Find region of interest (ROI)
- Interface with feature extraction and analysis
- Feature domain processing
4 Current Tasks (cont.)
- User Interface
- Reading user-input images
- Segmentation
- Find ROI
- Feature extraction of ROI
- Compare with trained data in repository
- Return data (images) satisfying certain criteria
5 Data Training
[Diagram: the Image Feature Data Repository sends images for processing to the Image Domain (Segmentation, Finding ROI); results pass through the Interface to the Feature Domain (Feature Extraction, Dimension Reduction), which stores the feature data back in the repository.]
6 Image Domain Processing
- Segmentation: Color VQ and texture-based image
segmentation
- Find ROI
- ROI occupies a large area
- ROI is located near the image center
- ROI contains homogeneous texture
7 Color-Texture Segmentation: Applications
- Identify Regions of Interest (ROI) in a scene
- Image classification
- Image annotation
- Object based image and video coding
8 Color-Texture Segmentation: Current Limitations
- Many existing techniques work well on homogeneous
color regions, while natural scenes are rich in
color and texture.
- Many texture segmentation algorithms require the
estimation of texture model parameters, which is
a difficult problem and often requires a good
homogeneous region for robust estimation.
9 Color-Texture Segmentation: Advantages of Color VQ
and Texture-Based Segmentation
- Does not attempt to estimate a specific model for
a texture region.
- Tests for the homogeneity of a given
color-texture pattern, which is computationally
more feasible than estimation of model parameters.
10 Color-Texture Segmentation: Two-Step Process
- Color Quantization
- Performed in the color space without
consideration of the spatial distribution of colors.
- Label each pixel with a quantized color to form a
class-map.
- Spatial Segmentation
- Performed on the class-map
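The labeling step of color quantization can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a palette of quantized colors is already available (the quantizer itself is not reproduced here), and the names `build_class_map` and `palette` are hypothetical.

```python
def build_class_map(image, palette):
    """Label each pixel with the index of its nearest palette color.

    image:   2-D list of (r, g, b) tuples
    palette: list of (r, g, b) quantized colors (assumed to be given)
    Spatial position is ignored; only color distance matters.
    """
    def nearest(pixel):
        return min(
            range(len(palette)),
            key=lambda k: sum((p - q) ** 2 for p, q in zip(pixel, palette[k])),
        )

    return [[nearest(pixel) for pixel in row] for row in image]


image = [
    [(250, 10, 10), (240, 20, 5)],
    [(5, 5, 245), (10, 0, 250)],
]
palette = [(255, 0, 0), (0, 0, 255)]  # hypothetical two-color palette
class_map = build_class_map(image, palette)
```

The resulting `class_map` holds one integer label per pixel, which is the only input the spatial segmentation step needs.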
11 Color-Texture Segmentation: Color Quantization
- Use Peer Group Filtering
- As a result, coarse quantization can be obtained
while preserving the color information in the
original images.
- Usually 10-20 colors are needed in the images of
natural scenes.
12 Color-Texture Segmentation: Criteria for Good
Segmentation
13 Color-Texture Segmentation: A Criterion for Good
Segmentation
- When the color classes are more separated from
each other, J becomes larger.
- If all color classes are uniformly distributed
over the entire image, J tends to be small.
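The behavior of J described above can be checked numerically. The sketch below assumes J is the criterion used in the JSEG method of Deng and Manjunath, J = (S_T - S_W) / S_W, where S_T is the total scatter of pixel positions about their mean and S_W is the sum of the scatters of each color class about its own mean. The function name `j_value` is hypothetical, and the case S_W = 0 is not handled.

```python
def j_value(class_map):
    """J = (S_T - S_W) / S_W computed over the pixel positions of a class-map."""
    points = {}  # class label -> list of (x, y) positions
    for y, row in enumerate(class_map):
        for x, label in enumerate(row):
            points.setdefault(label, []).append((x, y))

    def scatter(pts):
        # Sum of squared distances from the points to their own mean position.
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        return sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in pts)

    all_points = [p for pts in points.values() for p in pts]
    s_total = scatter(all_points)
    s_within = sum(scatter(pts) for pts in points.values())
    return (s_total - s_within) / s_within


separated = [[0, 0, 1, 1]] * 4             # two classes in separate halves
mixed = [[0, 1, 0, 1], [1, 0, 1, 0]] * 2   # checkerboard: classes spread evenly
j_separated = j_value(separated)
j_mixed = j_value(mixed)
```

With these two 4 x 4 maps, the separated layout gives a clearly positive J while the checkerboard gives J = 0, in line with the two bullets above.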
14 Color-Texture Segmentation: A Criterion for Good
Segmentation
- Now let us recalculate J over each segmented
region instead of the entire class-map and define
the average of J over the regions.
- A segmentation that minimizes this average J is
considered a good segmentation.
15 Color-Texture Segmentation: Spatial Segmentation
- Seed Determination
- Seed Growing
- Region Merge
16 Color-Texture Segmentation: Spatial Segmentation
17 ROI Determination
- ROI-finding mechanism
- A pixel closer to the center contributes more
weight to the region it belongs to.
- A region with more pixels tends to get a higher weight.
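One way to realize this weighting is sketched below. The slides do not give the weight function, so the linear falloff from 1 at the image center to 0 at the corners is an assumption, and `select_roi` and `region_map` are hypothetical names.

```python
import math


def select_roi(region_map):
    """Return (label, scores) where label has the highest center-weighted score."""
    height, width = len(region_map), len(region_map[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    max_dist = math.hypot(cx, cy) or 1.0  # avoid dividing by zero for a 1x1 map
    scores = {}
    for y, row in enumerate(region_map):
        for x, label in enumerate(row):
            # Assumed weight: 1 at the center, falling linearly to 0 at the corners.
            weight = 1.0 - math.hypot(x - cx, y - cy) / max_dist
            scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get), scores


region_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
roi_label, scores = select_roi(region_map)
```

In this example the border region 0 has more pixels (16 against 9), but the central region 1 still wins because its pixels carry more weight. Summing the weights per region also rewards larger regions, so both rules are covered by one score.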
18 Results of Image Domain Processing
- Results of Color Quantization
- Results of Finding ROI
19 Interface with Feature Domain
- Find the rectangle circumscribing the ROI
- Store its coordinate information in a
temporary file for the feature domain's use.
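The hand-off between the two domains might look like the sketch below. The file format is not specified in the slides; JSON and the key names are assumptions, as are the names `bounding_rectangle` and `roi_path`.

```python
import json
import tempfile


def bounding_rectangle(region_map, roi_label):
    """Smallest axis-aligned rectangle (left, top, right, bottom) containing the ROI."""
    xs = [x for row in region_map for x, label in enumerate(row) if label == roi_label]
    ys = [y for y, row in enumerate(region_map) if roi_label in row]
    return min(xs), min(ys), max(xs), max(ys)


region_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
rect = bounding_rectangle(region_map, roi_label=1)

# Write the coordinates to a temporary file for the feature-domain stage to read.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    json.dump(dict(zip(("left", "top", "right", "bottom"), rect)), handle)
    roi_path = handle.name
```

The feature-domain code would then open `roi_path`, read the four coordinates, and crop the image before extracting features.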
20 Feature Domain (Overview)
- Two Stages
- Feature Extraction
- Dimension Reduction (DR)
[Diagram: the Feature Domain receives data from the Image Domain through the Interface, performs Feature Extraction and Dimension Reduction, and stores the feature data back in the Image Feature Data Repository.]
21 Implementations
- Acquire ROI information from the image domain
- Extract features based on Gabor filters and a color
histogram in HSV space
- Integrate the two feature spaces
- Reduce the high feature dimensions to a very low
number
22 Implementations (cont.)
- Calculate the similarity measurement between the
query object and the objects in the image
repository
- Search for similar images in the repository based
on the similarity index
- Output the corresponding retrieved images
- Knowledge extraction
23 Feature Extraction Algorithm
- Gabor Filter Feature
- One of the most important wavelets, with
multi-scale and multi-resolution properties
- Mainly reflects texture information
- Color histogram in HSV space
- Provides color features
24 Gabor Filter Concept
- A complete but non-orthogonal wavelet basis set
- A significant aspect: a localized frequency
description combined with spatial information
25 Gabor (cont.)
- A two dimensional Gabor function g(x, y) and its
Fourier transform G(u, v) can be written as
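Assuming the slides follow the Gabor wavelet formulation of Manjunath and Ma, which the wording here matches, the pair would take the form below. The symbols W (modulation frequency) and sigma_x, sigma_y (Gaussian widths) are that formulation's notation and are not taken from the slides.

```latex
g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}
  \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)
  + 2\pi j W x\right]

G(u, v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2}
  + \frac{v^2}{\sigma_v^2}\right]\right\},
\qquad \sigma_u = \frac{1}{2\pi\sigma_x}, \quad \sigma_v = \frac{1}{2\pi\sigma_y}
```

In this form g is a Gaussian envelope modulated by a complex sinusoid along x, and G is a Gaussian centered at frequency W on the u axis.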
26 Gabor (cont.)
- Let g(x, y) be the mother Gabor wavelet, then
this self-similar filter dictionary can be
obtained by appropriate dilations and rotations
of g(x, y) through the generating function
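A filter dictionary of this kind can be sketched as follows. The code assumes the commonly used generating function g_mn(x, y) = a^(-m) g(x', y'), with x' = a^(-m)(x cos t + y sin t), y' = a^(-m)(-x sin t + y cos t) and t = n*pi/K, and a mother wavelet that is a Gaussian envelope times a complex sinusoid along x. All parameter values (a, K, sigma_x, sigma_y, W, the kernel size, and the numbers of scales and orientations) are illustrative choices, not values from the slides.

```python
import cmath
import math


def gabor(x, y, sigma_x, sigma_y, W):
    """Mother Gabor function: Gaussian envelope times a complex sinusoid along x."""
    envelope = math.exp(-0.5 * (x * x / sigma_x ** 2 + y * y / sigma_y ** 2))
    carrier = cmath.exp(2j * math.pi * W * x)
    return envelope * carrier / (2 * math.pi * sigma_x * sigma_y)


def gabor_mn(x, y, m, n, a, K, sigma_x, sigma_y, W):
    """Copy of the mother wavelet dilated by a**m and rotated by n*pi/K."""
    theta = n * math.pi / K
    xr = a ** (-m) * (x * math.cos(theta) + y * math.sin(theta))
    yr = a ** (-m) * (-x * math.sin(theta) + y * math.cos(theta))
    return a ** (-m) * gabor(xr, yr, sigma_x, sigma_y, W)


def kernel(m, n, size=9, a=2.0, K=6, sigma_x=2.0, sigma_y=2.0, W=0.25):
    """Sample g_mn on a size x size grid centered at the origin."""
    half = size // 2
    return [
        [gabor_mn(x, y, m, n, a, K, sigma_x, sigma_y, W)
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]


# Hypothetical bank: 5 scales (m) by 6 orientations (n).
bank = {(m, n): kernel(m, n) for m in range(5) for n in range(6)}
```

Texture features are then typically taken as statistics (for example the mean and standard deviation of the magnitude) of the image filtered with each kernel in `bank`.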
27 Color Histogram in HSV Space
- HSV color space includes
- Hue (H)
- Saturation (S)
- Value (V or Lightness)
- Only hue and saturation are considered, since
the lightness of pictures is very sensitive
to the surrounding conditions.
28 HSV Space Figure
29 HSV Space Bands
- Design bands in the HSV space
- 8 hue bands
- 4 saturation bands
- Total: 32 sub-spaces
- Compute the color histogram feature in each sub-space
to form 32 feature dimensions
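The 32-dimensional histogram can be sketched with the standard-library `colorsys` module. The slides do not say where the band boundaries lie, so equal-width bands are an assumption, and `hs_histogram` is a hypothetical name.

```python
import colorsys


def hs_histogram(pixels, hue_bands=8, sat_bands=4):
    """Normalized hue/saturation histogram; the value channel is ignored."""
    hist = [0.0] * (hue_bands * sat_bands)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that h == 1.0 or s == 1.0 falls into the last band.
        h_bin = min(int(h * hue_bands), hue_bands - 1)
        s_bin = min(int(s * sat_bands), sat_bands - 1)
        hist[h_bin * sat_bands + s_bin] += 1.0
    total = sum(hist) or 1.0
    return [count / total for count in hist]


pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (128, 128, 128)]
feature = hs_histogram(pixels)
```

Each entry of `feature` is the fraction of pixels falling in one hue/saturation sub-space, so the vector sums to 1 regardless of the image size.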
30 Feature Integration
- Normalize both the Gabor filter and HSV color
histogram features
- Set a weight factor to balance the two feature
spaces. Usually the Gabor filter features are given
the larger weight.
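A minimal sketch of this integration step follows. The slides do not state the normalization or the weight, so unit-length scaling and the value 0.7 are assumptions, and `integrate` is a hypothetical name.

```python
def normalize(vector):
    """Scale a feature vector to unit Euclidean length (assumed normalization)."""
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector] if norm else list(vector)


def integrate(gabor_features, color_features, gabor_weight=0.7):
    """Concatenate the two normalized feature vectors with complementary weights."""
    g = normalize(gabor_features)
    c = normalize(color_features)
    return [gabor_weight * v for v in g] + [(1.0 - gabor_weight) * v for v in c]


combined = integrate([3.0, 4.0], [1.0, 0.0])
```

Normalizing first keeps one feature space from dominating the distance simply because its raw values are larger; the weight then sets the balance explicitly.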
31 DR Algorithm
- Disadvantages of a high-dimensional space
- The computational complexity rises sharply
- Database indexing becomes difficult
- Principal Component Analysis (PCA)
- PCA seeks to reduce the dimension of the data by
finding a few orthogonal linear combinations of the
original variables with the largest variance
(principal components, PCs)
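PCA can be illustrated without external libraries as below. This sketch uses power iteration with deflation on the covariance matrix as a simple stand-in for a library eigen-solver. It is not the project's implementation, the name `pca` is hypothetical, and the fixed starting vector is assumed not to be orthogonal to the leading component.

```python
def pca(data, n_components, iterations=200):
    """Project the rows of `data` onto their top principal components."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _ in range(n_components):
        vec = [1.0 / (k + 1) for k in range(d)]  # deterministic non-zero start
        for _step in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = sum(v * v for v in nxt) ** 0.5
            if norm == 0.0:
                break
            vec = [v / norm for v in nxt]
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                         for i in range(d) for j in range(d))
        components.append(vec)
        # Deflate: remove the direction just found from the covariance matrix.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]

    projected = [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
                 for r in centered]
    return projected, components


# Toy data lying on the line y = 2x in three dimensions.
data = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
projected, components = pca(data, n_components=1)
```

For the toy data the single retained component points along (1, 2, 0), so one coordinate per sample preserves all of the variance.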
32 DR Implementation
- Original feature dimensions
- Gabor filter features: 6 x 5 x 2 = 60
- HSV color histogram features: 4 x 8 = 32
- Total dimensions: 92
- Feature dimensions after DR
- 10-15 dimensions
33 Simulation Results in the Feature Domain
- We randomly selected 11 query pictures as the test
samples in this report.
- For each query, at most 14 pictures
are retrieved.
- The minimum squared error is used as the
similarity measurement.
- The values in the tables below are the numbers of
positive pictures among the 14 retrieved
pictures.
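The retrieval step can be sketched as a ranking by squared error. This reads the "minimum squared error" measure as the sum of squared differences between feature vectors, which is an assumption; the names `retrieve` and `repository` and the toy feature values are hypothetical.

```python
def squared_error(a, b):
    """Sum of squared differences between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(query, repository, max_results=14):
    """Return up to `max_results` image names, smallest squared error first."""
    ranked = sorted(repository,
                    key=lambda name: squared_error(query, repository[name]))
    return ranked[:max_results]


repository = {
    "img_a": [0.1, 0.9],
    "img_b": [0.8, 0.2],
    "img_c": [0.15, 0.85],
}
results = retrieve([0.1, 0.88], repository, max_results=2)
```

Counting how many of the returned names are relevant to the query gives the per-query score reported in the tables.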
34 Performance of Different Feature Extraction
Techniques
- The integration of the Gabor filter and HSV color
histogram features gives better performance.
- See the pictures for details.
35 Performance with and without DR Applied
- On average, performance degrades slightly after DR
is applied, compared with the results
before DR.
- See the pictures for details.
36 More Simulations
- Performance with different weights
- Performance with different numbers of dimensions
retained after DR
37 Final Integration Results
- Simulation results when both the image domain and
the feature domain are used
- See the pictures for details.
38 Integration
- UAV media capture and analysis
- WWW based media analysis
- Vehicle based media capture and analysis
39 Future Research: Extended to Video Objects
- Object based video coding
- Non-object based video coding
- Video indexing
- Knowledge extraction and management
40 Future Research: Data Fusion
- Multimodality medical imaging
- CT: Structural information
- PET: Functional information
- Fusion
- Knowledge management