1
Effect of Linearization on Normalized Compression
Distance
  • Jonathan Mortensen
  • Julia Wu
  • DePaul University
  • July 2009

2
Introduction
  • Kolmogorov Complexity is an emerging similarity
    metric
  • Transformation Distance
  • Universal Similarity Measure
  • Does not require feature identification and
    selection
  • How can it be applied to images?
  • CBIR, Classification
  • Investigate its effectiveness
  • Discovered some fundamentals have been overlooked
    thus far

3
Outline
  • Background
  • Kolmogorov Complexity and Complearn
  • Research Topics
  • Spatial Transformations
  • Intensity Transformations
  • Image Groupings
  • Conclusion
  • Future Work

4
Background
  • Li (2004): successful clustering of phylogeny
    trees, music, text files
  • 1D to 2D data?
  • Tran (2007): NCD is not a good predictor of visual
    indistinguishability
  • Only one photograph used, one type of
    linearization (row-by-row)
  • Gondra (2008): CBIR using NCD produced
    statistically significant measures against H0 of
    random retrieval and other similarity measures
  • Test set of hundreds of images, inconsistent
    methods of compression and concatenation,
    linearization unclear

5
Kolmogorov Complexity
  • K(x): the length of the shortest program that
    produces x
  • K(x|y): the length of the shortest binary program
    that outputs x given input y
  • $E(x,y) = \max\{K(x|y), K(y|x)\}$
  • Normalized Information Distance
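
For reference, the normalized information distance of Li et al. (2004) divides the distance E(x,y) by the larger of the two individual complexities. Written with the symbols used on this slide, where K(x|y) is the conditional complexity of x given y:

```latex
\mathrm{NID}(x, y) = \frac{\max\{K(x \mid y),\; K(y \mid x)\}}{\max\{K(x),\; K(y)\}}
```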

6
Kolmogorov Complexity
  • Universal, in that it captures all other
    semi-computable normalized distance measures
  • Therefore itself only semi-computable
  • Lossless compression shortens strings, so the
    compressed length C(x) is used as an
    approximation of K(x)
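
Replacing K with the compressed length C gives the normalized compression distance, NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. The sketch below is a minimal illustration that assumes zlib as the compressor; the slides do not say which compressor was used, and the helper names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"abc" * 200
    b = bytes(range(256)) * 3
    print("NCD(a, a) =", ncd(a, a))
    print("NCD(a, b) =", ncd(a, b))
```

Because real compressors are imperfect, NCD(x, x) is usually small but not exactly zero, and values slightly above 1 can occur.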

"The human brain is incapable of creating
anything which is really complex."
 (A. N. Kolmogorov, Statistical Science, 6, p. 314, 1990)
7
CompLearn
  • Open-source package that approximates K-Complexity
    through compression
  • Developed by Rudi Cilibrasi, Anna Lissa Cruz,
    Steven de Rooij, and Maarten Keijzer
  • Uses standard Linux compression tools to build the
    comparison map

8
(No Transcript)
9
Images from Google Similar Images
10
Initial Questions
  • Linearization Methods and Alternatives
  • How to Preserve a 2D signal
  • How do linearizations affect NCD under spatial
    transformations and intensity shifts?
  • Do additional feature images lower NCD?
  • CBIR: Can K-Complexity be used with feature
    vectors or image semantics?

11
Spatial Transformations
  • Applied 4 types of linearization to 800 images
    (original and 7 transformations)
  • Found that each linearization type produced
    distinctly different NCDs
  • Certain linearizations result in lower NCDs for
    certain transformations

12
Linearization Methods
Row Major
Column Major
Hilbert-Peano SPC: images transformed to 128x128
SCPO (Self-Describing Context-Based Pixel Ordering): images transformed to 35% of original size
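
As a rough illustration of the first three linearizations, the sketch below flattens a square image (a list of rows) in row-major, column-major, and Hilbert-curve order. The function names are hypothetical, the Hilbert traversal uses the standard index-to-coordinate construction and requires a power-of-two side length, and SCPO is not covered.

```python
def row_major(img):
    """Read pixels left to right, top to bottom."""
    return [p for row in img for p in row]


def column_major(img):
    """Read pixels top to bottom, one column at a time."""
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]


def hilbert_xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert(img):
    """Read pixels along a Hilbert curve (square image, side a power of two)."""
    n = len(img)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in img):
        raise ValueError("image must be square with a power-of-two side")
    return [img[y][x] for x, y in (hilbert_xy(n, d) for d in range(n * n))]
```

Unlike the row and column scans, consecutive samples on the Hilbert curve are always neighbouring pixels, which is why it is a candidate for preserving 2D locality in a 1D string.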
13
Spatial Transformations
Original Image
Down Shift
Left Shift
180° rotation
90° rotation
270° rotation
Reflection Y Axis
Reflection X Axis
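
The seven transformations can be sketched on a list-of-rows image as follows. This is only an illustration: the slides do not say whether the shifts wrap around or which direction counts as a 90° rotation, so circular shifts and clockwise rotation are assumed here, and all function names are hypothetical.

```python
def shift_down(img, k=1):
    """Circularly shift rows down by k (wrap-around is an assumption)."""
    k %= len(img)
    return img[-k:] + img[:-k] if k else [row[:] for row in img]


def shift_left(img, k=1):
    """Circularly shift columns left by k (wrap-around is an assumption)."""
    k %= len(img[0])
    return [row[k:] + row[:k] for row in img]


def rotate_90(img):
    """Rotate 90 degrees clockwise (direction is an assumption)."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_180(img):
    return [row[::-1] for row in img[::-1]]


def rotate_270(img):
    """Rotate 270 degrees clockwise, i.e. 90 degrees counterclockwise."""
    return [list(col) for col in zip(*img)][::-1]


def reflect_y_axis(img):
    """Mirror left to right (reflection about the vertical axis)."""
    return [row[::-1] for row in img]


def reflect_x_axis(img):
    """Mirror top to bottom (reflection about the horizontal axis)."""
    return [row[:] for row in img[::-1]]
```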
14
Intensity Transformations
  • Additive Constant
  • Three types of noise
  • Gaussian
  • Speckle
  • Salt and Pepper
  • Least Significant Bit (LSB) Steganography
  • Contrast Windowing

15
Additive Constant
Image 937.jpg with constants of 32 and 64, respectively
  • P: intensity constant
  • P = 4, 8, 12, ..., 100
  • 16 bit
  • 255 (+4) -> 259
  • Truncation
  • 255 (+4) -> 255
  • Wrap
  • 255 (+4) -> 4
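
The three overflow policies can be sketched per pixel as below (function names are hypothetical). The slide's wrap example, 255 + 4 giving 4, corresponds to wrapping by 255; the usual 8-bit modulus of 256 would give 3, so the modulus is left as a parameter rather than assumed.

```python
def add_16bit(value, p):
    """Store the sum in a 16-bit range, so 8-bit sums never overflow."""
    return min(value + p, 65535)


def add_truncate(value, p):
    """Clamp the sum at the 8-bit maximum."""
    return min(value + p, 255)


def add_wrap(value, p, modulus=256):
    """Wrap the sum around; modulus 256 is ordinary 8-bit arithmetic."""
    return (value + p) % modulus


if __name__ == "__main__":
    print(add_16bit(255, 4), add_truncate(255, 4), add_wrap(255, 4))
```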

16
Additive Constant
17
Various Noise
 
  • Gaussian (Statistical)
  • Speckle (Multiplicative)
  • Salt and Pepper (Drop-off)

0.32 and 0.64 Variance/Noise Density Respectively
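
A minimal sketch of the three noise models, assuming pixel intensities scaled to [0, 1] and results clipped back into that range. The function names and the exact noise distributions are assumptions; the tool used to generate the noise is not stated on the slides.

```python
import random


def _clip(v):
    return min(1.0, max(0.0, v))


def gaussian_noise(pixels, variance, rng):
    """Additive zero-mean Gaussian noise."""
    sigma = variance ** 0.5
    return [_clip(p + rng.gauss(0.0, sigma)) for p in pixels]


def speckle_noise(pixels, variance, rng):
    """Multiplicative noise: each pixel is perturbed in proportion to its value."""
    sigma = variance ** 0.5
    return [_clip(p + p * rng.gauss(0.0, sigma)) for p in pixels]


def salt_and_pepper(pixels, density, rng):
    """Replace a fraction `density` of pixels with pure black or pure white."""
    out = []
    for p in pixels:
        r = rng.random()
        if r < density / 2:
            out.append(0.0)      # pepper
        elif r < density:
            out.append(1.0)      # salt
        else:
            out.append(p)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    row = [0.2, 0.4, 0.6, 0.8]
    print(gaussian_noise(row, 0.32, rng))
    print(speckle_noise(row, 0.32, rng))
    print(salt_and_pepper(row, 0.32, rng))
```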
18
Noise (cont.)
  • Gaussian and Speckle Noise do not compress well
  • Gaussian and Salt and Pepper experience some
    posterior decay

19
Least Significant Bit Steganography
  • Hide4PGP
  • Scrambles message
  • Changes pixel bit to most similar color with
    opposite bit assignment
  • Spreads secret data over entire file
  • True grayscale: changes two bits per pixel

Image with No Text
Image hiding Gettysburg Address
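
For intuition only, the sketch below shows plain sequential LSB embedding on 8-bit pixel values. It is not Hide4PGP's algorithm, which scrambles the message and spreads it over the whole file; the function names are hypothetical.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Write the message bits into the least significant bit of each pixel."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the LSB, then set it to the bit
    return bytes(out)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes of hidden message back from the pixel LSBs."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[8 * b + i] & 1)
        out.append(value)
    return bytes(out)
```

Each pixel changes by at most one intensity level, which is why the stego image is visually indistinguishable from the cover.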
20
LSB Steganography
21
Hamming Distance
22
(No Transcript)
23
Contrast Windowing
  • Computed Tomography image enhancement that
    increases contrast in certain structures
  • Brief Medical Exploration

24
Contrast Windowing
  • Bone Window (300 HU, width 1500 HU)
  • Lung Window (-200 HU, width 2000 HU)
  • Patient 5 Original Image top left
  • Soft Tissue Window (50 HU, width 350 HU)
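
A window maps a band of Hounsfield units onto the displayable gray range and clips everything outside it. The sketch below assumes the first number in each pair above is the window center (level) and that output is 8-bit; `apply_window` is a hypothetical helper name.

```python
def apply_window(hu_values, center, width):
    """Map HU values to 0..255, clipping outside [center - width/2, center + width/2]."""
    lo = center - width / 2
    hi = center + width / 2
    out = []
    for hu in hu_values:
        clipped = min(max(hu, lo), hi)
        out.append(round(255 * (clipped - lo) / (hi - lo)))
    return out


if __name__ == "__main__":
    samples = [-1000, -200, 50, 300, 2000]
    print("bone:", apply_window(samples, 300, 1500))
    print("lung:", apply_window(samples, -200, 2000))
    print("soft tissue:", apply_window(samples, 50, 350))
```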

25
P1
        original  bone      lung      tiss
p1      0         1.028241  1.049258  1.02429
bone    1.028241  0         1.036157  1.011354
lung    1.049258  1.036157  0         1.039524
tiss    1.02429   1.011519  1.039524  0

P3
        original  bone      lung      tiss
p3      0         1.02097   1.043942  1.025635
bone    1.020539  0         1.037073  1.014142
lung    1.044137  1.037073  0         1.037244
tiss    1.026016  1.014354  1.037244  0

P5
        original  bone      lung      tiss
p5      0         1.020947  1.047888  1.023039
bone    1.020947  0         1.038712  1.019146
lung    1.047888  1.038712  0         1.036131
tiss    1.023039  1.019924  1.036131  0
26
Cross-DICOM Comparison
        p1tiss  p1lung  p1bone  p1      p3tiss  p3lung  p3bone  p3      p5tiss  p5lung  p5bone  p5
p1tiss  0.0000  1.0395  1.0115  1.0243  0.9739  1.0390  1.0157  1.0223  0.9813  1.0325  1.0066  1.0234
p1lung  1.0395  0.0000  1.0362  1.0493  1.0362  0.9772  1.0361  1.0485  1.0410  0.9853  1.0412  1.0477
p1bone  1.0114  1.0362  0.0000  1.0282  1.0158  1.0378  0.9642  1.0278  1.0197  1.0365  0.9761  1.0247
p1      1.0243  1.0493  1.0282  0.0000  1.0255  1.0460  1.0258  0.9811  1.0258  1.0455  1.0240  1.0025
p3tiss  0.9741  1.0362  1.0168  1.0255  0.0000  1.0372  1.0144  1.0260  0.9810  1.0328  1.0140  1.0222
p3lung  1.0390  0.9772  1.0378  1.0460  1.0372  0.0000  1.0371  1.0441  1.0434  0.9874  1.0418  1.0513
p3bone  1.0137  1.0361  0.9650  1.0258  1.0141  1.0371  0.0000  1.0205  1.0175  1.0360  0.9728  1.0220
p3      1.0238  1.0485  1.0271  0.9811  1.0256  1.0439  1.0210  0.0000  1.0278  1.0414  1.0218  0.9997
p5tiss  0.9932  1.0410  1.0180  1.0258  0.9821  1.0434  1.0172  1.0278  0.0000  1.0361  1.0199  1.0230
p5lung  1.0325  0.9853  1.0365  1.0455  1.0328  0.9874  1.0360  1.0414  1.0361  0.0000  1.0387  1.0479
p5bone  1.0062  1.0412  0.9757  1.0240  1.0142  1.0418  0.9724  1.0217  1.0191  1.0387  0.0000  1.0209
p5      1.0234  1.0477  1.0247  1.0025  1.0222  1.0513  1.0220  0.9997  1.0230  1.0479  1.0209  0.0000
27
Conclusion "How Many" vs "How Little"
  • NCD for Ordinal Comparisons
  • Numerical Redundancy

[Figure: transformations placed between "Selective" and "Entire Picture" changes and between "Larger NCD" and "Smaller NCD": Gaussian and Speckle Noise, Salt and Pepper Noise, Steganography, Additive Constants, Contrast Windowing]
28
Feature Image Comparison and Grouping
  • Feature Image: pixel-based values derived from
    the original image
  • 3 Main Types of Linearization
  • Avg NCD inter > Avg NCD intra
  • The greater inter - intra, the better NCD finds
    groupings
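
The inter/intra comparison can be sketched from a pairwise NCD matrix and a list of group labels as follows. This is a minimal illustration with a hypothetical function name and toy data, not the authors' evaluation code.

```python
def mean_intra_inter(dist, labels):
    """Return (mean NCD within groups, mean NCD between groups)."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if labels[i] == labels[j]:
                intra.append(dist[i][j])
            else:
                inter.append(dist[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    dist = [
        [0.0, 0.2, 0.9, 0.9],
        [0.2, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.4],
        [0.9, 0.9, 0.4, 0.0],
    ]
    intra, inter = mean_intra_inter(dist, labels)
    print("inter - intra =", inter - intra)
```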

29
Feature Image Linearization
  • Image-At-Once: row-order, one feature image at a
    time
  • Row Concatenation: appends all images, then
    performs row-order linearization
  • Pixel Order: selects the value from the same pixel
    of each feature image in row-order fashion
  • Gray Row-Major: grayscales an image and follows
    row-order on intensities
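
The first three orderings can be sketched for a list of equally sized feature images (each a list of rows). Function names are hypothetical, and Row Concatenation is assumed to append the images side by side before the row scan; Gray Row-Major is omitted because the grayscale conversion is not specified.

```python
def image_at_once(features):
    """Row-order scan of each feature image in turn."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features):
    """Append images side by side (assumed), then scan the wide image row by row."""
    n_rows = len(features[0])
    return [p for r in range(n_rows) for img in features for p in img[r]]


def pixel_order(features):
    """For each pixel position in row order, take that pixel from every feature image."""
    n_rows = len(features[0])
    n_cols = len(features[0][0])
    return [img[r][c] for r in range(n_rows) for c in range(n_cols) for img in features]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(image_at_once([a, b]))
    print(row_concatenation([a, b]))
    print(pixel_order([a, b]))
```

All three produce strings of the same length, so differences in NCD come from ordering alone rather than from file size.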

30
(No Transcript)
31
Data Set and Methods
  • Corel Image Database with 10 predefined groupings
  • Linearized by 5 methods
  • NCDs were found within a group and then to the
    left and to the right

32
(No Transcript)
33
Results
  • Nearly every linearization produced statistically
    different NCDs
  • Intra Group was always less than Inter Group
  • Gray provided the greatest Inter - Intra difference
  • Initially attributed this to file size
  • Triple-concatenated Gray, which equalizes file
    size, produced an even greater difference

34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
Conclusion
  • NCD is a good model for predefined human
    groupings and linearization has little impact on
    this
  • Gray-Triple Row-Major may be the best form of
    linearization
  • Direction of concatenation does not matter
  • Defined a methodology for any number of feature
    images

38
Conclusion
  • Compressor Errors
  • Numerical Redundancy
  • Ordinal Variables vs Nominal Variables
  • Ex.: 195 195 195 195 <> 198 198 198 198
  • NCD = 0.100000
  • 199 199 199 199 <> 202 202 202 202
  • NCD = 0.128205
  • NCD needs refinement
  • 2D image as a 1D string?
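
A general-purpose compressor treats digits as symbols rather than magnitudes, so numerically close intensities give it little to exploit. The sketch below probes this with zlib as an assumed stand-in compressor; the compressor behind the NCD values above is not stated, so the printed numbers will differ from them.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """NCD with zlib as the (assumed) compressor."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


a = b"195 195 195 195"
near = b"198 198 198 198"  # 3 intensity levels away from a
far = b"255 255 255 255"   # 60 intensity levels away from a

d_self = ncd(a, a)
d_near = ncd(a, near)
d_far = ncd(a, far)
print("self:", d_self, "near:", d_near, "far:", d_far)
```

Comparing the "near" and "far" distances indicates whether the compressor is sensitive to numerical closeness at all, which is the ordinal-versus-nominal concern raised above.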

39
Future Work
  • Image Scaling and Normalization
  • Additional Feature Images
  • New Forms of Image concatenation
  • Investigate Compressors (Numeric?)

40
References
  • A. Itani and D. Manohar. Self-describing
    context-based pixel ordering. Lecture Notes in
    Computer Science, pages 124-134, 2002.
  • M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi.
    The similarity metric. IEEE Transactions on
    Information Theory, 50(12), 2004.
  • R. Dafner, D. Cohen-Or, and Y. Matias.
    Context-based space filling curves. In Computer
    Graphics Forum, volume 19, pages 209-218.
    Blackwell Publishers Ltd, 2000.
  • R. Cilibrasi, A. L. Cruz, S. de Rooij, and
    M. Keijzer. CompLearn home.
    http://www.complearn.org/.
  • R. Cilibrasi, P. Vitányi, and R. de Wolf.
    Algorithmic clustering of music. arXiv preprint
    cs.SD/0303025, 2003.
  • N. Tran. The normalized compression distance and
    image distinguishability. Proceedings of SPIE,
    6492:64921D, 2007.
  • I. Gondra and D. R. Heisterkamp. Content-based
    image retrieval with the normalized information
    distance. Computer Vision and Image
    Understanding, 111(2):219-228, 2008.

41
Questions