Indexing and Binning Large Databases - PowerPoint PPT Presentation

Slides: 42
Provided by: Ami970
1
Indexing and Binning Large Databases
2
Abstract
  • Problems with large databases
  • Biometric identification (1:N matching) does not
    scale well with size
  • No established way to organize high dimensional
    biometric data
  • Proposed Solution
  • Reduce search space before 1:N matching
  • Divide the database using Clustering Techniques
  • Contributions
  • We analyze the effect of implementing a binning
    scheme on search performance and accuracy
  • We present binning and pruning approaches using
    multiple biometrics
  • Using hand geometry and signature, we have
    achieved a search space reduction of 95% without
    any false rejects (FRR 0%)

3
Background
  • Only biometric identification (1:N matching) can
    prevent duplicate enrollments and double dipping
  • Biometrics are being deployed for immigration and
    national ID applications
  • US-VISIT program
  • Voter ID and national ID programs
  • Potential database sizes can run into millions
  • Current research is focused only on accuracy
  • Apart from accuracy, scalability, speed and
    efficiency also become important at this scale

4
Challenges
  • Textual/Numeric Data
  • Data is scalar (1-D)
  • Textual/numeric data can be linearly ordered and
    therefore easily indexed
  • Biometric Data
  • Biometric templates are high dimensional
  • No linear ordering or sorting method exists for
    biometric data

5
Search space analysis
  • As number of stored templates increases, template
    density (TD) also increases

6
Identification problem
  • Number of false positives grows geometrically
    with the size of the database
  • Let FAR and FRR be the False Acceptance Rate
    (probability) and False Reject Rate (probability)
    for 1:1 matching
  • For 1:N matching,
  • The total number of False Accepts is given by
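A commonly used form of these expressions is sketched below. It assumes that each of the N comparisons behaves as an independent 1:1 trial and that the matcher's FRR does not change with database size. The symbols FAR_N and FRR_N are notation introduced here, not taken from the slide.

```latex
\begin{align}
FAR_N &= 1 - (1 - FAR)^N \approx N \cdot FAR \qquad (N \cdot FAR \ll 1) \\
FRR_N &= FRR \\
E[\text{false accepts per query}] &= N \cdot FAR
\end{align}
```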

7
State of the Art
Biometric              | State of the art                                                  | Research problems
Fingerprint            | 0.15% FRR at 1% FAR (FVC 2002)                                    | Fingerprint enhancement; partial fingerprint matching
Face Recognition       | 10% FRR at 1% FAR (FRVT 2002)                                     | Improving accuracy; face alignment variation; handling lighting variations
Hand Geometry          | 4% FRR at 0% FAR (Transportation Security Administration tests)   | Developing reliable models; identification problem
Signature Verification | 1.5% (IBM Israel)                                                 | Developing offline verification systems; handling skillful forgeries
Voice Verification     | <1% FRR (current research)                                        | Handling channel normalization; user habituation; text and language independence
8
State of the Art
Biometric              | State of the art                             | Research problems
Fingerprint            | 0.15% FRR at 1% FAR (FVC 2002)               | Fingerprint enhancement; partial fingerprint matching
Face Recognition       | 10% FRR at 1% FAR (FRVT 2002)                | Improving accuracy; face alignment variation; handling lighting variations
Hand Geometry          | 2.6% FRR at 0.02% FAR (CUBS, SUNY-Buffalo)   | Developing reliable models; identification problem
Signature Verification | 1.5% (IBM Israel)                            | Developing offline verification systems; handling skillful forgeries
Voice Verification     | <1% FRR (current research)                   | Handling channel normalization; user habituation; text and language independence
9
Identification problem (contd.)
  • Even if FAR = 0.0001%, false accepts = 1 in 10
    for N = 100,000 (lower bound) in the identification
    case.
  • No single biometric is capable of meeting this
    security requirement individually
  • Ways to reduce identification errors
  • Reduce FAR
  • FAR is limited by feature representation and the
    recognition algorithm
  • Cannot be indefinitely reduced
  • Reduce N
  • Classify or index the biometric database (e.g.,
    Henry classification system for fingerprints)
  • Index the records based on meta-data
  • Can we do better?
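The "1 in 10" figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the FAR is 0.0001% (a probability of 1e-6) and that the N comparisons are independent; the function names are illustrative and not from the slides.

```python
def expected_false_accepts(far: float, n: int) -> float:
    """Expected number of false accepts when one query is compared
    against n enrolled templates, each with 1:1 false accept rate far."""
    return far * n


def prob_any_false_accept(far: float, n: int) -> float:
    """Probability of at least one false accept in a 1:N search,
    assuming the n comparisons are independent (an assumption)."""
    return 1.0 - (1.0 - far) ** n


far = 1e-6      # 0.0001% expressed as a probability
n = 100_000     # database size used on the slide

# Roughly 0.1, i.e. about one false accept per ten queries.
print(expected_false_accepts(far, n))
# Slightly below 0.1, since N * FAR is an approximation of 1 - (1 - FAR)^N.
print(prob_any_false_accept(far, n))
```

The two values are close because N × FAR is small; they diverge as N × FAR approaches 1.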

10
Fingerprint Features
Fingerprints can be classified based on the ridge
flow pattern
Fingerprints can be distinguished based on the
ridge characteristics
65% of fingerprints belong to the Loop class
11
Henry Classification of Fingerprints
  • Ratha et al., 1996 used Henry classification on a
    database of 1800 templates, tested on 100
    templates
  • Search space 25%, FRR 10%
  • Jain, Pankanti, 2000: a similar experiment on a
    database of 700 templates achieved FRR 7.4%
    (focus on classification only)
  • State-of-the-art fingerprint classification system
    (Cappelli, Maio, Maltoni, Nanni, 2003) has FRR 4.8%
    for the 5-class problem and 3.7% for the 4-class problem
  • Although natural classes exist, classification
    is still non-trivial
  • Natural classes do not exist for biometrics like
    Hand Geometry
  • Need more sophistication for partitioning database

12
Analysis of search space reduction
  • We can improve performance by reducing the search
    space during identification
  • Let P_SYS = penetration rate, between 0.0 and 1.0
  • Penetration rate is the average fraction of the
    database searched during identification
  • Effective size = N × P_SYS
  • For 1:N matching,
  • The total number of False Accepts is given by
  • State-of-the-art fingerprint systems have P_SYS = 0.5
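The effect of the penetration rate on the effective database size can be illustrated numerically. The sketch below assumes independent comparisons and uses hypothetical values for FAR and N; the function names are made up for illustration.

```python
def effective_size(n: int, p_sys: float) -> float:
    """Average number of templates actually compared per query."""
    return n * p_sys


def prob_false_accept(far: float, n: int, p_sys: float = 1.0) -> float:
    """Probability of at least one false accept when only a fraction
    p_sys of the database is searched (independence is assumed)."""
    return 1.0 - (1.0 - far) ** effective_size(n, p_sys)


far = 1e-6       # hypothetical 1:1 FAR (0.0001%)
n = 100_000      # hypothetical database size

# Smaller penetration rates shrink both the work per query and the
# chance of a false accept.
for p_sys in (1.0, 0.5, 0.2, 0.05):
    print(p_sys, effective_size(n, p_sys), prob_false_accept(far, n, p_sys))
```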

13
Effect of binning on accuracy
  • For P_SYS < 0.2, the false accepts are almost
    constant
  • Query response time improves by a factor of P_SYS
  • Capabilities of a low FAR system
  • Will allow us to screen immigrants at airports
  • Will make biometric systems more user-friendly by
    eliminating the need to remember PINs and IDs

14
Binning
  • Binning can be used to achieve a smaller P_SYS
  • Partition the feature space
  • Each bin is represented by a cluster center C_K
  • Records are compared with only N_B cluster centers
  • Bin representatives are computed offline during
    training
  • Challenges
  • How to handle clustering of large databases?
  • How to handle additions and deletions?
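The offline training step can be sketched with plain k-means, the clustering method this deck names for finding bins. This is a toy illustration on made-up 2-D points, not the authors' implementation; the function names, the fixed iteration count, and the random seeds are all assumptions.

```python
import math
import random


def nearest(centers, x):
    """Index of the cluster center closest to x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(centers[k], x))


def kmeans(templates, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(templates, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for t in templates:
            groups[nearest(centers, t)].append(t)
        # Recompute each center as the mean of its members; keep the old
        # center if a cluster ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def build_bins(templates, centers):
    """Map each bin index to the list of template indices assigned to it."""
    bins = {k: [] for k in range(len(centers))}
    for idx, t in enumerate(templates):
        bins[nearest(centers, t)].append(idx)
    return bins


# Made-up 2-D "templates" drawn around three hypothetical centers.
rng = random.Random(1)
templates = [
    (rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(50)
]
centers = kmeans(templates, k=3)
bins = build_bins(templates, centers)
print({k: len(v) for k, v in bins.items()})
```

At query time only the cluster centers are compared first, and the full matcher runs on the templates in the selected bin or bins.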

15
Tradeoff
  • Although binning reduces search space, it
    introduces another source of identification error:
    bin miss
  • If the bin in which the user record exists is not
    searched, then FRR is generated no matter how
    good the matcher is
  • If P(B) is the probability of getting the correct
    bin
  • Binning increases the probability of False
    Rejects
  • Not tolerable in security and screening
    applications
  • Solution
  • Use K-means clustering to find K bins
  • Check the N_s nearest bins for the record, such that
    P(B) ≈ 1
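The tradeoff above can be made concrete with a small sketch. It assumes a false reject occurs either when the correct bin is missed or when the correct bin is searched and the matcher rejects the genuine user; that combination rule, the function names, and the toy data are assumptions made for illustration.

```python
import math


def nearest_bins(centers, query, ns):
    """Indices of the ns cluster centers closest to the query."""
    order = sorted(range(len(centers)),
                   key=lambda k: math.dist(centers[k], query))
    return order[:ns]


def penetration(bins, searched, total):
    """Fraction of the database contained in the searched bins."""
    return sum(len(bins[k]) for k in searched) / total


def system_frr(p_b, frr):
    """False reject probability with binning: bin miss, or correct bin
    found (probability p_b) followed by a matcher reject (frr)."""
    return (1.0 - p_b) + p_b * frr


# Hypothetical cluster centers and bin contents (template ids).
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
bins = {0: [0, 1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}

searched = nearest_bins(centers, query=(0.2, 2.4), ns=2)
print(searched, penetration(bins, searched, total=10))

# Searching more bins pushes P(B) toward 1 and removes the binning
# penalty, at the cost of a higher penetration rate.
print(system_frr(p_b=0.9, frr=0.0), system_frr(p_b=1.0, frr=0.0))
```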

16
Formal definition of Binning
  • In general a biometric template may be
    represented as a vector
  • Vectors are grouped into N distinct clusters,
    each represented by a code book vector
  • The code book vectors divide the feature space
    into N distinct Voronoi regions
  • Every template is closest to the mean (codebook
    vector) of the region it belongs to
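One way to write the Voronoi region of each code book vector is given below. It assumes templates are d-dimensional real vectors compared with Euclidean distance; the symbols x, d, c_i and V_i are notation introduced here.

```latex
V_i = \{\, x \in \mathbb{R}^d : \lVert x - c_i \rVert \le \lVert x - c_j \rVert \ \text{for all } j \ne i \,\}, \qquad i = 1, \dots, N
```

A template x is assigned to bin i exactly when x lies in V_i, which is the nearest-center rule used during binning.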

17
Search Space Partition: Voronoi Regions
18
Hand Geometry Template
  • Feature extraction stages
  • Image capture
  • Binarization
  • Contour Extraction
  • Noise Removal
  • 35 Features are extracted
  • 25 directly measured features
  • 10 ratio and perimeter features

19
Signature Template
11 Features Extracted
  • Regression Constants b0,b1
  • Compactness
  • Signature Length
  • Major Stroke Length
  • Major Stroke Angle
  • Connected Components
  • Hole Count
  • Hole Area
  • Stroke Count
  • Signing Time

20
Results
  • 35-dimensional hand geometry data: best
    penetration 35.8% for 6 bins, FRR 0%
  • 11-dimensional signature data: best penetration
    35.57% for 6 bins, FRR 0%
  • Dataset: 250 training set & 250 testing set
21
Multi-modal approach
  • Resulting bins have very high template densities
  • A different biometric modality should be used to
    classify templates within a bin
  • Multimodal biometrics
  • Using multiple biometrics improves accuracy
  • It is difficult to forge multiple biometrics
  • Composite templates reduce template density
  • Statistical independence ensures that individual
    binning results are diverse
  • The search space (intersection of bins) is
    reduced due to low commonality between the
    individual binning results
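The intersection step can be sketched as a set operation on the candidate lists returned by each modality. The user ids and the selection rules below are made up purely to give two weakly related candidate sets; they do not come from the experiments in this deck.

```python
def combined_candidates(hand_ids, signature_ids):
    """Candidates that survive binning in both modalities."""
    return set(hand_ids) & set(signature_ids)


def penetration(candidates, database_size):
    """Fraction of the database left to search."""
    return len(candidates) / database_size


database_size = 100
# Hypothetical candidate lists from two independent binning schemes.
hand_ids = [i for i in range(database_size) if i % 3 == 0]
signature_ids = [i for i in range(database_size) if i % 4 == 0]

candidates = combined_candidates(hand_ids, signature_ids)
print(penetration(hand_ids, database_size))       # 0.34
print(penetration(signature_ids, database_size))  # 0.25
print(penetration(candidates, database_size))     # 0.09
```

When the two binning results are close to independent, the combined penetration is roughly the product of the individual rates, which is why the intersection is much smaller than either bin alone.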

22
Multi-Modal Approach
23
Multi-Modal Approach
Search space: 5% of original database size, FRR 0%
24
Results of Combination
Best combined penetration rate of 5%
Dataset: 250 training set & 250 testing set
25
Binning v/s Indexing
  • Applications can have frequent insertions of new
    templates
  • Binning works well when database is static
  • Insertions will require re-partitioning the
    entire database
  • Indexing can be used in both static and dynamic
    database scenarios
  • Trees are commonly used for indexing
  • Extend the concept of indexing relational
    databases to indexing biometric databases
  • Much more challenging: no concept of a primary key
    exists in biometric templates!

26
Pyramid Technique: spatial hashing
  • Determine the pyramid (i) within which the
    template lies
  • Determine the height (h) of the template from the apex
  • The 1-D value = pyramid number (i) + height (h)
  • Indexing is done using B+ trees
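The mapping above can be sketched as follows, assuming features are normalized to the unit hypercube [0, 1]^d so that the apex of every pyramid is the center point (0.5, ..., 0.5). A sorted list with binary search stands in for the B+ tree here; the class and function names are hypothetical.

```python
import bisect


def pyramid_value(v):
    """Map a point v in [0, 1]^d to its 1-D pyramid value i + h."""
    d = len(v)
    # Dimension with the largest deviation from the center point.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie below the center along their dimension,
    # pyramids d..2d-1 lie above it.
    i = j_max if v[j_max] < 0.5 else j_max + d
    h = abs(0.5 - v[j_max])   # height measured from the apex
    return i + h


class PyramidIndex:
    """Sorted list of pyramid values (stand-in for a B+ tree)."""

    def __init__(self):
        self._keys = []   # sorted pyramid values
        self._ids = []    # template ids, parallel to _keys

    def insert(self, template_id, v):
        pv = pyramid_value(v)
        pos = bisect.bisect_left(self._keys, pv)
        self._keys.insert(pos, pv)
        self._ids.insert(pos, template_id)

    def range_lookup(self, lo, hi):
        """Template ids whose pyramid value lies in [lo, hi]."""
        a = bisect.bisect_left(self._keys, lo)
        b = bisect.bisect_right(self._keys, hi)
        return self._ids[a:b]


index = PyramidIndex()
index.insert("a", (0.1, 0.6))    # pyramid 0
index.insert("b", (0.6, 0.9))    # pyramid 3
index.insert("c", (0.55, 0.8))   # pyramid 3
print(index.range_lookup(3.0, 3.5))   # everything in pyramid 3
```

Because the height never exceeds 0.5, pyramid i occupies the key interval [i, i + 0.5], so inserts are ordinary 1-D insertions and no re-partitioning is needed. A full query would translate a d-dimensional search box into one such interval per pyramid and then filter the returned candidates; that step is omitted here.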

27
Various Indexing Techniques
  • Grid Files
  • KD Tree
  • R Tree and its variants
  • X Tree
  • Pyramid Technique
28
Comparative Study
Method       | Scalable | Order Invariant | Dynamic | Range Query | No Overlap
Grid File    | Y        | Y               | N       | N           | Y
R Tree       | Y        | N               | N       | N           | N
R* Tree      | Y        | N               | N       | N           | N
R+ Tree      | Y        | N               | N       | N           | Y
KD Tree      | Y        | N               | N       | N           | Y
X Tree       | Y        | N               | Y       | Y           | Y
Pyramid Tech | Y        | Y               | Y       | Y           | Y
29
Results of Indexing
  • 35-dimensional hand geometry data
  • Best penetration 27%
  • FRR 0%
  • Dataset: 450 training set & 450 testing set
  • Parallel combination with signature will further
    reduce the search space

30
Multimodal Biometrics
31
2D Biometric: Signature and Fingerprint Fusion
[Figure: impostor score pairs and true match score pairs]
32
Optimal Fusion Algorithm: Signature Fused With Fingerprint
[Figure: optimal fusion ROC over False Accept Rate (FAR). Labels: unrealizable performance area, suboptimal performance area, fusion algorithm, true match score pairs, impostor score pairs]
The ROC is the boundary between unrealizable and
suboptimal performance.
33
Optimal Fusion Algorithm Decision Regions: 99.04%
Accuracy at Specified FAR of 1 in a Million
[Figure: decision regions. Axes: 1st biometric score, 2nd biometric score]
The irregular decision region boundary is due to finite
sample size; the more data, the smoother the
boundaries.
34
RSS Fusion Algorithm for Fingerprint and Signature
Provides a Suboptimal Performance ROC
[Figure: RSS fusion ROC compared with the optimal ROC over False Accept Rate (FAR). Labels: RSS fusion, true match score pairs, impostor score pairs]
35
RSS Fusion Decision Regions: 96.11% Accuracy at
Specified FAR of 1 in a Million
[Figure: decision regions. Axes: 1st biometric score, 2nd biometric score]
36
OR Fusion Algorithm for Fingerprint and Signature
Provides a Suboptimal Performance ROC
[Figure: OR fusion ROC compared with the optimal ROC over False Accept Rate (FAR). Labels: OR fusion, true match score pairs, impostor score pairs]
37
OR Fusion Decision Regions: 96.85% Accuracy at
Specified FAR of 1 in a Million
[Figure: decision regions. Axes: 1st biometric score, 2nd biometric score]
38
AND Fusion Algorithm for Fingerprint and Signature
Provides a Suboptimal Performance ROC
[Figure: AND fusion ROC compared with the optimal ROC over False Accept Rate (FAR). Labels: AND fusion, true match score pairs, impostor score pairs]
39
AND Fusion Decision Regions: 62.91% Accuracy at
Specified FAR of 1 in a Million
[Figure: decision regions. Axes: 1st biometric score, 2nd biometric score]
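The three fixed fusion rules named on the preceding slides can be sketched as simple decision functions on a pair of match scores. This assumes that higher scores mean better matches, that RSS stands for root sum of squares, and that the thresholds are chosen elsewhere to meet the target FAR; the function names and threshold values are hypothetical.

```python
import math


def and_fusion(s1, s2, t1, t2):
    """Accept only if both biometrics clear their own thresholds."""
    return s1 >= t1 and s2 >= t2


def or_fusion(s1, s2, t1, t2):
    """Accept if either biometric clears its threshold."""
    return s1 >= t1 or s2 >= t2


def rss_fusion(s1, s2, t):
    """Accept if the root sum of squares of the two scores clears t."""
    return math.hypot(s1, s2) >= t


# One strong and one weak score: the rules disagree.
s1, s2 = 0.9, 0.2
print(and_fusion(s1, s2, 0.5, 0.5))   # False
print(or_fusion(s1, s2, 0.5, 0.5))    # True
print(rss_fusion(s1, s2, 0.8))        # True
```

Each rule carves the 2-D score plane into a fixed shape (a quadrant, an L-shaped region, or the outside of a circle), which is one way to read the differently shaped decision regions on these slides.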
40
ROC
41
  • Thank You