

  • Number of slides: 41

Indexing and Binning Large Databases

Abstract
§ Problems with large databases
  § Biometric identification (1:N matching) does not scale well with database size
  § There is no established way to organize high-dimensional biometric data
§ Proposed solution
  § Reduce the search space before 1:N matching
  § Divide the database using clustering techniques
§ Contributions
  § We analyze the effect of a binning scheme on search performance and accuracy
  § We present binning and pruning approaches using multiple biometrics
  § Using hand geometry and signature, we achieved a search space reduction of 95% with an FRR of 0%

Background
§ Only biometric identification (1:N matching) can prevent duplicate enrollments ("double dipping")
§ Biometrics are being deployed for immigration and national ID applications
  § US-VISIT program
  § Voter ID and national ID programs [3]
§ Database sizes can run into the millions
§ Current research focuses only on accuracy
§ At this scale, scalability, speed and efficiency become important in addition to accuracy

Challenges
Textual/numeric data
§ Data is scalar (1-D)
§ Textual/numeric data can be linearly ordered and is therefore easily indexed
Biometric data
§ Biometric templates are high-dimensional
§ No linear ordering or sorting method exists for biometric data

Search space analysis
§ As the number of stored templates increases, the template density (TD) also increases

Identification problem
§ The number of false positives grows geometrically with the size of the database
§ Let FAR and FRR be the False Acceptance Rate (probability) and False Reject Rate (probability) for 1:1 matching
§ For 1:N matching: [equation not recovered]
§ The total number of false accepts is given by: [equation not recovered]
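The equations on this slide did not survive extraction. As a hedged sketch only, a commonly used model for 1:N matching, assuming N independent comparisons that each have the same 1:1 FAR, is shown below. The symbols FAR_N, FRR_N and E[FA] are introduced here for illustration and are not taken from the slides. The linear approximation agrees with the figure quoted on the continuation slide (FAR = 0.0001% and N = 100000 give about 0.1 false accepts per query, i.e. 1 in 10).

```latex
\begin{align}
  \mathrm{FAR}_N &= 1 - (1 - \mathrm{FAR})^{N} \;\approx\; N \cdot \mathrm{FAR}
      \qquad (N \cdot \mathrm{FAR} \ll 1), \\
  \mathrm{FRR}_N &\approx \mathrm{FRR}, \\
  E[\mathrm{FA}] &= N \cdot \mathrm{FAR}.
\end{align}
```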

State of the Art
Biometric | State of the art | Research problems
Fingerprint | 0.15% FRR at 1% FAR (FVC 2002) | Fingerprint enhancement; partial fingerprint matching
Face Recognition | 10% FRR at 1% FAR (FRVT 2002) | Improving accuracy; face alignment variation; handling lighting variations
Hand Geometry | 4% FRR at 0% FAR (Transportation Security Administration tests) | Developing reliable models; identification problem
Signature Verification | 1.5% (IBM Israel) | Developing offline verification systems; handling skillful forgeries
Voice Verification | <1% FRR (current research) | Handling channel normalization; user habituation; text and language independence

State of the Art
Biometric | State of the art | Research problems
Fingerprint | 0.15% FRR at 1% FAR (FVC 2002) | Fingerprint enhancement; partial fingerprint matching
Face Recognition | 10% FRR at 1% FAR (FRVT 2002) | Improving accuracy; face alignment variation; handling lighting variations
Hand Geometry | 2.6% FRR at 0.02% FAR (CUBS, SUNY-Buffalo) | Developing reliable models; identification problem
Signature Verification | 1.5% (IBM Israel) | Developing offline verification systems; handling skillful forgeries
Voice Verification | <1% FRR (current research) | Handling channel normalization; user habituation; text and language independence

Identification problem (contd.)
§ Even if FAR = 0.0001%, false accepts = 1 in 10 for N = 100000 (lower bound) in the identification case
§ No single biometric can meet this security requirement on its own
§ Ways to reduce identification errors:
  § Reduce FAR
    § FAR is limited by the feature representation and the recognition algorithm
    § It cannot be reduced indefinitely
  § Reduce N
    § Classify or index the biometric database (e.g. the Henry classification system for fingerprints)
    § Index the records based on metadata
§ Can we do better?

Fingerprint Features
§ Fingerprints can be classified based on the ridge flow pattern
§ Fingerprints can be distinguished based on the ridge characteristics
§ 65% of fingerprints belong to the Loop class

Henry Classification of Fingerprints
§ [Ratha et al., 1996] used Henry classification on a database of 1800 templates, tested on 100 templates
  § Search space: 25%; FRR: 10%
§ [Jain, Pankanti, 2000] ran a similar experiment on a database of 700 templates and achieved FRR: 7.4% (focus on classification only)
§ The state-of-the-art fingerprint classification system [Cappelli, Maio, Maltoni, Nanni, 2003] has an FRR of 4.8% for the 5-class problem and 3.7% for the 4-class problem
§ Even though natural classes exist, classification is still non-trivial
§ Natural classes do not exist for biometrics such as hand geometry
§ Partitioning the database needs a more sophisticated approach

Analysis of search space reduction
We can improve performance by reducing the search space during identification
§ Let P_SYS be the penetration rate [between 0.0 and 1.0]
§ The penetration rate is the average fraction of the database searched during identification
§ Effective size = N × P_SYS
§ For 1:N matching: [equation not recovered]
§ The total number of false accepts is given by: [equation not recovered]
§ State-of-the-art fingerprint systems have P_SYS = 0.5
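To make the effect of the penetration rate concrete, here is a minimal sketch. It assumes every comparison is an independent trial with the same 1:1 FAR, so the expected number of false accepts per query is FAR times the effective size. The function names are hypothetical, and the FAR and N values are the ones quoted on the earlier identification slide.

```python
def effective_size(n_templates, penetration):
    """Average number of templates actually compared: N * P_SYS."""
    if not 0.0 <= penetration <= 1.0:
        raise ValueError("penetration rate must lie in [0, 1]")
    return n_templates * penetration


def expected_false_accepts(far, n_templates, penetration=1.0):
    """Expected false accepts per query, assuming independent comparisons."""
    return far * effective_size(n_templates, penetration)


def prob_any_false_accept(far, n_templates, penetration=1.0):
    """Probability of at least one false accept in a single query."""
    return 1.0 - (1.0 - far) ** effective_size(n_templates, penetration)


if __name__ == "__main__":
    far = 1e-6       # 0.0001 % expressed as a probability
    n = 100_000      # database size from the slides
    for p_sys in (1.0, 0.5, 0.05):
        print(p_sys,
              expected_false_accepts(far, n, p_sys),
              prob_any_false_accept(far, n, p_sys))
```

Under this model, halving P_SYS halves the expected false accepts, which is why reducing the search space helps accuracy as well as speed.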

Effect of binning on accuracy
§ For P_SYS < 0.2, the false accepts are almost constant
§ Query response time improves by a factor of [value not recovered]
§ Capabilities of a low-FAR system
  § Will allow us to screen immigrants at airports
  § Will make biometric systems more user-friendly by eliminating the need to remember PINs and IDs
[Figure: plot with P_SYS axis]

Binning
§ Binning can be used to achieve a smaller P_SYS
§ Partition the feature space
§ Each bin is represented by a cluster center C_K
§ Records are compared with only N_B cluster centers
§ Bin representatives are computed offline during training
§ Challenges
  § How to handle clustering of large databases?
  § How to handle additions and deletions?

Tradeoff
§ Although binning reduces the search space, it introduces another source of identification error: the bin miss
§ If the bin that holds the user's record is not searched, a false reject results no matter how good the matcher is
§ If P(B) is the probability of getting the correct bin: [equation not recovered]
§ Binning increases the probability of false rejects
§ This is not tolerable in security and screening applications
§ Solution:
  § Use K-means clustering to find K bins
  § Check the N_s nearest bins for the record, such that P(B) = 1
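The solution above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: templates are plain lists of floats, distance is Euclidean, K-means is the basic Lloyd iteration with random initialization, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centers(vec, centers, n_s=1):
    """Indices of the n_s cluster centers closest to vec, nearest first."""
    order = sorted(range(len(centers)), key=lambda k: sq_dist(vec, centers[k]))
    return order[:n_s]


def kmeans(templates, k, iters=50, seed=0):
    """Basic Lloyd's K-means; returns (centers, bin index of each template)."""
    rng = random.Random(seed)
    centers = [list(t) for t in rng.sample(templates, k)]
    for _ in range(iters):
        labels = [nearest_centers(t, centers)[0] for t in templates]
        new_centers = []
        for j in range(k):
            members = [t for t, lab in zip(templates, labels) if lab == j]
            # An empty bin keeps its previous center.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest_centers(t, centers)[0] for t in templates]
    return centers, labels


def candidates(query, centers, labels, n_s):
    """Indices of enrolled templates lying in the n_s bins nearest the query."""
    bins = set(nearest_centers(query, centers, n_s))
    return [i for i, lab in enumerate(labels) if lab in bins]


if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]  # synthetic data, not real biometrics
    enrolled = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
                for cx, cy in blobs for _ in range(50)]
    centers, labels = kmeans(enrolled, k=3)
    probe = [v + rng.gauss(0, 0.1) for v in enrolled[7]]  # noisy re-capture of template 7
    for n_s in (1, 2, 3):
        cand = candidates(probe, centers, labels, n_s)
        print(n_s, len(cand) / len(enrolled), 7 in cand)
```

Raising `n_s` trades penetration rate for bin-hit probability; with `n_s` equal to the number of bins the whole database is searched and a bin miss is impossible. In practice the smallest `n_s` that gives no bin misses on a training set would be chosen.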

Formal definition of Binning
§ In general, a biometric template may be represented as a vector
§ The vectors are partitioned into N distinct clusters, each represented by a "codebook vector"
§ The codebook vectors divide the feature space into N distinct Voronoi regions
§ Every template is closer to the codebook vector (mean) of its own region than to any other codebook vector
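The definition can be written compactly as follows. This is a sketch under assumptions the slides do not state: templates are vectors x in d-dimensional real space, the codebook vectors are c_1, ..., c_N, and closeness is measured with the Euclidean norm.

```latex
\begin{align}
  V_k &= \bigl\{\, x \in \mathbb{R}^{d} \;:\; \lVert x - c_k \rVert \le \lVert x - c_j \rVert
         \ \text{for all } j \ne k \,\bigr\}, \qquad k = 1, \dots, N, \\
  \operatorname{bin}(x) &= \arg\min_{k \in \{1, \dots, N\}} \lVert x - c_k \rVert .
\end{align}
```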

Search Space Partition: Voronoi Regions

Hand Geometry Template
Feature extraction stages
• Image capture
• Binarization
• Contour extraction
• Noise removal
35 features are extracted
• 25 directly measured features
• 10 ratio and perimeter features

Signature Template
11 features extracted
• Regression constants b0, b1
• Compactness
• Signature length
• Major stroke angle
• Connected components
• Hole count
• Hole area
• Stroke count
• Signing time

Results
11-dimensional signature data
§ Best penetration: 35.57% for 6 bins; FRR = 0%
35-dimensional hand geometry data
§ Best penetration: 35.8% for 6 bins; FRR = 0%
Dataset: 250 training set & 250 testing set

Multi-modal approach
§ The resulting bins have very high template densities
§ A different biometric modality should be used to classify templates within a bin
§ Multimodal biometrics
  § Using multiple biometrics improves accuracy
  § It is difficult to forge multiple biometrics
  § Composite templates reduce template density
§ Statistical independence ensures that the individual binning results are diverse
§ The search space (the intersection of bins) is reduced because the individual binning results have little in common
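The intersection step can be illustrated with ordinary sets. This is a minimal sketch: the identity numbers and function names are hypothetical, and each input set stands for the identities returned by binning on one modality.

```python
def combined_candidates(hand_candidates, signature_candidates):
    """Identities that survive binning in both modalities."""
    return set(hand_candidates) & set(signature_candidates)


def penetration_rate(candidates, database_size):
    """Fraction of the database that still has to be matched."""
    return len(candidates) / database_size


if __name__ == "__main__":
    database_size = 20
    hand = {0, 1, 2, 3, 4, 5, 6}         # hypothetical result of hand-geometry binning
    signature = {4, 5, 6, 7, 8, 9, 10}   # hypothetical result of signature binning
    both = combined_candidates(hand, signature)
    print(sorted(both), penetration_rate(both, database_size))
```

Because a true match survives only if it appears in both sets, each modality's binning must itself avoid bin misses for the combined FRR to stay at 0%.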

Multi-Modal Approach

Multi-Modal Approach
Search space: 5% of the original database size; FRR: 0%

Results of Combination
§ Best combined penetration rate of 5%
Dataset: 250 training set & 250 testing set

Binning vs. Indexing
§ Applications can have frequent insertions of new templates
§ Binning works well when the database is static
  § Insertions require re-partitioning the entire database
§ Indexing can be used in both static and dynamic database scenarios
  § Trees are commonly used for indexing
§ Extend the concept of indexing relational databases to indexing biometric databases
§ This is much more challenging: biometric templates have no concept of a primary key!

Pyramid Technique: spatial hashing
§ Determine the pyramid (i) within which the template lies
§ Determine the height (h) of the template from the apex
§ The 1-D value = pyramid number (i) + height (h)
§ Indexing is done using B+ trees
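The mapping above can be sketched as follows, using the usual formulation of the Pyramid Technique. The assumptions are mine rather than the slides': features are normalized to [0, 1], the common apex of the 2d pyramids is the center point (0.5, ..., 0.5), and a sorted list stands in for the B+ tree. The class and function names are hypothetical.

```python
import bisect


def pyramid_value(template):
    """Map a point in [0, 1]^d to one number: pyramid index i plus height h."""
    d = len(template)
    # Dimension along which the point lies farthest from the center (the apex).
    j_max = max(range(d), key=lambda j: abs(0.5 - template[j]))
    # Pyramids 0..d-1 have their base on a "low" face, d..2d-1 on a "high" face.
    i = j_max if template[j_max] < 0.5 else j_max + d
    h = abs(0.5 - template[j_max])  # height above the apex, in [0, 0.5]
    return i + h


class PyramidIndex:
    """Sorted list of 1-D keys standing in for a B+ tree; supports insertion."""

    def __init__(self):
        self._keys = []
        self._ids = []

    def insert(self, template_id, template):
        key = pyramid_value(template)
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, template_id)

    def key_range(self, lo, hi):
        """Template IDs whose 1-D key lies in [lo, hi]."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return self._ids[start:end]


if __name__ == "__main__":
    index = PyramidIndex()
    index.insert("a", [0.1, 0.5])
    index.insert("b", [0.5, 0.9])
    print(pyramid_value([0.1, 0.5]), pyramid_value([0.5, 0.9]))
    print(index.key_range(3.0, 3.5))
```

Since h never exceeds 0.5, keys from different pyramids cannot collide, and new templates are inserted without re-partitioning. A full range query in the original feature space would still need to be translated into one key interval per affected pyramid, which this sketch omits.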

Various Indexing Techniques
§ Grid Files
§ R Tree
§ R+ Tree
§ KD Tree
§ X Tree
§ Pyramid Technique

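Comparative Study
Method | Scalable | Order Invariant | Dynamic | Range Query | No Overlap
Grid File | Y | Y | N | N | Y
R Tree | Y N N (two cells missing in the source; column positions uncertain)
R* Tree | Y N N (two cells missing in the source; column positions uncertain)
R+ Tree | Y | N | N | N | Y
KD Tree | Y | N | N | N | Y
X Tree | Y | N | Y | Y | Y
Pyramid Tech | Y Y Y (two cells missing in the source; column positions uncertain)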

Results of Indexing
35-dimensional hand geometry data
§ Best penetration: 27%; FRR = 0%
Dataset: 450 training set & 450 testing set
§ Parallel combination with signature will further reduce the search space

Multimodal Biometrics

2-D Biometric: Signature & Fingerprint Fusion
[Figure: impostor score pairs and true match score pairs]

Optimal Fusion Algorithm: Signature Fused With Fingerprint
[Figure: true match and impostor score pairs; optimal fusion ROC plotted as accuracy (1 - FRR) against false accept rate (FAR), separating the unrealizable performance area from the suboptimal performance area]
The optimal ROC is the boundary between unrealizable performance and suboptimal performance.

Optimal Fusion Algorithm Decision Regions
99.04% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; axes: 1st biometric score, 2nd biometric score]
§ The decision region boundary is irregular because of the finite sample size
§ The more data, the smoother the boundaries

RSS Fusion Algorithm for Fingerprint & Signature Provides a Suboptimal Performance ROC
[Figure: true match and impostor score pairs; RSS fusion ROC compared with the optimal ROC, accuracy (1 - FRR) against false accept rate (FAR)]

RSS Fusion Decision Regions
96.11% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; axes: 1st biometric score, 2nd biometric score]

OR Fusion Algorithm for Fingerprint & Signature Provides a Suboptimal Performance ROC
[Figure: true match and impostor score pairs; OR fusion ROC compared with the optimal ROC, accuracy (1 - FRR) against false accept rate (FAR)]

OR Fusion Decision Regions
96.85% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; axes: 1st biometric score, 2nd biometric score]

AND Fusion Algorithm for Fingerprint & Signature Provides a Suboptimal Performance ROC
[Figure: true match and impostor score pairs; AND fusion ROC compared with the optimal ROC, accuracy (1 - FRR) against false accept rate (FAR)]

AND Fusion Decision Regions
62.91% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; axes: 1st biometric score, 2nd biometric score]
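The three fixed fusion rules compared on the preceding slides can be sketched as simple decision functions on a pair of match scores. The slides do not define the rules, so this rests on two assumptions: "RSS" is read as root-sum-of-squares, and a higher score means a better match. The function names and thresholds are hypothetical.

```python
import math


def rss_fusion(s1, s2, threshold):
    """Accept when the root-sum-of-squares of the two scores reaches the threshold."""
    return math.hypot(s1, s2) >= threshold


def or_fusion(s1, s2, t1, t2):
    """Accept when either biometric alone clears its own threshold."""
    return s1 >= t1 or s2 >= t2


def and_fusion(s1, s2, t1, t2):
    """Accept only when both biometrics clear their thresholds."""
    return s1 >= t1 and s2 >= t2


if __name__ == "__main__":
    fingerprint_score, signature_score = 0.9, 0.1  # made-up scores for illustration
    print(rss_fusion(fingerprint_score, signature_score, 0.8))
    print(or_fusion(fingerprint_score, signature_score, 0.5, 0.5))
    print(and_fusion(fingerprint_score, signature_score, 0.5, 0.5))
```

Each rule carves the score plane into a match zone with a fixed shape (a quarter-disc complement, an L-shaped region, or a rectangle), which is one way to read why none of them reaches the optimal ROC at a given FAR.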

ROC

Thank You