- Number of slides: 43
An Automated System for Visual Biometrics. Allerton Conference: Security - Part I, September 27, 2007. Graduate Students: Derek J. Shiell, Louis H. Terry. Post Doctorate: Petar S. Aleksic. Principal Investigator: Professor Aggelos K. Katsaggelos. Northwestern University, Image and Video Processing Lab, Dept. of Electrical Engineering and Computer Science
Overview • System overview • Visual front-end: face detection, AAM tracking, feature extraction and normalization • Visual biometrics experiments: VALID database, details and results • Future research directions
System Flowchart (Visual Front End) Image sequence → Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification → Result. Visual features are detected, tracked, normalized, extracted, and recognized in real time.
Visual Front End: Face Detection AAM Tracking Visual Feature Extraction Recognition/ Identification
Face Detection using Viola & Jones • Viola & Jones algorithm: • Train weak classifiers (Haar features) using the AdaBoost method • Create a strong classifier through a cascade of weak classifiers. Figures: Haar Features, Face Detection Results
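To make the Haar-feature step concrete, here is a minimal sketch of how a two-rectangle Haar-like feature can be evaluated in constant time from an integral image. It is not the detector from the talk: the function names and the toy 4 x 4 image are hypothetical, and training the weak classifiers with AdaBoost is not shown.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column (list of lists)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half (w is assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy image (hypothetical data): bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
```

A weak classifier in this scheme would simply threshold such a feature value; a large positive response here indicates a vertical bright-to-dark edge.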
Visual Front End: AAM Tracking Face Detection AAM Tracking Visual Feature Extraction Recognition/ Identification
Active Appearance Models In general: • Label many images of a deformable object. • Align the labeled shapes (i.e., point sets). • Compute a linear shape model. • Warp images to compute a linear texture model. • Combine the shape and texture models into a single linear appearance model. Figure: training image with landmark points showing shape contours. Tim Cootes - http://www.isbe.man.ac.uk/~bim/
Active Appearance Models (AAM) I. Matthews and S. Baker, "Active Appearance Models Revisited," International Journal of Computer Vision, 2004.
Training the AAM Image Labeling • Manually labeled 303 face images with 75 landmark points. • 10 male, 10 female speakers. • Various office lighting conditions. • Very time-consuming. Tim Cootes - http://www.isbe.man.ac.uk/~bim/
Shape Alignment Procrustes Analysis 1. Remove x and y translation from all shapes. 2. Calculate the average shape, Xms. 3. Solve for rotation and scale (Ri, bi) for all N images. 4. Recalculate the mean, Xms, and realign (step 3). 5. Stop when Xms stabilizes.
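The five steps above can be sketched as follows. This is a minimal illustration, not the code used in the talk: 2-D landmarks are stored as complex numbers so that rotation and scale together become one complex multiplier, the mean is renormalized on each pass to keep it from shrinking, and all names and the toy shapes are hypothetical.

```python
import cmath


def center(shape):
    """Step 1: remove x and y translation."""
    c = sum(shape) / len(shape)
    return [p - c for p in shape]


def norm(shape):
    return sum(abs(p) ** 2 for p in shape) ** 0.5


def align_to(shape, ref):
    """Step 3: apply the rotation+scale a that minimizes ||a*shape - ref||."""
    a = sum(p.conjugate() * q for p, q in zip(shape, ref)) / sum(
        abs(p) ** 2 for p in shape
    )
    return [a * p for p in shape]


def procrustes(shapes, tol=1e-10, max_iter=100):
    shapes = [center(s) for s in shapes]
    n0 = norm(shapes[0])
    ref = [p / n0 for p in shapes[0]]  # fixes the overall scale and orientation
    mean = ref                         # step 2: initial mean estimate
    for _ in range(max_iter):
        aligned = [align_to(s, mean) for s in shapes]
        # Step 4: recompute the mean, then pin it to the reference frame.
        new_mean = [sum(pts) / len(aligned) for pts in zip(*aligned)]
        new_mean = align_to(new_mean, ref)
        n = norm(new_mean)
        new_mean = [p / n for p in new_mean]
        change = max(abs(p - q) for p, q in zip(new_mean, mean))
        mean = new_mean
        if change < tol:               # step 5: stop when the mean stabilizes
            break
    return mean, [align_to(s, mean) for s in shapes]


# Toy data: one rectangle seen under three similarity transforms.
base = [0 + 0j, 2 + 0j, 2 + 1j, 0 + 1j]
shapes = [
    base,
    [cmath.rect(1.5, 0.7) * p + (3 - 2j) for p in base],
    [cmath.rect(0.5, -1.2) * p + (-1 + 4j) for p in base],
]
mean_shape, aligned_shapes = procrustes(shapes)
```

Because the three toy shapes differ only by translation, rotation, and scale, the aligned copies coincide; with real landmark data the residual differences are what the shape model then captures.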
Shape Model • Deformations that remain after rigid shape alignment are due to shape variation. • Do PCA on the point sets to create a set of shape bases, Ps. • Given Ps and the shape parameters, X can be reconstructed. • Most variation is described by the first few principal components (eigenvectors in Ps).
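A minimal sketch of the PCA shape model, assuming the common linear form in which a shape is the mean shape plus a weighted sum of the basis vectors in Ps. The slide's own equations did not survive extraction, so the notation, the function names, the retained-variance threshold, and the synthetic data below are all assumptions.

```python
import numpy as np


def fit_shape_model(X, var_kept=0.95):
    """X: (N, 2n) array of aligned shapes, one flattened shape per row.

    Returns the mean shape, a basis P of shape (2n, k), and the k eigenvalues.
    """
    mean = X.mean(axis=0)
    D = X - mean
    # Right singular vectors of the centered data are the covariance eigenvectors.
    _, S, Vt = np.linalg.svd(D, full_matrices=False)
    eigvals = S ** 2 / (X.shape[0] - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(eigvals))
    return mean, Vt[:k].T, eigvals[:k]


def project(x, mean, P):
    """Shape parameters b for a shape x."""
    return P.T @ (x - mean)


def reconstruct(b, mean, P):
    """Shape generated by parameters b."""
    return mean + P @ b


# Synthetic data (hypothetical): 50 shapes of 5 points with two true modes.
rng = np.random.default_rng(0)
true_mean = rng.normal(size=10)
modes, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal columns
b_true = rng.normal(size=(50, 2)) * np.array([3.0, 1.0])
X = true_mean + b_true @ modes.T

mean, P, eigvals = fit_shape_model(X, var_kept=0.999)
x_hat = reconstruct(project(X[0], mean, P), mean, P)
```

Varying one entry of b over roughly plus or minus three standard deviations (three times the square root of the matching eigenvalue) is the usual way to visualize a single mode.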
Shape Model Visualizing how the different shape bases affect the shape. In general: First three shape modes showing how shape changes by varying +/- 3*sqrt(sj)
Shape Model Combining and projecting onto all ks shape bases generates a new shape. First three shape modes showing how shape changes by varying +/- 3*sqrt(sj)
Texture Model Visualizing how the different texture bases affect the texture: First three texture modes showing how texture changes by varying +/- 3*sqrt(tj)
Appearance Model (1) The texture and shape models can generate unique textures and shapes depending on bs and bt. (2) Weight the shape bases to match shape and texture units. (4) Concatenate all bs and bt from training to train the appearance model via a third PCA.
Appearance Model Define: Given appearance parameters ba we can reconstruct the shape and texture of a deformable model.
AAM Search Procedure Algorithm 1. Initialize points to the mean shape and mean appearance. 2. Warp the texture from the image to the mean shape and compute the difference, E, from the mean texture. 3. Update the appearance model parameters, ba = R*E. 4. Determine xs and xt given ba. 5. Return to step 2. 6. Stop when the reconstruction error < thresh or the maximum number of iterations is reached.
Original vs Reconstruction Original face Reconstructed face
Visual Front-End: Feature Extraction Face Detection AAM Tracking Visual Feature Extraction Recognition/ Identification
Mouth Region Extraction Tracked shape Normalize to reference shape with respect to in-plane rotation and scale.
Extracting Visual Features Normalized image → 40 x 40 ROI → DCT → DCT coefficients
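A minimal sketch of turning a 40 x 40 ROI into a DCT feature vector. The slide does not say which coefficients are kept or how they are ordered, so the low-frequency-first ordering by anti-diagonal used here is an assumption, as are the function names and the constant test image.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C


def dct2(block):
    """Separable 2-D DCT: transform the columns, then the rows."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def low_freq_features(roi, num_coeffs):
    """First num_coeffs DCT coefficients, ordered by u + v (assumed rule)."""
    coeffs = dct2(np.asarray(roi, dtype=float))
    rows, cols = coeffs.shape
    order = sorted(
        ((u, v) for u in range(rows) for v in range(cols)),
        key=lambda uv: (uv[0] + uv[1], uv[0]),
    )
    return np.array([coeffs[u, v] for u, v in order[:num_coeffs]])


# Hypothetical input: a constant 40 x 40 ROI has energy only in the DC term.
roi = np.full((40, 40), 2.0)
features = low_freq_features(roi, 60)
```

Keeping only the low-frequency coefficients gives a compact descriptor of the mouth region; the experiments later in the deck sweep the number of retained coefficients from 20 to 100.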
Recognition Results Face Detection AAM Tracking Visual Feature Extraction Recognition/ Identification
VALID Database Speakers: • 77 male, 29 female • 97 Caucasian, 5 Asian, 4 Indo-Asian • 38 with spectacles, 68 without spectacles • 8 with facial hair, 98 without facial hair Vocabulary: “
Visual Speaker Identification Visual biometrics speaker identification experiment • Phrase: “Joe took father’s green shoe bench out” • Compared the best visual speaker identification result obtained with AAM tracking against the result obtained with the supplied hand-labeled data • Shifted training/testing sets for more reliable results (43-speaker subset) • Run #1: train on videos 3, 4, and 5; test on 2 • Run #2: train on videos 2, 4, and 5; test on 3 • Etc. • Testing over 20, 40, 60, 80, 100 DCT coefficients; 1, 2, 3, 4, 5 mixtures; and 3, 4, 5 states
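The shifted train/test runs and the parameter sweep can be enumerated as in the sketch below. It assumes that the runs rotate over videos 2 through 5, which is inferred from the two runs listed and not stated outright; the variable and function names are hypothetical, and no model training is shown.

```python
from itertools import product

# Assumption: the rotation covers videos 2..5, as suggested by runs #1 and #2.
videos = [2, 3, 4, 5]


def rotating_splits(videos):
    """Hold out each video in turn; train on the remaining ones."""
    return [([v for v in videos if v != held_out], held_out) for held_out in videos]


splits = rotating_splits(videos)

# Parameter grid from the slide: DCT coefficients x mixtures x states.
dct_counts = [20, 40, 60, 80, 100]
mixtures = [1, 2, 3, 4, 5]
states = [3, 4, 5]
grid = list(product(dct_counts, mixtures, states))

for run, (train, test) in enumerate(splits, start=1):
    print(f"Run #{run}: train on videos {train}, test on {test}")
print(f"{len(grid)} configurations per run")
```

Averaging the identification rate over the runs for each configuration is one way to get the "more reliable results" the slide refers to.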
Speaker Recognition Results AAM: 60 DCT coefficients, 4 mixtures, 3 states, 59.3% GT: 100 DCT coefficients, 3 mixtures, 3 states, 52.3%
Speaker Recognition Results Example extracted ROIs from hand labeled data.
Future Research Directions • Improve tracking robustness. • Illumination, speakers, head paraphernalia/occlusions • Investigate other visual features for speaker recognition. • Investigate the effect of different normalization methods on the mouth region as a pre-processing step before recognition.
Thank You! Questions?
Photometric Normalization (Texture Alignment) Algorithm 1. Scan all pixel intensities into vectors and find the mean texture, xmt. 2. Standardize the texture. 3. Align all textures to the mean. 4. Find the new mean, xmt. 5. Standardize xmt. 6. Repeat until xmt is stable.
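The iteration above can be sketched as follows, under two assumptions the slide does not spell out: "standardize" means zero mean and unit variance, and "align to the mean" means removing a per-texture offset and gain. Function names and the toy intensity profiles are hypothetical, and constant textures (zero variance) are not handled.

```python
def standardize(g):
    """Zero mean, unit (population) variance."""
    n = len(g)
    m = sum(g) / n
    sd = (sum((v - m) ** 2 for v in g) / n) ** 0.5
    return [(v - m) / sd for v in g]


def align_to_mean(g, mean):
    """Remove offset beta and gain alpha relative to a standardized mean."""
    n = len(g)
    beta = sum(g) / n
    alpha = sum(a * b for a, b in zip(g, mean)) / n
    return [(v - beta) / alpha for v in g]


def normalize_textures(textures, tol=1e-9, max_iter=50):
    # Steps 1-2: initial mean texture, standardized.
    mean = standardize([sum(col) / len(textures) for col in zip(*textures)])
    for _ in range(max_iter):
        aligned = [align_to_mean(g, mean) for g in textures]        # step 3
        new_mean = standardize(
            [sum(col) / len(aligned) for col in zip(*aligned)]      # steps 4-5
        )
        change = max(abs(a - b) for a, b in zip(new_mean, mean))
        mean = new_mean
        if change < tol:                                            # step 6
            break
    return mean, [align_to_mean(g, mean) for g in textures]


# Toy data: one intensity profile under three lighting gains and offsets.
profile = [10.0, 40.0, 35.0, 80.0, 20.0, 60.0]
textures = [
    profile,
    [2.0 * v + 30.0 for v in profile],
    [0.5 * v - 5.0 for v in profile],
]
mean_texture, aligned_textures = normalize_textures(textures)
```

In this toy case the three profiles collapse onto the same curve after alignment, which is the effect the "after scale and mean alignment" plot on the following slide illustrates.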
Photometric Normalization Mean Texture Profile Original Texture Profiles After scale and mean alignment
Project Details • Linux Ubuntu environment • OpenCV, Intel IPP Package, Motorola MLite++ • Thousands of lines of MATLAB and C++ code • Demo computer: Intel Core Duo 1.66 GHz (using only one processor), 1 GB RAM, Philips Toucam Pro at 320 x 240 resolution
Adaptive LMS Filter Update Comparison to adaptive LMS filter:
Adaptive LMS Filter Update Rearrange: Optimum Solution:
Research Goal 1. Develop a robust system to automatically and rapidly extract visual features from a speaker for use in audio-visual and visual speech recognition and biometrics. 2. Compare results of recognition and biometrics using the extracted features with results from ‘ground truth’ data. 3. Embed speech recognition software into the tracking system to enable real-time visual-only speech/speaker recognition.
AAM Update • Iteratively update the appearance parameters: ba = R*E. • Estimate R through multivariable linear regression. • Solve for R by computing the Hessian and steepest descent images using Gauss-Newton optimization.
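As an illustration of the regression route to R, the sketch below perturbs the parameters of a toy linear model, records the resulting residuals E, and fits R by least squares so that R*E predicts the displacement that caused each residual. The matrix J is a hypothetical stand-in for the real image-formation model, the dimensions are arbitrary, and the sign convention of the actual update is not taken from the slides.

```python
import numpy as np


def estimate_update_matrix(dB, E, rcond=1e-10):
    """Least-squares fit of R such that dB is approximately E @ R.T.

    dB: (M, k) known parameter displacements used in training.
    E:  (M, p) residual vectors observed for those displacements.
    """
    Rt, *_ = np.linalg.lstsq(E, dB, rcond=rcond)
    return Rt.T


rng = np.random.default_rng(0)
p, k, M = 30, 4, 200                      # residual length, parameters, samples

# Hypothetical toy model: residuals respond linearly to displacements.
J = rng.normal(size=(p, k))
dB = rng.normal(scale=0.1, size=(M, k))   # training perturbations
E = dB @ J.T                              # residual caused by each perturbation

R = estimate_update_matrix(dB, E)

# A new, unseen displacement is recovered from its residual alone.
db_new = rng.normal(scale=0.1, size=k)
db_est = R @ (J @ db_new)
```

In a real AAM the residual is only locally linear in the parameters, which is why the search applies the update repeatedly instead of once.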
Image Warping An affine transform describes each triangle warp.
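A minimal sketch of a per-triangle affine warp using barycentric coordinates: a point keeps the same weights with respect to the triangle's vertices before and after the warp. The function names and the example triangles are hypothetical, and sampling pixel intensities is not shown.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    u = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    v = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return u, v, 1.0 - u - v


def inside(p, tri):
    """True if p lies in the triangle (all weights non-negative)."""
    return all(t >= 0.0 for t in barycentric(p, *tri))


def warp_point(p, src, dst):
    """Map p from triangle src to triangle dst with the same weights."""
    u, v, w = barycentric(p, *src)
    return tuple(u * dst[0][i] + v * dst[1][i] + w * dst[2][i] for i in range(2))


# Hypothetical triangles: the warp here is x' = 10 + 2x, y' = 10 + 4y.
src = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
dst = ((10.0, 10.0), (12.0, 10.0), (10.0, 14.0))
warped = warp_point((0.25, 0.5), src, dst)
```

Applying this mapping to every pixel of every mesh triangle gives the piecewise affine warp used to bring a face texture into the mean shape.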