  • Number of slides: 43

An Automated System for Visual Biometrics
Allerton Conference: Security - Part I
September 27, 2007
Graduate Students: Derek J. Shiell, Louis H. Terry
Post Doctorate: Petar S. Aleksic
Principal Investigator: Professor Aggelos K. Katsaggelos
Northwestern University
Image and Video Processing Lab
Dept. of Electrical Engineering and Computer Science

Overview
• System overview
• Visual front-end
  • Face detection
  • AAM tracking
  • Feature extraction and normalization
• Visual biometrics experiments
  • VALID database
  • Details and results
• Future research directions

System Flowchart
Image sequence → Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification → Result

System Flowchart: Visual Front End
Image sequence → Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification → Result
Visual features are detected, tracked, normalized, extracted and recognized in real time.

Visual Front End: Face Detection
Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification

Face Detection using Viola & Jones
• Viola & Jones algorithm:
  • Train weak classifiers (Haar features) using the AdaBoost method.
  • Create a strong classifier through a cascade of weak classifiers.
[Figures: Haar features; face detection results]
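
Haar features are cheap to evaluate because each rectangle sum needs only four lookups in an integral image. The following is a minimal sketch of that idea in NumPy, not the detector used in this system; the function names and the left-minus-right feature layout are illustrative assumptions.

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero row and column prepended,
    # so that ii[r, c] equals the sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    # Sum of the pixels inside a rectangle from four table lookups.
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_haar(ii, top, left, height, width):
    # Hypothetical horizontal two-rectangle feature: left half minus right half.
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

# Toy image: bright left half, dark right half.
img = np.zeros((4, 4))
img[:, :2] = 1.0
ii = integral_image(img)
feature = two_rect_haar(ii, 0, 0, 4, 4)
```

A weak classifier in this scheme would threshold a single feature value such as `feature`; AdaBoost then selects and weights many of them.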

Visual Front End: AAM Tracking
Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification

Active Appearance Models
In general:
• Label many images of a deformable object.
• Align the labeled shapes (i.e. point sets).
• Compute a linear shape model.
• Warp images to compute a linear texture model.
• Combine the shape and texture models into a single linear appearance model.
[Figure: training image with landmark points showing shape contours. Tim Cootes - http://www.isbe.man.ac.uk/~bim/]

Active Appearance Models (AAM)
I. Matthews and S. Baker, "Active Appearance Models Revisited," International Journal of Computer Vision, 2004.

Training the AAM: Image Labeling
• Manually labeled 303 face images with 75 landmark points.
• 10 male, 10 female speakers.
• Various office lighting conditions.
• Very time-consuming.
[Figure credit: Tim Cootes - http://www.isbe.man.ac.uk/~bim/]

Shape Alignment: Procrustes Analysis
1. Remove x and y translation from all shapes.
2. Calculate the average shape, Xms.
3. Solve for rotation and scale (Ri, bi) for all N images.
4. Recalculate the mean, Xms, and realign (step 3).
5. Stop when Xms stabilizes.
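
The alignment loop above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the mean is seeded with the first shape instead of an average (a common variant that avoids a degenerate starting mean), the mean is rescaled to unit norm each pass to fix the overall scale, and all function names are hypothetical.

```python
import numpy as np

def align_to(shape, ref):
    """Rotate and scale a centered (n, 2) shape onto a centered reference."""
    u, s, vt = np.linalg.svd(shape.T @ ref)
    if np.linalg.det(u @ vt) < 0:          # disallow reflections
        u[:, -1] *= -1
        s[-1] *= -1
    rot = u @ vt                           # rotation R_i
    scale = s.sum() / (shape ** 2).sum()   # scale b_i
    return scale * shape @ rot

def generalized_procrustes(shapes, tol=1e-10, max_iter=100):
    # Step 1: remove x and y translation.
    centered = [s - s.mean(axis=0) for s in shapes]
    # Step 2 (variant): seed the mean with the first shape, at unit size.
    mean = centered[0] / np.linalg.norm(centered[0])
    for _ in range(max_iter):
        # Step 3: solve rotation and scale for every shape.
        aligned = [align_to(s, mean) for s in centered]
        # Step 4: recalculate the mean.
        new_mean = np.mean(aligned, axis=0)
        new_mean /= np.linalg.norm(new_mean)
        change = np.linalg.norm(new_mean - mean)
        mean = new_mean
        # Step 5: stop when the mean stabilizes.
        if change < tol:
            break
    return mean, aligned

def similarity(shape, angle, scale, shift):
    # Helper for the demo: apply a known rotation, scale and translation.
    c, s = np.cos(angle), np.sin(angle)
    return scale * shape @ np.array([[c, -s], [s, c]]) + np.asarray(shift)

base = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
shapes = [base,
          similarity(base, 0.4, 1.5, [3.0, -1.0]),
          similarity(base, -1.1, 0.7, [-2.0, 5.0])]
mean_shape, aligned = generalized_procrustes(shapes)
```

Because the three demo shapes differ only by a similarity transform, they coincide after alignment; with real landmark data the remaining differences are the shape variation that the shape model captures.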

Shape Model
• Deformations remaining after rigid shape alignment are due to shape variation.
• Do PCA on the point sets to create a set of shape basis vectors, Ps.
• Given Ps and the shape parameters bs, X can be reconstructed.
• Most of the variation is described by the first few principal components (eigenvectors in Ps).
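
A minimal NumPy sketch of such a PCA shape model is given below, assuming the aligned shapes are flattened into row vectors. The 95% variance cut-off, the synthetic single-mode data and the function name are assumptions made for illustration; the slides do not state how many components were kept.

```python
import numpy as np

def build_shape_model(shapes, var_kept=0.95):
    # shapes: (n_samples, 2 * n_points) array of aligned, flattened shapes.
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data yields the PCA basis directly.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = sing ** 2 / (len(shapes) - 1)
    # Keep the smallest number of modes explaining var_kept of the variance.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, var_kept) + 1)
    return mean, vt[:k].T, eigvals[:k]     # mean shape, Ps, eigenvalues

# Synthetic training set with a single true mode of variation.
rng = np.random.default_rng(0)
mean0 = rng.normal(size=8)
mode = rng.normal(size=8)
mode /= np.linalg.norm(mode)
coeffs = rng.normal(size=(30, 1))
shapes = mean0 + coeffs * mode

mean, P_s, eigvals = build_shape_model(shapes)
b_s = P_s.T @ (shapes[0] - mean)           # project a shape onto the basis
recon = mean + P_s @ b_s                   # reconstruct it from bs
# Shape obtained by moving three standard deviations along the first mode.
mode_plus = mean + 3.0 * np.sqrt(eigvals[0]) * P_s[:, 0]
```

Projection and reconstruction are the two operations the rest of the pipeline relies on: `b_s` is the compact description of a shape, and `recon` is the shape recovered from it.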

Shape Model
Visualizing how the different shape bases affect the shape.
[Figure: first three shape modes, showing how the shape changes when each mode is varied by ±3·sqrt(λsj)]

Shape Model
Combining and projecting onto all ks shape bases generates a new shape.
[Figure: first three shape modes, showing how the shape changes when each mode is varied by ±3·sqrt(λsj)]

Texture Model
Visualizing how the different texture bases affect the texture.
[Figure: first three texture modes, showing how the texture changes when each mode is varied by ±3·sqrt(λtj)]

Appearance Model
(1) The texture and shape models can generate unique textures and shapes depending on bs and bt.
(2) Weight the shape bases to match shape and texture units.
(4) Concatenate all bs and bt from training to train the appearance model via a third PCA.

Appearance Model
Given appearance parameters ba, we can reconstruct the shape and texture of a deformable model.
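
The equations on these slides did not survive extraction. For orientation, the standard combined-model formulation from the AAM literature is written below in this deck's symbols; it is offered as a likely reading, not a transcription of the slide. The weighting matrix $W_s$ and the appearance basis $P_a$ are assumed names.

```latex
\begin{aligned}
x_s &= x_{ms} + P_s\, b_s, \\
x_t &= x_{mt} + P_t\, b_t, \\
b   &= \begin{pmatrix} W_s\, b_s \\ b_t \end{pmatrix} = P_a\, b_a .
\end{aligned}
```

Here $W_s$ rescales the shape parameters so that they are commensurate with the texture parameters, and the third PCA over the concatenated vectors $b$ yields $P_a$.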

AAM Search Procedure
Algorithm
1. Initialize points to the mean shape and mean appearance.
2. Warp the texture from the image to the mean shape and compute the difference, E, from the mean texture.
3. Update the appearance model parameters, ba = R*E.
4. Determine xs and xt given ba.
5. Go back to step 2.
6. Stop when the reconstruction error falls below a threshold or the maximum number of iterations is reached.

Original vs Reconstruction
[Figures: original face; reconstructed face]

Tracking Result

Visual Front-End: Feature Extraction
Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification

Mouth Region Extraction
• Tracked shape
• Normalize to the reference shape with respect to in-plane rotation and scale.

Extracting Visual Features
Normalized image → 40 x 40 ROI → DCT → DCT coefficients
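
A sketch of this step in NumPy follows: a separable 2-D DCT of the 40 x 40 ROI, followed by selection of the lowest-frequency coefficients. The slides do not say how the coefficients were ordered or chosen, so sorting by the sum of the row and column frequency indices is purely an assumption here, as are the function names.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k is the k-th cosine basis vector.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def dct_features(roi, n_coeffs):
    # Separable 2-D DCT of a square ROI.
    d = dct_matrix(roi.shape[0])
    coeffs = d @ roi @ d.T
    # Assumed ordering: lowest combined frequency (u + v) first.
    u, v = np.indices(coeffs.shape)
    order = np.argsort((u + v).ravel(), kind="stable")
    return coeffs.ravel()[order][:n_coeffs]

roi = np.ones((40, 40))           # stand-in for a normalized mouth ROI
features = dct_features(roi, 60)  # 60 coefficients, as in one reported setup
```

For a constant image all of the energy lands in the first (DC) coefficient, which is a quick sanity check on the transform.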

Recognition Results
Face Detection → AAM Tracking → Visual Feature Extraction → Recognition/Identification

VALID Database
Speakers:
  • 77 male, 29 female
  • 97 Caucasian, 5 Asian, 4 Indo-Asian
  • 38 with spectacles, 68 without spectacles
  • 8 with facial hair, 98 without facial hair
Vocabulary:
  • “5 0 6 9 2 8 1 3 7 4” (continuous digits)
  • “Joe took father’s green shoe bench out”
Audio:
  • 16-bit stereo samples at a frequency of 32 kHz with PCM encoding
Video:
  • 5 different video conditions for each speaker: 1 studio environment, 4 office environments
  • Illumination variation, pose variation, appearance variation
  • 576 x 720, 25.00 fps

Visual Speaker Identification
Visual biometrics speaker identification experiment
• Phrase: “Joe took father’s green shoe bench out”
• Compared the best visual speaker identification result using AAM tracking against the result using the supplied hand-labeled data.
• Shifted training/testing sets for more reliable results (43-speaker subset):
  • Run #1: train on videos 3, 4 and 5; test on 2
  • Run #2: train on videos 2, 4 and 5; test on 3
  • Etc.
• Testing over 20, 40, 60, 80, 100 DCT coefficients; 1, 2, 3, 4, 5 mixtures; and 3, 4, 5 states.
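
The experiment design above amounts to a rotation of the held-out video combined with a grid over three model settings. The sketch below enumerates both; it assumes that only videos 2 to 5 take part and that the "Etc." continues the same pattern for videos 4 and 5, neither of which the slide states explicitly.

```python
from itertools import product

videos = [2, 3, 4, 5]  # assumption: video 1 is not part of the rotation

# Each run holds out one video for testing and trains on the rest.
runs = [([v for v in videos if v != test], test) for test in videos]

# Every combination of feature count, mixtures and states.
dct_coefficients = [20, 40, 60, 80, 100]
mixtures = [1, 2, 3, 4, 5]
states = [3, 4, 5]
grid = list(product(dct_coefficients, mixtures, states))

for train, test in runs:
    print(f"train on videos {train}, test on video {test}")
print(f"{len(grid)} configurations per run")
```

Averaging the identification rate of a configuration over all runs gives a less split-dependent estimate than any single train/test partition.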

Speaker Recognition Results
• AAM: 60 DCT coefficients, 4 mixtures, 3 states: 59.3%
• GT (ground truth, hand-labeled): 100 DCT coefficients, 3 mixtures, 3 states: 52.3%

Speaker Recognition Results
[Figure: example ROIs extracted from hand-labeled data]

Future Research Directions
• Improve tracking robustness.
  • Illumination, speakers, head paraphernalia/occlusions
• Investigate other visual features for speaker recognition.
• Investigate the effect of different normalization methods on the mouth region as a pre-processing step before recognition.

Thank You! Questions?

Tracking Result

Photometric Normalization
Algorithm
1. Scan all pixel intensities into vectors and find the mean texture.
2. Standardize the textures.
3. Align all textures to the mean.
4. Find the new mean, xmt.
5. Standardize xmt.
6. Repeat until xmt is stable.
[Figure: texture alignment]
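
One way to realize this loop is sketched below in NumPy. It follows the usual offset-and-scale texture normalization from the AAM literature, which is assumed to match the slide; "standardize" is taken to mean zero mean and unit norm, and the function names are hypothetical.

```python
import numpy as np

def standardize(g):
    # Zero mean and unit norm (assumed meaning of "standardize").
    g = g - g.mean()
    return g / np.linalg.norm(g)

def normalize_textures(textures, tol=1e-10, max_iter=50):
    # textures: (n_samples, n_pixels) array of shape-normalized textures.
    mean = standardize(textures.mean(axis=0))
    for _ in range(max_iter):
        aligned = []
        for g in textures:
            g0 = g - g.mean()        # remove the brightness offset
            alpha = g0 @ mean        # contrast relative to the mean texture
            aligned.append(g0 / alpha)
        aligned = np.array(aligned)
        new_mean = standardize(aligned.mean(axis=0))
        change = np.linalg.norm(new_mean - mean)
        mean = new_mean
        if change < tol:             # mean texture is stable
            break
    return mean, aligned

# Demo: one texture seen under three gain and offset settings.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
textures = np.array([1.0 * base + 10.0, 2.5 * base - 3.0, 0.4 * base])
mean_texture, aligned = normalize_textures(textures)
```

The sketch assumes every texture correlates positively with the mean, so `alpha` stays positive; the demo textures differ only by gain and offset and therefore collapse onto one profile.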

Image Warping

Photometric Normalization
[Figure panels: mean texture profile; original texture profiles; profiles after scale and mean alignment]

Project Details
• Linux Ubuntu environment
• OpenCV, Intel IPP Package, Motorola MLite++
• 1000’s of lines of MATLAB and C++ code
• Demo computer
  • Intel Core Duo 1.66 GHz (using only one processor)
  • 1 GB RAM
  • Philips Toucam Pro at 320 x 240 resolution

Adaptive LMS Filter Update
Comparison to the adaptive LMS filter:

Adaptive LMS Filter Update
Rearrange:
Optimum solution:
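
The equations for this comparison were lost in extraction. As background only, the textbook LMS update moves the filter weights along the input vector in proportion to the current error. The sketch below identifies a 3-tap filter this way; the tap values, step size and signal are made up for illustration and have no connection to the slide's derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([0.8, -0.3, 0.1])   # unknown filter to identify (toy values)
x = rng.normal(size=2000)             # white input signal

w = np.zeros(3)                       # adaptive weights
mu = 0.05                             # step size
for n in range(2, len(x)):
    u = x[n - 2:n + 1][::-1]          # input vector, most recent sample first
    d = w_true @ u                    # desired (reference) output
    e = d - w @ u                     # instantaneous error
    w += mu * e * u                   # LMS weight update
```

The structural parallel with the AAM update is that both correct a parameter vector by a linear function of an error signal.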

Research Goal
1. Develop a robust system to automatically and rapidly extract visual features from a speaker for use in audio-visual and visual speech recognition and biometrics.
2. Compare results of recognition and biometrics using the extracted features with results from ‘ground truth’ data.
3. Embed speech recognition software into the tracking system to enable real-time visual-only speech/speaker recognition.

AAM Update
• Iteratively update the appearance parameters: ba = R*E
• Estimate R through multivariable linear regression.
• Solve for R by computing the Hessian and steepest descent images using Gauss-Newton optimization.
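
The regression route to R can be illustrated with a toy linear model: perturb known-correct parameters, record the resulting texture residuals, and regress the perturbations on the residuals. Everything in this sketch is an assumption made for illustration, including the linear "renderer" J, the problem sizes and the sign convention (the correction is subtracted because E is defined here as model minus target).

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_params = 50, 3
J = rng.normal(size=(n_pixels, n_params))   # toy linear texture model (assumption)
b_true = np.array([0.5, -1.0, 2.0])
target = J @ b_true                         # stands in for the image texture

def residual(b):
    # E: difference between the model texture and the target texture.
    return J @ b - target

# Training: displace the correct parameters and record the residuals.
deltas = rng.normal(scale=0.1, size=(200, n_params))
errors = np.array([residual(b_true + d) for d in deltas])

# Multivariable linear regression: find X with errors @ X ~= deltas, R = X.T.
X, *_ = np.linalg.lstsq(errors, deltas, rcond=1e-8)
R = X.T

# Search: start from zero parameters and apply the learned update.
b = np.zeros(n_params)
for _ in range(20):
    E = residual(b)
    if np.linalg.norm(E) < 1e-10:           # reconstruction error small enough
        break
    b = b - R @ E                           # predicted correction
```

With a truly linear model one update is enough; a real AAM is only locally linear, which is why the search iterates until the residual stops shrinking.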

Image Warping
An affine transform describes each triangle warp.
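
Three vertex correspondences determine the affine transform of a triangle exactly, which is what makes piecewise-affine warping practical. A minimal NumPy sketch follows, with hypothetical function names and made-up triangle coordinates.

```python
import numpy as np

def triangle_affine(src, dst):
    # Solve for the 2x3 matrix A with A @ [x, y, 1] = [x', y'] at each vertex.
    src_h = np.hstack([src, np.ones((3, 1))])   # homogeneous source vertices
    return np.linalg.solve(src_h, dst).T

def warp_points(A, pts):
    # Apply the affine transform to an (n, 2) array of points.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ A.T

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[2.0, 1.0], [4.0, 1.0], [2.0, 4.0]])
A = triangle_affine(src, dst)
inside = warp_points(A, np.array([[0.5, 0.5]]))  # a point on the triangle's edge
```

In a full warp, each pixel of the mean-shape triangle would be mapped through its triangle's transform and the source image sampled at the resulting location.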