42c683f64357698456f3ece33742d574.ppt
- Number of slides: 54
Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications
Mark Ritzmann
Pace University
March 17, 2007
Contents
• Overview
• Essence and Significance of Work
• Experiment Design
• Outcomes
• Analysis
• Future Work
Overview: Essence of this work
• Address the problem of missing or incomplete data and put forth strategies to overcome it
• Improve the accuracy of the existing Keystroke Biometric Recognition System
• Apply the findings to other application areas
Overview: The Impact of Missing Data
• <1%: considered trivial
• 1-5%: considered manageable
• 5-15%: requires sophisticated methods
• >15%: may severely impact any interpretation
Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
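As a rough illustration of these bands, the sketch below computes the missing-data rate of a sample and maps it to the four categories quoted above. The function names, the use of `None` to mark a missing value, and the treatment of the boundary values (exactly 5% and 15% fall in the lower band) are assumptions made for this example; the slide does not specify them.

```python
def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(v is None for v in values) / len(values)


def impact_category(rate):
    """Map a missing-data rate (0.0 to 1.0) to the bands quoted on the slide.

    Boundary handling is an assumption: 5% counts as manageable and
    15% counts as requiring sophisticated methods.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate < 0.01:
        return "trivial"
    if rate <= 0.05:
        return "manageable"
    if rate <= 0.15:
        return "requires sophisticated methods"
    return "may severely impact interpretation"


if __name__ == "__main__":
    # Hypothetical key-hold durations in milliseconds with one missing entry.
    durations = [100, None, 95, 102] + [100] * 16
    rate = missing_rate(durations)
    print(f"{rate:.1%} missing: {impact_category(rate)}")
```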
Overview: Missing Data Mechanisms
• MCAR – Missing Completely At Random
• MAR – Missing At Random
• NMAR – Not Missing At Random
Most missing data treatment methods assume the data are MAR.
Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
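The difference between the three mechanisms is what the probability of a value being missing depends on. The sketch below makes that concrete under simple assumptions: each row holds one always-observed value and one target value that may go missing, and the probabilities and cutoff are hypothetical parameters chosen for illustration, not values from the cited paper.

```python
import random


def mask_values(rows, mechanism, rng, base_p=0.1, high_p=0.6, cutoff=100):
    """Return the target values with None where a value is made missing.

    rows: list of (observed, target) pairs; `observed` is never missing.
    MCAR: missingness ignores the data entirely.
    MAR:  missingness depends only on the observed companion value.
    NMAR: missingness depends on the (unseen) target value itself.
    """
    masked = []
    for observed, target in rows:
        if mechanism == "MCAR":
            p = base_p
        elif mechanism == "MAR":
            p = high_p if observed > cutoff else base_p
        elif mechanism == "NMAR":
            p = high_p if target > cutoff else base_p
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        masked.append(None if rng.random() < p else target)
    return masked


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [(rng.uniform(50, 150), rng.uniform(50, 150)) for _ in range(1000)]
    for mechanism in ("MCAR", "MAR", "NMAR"):
        masked = mask_values(rows, mechanism, rng)
        print(mechanism, sum(v is None for v in masked), "of", len(masked), "missing")
```

Under NMAR the missingness cannot be explained from the observed data alone, which is why methods that assume MAR can be biased when that assumption fails.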
Overview: Missing Data Treatment, High Level
Heuristic
• Based on established rules and guidelines
• Similar to an expert system
• Association is the prime example
Statistical
• Existing data used to calculate missing data
• Care needs to be taken not to overfit
• Mean/mode is the prime example
Overview: Missing Data Treatment Methods
• Case Deletion
• Parameter Estimation
• Mean/Mode Imputation
• Method of Assigning All Possible Values of the Attribute
• Regression Imputation
• Hot Deck Imputation and Cold Deck Imputation
• Multiple Imputation
• K-Nearest Neighbor Imputation
• Internal Treatment Method
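Two of the simplest methods in this list, case deletion and mean/mode imputation, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the use of `None` for a missing value are assumptions made for the example.

```python
from statistics import mean, mode


def case_deletion(rows):
    """Drop every row (case) that contains at least one missing value."""
    return [row for row in rows if None not in row]


def mean_impute(column):
    """Replace missing numeric values with the mean of the observed ones.

    Raises statistics.StatisticsError if no value is observed.
    """
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def mode_impute(column):
    """Replace missing categorical values with the most common observed one."""
    fill = mode(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


if __name__ == "__main__":
    print(case_deletion([(1, 2), (None, 3), (4, 5)]))
    print(mean_impute([10, None, 20]))
    print(mode_impute(["desktop", None, "laptop", "desktop"]))
```

Case deletion shrinks the sample, while mean/mode imputation keeps every case but reduces the variance of the imputed attribute.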
Overview: Biometric background
• Roots in CIA & Dept of Defense work
• Early issues – technology, cost, lack of standards
• Basic uses
  – Verification (easier of the two; yes/no)
  – Identification (harder of the two; 1 of n)
• Basic types
  – Physiological – generally do not change
  – Behavioral – can change, easier to mimic
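The distinction between the two basic uses can be sketched as follows. This is an illustrative example only: the Euclidean distance, the two-element feature vectors, and the threshold are assumptions, not a description of the classifier used in the Pace system.

```python
import math


def verify(sample, template, threshold):
    """Verification: a yes/no decision against one claimed identity."""
    return math.dist(sample, template) <= threshold


def identify(sample, templates):
    """Identification: pick the closest of n enrolled identities."""
    return min(templates, key=lambda name: math.dist(sample, templates[name]))


if __name__ == "__main__":
    # Hypothetical enrolled templates: (mean key duration, mean transition) in ms.
    templates = {"alice": [100.0, 50.0], "bob": [140.0, 80.0]}
    sample = [105.0, 52.0]
    print(verify(sample, templates["alice"], threshold=10.0))
    print(identify(sample, templates))
```

Verification makes a single comparison, whereas identification has to compare against all n templates, which is one reason it is the harder of the two.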
Overview: Biometric Issues
BIOMETRICS: CHALLENGES & CAVEATS
Business
People • User confidence • Privacy issues • User preferences • User acceptance • User profile • Trust
Legal & Regulatory • Lack of precedence • Ambiguous process • Imprecise definition • Logistics of proof of defense • Financial feasibility • Interaction with traditional controls • Application not subject to rigor • Incompatibility with business partners • Transition to e-business
Operational • Control locus
Technical • Adaptation • Hardware • Evolving nature of technology • Scattered proliferation & polarization • Uniqueness of biometric • Scalability • Lab vs Field • Scalability • Continuous Authentication • Security System • Business Process • Design • Control • Enrollment Challenge • System Downtime • Availability of template database • Effects of malicious code
Source: A. Chandra & T. Calderon, Challenges and Constraints to the Diffusion of Biometrics in Information Systems, Communications of the ACM, December 2005, Vol 48, No 12
Overview: Privacy Issues – special mention
• Opt in/Opt out
  – Any application or web site that used this system would need to do so with full disclosure. The user could then knowingly decide.
• Dictated environment
  – Any corporate or instructional e-mail system where the ultimate ownership of the keystroke resides with that entity
• Capture results, not text itself
  – Use keystrokes to authenticate/identify, not the words themselves or the intact messages
Overview: Keyboard Biometric Studies in the Literature
• Key concepts
  – Copy vs Free
  – Authentication vs Identification
• Classic studies
  – Gaines, 1980
  – Umphress & Williams, 1985
  – Leggett & Williams, 1988
  – Joyce & Gupta, 1990
  – Bleha et al, 1990
  – Brown & Rogers, 1993
• Recent studies
  – University of Torino
• Pace University contributions
Essence and Significance of Work: High Level Objectives
1. Develop strategies to manage the significant problem of missing or incomplete data
2. Improve the accuracy of the current Keystroke Biometric Recognition System
3. Apply findings to other areas
Essence and Significance of Work: Detailed Objectives
First Objective: Gain insight into the effectiveness and application of MISSING DATA strategies and decision making with incomplete information
Second Objective: Improve the accuracy of the current Keystroke Biometric Recognition System by improving the FALLBACK model invoked when a sample is of insufficient size
Third Objective:
• Identify a potential application for a Keystroke Biometric Recognition System
• Project the findings to other potential areas
Experiment Design: Re-use of assets from previous Pace work
• Data set
• Features/feature extraction
• Tests
• Optimal settings
Experiment Design: Future inclusion?
Experiment Design: 6 Test scenarios (Dr. Mary Villani, Spring 2006. Used with permission)
Experiment Design: Feature set (Dr. Mary Villani, Spring 2006. Used with permission)
Experiment Design: Summary of Subject Participation
Subjects by Experiment
• 36 subjects: all four quadrants
• 52 subjects: 1. Copy Task
• 40 subjects: 2. Free Text
• 93 subjects: 3. Desktop
• 47 subjects: 4. Laptop
• 41 subjects: 5. Desk Copy / Lap Free
• 40 subjects: 6. Lap Copy / Desk Free
Source: Dr. Mary Villani, Spring 2006. Used with permission.
Experiment Design: Data/Sample Capture Application (Dr. Mary Villani, Spring 2006. Used with permission)
Experiment Design: Application Version 2.0, developed Fall 2006
• Development and implementation of 2 additional Fallback Models
• Tremendously enhanced testing functionality
• Development and implementation of Trace Mechanism
Experiment Design: New Bio Feature Extractor Interface
Experiment Design: New Classifier Interface
Experiment Design: High Level Overview of Fallback Models
• Heuristic: Linguistic Model, Touch Type Model
• Statistical: Statistical Model
• New models: Touch Type Model, Statistical Model
Experiment Design: Overview of Models
• Linguistic
• Touch Type
• Statistical
Experiment Design: Linguistic Fallback Model - Duration
Experiment Design: Linguistic Fallback Model - Transition
Experiment Design: Touch Type Fallback Model - Background
• Touch Type approach invented by Frank Edgar McGurrin in the late 1800s
  – Won speed contest on July 25, 1888
  – Was front page news
• Touch Type idea: use sense of touch rather than sight (looking at key label)
• Most keyboards still have a raised indicator on "f" and "j" to indicate home position
Experiment Design: Touch Type Fallback Model
Experiment Design: Touch Type Fallback Model - Duration
Tree diagram, from root to leaves:
• All Keys
• All Left Hand, All Right Hand
• Finger nodes as labeled on the slide: Left Little, Left Ring, Left Middle, Left Index, Right Middle, Right Ring, Right Little
• Individual keys: A Z Q 1 ; S X W D E 2 C 3 L F G B R 4 T V 5 H J M Y 6 U N 7 K I . , O 8 9 P / 0
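One way to read this tree is as a chain of progressively coarser pools of data: when a key has too few samples, its finger is used, then its hand, then all keys. The sketch below illustrates that idea only. The key-to-finger table is the conventional QWERTY touch-typing assignment (an assumption, since the slide's key placement did not survive extraction), and `min_samples` and the function names are hypothetical.

```python
from statistics import mean

# Conventional QWERTY touch-typing assignment (assumed, not taken from the slide).
FINGER_KEYS = {
    "Left Little": "QAZ1",
    "Left Ring": "WSX2",
    "Left Middle": "EDC3",
    "Left Index": "RFVTGB45",
    "Right Index": "YHNUJM67",
    "Right Middle": "IK,8",
    "Right Ring": "OL.9",
    "Right Little": "P;/0",
}
KEY_TO_FINGER = {key: finger for finger, keys in FINGER_KEYS.items() for key in keys}


def fallback_chain(key):
    """Nodes to try for a key, from most specific to most general."""
    key = key.upper()
    finger = KEY_TO_FINGER[key]
    hand = "All Left Hand" if finger.startswith("Left") else "All Right Hand"
    return [key, finger, hand, "All Keys"]


def duration_estimate(key, samples, min_samples=5):
    """Mean duration from the first node in the chain with enough samples.

    samples: dict mapping a key to a list of observed durations (ms).
    Returns (node_used, mean_duration), or (None, None) if even the
    root node has fewer than min_samples observations.
    """
    for node in fallback_chain(key):
        pooled = [
            d
            for k, durations in samples.items()
            if k.upper() in KEY_TO_FINGER and node in fallback_chain(k)
            for d in durations
        ]
        if len(pooled) >= min_samples:
            return node, mean(pooled)
    return None, None


if __name__ == "__main__":
    samples = {"A": [100, 110], "Q": [90, 95, 105], "F": [80] * 5, "J": [70]}
    print(duration_estimate("A", samples))  # pools A and Q at the finger level
    print(duration_estimate("J", samples))  # falls back to the root node
```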
Experiment Design: Touch Type Fallback Model - Transition
Tree diagram, from root to leaves:
• Letter/letter
• Left/left, Right/right, Left/right, Right/left
• Letter pairs: I/N, O/N, N/D, O/R, H/E, E/R, S/T, A/T, E/S, R/E, T/H, E/A, A/N, E/N, T/I
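The middle level of this tree groups each letter pair by which hand types each letter. A minimal sketch of that grouping is shown below, assuming the standard QWERTY split between the hands; the function names are hypothetical.

```python
import string

# Letters typed with the left hand on a standard QWERTY layout.
LEFT_HAND = set("QWERTASDFGZXCVB")


def hand_of(letter):
    """Return "Left" or "Right" for a single ASCII letter."""
    letter = letter.upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"not a single ASCII letter: {letter!r}")
    return "Left" if letter in LEFT_HAND else "Right"


def transition_chain(first, second):
    """Nodes for a letter pair, from the pair itself up to the root."""
    pair = f"{first.upper()}/{second.upper()}"
    hands = f"{hand_of(first)}/{hand_of(second).lower()}"
    return [pair, hands, "Letter/letter"]


if __name__ == "__main__":
    for first, second in ("TH", "IN", "ND", "ER"):
        print(transition_chain(first, second))
```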
Experiment Design: Statistical Fallback Model
• For Duration – Mean Imputation
• For Transition – Multiple Imputation
  – Mean and standard deviation calculated on the full transition data set
  – Any value more than 1 standard deviation from the mean was removed
  – New mean and standard deviation calculated on the remaining data
  – Process repeated 3 times
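The transition steps above can be sketched as a short loop. This is an interpretation under stated assumptions: the slide does not say whether the population or sample standard deviation was used (the population form is used here), and the function name, parameters, and early-exit conditions are choices made for this example.

```python
from statistics import mean, pstdev


def outlier_wash(values, passes=3, k=1.0):
    """Repeatedly drop values more than k standard deviations from the mean.

    Returns (mean, standard deviation, number of values kept) after at
    most `passes` rounds. Stops early if the data can no longer be trimmed.
    """
    data = list(values)
    if not data:
        raise ValueError("values must be non-empty")
    for _ in range(passes):
        if len(data) < 2:
            break
        centre = mean(data)
        spread = pstdev(data)  # population SD: an assumption
        if spread == 0:
            break
        kept = [v for v in data if abs(v - centre) <= k * spread]
        if not kept:
            break
        data = kept
    return mean(data), pstdev(data), len(data)


if __name__ == "__main__":
    # Hypothetical transition times in milliseconds with one extreme value.
    transitions = [100, 102, 98, 101, 99, 500]
    print(outlier_wash(transitions, passes=1))
    print(outlier_wash(transitions, passes=3))
```

With a 1 standard deviation cutoff, each pass discards a sizeable share of the remaining data, so three passes compact the sample considerably.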
Experiment Design: Statistical Fallback Model – Duration Clusters
Experiment Design: Statistical Fallback Model - Duration
Tree diagram, from root to leaves:
• All Keys
• UNDER 100, OVER 100
• NODE A, NODE B
• CLUSTER 1 through CLUSTER 9
• Individual keys: B Y H G U N A ' S I E D W R . O C T P F M , - L
Experiment Design: Statistical Fallback Model – Transition development
Figure: Data compacting process (sample size; % of sample left after outlier wash)
Experiment Design: Statistical Fallback Model – Transition, Raw Order
Experiment Design: Statistical Fallback Model – Transition, Cluster Development
Experiment Design: Statistical Fallback Model - Transition
Tree diagram, from root to leaves:
• Any/Any
• Over 50, Under 50
• Node A, Node B, Node C, Node D
• Node 1, Node 2, Node 3, Node 4
• Letter pairs: E-R, T-H, R-E, A-N, H-E, E-S, A-T, T-I, E-A, S-T, N-D, O-R, O-N, E-N, I-N
Outcomes: Results Comparison
Analysis: Fallback Trace
Experiment Design: Linguistic Fallback Model – Duration (repeat of previous)
Analysis: Proposed Second Generation Touch Type Fallback Model - Duration
• Red circles remain as leaves
• All else falls back to next level
Tree diagram, from root to leaves:
• All Keys
• All Left Hand, All Right Hand
• Finger nodes as labeled on the slide: Left Little, Left Ring, Left Middle, Left Index, Right Middle, Right Ring, Right Little
• Individual keys: A Z Q P S W D X C E L F G B R T V H J M Y U N K I O
Future Work: Two Main Areas
• Academic
  – Hybrid system development – keystroke, mouse movement, stylistic
  – Principal Components
  – Eigenvalues
• Application
  – For Keystroke Biometric system:
    • Academic – online testing
    • Biometric Marketing
  – For general missing data: analytical applications
Future Work: Key success factors to system acceptance
• Robustness – level of trust
• Acceptance Level – support by third party processes
• Cost – hardware/software, communications and support
• Ease of Use/Portability – extent of support across client machines
• Security – privacy, integrity, and non-repudiation
"future research into the use of biometric technology in online marketing applications must consider not only technical feasibility, but also social and legal acceptability."
Future Work: Biometric Marketing
• Use of biometric technology to identify and segment users/consumers
• What you have to believe:
  – Segmentation is better
  – Short + short = long for sampling
• Chat rooms, e-mails etc.
Future Work: Analytical Applications
• Currently growing in use and acceptance
• It can be assumed that the missing data problem is present
  – SAP considers <10% a non-factor
  – IBM identifies missing data, but does not manage it
  – Case deletion most prevalent
  – Advanced strategies not identified
Future Work: Analytical Applications - Examples
Future Work: Analytical Applications - Examples
Future Work: Analytical Applications - Examples
Categories: Customer Management; Merchandising Management; Products & Services Management; Store Operations Management; Corporate Finance Management
Analyses, in slide order:
• Campaign & Promotion Analysis • Cross Purchase Behavior • Cross Sell Analysis • Customer Attrition Analysis • Customer Complaints Analysis • Customer Credit Risk Profile • Customer Delinquency Analysis • Customer Interaction Analysis
• Assortment and Allocation Analysis • Inventory Analysis • Physical Merchandising / Space Management Analysis • Pricing Analysis • Promotion Analysis
• Business Performance Analysis • Planning and Forecasting Analysis • Product Profitability
• Customer Lifetime Value Analysis • Customer Loyalty • Customer Movement Dynamics • Customer Profile Analysis • Customer Profitability • Involved Party Exposure • Lead Analysis • Market Analysis
• Activity Based Costing Analysis • Location Exposure • Location Profitability • Loss Prevention Analysis
• Capital Allocation Analysis • Credit Risk Analysis • Service Delivery Analysis • Transaction Profitability Analysis • Vendor Performance Analysis • Non Performing Loan Analysis • Organization Unit Profitability • Performance Measurement • Staffing Analysis • Store Location Analysis • Store Optimization Analysis • Suspicious Activity Analysis • Financial Management Accounting • Income Analysis


