  • Number of slides: 54

Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications
Mark Ritzmann
Pace University
March 17, 2007

Contents
• Overview
• Essence and Significance of Work
• Experiment Design
• Outcomes
• Analysis
• Future Work

Overview: Essence of this work
• Address the problem of missing or incomplete data and put forth strategies to overcome that problem
• Add to the accuracy of the existing Keystroke Biometric Recognition System
• Apply findings to other application areas

Overview: The Impact of Missing Data
• <1% considered trivial
• 1-5% considered manageable
• 5-15% requires sophisticated methods
• >15% may severely impact any interpretation
Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
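
The bands above can be read as a simple lookup on the share of missing entries. The sketch below is illustrative only: the function name `missing_data_severity`, the use of `None` as the missing marker, and the treatment of exact boundary values (1%, 5%, 15%) are assumptions, since the slide does not say which band a boundary falls into.

```python
def missing_data_severity(values):
    """Return (percent_missing, label) for a list where None marks a missing entry."""
    if not values:
        raise ValueError("values must not be empty")
    pct = 100.0 * sum(v is None for v in values) / len(values)
    # Bands follow the slide; which band owns an exact boundary is an assumption.
    if pct < 1:
        label = "trivial"
    elif pct <= 5:
        label = "manageable"
    elif pct <= 15:
        label = "requires sophisticated methods"
    else:
        label = "may severely impact interpretation"
    return pct, label
```

For example, a sample of ten keystroke durations with one missing value is 10% missing and would fall into the band that calls for sophisticated methods.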

Overview: Missing Data Mechanisms
• MCAR – Missing Completely at Random
• MAR – Missing at Random
• NMAR – Not Missing at Random
Most missing data treatment methods assume the data are MAR.
Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
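
The three mechanisms differ in what the chance of a value being missing depends on: nothing in the data (MCAR), only observed variables (MAR), or the unobserved value itself (NMAR). A minimal simulation sketch follows. The field names `duration` and `laptop`, the cutoff of 100, and the drop rate are hypothetical and are not taken from the study.

```python
import random

def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of `rows` with some 'duration' values replaced by None.

    rows: list of dicts with keys 'duration' (number) and 'laptop' (bool).
    Both field names are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        if mechanism == "MCAR":
            # Missingness ignores every data value.
            eligible = True
        elif mechanism == "MAR":
            # Missingness depends only on an observed variable.
            eligible = row["laptop"]
        elif mechanism == "NMAR":
            # Missingness depends on the value that goes missing.
            eligible = row["duration"] > 100
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        drop = eligible and rng.random() < rate
        out.append({**row, "duration": None if drop else row["duration"]})
    return out
```

Under NMAR in this sketch only long durations can disappear, so any mean computed from the surviving values is biased low, which is why methods that assume MAR can mislead in that case.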

Overview: Missing Data Treatment, High Level
Heuristic
• Based on established rules and guidelines
• Similar to an expert system
• Association is prime example
Statistical
• Existing data used to calculate missing data
• Care needs to be taken not to overfit
• Mean/mode is prime example

Overview: Missing Data Treatment Methods
• Case Deletion
• Parameter Estimation
• Mean/Mode Imputation
• Method of Assigning All Possible Values of the Attribute
• Regression Imputation
• Hot Deck Imputation and Cold Deck Imputation
• Multiple Imputation
• K-Nearest Neighbor Imputation
• Internal Treatment Method
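
Two of the listed methods, case deletion and mean/mode imputation, are simple enough to sketch in a few lines. The code below is an illustration under assumed conventions (rows as dicts, `None` as the missing marker, hypothetical function names); it is not the implementation used in the study.

```python
from statistics import mean, mode

def case_deletion(rows):
    """Drop every row that has at least one missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_mode_imputation(rows):
    """Fill None with the column mean (numeric columns) or mode (all others)."""
    filled = [dict(r) for r in rows]  # copy so the input is left untouched
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue  # nothing to estimate from
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        fill = mean(observed) if numeric else mode(observed)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled
```

Case deletion shrinks the sample, while mean/mode imputation keeps every row at the cost of reducing the variance of the filled column.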

Overview: Biometric background
• Roots in CIA & Dept of Defense work
• Early Issues – technology, cost, lack of standards
• Basic Uses
  – Verification (easier of the two; yes/no)
  – Identification (harder of the two; 1 of n)
• Basic types
  – Physiological – generally do not change
  – Behavioral – can change, easier to mimic

Overview: Biometric Issues
Diagram: BIOMETRICS: CHALLENGES & CAVEATS
Categories: Business, People, Legal & Regulatory, Operational, Technical
Items:
• User confidence
• Privacy issues
• User preferences
• User acceptance
• User profile
• Trust
• Lack of precedence
• Ambiguous process
• Imprecise definition
• Logistics of proof of defense
• Financial feasibility
• Interaction with traditional controls
• Application not subject to rigor
• Incompatibility with business partners
• Transition to e-business
• Control locus
• Adaptation
• Hardware
• Evolving nature of technology
• Scattered proliferation & polarization
• Uniqueness of biometric
• Scalability
• Lab vs Field
• Scalability
• Continuous Authentication
• Security System
• Business Process
• Design
• Control
• Enrollment Challenge
• System Downtime
• Availability of template database
• Effects of malicious code
Source: A. Chandra & T. Calderon, Challenges and Constraints to the Diffusion of Biometrics in Information Systems, Communications of the ACM, December 2005, Vol 48, No 12

Overview: Privacy Issues – special mention
• Opt in/Opt out
  – Any application or web site that used this system would need to do so with full disclosure. The user could then knowingly decide.
• Dictated environment
  – Any corporate or instructional e-mail system where the ultimate ownership of the keystroke resides with that entity
• Capture results, not text itself
  – Use keystrokes to authenticate/identify, not the words themselves or the intact messages

Overview: Keyboard Biometric Studies in the Literature
• Key Concepts
  – Copy vs Free
  – Authentication vs Identification
• Classic Studies
  – Gaines, 1980
  – Umphress & Williams, 1985
  – Leggett & Williams, 1988
  – Joyce & Gupta, 1990
  – Bleha et al, 1990
  – Brown & Rogers, 1993
• Recent Studies – University of Torino
• Pace University contributions

Essence and Significance of Work: High Level Objectives
1. Develop strategies to manage the significant problem of missing or incomplete data
2. Improve the accuracy of the current Keystroke Biometric Recognition System
3. Apply findings to other areas

Essence and Significance of Work: Detailed Objectives
First Objective: Gain insight as to the effectiveness and application of MISSING DATA strategies and decision making with incomplete information
Second Objective: Improve the accuracy of the current Keystroke Biometric Recognition system by improving the FALLBACK model invoked when a sample is of insufficient size
Third Objective:
• Identify a potential application for a Keystroke Biometric recognition system
• Project the findings to other potential areas

Experiment Design: Re-use of assets from previous Pace work
• Data set
• Features/feature extraction
• Tests
• Optimal settings

Experiment Design: Future inclusion? (figure)

Experiment Design: 6 Test scenarios (figure)
Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Feature set (figure)
Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Summary of Subject Participation
Subjects by Experiment
• All four quadrants: 36 subjects
• 1. Copy Task: 52 subjects
• 2. Free Text: 40 subjects
• 3. Desktop: 93 subjects
• 4. Laptop: 47 subjects
• 5. Desk Copy / Lap Free: 41 subjects
• 6. Lap Copy / Desk Free: 40 subjects
Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Data/Sample Capture Application (figure)
Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Application Version 2.0 - developed Fall, 2006
• Development and Implementation of 2 additional Fallback Models
• Tremendously enhanced Testing functionality
• Development and Implementation of Trace Mechanism

Experiment Design: New Bio Feature Extractor Interface (figure)

Experiment Design: New Classifier Interface (figure)

Experiment Design: High Level Overview of Fallback Models
• Heuristic: Linguistic Model, Touch Type Model
• Statistical: Statistical Model
• New Models: Touch Type Model, Statistical Model

Experiment Design: Overview of Models
• Linguistic
• Touch Type
• Statistical

Experiment Design: Linguistic Fallback Model - Duration (figure)

Experiment Design: Linguistic Fallback Model - Transition (figure)

Experiment Design: Touch Type Fallback Model - Background
• Touch Type approach invented by Frank Edgar McGurrin in late 1800s
  – Won speed contest on July 25, 1888
  – Was front page news
• Touch Type Idea - use sense of touch rather than sight (looking at key label)
• Most keyboards still have raised indicator on “f” and “j” to indicate home position

Experiment Design: Touch Type Fallback Model (figure)

Experiment Design: Touch Type Fallback Model - Duration (tree diagram)
• Root: All Keys
• Hands: All Left Hand, All Right Hand
• Fingers: Left Little, Right Little, Left Ring, Left Middle, Left Index, Right Middle, Right Ring
• Keys shown: A Z Q 1 ; S X W D E 2 C 3 L F G B R 4 T V 5 H J M Y 6 U N 7 K I . , O 8 9 P / 0
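
The tree suggests a lookup that moves from a key to its finger, then its hand, then all keys until enough samples are available. The sketch below illustrates that idea only. The finger assignment is the conventional touch-typing layout, not the grouping on the slide, and `min_samples`, the function name, and the use of a pooled mean at each level are all assumptions.

```python
from statistics import mean

# Conventional touch-typing finger assignment (an assumption; not read from the slide).
FINGER_KEYS = {
    "left little": "qaz1", "left ring": "wsx2", "left middle": "edc3",
    "left index": "rfvtgb45", "right index": "yhnujm67",
    "right middle": "ik,8", "right ring": "ol.9", "right little": "p;/0",
}
KEY_TO_FINGER = {k: f for f, keys in FINGER_KEYS.items() for k in keys}

def fallback_duration(key, samples, min_samples=3):
    """Return (level, mean duration) from the most specific level with enough data.

    samples: dict mapping a lower-case key to its list of observed durations.
    min_samples is a hypothetical threshold for "sufficient size".
    """
    finger = KEY_TO_FINGER[key]
    hand = finger.split()[0]
    levels = [
        ("key", [key]),
        ("finger", list(FINGER_KEYS[finger])),
        ("hand", [k for k, f in KEY_TO_FINGER.items() if f.startswith(hand)]),
        ("all keys", list(KEY_TO_FINGER)),
    ]
    for name, keys in levels:
        pooled = [d for k in keys for d in samples.get(k, [])]
        if len(pooled) >= min_samples:
            return name, mean(pooled)
    return None, None  # not enough data at any level
```

With samples `{"a": [100, 110], "q": [90]}`, a request for "a" has too few observations of its own and is answered at the finger level by pooling "a" and "q".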

Experiment Design: Touch Type Fallback Model - Transition (tree diagram)
• Root: Letter/letter
• Hand pairs: Left/left, Right/right, Left/right, Right/left
• Transitions shown: I/N O/N N/D O/R H/E E/R S/T A/T E/S R/E T/H E/A A/N E/N T/I
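
One way to read this tree is that a two-letter transition falls back first to its hand pair and then to the Letter/letter root. The sketch below derives that chain for a given transition. The hand assignment is the conventional touch-typing split of the letter keys, which is an assumption, and the function names are hypothetical.

```python
# Conventional touch-typing hand split for letters (an assumption).
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")

def hand_of(letter):
    """Return 'left' or 'right' for a letter key."""
    letter = letter.lower()
    if letter in LEFT_HAND:
        return "left"
    if letter in RIGHT_HAND:
        return "right"
    raise ValueError(f"not a letter key: {letter!r}")

def transition_fallback_chain(first, second):
    """List the tree levels for a transition, most specific first."""
    pair = f"{hand_of(first)}/{hand_of(second)}".capitalize()
    return [f"{first.upper()}/{second.upper()}", pair, "Letter/letter"]
```

For example, `transition_fallback_chain("t", "h")` yields the levels T/H, Left/right, and Letter/letter.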

Experiment Design: Statistical Fallback Model
• For Duration – Mean Imputation
• For Transition – Multiple Imputation
  – Mean and standard deviation calculated on transition full data set
  – Any value >1 standard deviation from the mean was removed
  – New mean and standard deviation calculated on remaining data
  – Process repeated 3 times
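
The transition steps listed above amount to an iterative trimming loop. A minimal sketch follows, assuming the population standard deviation (`statistics.pstdev`) and an inclusive cutoff at exactly one standard deviation; the slide specifies neither, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def outlier_wash(values, k=1.0, rounds=3):
    """Repeatedly drop values more than k standard deviations from the mean."""
    kept = list(values)
    for _ in range(rounds):
        if len(kept) < 2:
            break
        m, sd = mean(kept), pstdev(kept)  # population SD is an assumption
        if sd == 0:
            break  # all remaining values are identical
        kept = [v for v in kept if abs(v - m) <= k * sd]
    return kept

def imputed_transition(values):
    """Return (mean of the washed data, fraction of the sample retained)."""
    kept = outlier_wash(values)
    return mean(kept), len(kept) / len(values)
```

For the transition times `[10, 12, 11, 13, 12, 50]`, the first pass removes 50, the second removes 10 and 13, and the third removes 11, leaving `[12, 12]`. A one-standard-deviation cutoff is aggressive, so tracking the fraction retained is worthwhile.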

Experiment Design: Statistical Fallback Model – Duration Clusters (figure)

Experiment Design: Statistical Fallback Model - Duration (tree diagram)
• Root: All Keys
• Splits: UNDER 100, OVER 100
• Nodes: NODE A, NODE B
• Clusters: CLUSTER 1 through CLUSTER 9
• Keys shown: B Y H G U N A ‘ S I E D W R . O C T P F M , - L

Experiment Design: Statistical Fallback Model – Transition development (figure: Data Compacting process; labels: Sample Size, % of sample left after outlier wash)

Experiment Design: Statistical Fallback Model – Transition, Raw Order (figure)

Experiment Design: Statistical Fallback Model – Transition, Cluster Development (figure)

Experiment Design: Statistical Fallback Model - Transition (tree diagram)
• Root: Any/Any
• Splits: Over 50, Under 50
• Nodes: Node A, Node B, Node C, Node D, Node 1, Node 2, Node 3, Node 4
• Transitions shown: E-R T-H R-E A-N H-E E-S A-T T-I E-A S-T N-D O-R O-N E-N I-N

Outcomes: Results Comparison (figure)

Analysis: Fallback Trace (figure)

Experiment Design: Linguistic Fallback Model – Duration (repeat of previous) (figure)

Analysis: Proposed Second Generation Touch Type Fallback Model - Duration (tree diagram)
• Red circles remain as leaves
• All else falls back to next level
• Root: All Keys
• Hands: All Left Hand, All Right Hand
• Fingers: Left Little, Right Little, Left Ring, Left Middle, Left Index, Right Middle, Right Ring
• Keys shown: A Z Q P S W D X C E L F G B R T V H J M Y U N K I O

Future Work: Two Main Areas
• Academic
  – Hybrid System development – keystroke, mouse movement, stylistic
  – Principal Components
  – Eigenvalues
• Application
  – For Keystroke Biometric system:
      • Academic – online testing
      • Biometric Marketing
  – For General Missing data, analytical applications

Future Work: Key success factors to system acceptance
• Robustness – level of trust
• Acceptance Level – support by third party processes
• Cost – hardware/software, communications and support
• Ease of Use/Portability – extent of support across client machines
• Security – privacy, integrity, and non-repudiation
“future research into the use of biometric technology in online marketing applications must consider not only technical feasibility, but also social and legal acceptability.”

Future Work: Biometric Marketing
• Use of Biometric technology to identify and segment users/consumers
• What you have to believe:
  – Segmentation is better
  – Short + short = long for sampling
• Chat rooms, e-mails etc.

Future Work: Analytical Applications
• Currently growing in use and acceptance
• Can be assumed Missing Data problem is present
  – SAP considers <10% a non-factor
  – IBM identifies missing data, but does not manage
  – Case deletion most prevalent
  – Advanced strategies not identified

Future Work: Analytical Applications - Examples (figure)

Future Work: Analytical Applications - Examples (figure)

Future Work: Analytical Applications - Examples
Categories: Customer Management, Merchandising Management, Products & Services Management, Store Operations Management, Corporate Finance Management
Analyses listed:
• Campaign & Promotion Analysis
• Cross Purchase Behavior
• Cross Sell Analysis
• Customer Attrition Analysis
• Customer Complaints Analysis
• Customer Credit Risk Profile
• Customer Delinquency Analysis
• Customer Interaction Analysis
• Assortment and Allocation Analysis
• Inventory Analysis
• Physical Merchandising / Space Management Analysis
• Pricing Analysis
• Promotion Analysis
• Business Performance Analysis
• Planning and Forecasting Analysis
• Product profitability
• Customer Lifetime Value Analysis
• Customer Loyalty
• Customer Movement Dynamics
• Customer Profile Analysis
• Customer Profitability
• Involved Party Exposure
• Lead Analysis
• Market Analysis
• Activity Based Costing Analysis
• Location Exposure
• Location profitability
• Loss Prevention Analysis
• Capital Allocation Analysis
• Credit Risk Analysis
• Service Delivery Analysis
• Transaction Profitability Analysis
• Vendor Performance Analysis
• Non Performing Loan Analysis
• Organization Unit Profitability
• Performance Measurement
• Staffing Analysis
• Store Location Analysis
• Store Optimization Analysis
• Suspicious Activity Analysis
• Financial Management Accounting
• Income Analysis