Number of slides: 23
Data and Applications Security Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #21: Privacy
March 29, 2005

Outline
* Data Mining and Privacy: Review
* Some Aspects of Privacy
* Revisiting Privacy Preserving Data Mining
* Platform for Privacy Preferences
* Challenges and Discussion

Some Privacy Concerns
* Medical and Healthcare
  - Employers, marketers, or others knowing of private medical concerns
* Security
  - Allowing access to an individual’s travel and spending data
  - Allowing access to web surfing behavior
* Marketing, Sales, and Finance
  - Allowing access to an individual’s purchases

Data Mining as a Threat to Privacy
* Data mining gives us “facts” that are not obvious to human analysts of the data
* Can general trends across individuals be determined without revealing information about individuals?
* Possible threats: combining collections of data to infer information that is private
  - Disease information from prescription data
  - Military action from pizza deliveries to the Pentagon
* Need to protect the associations and correlations between the data that are sensitive or private

Some Privacy Problems and Potential Solutions
* Problem: Privacy violations that result from data mining
  - Potential solution: Privacy-preserving data mining
* Problem: Privacy violations that result from the inference problem
  - Inference is the process of deducing sensitive information from the legitimate responses received to user queries
  - Potential solution: Privacy constraint processing
* Problem: Privacy violations due to unencrypted data
  - Potential solution: Encryption at different levels
* Problem: Privacy violations due to poor system design
  - Potential solution: Develop a methodology for designing privacy-enhanced systems

Some Directions: Privacy Preserving Data Mining
* Prevent useful results from mining
  - Introduce “cover stories” to give “false” results
  - Make only a sample of the data available so that an adversary is unable to come up with useful rules and predictive functions
* Randomization
  - Introduce random values into the data and/or results
  - The challenge is to introduce random values without significantly affecting the data mining results
  - Give a range of values for results instead of exact values
* Secure Multi-party Computation
  - Each party knows its own inputs; encryption techniques are used to compute the final results

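The secure multi-party computation idea on this slide can be illustrated with a secure sum. This is a minimal sketch under stated assumptions, not the lecture's method: it swaps in additive secret sharing in place of encryption, assumes every party follows the protocol honestly, and the names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical.

```python
import secrets

# Assumption: a public modulus larger than any total the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate all parties in one process; a real protocol would use a network."""
    n = len(private_inputs)
    # Row i holds the shares party i hands out; column j is what party j receives.
    distributed = [make_shares(v, n, modulus) for v in private_inputs]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return sum(partial_sums) % modulus


# Three parties each hold a private count; only the total is revealed.
print(secure_sum([120, 45, 300]))  # prints 465
```

Each individual share is a uniformly random number, so a party that sees one share learns nothing about the input it came from; only the combined total is meaningful.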
Some Directions: Privacy Problem as a Form of Inference Problem
* Privacy constraints
  - Content-based constraints; association-based constraints
* Privacy controller
  - Augment a database system with a privacy controller for constraint processing and examine the releasability of data/information (e.g., release constraints)
* Use of conceptual structures to design applications with privacy in mind (e.g., privacy preserving database and application design)
* The web makes the problem much more challenging than the inference problem we examined in the 1990s!
* Is the general privacy problem unsolvable?

Privacy Constraint Processing
* Privacy constraint processing
  - Based on prior research in security constraint processing
  - Simple constraint: an attribute of a document is private
  - Content-based constraint: if a document contains information about X, then it is private
  - Association-based constraint: two or more documents taken together are private; individually each document is public
  - Release constraint: after X is released, Y becomes private
* Augment a database system with a privacy controller for constraint processing

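The four constraint types on this slide can be sketched as checks performed by a small privacy controller. This is an illustrative toy, not the architecture from the lecture: the classes `Document` and `PrivacyController`, their fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    attributes: dict
    topics: frozenset = frozenset()


@dataclass
class PrivacyController:
    private_attributes: set = field(default_factory=set)  # simple constraints
    private_topics: set = field(default_factory=set)      # content-based
    private_groups: list = field(default_factory=list)    # association-based
    release_rules: dict = field(default_factory=dict)     # X -> ids private after X
    released: set = field(default_factory=set)            # ids released so far

    def release(self, doc):
        """Return the releasable view of doc, or None if it must be withheld."""
        # Content-based: the document is about a private topic.
        if doc.topics & self.private_topics:
            return None
        # Release constraint: an earlier release made this document private.
        for earlier in self.released:
            if doc.doc_id in self.release_rules.get(earlier, set()):
                return None
        # Association-based: releasing it would complete a private combination.
        after = self.released | {doc.doc_id}
        if any(group <= after for group in self.private_groups):
            return None
        self.released.add(doc.doc_id)
        # Simple constraint: strip private attributes from what is returned.
        return {k: v for k, v in doc.attributes.items()
                if k not in self.private_attributes}


controller = PrivacyController(
    private_attributes={"ssn"},
    private_topics={"diagnosis"},
    private_groups=[frozenset({"travel", "purchases"})],
    release_rules={"roster": {"schedule"}},
)
print(controller.release(Document("travel", {"name": "John", "ssn": "000"})))
print(controller.release(Document("purchases", {"item": "book"})))
print(controller.release(Document("chart", {"note": "x"}, frozenset({"diagnosis"}))))
print(controller.release(Document("roster", {"team": "A"})))
print(controller.release(Document("schedule", {"day": "Monday"})))
```

The first call returns the travel record without the `ssn` attribute. The purchases record is then withheld because, together with the already released travel record, it would complete a private combination. The chart is withheld by topic, and the schedule is withheld because the roster was released first.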
Architecture for Privacy Constraint Processing
[Figure: architecture diagram; the recoverable component labels are listed below]
* User Interface Manager
* Constraint Manager (Privacy Constraints)
* Query Processor: constraints during query and release operations
* Database Design Tool: constraints during database design operation
* Update Processor: constraints during update operation
* DBMS
* Database

Semantic Model for Privacy Control
[Figure: semantic network; dark lines/boxes contain private information]
* Labels in the figure: Patient John; Has disease; Cancer; Influenza; address; John’s address; Travels frequently; England

Some Directions: Encryption for Privacy
* Encryption at various levels
  - Encrypting the data as well as the results of data mining
  - Encryption for multi-party computation
* Encryption for untrusted third-party publishing
  - Owner enforces privacy policies
  - Publisher gives the user only those portions of the document he/she is authorized to access
  - Combination of digital signatures and Merkle hash to ensure privacy

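The Merkle hash part of third-party publishing can be sketched as follows: the owner hashes the document portions into a tree, the publisher hands a user one portion plus the sibling hashes along its path, and the user recomputes the root. This is a minimal sketch with hypothetical helper names (`build_levels`, `make_proof`, `verify`); the digital signature step is not shown, so the root is simply assumed to have been signed by the owner and obtained by the user.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(portions):
    """Return every tree level, from the leaf hashes up to the single root."""
    level = [h(p) for p in portions]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd level: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Publisher side: collect sibling hashes for the portion at index."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, index % 2 == 0))  # True: sibling sits on the right
        index //= 2
    return proof


def verify(portion, proof, root):
    """User side: recompute the root from one portion and its sibling hashes."""
    node = h(portion)
    for sibling_hash, sibling_on_right in proof:
        node = h(node + sibling_hash) if sibling_on_right else h(sibling_hash + node)
    return node == root


portions = [b"name: John", b"address: England", b"disease: influenza"]
levels = build_levels(portions)
root = levels[-1][0]                      # assumed to be signed by the owner

proof = make_proof(levels, 1)             # the user may see only portion 1
print(verify(portions[1], proof, root))           # True
print(verify(b"address: elsewhere", proof, root)) # False
```

The user checks authenticity of the authorized portion without receiving the other portions, only their hashes.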
Some Directions: Methodology for Designing Privacy Systems
* Jointly develop privacy policies with policy specialists
* Specification language for privacy policies
* Generate privacy constraints from the policy and check the constraints for consistency
* Develop a privacy model
* Privacy architecture that identifies privacy-critical components
* Design and develop privacy enforcement algorithms
* Verification and validation

Data Mining and Privacy: Friends or Foes?
* They are neither friends nor foes
* Need advances in both data mining and privacy
* Need to design flexible systems
  - For some applications one may have to focus entirely on “pure” data mining, while for others there may be a need for “privacy-preserving” data mining
  - Need flexible data mining techniques that can adapt to changing environments
* Technologists, legal specialists, social scientists, policy makers and privacy advocates MUST work together

Aspects of Privacy
* Privacy Preserving Databases
  - Privacy Constraint Processing
* Privacy Preserving Networks
  - Sensor networks, - - -
* Privacy Preserving Surveillance
  - RFID
* Privacy Preserving Semantic Web
  - XML, RDF, - - -
* Privacy Preserving Data Mining

Revisiting Privacy Preserving Data Mining
* Association Rules
  - Privacy Preserving Association Rule Mining
    * IBM, - - -
* Decision Trees
  - Privacy Preserving Decision Trees
    * IBM, - - -
* Clustering
  - Privacy Preserving Clustering
    * Purdue, - - -
* Link Analysis
  - Privacy Preserving Link Analysis
    * UTD, - - -

Privacy Preserving Data Mining: Agrawal and Srikant (IBM)
* Value distortion
  - Introduce a value Xi + r instead of Xi, where r is a random value drawn from some distribution
    * Uniform, Gaussian
* Quantifying privacy
  - Introduce a measure based on how closely the original values of a modified attribute can be estimated
* Challenge is to develop appropriate models
  - Develop a training set based on the perturbed data
* Evolved from the inference problem in statistical databases

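Value distortion as described on this slide can be sketched in a few lines: each value Xi is released as Xi + r, with r drawn from a uniform or Gaussian distribution. The function name `distort`, its parameters, and the synthetic age data are hypothetical, and the sketch shows only the perturbation step, not the reconstruction of the original distribution or the training of a model on perturbed data.

```python
import random
import statistics


def distort(values, spread, method="uniform", rng=None):
    """Return values with independent random noise added to each entry.

    uniform:  r is drawn from [-spread, +spread]
    gaussian: r is drawn from a normal distribution with mean 0 and
              standard deviation spread
    """
    rng = rng or random.Random()
    if method == "uniform":
        return [x + rng.uniform(-spread, spread) for x in values]
    if method == "gaussian":
        return [x + rng.gauss(0.0, spread) for x in values]
    raise ValueError(f"unknown method: {method}")


rng = random.Random(2005)  # fixed seed only so the demo is repeatable
ages = [rng.randint(20, 70) for _ in range(10_000)]  # stand-in sensitive attribute
released = distort(ages, spread=20, method="uniform", rng=rng)

# A single released value only pins the original down to an interval of
# width 2 * spread, while aggregates such as the mean stay close.
print("one original value:", ages[0], "released as:", round(released[0], 1))
print("true mean:", round(statistics.mean(ages), 2))
print("mean of released values:", round(statistics.mean(released), 2))
```

A wider `spread` makes individual values harder to estimate but also adds more noise to the mining results, which is the trade-off the slide refers to when it mentions quantifying privacy.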
Platform for Privacy Preferences (P3P): What Is It?
* P3P is an emerging industry standard that enables web sites to express their privacy practices in a standard format
* Policies in this format can be automatically retrieved and understood by user agents
* It is a product of the W3C (World Wide Web Consortium), www.w3c.org
* Main difference between privacy and security
  - User is informed of the privacy policies
  - User is not informed of the security policies

Platform for Privacy Preferences (P3P): Key Points
* When a user enters a web site, the privacy policies of the web site are conveyed to the user
* If the privacy policies differ from the user’s preferences, the user is notified
* The user can then decide how to proceed

Platform for Privacy Preferences (P3P): Organizations
* Several major corporations are working on P3P standards, including:
  - Microsoft
  - IBM
  - HP
  - NEC
  - Nokia
  - NCR
* Web sites have also implemented P3P
* Semantic web group has adopted P3P

Platform for Privacy Preferences (P3P): Specifications
* Initial version of P3P used RDF to specify policies
* Recent version has migrated to XML
* P3P policies use XML with namespaces for encoding policies
* Example: Catalog shopping
  - Your name will not be given to a third party, but your purchases will be given to a third party

Platform for Privacy Preferences (P3P): Specifications (Concluded)
* P3P has its own statements and data types expressed in XML
* P3P schemas utilize XML schemas
* XML is a prerequisite to understanding P3P
* P3P specification released in January 2005 uses the catalog shopping example to explain concepts
* P3P is an international standard and is an ongoing project

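The catalog shopping example, and the comparison of a site's policy with user preferences, can be sketched with a small XML fragment. This is illustrative only: the fragment is a simplified stand-in that loosely imitates the shape of a P3P policy, omits namespaces, and is not a valid P3P document. The recipient tags `ours` and `third-party`, the data reference `#purchase.history`, and the functions `recipients_by_data` and `conflicts` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical policy: the name stays with the site,
# purchases are also shared with a third party.
POLICY_XML = """
<POLICY name="catalog-shopping">
  <STATEMENT>
    <RECIPIENT><ours/></RECIPIENT>
    <DATA-GROUP><DATA ref="#user.name"/></DATA-GROUP>
  </STATEMENT>
  <STATEMENT>
    <RECIPIENT><ours/><third-party/></RECIPIENT>
    <DATA-GROUP><DATA ref="#purchase.history"/></DATA-GROUP>
  </STATEMENT>
</POLICY>
"""


def recipients_by_data(policy_xml):
    """Map each data reference to the set of recipients the site declares."""
    root = ET.fromstring(policy_xml)
    sharing = {}
    for statement in root.iter("STATEMENT"):
        recipients = {child.tag for child in statement.find("RECIPIENT")}
        for data in statement.iter("DATA"):
            sharing.setdefault(data.get("ref"), set()).update(recipients)
    return sharing


def conflicts(sharing, preferences):
    """Return the recipients the user has not accepted, per data reference."""
    result = {}
    for ref, declared in sharing.items():
        unwanted = declared - preferences.get(ref, set())
        if unwanted:
            result[ref] = unwanted
    return result


# A user agent configured to accept sharing with the site itself only.
user_preferences = {"#user.name": {"ours"}, "#purchase.history": {"ours"}}

mismatch = conflicts(recipients_by_data(POLICY_XML), user_preferences)
print(mismatch)  # {'#purchase.history': {'third-party'}}
```

A non-empty result corresponds to the case where the user is notified and decides how to proceed.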
P3P and Legal Issues
* P3P does not replace laws
* P3P works together with the law
* What happens if web sites do not honor their P3P policies?
  - Then appropriate legal actions will have to be taken
* XML is the technology used to specify P3P policies
* Policy experts will have to specify the policies
* Technologists will have to develop the specifications
* Legal experts will have to take action if the policies are violated

Challenges and Discussion
* Technology alone is not sufficient for privacy
* We need technologists, policy experts, legal experts and social scientists to work on privacy
* Some well-known people have said “Forget about privacy”
* Should we pursue working on privacy?
  - Interesting research problems
  - Interdisciplinary research
  - Something is better than nothing
  - Try to prevent privacy violations
  - If violations occur, then prosecute
* Discussion?


