Privacy-oriented Data Mining by Proof Checking
Stan Matwin (joint work with Amy Felty)
SITE, University of Ottawa, Canada
[email protected] uottawa.ca
UBC, 29/7/03

The TAMALE Group
• 4 profs
• Some 30 graduate students
• Areas: machine learning, data mining, text mining, NLP, data warehousing
• Research in
  – Inductive Logic Programming
  – Text mining
  – Learning in the presence of knowledge
  – Applications of ML/DM (e.g. in SE: tools for maintenance personnel)

• Why did I get into this research?
• what is already being done… and why it's not enough
• the main idea
• its operation
• discussion – 'correctness'
• prototype - Coq and CIC
• example
• some technical challenges
• acceptance?

Some useful concepts...
• opting out vs opting in
• Use Limitation Principle (ULP): data should be used only for the explicit purpose for which it has been collected

…and existing technical proposals
On the web: P3P (Platform for Privacy Preferences)
• W3C standard
• XML specifications - on websites and in browsers - of what can be collected and for what purpose - ULP?
• Handles cookies
• a data exchange protocol more than a privacy protocol: no provisions for opting out after an initial opt-in
• the ULP part is in natural language… not verifiable

Agrawal's data perturbation transformations
• data is perturbed by random distortion: x_i + r, with r uniform or Gaussian
• a procedure to reconstruct a PAC estimation of the original distribution (but not the values)
• a procedure to build an accurate decision tree on the perturbed distribution
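
For context, and not taken from the slide (the notation below is my own, following Agrawal and Srikant's randomization scheme as I recall it): each value is submitted in perturbed form, and the miner reconstructs only the distribution, roughly by an iterative Bayes update.

  w_i = x_i + r_i,   r_i ~ Uniform[-\alpha, \alpha]  or  r_i ~ N(0, \sigma^2)

  f_X^{j+1}(a) = \frac{1}{n} \sum_{i=1}^{n} \frac{f_R(w_i - a)\, f_X^{j}(a)}{\int f_R(w_i - z)\, f_X^{j}(z)\, dz}

Here f_R is the known noise density and f_X^{j} is the j-th estimate of the original density; only the distribution of X is recovered, never the individual x_i.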

Agrawal's transformations cont'd
• proposes a measure to quantify privacy – estimate intervals and their size
• lately extended to non-numerical attributes, and to association rules
• does not address the ULP
• how do we know it is applied?
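
The privacy measure in the first bullet can be made concrete; this is my paraphrase of the interval-based definition, so treat the wording as an assumption: if an original value can be estimated from the perturbed data to lie in an interval [\alpha_1, \alpha_2] with c% confidence, then the width \alpha_2 - \alpha_1 of that interval is the amount of privacy offered at confidence level c%.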

the main idea: towards a verifiable ULP
• User sets permissions: what can and cannot be done with her data
• Any claim that the software respects these permissions is a theorem about the software, and substantiating the claim is a proof of that theorem (see the sketch below)
• Verifying the claim is then checking that proof against the software
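
A minimal toy sketch of the second bullet, mine rather than the talk's, written in current Coq syntax; the names Program, Permission, never_doubles and S_example are all hypothetical. The point is only the shape: a permission is a predicate on programs, the claim "S respects PC" is that predicate applied to S, and the claim comes with a machine-checkable proof.

Require Import Lia.

(* A stand-in for the data-mining code S; a permission PC is a statement about programs. *)
Definition Program    := nat -> nat.
Definition Permission := Program -> Prop.

(* Toy permission: the program must never return the double of its input. *)
Definition never_doubles : Permission :=
  fun S => forall x, S x <> 2 * x.

(* A toy program: its output 2 * x + 1 is always odd, hence never equal to 2 * x. *)
Definition S_example : Program := fun x => 2 * x + 1.

(* The claim "S_example respects the permission" is an ordinary theorem,
   and Coq's kernel checks the proof object built below. *)
Theorem S_respects_PC : never_doubles S_example.
Proof.
  unfold never_doubles, S_example; intros x; lia.
Qed.

In the deck's real example (the last three slides) the program is a database Join and the permission quantifies over payroll and employee records.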

Who are the players?
• User C
• Data miner Org
• Data mining software developer Dev
• Independent verifier Veri
• …BUT no one owns the data D

D: database scheme
A: given set of database and data mining operations
S: source code for A
PC(D, A): C's permissions
T(PC, S): theorem that S respects PC
R(PC, S): proof of T(PC, S)
B: binary code of S
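
A point worth spelling out (it is implicit in the slide): in Coq's Calculus of Inductive Constructions, R(PC, S) is a term whose type is T(PC, S), so checking the proof against the theorem is just type checking. A tiny illustration, with placeholder T and R that are not the deck's actual theorem:

(* A theorem statement plays the role of T(PC, S)… *)
Definition T : Prop := forall n : nat, n <= n.

(* …and a proof term plays the role of R(PC, S). *)
Definition R : T := fun (n : nat) => le_n n.

(* "Checking the proof" amounts to asking the kernel whether R has type T. *)
Check (R : T).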

Discussion - properties
• It can be proven that C's permissions are respected (or not): PC is in fact a verifiable ULP
• PC can be negative (out) or positive (in)
• proof construction needs to be done only once for a given PC, D and A
• Scheme is robust against cheating by Dev or Org

Acceptance issues
• No Org will give Veri access to S
• Too much overhead to check R(PC, S) for each task, and each user
• Too cumbersome for C
• Based on all Orgs buying in

Acceptance 1: Veri's operation - access
• Veri needs
  – PC from C
  – R(S, PC) from Dev
  – S from Dev
  – B from Org
• Veri could check R(S, PC) at Dev's
• Veri needs to verify that S (which normally belongs to Dev) corresponds to the B that Org runs.

Acceptance 2: overhead
• Veri runs proof checking on a control (spot-check) basis
• Org's execution overhead?

Issues
• Naming the fields: XML or disclosure
• restricted class of theorems for a given PC - automating proof techniques for this class

Acceptance 3: C's perspective
• Building PCs must be easy for C, based on D and the processing schema; initially a closed set?
• permissions could be encoded on a credit card, a smart card, in the electronic wallet
• or in the CA; they can then be dynamically modified and revoked

« Political » aspects: who is Veri?
• generally trusted
  – « consumer association »?
  – « Ralph Nader »?
  – « transparency international »?
• IT expert, at the level of instrumenting and running the proof checker
  – connection to the Open Software Foundation?
• theorem proving can be cast as « better testing »

how to make Orgs buy in?
• a first Org needs to volunteer
• a Green Data Mining logo will be granted and administered (verified) by Veri
• other Orgs will have an incentive to join

Future work
• Build the tools
• expand the prototype
• extend from Weka to commercial data mining packages
• Integrate with P3P?
• find a willing Org

Link between S and B
• compilation not an option
• watermarking solution: B is watermarked by a slightly modified compiler with MD5(tar(S)), a 128-bit digest
• marks are inserted by a trusted makefile-and-compiler in locations in B given by Veri and unknown to Org

Link…
• Veri, given access to S, can verify that B corresponds to S
• An attack by I requires hacking the compiler
• An attack by Org requires knowing the locations of the watermarks

Example: C restricts her Employee data from participating in a join with her Payroll data

Record Payroll : Set := mkPay {PID : nat; JoinInd : bool; Position : string; Salary : nat}.
Record Employee : Set := mkEmp {Name : string; EID : nat; …}.
Record Combined : Set := mkComb {CID : nat; CName : string; Csalary : nat; …}.

Fixpoint Join [Ps : list Payroll] : (list Employee) -> (list Combined) :=
  [Es : list Employee]
  Cases Ps of
    nil => (nil Combined)
  | (cons p ps) => (app (check_JoinInd_and_find_employee_record p Es) (Join ps Es))
  end.

(check_JoinInd_and_find_employee_record p Es): if a record is found in Es whose EID matches p's PID and JoinInd permits the join, then a list of length 1 with the result of the join is returned; otherwise the empty list. (A runnable version in current Coq syntax is sketched below.)
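
For readers who want to run the example today, here is a self-contained sketch in current Coq syntax. Only the behaviour of the helper as described on the slide is authoritative; its implementation below, the exact record fields, and the way the Combined fields are filled in are assumptions of mine.

Require Import List String Arith.
Import ListNotations.

Record Payroll  : Set := mkPay  { PID : nat; JoinInd : bool; Position : string; Salary : nat }.
Record Employee : Set := mkEmp  { Name : string; EID : nat }.
Record Combined : Set := mkComb { CID : nat; CName : string; Csalary : nat }.

(* Returns a one-element list when JoinInd permits the join and an employee
   record with a matching ID exists; otherwise returns the empty list. *)
Definition check_JoinInd_and_find_employee_record
           (p : Payroll) (Es : list Employee) : list Combined :=
  if JoinInd p then
    match find (fun e => Nat.eqb (EID e) (PID p)) Es with
    | Some e => [ mkComb (PID p) (Name e) (Salary p) ]
    | None   => []
    end
  else [].

(* The join itself: concatenate the per-record results over the payroll list. *)
Fixpoint Join (Ps : list Payroll) (Es : list Employee) : list Combined :=
  match Ps with
  | []      => []
  | p :: ps => check_JoinInd_and_find_employee_record p Es ++ Join ps Es
  end.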

Definition Pc :=
  [S : (list Payroll) -> (list Employee) -> (list Combined)]
  (Ps : list Payroll) (Es : list Employee)
    (Unique_JoinInd Ps) ->
    (P : Payroll) (In P Ps) -> (JoinInd P) = false ->
      (not (EX C : Combined | (In C (S Ps Es)) /\ (CID C) = (PID P))).

• PC(S) is written as (Pc Join): Coq expands the definition of Pc and provides the theorem
• a request to Coq's proof-checking operator will check this proof, i.e. it will check that the user permissions are encoded into the given Join program
• Whole proof: 300 lines of Coq code; proof checking: 1 sec on a 600 MHz machine
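
Continuing the sketch given after the previous slide (same hypothetical records and Join, current Coq syntax): the permission can be rendered as a predicate on join programs. The reading of Unique_JoinInd as "payroll IDs occur at most once" is an assumption of mine, and the 300-line proof itself is not reproduced here, only the shape of the claim.

(* Assumed reading of Unique_JoinInd: payroll IDs occur at most once. *)
Definition Unique_JoinInd (Ps : list Payroll) : Prop := NoDup (map PID Ps).

(* The permission Pc as a predicate on join programs. *)
Definition Pc (S : list Payroll -> list Employee -> list Combined) : Prop :=
  forall (Ps : list Payroll) (Es : list Employee),
    Unique_JoinInd Ps ->
    forall P : Payroll,
      In P Ps -> JoinInd P = false ->
      ~ (exists C : Combined, In C (S Ps Es) /\ CID C = PID P).

(* The claim to be proven about the concrete program is the proposition (Pc Join). *)
Check (Pc Join).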