  • Number of slides: 43

Privacy Preserving SQL Query Execution on Distributed Data
Benjamin Nguyen, LIFO, INSA Centre Val de Loire
Joint work with Cuong Quoc To & Philippe Pucheral, DAVID, Inria SMIS, UVSQ
SoSySec Seminar, 8th April 2016

PART I: The New Oil
I. The New Oil
II. Trusted Cells
III. Global SQL Queries
IV. Cost Model and Experiments
V. Conclusion

Mass-generation of (personal) data
Data sources have mostly turned digital:
• Analog processes, e.g., photography, films
• Paper-based interactions, e.g., banking, e-administration
• Communications, e.g., email, SMS, MMS, Skype
(Automatic) information & knowledge extraction
St Peter's Place, Roma: people listening, people recording
Where is your personal data? … In data centers:
• 112 new emails per day: mail servers
• 65 SMS sent per day: telcos
• 800 pages of social data: social networks
• Web searches, list of purchases: Google, Amazon
Is this a problem? WHY? Everything is free…
PR SM 3 / 41

Personal data is the new oil: is this good news?
§ $2 billion a year spent by US companies on third-party data about individuals (Forrester Report)
§ $44.25 is the estimated return on $1 invested in email marketing (oil is up to $0.5/yr)
§ High market value companies:
– Facebook: value / #accounts: $50
– Google: $38 billion business, sells ads based on how people search the Web
– Amazon (knows purchase intent), mail order systems companies (gmail), loyalty programs (supermarkets), banks & insurance, employment market (LinkedIn, Viadeo), travel & transportation (voyages-sncf), the « love » market (Meetic), etc.

Personal data is the new oil … or bad news?
How would oil companies behave?
• Exploit your oil field for free
• Know all about you
• Offer “extra” services
• Refine their knowledge
• Provide real services to their paying customers (e.g. advertisement and profiling, location tracking and spying, …)
In other words: your personal data would be processed by sophisticated data refineries… REGARDLESS OF YOUR PRIVACY! It’s the business model…
Your choice / Their choice

Data Siloing Implementation = Centralisation = Intrinsic Privacy Problems
All these data analytics are run in a « centralised » manner (e.g. in data centers).
Intrinsic problem #1: personal data is exposed to sophisticated attacks
– High benefits from a successful hack (or leak…)
– One person's negligence may affect millions
Intrinsic problem #2: personal data is hostage to sudden privacy changes
– Centralised administration of data means delegation of control
– This leads to regular changes, with application (and business) evolution, with mergers and acquisitions, etc. (e.g. Facebook 2012)
Increasing security is only a partial solution, since it does not solve those intrinsic limitations. E.g., TrustedDB [BS12] proposes tamper-resistant hardware to secure outsourced centralized databases.

A New Hope: A Personal Data Ecosystem…
I want my privacy back!!
… built around user-centricity and trust, achieved through a decentralized architecture with the same computing expressivity: THE TRUSTED CELL!
Our goals:
• Preserve current USER functionalities
• Hinder uncontrolled data exploitation & privacy violations
• Create sustainable business models
Our targets:
• General data management applications: SQL
• “Low cost” solutions (i.e. acceptable by the general public)

PART II: Trusted Cells
I. The New Oil
II. Trusted Cells
III. Global SQL Queries
IV. Cost Model and Experiments
V. Conclusion

The Secure (Trusted) Personal Data Server Approach [VLDB’10]: securely store and query data locally
TRUSTED CELL
Personal database is:
• Well-organized
• Tamper resistant
• Controlled by the owner (sharing, retention, audit)
• Accessible in disconnected mode
Approach characteristics:
• Well Structured World (R-DB, limited apps)
• Based on tamper-resistant HW
• Uniform equipment

Why trust personal secure HW solutions?
1. Users store their own data: minimize abusive usage
2. Auto-administered platform: no DBA attack (even by user)
3. Enforce privacy principles for externalized (shared) data: best if the recipient of the data is another TC
4. Tamper-resistance + certified code/secure execution + single user + physical access needed: the cost/benefit ratio of an attack is very high
Hardware shown (tamper resistance): Gemalto secure token, SMIS token (ZED), Trust Zone architecture, dedicated HW device, PC? (social trust / open source)

The Trusted Cell Asymmetric Architecture [CIDR’13]: distributed management of securely stored data
TC asymmetric architecture: built using Secure Portable Tokens as Trusted Cells (called here Trusted Data Server, or TDS) and the Cloud as Supporting Server Infrastructure (SSI).
Challenges:
• Local (embedded) data management (not my work: Anciaux, Bouganim, Pucheral et al.)
• Global querying (Part III)
• Data export management (MinExp project with CG78 & LIX)
ASYMMETRIC:
• LOW POWER / AVAILABILITY, HIGH TRUST
• HIGH POWER / AVAILABILITY, LOW / NO TRUST
Diagram labels: Durability, Availability; Export Data; Secure Computation; Encrypted Private Data; Generated (e.g. sensor)

PART III: Global SQL Queries on the Asymmetric Architecture
I. The New Oil
II. Trusted Cells
III. Global SQL Queries
IV. Cost Model and Experiments
V. Conclusion

Example Trusted Cell: a Trusted Data Server (TDS)
Example query: Average Salary in Rennes (Authorized Querier, Unauthorized Querier)
How to compute global queries over decentralized personal data stores while respecting users’ privacy?
Token characteristics:
• High security:
– High cost/benefit ratio of an attack
– Secure against its owner
• Modest computing resources (~10 Kb of RAM, 50 MHz CPU)
• Low availability: physically controlled by its owner; connects and disconnects at will

Secure Global Computation on TCs
PROBLEM: How to perform global queries on the asymmetric architecture (i.e. using data from many/all cells)?
THREAT MODEL:
• The infrastructure (SSI) can be:
– Honest but curious (HBC): illicit data access
– Malicious (covert adversary): DoS
• A TC can be:
– Unbreakable (honest)
– Broken (malicious)
HBC + Unbreakable: “simple protocols” presented here (see EDBT’14)
Malicious (+ Broken): must be prevented! (via security primitives, see TODS’16 and DAPD’13)
The « classical » problem of Secure Global Computation (e.g. SMC) is more general and makes no trust assumption.

Is this a new problem?
Several approaches are possible to securely perform global computations:
1. Use only an untrusted server/cloud/P2P and use generic (and costly) algorithms (e.g. Secure Multi-Party Computing [Yao82, GMW87, CKL06], fully homomorphic encryption [Gent09]). Problem = COST
2. Use only an untrusted server/cloud/P2P and develop a specific algorithm for each specific class of queries or applications (e.g. DataMining Toolkit [CKV+02]). Problem = GENERICITY
3. Introduce a tangible element of trust, through the use of a trusted component, and develop a generic methodology to execute any centralized algorithm in this context ([Katz07, GIS+10, AAB+10]). Problem = TRUST

Hypothesis on Querier and SSI
Querier:
• Shares the secret key with TDSs (to encrypt the query & decrypt the result).
• Classical access control policy (e.g. RBAC):
– Cannot get the raw data stored in TDSs (gets only the final result)
– Can obtain only authorized views of the dataset (we do not consider inferential attacks)
Supporting Server Infrastructure:
• Does not know the query (and so the attributes in the GROUP BY clause), because the query is encrypted by the Querier before being sent to the SSI.
• Has prior knowledge about the data distribution.
• Honest-but-curious attacker: frequency-based attack
– The SSI matches plaintext and ciphertext values of the same frequency, e.g. it investigates remarkable (very high/low) frequencies in the dataset distribution.
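The frequency-based attack can be illustrated with a minimal sketch. It assumes deterministic encryption, simulated here by a keyed HMAC (a stand-in chosen for the example, not the scheme used in the talk), and an attacker who knows the plaintext values ranked by frequency. The names `det_enc` and `frequency_attack` and the toy data are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

def det_enc(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption: equal plaintexts give equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def frequency_attack(ciphertexts, known_ranking):
    """Map each ciphertext to the plaintext that holds the same frequency rank."""
    ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    return dict(zip(ranked, known_ranking))

key = b"secret shared by the querier and the TDSs"
ages = ["45"] * 5 + ["25"] * 3 + ["53"] * 1   # toy data, distinct frequencies
encrypted = [det_enc(key, a) for a in ages]

# Prior knowledge of the SSI: plaintext values from most to least frequent.
guess = frequency_attack(encrypted, ["45", "25", "53"])
recovered = [guess[c] for c in encrypted]
```

With distinct frequencies the attacker recovers every value without the key, which is why the protocols either avoid deterministic encryption on the grouping attributes or flatten their distribution.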

Quantifying an attack: Information Exposure Analysis (DCJP+03)
• To measure Information Exposure, we consider the probability that an attacker (here the honest-but-curious SSI) can reconstruct the plaintext table (or part of the table) using the encrypted table and his prior knowledge about global distributions of plaintext attributes.
• Information Exposure is noted ɛ, where:
– n is the number of tuples
– k is the number of attributes
– ICi,j is the value in row i and column j of the inverse cardinality (IC) table (= 1 / number of plaintext values that could correspond)
– Nj is the number of distinct plaintext values in the global distribution of the attribute in column j
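The formula itself did not survive on this slide. As a hedged sketch, the code below assumes the usual form of the exposure coefficient, ɛ = (1/n) Σi Πj ICi,j, which is consistent with the symbols n, k and ICi,j defined above; that form, and the function name `exposure`, are assumptions of this example.

```python
from math import prod

def exposure(ic):
    """Exposure coefficient, assuming eps = (1/n) * sum_i prod_j IC[i][j].

    ic is an n x k table: ic[i][j] is the inverse cardinality of row i, column j.
    """
    n = len(ic)
    return sum(prod(row) for row in ic) / n

# Plaintext table: every value is known, so every IC is 1.
eps_plain = exposure([[1, 1]] * 4)

# Non-deterministic encryption with N_1 = 4 and N_2 = 2 distinct values:
# every ciphertext could be any of the N_j values, so IC[i][j] = 1 / N_j.
eps_ndet = exposure([[1 / 4, 1 / 2]] * 4)
```

Under this assumption the plaintext table gives ɛ = 1 and the second table gives ɛ = 1/8, so a lower ɛ means less exposure.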

Simple Solution Overview
1) Query: SELECT … FROM … [WHERE …] [GROUP BY …] [HAVING …] [SIZE …]; the query is broadcast to the TDSs. Stop condition: max #tuples or max time.
2) Collection and Filtering phase: SSI performs grouping
3) Aggregation phase: SSI groups … + TDS decipher
4) Aggregate Filtering phase
5) Result
Example query: SELECT age, AVG(salary) FROM user WHERE town = “Rennes” GROUP BY age HAVING MIN(salary) > 0 SIZE …
Example data: plaintext tuples (John, 35K), (Mary, 43K), (Paul, 100K); encrypted tuples at the Supporting Server Infrastructure (SSI): (#x3Z, 34), (#x3Z, 15), ($&1z, 48), ($&1z, 24), …

Proposed Solutions
The main difficulty is with AGGREGATE QUERIES!!
Solutions vary depending on which kind of encryption is used, how the SSI constructs the partitions, and what information is revealed to the SSI.
• Secure aggregation solution (presented briefly here)
• Noise-based solutions (see paper)
– random (white) noise
– noise controlled by the complementary domain
• Histogram-based solutions (see paper)
We investigate these solutions along the directions of performance and security.

Secure Aggregation
Q: SELECT Age, AVG(Salary) WHERE city = Rennes GROUP BY Age HAVING Min(Salary) > 0
Steps shown in the diagram:
• The Querier sends the encrypted query Qi to the Supporting Server Infrastructure (SSI).
• Each TDS decrypts Qi, checks the AC rules and encrypts its data using non-deterministic encryption. Example tuples: (45y, Rennes, 43K), (25y, Rennes, 35K), (53y, Paris, 100K); the SSI only sees ciphertext pairs such as (#x3Z, aW4r) or ($f2&, bG?3). No answer?
• The SSI forms partitions (sized to fit the resources of a TDS).
• TDSs hold partial aggregations (Gij, AGGk): e.g. (25, 35K) becomes (25, [35K, 1]); (45, 43K) and (45, 37K) become (45, [40K, 2]).
• Final aggregation, then evaluation of the HAVING clause.
• Final result returned to the Querier: (25, 29.5K), (45, 43.7K), …
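The partial-aggregation idea can be sketched as follows. This is a plaintext simulation under stated assumptions: encryption, access control and the SSI/TDS message exchange are left out, partitions are merged pairwise, and all function names are hypothetical. Each partial aggregate keeps [sum, count, min] per group so that AVG and the HAVING MIN(...) clause can be evaluated at the end.

```python
def partial_aggregate(tuples):
    """One TDS: fold (group, salary) tuples into {group: [sum, count, min]}."""
    agg = {}
    for group, salary in tuples:
        state = agg.setdefault(group, [0, 0, salary])
        state[0] += salary
        state[1] += 1
        state[2] = min(state[2], salary)
    return agg

def merge(a, b):
    """One TDS: merge two partial aggregations into one."""
    out = {g: list(v) for g, v in a.items()}
    for g, (s, c, m) in b.items():
        if g in out:
            out[g] = [out[g][0] + s, out[g][1] + c, min(out[g][2], m)]
        else:
            out[g] = [s, c, m]
    return out

def secure_aggregation(tuples, partition_size):
    # SSI role: cut the (encrypted) tuples into partitions a TDS can handle.
    parts = [tuples[i:i + partition_size]
             for i in range(0, len(tuples), partition_size)]
    partials = [partial_aggregate(p) for p in parts]
    # Repeat until a single aggregation remains (recursive steps).
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            pair = partials[i:i + 2]
            merged.append(merge(*pair) if len(pair) == 2 else pair[0])
        partials = merged
    final = partials[0] if partials else {}
    # HAVING MIN(salary) > 0, then AVG(salary) per group.
    return {g: s / c for g, (s, c, m) in final.items() if m > 0}

# Tuples that passed WHERE city = Rennes, as (Age, Salary).
result = secure_aggregation([(25, 35000), (45, 43000), (45, 37000)], 2)
```

On these three tuples the sketch yields an average of 35K for age 25 and 40K for age 45, matching the partial aggregates (25, [35K, 1]) and (45, [40K, 2]) on the slide.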

Noise Based Protocols
Secure Aggregation efficiency problem: with nDet_Enc on AG, the SSI cannot gather tuples belonging to the same group into the same partition.
Objective: Det_Enc on AG, which exposes it to the frequency-based attack.
Idea: add noise (fake tuples) to hide the distribution of AG.
How many fake tuples (nf) are needed? It depends on the disparity in frequencies among AG:
– small nf: random noise (here nf = 10)
– big nf: white noise (nf > |max(x)-min(x)|2)
– nf = n-1 per tuple: controlled noise (here nf = 10*55 = 550)
Efficiency:
– Each TDS handles tuples belonging to one group (instead of a large partial aggregation as in SAgg)
– However, high cost of generating and processing the very large number of fake tuples
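A minimal sketch of the controlled-noise variant, under the assumption that "nf = n-1 per tuple" means one fake tuple for every other value of the AG domain (n being the domain cardinality). The function name, the visible real/fake flag and the toy domain are hypothetical; in a real protocol the flag would have to be hidden inside the encrypted payload.

```python
from collections import Counter

def controlled_noise(true_tuples, domain):
    """For each true (group, value) tuple, add one fake tuple per other group."""
    out = []
    for group, value in true_tuples:
        out.append((group, value, True))                      # real tuple
        out.extend((g, 0, False) for g in domain if g != group)  # n-1 fakes
    return out

true_tuples = [(25, 35000), (45, 43000), (45, 37000)]
domain = [25, 45, 53]
noisy = controlled_noise(true_tuples, domain)

# What the SSI observes: group frequencies are now uniform.
observed = Counter(g for g, _, _ in noisy)

# What a TDS computes after decryption: fakes are dropped before aggregating.
real_sum_45 = sum(v for g, v, is_real in noisy if is_real and g == 45)
```

Every group appears the same number of times in what the SSI sees, at the price of n-1 extra tuples per true tuple, which is the cost noted on the slide.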

Nearly Equi-Depth Histogram Solution
1. The distribution of AG is discovered and distributed to all TDSs (true distribution, then nearly equi-depth histogram).
2. Each TDS allocates its tuple to the corresponding bucket.
3. Each TDS sends to the SSI: {h(bucketId), nDet_Enc(tuple)}
Consequences:
• We do not generate & process too many fake tuples
• We do not handle too large a partial aggregation
Problem: the distribution must be discovered. This can be done “offline” using secure aggregation!
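The slides do not say how the buckets are built, so the sketch below swaps in a simple greedy heuristic (heaviest group first, into the currently lightest bucket) to obtain buckets of nearly equal depth. The function names, the use of SHA-256 for h, and the toy frequencies are assumptions of this example.

```python
import hashlib

def nearly_equi_depth(freq, num_buckets):
    """Assign each group to a bucket so that bucket depths stay close."""
    depths = [0] * num_buckets
    assignment = {}
    for group, count in sorted(freq.items(), key=lambda kv: -kv[1]):
        b = depths.index(min(depths))     # lightest bucket so far
        assignment[group] = b
        depths[b] += count
    return assignment, depths

def bucket_tag(bucket_id: int) -> str:
    """h(bucketId): the only grouping information the SSI gets to see."""
    return hashlib.sha256(str(bucket_id).encode()).hexdigest()

# Discovered distribution of the grouping attribute (toy numbers).
freq = {25: 10, 35: 40, 45: 30, 53: 20}
assignment, depths = nearly_equi_depth(freq, 2)

# A TDS holding a tuple with Age = 45 tags it with its bucket's hash.
tag_for_45 = bucket_tag(assignment[45])
```

The SSI can then partition tuples by tag: tuples of the same group always land together, while several groups share a tag and the tag frequencies are close to uniform.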

Information Exposure Analysis (Damiani et al., CCS 2003)
• n: the number of tuples
• k: the number of attributes
• ICi,j: IC for row i and column j
• Nj: the number of distinct plaintext values in the global distribution of the attribute in column j (i.e., Nj ≤ n)
SAgg: ICi,j = 1/Nj for all i, j
EDHist: requires finding all possible partitions of the plaintext values such that the sum of their occurrences is the cardinality of the hashed value: NP-hard multiple subset sum problem
Noise_based & ED_Hist have a uniform distribution of the AG: ɛED_Hist = ɛNoise_based
Plaintext: ɛS_Agg ≤ ɛED_Hist = ɛNoise_based < 1

PART IV: Cost Model and Experiments
I. The New Oil
II. Trusted Cells
III. Global SQL Queries
IV. Cost Model and Experiments
V. Conclusion

Unit Test Calibration
Internal time consumption (chart)
Eval board:
• 32-bit RISC CPU: 120 MHz
• Crypto-coprocessor: AES, SHA
• 64 KB RAM, 1 GB NAND-Flash
• USB full speed: 12 Mbps
SMIS developed token (manufactured by SMEs):
• Same technical characteristics
• Price = 50-300 EUR (small series, depending on characteristics)

Parameters for cost model
• Dataset size Ttuple: varies from 5 to 65 million
• Number of groups G: varies from 1 to 10^6
• Number of TDSs participating in the computation, as a percentage of all TDSs connected at a given time, Ttds: varies from 1% to 100%
We fix two parameters and vary the other, measuring: execution time, parallelism of the protocol, total load, maximum load on one TDS.
When the parameters are fixed: Ttuple = 10^6, G = 10^3, % of TDS connected = 10% of Ttuple.
We also compute and use the optimal value for all reduction factors as well as for …
In the figures, we plot two curves for the Rnf_Noise protocols, RN (nf = 2) and WN (nf = 1000), to capture the impact of the ratio of fake tuples.

EXECUTION TIME
Two plots of execution time (seconds) for SC, Nv, RN, WN, CN, ED and EW:
• versus number of groups (G), with Ttuple = 10^6 and G = 1 to 10^6
• versus total number of tuples (millions), with Ttuple = 5·10^6 to 35·10^6 and G = 1000
When G increases (Ttuple fixed):
• Naïve, noise-based, ED & EW: the number of tuples in each group decreases. Execution time depends only on the total number of tuples in each group (because all groups are processed in parallel), so exeTime decreases when G increases.
• Secure Count: the time for processing the big partial aggregation increases accordingly. It cannot fully deploy the parallel computation (a group cannot be divided among TDSs in parallel; each TDS has to handle all G groups), so exeTime increases.
When Ttuple increases:
• Naïve, RN, ED & EW: Ttds increases accordingly, so not much changes.
• Secure Count: the number of recursive steps increases, so exeTime increases.
• WN, CN: the number of fake tuples increases linearly with the number of true tuples, so exeTime also increases linearly to handle the fake & true tuples.

NUMBER OF PARTICIPATING TDSS
Two plots of the number of participating PDSs for SC, Nv, RN, WN, CN, ED and EW:
• versus number of groups (G), with Ttuple = 10^6 and G = 1 to 10^6 (reference levels: 1% Ttuple and 10% Ttuple)
• versus total number of tuples (millions), with Ttuple = 5·10^6 to 35·10^6 and G = 1000 (number of participating PDSs in millions)
When G increases:
• Secure Count: the level of convergence is low and the size of each aggregation is big, so fewer participating TDSs are needed to build the aggregations and reach a high convergence level.
• Other solutions: since each group is processed in parallel and independently, the level of parallelism increases, so more TDSs are needed to participate in the parallel computation.
When Ttuple increases:
• WN, CN: the fake tuples increase as well, so more TDSs are needed to process fake tuples.
• Secure Count: the level of parallelism is lower than in the other solutions, so it needs the fewest TDSs.

TOTAL LOAD (NETWORK OVERHEAD)
Two plots of total load, as size (Mbytes), for SC, Nv, RN, WN, CN, ED and EW:
• versus number of groups (G), with Ttuple = 10^6 and G = 1 to 10^6
• versus total number of tuples (millions), with Ttuple = 5·10^6 to 35·10^6 and G = 1000
When G increases:
• Noise-based: highest load because of the fake tuples. When G increases but Tpds does not change, the number of tuples (both true and fake) does not change, so the total load is the same.
• Others: lower load, since they handle only true data.
When Ttuple increases:
• Noise-based: the fake tuples increase linearly, so the total load is the highest and increases.

MAXIMUM LOAD
Two plots of maximum load, as size (bytes), for SC, Nv, RN, WN, CN, ED and EW:
• versus number of groups (G), with Ttuple = 10^6 and G = 1 to 10^6
• versus total number of tuples (millions), with Ttuple = 5·10^6 to 35·10^6 and G = 1000
When G increases:
• Secure Count: the size of each aggregation is big, so each PDS processes a bigger aggregation. The number of participating PDSs decreases, so each participating PDS incurs a higher load.
• Others: the number of participating PDSs decreases and the number of tuples in each group decreases, so each PDS processes fewer tuples and maxLoad decreases.
When Ttuple increases:
• WN, CN: they use all available PDSs, so maxLoad increases linearly.
• Others: the number of participating PDSs also increases accordingly, so in general maxLoad does not increase much.

AVERAGE LOAD
Two plots of average load, as size (bytes), for SC, Nv, RN, WN, CN, ED and EW:
• versus number of groups (G), with Ttuple = 10^6 and G = 1 to 10^6
• versus total number of tuples (millions), with Ttuple = 5·10^6 to 35·10^6 and G = 1000
When G increases:
• Secure Count: the total load is unchanged but the number of participating TDSs is reduced, so the average load increases.
• WN, CN: the high total load is the same and all PTpds = 10^5 participate in the computation, so every PDS incurs the same amount of load.
• Others: more participating PDSs and total load unchanged, so AvgLoad decreases.
Although TotalLoad(CN) > TotalLoad(SC), PTpds(CN) >> PTpds(SC), so AvgLoad(CN) < AvgLoad(SC).

CONSUMED MEMORY
Plot of consumed memory, as size (bytes), versus number of groups (G), for SC, Nv, RN, WN, CN, ED and EW, with the actual RAM size of a TDS as reference (RAM).
• Noise-based: need to store only 1 group regardless of G, so they require the least RAM.
• Histogram-based: each PDS stores h groups (h > 1) regardless of G, so they require more RAM.
• SC: each PDS stores all G groups. When G increases, the RAM needed increases, so SC requires the most RAM. It exceeds the actual RAM size: future work.

Experimental Scalability (experiments on LIPN cluster), TODS’16

COMPARISON WITH OTHER STATE-OF-THE-ART METHODS
Answering aggregation queries in a secure system model (Ge & Zdonik, VLDB 2007):
• DES: each value is decrypted and the computation is performed on the plaintext. The server must have access to the secret key & plaintext (violates security requirements).
• Paillier: performs the computation directly on the ciphertext using a secure homomorphic encryption scheme: enc(a + b) = enc(a) + enc(b). The server performs the computation without having access to the secret key or plaintext. In the end, the ciphertexts are passed back to the trusted agent (i.e., Key Holder) to perform a final decryption and a simple calculation of the final result.
Hardware: Linux workstation; AMD Athlon-64 2 GHz processor; 512 MB memory
Plot: query running time (seconds) versus number of records (million), for SC (G=1), SC (G=100), SC (G=1000), SC (G=10000), Plaintext, Paillier and DES.
• SC: depends mostly on G (slightly on Ttuple)
• Others: do not depend on G, but mostly on Ttuple
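The additive homomorphism used in the Paillier baseline can be illustrated with a toy implementation. This is a sketch for intuition only: the primes are tiny, `random` is not a cryptographic generator, and the function names are hypothetical. Note that in Paillier the "+" on ciphertexts written above is realised as a multiplication modulo n².

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                 # needs Python 3.8+
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, priv = keygen(1789, 1861)          # small primes, for illustration only
c1, c2 = encrypt(n, 35000), encrypt(n, 43000)

# Server side: combine ciphertexts without the key.
c_sum = (c1 * c2) % (n * n)

# Key holder side: one decryption yields the sum of the salaries.
total = decrypt(n, priv, c_sum)
```

The server only multiplies ciphertexts; the key holder decrypts once and finishes the computation (for AVG, dividing by the count), which mirrors the "final decryption and simple calculation" step described above.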

Metrics for the evaluation of the proposed solutions
• Average Time/Load
• Query Response Time
• Throughput
• Total Load
• Information Exposure
• Resource Variation

Trade-off between criteria
Select … From … Where … Group By AG; G = card(AG)
Security: S_Agg > ED_Hist
Performance:
• G > 10: ED_Hist faster than S_Agg
• G <= 10: ED_Hist slower than S_Agg

PART V: Conclusion and perspectives
I. The New Oil
II. Trusted Cells
III. Global SQL Queries
IV. Cost Model and Experiments
V. Conclusion

Short/Middle term research: Data-intensive Computing on an Asymmetric Architecture
SQL (with SMIS)
• Queries here do not have joins!
• Take into account more attack models (e.g. broken tokens)
• Field experiment on usability (with ISN / A. Katsouraki PhD thesis)
• Add usage control (A. Michel PhD thesis)
Private/Secure MapReduce (with LIPN; some results in CoopIS’15)
• Investigate compatibility of our protocols. Develop new protocols. Check performance!
Secure graph computations (with LIX)
• Study social networking applications
• Secure k-core and k-truss computations (Rossi PhD thesis)
XML management
• Adapt the work on XQ2P (Butnaru, Gardarin, Nguyen) to the Trusted Cells context. Distributed window queries.

Promoting the Trusted Cells vision
Trusted Cells “Core”
• Open hardware and software bundle: basic functionalities (local DB, distributed DB, NoSQL DB) needed to develop PbD personal data management applications!
• Promote an open source community around Trusted Cells (UVSQ, INSA CVL, ENSIIE, INSA Lyon…)
Beyond Tamper Resistant HW
• Results are usable even with lower trust elements.
• Include social trust / reputation.
• Use virtualization.

QUESTIONS?


AVERAGE TIME FOR PDS TO CONNECT
Two plots of average time (seconds) for SC, Nv, RN, WN, CN, ED and EW:
• versus number of groups (G), with Ttuple = 10^6 and G = 1 to 10^6
• versus total number of tuples (millions), with Ttuple = 5·10^6 to 35·10^6 and G = 1000
When G increases:
• Secure Count: the number of participating PDSs is reduced, so the average time increases.
• WN, CN: the high total load is unchanged and all PTpds = 10^5 participate in the computation, so every PDS takes the same amount of time to process data.
• Others: more participating PDSs, so AvgTime decreases.
High AvgTime:
• WN, CN: because of too many fake tuples
• SC: because of very few participating PDSs

Theoretical Scalability
Three plots of execution time (seconds) versus number of groups (G), for Tpds = 1% Ttuple, Tpds = 10% Ttuple and Tpds = 100% Ttuple; curves for SC, NV, RN, WN, CN, ED and EW.
• Secure Count: has a (low) maximum number of participants.
• Others: WN has higher scalability than the others (in the sense that adding participants counts).