5103e58072e3b2360f8ee0b6129cc76e.ppt
- Number of slides: 39
Differentially Private Transit Data Publication: A Case Study on the Montreal Transportation System
Rui Chen, Concordia University
Benjamin C. M. Fung, Concordia University
Bipin C. Desai, Concordia University
Nériah M. Sossou, Société de transport de Montréal
Outline
- Introduction
- Related Work
- Preliminaries
- Sanitization Algorithm
- Experimental Results
- Conclusion
The STM Story
- The Société de transport de Montréal (STM) is the public transit agency in the Montreal area.
- Its smart card automated fare collection system generates and collects a huge volume of transit data every day.
- Transit data needs to be shared for many reasons.
Transit Data
- Transit data, a kind of sequential data, consists of sequences of time-ordered locations.
- Each location is a station in the STM network.
Privacy Threats
- Alice visited L4 and then L1.
- Alice also visited L2 …
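Differential Privacy [1]
A mechanism M gives ε-differential privacy if, for any two datasets D and D' differing in at most one record and any possible output D*:
$\Pr[M(D) = D^*] \le \exp(\varepsilon) \times \Pr[M(D') = D^*]$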
Technical Challenges
- Suppose there are 1,000 stations in the STM network.
- Suppose the maximum number of stations visited by a passenger is 20.
- Traditional differentially private mechanisms are data-independent: every sequence in the output domain has to be considered, which is computationally infeasible.
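To make the scale concrete, the size of the output domain can be counted directly. The sketch below assumes that a sequence is any ordered list of 1 to 20 stations drawn from 1,000 stations, with repeated visits allowed; this counting convention and the function name `output_domain_size` are assumptions made for illustration.

```python
def output_domain_size(n_stations: int, max_len: int) -> int:
    # Count all location sequences of length 1..max_len.
    # Assumption: the same station may appear more than once in a sequence.
    return sum(n_stations ** length for length in range(1, max_len + 1))

size = output_domain_size(1000, 20)
print(f"{size:.3e}")  # on the order of 10^60 sequences
```

A data-independent mechanism would have to assign a noisy count to each of these sequences, which is why a data-dependent exploration of the domain is attractive.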
Contributions
- The first practical solution for publishing real-life sequential data under differential privacy
- A study of the real-life transit data sharing scenario at the STM
- The use of a hybrid-granularity prefix tree for data-dependent publication and an efficient implementation based on a statistical process
- Enforcement of two sets of consistency constraints
- Seamless extension to trajectory data
Related Work
- Abul et al. [2] achieve (k, δ)-anonymity by space translation.
- Terrovitis and Mamoulis [3] limit an adversary's confidence in inferring the presence of a location by global suppression.
- Yarovoy et al. [4] k-anonymize a moving object database (MOD) by considering timestamps as the quasi-identifiers.
- Chen et al. [5] achieve the (K, C)_L-privacy model by local suppression.
- Is it possible to employ a much stronger privacy model while achieving desirable utility?
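Laplace Mechanism [1]
- For a function f, the mechanism adds noise drawn from a Laplace distribution with scale Δf/ε, whose density is $\frac{\varepsilon}{2\Delta f}\exp\left(-\frac{\varepsilon |x|}{\Delta f}\right)$.
- ε: privacy parameter (privacy budget).
- Δf: global sensitivity (the maximum change of f due to the change of a single record).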
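A minimal sketch of the Laplace mechanism for a single numeric answer, using only the Python standard library. The function names and the example count of 120 are hypothetical; the sampling trick (a difference of two exponentials) is one standard way to draw Laplace noise and is not taken from the slides.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    # Noise scale is Δf / ε: a smaller ε (stronger privacy) means more noise.
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed only to make the demo repeatable
noisy_count = laplace_mechanism(true_value=120, sensitivity=1, epsilon=0.5, rng=rng)
print(noisy_count)
```

For a count query, changing one record changes the answer by at most 1, so Δf = 1 is the usual choice.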
Composition Properties
- Sequential composition: $\sum_i \varepsilon_i$-differential privacy
- Parallel composition: $\max_i(\varepsilon_i)$-differential privacy
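The two composition properties amount to simple budget accounting. The helper names below are hypothetical, and the sketch assumes the standard reading: sequential composition applies when mechanisms run on the same records, parallel composition when they run on disjoint subsets of records.

```python
def sequential_budget(epsilons):
    # Mechanisms applied to the same records: the budgets add up.
    return sum(epsilons)

def parallel_budget(epsilons):
    # Mechanisms applied to disjoint subsets of records:
    # the largest individual budget bounds the total.
    return max(epsilons)

# Four mechanisms, each with budget 0.25, over the same records.
print(sequential_budget([0.25, 0.25, 0.25, 0.25]))
# Three mechanisms, each with budget 0.25, over disjoint records.
print(parallel_budget([0.25, 0.25, 0.25]))
```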
Prefix Tree
- A simple but effective way to explore the entire output domain
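As an illustration, a prefix tree over location sequences can be built so that each node stores how many sequences share the prefix leading to it. The `Node` class, the function name, and the toy data are hypothetical and carry no noise; this is only the underlying data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    count: int = 0                                  # sequences having this prefix
    children: dict = field(default_factory=dict)    # location -> child Node

def build_prefix_tree(sequences):
    root = Node()
    for seq in sequences:
        root.count += 1
        node = root
        for loc in seq:
            # Create the child on first use, then count this sequence at it.
            node = node.children.setdefault(loc, Node())
            node.count += 1
    return root

# Toy sequences of locations (not STM data).
data = [["L4", "L1"], ["L4", "L1", "L2"], ["L2", "L3"], ["L4", "L3"]]
tree = build_prefix_tree(data)
print(tree.children["L4"].count)                 # sequences starting with L4
print(tree.children["L4"].children["L1"].count)  # sequences starting with L4, L1
```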
Utility Requirements
- Count query: e.g., how many passengers have visited both Guy-Concordia and McGill stations?
- Frequent sequential pattern mining: e.g., what are the most popular sequences of stations being visited?
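Both tasks can be sketched on a handful of toy sequences. The data below is invented, the function names are hypothetical, and the pattern miner makes a simplifying assumption: a pattern is a contiguous run of stations, and its support is the number of sequences that contain it at least once.

```python
from collections import Counter

def count_query(sequences, stations):
    # Number of sequences that contain every station in `stations`.
    wanted = set(stations)
    return sum(1 for seq in sequences if wanted <= set(seq))

def top_k_patterns(sequences, k, max_len=3):
    # Assumption: patterns are contiguous runs of up to max_len stations.
    support = Counter()
    for seq in sequences:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                seen.add(tuple(seq[i:i + n]))
        support.update(seen)  # each sequence counts once per pattern
    return support.most_common(k)

# Toy data, not taken from the STM datasets.
data = [
    ["Guy-Concordia", "Peel", "McGill"],
    ["McGill", "Guy-Concordia"],
    ["Peel", "McGill"],
    ["Guy-Concordia", "Peel"],
]
print(count_query(data, ["Guy-Concordia", "McGill"]))
print(top_k_patterns(data, k=5))
```

Running the same two functions on the original and the sanitized data gives a direct way to compare their utility.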
Sanitization Algorithm
- Complexity: (expression not recovered)
Noisy Prefix Tree
- Each level consists of two sub-levels with different location granularities.
- Each level receives a privacy budget of ε/h.
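A simplified sketch of the idea, not the authors' algorithm: it assumes h is the tree height, ignores the two sub-levels of hybrid granularity, and expands a node only when its noisy count reaches a `threshold` parameter that is introduced here purely for illustration. Each sequence falls under at most one node per level, so a sensitivity of 1 per level is assumed.

```python
import random

def laplace(scale, rng):
    # Difference of two i.i.d. exponentials gives Laplace(0, scale) noise.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_prefix_tree(sequences, locations, epsilon, height, threshold, rng):
    level_eps = epsilon / height          # each level receives ε/h
    tree = {}                             # prefix tuple -> noisy count
    frontier = [()]                       # start from the empty prefix (root)
    for _ in range(height):
        next_frontier = []
        for prefix in frontier:
            n = len(prefix)
            for loc in locations:
                child = prefix + (loc,)
                true_count = sum(1 for s in sequences if tuple(s[:n + 1]) == child)
                noisy = true_count + laplace(1.0 / level_eps, rng)
                if noisy >= threshold:    # data-dependent: expand only likely prefixes
                    tree[child] = noisy
                    next_frontier.append(child)
        frontier = next_frontier
    return tree

# Toy data and parameters, chosen only for the demo.
data = [["L4", "L1"], ["L4", "L1", "L2"], ["L2", "L3"], ["L4", "L3"]]
released = noisy_prefix_tree(data, ["L1", "L2", "L3", "L4"], epsilon=1.0,
                             height=2, threshold=1.5, rng=random.Random(0))
print(sorted(released))
```

Because children are generated only under nodes that survive the threshold, the number of nodes examined depends on the data rather than on the full output domain.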
Efficient Implementation
- Separately handle empty and non-empty nodes.
Hybrid-Granularity
- For an empty node on level i, we reduce noise by a factor of (value not recovered).
Consistency Constraints
- For any root-to-leaf path p, (constraint formula not recovered), where v_i is a child of v_{i+1}.
- For each node v, (constraint formula not recovered).
Consistency Enforcement
- Constraint Type Ⅰ [6]
- Constraint Type Ⅱ
STM Datasets
Real-life STM datasets are used for evaluation:
Dataset   |D|       |L|   max|S|   avg|S|
Metro     847,668   68    90       4.21
Bus       778,724   944   121      5.67
Average Relative Error vs. ε (two figure slides; plots not recovered)
Average Relative Error vs. h (two figure slides; plots not recovered)
Utility vs. k
Each cell gives Metro/Bus (M/B) values.
k     TP Simple   FP (FD) Simple   TP Hybrid   FP (FD) Hybrid
100   99/97       1/3              100/100     0/0
150   143/139     7/11             149/144     1/6
200   178/168     22/32            185/177     15/23
250   209/195     41/55            220/209     30/41
300   241/212     59/88            257/233     43/67
Utility vs. ε
Each cell gives Metro/Bus (M/B) values.
ε      TP Simple   FP (FD) Simple   TP Hybrid   FP (FD) Hybrid
0.5    227/194     73/106           244/215     56/85
0.75   239/206     61/94            253/224     47/76
1.0    241/212     59/88            257/233     43/67
1.25   243/216     57/84            259/238     41/62
1.5    248/224     52/76            261/242     39/58
Utility vs. h
Each cell gives Metro/Bus (M/B) values.
h     TP Simple   FP (FD) Simple   TP Hybrid   FP (FD) Hybrid
6     234/212     66/88            241/221     59/79
8     240/217     60/83            254/232     46/68
The source lists six further h values (10, 12, 14, 16, 18, 20) but only five rows of data, so the row-to-h assignment below is not recoverable:
?     241/215     58/85            255/236     45/64
?     241/212     59/88            257/233     43/67
?     240/210     60/90            258/231     42/69
?     240/209     60/91            255/230     45/70
?     238/206     62/94            254/228     46/72
Scalability (figure slide; plot not recovered)
Conclusion
- It is possible to publish useful transit data (sequential data) under differential privacy.
- Generally, a data-dependent solution outperforms a data-independent solution.
- It is worth exploring the utility of released data for more complex data analysis tasks.
- It is important to educate transport service practitioners.
References
[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[2] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, 2008.
[3] M. Terrovitis and N. Mamoulis. Privacy preservation in the publication of trajectories. In MDM, 2008.
[4] R. Yarovoy, F. Bonchi, L. V. S. Lakshmanan, and W. H. Wang. Anonymizing moving objects: How to hide a MOB in a crowd? In EDBT, 2009.
[5] R. Chen, B. C. M. Fung, N. Mohammed, and B. C. Desai. Privacy-preserving trajectory data publishing by local suppression. Information Sciences, in press.
[6] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 2010.
Thank You Very Much
Q&A
Back-up Slides
Detailed Algorithm (figure slide; listing not recovered)