  • Number of slides: 25

The Bloom Paradox
Ori Rottenstreich
Joint work with Yossi Kanizo and Isaac Keslassy
Technion, Israel

Problem Definition
• Requirement: a data structure at the user that quickly answers whether $x \in S$.
• Solutions:
  o O(n): searching in a list
  o O(log(n)): searching in a sorted list
  o O(1): but with false positives / negatives
[Figure: a user with a local cache $S$ (cost = 1) and a central memory $M$ with all elements (cost = 10).]

Two Possible Errors
• False Positive: $y \notin S$ but the data structure answers $y \in S$.
  o Results in a redundant access to the local cache.
  o Additional cost of 1.
• False Negative: $x \in S$ but the data structure answers $x \notin S$.
  o Results in an expensive access to the central memory instead of the local cache.
  o Additional cost of 10 - 1 = 9.

Bloom Filters (Bloom, 1970)
• Initialization: array of $m$ zero bits.
• Insertion: each of the $n$ elements is hashed $k$ times, and the corresponding bits are set.
• Query: hash the element and check that all $k$ bits are set.
• False positive rate (probability) of approximately $(1 - e^{-kn/m})^k$.
• No false negatives.
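The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the class name `BloomFilter`, the parameters `m` (number of bits) and `k` (number of hash functions), and the salted SHA-256 hashing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: all bits are zero

    def _positions(self, item: str) -> list[int]:
        # Derive k positions by salting SHA-256 with the hash index
        # (an illustrative choice, not the hashing used in the talk).
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        # Insertion: set every bit the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: positive only if all probed bits are set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
for element in ("x", "z", "w"):
    bf.insert(element)
```

Every inserted element is reported as present, which is the "no false negatives" property. An element that was never inserted may still be reported as present if other insertions happened to set all of its bits.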

Bloom Filters are Widely Used
• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in:
  o Google's web browser Chrome
  o Google's database system BigTable
  o Facebook's distributed storage system Cassandra
  o Mellanox's IB Switch System

Outline
• Introduction to Bloom Filters
• The Bloom Paradox
• The Variable-Increment Counting Bloom Filter

The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Example
• Parameters:
• Extreme case without locality: all elements have an equal probability of belonging to the cache.
  o Toy example

The Bloom Paradox
• Parameters:
• Let $B$ be the set of elements that the Bloom filter indicates are in $S$.
  o In particular, no false negatives → $S \subseteq B$
• Surprise: of the elements whose membership the Bloom filter indicates, only a part are indeed in $S$.
[Figure: intuition versus surprise for the size of $B$ relative to $S$; a user with a Bloom filter, a local cache $S$ (cost = 1) and a central memory $M$ with all elements (cost = 10).]

The Bloom Paradox
• When the Bloom filter states that $x \in S$, it is wrong with some probability $p$.
• Average cost if we listen to the Bloom filter: $1 + 10p$ (a cache access, plus a central memory access after a false positive).
• Average cost if we don't: 10 (a direct access to the central memory).
• In the example, the first cost exceeds the second. The Bloom filter is useless! Don't listen to the Bloom filter.
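The comparison on this slide can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions: the function name `paradox_costs` and the numeric parameters (cache size, memory size, false positive rate) are hypothetical and not taken from the slides, whose values were lost. Only the two access costs (1 for the local cache, 10 for the central memory) come from the deck.

```python
def paradox_costs(n_cache, n_memory, fpr, cache_cost=1.0, memory_cost=10.0):
    """Return (p_wrong, cost_listen, cost_ignore) for a positive Bloom filter answer.

    Assumes every element of the memory is equally likely to be queried
    (the "no locality" case) and that the filter has no false negatives.
    """
    true_positives = n_cache                      # elements of S, always reported
    false_positives = (n_memory - n_cache) * fpr  # expected wrongly reported elements
    p_wrong = false_positives / (true_positives + false_positives)
    # Listening: always pay for the cache, and pay for the memory on a false positive.
    cost_listen = cache_cost + p_wrong * memory_cost
    # Ignoring the filter: go straight to the central memory.
    cost_ignore = memory_cost
    return p_wrong, cost_listen, cost_ignore


# Hypothetical parameters, not taken from the slides.
p_wrong, cost_listen, cost_ignore = paradox_costs(n_cache=10**4, n_memory=10**10, fpr=1e-3)
print(f"positive answer is wrong with probability {p_wrong:.4f}")
print(f"average cost when listening: {cost_listen:.2f}, when ignoring: {cost_ignore:.2f}")
```

With these cost values, listening to a positive answer is worse than ignoring it whenever the answer is wrong with probability above 0.9. That happens when the cache holds a tiny fraction of the memory, which is why the a priori membership probability matters.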

Outline
• Introduction to Bloom Filters
• The Bloom Paradox
• The Variable-Increment Counting Bloom Filter

Counting Bloom Filters (CBFs)
• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
  o Insertion: incrementing counters by one.
  o Deletion: decrementing counters by one.
  o Query: checking that counters are positive.
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element.
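A counting Bloom filter along the lines of this slide can be sketched as follows. The names `CountingBloomFilter` and `positions`, the parameters `m` (counters) and `k` (hash functions), and the salted SHA-256 hashing are assumptions made for the illustration. Counter width and overflow handling are left out.

```python
import hashlib


def positions(item: str, m: int, k: int) -> list[int]:
    """k counter indices for `item` (salted SHA-256, an illustrative choice)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class CountingBloomFilter:
    """Array of counters instead of bits, so that deletions are possible."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def insert(self, item: str) -> None:
        for pos in positions(item, self.m, self.k):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for items that were inserted before.
        for pos in positions(item, self.m, self.k):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in positions(item, self.m, self.k))


cbf = CountingBloomFilter(m=32, k=3)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")  # x stays present: decrementing never zeroes a counter x still needs
```

Deleting `y` decrements only the increments that `y` contributed, so a counter shared with `x` stays positive. Resetting a shared bit in a plain Bloom filter would instead make `x` disappear.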

Intuition for Variable Increments
• Upon query, we should consider the exact values of the counters and not just their positiveness.
• Can we design a deterministic scheme that exploits the exact values of the counters?
• Idea: use variable increments to encode the element identity.

Architecture
• Each hash entry contains a pair of counters:
  o $c_1$, fixed increments → number of elements in entry (as in CBF)
  o $c_2$, variable increments → weighted sum of elements
  o weights from a pre-determined set
• We use two sets of hash functions:
  o The first set points to the set of entries.
  o The second set points to the pre-determined set of weights.

entry:  1   2   3   4   5   6   7   8   9
c1:     0   5   3   2   2   3   3   3   4
c2:     0  34  25  26  17  21   9   6  26

Insertion
• Insertion: at each entry the element hashes to, the two counters are updated as follows.
  o The fixed-increment counter $c_1$ is incremented by one.
  o The variable-increment counter $c_2$ is incremented by a weight from the pre-determined set.
• Example 1:
[Figure: elements x and z are inserted; variable increments such as +4, +8 and +13 are added to the $c_2$ counters of their entries.]
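The insertion rule can be sketched as below. This is an illustration under assumptions, not the authors' code: the class name `VariableIncrementCBF`, the helper `_hash`, and the salted SHA-256 hashing are hypothetical. The weight set (1, 4, 8, 13) is also an assumption, chosen because these values appear as increments in the slide figures.

```python
import hashlib

WEIGHTS = (1, 4, 8, 13)  # assumed weight set, inferred from the slide figures


def _hash(tag: str, item: str, modulus: int) -> int:
    """Illustrative hash: salted SHA-256 reduced modulo `modulus`."""
    digest = hashlib.sha256(f"{tag}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class VariableIncrementCBF:
    """Each entry holds c1 (element count) and c2 (weighted sum)."""

    def __init__(self, m: int, k: int, weights=WEIGHTS) -> None:
        self.m = m
        self.k = k
        self.weights = tuple(weights)
        self.c1 = [0] * m
        self.c2 = [0] * m

    def _probes(self, item: str) -> list[tuple[int, int]]:
        # First hash set picks the entry, second hash set picks the weight.
        return [
            (_hash(f"h{i}", item, self.m),
             self.weights[_hash(f"g{i}", item, len(self.weights))])
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for entry, weight in self._probes(item):
            self.c1[entry] += 1       # fixed increment
            self.c2[entry] += weight  # variable increment

    def delete(self, item: str) -> None:
        # Only valid for items that were inserted before.
        for entry, weight in self._probes(item):
            self.c1[entry] -= 1
            self.c2[entry] -= weight


vicbf = VariableIncrementCBF(m=9, k=2)
vicbf.insert("x")
vicbf.insert("z")
```

After these two insertions the `c1` counters sum to 4 (two elements, two hash functions each), and each `c2` counter holds the sum of the weights added to its entry.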

Query
• Query y
[Figure: y hashes to two entries, with variable increments 4 and 8.]
• We ask whether
  o 17 can be a sum of 2 elements from the set, including 4
  o 30 can be a sum of 3 elements from the set, including 8
• No.
• How should we pick the set of variable increments? We should use $B_h$ sequences!

Bh Sequences
• Definition 1: Let $D = \{v_1, \dots, v_\ell\}$ be a sequence of positive integers. Then, $D$ is a $B_h$ sequence iff all the sums $v_{i_1} + v_{i_2} + \dots + v_{i_h}$ with $1 \le i_1 \le \dots \le i_h \le \ell$ are distinct.
• Example 2: all the sums of $h$ elements of the example sequence are distinct; therefore, it is a $B_h$ sequence.
• $B_h$ sequences are widely used in error-correcting codes.
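The definition can be checked by brute force for small sequences. In the sketch below, the function name `is_bh_sequence` is hypothetical, and the sequence (1, 4, 8, 13) is an assumption: these values appear as variable increments in the slide figures, but the example sequence itself was lost from the slide.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(seq, h):
    """True iff all sums of h elements of seq (repetition allowed) are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(seq, h)]
    return len(sums) == len(set(sums))


print(is_bh_sequence((1, 4, 8, 13), 2))  # True: the 10 pairwise sums are distinct
print(is_bh_sequence((1, 4, 8, 13), 3))  # True: the 20 triple sums are distinct
print(is_bh_sequence((1, 2, 3), 2))      # False: 1 + 3 == 2 + 2
```

The check enumerates multisets rather than ordered tuples, so reorderings of the same summands are not counted as a collision. Its cost grows quickly with `h` and the sequence length, so it is only suitable for small sets.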

The Bh-CBF Scheme: Query
• Example 3: the set of variable increments is a $B_h$ sequence.
  o Query x (variable increments 1 and 4): from the counter values, the Bh-CBF can determine that $x \notin S$.
  o Query y (variable increments 4 and 8): the counter values force a decomposition that excludes an increment of y, so the Bh-CBF can determine that $y \notin S$.
  o Query z (variable increments 4 and 13): the Bh-CBF cannot exclude that $z \in S$.
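The per-entry test behind these queries can be sketched as a brute-force search: an element may be in the entry only if the weighted sum can be written as a sum of exactly `c1` weights that includes the element's own weight. The function name `may_contain` and the weight set (1, 4, 8, 13) are assumptions made for the illustration. The counter values 17 and 30 are the ones quoted on the Query slide.

```python
from itertools import combinations_with_replacement

WEIGHTS = (1, 4, 8, 13)  # assumed weight set, inferred from the slide figures


def may_contain(c1, c2, weight, weights=WEIGHTS):
    """True iff c2 is a sum of exactly c1 weights, one of which is `weight`."""
    if c1 == 0:
        return False  # an empty entry contains no element
    # Fix one summand to `weight` and search for the remaining c1 - 1 summands.
    return any(
        sum(rest) + weight == c2
        for rest in combinations_with_replacement(weights, c1 - 1)
    )


print(may_contain(2, 17, 4))  # True: 17 = 4 + 13
print(may_contain(3, 30, 8))  # False: no two weights sum to 22
```

An element is reported as absent as soon as one of its entries fails this test, so the second result alone is enough to exclude y. When the weights form a $B_h$ sequence and `c1` is at most $h$, the decomposition of `c2` is unique, which is what makes the test decisive.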

Experimental Results
• Internet trace (equinix-chicago) with real hash functions.

Concluding Remarks
• The Bloom Paradox
  o Discovery of the Bloom paradox
  o Importance of the a priori membership probability
• The Variable-Increment Counting Bloom Filter
  o Can extend many variants of the counting Bloom filter
  o First time $B_h$ sequences are presented in networking applications

Thank You