99e77e0e13aede8ac4a65f69a88e8da6.ppt
- Number of slides: 25
Accelerating Boyer-Moore Searches on Binary Texts. Shmuel Tomi Klein, Miri Kopel Ben-Nissan. Bar Ilan University, Israel.
Outline: Background and motivation; Boyer-Moore algorithm; New binary variant; Analysis; Experiments; Summary.
Important application of automata: pattern matching. Algorithms: Boyer & Moore (BM), KMP, BDM. BM matches backwards! Example: text "this-is-a-sample-text--", pattern "pattern".
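Boyer-Moore Algorithm. Mismatch, case 1: b does not occur in y. Shift by delta1. [Diagram: the text character b mismatches the pattern character a after the matched suffix u; annotation "x contains no b".]
Boyer-Moore Algorithm. Mismatch, case 2: b occurs in y. Shift by delta1. [Diagram: the pattern is shifted so that its last b is aligned with the text character b; annotation "x contains no b" on the part to the right of that b.]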
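Cases 1 and 2 can both be served by a single lookup table. The following is a minimal sketch, assuming delta1 is defined as the distance from the last occurrence of a character to the end of the pattern, with the pattern length as the default for absent characters. The function names are hypothetical.

```python
def delta1_table(pattern):
    """Map each pattern character to its distance from the pattern's end."""
    m = len(pattern)
    table = {}
    for pos, ch in enumerate(pattern):
        # A later occurrence overwrites an earlier one, so the
        # rightmost occurrence of each character wins.
        table[ch] = m - 1 - pos
    return table


def delta1(table, ch, m):
    """Case 1: ch is absent from the pattern, so the value is m.
    Case 2: ch is present, so the value is its stored distance."""
    return table.get(ch, m)


table = delta1_table("example")
print(table)                   # {'e': 0, 'x': 5, 'a': 4, 'm': 3, 'p': 2, 'l': 1}
print(delta1(table, "s", 7))   # 7
```

The values for "example" agree with the delta1 example given later in the deck.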
Boyer-Moore Algorithm. Mismatch, case 3: u reoccurs in y, preceded by c ≠ a. Shift by delta2. [Diagram labels: b, a, c, u, x.]
Boyer-Moore Algorithm. Mismatch, case 4: only a suffix v of u reoccurs in y. Shift by delta2. [Diagram labels: b, a, u, v, x.]
Boyer-Moore example, pattern "example". delta1: a=4, e=0, l=1, m=3, p=2, x=5, rest=7. delta2 by pattern position: e=12, x=11, a=10, m=9, p=8, l=7, e=1. Text: "here is a simple example". [Diagram: successive alignments of the pattern against the text.]
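The delta2 values of this example can be checked with a short script. It is a sketch under stated assumptions: delta2 is taken to be the increase of the text pointer (pattern shift plus the number of characters already matched), it is computed by brute force rather than by the usual linear-time preprocessing, and the names `delta2_table` and `bm_search` are hypothetical.

```python
def delta2_table(pattern):
    """Brute-force delta2: text-pointer increase after a mismatch at pattern[j]."""
    m = len(pattern)
    table = []
    for j in range(m):
        for s in range(1, m + 1):  # s = m always qualifies, so the loop always breaks
            # The matched suffix pattern[j+1:] must agree with the pattern
            # moved s places to the right (cases 3 and 4).
            suffix_fits = all(k - s < 0 or pattern[k - s] == pattern[k]
                              for k in range(j + 1, m))
            # The character before the reoccurrence must differ (c != a).
            differs_before = j - s < 0 or pattern[j - s] != pattern[j]
            if suffix_fits and differs_before:
                table.append(s + (m - 1 - j))
                break
    return table


def bm_search(text, pattern):
    """Return the index of the first occurrence of a non-empty pattern, or -1."""
    m = len(pattern)
    d1 = {ch: m - 1 - pos for pos, ch in enumerate(pattern)}
    d2 = delta2_table(pattern)
    i = m - 1                      # text pointer, aligned with the pattern's end
    while i < len(text):
        j = m - 1
        while j >= 0 and text[i] == pattern[j]:   # match backwards
            i -= 1
            j -= 1
        if j < 0:
            return i + 1
        i += max(d1.get(text[i], m), d2[j])
    return -1


print(delta2_table("example"))                           # [12, 11, 10, 9, 8, 7, 1]
print(bm_search("here is a simple example", "example"))  # 17
```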
Problems of Binary Boyer & Moore. Character text (text "this-is-a-sample-text--", pattern "pattern"): most of the work is done by delta1. Binary text (text 0100101110100110101001, pattern 1101100): bit-level processing; delta1 is useless.
Need for Binary Boyer & Moore: compressed matching. Given E(T) and P, look for E(P) in E(T) rather than for P in D(E(T)). Suggested solution: BBBMM (Blocked Binary Boyer Moore Matching).
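A small comparison makes the point about delta1 concrete. The sketch below assumes the same definition of delta1 as a distance from the last occurrence to the end of the pattern. The helper name and the choice of the lowercase alphabet are illustrative assumptions.

```python
import string


def delta1_table(pattern, alphabet):
    """delta1 for every symbol of the alphabet; absent symbols get len(pattern)."""
    m = len(pattern)
    return {ch: (m - 1 - pattern.rfind(ch)) if ch in pattern else m
            for ch in alphabet}


binary = delta1_table("1101100", "01")
letters = delta1_table("pattern", string.ascii_lowercase)

print(binary)                  # {'0': 0, '1': 2}
print(max(binary.values()))    # 2
print(letters["z"])            # 7
```

With only two symbols, both occur close to the end of almost any pattern, so delta1 can never suggest more than a very small shift. On character text, most symbols are absent from the pattern and yield the full shift.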
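Searching for E(P) in E(T) can be illustrated with a toy example. This is only a sketch: the three-symbol prefix code below is a hypothetical stand-in for a real Huffman code, and Python's `str.find` stands in for the bit-level search. It also shows a known caveat of this approach: a bit-level hit need not start on a codeword boundary, so hits may need verification. Here the boundaries are computed from the uncompressed text purely for demonstration.

```python
CODE = {"a": "0", "b": "10", "c": "11"}  # hypothetical toy prefix code


def encode(s):
    """E(s): concatenate the codewords of the characters of s."""
    return "".join(CODE[ch] for ch in s)


def codeword_starts(s):
    """Bit offsets at which codewords begin in E(s)."""
    starts, pos = set(), 0
    for ch in s:
        starts.add(pos)
        pos += len(CODE[ch])
    return starts


def compressed_find(text, pattern):
    """All bit offsets of E(pattern) in E(text), flagged as aligned or not."""
    et, ep = encode(text), encode(pattern)
    starts = codeword_starts(text)
    hits = []
    pos = et.find(ep)
    while pos != -1:
        hits.append((pos, pos in starts))
        pos = et.find(ep, pos + 1)
    return hits


print(compressed_find("abca", "ca"))  # [(3, True)]
print(compressed_find("cb", "ca"))    # [(1, False)]
```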
BBBMM. [Diagram labels: k, Text[i], Pat[sh, j], sh, sl.]
BBBMM: more information in the binary case. ASCII: ffghabdgttiocb, sbgghj. Binary: 01100010, 01101010.
BBBMM: extended delta1. [Diagram: blocks i-1, i, i+1 of T aligned against P; bit strings shown: 101, 101, 01, 100.]
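One possible reading of this slide is that the text is processed in blocks of k bits and that the pattern is precomputed in every possible bit alignment sh, so that block Text[i] can be compared with block Pat[sh, j] in one operation. The sketch below follows that reading. The masks, the helper names and the zero padding are assumptions of this example, not details taken from the slides.

```python
K = 8  # block size in bits


def pack(bits, k=K):
    """Split a '0'/'1' string into k-bit integer blocks, zero-padding the tail."""
    padded = bits + "0" * (-len(bits) % k)
    return [int(padded[p:p + k], 2) for p in range(0, len(padded), k)]


def shifted_pattern_blocks(pattern_bits, k=K):
    """For each shift sh in 0..k-1: blocks and masks of the pattern moved sh bits right."""
    pat, mask = [], []
    for sh in range(k):
        pat.append(pack("0" * sh + pattern_bits, k))
        mask.append(pack("0" * sh + "1" * len(pattern_bits), k))
    return pat, mask


def matches_at(text_blocks, text_len, pat, mask, pat_len, bit_pos, k=K):
    """Block-wise test, last block first, for an occurrence at bit offset bit_pos."""
    if bit_pos + pat_len > text_len:
        return False               # would run into the padding
    first, sh = divmod(bit_pos, k)
    for j in range(len(pat[sh]) - 1, -1, -1):
        # The mask hides text bits outside the pattern in partial blocks.
        if (text_blocks[first + j] & mask[sh][j]) != pat[sh][j]:
            return False
    return True


text_bits = "0100101110100110101001"
pattern_bits = "110100"
text_blocks = pack(text_bits)
pat, mask = shifted_pattern_blocks(pattern_bits)
print(matches_at(text_blocks, len(text_bits), pat, mask, len(pattern_bits), 7))  # True
```

Precomputing all k alignments costs k copies of the pattern, which is one plausible source of the table overhead mentioned in the summary.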
BBBMM. Total size of delta1 tables: [formula not preserved]. If too large, use limit value K. Size of delta1 tables reduced to k. [Diagram labels: T, P, sl.]
BBBMM. Original delta1: increase of the text pointer. BBBMM delta1: shift size. Mismatch not in last block: Correct[sh, j]. [Diagram labels: T, P.]
BBBMM delta2. [Table, partially preserved.] j: 1 to 16. Pat[j]: 1 0 1 0 1 1 1 0 1. delta[j]: 2 13 13 13 3 7 15 2 1. [Diagram labels: T, P.]
Analysis. Assumption: random input (reasonable for compressed text). Expected # comparisons till mismatch, bit-wise and blocked: [formulas not preserved].
Analysis. Expected # bits shifted after mismatch: bit-wise: M; blocked: M'.
Experiments. Texts: English Bible (2.5 MB), World Factbook (1.5 MB), Huffman encoded. k = 8. Patterns: random substrings of lengths 10 to 500.
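Under the random-input assumption, the expected number of comparisons can be estimated as follows. This is a sketch and not necessarily the formula shown on the slide. It assumes independent, uniformly distributed bits, that every comparison covers a full unit (one bit, or one block of k bits), and that comparison t+1 is made only if the first t units matched, which happens with probability 2^(-t * unit size). The function name is hypothetical.

```python
def expected_comparisons(unit_bits, pattern_bits):
    """Expected comparisons per alignment when each comparison covers unit_bits random bits."""
    units = -(-pattern_bits // unit_bits)   # ceiling division: units per pattern
    q = 2.0 ** -unit_bits                   # probability that one unit matches
    # Comparison number t+1 takes place with probability q**t.
    return sum(q ** t for t in range(units))


print(round(expected_comparisons(1, 100), 4))   # 2.0
print(round(expected_comparisons(8, 100), 4))   # 1.0039
```

Under these assumptions the bit-wise variant needs about two comparisons per alignment, while the blocked variant with k = 8 needs barely more than one.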
Experiments: average # comparisons between shifts. [Plot: series Bit-wise, Blocked; x-axis: length of pattern, up to 500.]
Experiments: average size of shifts. [Plot: series Bit-wise, Blocked; x-axis: length of pattern, up to 500.]
Experiments: average # comparisons for 1000 bits. [Plot: series Bit-wise, BDM, Blocked; x-axis: length of pattern, up to 500.]
Experiments: time to locate first occurrence (ms). [Plot: series Bit-wise, BDM, Turbo-BDM, Blocked; x-axis: length of pattern, up to 500.]
Summary. Blocked variant of BM. Faster than alternatives; overhead 1-10 K. Extensions: ASCII, words instead of characters.
Thank you!