
b6d2582866daa2ced3f518abd31aea02.ppt
- Number of slides: 20
On the Usefulness of Backspace
Shmuel Tomi Klein (Bar Ilan University), Dana Shapira (Ashkelon Academic College)
Extension of the study of NEGATION in large IR systems. Examples: United -Nations; Edgar (-1:2) Po. Backspace: not really a character, but can be useful.
Three applications: 1. Handling large numbers; 2. Text compression in IR; 3. Blockwise Huffman decoding
1. Handling large numbers. Query syntax: A (1:3) -B (1:5) C -D (1:1) E. In use at the Responsa Project.
1. Handling large numbers. Too many large numbers: break them into blocks of k digits, e.g. 1234567 becomes 1234 567. Problem with precision: a query for 5678 also retrieves 123456789.
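The precision problem can be seen in a few lines. This is only a sketch: the helper name `split_blocks` and the choice k = 4 are assumptions made for illustration, not part of the slides.

```python
def split_blocks(digits: str, k: int = 4) -> list[str]:
    """Break a digit string into consecutive blocks of at most k digits."""
    return [digits[i:i + k] for i in range(0, len(digits), k)]


# Naive indexing stores each block as an ordinary word.
index_terms = split_blocks("123456789")

# A query for the unrelated number 5678 now matches one of the stored blocks.
false_hit = "5678" in index_terms
```

With plain blocks the index cannot tell a stand-alone 5678 from a block inside a longer number.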
1. Handling large numbers
Each word includes a trailing blank, e.g. House of Lords.
Long numbers use Backspace (BS): 1234567890 becomes 1234 BS 5678 BS 90.
Text:    I declared an income of 1000000 on my last 10 1040 forms
Stored:  I declared an income of 1000 BS 000 on my last 10 1040 forms
Word positions of the stored form: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
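A minimal sketch of this tokenization follows. It assumes k = 4, whitespace-separated input, and a literal "BS" string standing in for the conceptual backspace; the function name `tokenize` is hypothetical.

```python
BS = "BS"  # placeholder token for the conceptual backspace (assumption)


def tokenize(text: str, k: int = 4) -> list[str]:
    """Split text into words; break numbers longer than k digits into
    blocks of k digits joined by BS tokens."""
    tokens: list[str] = []
    for word in text.split():
        if word.isdecimal() and len(word) > k:
            blocks = [word[i:i + k] for i in range(0, len(word), k)]
            for n, block in enumerate(blocks):
                if n > 0:
                    tokens.append(BS)  # BS cancels the blank after the previous block
                tokens.append(block)
        else:
            tokens.append(word)
    return tokens


tokens = tokenize("I declared an income of 1000000 on my last 10 1040 forms")
```

The sentence from the slide yields 14 tokens, with 1000, BS and 000 at positions 6 to 8, while the short numbers 10 and 1040 stay single words.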
1. Handling large numbers
To search for     submit query
234               -BS 234
2000 1040 -BS
12345678          -BS 1234 BS 5678 -BS
1234567           -BS 1234 BS 567
user@addr.com     user @ BS addr. BS com
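The number queries above seem to follow a simple pattern, sketched here. The rule is inferred from the examples only and is an assumption: a leading -BS excludes matches inside a longer number, and a trailing -BS is added when the last block is full (k digits), since only then could another block follow. The name `number_query` and k = 4 are hypothetical.

```python
def number_query(digits: str, k: int = 4) -> str:
    """Build a query that matches `digits` only as a complete number."""
    blocks = [digits[i:i + k] for i in range(0, len(digits), k)]
    parts = ["-BS", " BS ".join(blocks)]  # must not be preceded by a backspace
    if len(blocks[-1]) == k:
        parts.append("-BS")               # a full last block must not be continued
    return " ".join(parts)


queries = [number_query(n) for n in ("234", "12345678", "1234567")]
```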
2. Text compression in IR
Huffword: alternating words and non-words.
Use a single Huffman tree for:
- words including a trailing blank
- punctuation signs: BS ;
- Backspace, to handle exceptions
2. Text compression in IR
File     Size    Huffword  BSHuff  gzip  bzip
English  3.1 MB  3.91      3.97    3.28  4.41
French   7.1 MB  3.98      4.03    3.27  4.63
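3. Blockwise Huffman decoding
Given an alphabet with probabilities, find codeword lengths such that the average length is minimized (HUFFMAN).
Alphabet:      A  B   C    D     E
Probabilities: 0.4, 0.3, 0.1
Codewords:     0  11  101  1000  1001
Lengths:       1  2   3    4     4
[Figure: Huffman tree with internal nodes numbered 0 to 3 and leaves A, B, C, D, E]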
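For reference, regular bit-by-bit Huffman decoding can be sketched as follows. The code A=0, B=11, C=101, D=1000, E=1001 is the one implied by the decoding tables on the following slides; the function name `decode_bitwise` is hypothetical.

```python
CODE = {"A": "0", "B": "11", "C": "101", "D": "1000", "E": "1001"}


def decode_bitwise(bits: str, code: dict[str, str] = CODE) -> str:
    """Decode one bit at a time, emitting a symbol whenever a codeword completes."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:        # prefix code: the first match is the symbol
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends inside a codeword")
    return "".join(out)


# Kraft sum of the codeword lengths; it equals 1 for a complete prefix code.
kraft = sum(2.0 ** -len(word) for word in CODE.values())
decoded = decode_bitwise("100101110000101")
```

Each decoded symbol costs one loop iteration per bit, which is what decoding k bits at a time tries to avoid.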
Decoding k bits together: partial decoding tables
[Figure: Huffman tree with internal nodes numbered 0 to 3 and leaves A, B, C, D, E]
Table  Entry  Prefix + pattern  Parsing   Decoding
0      1      001               0 0 1     A A, remainder 1
1      6      1 110             11 10     B, remainder 10
3      3      100 011           1000 11   D B
Decoding k bits together: partial decoding tables (k = 3; W = decoded output, l = next table)
Prefix:           Λ          1          10         100
Entry  Pattern    Table 0    Table 1    Table 2    Table 3
                  W    l     W    l     W    l     W    l
0      000        AAA  0     D    0     DA   0     DAA  0
1      001        AA   1     E    0     D    1     DA   1
2      010        A    2     CA   0     EA   0     D    2
3      011        AB   0     C    1     E    1     DB   0
4      100        -    3     BAA  0     CAA  0     EAA  0
5      101        C    0     BA   1     CA   1     EA   1
6      110        BA   0     B    2     C    2     E    2
7      111        B    1     BB   0     CB   0     EB   0
Decoding Algorithm (using partial decoding tables 0 to 3 of the previous slide)
Loop over the input, k bits at a time, to EOI: (output, j) ← T(j, M[f; f + k - 1])
Codewords: A = 0, B = 11, C = 101, D = 1000, E = 1001
Input block:  100  101  110  000  101
j:            0    3    1    2    0
Output:       -    EA   B    DA   C
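The table lookup loop can be sketched as below. This is an illustrative reconstruction, not the authors' implementation: the table construction, the function names, and the requirement that the input length be a multiple of k are assumptions. Tables are numbered by proper codeword prefix (Λ, 1, 10, 100), matching the slide.

```python
CODE = {"A": "0", "B": "11", "C": "101", "D": "1000", "E": "1001"}


def build_tables(code: dict[str, str], k: int) -> list[dict[str, tuple[str, int]]]:
    """One table per proper codeword prefix (internal tree node).
    Entry: k-bit pattern -> (decoded output, index of next table)."""
    inverse = {w: s for s, w in code.items()}
    prefixes = sorted({w[:i] for w in code.values() for i in range(len(w))},
                      key=lambda p: (len(p), p))
    index = {p: j for j, p in enumerate(prefixes)}
    tables = []
    for p in prefixes:
        table = {}
        for n in range(2 ** k):
            pattern = format(n, f"0{k}b")
            out, cur = "", p
            for b in pattern:             # decode prefix + pattern bit by bit
                cur += b
                if cur in inverse:
                    out += inverse[cur]
                    cur = ""
            table[pattern] = (out, index[cur])  # cur is the undecoded remainder
        tables.append(table)
    return tables


def decode_blockwise(bits: str, tables, k: int) -> str:
    """Assumes len(bits) is a multiple of k (hypothetical simplification)."""
    out, j = [], 0
    for f in range(0, len(bits), k):
        word, j = tables[j][bits[f:f + k]]
        out.append(word)
    return "".join(out)


tables = build_tables(CODE, 3)
decoded = decode_blockwise("100101110000101", tables, 3)
```

For the slide's input the loop performs five lookups instead of fifteen single-bit steps, at the cost of storing one table of 2^k entries per internal node.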
Looking for new tradeoffs: reduced partial decoding tables, including backspaces.
[Figure: Huffman tree with internal nodes numbered 0 to 3 and leaves A, B, C, D, E]
Revised Decoding Algorithm with reduced tables (W = decoded output, l = next table, b = bits to back up)
Entry  Pattern    Table 0       Table 3
                  W    l  b     W    l  b
0      000        AAA  0  0     DAA  0  0
1      001        AA   0  1     DA   0  1
2      010        A    0  2     D    0  2
3      011        AB   0  0     DB   0  0
4      100        -    3  0     EAA  0  0
5      101        C    0  0     EA   0  1
6      110        BA   0  0     E    0  2
7      111        B    0  1     EB   0  0
Loop to EOI: (output, j, back) ← T(j, M[f; f + k - 1]), then advance f by k - back.
Codewords: A = 0, B = 11, C = 101, D = 1000, E = 1001
Input:   100101110000101
Output:  - EA B - DA C
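A sketch of the reduced scheme follows. The slide keeps only Tables 0 and 3; the rule used here to choose them (keep the root plus every prefix of at least k bits, because backing up over such a prefix would make no progress) is an assumption that happens to reproduce that choice. Function names are hypothetical, and every block read is assumed to be a full k bits, as in the slide's example.

```python
CODE = {"A": "0", "B": "11", "C": "101", "D": "1000", "E": "1001"}


def build_reduced_tables(code: dict[str, str], k: int) -> dict[int, dict]:
    """Entry: k-bit pattern -> (output, next table, bits to back up)."""
    inverse = {w: s for s, w in code.items()}
    prefixes = sorted({w[:i] for w in code.values() for i in range(len(w))},
                      key=lambda p: (len(p), p))
    index = {p: j for j, p in enumerate(prefixes)}
    kept = {p for p in prefixes if p == "" or len(p) >= k}  # assumed rule
    tables = {}
    for p in kept:
        table = {}
        for n in range(2 ** k):
            pattern = format(n, f"0{k}b")
            out, cur = "", p
            for b in pattern:
                cur += b
                if cur in inverse:
                    out += inverse[cur]
                    cur = ""
            if cur in kept:
                table[pattern] = (out, index[cur], 0)
            else:
                # No table for this remainder: un-read it and restart at the root.
                table[pattern] = (out, 0, len(cur))
        tables[index[p]] = table
    return tables


def decode_reduced(bits: str, tables, k: int) -> str:
    out, j, f = [], 0, 0
    while f < len(bits):
        word, j, back = tables[j][bits[f:f + k]]
        out.append(word)
        f += k - back                     # the "backspace": re-read `back` bits
    return "".join(out)


tables = build_reduced_tables(CODE, 3)
decoded = decode_reduced("100101110000101", tables, 3)
```

Fewer tables are stored, but some bits are read twice, so the average number of new bits consumed per lookup drops below k.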
Decoding the input 100101110000101 with each method:
Regular Huffman:                E A B D A C
Partial decoding tables:        - EA B DA C
Reduced tables with backspace:  - EA B - DA C
Experimental results: WSJ
         Bit decode  Partial tables  Reduced tables
k        1           8               8
bpa      1           8               6.4
MB/sec   6.6         -               7.6
RAM      2.1         197             34.1
Experimental results: KJV
         Bit decode  Partial tables  Reduced tables
k        1           8               8
bpa      1           8               6.4
MB/sec   10.1        0.4             13.7
RAM      0.21        8.7             17
Conclusion: 3 examples of IR applications. Use of conceptual elements, like backspaces, may improve algorithms.
Thank you! Questions?