- Number of slides: 25
New Lower Bounds for the Maximum Number of Runs in a String
Wataru Matsubara (1), Kazuhiko Kusano (1), Akira Ishino (1), Hideo Bannai (2), Ayumi Shinohara (1)
(1) Tohoku University, Japan
(2) Kyushu University, Japan
Contents
- Introduction
- New lower bounds
- A brief history of results on bounds
- Simple heuristics for generating run-rich strings
- Analyzing asymptotic lower bounds
- Discussion
- Conclusion and further research
Prague Stringology Conference 2008
Runs
A run is an occurrence of a periodic factor that is
- non-extendable (maximal),
- of exponent at least two,
- primitive-rooted.
Example: in aabaabaaaacaacac, the prefix aabaabaa = (aab)^(8/3) is a run with period 3, root aab, and exponent 8/3.
Number of runs: ρ(n)
run(w): number of runs in string w
ρ(n) = max{ run(w) : |w| = n }, the maximum number of runs in a string of length n
For any string w, run(w) ≤ ρ(|w|).
Example: run(aabaabaa) = 4 (three occurrences of aa and the whole string)

n     1  2  3  4  5  6  7  8  9  10  11  12  ...
ρ(n)  0  1  1  2  2  3  4  5  5  6   7   8   ...
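The definitions of run(w) and ρ(n) can be checked by brute force for very small n. The following is only a sketch: the names `find_runs`, `run`, and `rho_binary` are hypothetical, the run finder is a simple quadratic scan rather than a linear-time algorithm, and `rho_binary` maximizes over binary strings only, which is an assumption about where the maximum is attained.

```python
from itertools import product

def find_runs(w):
    """Return the runs of w as a set of (start, end) pairs, end exclusive."""
    n = len(w)
    runs = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            # Extend the stretch where w[j] == w[j + p] as far as possible.
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            # w[i:j+p] is a maximal segment with period p; it is a run
            # when its length is at least 2p. Storing intervals in a set
            # avoids counting the same run again for multiples of its
            # smallest period.
            if j - i >= p:
                runs.add((i, j + p))
            i = j + 1
    return runs

def run(w):
    """Number of runs in w."""
    return len(find_runs(w))

def rho_binary(n):
    """Maximum of run(w) over binary strings of length n (exhaustive)."""
    # By symmetry between the two letters, the first letter is fixed to 'a'.
    return max(run("a" + "".join(t)) for t in product("ab", repeat=n - 1))

print(sorted(find_runs("aabaabaa")))  # [(0, 2), (0, 8), (3, 5), (6, 8)]
```

The exhaustive search grows as 2^n, so it is only practical for short strings; it is meant as a sanity check on the definitions, not as a way to find long run-rich strings.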
Max Number of Runs in a String
[Figure: timeline of known bounds on ρ(n), plotted on a scale from 0 to 5n.]
Upper bounds:
- cn [Kolpakov & Kucherov '99]
- 5n [Rytter '06]
- 3.48n [Puglisi et al. '08]
- 3.44n [Rytter '07]
- 1.6n [Crochemore & Ilie '08]
- 1.048n [Crochemore et al. '08]
Lower bound:
- 0.927n [Franek et al. '03] [Franek & Yang '06]
Our result: New lower bound We discovered a run-rich string τ τ = aababaababbabaababaababbabaababaabbabaababaababbaba ababaababaababbabaababaabbabaababaabbabaababbabaababaabb abaababaababbabaabbabaababbabaababaababaababbabaabbabaababaababbabaababaabbabaababaababbabaabbabaababbabaababaabbabaababaababbabaabbabaababaababbabaababaababbabaababaabbabaababbaba ababaababbabaababaababbabaababaabbabaababaababbabaabbaba ababaababbabaababaababbabaababaababbabaababaabbabaababba baababaababbabaababaabbabaababaababbabaababaababbabaababaababbabaababaabbabaababaababb abaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaabbabaababbabaababaababbabaababaabbabaababbabaabbabaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaababbabaabab run(τ) = 1455, | τ | = 1558 New lower bound 6 / 23 Prague Stringology Conference 2008 Known best lower bound [Franek et al. ’ 03]
How to generate a run-rich string
run(τ) = 1455, |τ| = 1558
Let τ' = τ[1:1557] (delete the last character). The number of runs does not decrease drastically: run(τ') = 1453, |τ'| = 1557.
So, to generate a run-rich string, all we have to do is append a single character to a run-rich string.
The search starts with the single string "a" in the buffer. In each round, two new strings are created from each string in the buffer by appending "a" or "b" to it. The new strings are then sorted by their number of runs. Only as many as fit in the buffer are retained for the next round.
[Figure: the first rounds of the search with buffer size 10; each candidate string is labeled with its number of runs, and the top 10 are selected.]
[Figure: the following rounds of the search with buffer size 10; in each round the candidates are extended by "a" or "b" and the top 10 by number of runs are selected.]
The strings in the buffer become run-rich.
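The buffer-based search described on these slides can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `count_runs` and `heuristic_search` are hypothetical names, runs are counted with a simple quadratic scan, and ties are broken by generation order because the slides do not say how ties are handled.

```python
def count_runs(w):
    """Count the runs of w with a simple quadratic scan."""
    n = len(w)
    runs = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            if j - i >= p:            # maximal p-periodic segment of length >= 2p
                runs.add((i, j + p))  # the set removes duplicate intervals
            i = j + 1
    return len(runs)

def heuristic_search(rounds, buffer_size):
    """Grow strings one character per round, keeping the run-richest ones."""
    buffer = ["a"]
    for _ in range(rounds):
        # Two new strings per buffered string: append "a" or "b".
        candidates = [s + c for s in buffer for c in "ab"]
        # Sort by number of runs, largest first (stable sort, so ties
        # keep their generation order).
        candidates.sort(key=count_runs, reverse=True)
        buffer = candidates[:buffer_size]
    return buffer

best = heuristic_search(rounds=7, buffer_size=10)[0]
print(best, count_runs(best))
```

Recounting runs from scratch for every candidate is wasteful; a practical search over strings of length in the thousands would need a much faster run counter.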
Improving lower bound of ρ(n) (1/2)
We discovered a run-rich string τ such that run(τ) = 1455, |τ| = 1558.
run(τ^2) > 2 run(τ): run(τ^2) = 2915, |τ^2| = 2 · 1558 = 3116.
Improved!!
Improving lower bound of ρ(n) (2/2)
Using the run-rich string τ, can we push the lower bound even higher?

k   run(τ^k)   |τ^k|   run(τ^k)/|τ^k|  (lower bound on ρ(n)/n)
1   1455       1558    0.933889
2   2915       3116    0.935494
3   4374       4674    0.935815
4   5833       6232    0.935976
5   7292       7790    0.936072
6   8751       9348    0.936136
7   10210      10906   0.936182
8   11669      12464   0.936216
:   :          :       :

Next, we give a formula that calculates the number of runs in w^k.
Number of runs in w^k
Theorem: Let w be a string of length n. For any k ≥ 2, run(w^k) = Ak - B, where A = run(w^3) - run(w^2) and B = 2 run(w^3) - 3 run(w^2).
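As a quick arithmetic check, the formula can be evaluated with the values run(τ^2) = 2915 and run(τ^3) = 4374 from the preceding table and compared with the other rows. The function name `runs_in_power` is hypothetical; the numbers are the ones listed on the slides.

```python
def runs_in_power(run_w2, run_w3, k):
    """Evaluate run(w^k) = A*k - B for k >= 2, per the theorem."""
    if k < 2:
        raise ValueError("the formula is stated for k >= 2 only")
    A = run_w3 - run_w2          # runs gained per additional copy of w
    B = 2 * run_w3 - 3 * run_w2  # constant correction term
    return A * k - B

# Values of run(tau^k) and |tau^k| = 1558 * k taken from the table.
table = {2: 2915, 3: 4374, 4: 5833, 5: 7292, 6: 8751, 7: 10210, 8: 11669}
for k, expected in table.items():
    predicted = runs_in_power(2915, 4374, k)
    print(k, predicted, predicted == expected, round(predicted / (1558 * k), 6))
```

For τ this gives A = 1459 and B = 3, so run(τ^k) = 1459k - 3. The formula does not cover k = 1: it would give 1456, whereas run(τ) = 1455.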
Proof of theorem (1/4)
When two strings w^k and w are concatenated, the number of runs in w^(k+1) changes in two cases.
Case (a): increase. A new run may be created at the border between the two strings.
Example: abba (the run bb appears at the border between ab and ba).
Proof of theorem (2/4)
When two strings w^k and w are concatenated, the number of runs in w^(k+1) changes in two cases.
Case (b): decrease. A suffix run in w^k and a prefix run in w may be merged into one run in w^(k+1).
Example: aabaaaabaa (the suffix run aa and the prefix run aa merge into aaaa).
Proof of theorem (3/4)
By the periodicity lemma, no run in w^k is longer than 2|w|, except the whole string w^k.
Hence, for any k ≥ 3, run(w^k) - run(w^(k-1)) = c (constant).
[Figure: copies of w laid side by side.]
Proof of theorem (4/4)
Theorem: Let w be a string of length n. For any k ≥ 2, run(w^k) = Ak - B, where A = run(w^3) - run(w^2) and B = 2 run(w^3) - 3 run(w^2).
Proof: For any k ≥ 3, run(w^k) - run(w^(k-1)) is a constant.
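The step from the constant difference to the closed form is not spelled out in the extracted text. One way to fill it in, offered as a reconstruction rather than the authors' own wording, is a telescoping sum. Taking k = 3 shows that the constant equals A.

```latex
\begin{align*}
\mathrm{run}(w^k)
  &= \mathrm{run}(w^2) + \sum_{i=3}^{k} \bigl(\mathrm{run}(w^i) - \mathrm{run}(w^{i-1})\bigr) \\
  &= \mathrm{run}(w^2) + (k-2)A \\
  &= Ak - \bigl(2A - \mathrm{run}(w^2)\bigr) \\
  &= Ak - \bigl(2\,\mathrm{run}(w^3) - 3\,\mathrm{run}(w^2)\bigr)
   = Ak - B .
\end{align*}
```

For k = 2 the sum is empty and the identity reduces to run(w^2) = 2A - B, which holds by the definitions of A and B.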
Asymptotic behavior of ρ(n)
Theorem: For any string w and any ε > 0, there exists a positive integer N such that for any n ≥ N, ρ(n)/n > (run(w^3) - run(w^2))/|w| - ε.
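The proof of this slide did not survive extraction. The following is a hedged sketch of how such a statement can follow from the formula run(w^k) = Ak - B, reading the theorem as the bound ρ(n)/n > A/|w| - ε with A = run(w^3) - run(w^2). It assumes that ρ is non-decreasing, which holds because appending a fresh letter to a string destroys no run. Here k is chosen as the integer part of n/|w|, and n is assumed large enough that k ≥ 2.

```latex
\begin{align*}
k = \left\lfloor \tfrac{n}{|w|} \right\rfloor
  \;&\Longrightarrow\; k\,|w| \le n < (k+1)\,|w| , \\
\rho(n) \;\ge\; \rho(k\,|w|) \;&\ge\; \mathrm{run}(w^k) \;=\; Ak - B , \\
\frac{\rho(n)}{n} \;&\ge\; \frac{Ak - B}{(k+1)\,|w|}
  \;=\; \frac{A}{|w|}\cdot\frac{k}{k+1} \;-\; \frac{B}{(k+1)\,|w|}
  \;\xrightarrow[k \to \infty]{}\; \frac{A}{|w|} .
\end{align*}
```

Since the right-hand side converges to A/|w|, it exceeds A/|w| - ε for every sufficiently large n.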
Discovered run-rich strings
See our web site [http://www.shino.ecei.tohoku.ac.jp/runs]

Length of τ   r(τ)     r(τ^2)   r(τ^3)   lower bound on ρ(n)/n
125           110      227      343      0.928
1558          1455     2915     4374     0.93645
60064         56714    113448   170181   0.944542
105405        99541    199103   298664   0.944557
184973        174697   349417   524136   0.944565  (current best lower bound)

We found these run-rich strings by heuristic search. To improve the asymptotic behavior, the strings in the buffer are sorted by r(w^3) - r(w^2) instead of r(w).
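The last column of the table can be reproduced from the other columns as (r(τ^3) - r(τ^2)) / |τ|. A small sketch follows; the function name `asymptotic_bound` and the tuple layout are illustrative choices, and the numbers are the ones from the table.

```python
def asymptotic_bound(length, r2, r3):
    """Asymptotic lower bound on rho(n)/n given by a string of this length."""
    return (r3 - r2) / length

# (|tau|, r(tau^2), r(tau^3)) for each row of the table.
rows = [
    (125, 227, 343),
    (1558, 2915, 4374),
    (60064, 113448, 170181),
    (105405, 199103, 298664),
    (184973, 349417, 524136),
]

for length, r2, r3 in rows:
    print(length, f"{asymptotic_bound(length, r2, r3):.6f}")
```

The computed values agree with the table up to its rounding in the last printed digit.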
Discussion
- What is the class of run-rich strings?
- Sturmian words are not run-rich [Rytter 2008] (this holds for any Sturmian word w).
- Is there any recursive construction of a sequence of run-rich strings?
- We believe that compression holds a clue to understanding: the run-rich string τ (|τ| = 184973) can be represented by only 24 LZ factors.
LZ-factorization of τ (|τ| = 184973)
τ = aababaababbabaababab…
LZ(τ) = a / (0, 1) / b / (1, 3) / (1, 4) / (2, 8) / (5, 13) / (12, 19) / (26, 31) / (49, 38) / (50, 63) / (89, 93) / (113, 162) / (57, 317) / (249, 693) / (275, 984) / (879, 2120) / (942, 3041) / (2811, 6521) / (2999, 9374) / (8764, 20072) / (9332, 28878) / (27096, 45341) / (38210, 67195)
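The factor list can be expanded back into a string with a few lines of code. This sketch assumes that each pair is (start position, length) with 0-based positions into the text decoded so far, and that a copy may overlap the part being written. The slide does not state this convention; it is inferred from the fact that the factor lengths sum to 184973. The name `lz_decode` is hypothetical.

```python
# Factors as listed on the slide: single letters or (start, length) pairs.
FACTORS = [
    "a", (0, 1), "b", (1, 3), (1, 4), (2, 8), (5, 13), (12, 19), (26, 31),
    (49, 38), (50, 63), (89, 93), (113, 162), (57, 317), (249, 693),
    (275, 984), (879, 2120), (942, 3041), (2811, 6521), (2999, 9374),
    (8764, 20072), (9332, 28878), (27096, 45341), (38210, 67195),
]

def lz_decode(factors):
    """Expand literal letters and (start, length) copy factors into a string."""
    out = []
    for factor in factors:
        if isinstance(factor, str):
            out.append(factor)  # literal letter
        else:
            start, length = factor
            # Copy one character at a time so that a factor may refer to
            # characters it has just produced (self-overlapping copy).
            for i in range(start, start + length):
                out.append(out[i])
    return "".join(out)

tau = lz_decode(FACTORS)
print(len(FACTORS), len(tau), tau[:18])
```

Under this reading, the decoded string has length 184973 and its first 18 characters match the prefix shown on the slide. Whether it reproduces τ exactly cannot be confirmed from the slide alone.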
Conclusion
- We introduced a new approach for analyzing lower bounds using heuristic search.
- We improved the lower bound on the maximum number of runs in a string. The new lower bound is 0.944565n.
Further research
- Improving the heuristic algorithm
- Analyzing the class of run-rich strings
- Speeding up the counting of runs in strings
- Finding good heuristics
- Guessing run-rich strings in compressed form (LZ factors)
- Any recursive construction of a sequence of run-rich strings?
- Relation with compression
- Algorithms for finding all runs in strings that process a compressed string without decompression
Max Number of Runs in a String
[Figure: timeline of known bounds on ρ(n), plotted on a scale from 0 to 5n, now including the new lower bound.]
Upper bounds:
- cn [Kolpakov & Kucherov '99]
- 5n [Rytter '06]
- 3.48n [Puglisi et al. '08]
- 3.44n [Rytter '07]
- 1.6n [Crochemore & Ilie '08]
- 1.048n [Crochemore et al. '08]
Lower bounds:
- 0.944565n [Matsubara et al. '08]
- 0.927n [Franek et al. '03]
Appendix
Conjecture: ρ(n) < n
[Figure: plot of n and ρ(n) for n from 0 to 60.]