Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach
Seung-won Hwang, Kevin Chen-Chuan Chang
The Database and Info. Systems Lab., University of Illinois at Urbana-Champaign

Problem: Web “Middleware” Top-k Query Processing
[Figure: a middleware top-k algorithm accesses source S1 (dineme.com, predicate p1: rating) and source S2 (superpages.com, predicate p2: close) and returns the top k objects v1, …, vk with their scores F[v1], …, F[vk], where F = min(p1, p2).]
To evaluate each predicate pi, source Si provides:
• sorted access, e.g., returning the restaurant with the next-highest rating;
• random access for each object uj, e.g., returning the rating of a specific restaurant uj.
AIM
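
The two access types can be made concrete with a minimal Python sketch. It assumes each source can be modeled as an in-memory map from object to a score in [0, 1]; the class `Source` and its method names are hypothetical and do not describe any real API of these sites.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class Source:
    """Hypothetical in-memory stand-in for a Web source S_i that evaluates one predicate p_i."""

    def __init__(self, scores: Dict[str, float]) -> None:
        self.scores = dict(scores)  # object id -> score on p_i
        # Sorted access returns objects in descending score order.
        self._order: List[Tuple[str, float]] = sorted(
            self.scores.items(), key=lambda kv: kv[1], reverse=True)
        self._cursor = 0

    def sorted_access(self) -> Optional[Tuple[str, float]]:
        """Return the object with the next-highest score, or None once exhausted."""
        if self._cursor >= len(self._order):
            return None
        item = self._order[self._cursor]
        self._cursor += 1
        return item

    def random_access(self, obj: str) -> float:
        """Return the score of one specific object."""
        return self.scores[obj]


def F(scores: Sequence[float]) -> float:
    """Scoring function from the slide: F = min(p1, p2)."""
    return min(scores)


# Hypothetical scores for p1 (rating) and p2 (close).
s1 = Source({"a": 0.9, "b": 0.7, "c": 0.4})
s2 = Source({"a": 0.5, "b": 0.8, "c": 0.9})

obj, rating = s1.sorted_access()    # ("a", 0.9): highest rating first
closeness = s2.random_access(obj)   # probe the same object on p2
print(obj, F([rating, closeness]))  # a 0.5
```

Sorted access is a cursor that only moves forward, while random access needs the object's identity up front; a middleware algorithm typically discovers objects through the former and completes their scores with the latter.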

Goal: Minimizing Access Cost
• Various cost scenarios, for example:
  - dineme.com (p1: rating) with s1 = 32 ms, r1 = 700 ms, and superpages.com (p2: close) with s2 = 344 ms, r2 = 1400 ms;
  - hotels.com (p1: rating, p2: close, p3: cheap) with s1 = s2 = s3 = 44 ms and r1 = r2 = r3 = 0 ms.
• Cost model: the aggregate cost of all predicate accesses, where si and ri are the unit costs of a sorted and a random access on pi.
• Goal: minimizing the access cost.
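
One way to read this cost model: if a plan performs Si sorted and Ri random accesses on predicate pi, its cost is the sum over i of (Si · si + Ri · ri). The sketch below assumes this linear form and reuses the unit costs of the dineme.com / superpages.com scenario; `access_cost` is a hypothetical helper and the access counts are made up for illustration.

```python
from typing import Dict

# Unit costs (ms) from the dineme.com / superpages.com scenario:
# "s" = one sorted access, "r" = one random access on that predicate.
UNIT_COST: Dict[str, Dict[str, float]] = {
    "p1": {"s": 32.0, "r": 700.0},    # dineme.com, rating
    "p2": {"s": 344.0, "r": 1400.0},  # superpages.com, close
}


def access_cost(sorted_counts: Dict[str, int], random_counts: Dict[str, int],
                unit_cost: Dict[str, Dict[str, float]] = UNIT_COST) -> float:
    """Assumed linear model: sum over predicates of (#sorted * s_i + #random * r_i)."""
    total = 0.0
    for p, cost in unit_cost.items():
        total += sorted_counts.get(p, 0) * cost["s"] + random_counts.get(p, 0) * cost["r"]
    return total


# Hypothetical access counts for one query plan (not measured data).
print(access_cost({"p1": 10, "p2": 2}, {"p1": 1, "p2": 5}))  # 8708.0
```

Under these unit costs, a single random access on p2 costs as much as about 44 sorted accesses on p1, which is why the mix of access types matters.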

Beyond the State of the Art: How to Be General and Adaptive?
Current state of the art: fixed algorithms, each designed for a specific scenario. With s the unit cost of a sorted access and r that of a random access (1 = cheap, h = expensive, ∞ = impossible):
• s = 1, r = 1: FA, TA, Quick-Combine
• s = 1, r = h: CA, SR-Combine
• s = 1, r = ∞: NRA, Stream-Combine
• s = h, r = h: FA, TA, Quick-Combine
• s = h, r = ∞: NRA, Stream-Combine
• s = ∞: TAz, MPro, Upper
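
For reference, here is a minimal sketch of TA, one of the fixed algorithms listed for the cheap sorted / cheap random scenario. It is the textbook Threshold Algorithm of Fagin, Lotem and Naor, not the cost-based framework proposed here; in-memory dictionaries stand in for the sources, and the function name is hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

ScoreFn = Callable[[Sequence[float]], float]


def threshold_algorithm(lists: Sequence[Dict[str, float]], k: int,
                        F: ScoreFn) -> List[Tuple[float, str]]:
    """Illustrative TA sketch (hypothetical name): round-robin sorted access
    plus immediate random access. Assumes every object appears in every list
    and F is monotone. Returns the top-k (score, object) pairs, best first."""
    # Simulate sorted access by pre-sorting each list in descending score order.
    ordered = [sorted(s.items(), key=lambda kv: kv[1], reverse=True) for s in lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the best k objects so far
    for depth in range(len(ordered[0])):
        last = []  # score reached by sorted access on each predicate at this depth
        for column in ordered:
            obj, score = column[depth]  # one sorted access
            last.append(score)
            if obj not in seen:
                seen.add(obj)
                # Random accesses fetch the new object's scores on the other predicates.
                exact = F([scores[obj] for scores in lists])
                heapq.heappush(top, (exact, obj))
                if len(top) > k:
                    heapq.heappop(top)
        # No unseen object can score above F(last): stop once the k-th best reaches it.
        if len(top) == k and top[0][0] >= F(last):
            break
    return sorted(top, reverse=True)


# Hypothetical example data for p1 and p2.
p1 = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2}
p2 = {"a": 0.5, "b": 0.8, "c": 0.9, "d": 0.1}
print(threshold_algorithm([p1, p2], k=1, F=min))  # [(0.7, 'b')]
```

TA always pairs each newly seen object with random accesses on all other predicates, a pattern that suits comparable s and r but becomes costly or infeasible as r grows; this fixed access pattern is what a cost-based approach would choose adaptively instead.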

Solution: A Cost-based Approach
• Cost-based optimization: finding the optimal algorithm, with minimum cost, from an algorithm space
• General across a wide range of scenarios
  - one “algorithm” for all
• Adaptive to the specific scenario at run time
  - truly optimal (in principle)

Challenges: Enabling Cost-based Optimization
• Challenge #1: Defining the algorithm space
  Analogy: SQL queries are composed of logical operators that are scheduled into a query plan. What are the corresponding “logical tasks” for top-k queries, as building blocks of the algorithm space?
• Challenge #2: Searching for Mopt, the optimal algorithm
  Analogy: SQL queries are optimized with systematic heuristics (e.g., left-deep joins) and search schemes (e.g., dynamic programming). What are efficient search schemes for top-k queries?

Challenge #1: Defining the Algorithm Space
• Basis: a view of logical tasks. For every object ui, any algorithm must satisfy the logical task wi:
  - if ui is in the top k, wi must compute its exact score;
  - otherwise, wi must show (through some partial scores) that its score will be less than the lowest top-k score.
• How to define an algorithm space?
• How to identify unsatisfied tasks?
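
A minimal sketch of checking whether a task wi is satisfied. It assumes scores in [0, 1] and bounds each unevaluated predicate pi by the last score that sorted access has returned on pi (1.0 before any sorted access), which is valid because sorted access is descending. `upper_bound` and `task_satisfied` are hypothetical helpers; under this reading, an object whose check returns False is an unsatisfied task and still needs accesses.

```python
from typing import Callable, Dict, Sequence

ScoreFn = Callable[[Sequence[float]], float]


def upper_bound(known: Dict[int, float], ceilings: Sequence[float], F: ScoreFn) -> float:
    """Highest score the object could still reach: known scores, ceilings elsewhere."""
    return F([known.get(i, ceilings[i]) for i in range(len(ceilings))])


def task_satisfied(known: Dict[int, float], ceilings: Sequence[float], F: ScoreFn,
                   in_topk: bool, lowest_topk_score: float) -> bool:
    """Hypothetical check of logical task w_i for one object u_i."""
    if in_topk:
        # A top-k object needs its exact score, i.e. every predicate evaluated.
        return len(known) == len(ceilings)
    # Any other object only needs partial scores proving it falls below the top k.
    return upper_bound(known, ceilings, F) < lowest_topk_score


# ceilings[i]: last score returned by sorted access on p_i (1.0 if none yet).
ceilings = [0.6, 1.0]
print(task_satisfied({1: 0.3}, ceilings, min, False, 0.5))         # True: min(0.6, 0.3) < 0.5
print(task_satisfied({0: 0.8}, ceilings, min, False, 0.5))         # False: bound is 0.8
print(task_satisfied({0: 0.8, 1: 0.7}, ceilings, min, True, 0.5))  # True: exact score known
```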

Challenge #2: Searching for Mopt
• Space reduction by “systematic” heuristics:
  - S-then-R: for each predicate pi, perform sorted accesses first, to depth di, before any random accesses;
  - Global schedule: follow the same schedule H of random accesses for every object.
• Cost estimation:
  - Sampling (getting “statistics”): sample a representative subset of the database;
  - Simulation (getting overall costs): simulate query plans on the sample to estimate their costs.
• Dynamic search over different query plans:
  - hill-climbing and query-driven strategies.
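
These heuristics can be combined into a small sketch of cost-based search. All names are hypothetical, and several choices are assumptions of this sketch rather than details from the slides: the sample fits in memory as score lists in [0, 1]; the random phase refines the candidate with the highest upper bound first (an MPro-style order); the global schedule H is searched exhaustively, which only suits a few predicates; and only hill climbing over the depths di is shown, not the query-driven strategy.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional, Sequence, Tuple

ScoreFn = Callable[[Sequence[float]], float]
UNSEEN = -1  # hypothetical id standing for every object not yet found by sorted access


def simulate_plan(sample: List[List[float]], k: int, F: ScoreFn, depths: Sequence[int],
                  H: Sequence[int], s: Sequence[float], r: Sequence[float]) -> float:
    """Hypothetical cost estimate for one S-then-R plan (depths d_i, schedule H).

    sample[o][i] is object o's score on p_i. Returns inf when the plan would
    need more sorted access than its depths allow.
    """
    m = len(s)
    known: Dict[int, Dict[int, float]] = {o: {} for o in range(len(sample))}
    ceilings = [1.0] * m  # upper bound for any score not yet seen on p_i
    cost = 0.0
    # S phase: sorted accesses on each p_i down to depth d_i.
    for i, d in enumerate(depths):
        ranked = sorted(range(len(sample)), key=lambda o: sample[o][i], reverse=True)
        for o in ranked[:d]:
            known[o][i] = ceilings[i] = sample[o][i]
        cost += d * s[i]

    def bound(o: int) -> float:
        return F([known[o].get(i, ceilings[i]) for i in range(m)])

    # R phase (assumed MPro-style order): refine the candidate with the highest
    # upper bound, probing its missing predicates in the global order H.
    heap = [(-bound(o), o) for o in known if known[o]]
    if any(not scores for scores in known.values()):
        heap.append((-F(ceilings), UNSEEN))
    heapq.heapify(heap)
    found = 0
    while heap and found < k:
        _, o = heapq.heappop(heap)
        if o == UNSEEN:
            return float("inf")  # an undiscovered object could still be in the top k
        missing = [i for i in H if i not in known[o]]
        if not missing:
            found += 1  # exact score that no remaining candidate can exceed
            continue
        i = missing[0]
        known[o][i] = sample[o][i]  # one random access on p_i
        cost += r[i]
        heapq.heappush(heap, (-bound(o), o))
    return cost


def hill_climb(sample: List[List[float]], k: int, F: ScoreFn, s: Sequence[float],
               r: Sequence[float], step: int = 1) -> Tuple[Optional[tuple], float]:
    """Hypothetical search: hill climbing over depths for every schedule H."""
    m, n = len(s), len(sample)
    best_plan, best_cost = None, float("inf")
    for H in itertools.permutations(range(m)):
        depths = [n] * m  # full sorted access: always a feasible start
        cost = simulate_plan(sample, k, F, depths, H, s, r)
        improved = True
        while improved:
            improved = False
            for i, delta in itertools.product(range(m), (-step, step)):
                cand = list(depths)
                cand[i] = min(n, max(0, cand[i] + delta))
                c = simulate_plan(sample, k, F, cand, H, s, r)
                if c < cost:
                    depths, cost, improved = cand, c, True
        if cost < best_cost:
            best_plan, best_cost = (tuple(depths), H), cost
    return best_plan, best_cost


# Hypothetical 4-object sample with the dineme.com / superpages.com unit costs.
sample = [[0.9, 0.5], [0.7, 0.8], [0.4, 0.9], [0.2, 0.1]]
print(hill_climb(sample, k=1, F=min, s=[32, 344], r=[700, 1400]))
```

Starting from full-depth sorted access guarantees a finite-cost plan; each hill-climbing step keeps a neighboring plan only when its simulated cost on the sample is strictly lower, so the search always terminates.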

Contribution: Unification and Contrast
• Unification: for a symmetric function, e.g., avg(p1, p2), the NC framework behaves similarly to TA.
• Contrast: for an asymmetric function, e.g., min(p1, p2), NC adapts with different behaviors and outperforms TA.
[Figure: for each function, cost plotted over depth into p1 and depth into p2, with points labeled T (TA) and N (NC).]

Contribution: Generality and Adaptivity
• Over 1000 random configurations:
  - for unstudied scenarios (74%), NC generalizes with significantly better performance;
  - for existing scenarios (26%), NC adapts to behaviors similar to those of the scenario-specific algorithms.
[Figure: results split into existing scenarios and unstudied scenarios.]

Conclusion: Summary
For a general and adaptive optimization of top-k queries, we developed:
• Key insight: abstracting a top-k query as a task-scheduling problem
• Algorithm space for top-k queries: an algorithm space that considers only algorithms scheduling unsatisfied tasks
• Dynamic search schemes: efficient search schemes for top-k queries

Thank You!
For more information: The AIM Project, http://aim.cs.uiuc.edu

Number of slides: 12