e75ee557e3a747d6d8283dc666aa0afb.ppt
- Количество слайдов: 29
Supporting Top-k join Queries in Relational Databases By: Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Calvin R Noronha (1000578539) Deepak Anand (1000603813)
AGENDA Introduction Motivation Requirements Overview of Ripple Join Rank Join Algorithm Example HRJN* Performance Conclusion References
Introduction What are top-k queries ? Ranking queries which order results on some computed score What are top-k join queries ? Typically these queries involve joins Usually, users are interested in the top-k join results.
Introduction. . Searches are performed using multiple features Every feature produces a different ranking for the query Join and aggregate the individual feature rankings for a global ranking. Answer to a Top-k join query is an ordered set of join results according to some provided function that combines the orders on each input
Example Find location for a house such that the combination of the cost of the house and 5 years tuition at a nearby school is minimal.
Motivation Existing join operators decouple join and sorting (ranking) of results. Sorting is expensive and is a blocking operation. Sort-merge joins (MGJN) only preserves order of joined column data Nested-loop joins (NLJN) only orders on the outer relations are preserved through the join Hash join (HSJN) doesn’t preserve order if hash tables do not fit in memory
Example Ranking Query SELECT A. 1, B. 2 FROM A, B, C WHERE A. 1 = B. 1 and B. 2 = C. 2 ORDER BY (0. 3*A. 1 + 0. 7*B. 2) STOP AFTER 5
Requirements • Perform basic join operation • Conform with current query operator interface • Use the orderings of its inputs • Produce top ranked join results ASAP • Adapt quickly to input fluctuations
Overview of Ripple Join JOIN : L. A = R. A, L and R are descending, ordered by B We get a tuple from L and a tuple from R (L 1(1, 1, 5) R 1(1, 3, 5)) No valid join result L R
Ripple Join. . JOIN : L. A = R. A, L and R are descending, ordered by B We get a second tuple from L and a second tuple from R and join with prior tuples, creating all possible combinations L (L 2, R 2) {(2, 2, 4), (2, 1, 4)} R
Ripple Join. . JOIN : L. A = R. A, L and R are descending, ordered by B (L 2, R 2) {(2, 2, 4), (2, 1, 4)} is an invalid join result! (L 2, R 1) {2, 2, 4), (1, 3, 5)} is an invalid join result! L (L 1, R 2) {(1, 1, 5), (2, 1, 4)} is a valid join result! R
Variations of the Ripple Join • Rectangular version – obtain tuples from one source at a higher rate than from the other source • Block Ripple Join– obtain data, b tuples at a time, for classic ripple join b = 1 • Hash Ripple Join –in memory, maintain hash tables of the samples obtained so far
Rank-Join Algorithm 1. Generate new valid join combinations 2. Compute score for each valid combination 3. For each incoming input, calculate threshold value T 4. Store top k in priority queue Lk 5. Halt when lowest value of queue, scorek ≥ T
A Rank-Join Algorithm Example Select * From L, R Where L. A = R. A Join Condition Order By L. B + R. B Stop After 2 Initial Input: Score (1). Get a valid join combination using some join strategy Ripple Select (L 1, R 1) => Not a valid join result Next input: (1). Get a valid join combination using some join strategy Ripple Select (L 2, R 2), (L 2, R 1), (L 1, R 2) => (L 1, R 2) is a valid join result k
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 (2). Compute the score (J) for the result J 1(L 1, R 2) => L. B + R. B = 5 + 4 = 9 (3). Compute a threshold score T = Max ( Last L. B + First R. B, First L. B + Last R. B ) For Ripple Selection (L 2, R 2) => T = Max ( L 2. B + R 1. B, L 1. B + R 2. B ) = Max (4+5, 5+4) = 9
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 J 1 = 9 T=9 (4). J 1 >= T, so report J 1 in top-k results (i. e. add it to list Lk ) Since we need top 2 (k=2), continue until k=2 and Min(J 1, J 2, …Jk) >= T
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 Next input: 1). Get a valid join combination using some join strategy Ripple Select (L 3, R 3), (L 3, R 1), (L 3, R 2), (L 1, R 3), (L 2, R 3) => (L 3, R 3), (L 2, R 3) are valid join results (2). Compute the scores (J) for the results J 2(L 2, R 3) = 4 + 3 = 7 J 3(L 3, R 3) = 3 + 3= 6
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 (3). Calculate a NEW threshold T T = Max ( Last L. B + First R. B, First L. B + Last R. B ) = Max ( L 3. B + R 1. B , L 1. B + R 3. B ) = Max(3 + 5, 5 + 3) =8
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 T=8 J 1(L 1, R 2) = 9 reported J 2( L 2, R 3) = 7 J 3(L 3, R 3) = 6 Note, J’s are in descending order (4). Min (J) = 6 < T so continue
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 Next input: (1). Get a valid join combination using some join strategy Ripple Select ( L 4, R 4) => (L 4, R 1), (L 2, R 4), (L 3, R 4) are valid join results (2). Compute the scores (J) for the results J(L 4, R 4) = 7, J(L 2, R 4) = 6, J(L 3, R 4) = 5 (3). Calculate a NEW threshold T T = Max( L 4. B+R 1. B, L 1. B + R 4. B ) = Max( 7, 7 ) = 7
Example Continued. . Select * From L, R Where L. A = R. A Order By L. B + R. B Stop After 2 T= 7 J 1(L 1, R 2) = 9, J 2(L 2, R 3) = 7, J 3(L 4, R 1) = 7, J 3(L 3, R 3) = 6, J 4(L 2, R 4) = 6, J 5(L 3, R 4) = 5 (4). Min(J 1, J 2) = 7 >= T (k = 2), so report J 2 and STOP
Rank-Join Continued … Join strategy is very crucial Recommended strategy = Ripple Join Alternates between tuples of the two relations Flexible in the way it sweeps out (rectangular, etc) Retains ordering in considering samples Variant of Rank-Join Hash Rank Join (HRJN) Block Ripple Join
Hash Rank Join (HRJN) Operator Built on idea of hash ripple join Inputs are stored in two hash tables Maintains highest (first) and lowest (last selected) objects from each relation Results are added to a priority queue Advantages: Smaller space requirement Can be pipelined
HRJN* • Score-Guided Strategy Consider L = 100, 50, 25, 10…. For 3 tuples from each input, T = max(108, 35) = 108 R= 10, 9, 8, 5. . T 1= 108 , T 2 =35 For 4 tuples from R and 2 tuples from L, T = max(105, 60) = 105 T 1= 105 , T 2 =60 • If T 1 > T 2, more inputs should be taken from R and reduce T 1. • Therefore value of Threshold T will reduce => faster reporting of ranked join results.
Performance
Performance
Conclusion Integrates well with query plans and supports top – k queries Produces results as fast as possible Minimizes space requirements Eliminates needs for sorting after join
References “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004) Jing Chen : CSE 6392 Spring 2005, CSE-UT Arlington http: //ranger. uta. edu/~gdas/Courses/ Spring 2005/DBIR/slides/topk_join. ppt Zubin Joseph : CSE 6392 Spring 2006, CSE-UT Arlington http: //crystal. uta. edu/~gdas/Courses/Spring 2006/DBExplo ration/Zubin_Supporting_Top_k_join_Queries. ppt
TIME TO ASK QUESTIONS
e75ee557e3a747d6d8283dc666aa0afb.ppt