
19611584acca459b00e4bc89654bb89b.ppt
- Number of slides: 26
Distributed Query Processing Using Different Semijoin Operations. Presented by: Jamal Uddin Ahamed. Friday, March 12, 2004
Presentation Outline:
1. Overview
2. Semijoin operation
3. Different semijoin operations
   a. 2-way semijoin
   b. Hash semijoin
   c. Domain-specific semijoin
   d. Composite semijoin
4. References
5. Questions and answers
1.1 What is a distributed database system? A distributed database system is characterized by the distribution of its system components: hardware, control, and data. For this research, a distributed system is a collection of independent computers interconnected via point-to-point communication lines.
1.2 Node characteristics: Each computer, known as a node in the network, has processing capability and data storage capability, and can operate autonomously in the system. Each node contains a version of a distributed DBMS.
1.3 What is distributed query processing? The retrieval of data from different sites in a network is known as distributed query processing.
1.4 Phases of distributed query processing with a semijoin operator.
1. Initial local processing: selections and projections are processed at each site.
2. Semijoin processing: a semijoin program is derived from the remaining join operations and executed to reduce the size of the relations in a cost-effective way.
3. Final processing: all relations involved are transmitted to the final site, and all joins are performed there.
2.1 Semijoin: A semijoin from Ri to Rj on attribute A is denoted Rj ⋉ Ri. It is used to reduce the data transmission cost.
Computing steps:
1) Project Ri on attribute A to obtain Ri[A], and ship this projection (the semijoin projection) from the site of Ri to the site of Rj.
2) Reduce Rj to Rj' by eliminating the tuples whose attribute A matches no value in Ri[A].
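
The two computing steps above can be sketched in a few lines of Python. This is only a minimal illustration: relations are modeled as lists of dictionaries, the function name `semijoin` is hypothetical, and the sample rows are illustrative values, not data from any real system.

```python
def semijoin(r_j, r_i, attr):
    """Reduce r_j by r_i on attr (Rj ⋉ Ri). Relations are lists of dicts."""
    # Step 1: project Ri on the join attribute; this set is what gets shipped.
    projection = {row[attr] for row in r_i}
    # Step 2: keep only the tuples of Rj whose attr value appears in Ri[A].
    reduced = [row for row in r_j if row[attr] in projection]
    return projection, reduced


# Illustrative relations (assumed sample values).
r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

projection, r2_reduced = semijoin(r2, r1, "A")
print(projection)   # the semijoin projection R1[A]
print(r2_reduced)   # the reduced relation R2'
```

In a real distributed setting the projection would be sent over the network; here both relations live in one process, so the "shipping" is just a function argument.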
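2.2 Example (semijoin s: R1 -A-> R2):
Site 1 holds R1(A, B); site 2 holds R2(A, C).
R1:  A  B        R2:  A  C
     1  4             3  7
     2  5             4  8
     3  6             5  9
1) Projection at site 1: R1[A] = {1, 2, 3}; Ship(3) to site 2.
2) Reduce at site 2: R2' = {(3, 7)}.
3) Ship(2): R2' is sent to the query site (qs) instead of the whole of R2, which would be Ship(6).
Benefit(s) = 6 - 2 = 4; Cost(s) = 3; cost-effectiveness D(s) = B(s) - C(s) = 1 > 0.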
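
The benefit and cost arithmetic of this example can be reproduced with a short sketch. It assumes that size is counted in attribute values (tuples times attributes), which is consistent with the Ship(3), Ship(6) and Ship(2) figures; the helper names `size` and `cost_effectiveness` are hypothetical.

```python
def size(relation):
    """Size of a relation counted in attribute values (assumed unit)."""
    return sum(len(row) for row in relation)


def cost_effectiveness(r_i, r_j, attr):
    """Return (benefit, cost, D) for the semijoin from r_i to r_j on attr."""
    projection = {row[attr] for row in r_i}
    r_j_reduced = [row for row in r_j if row[attr] in projection]
    benefit = size(r_j) - size(r_j_reduced)   # data no longer shipped
    cost = len(projection)                    # shipping the projection
    return benefit, cost, benefit - cost


r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

benefit, cost, d = cost_effectiveness(r1, r2, "A")
print(benefit, cost, d)
```

A semijoin would be kept in the program only when `d` is positive.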
3.a.1 Definition of the 2-way semijoin. The 2-way semijoin is an extended version of the semijoin. Definition: a 2-way semijoin t of Ri and Rj on attribute A is denoted Ri <-A-> Rj = {Ri -A-> Rj, Rj -A-> Ri}. So t reduces Ri and Rj to Ri' and Rj' respectively.
3.a.2 Properties of the 2-way semijoin.
Computing steps:
1) Send Ri[A] from site i to site j.
2) Reduce Rj to Rj' by eliminating the tuples whose attribute A matches no value in Ri[A]; at the same time, partition Ri[A] into Ri[A]m (the values that match some value of Rj[A]) and Ri[A]nm (= Ri[A] - Ri[A]m).
3) Send the smaller of Ri[A]m and Ri[A]nm back to site i.
4) Reduce Ri to Ri' using Ri[A]m (or Ri[A]nm).
Evaluation:
- Benefit: B(t) = [S(Ri) - S(Ri')] + [S(Rj) - S(Rj')]
- Cost: C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]
- If the benefit exceeds the cost (D(t) = B(t) - C(t) > 0), t is called a cost-effective 2-way semijoin.
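
The four steps and the evaluation formulas above can be sketched as follows. This is a single-process illustration under stated assumptions: relations are lists of dictionaries, size S is counted in attribute values, and the names `two_way_semijoin` and `size` are hypothetical.

```python
def size(relation):
    """Size of a relation counted in attribute values (assumed unit)."""
    return sum(len(row) for row in relation)


def two_way_semijoin(r_i, r_j, attr):
    """Return (Ri', Rj', benefit, cost) for the 2-way semijoin on attr."""
    proj_i = {row[attr] for row in r_i}              # step 1: ship Ri[A] to site j
    proj_j = {row[attr] for row in r_j}
    r_j_reduced = [row for row in r_j if row[attr] in proj_i]   # step 2: reduce Rj
    matched = proj_i & proj_j                        # Ri[A]m
    unmatched = proj_i - matched                     # Ri[A]nm
    # Step 3: send back whichever partition is smaller.
    if len(matched) <= len(unmatched):
        sent_back = matched
        r_i_reduced = [row for row in r_i if row[attr] in matched]
    else:
        sent_back = unmatched
        r_i_reduced = [row for row in r_i if row[attr] not in unmatched]
    # Step 4 is the filtering above; now evaluate B(t) and C(t).
    benefit = (size(r_i) - size(r_i_reduced)) + (size(r_j) - size(r_j_reduced))
    cost = len(proj_i) + len(sent_back)
    return r_i_reduced, r_j_reduced, benefit, cost


r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

r1_reduced, r2_reduced, benefit, cost = two_way_semijoin(r1, r2, "A")
print(r1_reduced, r2_reduced, benefit, cost)
```

Either partition is enough to reduce Ri: the matched values say which tuples to keep, the unmatched values say which tuples to drop, so sending the smaller one never loses information.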
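3.a.3 2-way semijoin example.
Site 1 holds R1(A, B); site 2 holds R2(A, C).
R1:  A  B        R2:  A  C
     1  4             3  7
     2  5             4  8
     3  6             5  9
1) Projection at site 1: R1[A] = {1, 2, 3}; Ship(3) to site 2.
2) Reduce and partition at site 2: R2' = {(3, 7)}, R1[A]m = {3}, R1[A]nm = {1, 2}.
3) Ship(1): R1[A]m = {3} is sent back to site 1.
4) Reduce at site 1: R1' = {(3, 6)}.
Final shipping to the query site (qs): Ship(2) for R1' and Ship(2) for R2'.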
3.a.4 Semijoin vs. 2-way semijoin.
- The 2-way semijoin is an extended version of the semijoin.
- It has more reduction power than the semijoin.
- Its reduction effects propagate further than those of the semijoin.
3.b.1 Hash-semijoin operator. Main idea: use a search filter that represents the semijoin projection as a small bit array. Definition: the hash-semijoin of Ri and Rj is denoted Rj ∝ Ri. It is computed as follows:
- The semijoin projection of Ri is represented as a bit array.
- This bit array is shipped to the site of Rj.
- Finally, the tuples of Rj are screened by the search filter.
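
The three steps above can be sketched as follows. This is a minimal illustration, assuming a single hash function and a fixed-length bit array; the names `build_filter` and `hash_semijoin`, the array length of 9, and the sample rows are all illustrative assumptions. Because different values can hash to the same bit, such a filter may let through some tuples that an exact semijoin would eliminate.

```python
def build_filter(r_i, attr, num_bits, hash_fn):
    """Represent the semijoin projection of r_i on attr as a bit array."""
    bits = [0] * num_bits
    for row in r_i:
        bits[hash_fn(row[attr]) % num_bits] = 1
    return bits


def hash_semijoin(r_j, attr, bits, hash_fn):
    """Screen the tuples of r_j with the search filter."""
    return [row for row in r_j if bits[hash_fn(row[attr]) % len(bits)] == 1]


def identity(x):
    """Assumed hash function H(x) = x, chosen only to keep the example readable."""
    return x


r1 = [{"S#": 1, "Name": "Cindy"}, {"S#": 3, "Name": "Jemal"},
      {"S#": 4, "Name": "Sunny"}, {"S#": 8, "Name": "Maggie"}]
r2 = [{"S#": 2, "Phone": 222}, {"S#": 3, "Phone": 333}, {"S#": 4, "Phone": 444},
      {"S#": 5, "Phone": 555}, {"S#": 6, "Phone": 666}]

bits = build_filter(r1, "S#", 9, identity)       # built at the site of R1
r2_reduced = hash_semijoin(r2, "S#", bits, identity)  # applied at the site of R2
print(bits)
print(r2_reduced)
```

Only the bit array crosses the network, so the transmission cost depends on the array length rather than on the number or width of the projected values.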
3.b.2 Hash-semijoin example.
R1:
S#  Name
1   Cindy
3   Jemal
4   Sunny
8   Maggie
Projection: S#(R1) = {1, 3, 4, 8}.
Search filter: Bij = H(S#(Ri)) with H(x) = x, giving the bit array 1 0 1 1 0 0 0 1 (positions 1 to 8).
Ship(Bij) to the site of R2.
R2:
S#  Phone
2   222
3   333
4   444
5   555
6   666
Reduce: screening R2 with Bij leaves
S#  Phone
3   333
4   444
3.b.3 Semijoin vs. hash semijoin.
Advantages:
- The hash-semijoin is more cost-effective than the semijoin.
- The search filter in the hash-semijoin achieves considerable savings in the cost of a semijoin operation.
Limitations:
- It only works on an execution tree.
- It is tightly tied to the hash functions used.
3.c.1 What is a horizontally partitioned database? A distributed database system is horizontally partitioned (or fragmented) if its relations can be split horizontally into several disjoint sets of tuples, called horizontal fragments.
3.c.2 Horizontally partitioned database system (example).
EMP:
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01
109   jabbar   12
110   sampson  14
141   chang    16
EMP1 (1 ≤ D-no ≤ 10):
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01
EMP2 (11 ≤ D-no ≤ 20):
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16
3.c.3 Horizontally partitioned database system (properties).
- A fragmented relation Ri can be reconstructed by performing a union over all its fragments: Ri = ∪k Rik.
- Join and union commute for fragmented relations: a join between two fragmented relations R1 and R2 is equivalent to a union over the joins between each fragment of R1 and each fragment of R2. Mathematically:
  (∪k R1k) [A=B] (∪m R2m) = ∪k,m (R1k [A=B] R2m)
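
The identity above can be checked on a small case. The sketch below is only an illustration: rows are tuples whose first field is the join attribute, the helper names `join` and `union` are hypothetical, and the fragment contents are made-up sample values.

```python
def union(fragments):
    """Reconstruct a relation as the union of its horizontal fragments."""
    return {row for fragment in fragments for row in fragment}


def join(r, s):
    """Equi-join on the first field of each row; returns pairs of rows."""
    return {(x, y) for x in r for y in s if x[0] == y[0]}


# Two fragmented relations (illustrative values).
r1_fragments = [[(101, "johnson"), (103, "jordan")],
                [(109, "jabbar"), (110, "sampson")]]
r2_fragments = [[(101, 1000), (102, 2000)],
                [(107, 1500), (110, 3000)]]

# Left-hand side: join of the reconstructed relations.
whole = join(union(r1_fragments), union(r2_fragments))

# Right-hand side: union over the joins of every pair of fragments.
piecewise = set()
for r1k in r1_fragments:
    for r2m in r2_fragments:
        piecewise |= join(r1k, r2m)

print(whole == piecewise)
```

The right-hand side shows why every fragment pair matters: a tuple of one fragment may find its join partner in any fragment of the other relation.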
3.c.4 Why can't we use a regular semijoin between two fragments to reduce the size of the fragments? Consider a join Ri [A=B] Rj between two fragmented relations Ri and Rj. We want to reduce the size of Rik, a fragment of Ri, by a semijoin before it is sent to the final processing site. We cannot perform the semijoin of Rik by a single fragment Rjm of Rj without considering the other fragments of Rj, because the join operation dictates that no tuple of a relation can be eliminated before it has been compared with all tuples of the other joining relation that may contribute to the join.
Example:
SAL1 (101 ≤ E-no ≤ 105):
E-no  Sal   D-no
101   1000  12
102   2000  03
105   3000  11
SAL2 (105 < E-no ≤ 110):
E-no  Sal   D-no
107   1000  12
107   2000  03
110   3000  11
EMP1 (1 ≤ D-no ≤ 10):
E-no  E-name   D-no
101   johnson  01
103   jordan   03
135   erving   01
EMP2 (11 ≤ D-no ≤ 20):
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16
Projected D-no values: EMP1[D-no] = {01, 03}; EMP2[D-no] = {12, 14, 16}.
3.c.5 Definition of the domain-specific semijoin. The domain-specific semijoin operation Rik (A=B] Rjm, where A and B are the joining attributes and Rik and Rjm are fragments of the joining relations Ri and Rj respectively, is defined as follows:
  Rik (A=B] Rjm = {r | r ∈ Rik; r.A ∈ Rjm[B] ∪ (Dom[Rj.B] − Dom[Rjm.B])}
Here Rik is the restricted fragment and Rjm is the restricting fragment. We also call Ri the restricted relation and Rj the restricting relation of the domain-specific semijoin.
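
Read operationally, the definition keeps a tuple of Rik if its A value either matches a value in Rjm[B] or lies outside the domain of the restricting fragment, where it might still match another fragment of Rj. The sketch below illustrates this under stated assumptions: domains are modeled as Python sets, the function name `domain_specific_semijoin` is hypothetical, and the fragment contents are illustrative values loosely modeled on the SAL and EMP fragments.

```python
def domain_specific_semijoin(r_ik, a, r_jm, b, dom_rj, dom_rjm):
    """Reduce fragment r_ik by fragment r_jm without losing possible join tuples.

    dom_rj  -- domain of attribute b over the whole relation Rj
    dom_rjm -- domain of attribute b within the fragment Rjm
    """
    matching = {row[b] for row in r_jm}     # Rjm[B]
    outside = dom_rj - dom_rjm              # values only other fragments can hold
    return [row for row in r_ik if row[a] in matching or row[a] in outside]


# Restricted fragment of SAL and restricting fragment EMP2 (illustrative values).
sal2 = [{"E-no": 107, "Sal": 1000, "D-no": 12},
        {"E-no": 108, "Sal": 2000, "D-no": 3},
        {"E-no": 110, "Sal": 3000, "D-no": 11}]
emp2 = [{"E-no": 109, "D-no": 12}, {"E-no": 110, "D-no": 14}, {"E-no": 141, "D-no": 16}]

dom_emp = set(range(1, 21))     # D-no domain of EMP: 1..20
dom_emp2 = set(range(11, 21))   # D-no domain of EMP2: 11..20

sal2_reduced = domain_specific_semijoin(sal2, "D-no", emp2, "D-no", dom_emp, dom_emp2)
print(sal2_reduced)
```

In this run only the tuple with D-no 11 is dropped: 11 falls inside the domain of EMP2 but matches none of its D-no values, so no fragment of EMP can join with it. The tuple with D-no 3 is kept because it may still match a tuple in the other fragment.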
3.d.1 Definition of the composite semijoin. Composite semijoin: a semijoin in which the projection and the transmission involve multiple columns (attributes).
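3.d.2 Example of a composite semijoin. R1 and R2 each have the join attributes A1 and A2 plus a non-join attribute (shown as -).
One relation:
A1  A2  Non-join attr
1   aa  -
1   bb  -
2   cc  -
3   cc  -
The other relation:
A1  A2  Non-join attr
1   cc  -
1   aa  -
2   bb  -
3   bb  -
Result of the composite semijoin on (A1, A2):
A1  A2  Non-join attr
1   aa  -
No false loop!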
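
The effect in this example can be reproduced with a short sketch that compares a composite semijoin on (A1, A2) with two separate single-attribute semijoins. The function name `semijoin_on` is hypothetical, relations are modeled as lists of dictionaries, and the non-join attribute is omitted for brevity.

```python
def semijoin_on(r, s, attrs):
    """Reduce r by s, projecting and shipping the listed attributes together."""
    shipped = {tuple(row[a] for a in attrs) for row in s}
    return [row for row in r if tuple(row[a] for a in attrs) in shipped]


r = [{"A1": 1, "A2": "aa"}, {"A1": 1, "A2": "bb"},
     {"A1": 2, "A2": "cc"}, {"A1": 3, "A2": "cc"}]
s = [{"A1": 1, "A2": "cc"}, {"A1": 1, "A2": "aa"},
     {"A1": 2, "A2": "bb"}, {"A1": 3, "A2": "bb"}]

# Composite semijoin: one projection over both join columns.
composite = semijoin_on(r, s, ("A1", "A2"))

# Two single-attribute semijoins applied one after the other.
separate = semijoin_on(semijoin_on(r, s, ("A1",)), s, ("A2",))

print(composite)
print(len(separate))
```

With these rows, every A1 value and every A2 value occurs in both relations, so the single-attribute semijoins eliminate nothing, while the composite semijoin keeps only the tuple whose (A1, A2) pair occurs in both.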
3.d.3 Semijoin vs. composite semijoin.
- Using composite semijoins in a query processing algorithm is likely to result in a substantial reduction in response time (RT).
- Composite semijoins should not always be used: if one results in a greater RT, ignore it.
- A strategy that considers composite semijoins is at least as good as one without them.
References:
1. Using 2-way semijoin in distributed query processing. By Hyunchul Kang and Nick Roussopoulos.
2. Improving distributed query processing by hash-semijoins. By Judy Tseng and Arbee Chen.
3. Domain-specific semijoin: a new operation for distributed query processing. By Jason Chen and Victor Li.
4. Composite semijoin in distributed query processing. By William Perrizo and Chun Chen.
Comments & Questions? Thank you!