  • Number of slides: 45

On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications
陳良弼 Arbee L. P. Chen
National Chengchi University
9/21/2012 at NCHU

IEEE International Conference on Data Engineering (ICDE)
• A premier international conference on databases
• Inaugural conference held in Los Angeles in 1984
• Held in Taiwan in 1995

ICDE 2012 Research Papers Distribution
• System Aspects
  – Privacy and Security 8%
  – Storage Management and Performance 7%
  – Entity resolution/Versioning 7%
  – Query Processing 31%
    • Top-k query 9%
    • Distributed/parallel/map-reduce 8%
    • Location-aware 5%
    • Execution Plan 5%
    • Graph indexing 4%
• Text/Web/Keyword Search 19%
• Stream/Trajectory/Sequence/Spatio-Temporal 10%
• Social Media 7%
• Uncertain Database 6%
• Data Mining 5%

Efficient Dual-Resolution Layer Indexing for Top-k Queries, ICDE 2012

Hotel  Price  Distance to the airport  Service
H1     0.55   0.4                      0.5
H2     0.45   0.6                      0.4

[Figure: hotels H1 to H9 arranged in layers.]

Hotels as (price, distance to the airport) points, each with its score:
– H7: (0.6, 0.2), score 0.4
– H6: (0.55, 0.3), score 0.425
– H1: (0.55, 0.4), score 0.475
– H2: (0.45, 0.6), score 0.525
– H8: (0.7, 0.4), score 0.55
– H4: (0.3, 0.6), score 0.45
– H9: (0.5, 0.5), score 0.5
– H5: (0.2, 0.7), score 0.45
– H3: (0.3, 0.7), score 0.5

Selected hotels, as (price, distance to the airport) points with scores:
– H7: (0.6, 0.2), score 0.4
– H6: (0.55, 0.3), score 0.425
– H1: (0.55, 0.4), score 0.475
– H4: (0.3, 0.6), score 0.45
– H5: (0.2, 0.7), score 0.45

Answering Why-not Questions on Top-k Queries, ICDE 2012
• Top-k query: Top-2 with weights (0.4, 0.5, 0.1) over (Cleanliness, Deliciousness, Parking spaces)
  – p1 (95, 80, 40): 82
  – p2 (70, 20, 30): 41
  – p3 (50, 90, 60): 71
  – p4 (75, 70, 50): 70
  – p5 (85, 60): 69
  – p6 (58, 20, 30): 36.2

• Why-not question
  – Why is p5 not in my top-2 query list?
  – Does p5 not exist?
  – Should I revise my query to look for the top-5 hotels?
  – Should I change my weights?
• Revised query: Top-2 with weights (0.5, 0.4, 0.1)
  – p1 (95, 80, 40): 82 becomes 83.5
  – p2 (70, 20, 30): 41 becomes 46
  – p3 (50, 90, 60): 71 becomes 67
  – p4 (75, 70, 50): 70 becomes 70.5
  – p5 (85, 60): 69 becomes 71.7
  – p6 (58, 20, 30): 36.2 becomes 40
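The effect of the weight change in the why-not example can be checked with a small sketch. It assumes a plain weighted-sum scoring function, which reproduces the printed scores. The middle attribute of p5 is not legible in the slide text; the value 58 used here is an assumption, inferred because it is the only value that yields both printed scores (69 and 71.7).

```python
# Points as (cleanliness, deliciousness, parking spaces), taken from the slide.
# ASSUMPTION: p5's middle value (58) is inferred from its two printed scores.
POINTS = {
    "p1": (95, 80, 40),
    "p2": (70, 20, 30),
    "p3": (50, 90, 60),
    "p4": (75, 70, 50),
    "p5": (85, 58, 60),
    "p6": (58, 20, 30),
}


def score(point, weights):
    """Weighted sum of the attribute values."""
    return sum(w * v for w, v in zip(weights, point))


def top_k(points, weights, k):
    """Names of the k highest-scoring points, best first."""
    ranked = sorted(points, key=lambda name: score(points[name], weights), reverse=True)
    return ranked[:k]


original = top_k(POINTS, (0.4, 0.5, 0.1), 2)  # p5 is missing from this result
revised = top_k(POINTS, (0.5, 0.4, 0.1), 2)   # p5 enters after the weight change
print(original, revised)
```

Shifting weight from deliciousness to cleanliness is enough to bring p5 into the result without changing k.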

The Min-dist Location Selection Query, ICDE 2012
[Figure: clients c1 to c8, existing facilities f1 and f2, and candidate locations p1 and p2, with each client's nearest facility distance marked. Goal: minimize the nearest facility distance.]

[Figure: nearest facility distances when candidate location p1 is chosen.]

[Figure: nearest facility distances when candidate location p2 is chosen.]
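A brute-force sketch of the location selection idea follows. It assumes the objective is the average distance from each client to its nearest facility once the candidate is added. All coordinates and the function names are hypothetical, chosen only for illustration; the ICDE 2012 paper's own algorithm is not reproduced here.

```python
from math import dist


def avg_nearest_distance(clients, facilities):
    """Average distance from each client to its nearest facility."""
    return sum(min(dist(c, f) for f in facilities) for c in clients) / len(clients)


def best_location(clients, facilities, candidates):
    """Candidate whose addition gives the smallest average nearest-facility distance."""
    return min(
        candidates,
        key=lambda name: avg_nearest_distance(clients, facilities + [candidates[name]]),
    )


# Hypothetical data: one existing facility and two candidate locations.
clients = [(0, 0), (1, 0), (5, 5), (6, 5), (5, 6), (9, 0), (10, 1)]
facilities = [(0.5, 0.0)]
candidates = {"p1": (5.5, 5.0), "p2": (9.5, 0.5)}

print(best_location(clients, facilities, candidates))
```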

Introduction
• kNN (k-Nearest Neighbors) Queries
  – Assume k = 3
  – kNN(q) = {a, b, c}
[Figure: query point q with its three nearest neighbors a, b, and c.]

Introduction
• RkNN (Reverse k-Nearest Neighbors) Queries
  – Assume k = 3
  – RkNN(q) = {a, …}
[Figure: query point q with data points a and d.]

Introduction
• BRkNN (Bi-chromatic Reverse k-Nearest Neighbors) Queries
  – Two types of data
  – Assume k = 3
  – BRkNN(q) = {a, …}
[Figure: query point q with data points a and d.]
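The three query types introduced above can be sketched by brute force. This is a minimal illustration assuming Euclidean distance and points stored as coordinate tuples; the function names and sample coordinates are hypothetical, and no index structure is used.

```python
from math import dist


def knn(q, points, k):
    """The k points nearest to q (q itself is excluded if present)."""
    others = [p for p in points if p != q]
    return sorted(others, key=lambda p: dist(p, q))[:k]


def rknn(q, points, k):
    """Monochromatic reverse kNN: points that have q among their own k nearest neighbors."""
    pool = points if q in points else points + [q]
    return [p for p in pool if p != q and q in knn(p, pool, k)]


def brknn(q, goals, conditions, k):
    """Bichromatic reverse kNN: condition points whose k nearest goal points include q."""
    return [c for c in conditions if q in knn(c, goals, k)]


# Hypothetical data on a line, for easy checking by hand.
goals = [(0, 0), (4, 0), (10, 0)]
conditions = [(1, 0), (3, 0), (6, 0), (9, 0)]

print(knn((0, 0), goals, 2))
print(rknn((0, 0), [(0, 0), (1, 0), (5, 0)], 1))
print(brknn((4, 0), goals, conditions, 1))
```

In the bichromatic case the query point is a goal point and the answer consists of condition points, which is why the size of the answer can serve as a measure of a goal point's influence.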

Application I
[Figure: shops and customers on a map.]
Which location is the best?

Top-n Reverse kNN Queries
• Given two types of data, G (goal) and C (condition)
• Retrieve the n data points from G that have the largest BRkNN values
• Example: n = 2, k = 2
  – BR2NN value of g1 = 4
  – BR2NN value of g2 = 9
  – BR2NN value of g3 = 5
  – Top-2 = {g2, g3}
[Figure: goal points g1, g2, g3 among the condition points.]
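A brute-force sketch of the top-n query: every condition point votes for its k nearest goal points, and the n goal points with the most votes are returned. The coordinates are hypothetical (the slide gives only the resulting counts), and the function name is an assumption.

```python
from math import dist


def top_n_brknn(goals, conditions, k, n):
    """Return the n (goal, BRkNN value) pairs with the largest values."""
    counts = {name: 0 for name in goals}
    for c in conditions:
        # The k goal points nearest to this condition point each gain one vote.
        nearest = sorted(goals, key=lambda name: dist(goals[name], c))[:k]
        for name in nearest:
            counts[name] += 1
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical coordinates.
GOALS = {"g1": (0, 0), "g2": (5, 0), "g3": (10, 0)}
CONDITIONS = [(1, 0), (4, 0), (6, 0), (5.5, 1), (9, 0)]

print(top_n_brknn(GOALS, CONDITIONS, k=2, n=2))
```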

Voronoi Diagram of G
[Figure legend: goal point (VD-node); condition point.]

A Filter-Refinement Framework for Solving BRkNN Queries
• Assume k = 2
• Lower-bound region of VDi: layer 0
• Upper-bound region of VDi: layer 0 through layer (k-1)
[Figure: VDi at layer 0, surrounded by its layer 1 neighbors.]

Filter Phase
• Assume k = 2
• Construct bisectors layer by layer to reduce the region
[Figure: the region around VDi.]

Refinement Phase
• Assume k = 2
• For a data point p, check the VDs at layer 1 through layer 2 to determine whether VDi is one of the 2 nearest neighbors of p
[Figure: data point p and VDi.]

Refinement Phase
• Assume k = 2, dist(p, VDi) = 0.9
• Neighbors of VDi, sorted by distance from VDi: (VD13, 1.2), (VD26, 1.4), (VD27, 1.7), (VD3, 1.7), (VD4, 1.8), (VD30, 2.1), (VD5, 2.5), …, (VD7, 4.8)
• dist(VDi, VD30) = 2.1 and dist(p, VDi) = 0.9, so dist(p, VD30) > 1.2

Refinement Phase
• Pruning rule: if dist(VDi, VDj) > 2 dist(VDi, p), then VDj is farther from p than VDi is, so VDj need not be checked
• Here 2 dist(VDi, p) = 1.8, so VD30 and every later entry in the sorted list can be skipped
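The pruning rule follows from the triangle inequality: dist(p, VDj) ≥ dist(VDi, VDj) − dist(VDi, p), so dist(VDi, VDj) > 2 dist(VDi, p) implies dist(p, VDj) > dist(VDi, p). A minimal sketch of the resulting early-termination scan, using the distances printed on the slide (entries hidden behind the "…" are simply left out) and a hypothetical function name:

```python
def candidates_to_check(neighbor_list, dist_p_vdi):
    """Neighbors of VDi that could still be closer to p than VDi is.

    neighbor_list holds (name, dist(VDi, VDj)) pairs sorted by ascending distance.
    """
    to_check = []
    for name, d in neighbor_list:
        if d > 2 * dist_p_vdi:
            # This VDj and all later ones are farther from p than VDi is.
            break
        to_check.append(name)
    return to_check


# Distances from the slide; the elided entries are omitted.
NEIGHBORS = [
    ("VD13", 1.2), ("VD26", 1.4), ("VD27", 1.7), ("VD3", 1.7),
    ("VD4", 1.8), ("VD30", 2.1), ("VD5", 2.5), ("VD7", 4.8),
]

print(candidates_to_check(NEIGHBORS, 0.9))
```

VD4 is kept because its distance equals the threshold 1.8 rather than exceeding it.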

Application II
Maximum Coverage BRkNN Queries
• Retrieve 2 points from dataset G
• Assume k = 2

[Figure: a goal point with BRkNN value = 9.]

[Figure: a goal point with BRkNN value = 8.]

[Figure: a selection of two goal points covering a total of 12 condition points.]

[Figure: a selection of two goal points covering a total of 14 condition points.]

Maximum Coverage BRkNN Queries
• Given:
  – A set of goal points (G)
  – A set of condition points (C)
  – k: the k value of BRkNN
• Goal:
  – Find n points from G, g1, g2, …, gn, which maximize $\left|\bigcup_{i=1}^{n} \mathrm{BRkNN}(g_i, G, C)\right|$
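The difference between ranking goal points by individual BRkNN value and maximizing the size of the union can be seen in a small exhaustive sketch. The sets below are hypothetical; they are sized so that the numbers match the figures (values 9 and 8, totals 12 and 14). Exhaustive search is used only because the example is tiny.

```python
from itertools import combinations


def coverage(sets, chosen):
    """Number of distinct condition points covered by the chosen goal points."""
    return len(set().union(*(sets[g] for g in chosen)))


def top_n_by_value(sets, n):
    """The n goal points with the largest individual BRkNN sets."""
    return sorted(sets, key=lambda g: len(sets[g]), reverse=True)[:n]


def max_coverage(sets, n):
    """The n goal points whose BRkNN sets have the largest union (exhaustive search)."""
    return max(combinations(sets, n), key=lambda chosen: coverage(sets, chosen))


# Hypothetical BRkNN sets; condition points are numbered 1 to 17.
BRKNN = {
    "g1": set(range(1, 10)),   # 9 points
    "g2": set(range(5, 13)),   # 8 points, 5 of them shared with g1
    "g3": set(range(13, 18)),  # 5 points, disjoint from g1
}

by_value = top_n_by_value(BRKNN, 2)
best = max_coverage(BRKNN, 2)
print(by_value, coverage(BRKNN, by_value))
print(best, coverage(BRKNN, best))
```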

Application III
• Find the n Most Favorite Products Based on Reverse Top-k Queries

Airlines
Airline  Fare  Food
a1       0.8   0.2
a2       0.6   0.4
a3       0.4   1
a4       0.4   0.8
a5       0.4   0.6

Hotels
Hotel  Location  Comfort  Cleanness
h1     0.4       0.6      0.4
h2     0.4       0.6      0.6
h3     0.4       0.8      0.2
h4     0.6       0.6      0.2
h5     0.6       0.8      0.4
h6     1         0.2      0.6

All candidate packages
Package   Fare  Food  Location  Comfort  Cleanness
(a1, h1)  0.8   0.2   0.4       0.6      0.4
(a1, h2)  0.8   0.2   0.4       0.6      0.6
(a1, h3)  0.8   0.2   0.4       0.8      0.2
…
(a5, h5)  0.4   0.6   0.6       0.8      0.4
(a5, h6)  0.4   0.6   1         0.2      0.6

Which are the most favorite packages?

Top-k Queries (Customer's View)
Each customer scores a package by the weighted sum of its attribute values:
– c1, (a1, h1): 0.8×0 + 0.2×0.2 + 0.4×0.5 + 0.6×0.1 + 0.4×0.2 = 0.38
– c1, (a1, h2): 0.8×0 + 0.2×0.2 + 0.4×0.5 + 0.6×0.1 + 0.6×0.2 = 0.42
– …
– c2, (a1, h1): 0.8×0.1 + 0.2×0.3 + 0.4×0.1 + 0.6×0.3 + 0.4×0.2 = 0.44
– c2, (a1, h2): 0.8×0.1 + 0.2×0.3 + 0.4×0.1 + 0.6×0.3 + 0.6×0.2 = 0.48
– …

Customer preferences (weights on Fare, Food, Location, Comfort, Cleanness) and top-2 favorites:
– c1: (0, 0.2, 0.5, 0.1, 0.2); top-2 favorites {(a3, h6), (a5, h6)}
– c2: (0.1, 0.3, 0.1, 0.3, 0.2); top-2 favorites {(a3, h2), (a3, h5)}
– c3: 0, 0.1, 0.3 (remaining weights not recoverable); top-2 favorites {(a1, h2), (a1, h5)}
– c4: (0.3, 0.1, 0.2, 0.3, 0.1); top-2 favorites {(a1, h5), (a2, h5), (a3, h5)}
– c5: (0, 0.1, 0.3, 0, 0.6); top-2 favorites {(a3, h6), (a4, h6)}
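The customer's-view computation can be sketched as a weighted sum over all 30 packages. Values and weights are stored in tenths, so scores are exact integers in hundredths and ties (such as the three-way result for c4) are detected reliably; this representation is a choice made for the sketch, not something stated in the slides. Two further assumptions: h4's comfort value is reconstructed from the garbled table, and c3 is omitted because its weight row is incomplete.

```python
from itertools import product

# Attribute values in tenths: airlines are (fare, food),
# hotels are (location, comfort, cleanness).
AIRLINES = {"a1": (8, 2), "a2": (6, 4), "a3": (4, 10), "a4": (4, 8), "a5": (4, 6)}
HOTELS = {
    "h1": (4, 6, 4), "h2": (4, 6, 6), "h3": (4, 8, 2),
    "h4": (6, 6, 2),  # ASSUMPTION: comfort value reconstructed, not legible in the source
    "h5": (6, 8, 4), "h6": (10, 2, 6),
}
# Customer weights in tenths (fare, food, location, comfort, cleanness).
CUSTOMERS = {
    "c1": (0, 2, 5, 1, 2),
    "c2": (1, 3, 1, 3, 2),
    "c4": (3, 1, 2, 3, 1),
    "c5": (0, 1, 3, 0, 6),
}

# Every airline/hotel combination is a candidate package.
PACKAGES = {(a, h): AIRLINES[a] + HOTELS[h] for a, h in product(AIRLINES, HOTELS)}


def score(package, weights):
    """Weighted sum in hundredths, e.g. 38 stands for 0.38."""
    return sum(w * v for w, v in zip(weights, PACKAGES[package]))


def top_k(weights, k):
    """Packages scoring at least as high as the k-th best one (ties included)."""
    ranked = sorted(PACKAGES, key=lambda p: score(p, weights), reverse=True)
    cutoff = score(ranked[k - 1], weights)
    return {p for p in PACKAGES if score(p, weights) >= cutoff}


print(score(("a1", "h1"), CUSTOMERS["c1"]), score(("a1", "h2"), CUSTOMERS["c2"]))
print(top_k(CUSTOMERS["c4"], 2))
```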

Reverse Top-k Queries (Travel Agency's View)
• Retrieve the customers whose top-2 favorites contain (a1, h2): {c3}
• The number of customers in the reverse top-k result for a product is a good estimate of how favored the product is in the market

Top-2 favorites per customer:
– c1: {(a3, h6), (a5, h6)}
– c2: {(a3, h2), (a3, h5)}
– c3: {(a1, h2), (a1, h5)}
– c4: {(a1, h5), (a2, h5), (a3, h5)}
– c5: {(a3, h6), (a4, h6)}
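Given each customer's top-2 list, the reverse top-k result for every package is obtained by inverting the mapping. A minimal sketch using the top-2 favorites exactly as printed in the customer table (the function name is hypothetical):

```python
# Top-2 favorites per customer, as printed in the customer preference table.
TOP2 = {
    "c1": [("a3", "h6"), ("a5", "h6")],
    "c2": [("a3", "h2"), ("a3", "h5")],
    "c3": [("a1", "h2"), ("a1", "h5")],
    "c4": [("a1", "h5"), ("a2", "h5"), ("a3", "h5")],
    "c5": [("a3", "h6"), ("a4", "h6")],
}


def reverse_top_k(top_lists):
    """Map each package to the set of customers whose top-k list contains it."""
    result = {}
    for customer, favorites in top_lists.items():
        for package in favorites:
            result.setdefault(package, set()).add(customer)
    return result


rtop = reverse_top_k(TOP2)
print(rtop[("a1", "h2")])
# Packages ordered by how many customers favor them.
print(sorted(rtop, key=lambda p: len(rtop[p]), reverse=True)[:3])
```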

• k (number of packages considered by customers) = 2
• n (number of packages to be offered by the travel agency) = 2

Reverse top-2 results:
– (a1, h2): {c3}
– (a1, h5): {c3, c4}
– (a2, h5): {c4}
– (a3, h2): {c2}
– (a3, h5): {c2, c4}
– (a3, h6): {c1, c5}
– (a4, h6): {c5}
– (a5, h6): {c1}

Problem Definition of n-k MFP
• Given a set of component tables T1, T2, …, and Tx, which form a set of candidate products P, a set of customers C with different preferences on the products, and two positive integers k and n
• RTOPk(cp, P, C): the set of customers whose top-k favorites contain the candidate product cp
• Retrieve the minimum subset P' of P such that |P'| ≤ n and $\left|\bigcup_{cp \in P'} \mathrm{RTOP}_k(cp, P, C)\right|$ is maximized
• Maximum coverage problem: NP-hard
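Because the problem is an instance of maximum coverage, a natural heuristic is the standard greedy rule: repeatedly pick the product that covers the most customers not yet covered. The sketch below applies it to the reverse top-2 sets of the travel example. It illustrates the greedy rule only and is not claimed to be the exact procedure of the talk; the function name is hypothetical.

```python
# Reverse top-2 sets of the travel example (package -> customers).
RTOP = {
    ("a1", "h2"): {"c3"},
    ("a1", "h5"): {"c3", "c4"},
    ("a2", "h5"): {"c4"},
    ("a3", "h2"): {"c2"},
    ("a3", "h5"): {"c2", "c4"},
    ("a3", "h6"): {"c1", "c5"},
    ("a4", "h6"): {"c5"},
    ("a5", "h6"): {"c1"},
}


def greedy_mfp(rtop, n):
    """Pick up to n products, each time the one adding the most uncovered customers."""
    chosen, covered = [], set()
    for _ in range(n):
        best = max(rtop, key=lambda cp: len(rtop[cp] - covered))
        if not rtop[best] - covered:
            break  # no remaining product adds a new customer
        chosen.append(best)
        covered |= rtop[best]
    return chosen, covered


chosen, covered = greedy_mfp(RTOP, 2)
print(chosen, covered)
```

Ties are broken by table order here, which is why (a1, h5) is picked before the equally sized (a3, h5) and (a3, h6).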

Skyline
• An object p is said to dominate another object q if and only if p is larger than or equal to q on all dimensions and p is larger than q on at least one dimension
• Given a set of multi-dimensional objects, the skyline consists of the objects which are not dominated by any other object
[Figure: objects plotted on axes A1 and A2.]
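The dominance test and the skyline definition translate directly into code. A minimal quadratic-time sketch, applied to the Airlines table of the travel example (function names are hypothetical):

```python
def dominates(p, q):
    """True if p >= q on every dimension and p > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(objects):
    """Names of the objects that no other object dominates."""
    return {
        name
        for name, p in objects.items()
        if not any(dominates(q, p) for other, q in objects.items() if other != name)
    }


# Airlines as (fare, food) from the travel example.
AIRLINES = {"a1": (0.8, 0.2), "a2": (0.6, 0.4), "a3": (0.4, 1), "a4": (0.4, 0.8), "a5": (0.4, 0.6)}

print(skyline(AIRLINES))
```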

Property 1
• Only the component tuples dominated by at most (k-1) other tuples in the same component table can be part of a top-k product for a customer c

Airlines
Airline  Fare  Food
…
a3       0.4   1
a4       0.4   0.8
a5       0.4   0.6

Hotels
Hotel  Location  Comfort  Cleanness
h1     0.4       0.6      0.4
…

Package   Fare  Food  Location  Comfort  Cleanness
(a3, h1)  0.4   1     0.4       0.6      0.4
(a4, h1)  0.4   0.8   0.4       0.6      0.4
(a5, h1)  0.4   0.6   0.4       0.6      0.4

Reduce component tables
The number in parentheses after each tuple is the number of other tuples in the same table that dominate it.

Airlines
Airline  Fare  Food
a1(0)    0.8   0.2
a2(0)    0.6   0.4
a3(0)    0.4   1
a4(1)    0.4   0.8
a5(2)    0.4   0.6

Hotels
Hotel  Location  Comfort  Cleanness
h1(2)  0.4       0.6      0.4
h2(0)  0.4       0.6      0.6
h3(1)  0.4       0.8      0.2
h4(1)  0.6       0.6      0.2
h5(0)  0.6       0.8      0.4
h6(0)  1         0.2      0.6

With k = 2, the tuples dominated by 2 or more others (a5 and h1) are removed.
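The reduction step can be sketched by counting, for each tuple, how many other tuples in the same table dominate it, and keeping only those with a count of at most k-1 (Property 1). One assumption is flagged in the code: h4's comfort value is reconstructed from the garbled slide text. The function names are hypothetical.

```python
def dominates(p, q):
    """True if p >= q on every dimension and p > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def dominance_counts(table):
    """For each tuple, the number of other tuples in the table that dominate it."""
    return {
        name: sum(dominates(q, p) for other, q in table.items() if other != name)
        for name, p in table.items()
    }


def reduce_table(table, k):
    """Keep only tuples dominated by at most k-1 others (Property 1)."""
    counts = dominance_counts(table)
    return {name: row for name, row in table.items() if counts[name] <= k - 1}


AIRLINES = {"a1": (0.8, 0.2), "a2": (0.6, 0.4), "a3": (0.4, 1), "a4": (0.4, 0.8), "a5": (0.4, 0.6)}
HOTELS = {
    "h1": (0.4, 0.6, 0.4), "h2": (0.4, 0.6, 0.6), "h3": (0.4, 0.8, 0.2),
    "h4": (0.6, 0.6, 0.2),  # ASSUMPTION: comfort value reconstructed from the slide
    "h5": (0.6, 0.8, 0.4), "h6": (1, 0.2, 0.6),
}

print(dominance_counts(AIRLINES))
print(sorted(reduce_table(HOTELS, k=2)))
```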

Property 2
• For any two candidate products cp1 and cp2 in P, if cp1 dominates cp2, then RTOPk(cp2, P, C) ⊆ RTOPk(cp1, P, C)
• For any candidate product cp in P, if cp ∉ Skyline(P), then cp ∉ n-k MFP
• The candidate products in the n-k MFP must be in Skyline(P)
[Figure: candidate products plotted on axes A1 and A2.]

Property 2 (cont.)
• Consider the set of candidate products generated from Skyline(T1), Skyline(T2), …, and Skyline(Tx)
• A candidate product cp ∈ Skyline(P) if and only if cp belongs to this set [VLDB'09]
• Only the skyline tuples of each component table can be part of a candidate product in the n-k MFP

Airlines
Airline  Fare  Food
a1(0)    0.8   0.2
a2(0)    0.6   0.4
a3(0)    0.4   1
a4(1)    0.4   0.8

Hotels
Hotel  Location  Comfort  Cleanness
h2(0)  0.4       0.6      0.6
h3(1)  0.4       0.8      0.2
h4(1)  0.6       0.6      0.2
h5(0)  0.6       0.8      0.4
h6(0)  1         0.2      0.6

Property 3
• Only the customers in RTOPk(cp, Skyline(P), C) can possibly be members of RTOPk(cp, P, C)
• RTOPk(cp, Skyline(P), C) is an upper bound of RTOPk(cp, P, C)

The upper bounds of the remaining candidate packages
Package   Upper bound
(a1, h2)  {c3}
(a1, h5)  {c3, c4}
(a1, h6)  {}
(a2, h2)  {}
(a2, h5)  {c4}
(a2, h6)  {c1, c5}
(a3, h2)  {c2}
(a3, h5)  {c2, c4}
(a3, h6)  {c1, c5}
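Property 3 can be checked on the travel example by running the same reverse top-2 computation twice, once over Skyline(P) and once over all of P, and confirming that the first result contains the second. The sketch rests on several assumptions: values are stored in tenths for exact tie handling, h4's comfort value is reconstructed, and c3 is left out because its weights are not recoverable from the slide text, so its entries are missing from the computed bounds.

```python
from itertools import product

# Values in tenths: airlines are (fare, food), hotels are (location, comfort, cleanness).
AIRLINES = {"a1": (8, 2), "a2": (6, 4), "a3": (4, 10), "a4": (4, 8), "a5": (4, 6)}
HOTELS = {
    "h1": (4, 6, 4), "h2": (4, 6, 6), "h3": (4, 8, 2),
    "h4": (6, 6, 2),  # ASSUMPTION: comfort value reconstructed from the slide
    "h5": (6, 8, 4), "h6": (10, 2, 6),
}
# Weights in tenths; c3 is omitted (incomplete in the source).
CUSTOMERS = {"c1": (0, 2, 5, 1, 2), "c2": (1, 3, 1, 3, 2),
             "c4": (3, 1, 2, 3, 1), "c5": (0, 1, 3, 0, 6)}


def packages(airlines, hotels):
    """Join the named airline and hotel tuples into candidate packages."""
    return {(a, h): AIRLINES[a] + HOTELS[h] for a, h in product(airlines, hotels)}


def rtop(candidates, k):
    """Reverse top-k sets of every candidate, ties at the k-th score included."""
    result = {cp: set() for cp in candidates}
    for customer, weights in CUSTOMERS.items():
        scores = {cp: sum(w * v for w, v in zip(weights, attrs))
                  for cp, attrs in candidates.items()}
        cutoff = sorted(scores.values(), reverse=True)[k - 1]
        for cp, s in scores.items():
            if s >= cutoff:
                result[cp].add(customer)
    return result


P = packages(AIRLINES, HOTELS)
SKYLINE_P = packages(["a1", "a2", "a3"], ["h2", "h5", "h6"])  # skyline tuples of each table

upper = rtop(SKYLINE_P, 2)  # upper bounds
exact = rtop(P, 2)          # true reverse top-2 sets
print(upper[("a2", "h6")], exact[("a2", "h6")])
```

For (a2, h6) the bound {c1, c5} is not tight: over the full product set both customers prefer (a4, h6), which is why a refinement step is still needed.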

Refinement
Package   Upper bound
(a1, h2)  {c3}
(a1, h5)  {c3, c4}
(a2, h5)  {c4}
(a2, h6)  {c1, c5}
(a3, h2)  {c2}
(a3, h5)  {c2, c4}
(a3, h6)  {c1, c5}
• The top-2 favorites of c3: {(a1, h5), (a1, h2)}
• The top-2 favorites of c4: {(a1, h5), (a2, h5), (a3, h5)}
• P': {(a1, h5)}

Refinement
Package   Upper bound
(a2, h6)  {c1, c5}
(a3, h2)  {c2}
(a3, h5)  {c2}
(a3, h6)  {c1, c5}
• The top-2 favorites of c1: {(a3, h6), (a4, h6)}
• The top-2 favorites of c5: {(a3, h6), (a4, h6)}
• P': {(a1, h5), (a3, h6)}

Application IV
• Find Most Favorite Products by Top-k Reverse Skyline Queries
[Figure: user preferences u1 and u2 and products plotted by Year and Mileage; k = 1.]

Thank you for your attention!