  • Number of slides: 26

Reverse Top-k Queries
Akrivi Vlachou*, Christos Doulkeridis*, Yannis Kotidis#, Kjetil Nørvåg*
*Norwegian University of Science and Technology (NTNU), Trondheim, Norway
#Athens University of Economics and Business (AUEB), Greece

Outline
• Motivation & Preliminaries
• Monochromatic Reverse Top-k Queries
• Bichromatic Reverse Top-k Queries
• Threshold-based Algorithm
• Materialized Views
• Experimental Evaluation
• Conclusions & Future Work

Rank-aware Query Processing
• Huge amount of available data
• Users prefer to retrieve a limited set of k ranked data objects that best match their preferences (top-k queries)

Top-k Query
Given a scoring function f(), retrieve the k objects that best match the user preferences.
• Linear scoring function: $f_w(p) = \sum_{i=1}^{d} w[i] \cdot p[i]$
• Weight w[i]: relative importance of attribute i
Definition TOPk(w): Given a weighting vector w and a positive integer k, find the k data points p with the minimum $f_w(p)$ scores.
• Query line of w at point p: defines the score of p
• Query space of w defined by point p: the number of enclosed points determines the rank of p
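Below is a minimal Python sketch of the linear scoring function, TOPk(w) and the rank of a point. It assumes points and weighting vectors are plain tuples of equal length and that lower scores are better, as in the definition above. The names `score`, `top_k` and `rank` are hypothetical. `top_k` is a brute-force scan, not an indexed top-k algorithm.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Weights = Tuple[float, ...]


def score(w: Weights, p: Point) -> float:
    """Linear scoring function f_w(p) = sum_i w[i] * p[i]; lower scores are better."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(S: Sequence[Point], w: Weights, k: int) -> List[Point]:
    """TOPk(w): the k points of S with the minimum scores (brute force)."""
    return heapq.nsmallest(k, S, key=lambda p: score(w, p))


def rank(S: Sequence[Point], w: Weights, q: Point) -> int:
    """Rank of q under w: 1 + number of points enclosed by q's query space,
    i.e. points of S that score strictly lower than q."""
    fq = score(w, q)
    return 1 + sum(1 for p in S if score(w, p) < fq)
```

Take S = [(3, 1), (1, 3), (2, 2), (4, 4), (5, 1)]. Then `top_k(S, (9, 1), 2)` returns [(1, 3), (2, 2)]. `rank(S, (1, 9), (1, 2))` returns 3, because under w = (1, 9) only (3, 1) and (5, 1) score lower than (1, 2). Only the relative size of the weights affects the ranking, so these example vectors are not normalized.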

Reversing the Top-k Query
• From the perspective of manufacturers:
  ◦ it is important that a product is returned in the highest-ranked positions for as many user preferences as possible
  ◦ estimate the impact of a product compared to their competitors' products
  ◦ advertise a product to potential customers
(Figure: a sales representative asks, "Which customers would be interested?")

Reversing the Top-k Query
• Reverse top-k query: Given a potential product q and a positive integer k, for which weighting vectors w is q in the top-k query result set?
• Two different versions:
  ◦ Monochromatic: no knowledge of user preferences is available
  ◦ Bichromatic: a dataset with user preferences is given

Car Database Example
• A database containing information about different cars
• Different users have different preferences
  ◦ Bob prefers a cheap car and does not care much about its age: the best choice (top-1) for Bob is car p1, with score 2.5
  ◦ Tom prefers a newer car rather than a cheap one: the best choice for Tom and Max is car p2

Car Database Example
Query point q = p2, k = 1:
• Bichromatic reverse top-k: {(0.2, 0.8), (0.5, 0.5)}
  ◦ advertise the product to Tom and Max
• Monochromatic reverse top-k: line segment w[price] ∈ [1/7, 5/6]
  ◦ estimate the impact of p2 as 69%
Query point q = p3, k = 1: empty result set for the bichromatic query
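The 69% figure follows under two assumptions made here. First, weighting vectors are normalized so that w[age] = 1 − w[price], which the single parameter w[price] suggests. Second, the impact is the fraction of the range [0, 1] of w[price] that the segment covers:

```latex
\text{impact}(p_2) = \frac{5}{6} - \frac{1}{7} = \frac{35 - 6}{42} = \frac{29}{42} \approx 0.69
```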

Monochromatic Reverse Top-k Query
mRTOPk(q): Given a point q, a positive number k and a dataset S, the result set of the monochromatic reverse top-k query is the locus of weighting vectors $w_i$ for which there exists p ∈ TOPk($w_i$) such that $f_{w_i}(q) \le f_{w_i}(p)$.
• The solution space W can be split into a finite set of nonadjacent partitions such that query point q has the same rank for all the weighting vectors within a partition.
• For the monochromatic case, we focus on the 2-d space.
(Figure: mRTOP1(q) in the 2-d solution space.)
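The partition property suggests a simple way to compute mRTOPk(q) for d = 2. The Python sketch below makes three assumptions:
- weighting vectors are normalized as w = (λ, 1 − λ) with λ ∈ [0, 1];
- points have integer coordinates, so `Fraction` arithmetic is exact;
- a vector qualifies when fewer than k points of S score strictly lower than q, which matches the definition above when |S| ≥ k.

The rank of q can change only at values of λ where q ties with some point p. Testing one λ inside each gap between such values is therefore enough. This is a brute-force sweep costing O(|S|²), not the convex-hull technique described on the slides, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # assumption: integer coordinates keep Fraction arithmetic exact


def score(lam: Fraction, p: Point) -> Fraction:
    """f_w(p) for the normalized 2-d weighting vector w = (lam, 1 - lam)."""
    return lam * p[0] + (1 - lam) * p[1]


def in_topk(S: Sequence[Point], q: Point, lam: Fraction, k: int) -> bool:
    """q is in the top-k result for lam iff fewer than k points score strictly lower."""
    fq = score(lam, q)
    return sum(1 for p in S if score(lam, p) < fq) < k


def mrtopk_2d(S: Sequence[Point], q: Point, k: int) -> List[Tuple[Fraction, Fraction]]:
    """Closed lam-intervals in [0, 1] for which q is in the top-k result.

    Isolated single values of lam (possible only with degenerate ties) are ignored.
    """
    # Values of lam where q ties with some p; the rank of q is constant between them.
    breaks = {Fraction(0), Fraction(1)}
    for p in S:
        a, b = p[0] - q[0], p[1] - q[1]
        if a != b:  # if a == b, p and q never swap order as lam varies
            lam = Fraction(b, b - a)  # solves lam * a + (1 - lam) * b = 0
            if 0 < lam < 1:
                breaks.add(lam)
    points = sorted(breaks)
    segments: List[Tuple[Fraction, Fraction]] = []
    for lo, hi in zip(points, points[1:]):
        if in_topk(S, q, (lo + hi) / 2, k):  # one probe per gap suffices
            if segments and segments[-1][1] == lo:
                segments[-1] = (segments[-1][0], hi)  # merge adjacent gaps
            else:
                segments.append((lo, hi))
    return segments
```

Take S = [(1, 4), (4, 1)] and q = (2, 2). Then `mrtopk_2d(S, q, 1)` returns the single segment [1/3, 2/3], whose length (1/3) plays the role of the impact estimate. With k = 2 the whole range [0, 1] is returned.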

Geometric Interpretation (d = 2, k = 1)
• If q belongs to the convex hull, then there exists exactly one partition in mRTOP1(q)
  ◦ The weighting vectors that are perpendicular to pq and qr define the line segment
  ◦ For weighting vectors with slopes smaller versus larger than w1, the relative order of p and q is reversed
• Monochromatic reverse top-k, k > 1:
  ◦ The solution space may contain more than one partition
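Assuming normalized weights w = (λ, 1 − λ), the boundary vectors on this slide can be computed directly. A weighting vector perpendicular to pq satisfies w · (p − q) = 0, which is exactly the condition that p and q receive the same score:

```latex
f_w(p) = f_w(q)
\iff \lambda\,(p[1]-q[1]) + (1-\lambda)\,(p[2]-q[2]) = 0
\iff \lambda = \frac{p[2]-q[2]}{(p[2]-q[2]) - (p[1]-q[1])}
```

The formula is defined when p[1] − q[1] ≠ p[2] − q[2]. Following the slide, applying it to the pairs (p, q) and (q, r) gives the two endpoints of the segment mRTOP1(q).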

Bichromatic Reverse Top-k Query
bRTOPk(q): Given a point q, a positive number k and two datasets S and W, where S contains data points and W contains different weighting vectors, a weighting vector $w_i$ belongs to the result set if and only if there exists p ∈ TOPk($w_i$) such that $f_{w_i}(q) \le f_{w_i}(p)$.
• Naïve approach:
  ◦ for each weighting vector, process the top-k query
  ◦ test whether query point q is in the top-k list
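The naïve approach translates almost directly into code. In this sketch (function names hypothetical), a weighting vector is kept when q scores no worse than the k-th point of TOPk(w). That point is the easiest member of TOPk(w) for q to match in the definition above. If S holds fewer than k points, q is assumed to qualify trivially. `heapq.nsmallest` performs a brute-force top-k evaluation, so the cost is one top-k evaluation per weighting vector, |W| in total.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Weights = Tuple[float, ...]


def score(w: Weights, p: Point) -> float:
    """Linear scoring function f_w(p) = sum_i w[i] * p[i]; lower is better."""
    return sum(wi * pi for wi, pi in zip(w, p))


def brtopk_naive(S: Sequence[Point], W: Sequence[Weights], q: Point, k: int) -> List[Weights]:
    """Naive bRTOPk(q): one brute-force top-k evaluation per weighting vector."""
    result: List[Weights] = []
    for w in W:
        top = heapq.nsmallest(k, S, key=lambda p: score(w, p))  # TOPk(w)
        # Compare q with the k-th (worst) point of TOPk(w); assumption: if S has
        # fewer than k points, q qualifies trivially.
        if len(top) < k or score(w, q) <= score(w, top[-1]):
            result.append(w)
    return result
```

For example, take S = [(3, 1), (1, 3), (2, 2), (4, 4), (5, 1)], W = [(1, 9), (1, 8), (5, 5), (9, 1)], q = (1, 2) and k = 2. The result is [(5, 5), (9, 1)].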

Threshold-based Algorithm (RTA)
• Goal: reduce the number of top-k evaluations by discarding weighting vectors
• Threshold-based Algorithm (RTA):
  ◦ sort the weighting vectors based on pairwise similarity (top-k queries defined by similar vectors have similar result sets)
  ◦ evaluate the first top-k query and calculate a threshold
  ◦ for each weighting vector: possibly prune it based on the threshold; refine the threshold
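Below is a minimal Python sketch of the threshold-based idea. It makes assumptions of its own:
- the similarity ordering is a hypothetical greedy nearest-neighbour chain (squared Euclidean distance between weighting vectors);
- top-k queries are brute-force scans;
- the buffer simply holds the last evaluated top-k result.

Pruning is safe because the buffered points are a subset of S. The k-th best buffered score under w is therefore greater than or equal to the true k-th best score. If q scores worse than that threshold, at least k points beat q, and w cannot belong to the result. The ordering affects only how often pruning succeeds, not the correctness of the answer.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Weights = Tuple[float, ...]


def score(w: Weights, p: Point) -> float:
    """Linear scoring function f_w(p); lower is better."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(S: Sequence[Point], w: Weights, k: int) -> List[Point]:
    """Brute-force TOPk(w)."""
    return sorted(S, key=lambda p: score(w, p))[:k]


def order_by_similarity(W: Sequence[Weights]) -> List[Weights]:
    """Hypothetical ordering: greedy nearest-neighbour chain, so that
    consecutive weighting vectors are similar."""
    remaining = list(W)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        nearest = min(remaining, key=lambda v: sum((a - b) ** 2 for a, b in zip(v, last)))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


def rta(S: Sequence[Point], W: Sequence[Weights], q: Point, k: int) -> Tuple[List[Weights], int]:
    """Threshold-based bRTOPk(q) sketch: returns (result, number of top-k evaluations)."""
    result: List[Weights] = []
    buffer: List[Point] = []  # top-k result of the most recently evaluated vector
    evaluations = 0
    for w in order_by_similarity(W):
        if len(buffer) >= k:
            # k-th best buffered score under w; the buffer is a subset of S, so this
            # is an upper bound on the true k-th best score of S under w.
            threshold = sorted(score(w, p) for p in buffer)[k - 1]
            if score(w, q) > threshold:
                continue  # at least k points beat q: discard w without a top-k query
        top = top_k(S, w, k)
        evaluations += 1
        if len(top) < k or score(w, q) <= score(w, top[-1]):
            result.append(w)
        buffer = top  # refine the threshold for the next (similar) vector
    return result, evaluations
```

Consider S = [(3, 1), (1, 3), (2, 2), (4, 4), (5, 1)], W = [(1, 9), (1, 8), (5, 5), (9, 1)], q = (1, 2) and k = 2. The sketch returns the vectors (5, 5) and (9, 1) using 3 top-k evaluations instead of 4. After (1, 9) is evaluated, the buffered threshold for (1, 8) is 13 while q scores 17, so (1, 8) is discarded without a top-k query.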

Example of RTA Algorithm (k = 2)
• Evaluate the top-2 query for w1; Buffer: p1, p2
• Set the threshold for w2 based on the buffered points
  ◦ $f_{w_2}(q) > \text{threshold}$: discard w2
• Refine the threshold for w3
(Figure: data points, query point q, and weighting vectors W = [w1, w2, w3].)

Materialized Views
• The Threshold-based Algorithm (RTA) reduces the number of top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set
  ◦ but it still processes at least as many top-k evaluations as the cardinality of the result set
• Materialized views: find weighting vectors that definitely belong to the result without a top-k evaluation

Materialized Views
• Grid-based space partitioning
  ◦ cell Ci: lower-left corner Ci.L, upper-right corner Ci.U
• For each cell Ci, we store the results of the reverse top-k queries for the corners Ci.L and Ci.U

Materialized Views
• Given a point q enclosed in cell Ci:
  ◦ all weighting vectors in RTOPk(Ci.U) belong to the result set of q
  ◦ only the weighting vectors in RTOPk(Ci.L) − RTOPk(Ci.U) have to be examined
• Materialized views can be generalized for arbitrary k
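The two rules above rely on monotonicity. Assume non-negative weights and that lower scores are better. Then Ci.L ≤ q ≤ Ci.U (coordinate-wise) implies f_w(Ci.L) ≤ f_w(q) ≤ f_w(Ci.U) for every w. So any w that keeps Ci.U in its top-k also keeps q, and any w that excludes Ci.L also excludes q. The Python sketch below materializes such a view for one cell and a fixed k. The uniform grid of a given width, the `CellView` class and the other names are hypothetical. A query point lying on a grid line is assigned to the cell above it.

```python
import heapq
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, ...]
Weights = Tuple[float, ...]


def score(w: Weights, p: Point) -> float:
    """Linear scoring function f_w(p); lower is better (weights assumed non-negative)."""
    return sum(wi * pi for wi, pi in zip(w, p))


def qualifies(S: Sequence[Point], w: Weights, q: Point, k: int) -> bool:
    """True iff w belongs to bRTOPk(q), checked with one brute-force top-k evaluation."""
    top = heapq.nsmallest(k, S, key=lambda p: score(w, p))
    return len(top) < k or score(w, q) <= score(w, top[-1])


def cell_corners(q: Point, width: float) -> Tuple[Point, Point]:
    """Hypothetical uniform grid: corners Ci.L and Ci.U of the cell enclosing q."""
    lower = tuple(math.floor(x / width) * width for x in q)
    upper = tuple(x + width for x in lower)
    return lower, upper


class CellView:
    """Materialized view for one cell: bRTOPk of its two corners, for a fixed k."""

    def __init__(self, S: Sequence[Point], W: Sequence[Weights],
                 lower: Point, upper: Point, k: int):
        self.lower, self.upper, self.k = lower, upper, k
        self.rtop_lower: Set[Weights] = {w for w in W if qualifies(S, w, lower, k)}
        self.rtop_upper: Set[Weights] = {w for w in W if qualifies(S, w, upper, k)}

    def query(self, S: Sequence[Point], q: Point) -> Tuple[Set[Weights], int]:
        """Return (bRTOPk(q), number of top-k evaluations) for q inside the cell."""
        assert all(lo <= x <= up for lo, x, up in zip(self.lower, q, self.upper))
        result = set(self.rtop_upper)  # definitely in the result, no evaluation needed
        evaluations = 0
        for w in self.rtop_lower - self.rtop_upper:  # only these must be examined
            evaluations += 1
            if qualifies(S, w, q, self.k):
                result.add(w)
        return result, evaluations
```

For example, take S = [(3, 1), (1, 3), (2, 2), (4, 4), (5, 1)], W = [(1, 9), (1, 8), (5, 5), (9, 1)], k = 2, and q = (1, 1.5) in the cell with corners (0, 0) and (2, 2). The vectors (5, 5) and (9, 1) come straight from the view of Ci.U. Only (1, 9) and (1, 8) need a top-k evaluation, and (1, 8) passes, giving the result {(1, 8), (5, 5), (9, 1)}.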

Experimental Setup
• Comparison between Naïve and RTA (varying dimensionality, cardinality and data distribution; real data)
• Queries: uniform and k-skyband points
• Metrics:
  ◦ time
  ◦ I/Os
  ◦ number of top-k evaluations

RTA vs. Naïve
• Uniform distribution of S and uniform weights W: |S| = 10K, |W| = 10K, top-k = 10, skyband query points
• RTA outperforms Naïve by 1 to 2 orders of magnitude
• As dimensionality increases, |RTOPk(q)| decreases, leading to fewer top-k evaluations

Scalability of RTA Algorithm
• Various distributions of S (uniform UN, anticorrelated AC, correlated CO) and uniform weights W: |S| = 10K or |W| = 10K, d = 5, top-k = 10, skyband query points
• Naïve requires |W| top-k query evaluations
• |W| = 5K, correlated dataset:
  ◦ RTA needs only 544 out of 5000 top-k evaluations (saves 89.12% of the cost)
  ◦ the average size of the result set is 459

Performance of RTA on Real Data
• NBA consists of 17265 tuples, d = 5 (number of points scored, rebounds, assists, steals and blocks)
• HOUSE consists of 127930 tuples, d = 6 (income spent on gas, electricity, water, heating, insurance and property tax)
• Uniform and clustered weights W (|W| = 10K)
  ◦ clustered weights lead to fewer top-k evaluations

Conclusions and Future Work
• We introduced reverse top-k queries
  ◦ geometric interpretation of the solution space
  ◦ efficient algorithm for the bichromatic reverse top-k query
  ◦ materialized reverse top-k views
• Future Work
  ◦ interpretation of the solution space for higher dimensions (monochromatic reverse top-k)
  ◦ improve the performance of the bichromatic reverse top-k computation

Thank you!
Related work:
• Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries"
• Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries"
More information: http://www.idi.ntnu.no/~vlachou/