
  • Number of slides: 39

Continuous Monitoring of Top-k Queries over Sliding Windows
Authors: Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias
Presenter: Kamiru
The University of Hong Kong, Department of Computer Science

Outline
- Motivation
- Problem Setting
- Related Work
  - Top-k Queries
  - Skyband
- Solutions
  - Top-k Computation
  - Maintenance Module
  - Skyband Monitoring Algorithm
- Experimental Evaluation
- Conclusion
- Future Work

Motivation
- We first define the top-k query:
  - Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f.
- One real-life application: find the top 5 hotels under the preference function f(hotel) = -hotel.price + hotel.quality.
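The definition can be illustrated with a few lines of Python. This is a minimal sketch, assuming made-up hotel data; `score` and `top_k` are hypothetical helper names. It scores every tuple and keeps the k best, which matches the definition but says nothing about efficiency or streaming.

```python
import heapq

# Hypothetical hotels: (name, price, quality). Values are illustrative only.
HOTELS = [
    ("Harbour", 120, 150),
    ("Central", 90, 100),
    ("Garden", 60, 95),
    ("Summit", 200, 180),
    ("Budget", 40, 45),
]


def score(hotel):
    """Preference function from the slide: f(hotel) = -hotel.price + hotel.quality."""
    _, price, quality = hotel
    return -price + quality


def top_k(data, k, f):
    """Return the k tuples of data with the highest scores under f."""
    return heapq.nlargest(k, data, key=f)


for hotel in top_k(HOTELS, 3, score):
    print(hotel[0], score(hotel))  # Garden 35, then Harbour 30, then Central 10
```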

Motivation
- Existing methods are not applicable to streaming environments.
- Internet traffic flow monitoring is one real-life application of the streaming case:
  - Data on the Internet arrive at a very high rate.
  - Each tuple may include the source IP address, destination IP address, start time, end time, MTU, TTL, etc.

Motivation
- Such records are used for:
  - traffic estimation
  - network security
  - troubleshooting
- For instance, top-k queries help a system prevent DDoS (Distributed Denial of Service) attacks if it monitors, in real time, the top-k flows with the largest individual throughput.

Motivation
- On this network, the server 155.223.2.4 is more likely to be under a DDoS attack than 155.223.2.3.

No. Packets | Destination IP
12 | 155.213.2.3
11 | 155.223.2.4
2 | 155.11.5.2
22 | 155.11.5.6
50 | 155.223.2.4
2 | 155.223.2.1

No. Packets | Destination IP
32 | 155.223.2.4
 | 155.213.2.4
2 | 155.11.5.6
 | 155.223.2.3

Problem Setting
- A function f is increasingly monotone on dimension $x_i$ if, for any pair of tuples (points) $p_1, p_2$ with $p_1.x_i \ge p_2.x_i$ and $p_1.x_j = p_2.x_j$ for all $j \ne i$, we have $score(p_1) \ge score(p_2)$, where $score(p) = f(p.x_1, \dots, p.x_d)$.
- Decreasing monotonicity is defined in the same way with the reverse inequality, i.e. $score(p_1) \le score(p_2)$.

Problem Setting
- Note that a function may be increasingly monotone on some dimensions and decreasingly monotone on the others.
- For instance, $f(p) = p.x_1 - p.x_2$ is increasingly monotone on $x_1$ and decreasingly monotone on $x_2$.
[Figure: the line defined by $f = x_1 - x_2$ in the $(x_1, x_2)$ plane; point a lies where f has a higher value, point b where f has a lower value.]
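Monotonicity on a dimension can be probed empirically by raising that coordinate while keeping the others fixed, then checking whether the score never drops (increasing) or never rises (decreasing). The helper `monotonicity` below is a hypothetical illustration, not part of the paper. Random sampling can refute monotonicity but never prove it, so the verdict is only evidence.

```python
import random


def monotonicity(f, dim, d, trials=1000, step=0.1, seed=0):
    """Empirically classify f on dimension `dim`, sampling base points in [0, 1]^d.

    Returns "increasing", "decreasing" or "neither". A dimension that f ignores
    passes both tests and is reported as "increasing".
    """
    rng = random.Random(seed)
    increasing = decreasing = True
    for _ in range(trials):
        p2 = [rng.random() for _ in range(d)]
        p1 = list(p2)
        p1[dim] += step  # p1.x_dim > p2.x_dim, all other coordinates equal
        if f(p1) < f(p2):
            increasing = False
        if f(p1) > f(p2):
            decreasing = False
    if increasing:
        return "increasing"
    if decreasing:
        return "decreasing"
    return "neither"


f = lambda p: p[0] - p[1]  # the slide's example f(p) = p.x1 - p.x2
print(monotonicity(f, 0, 2), monotonicity(f, 1, 2))  # increasing decreasing
```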

Problem Setting
- Problem definition: given a set of queries Q and a set of points P, the top-k result $R_q$ of a query $q \in Q$ is a set with $|R_q| = k$ such that $f(r_i) \ge f(r_j)$ for every $r_i \in R_q$ and every $r_j \in P \setminus R_q$.
- At each timestamp:
  - insert the newly arrived objects $P_{ins}$
  - remove the expired objects $P_{del}$
  - output the top-k result of each query $q \in Q$ over the remaining P
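To make the per-timestamp cycle concrete, here is a deliberately naive baseline of our own, not the paper's method: it applies P_ins and P_del to the window and then recomputes every query's top-k from scratch. The point ids, coordinates, query functions and the name `process_timestamp` are hypothetical.

```python
import heapq


def process_timestamp(window, p_ins, p_del, queries, k):
    """Naive per-timestamp cycle that recomputes every result from scratch.

    window:  dict point id -> point (tuple of coordinates), updated in place
    p_ins:   dict point id -> point for the new arrivals
    p_del:   iterable of ids of expired points
    queries: dict query id -> preference function f
    Returns dict query id -> ids of the k highest-scoring points, best first.
    """
    window.update(p_ins)          # insert the new arrivals P_ins
    for pid in p_del:             # remove the expired objects P_del
        window.pop(pid, None)
    return {
        qid: heapq.nlargest(k, window, key=lambda pid: f(window[pid]))
        for qid, f in queries.items()
    }


window = {}
queries = {"q1": lambda p: p[0] + 2 * p[1], "q2": lambda p: p[0] - p[1]}
print(process_timestamp(window, {"p1": (0.2, 0.9), "p2": (0.8, 0.1)}, [], queries, k=1))
print(process_timestamp(window, {"p3": (0.5, 0.5)}, ["p1"], queries, k=1))
```

Every timestamp touches every point for every query, which is the cost the grid-based modules in the following slides try to avoid.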

Related Work: Top-k Query Computation
- Several existing methods address top-k computation in various scenarios.
- They focus on computing the top-k results from multiple data repositories.
- Fagin et al. introduce two efficient methods for processing ranked queries:
  - the Threshold Algorithm (TA)
  - the No Random Access algorithm (NRA)

TA and NRA
- Both methods perform sorted access in parallel to each of the m sorted lists S_i,
  - where m is the number of inputs (attributes) and the values of dimension i are stored in S_i.
- Every S_i is scanned in descending order.

TA and NRA
- When an object o is seen in input S_i:
  - TA performs random accesses to the other lists to find the grade x_j of o in every list S_j, and then computes f(o).
  - NRA does not access the other lists. Instead of computing f(o), it only updates the lower and upper bounds of o's score.
- Both algorithms stop once the score of the k-th result is at least the threshold T.

Example of TA and NRA
- Assume that we have 3 ranked inputs and 5 records (a to e) in the database. Find the top-1 result under the preference function f = SUM, using TA and NRA.

Example of TA and NRA

S1 | S2 | S3
c 0.9 | a 0.9 | c 0.9
d 0.8 | b 0.8 | a 0.8
b 0.6 | e 0.6 | b 0.6
e 0.3 | d 0.4 | d 0.6
a 0.1 | c 0.2 | e 0.5

- TA
  - First loop
    - Get object c from S1 and compute f(c) = 0.9 + 0.2 + 0.9 = 2
      - Update the result: R = {(c, 2)}
      - Threshold T = 0.9 + ∞ + ∞ = ∞ > R_k.value, continue
    - Get object a from S2 and compute f(a) = 0.1 + 0.9 + 0.8 = 1.8
      - Do not update the result, since R_k.value > 1.8
      - Threshold T = 0.9 + 0.9 + ∞ = ∞ > R_k.value, continue
    - Get object c from S3; it has already been seen, so f is not recomputed
      - Threshold T = 0.9 + 0.9 + 0.9 = 2.7 > R_k.value, continue
  - Second loop, …
  - Until T ≤ R_k.value
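The TA trace can be reproduced with a short sketch over the three sorted lists of the example. Unlike the slide, which updates the threshold after every single access and uses ∞ for lists not yet read in the current round, this sketch follows the usual formulation and checks the threshold once per full round of sorted access. Random access is simulated with one dictionary per list, and `threshold_algorithm` is a hypothetical name.

```python
# Sorted lists from the example: each S_i holds (object, grade) in descending grade order.
LISTS = [
    [("c", 0.9), ("d", 0.8), ("b", 0.6), ("e", 0.3), ("a", 0.1)],  # S1
    [("a", 0.9), ("b", 0.8), ("e", 0.6), ("d", 0.4), ("c", 0.2)],  # S2
    [("c", 0.9), ("a", 0.8), ("b", 0.6), ("d", 0.6), ("e", 0.5)],  # S3
]


def threshold_algorithm(lists, k, f=sum):
    """TA sketch: returns (top-k as [(object, score)], number of sorted-access rounds)."""
    grades = [dict(lst) for lst in lists]  # simulated random access: grades[i][o]
    seen = {}  # object -> exact score, computed once when the object is first seen

    def best():
        return sorted(seen.items(), key=lambda kv: kv[1], reverse=True)[:k]

    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        last = []  # grade read from each list in this round
        for lst in lists:
            obj, grade = lst[depth - 1]
            last.append(grade)
            if obj not in seen:
                # Random access to every list to get the full grade vector of obj.
                seen[obj] = f(g[obj] for g in grades)
        top = best()
        threshold = f(last)  # best score any unseen object could still reach
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth
    return best(), depth


top, rounds = threshold_algorithm(LISTS, k=1)
print(top, rounds)
```

On these lists the sketch stops after the third round, where T = 0.6 + 0.6 + 0.6 = 1.8 is no larger than the best score found, 2. Object b also scores 2, so the top-1 answer is a tie.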

Example of TA and NRA
- NRA maintains, for every seen object r, a lower bound r^lb and an upper bound r^ub of its aggregate score.
- Initial setting, if the values lie in [0, 1]:
  - r^lb = {0, 0, 0}, r^ub = {∞, ∞, ∞}

Example of TA and NRA
- NRA
  - Get objects c (0.9), a (0.9) and c (0.9) from S1, S2 and S3, respectively
    - r^lb = {0.9, 0, 1.8, 0, 0} (for objects a, b, c, d, e)
    - Update the upper bounds of newly accessed objects, e.g. r_a^ub = 0.9 + 0.9 + 0.9 = 2.7
    - R = {(c, 1.8)}
    - t = min{r_x^lb : x ∈ R} = 1.8
    - u = max{r_x^ub : x ∉ R} = 2.7
    - Stop if t ≥ u; here t < u, so continue
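A matching NRA sketch uses sorted accesses only. For every seen object it derives a lower bound (the sum of its known grades, with unknown grades counted as 0) and an upper bound (known grades plus the last grade read from each list where the object has not appeared yet), and it stops once t ≥ u. Grades are stored as `Fraction` values so that the tie between b and c is compared exactly; the function name `nra` is hypothetical.

```python
from fractions import Fraction

# Sorted lists from the example, with grades as exact fractions.
LISTS = [
    [(o, Fraction(g)) for o, g in lst]
    for lst in (
        [("c", "0.9"), ("d", "0.8"), ("b", "0.6"), ("e", "0.3"), ("a", "0.1")],
        [("a", "0.9"), ("b", "0.8"), ("e", "0.6"), ("d", "0.4"), ("c", "0.2")],
        [("c", "0.9"), ("a", "0.8"), ("b", "0.6"), ("d", "0.6"), ("e", "0.5")],
    )
]


def nra(lists, k):
    """No Random Access sketch for f = SUM over grades in [0, 1]."""
    m = len(lists)
    known = {}  # object -> {list index: grade}, filled by sorted access only
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):
            obj, grade = lst[depth]
            known.setdefault(obj, {})[i] = grade
        last = [lst[depth][1] for lst in lists]  # grades at the current scan depth
        lb = {o: sum(g.values()) for o, g in known.items()}
        ub = {o: lb[o] + sum(last[i] for i in range(m) if i not in g)
              for o, g in known.items()}
        ranked = sorted(known, key=lb.get, reverse=True)
        result, rest = ranked[:k], ranked[k:]
        if len(result) == k:
            t = min(lb[o] for o in result)
            # u also covers objects not seen yet: their upper bound is sum(last).
            u = max([ub[o] for o in rest] + [sum(last)])
            if t >= u:
                return [(o, lb[o]) for o in result], depth + 1
    ranked = sorted(known, key=lambda o: sum(known[o].values()), reverse=True)
    return [(o, sum(known[o].values())) for o in ranked[:k]], len(lists[0])


print(nra(LISTS, 1))
```

On these lists the sketch reads all five rows (15 sorted accesses) before t ≥ u holds, and it returns c with score 2 (b is tied with it).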

LARA
- Mamoulis proposed LARA (Lattice-based Rank Aggregation), an optimized NRA method.
- LARA separates the algorithm into two phases:
  - Growing phase
    - If t = min{r_x^lb : x ∈ R}

Conclusion of Top-k Query Computation
- In conventional databases NRA should perform better than TA, since it avoids many random accesses.
- According to the experiments of its authors, LARA performs much better than NRA.

Related Work: Skyband
- The skyline consists of the points that are not dominated by any other point.
  - A record p_i dominates another record p_j if and only if p_i is at least as good as p_j on every attribute and strictly better on at least one.
- The skyline of a dataset contains the result of every top-1 query with a monotone preference function.
- The k-skyband contains the tuples that are dominated by at most k-1 other points.
[Figure: points p1 to p7, with the skyline and the 2-skyband marked.]
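The dominance test and the k-skyband can be written directly from these definitions. This brute-force sketch assumes that larger values are preferable on every attribute and uses made-up 2-d points, not the ones in the figure. It performs O(n²) comparisons, so it is meant for illustration only.

```python
def dominates(a, b):
    """a dominates b: at least as good on every attribute, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def k_skyband(points, k):
    """Names of the points dominated by at most k - 1 other points."""
    return sorted(
        name for name, p in points.items()
        if sum(dominates(q, p) for other, q in points.items() if other != name) <= k - 1
    )


# Hypothetical points: name -> (x1, x2).
POINTS = {"p1": (9, 2), "p2": (7, 7), "p3": (2, 9), "p4": (6, 5),
          "p5": (3, 8), "p6": (5, 3), "p7": (1, 1)}

print(k_skyband(POINTS, 1))  # skyline: ['p1', 'p2', 'p3', 'p5']
print(k_skyband(POINTS, 2))  # 2-skyband: ['p1', 'p2', 'p3', 'p4', 'p5']
```

As a quick check of the top-1 property, the best point under the increasing function x1 + 2x2 is p2, which lies on the skyline.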

Related Work: Skyband
- The skyband is used to monitor the top-k results in score-time space.
- Assume that we want to monitor the top-2 results in the following example:
[Figure: points p1 to p5 plotted by score against expiration time, annotated with top-2 results such as {p1, p2}, {p1, p3}, {p1, p4}, {p4} and {-} as points expire.]

Top-k Computation
- A grid-based index is used.
- For each cell c in grid G, maxscore(c) is the maximum possible score of any point in c.
- For each query q:
  - Start: the algorithm starts from the cell c with the highest maxscore(c), and visits cells in descending order of maxscore(c).
  - Termination: the search terminates when the cell c under consideration has maxscore(c) ≤ R_k.value.

Top-k Computation
- An example shows how the top-k computation works.
- Assume two inputs (x1 and x2) and the function f = x1 + 2x2.
- The cell with the highest maxscore(c) is c4,4.
  - maxscore(c4,4) = f(P), where P is the corner of c4,4 with the highest score.
  - Scan c4,4, which contains p1 and p2.
  - maxscore(P') > maxscore(P''), so the next cell to scan is c3,4.
- Continue until maxscore(c) ≤ R_k.value.
[Figure: a 4 x 4 grid with cells c1,1 to c4,4, points p1, p2 and p3, and cell corners P, P', P'', P''' and P''''.]
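The grid search can be sketched for two dimensions in [0, 1) and a function that is increasingly monotone on both, such as f = x1 + 2x2, so that a cell's maximum score is reached at its upper-right corner. Cells are visited in descending maxscore order with a heap. The search starts from the top-right cell and expands to the left and lower neighbours, each of which has a maxscore no higher than the current cell. The points, the grid size and the name `grid_topk` are assumptions for illustration, and the paper's own cell-visiting details may differ.

```python
import heapq


def grid_topk(points, f, k, n):
    """Grid-based top-k sketch.

    points: 2-d tuples in [0, 1)^2; f: increasingly monotone on both dimensions;
    n: cells per dimension. Returns (top-k [(score, point)] best first, cells visited in order).
    """
    grid = {}
    for p in points:
        cell = (min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1))
        grid.setdefault(cell, []).append(p)

    def maxscore(c):
        # An increasingly monotone f peaks at the cell's upper-right corner.
        return f(((c[0] + 1) / n, (c[1] + 1) / n))

    start = (n - 1, n - 1)  # cell with the highest maxscore
    heap, enqueued = [(-maxscore(start), start)], {start}
    result, visited = [], []  # result: min-heap of (score, point), size <= k
    while heap:
        neg_max, c = heapq.heappop(heap)
        if len(result) == k and -neg_max <= result[0][0]:
            break  # maxscore(c) <= R_k.value: no remaining cell can improve the result
        visited.append(c)
        for p in grid.get(c, []):
            s = f(p)
            if len(result) < k:
                heapq.heappush(result, (s, p))
            elif s > result[0][0]:
                heapq.heapreplace(result, (s, p))
        # Left and lower neighbours have maxscore <= maxscore(c).
        for nb in ((c[0] - 1, c[1]), (c[0], c[1] - 1)):
            if min(nb) >= 0 and nb not in enqueued:
                enqueued.add(nb)
                heapq.heappush(heap, (-maxscore(nb), nb))
    return sorted(result, reverse=True), visited


pts = [(0.95, 0.85), (0.8, 0.9), (0.3, 0.7), (0.6, 0.2), (0.1, 0.1)]
top, visited = grid_topk(pts, lambda p: p[0] + 2 * p[1], k=2, n=4)
print(top)      # the two best points with their scores
print(visited)  # [(3, 3), (2, 3)]
```

Cells are 0-based here, so the slide's c4,4 is (3, 3) and c3,4 is (2, 3). As on the slide, the search moves from the top-right cell to its left neighbour first, because x2 carries the larger weight.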

The Maintenance Module
- Given two datasets: P_ins and P_del
- For all p ∈ P_ins:
  - Insert p into its cell c
  - For every query q that visited c:
    - If f(p) ≥ q.R_k.value, insert p into q.R (the previous k-th result drops out)
- For all p ∈ P_del:
  - Delete p from its cell c
  - For every query q that visited c:
    - If p ∈ q.R, mark q as affected

The Maintenance Module
- For each affected query q:
  - Invoke Top-k Computation(q)
  - For every cell c that q visited before but that is not scanned by Top-k Computation(q):
    - Delete q from c.visitedquery

Example of the Maintenance Module
- q: f = x1 + 2x2, find the top-1 result
- Timestamp 1: P_ins = {p3, p4}, P_del = {p1, p2}
- Timestamp 2: P_ins = {p5}, P_del = {p3}
[Figure: points p1 to p5 on the grid.]
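The maintenance steps and the example can be sketched together as follows. This is the module that the conclusions call TMA. Each cell keeps its points and the set of queries that visited it (`c.visitedquery` on the slide). An insertion only updates the results of queries that visited its cell. The deletion of a result point triggers a full Top-k Computation. Grid size, point coordinates and all names are hypothetical, and every f is assumed to be increasingly monotone on both dimensions, so that a cell's maxscore is reached at its upper-right corner.

```python
import heapq


class TopKMonitor:
    """Hypothetical sketch of the maintenance module on an n x n grid over [0, 1)^2."""

    def __init__(self, n):
        self.n = n
        self.points = {}   # cell -> {point id: point}
        self.where = {}    # point id -> cell
        self.visited = {}  # cell -> ids of queries that visited it (c.visitedquery)
        self.queries = {}  # query id -> (f, k)
        self.results = {}  # query id -> [(score, point id)], best first
        self.scanned = {}  # query id -> cells scanned by its last Top-k Computation

    def cell(self, p):
        return tuple(min(int(x * self.n), self.n - 1) for x in p)

    def compute(self, qid):
        """Top-k Computation(q): scan cells in descending maxscore order."""
        f, k = self.queries[qid]

        def maxscore(c):  # score of the cell's upper-right corner
            return f(tuple((i + 1) / self.n for i in c))

        start = (self.n - 1, self.n - 1)
        heap, enqueued, best, scanned = [(-maxscore(start), start)], {start}, [], set()
        while heap:
            neg_max, c = heapq.heappop(heap)
            if len(best) == k and -neg_max <= best[0][0]:
                break
            scanned.add(c)
            for pid, p in self.points.get(c, {}).items():
                item = (f(p), pid)
                if len(best) < k:
                    heapq.heappush(best, item)
                elif item > best[0]:
                    heapq.heapreplace(best, item)
            for nb in ((c[0] - 1, c[1]), (c[0], c[1] - 1)):
                if min(nb) >= 0 and nb not in enqueued:
                    enqueued.add(nb)
                    heapq.heappush(heap, (-maxscore(nb), nb))
        # Delete q from c.visitedquery for cells no longer scanned; register the rest.
        for c in self.scanned.get(qid, set()) - scanned:
            self.visited[c].discard(qid)
        for c in scanned:
            self.visited.setdefault(c, set()).add(qid)
        self.scanned[qid] = scanned
        self.results[qid] = sorted(best, reverse=True)

    def add_query(self, qid, f, k):
        self.queries[qid] = (f, k)
        self.compute(qid)

    def update(self, p_ins, p_del):
        for pid, p in p_ins.items():
            c = self.cell(p)
            self.points.setdefault(c, {})[pid] = p
            self.where[pid] = c
            for qid in self.visited.get(c, ()):
                f, k = self.queries[qid]
                res = self.results[qid]
                if len(res) < k or f(p) >= res[-1][0]:
                    res.append((f(p), pid))
                    res.sort(reverse=True)
                    del res[k:]  # the previous k-th result drops out
        affected = set()
        for pid in p_del:
            c = self.where.pop(pid)
            del self.points[c][pid]
            for qid in self.visited.get(c, ()):
                if any(r == pid for _, r in self.results[qid]):
                    affected.add(qid)  # an expired point was in q.R
        for qid in affected:
            self.compute(qid)  # deletions trigger a full recomputation


mon = TopKMonitor(n=4)
mon.update({"p1": (0.9, 0.8), "p2": (0.4, 0.3)}, [])
mon.add_query("q", lambda p: p[0] + 2 * p[1], k=1)
mon.update({"p3": (0.7, 0.6), "p4": (0.2, 0.35)}, ["p1", "p2"])  # timestamp 1
mon.update({"p5": (0.55, 0.5)}, ["p3"])                           # timestamp 2
print(mon.results["q"])
```

With these coordinates the top-1 result moves from p1 to p3 at timestamp 1 and to p5 at timestamp 2, and each change is caused by the expiration of the current result point.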

Summary of the Maintenance Module
- Insertions never trigger a top-k recomputation.
- Deletions cost more than insertions. Each affected query must:
  - recompute its result with Top-k Computation
  - update the cells not scanned by the new computation, which in the worst case means every cell of the grid

Skyband Monitoring Algorithm
- The earlier skyband slide showed how the k-skyband can be used to monitor the results in score-time space.
- The dominance counter (DC) can be used to obtain the k-skyband:
  - p.DC is the number of records that have a higher score than p and expire after p.
[Figure: monitoring a top-2 query; points p1 to p6 in score vs. expiration-time space, each labeled with its dominance counter.]
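The dominance counter can be computed by brute force directly from this definition. The sketch uses hypothetical (score, expiration time) pairs with distinct expiration times, as in a count-based window. A point belongs to the k-skyband, and so remains a candidate for the top-k result, exactly when its DC is below k.

```python
# Hypothetical points: name -> (score, expiration time).
POINTS = {"p1": (0.9, 3), "p2": (0.8, 7), "p3": (0.7, 5),
          "p4": (0.6, 8), "p5": (0.5, 6), "p6": (0.4, 9)}


def dominance_counts(points):
    """DC(p): number of points with a higher score that expire after p."""
    return {
        name: sum(1 for s2, e2 in points.values() if s2 > s and e2 > e)
        for name, (s, e) in points.items()
    }


def skyband(points, k):
    """Points whose dominance counter is below k (the k-skyband in score-time space)."""
    dc = dominance_counts(points)
    return sorted(name for name in points if dc[name] < k)


print(dominance_counts(POINTS))
print(skyband(POINTS, 2))  # ['p1', 'p2', 'p3', 'p4', 'p6']
```

Here p5 has DC = 2 (p2 and p4 outscore it and expire later), so it can never enter a top-2 result before it expires.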

Skyband Monitoring Algorithm
- The dominance counters can be computed with a balanced tree (BT).
- The expiration times of the processed elements of q.skyband are stored in a balanced tree BT sorted in descending order.
  - Elements are inserted in descending score order.
- p.DC is simply the number of tuples that precede p in BT at the time p is inserted.
[Figure: points p1 to p5 in score vs. expiration-time space, next to the balanced tree of their expiration times.]
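The balanced-tree computation can be mimicked with Python's `bisect` module. In this sketch a sorted list of expiration times stands in for BT, so each insert costs O(n) rather than O(log n) as in a real balanced tree. Points are processed in descending score order. p.DC is the number of already-inserted expiration times later than p's, and these are the entries that would precede p in a BT sorted in descending order. The data and the name `dc_with_sorted_list` are hypothetical.

```python
import bisect


def dc_with_sorted_list(points):
    """points: name -> (score, expiration time). Returns name -> dominance counter."""
    exp_times = []  # ascending expiration times of already-processed (higher-score) points
    dc = {}
    for name, (score, exp) in sorted(points.items(), key=lambda kv: kv[1][0], reverse=True):
        # Entries later than exp are those that would precede p in a descending BT.
        dc[name] = len(exp_times) - bisect.bisect_right(exp_times, exp)
        bisect.insort(exp_times, exp)
    return dc


POINTS = {"p1": (0.9, 3), "p2": (0.8, 7), "p3": (0.7, 5),
          "p4": (0.6, 8), "p5": (0.5, 6), "p6": (0.4, 9)}
print(dc_with_sorted_list(POINTS))  # {'p1': 0, 'p2': 0, 'p3': 1, 'p4': 0, 'p5': 2, 'p6': 0}
```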

Skyband Monitoring Algorithm
- Given two datasets: P_ins and P_del
- For all p ∈ P_ins:
  - Insert p into its cell c
  - For every query q that visited c:
    - If f(p) ≥ q.R_k.value:
      - Insert p into q.skyband with p.DC = 0
      - For each other p' in q.skyband with f(p') ≤ f(p):
        - Update p'.DC = p'.DC + 1
        - If p'.DC = k, evict p' from q.skyband

Skyband Monitoring Algorithm
- For all p ∈ P_del:
  - Delete p from its cell c
  - For every query q that visited c:
    - If p ∈ q.R, delete p from q.skyband
- For every q whose skyband has changed:
  - If q.skyband has at least k points:
    - q.R = top-k(q.skyband)
  - Else:
    - Invoke Top-k Computation(q)
    - Compute the dominance counters
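The insertion and deletion steps can be combined into one single-query sketch over a count-based sliding window. Because arrival order equals expiration order, the arrival id serves as the expiration time. A brute-force scan of the window stands in for the grid-based Top-k Computation, and a sorted list stands in for BT. We read q.R_k.value as the k-th score recorded at the most recent Top-k Computation (`tau` below) and keep it fixed between recomputations. If it tracked the skyband's current k-th score instead, a point that later belongs to the result could be wrongly rejected. All names are hypothetical.

```python
import bisect
import heapq
import random
from collections import deque


class SkybandMonitor:
    """Hypothetical single-query SMA sketch over a count-based sliding window."""

    def __init__(self, f, k, initial):
        self.f, self.k = f, k
        self.window = deque(enumerate(initial))  # (arrival id, point); id = expiration time
        self.next_id = len(initial)
        self.recompute()

    def recompute(self):
        """Stand-in for Top-k Computation(q) followed by computing the DCs."""
        best = heapq.nlargest(self.k, ((self.f(p), pid) for pid, p in self.window))
        # tau: the k-th score at this computation, kept fixed until the next one.
        self.tau = best[-1][0] if len(best) == self.k else float("-inf")
        self.skyband, exp_times = {}, []  # skyband: id -> [score, DC]
        for s, pid in best:  # descending score; exp_times plays the role of BT
            self.skyband[pid] = [s, len(exp_times) - bisect.bisect_right(exp_times, pid)]
            bisect.insort(exp_times, pid)

    def insert(self, p):
        pid, s = self.next_id, self.f(p)
        self.next_id += 1
        self.window.append((pid, p))
        if s >= self.tau:
            # p expires last, so it dominates every skyband point p' with f(p') <= f(p).
            for other in list(self.skyband):
                entry = self.skyband[other]
                if entry[0] <= s:
                    entry[1] += 1
                    if entry[1] == self.k:
                        del self.skyband[other]  # dominated by k points: evict
            self.skyband[pid] = [s, 0]

    def expire(self):
        pid, _ = self.window.popleft()
        # Every other skyband point is newer, so each one outscoring the oldest point
        # was counted in its DC; DC < k therefore means the oldest point is in R.
        if pid in self.skyband:
            del self.skyband[pid]
            if len(self.skyband) < self.k:
                self.recompute()

    def result(self):
        """R = top-k(q.skyband): arrival ids, best first."""
        return sorted(self.skyband, key=lambda i: self.skyband[i][0], reverse=True)[: self.k]


rng = random.Random(1)
stream = [(rng.random(), rng.random()) for _ in range(60)]
monitor = SkybandMonitor(lambda p: p[0] + 2 * p[1], k=3, initial=stream[:20])
for point in stream[20:]:
    monitor.insert(point)
    monitor.expire()
print(monitor.result())  # arrival ids of the current top-3
```

Recomputation only happens when an expiration leaves fewer than k points in the skyband. In every other case the result is read directly from the skyband.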

Experimental Evaluation
- The authors evaluate the proposed methods on streams of both independent (IND) and anti-correlated (ANT) data.
[Figure: sample IND and ANT datasets for d = 2.]

Experimental Evaluation
- Default experimental settings:
  - Data dimensionality (d): 4
  - Data cardinality (N): 1M
  - Arrival rate (r): 10K
  - Query cardinality (Q): 1K
  - Result cardinality (k): 20


Conclusions
- The top-k computation module processes the minimum number of cells.
- Two monitoring algorithms are proposed: TMA and SMA (the Skyband Monitoring Algorithm).
  - TMA recomputes the result from scratch whenever an expiring point belongs to it.
  - SMA maintains a superset of the current answer in the form of a k-skyband.
- In the experimental evaluation, SMA outperforms the other proposed solutions.

Future Work
- Non-monotone preference functions
- Supporting queries of various dimensionalities
  - Cluster the queries into a super query SQ, and monitor the results for the clustered queries together

Thank you for your attention!
P.S. I hope I reached this slide on time!
