

• Number of slides: 56

Progressive Computation of The Min-Dist Optimal-Location Query
Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*
Northeastern University; *Chinese University of Hong Kong
VLDB ’06, Seoul, Korea

Motivation
• “What is the optimal location in the Boston area to build a new McDonald’s store?”
• Suppose each customer drives to the closest McDonald’s.
• Optimality: minimize the average driving distance.

Who will be interested?
• Corporations
  – Chain restaurants (e.g. McDonald’s, Burger King, Starbucks)
  – Supermarkets (e.g. Wal-Mart, Costco, Stop & Shop)
  – Location-based service providers (e.g. Verizon, AT&T)
• Computer scientists, especially in
  – Databases
  – Computational geometry
  – Algorithms

min-dist OL
[Figure: objects and existing sites, with nearest-site distances of 200 and 600.]
• Without any new site: AD = (200+200+600+600)/4 = 400.

min-dist OL
[Figure: a new site l1 is added.]
• Without any new site: AD = (200+200+600+600)/4 = 400.
• With new site l1: AD(l1) = (30+30+600+600)/4 = 315.

min-dist OL
[Figure: a new site l2 is added instead.]
• Without any new site: AD = (200+200+600+600)/4 = 400.
• With new site l1: AD(l1) = (30+30+600+600)/4 = 315.
• With new site l2: AD(l2) = (200+200+30+30)/4 = 115.

Formal Definition
• Given a set S of sites, a set O of objects, and a query range Q,
• the min-dist OL is a location l ∈ Q that minimizes
  $AD(l) = \frac{1}{|O|}\sum_{o \in O} d_{NN}(o, S \cup \{l\})$,
  i.e. the average distance between each object o and its nearest site.
• “Solution”: compute AD(l) for every l. But…
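The definition can be evaluated directly by brute force. Below is a minimal sketch that assumes points are (x, y) tuples and uses the L1 distance adopted later in the slides. The function names and the sample coordinates are illustrative only and do not match the configuration in the figures.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l1(p: Point, q: Point) -> float:
    """L1 (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d_nn(o: Point, sites: Sequence[Point]) -> float:
    """Distance from object o to its nearest site."""
    return min(l1(o, s) for s in sites)


def ad(objects: Sequence[Point], sites: Sequence[Point]) -> float:
    """Average distance from each object to its nearest site."""
    return sum(d_nn(o, sites) for o in objects) / len(objects)


def ad_with_new_site(objects: Sequence[Point], sites: Sequence[Point], l: Point) -> float:
    """AD(l): the average distance after adding a new site at location l."""
    return ad(objects, list(sites) + [l])


# Hypothetical data: one existing site and four objects.
sites = [(0.0, 0.0)]
objects = [(1.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 1.0)]
print(ad(objects, sites))                             # 6.0
print(ad_with_new_site(objects, sites, (10.0, 0.0)))  # 1.0
```

Computing this for every l is exactly the “solution” the slide rejects: Q contains infinitely many locations, and each evaluation scans all of O.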

Challenges
1. There are infinitely many locations in Q! How can we produce a finite set of candidates while preserving optimality?
2. How can we avoid computing AD(l) for every candidate?

Solution Highlights
1. An algorithm to compute AD(l).
2. Theorems that limit the number of candidates.
3. A lower bound of AD(l) over all locations l in a cell C.
4. A progressive algorithm.

L1 Distance
• $d(o, s) = |o.x - s.x| + |o.y - s.y|$

1. Compute AD(l)
• Remember: $AD(l) = \frac{1}{|O|}\sum_{o \in O} d_{NN}(o, S \cup \{l\})$.
• Define $AD = \frac{1}{|O|}\sum_{o \in O} d_{NN}(o, S)$, the average distance without any new site.
• Let RNN(l) be the objects “attracted” by l, i.e. those closer to l than to any site in S.
• AD(l) = AD if RNN(l) = ∅.
[Figure: a location l with RNN(l) = ∅, so AD(l) = AD.]

[Figure: a location l with RNN(l) = {o7, o8}; here AD(l) < AD.]

• In general, AD(l) = AD − ?, where ? is the average saving for the customers in RNN(l).

1. Compute AD(l)
• Theorem: $AD(l) = AD - \frac{1}{|O|}\sum_{o \in RNN(l)} \left(d_{NN}(o, S) - d(o, l)\right)$
• S and O are “static” with respect to l:
  – AD can be pre-computed;
  – so can d_NN(o, S) for every object o.
• To compute AD(l):
  – find RNN(l);
  – for each o ∈ RNN(l), compute d(o, l).
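The theorem translates into a small evaluator. This sketch assumes L1 distance and finds RNN(l) with a naive linear scan, the “naïve solution” on the next slide, in place of the Voronoi-cell range search. The class name `ADEvaluator` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def l1(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


class ADEvaluator:
    """Evaluates AD(l) from pre-computed, l-independent quantities."""

    def __init__(self, objects: Sequence[Point], sites: Sequence[Point]):
        self.objects = list(objects)
        # d_NN(o, S) for every object: static with respect to l.
        self.dnn = [min(l1(o, s) for s in sites) for o in self.objects]
        # AD without any new site: also static with respect to l.
        self.ad = sum(self.dnn) / len(self.objects)

    def rnn(self, l: Point) -> List[int]:
        """Indices of objects strictly closer to l than to their nearest site (naive scan)."""
        return [i for i, o in enumerate(self.objects) if l1(o, l) < self.dnn[i]]

    def ad_at(self, l: Point) -> float:
        """AD(l) = AD - (1/|O|) * sum over o in RNN(l) of (d_NN(o, S) - d(o, l))."""
        saving = sum(self.dnn[i] - l1(self.objects[i], l) for i in self.rnn(l))
        return self.ad - saving / len(self.objects)


ev = ADEvaluator([(1.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 1.0)], [(0.0, 0.0)])
print(ev.rnn((10.0, 0.0)))    # [2, 3]
print(ev.ad_at((10.0, 0.0)))  # 1.0
```

Only the objects in RNN(l) are touched per query once `dnn` and `ad` are cached. That is why a fast way to find RNN(l) matters.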

How to compute RNN(l)?
• This is an implementation detail, involving computational geometry and spatial databases.
• Naïve solution: for each o ∈ O, compare its distance to l with its distances to all sites.
• More efficient:
  1. Compute the Voronoi cell of l.
  2. Retrieve the objects inside the Voronoi cell using a range search on an R-tree.

How to compute RNN(l)? (1) Compute the Voronoi cell
• Remember: RNN(l) is the set of objects closer to l than to any existing site in S.
• Considering all sites, draw the spatial region that is closer to l than to any site.
[Figure: the Voronoi cell of l.]

How to compute RNN(l)? (2) Retrieve objects
• Standard range search.
• Any spatial access method works, e.g. an R-tree.

[Figure: points a to m in a 10 × 10 space.]
Range query: find the objects in a given range, e.g. find all hotels in Boston.
• Without an index, we must scan through all objects. NOT EFFICIENT!




[Figure: an R-tree over the points a to m, with minimum bounding rectangles E1 to E7 under a root node.]
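The pruning that makes an R-tree range search cheaper than a full scan is easiest to see in code. This is a minimal sketch on a hand-built tree of minimum bounding rectangles (MBRs). It omits the insertion and node splitting that a real R-tree or R*-tree needs. The coordinates are hypothetical and are not read off the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class Node:
    entries: List[Union["Node", Point]]  # points in a leaf, child nodes otherwise
    leaf: bool
    rect: Rect = field(init=False)       # MBR of everything below this node

    def __post_init__(self) -> None:
        if self.leaf:
            xs = [p[0] for p in self.entries]
            ys = [p[1] for p in self.entries]
            self.rect = (min(xs), min(ys), max(xs), max(ys))
        else:
            rs = [child.rect for child in self.entries]
            self.rect = (min(r[0] for r in rs), min(r[1] for r in rs),
                         max(r[2] for r in rs), max(r[3] for r in rs))


def range_search(node: Node, q: Rect, stats: Dict[str, int]) -> List[Point]:
    """Returns the points inside q, descending only into subtrees whose MBR meets q."""
    stats["nodes"] = stats.get("nodes", 0) + 1
    if node.leaf:
        return [p for p in node.entries if contains(q, p)]
    result: List[Point] = []
    for child in node.entries:
        if intersects(child.rect, q):
            result.extend(range_search(child, q, stats))
    return result


# Hypothetical two-level tree.
leaf_a = Node([(1, 1), (2, 3)], leaf=True)
leaf_b = Node([(3, 2), (4, 4)], leaf=True)
leaf_c = Node([(7, 7), (8, 9)], leaf=True)
leaf_d = Node([(9, 6), (6, 8)], leaf=True)
root = Node([Node([leaf_a, leaf_b], leaf=False),
             Node([leaf_c, leaf_d], leaf=False)], leaf=False)

stats: Dict[str, int] = {}
print(range_search(root, (0, 0, 3, 3), stats))  # [(1, 1), (2, 3), (3, 2)]
print(stats["nodes"])                          # 4: the right subtree is never visited
```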



2. Limit #candidates
• Theorem: within the x/y range of Q, draw grid lines through the objects. Only the intersections need to be considered!
[Figure: query range Q with the grid lines.]

[Figure: the grid lines inside Q give 5 × 6 = 30 candidates.]
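The candidate construction takes only a few lines. This sketch assumes Q is an axis-parallel rectangle (xmin, ymin, xmax, ymax). Adding Q's own boundary lines to the grid is an assumption of this sketch: it covers the case where no object line crosses Q in some dimension. Objects outside Q still contribute a line whenever their x- or y-coordinate falls within Q's range.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def candidate_locations(objects: Sequence[Point], q: Rect) -> List[Point]:
    """Intersections of the grid lines through objects, restricted to Q."""
    xmin, ymin, xmax, ymax = q
    # Vertical lines: object x-coordinates inside Q's x-range (plus Q's edges).
    xs = sorted({xmin, xmax} | {o[0] for o in objects if xmin < o[0] < xmax})
    # Horizontal lines: object y-coordinates inside Q's y-range (plus Q's edges).
    ys = sorted({ymin, ymax} | {o[1] for o in objects if ymin < o[1] < ymax})
    return [(x, y) for x in xs for y in ys]


# The object at (20, 3) lies outside Q, but its horizontal line still crosses Q.
cands = candidate_locations([(1, 5), (3, 2), (20, 3)], (0, 0, 10, 10))
print(len(cands))  # 4 vertical x 5 horizontal = 20
```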

2. Limit #candidates
• Proof idea: suppose the OL is not at an intersection. Moving it then produces a better (or equal) result.
• Consider RNN(l) and move l to the right by δ. [Figure: l and the shift δ.]
• Moving to the right saves total distance. Between two adjacent grid lines, the total L1 distance from l to the objects in RNN(l) is linear in l.x, so moving toward one of the two neighboring lines never increases it.
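The theorem can be checked empirically. This sketch assumes integer coordinates, so every candidate is also an integer point of Q. In that case the best AD over the grid intersections should equal the best AD over all integer points of Q. The random data and helper names are illustrative, and Q's boundary lines are again included as an assumption.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def l1(p: Point, q: Point) -> int:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_distance(objects: Sequence[Point], sites: Sequence[Point], l: Point) -> int:
    """|O| * AD(l), kept as an integer to avoid rounding issues."""
    return sum(min(min(l1(o, s) for s in sites), l1(o, l)) for o in objects)


def grid_candidates(objects: Sequence[Point], q: Tuple[int, int, int, int]) -> List[Point]:
    xmin, ymin, xmax, ymax = q
    xs = {xmin, xmax} | {o[0] for o in objects if xmin < o[0] < xmax}
    ys = {ymin, ymax} | {o[1] for o in objects if ymin < o[1] < ymax}
    return [(x, y) for x in xs for y in ys]


random.seed(7)
objects = [(random.randint(0, 40), random.randint(0, 40)) for _ in range(30)]
sites = [(random.randint(0, 40), random.randint(0, 40)) for _ in range(3)]
q = (10, 10, 25, 30)

best_candidate = min(total_distance(objects, sites, l) for l in grid_candidates(objects, q))
best_anywhere = min(total_distance(objects, sites, (x, y))
                    for x in range(q[0], q[2] + 1) for y in range(q[1], q[3] + 1))
print(best_candidate == best_anywhere)  # True
```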

2. VCU(Q)
• A spatial region enclosing the objects that are closer to (some location in) Q than to any site in S.
• It is the Voronoi cell of Q with respect to the sites in S.
[Figure: VCU(Q) around Q.]

2. Further Limit #candidates
• Only consider objects in VCU(Q).
[Figure: using all objects, 5 × 6 = 30 candidates.]


[Figure: using only the objects in VCU(Q), 4 × 4 = 16 candidates.]
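The VCU(Q) restriction can be applied object by object. This sketch assumes L1 distance and a rectangular Q, and it uses a simplified pointwise test in place of computing VCU(Q) as a region. The test: an object can be attracted by some location in Q exactly when its L1 distance to the closest point of Q is smaller than its distance to its nearest site. This selects the same objects as the region, up to ties on the boundary. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def l1(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mindist_l1(p: Point, r: Rect) -> float:
    """L1 distance from p to the closest point of rectangle r (0 if p is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return dx + dy


def objects_in_vcu(objects: Sequence[Point], sites: Sequence[Point], q: Rect) -> List[Point]:
    """Objects that at least one location in Q could attract."""
    return [o for o in objects if mindist_l1(o, q) < min(l1(o, s) for s in sites)]


def candidate_locations(objects: Sequence[Point], q: Rect) -> List[Point]:
    xmin, ymin, xmax, ymax = q
    xs = sorted({xmin, xmax} | {o[0] for o in objects if xmin < o[0] < xmax})
    ys = sorted({ymin, ymax} | {o[1] for o in objects if ymin < o[1] < ymax})
    return [(x, y) for x in xs for y in ys]


sites = [(0.0, 0.0), (11.0, 0.0)]
objects = [(11.0, 11.0), (1.0, 1.0), (20.0, 20.0), (10.5, 1.0)]
q = (10.0, 10.0, 12.0, 12.0)
vcu = objects_in_vcu(objects, sites, q)
print(vcu)                                   # [(11.0, 11.0), (20.0, 20.0)]
print(len(candidate_locations(objects, q)))  # 12
print(len(candidate_locations(vcu, q)))      # 9
```

The object at (10.5, 1.0) would add a vertical line through Q. It sits next to a site, however, so no location in Q can attract it, and dropping it removes three candidates.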

Naïve Algorithm
• Derive the candidates.
• Compute AD(l) for each.
• Pick the smallest.
• Not efficient! Too many candidates! Computing AD(l) for each one requires:
  – computing RNN(l);
  – retrieving all of these objects…
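The three steps fit in a short function. This sketch assumes L1 distance, includes Q's boundary lines among the candidates, and scans all objects for every candidate. That full scan per candidate is the cost the slide objects to. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def l1(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def candidate_locations(objects: Sequence[Point], q: Rect) -> List[Point]:
    xmin, ymin, xmax, ymax = q
    xs = sorted({xmin, xmax} | {o[0] for o in objects if xmin < o[0] < xmax})
    ys = sorted({ymin, ymax} | {o[1] for o in objects if ymin < o[1] < ymax})
    return [(x, y) for x in xs for y in ys]


def naive_optimal_location(objects: Sequence[Point], sites: Sequence[Point],
                           q: Rect) -> Tuple[Point, float]:
    """Evaluates AD(l) at every candidate and returns the best (location, AD)."""
    dnn = [min(l1(o, s) for s in sites) for o in objects]

    def ad_at(l: Point) -> float:
        # One full pass over the objects per candidate.
        return sum(min(d, l1(o, l)) for o, d in zip(objects, dnn)) / len(objects)

    scored = ((l, ad_at(l)) for l in candidate_locations(objects, q))
    return min(scored, key=lambda pair: pair[1])


print(naive_optimal_location([(10.0, 0.0), (10.0, 2.0)], [(0.0, 0.0)],
                             (8.0, 0.0, 12.0, 2.0)))  # ((10.0, 0.0), 1.0)
```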

Progressive Idea
• Treat Q as a cell and consider its corners.

• Divide the cell.


• Recursively divide a sub-cell.

• Eventually, all candidates can be checked.

• Q: What do we save?
• A: Cell pruning. A cell C can be pruned if its lower bound is at least AD(l0) for some already evaluated candidate l0.
• E.g. AD(l0) = 50, and 60 is a lower bound for AD(l) for all l ∈ C: no location in C can beat l0, so C is pruned.

3. LB(C): lower bound for AD(l), for all l ∈ C
[Figure: a cell C with corner values AD(c1) = 1000, AD(c2) = 3000, AD(c3) = 4000, AD(c4) = 2500.]

• Theorem: $LB(C) = \max\left\{\frac{AD(c_1)+AD(c_4)}{2}, \frac{AD(c_2)+AD(c_3)}{2}\right\} - \frac{p}{4}$ is a lower bound, where p is the perimeter of C and (c1, c4), (c2, c3) are the pairs of diagonally opposite corners.
• Reason: moving l changes AD(l) by at most the L1 distance moved, and for any l ∈ C, d(l, c1) + d(l, c4) = d(l, c2) + d(l, c3) = p/2.
• E.g. LB(C) = (3000 + 4000)/2 − p/4 = 3500 − p/4.
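Computing the corner-based bound is a one-liner once the corner ADs are known. This sketch assumes the corner ordering stated above, with (c1, c4) and (c2, c3) diagonally opposite. For a concrete number it also assumes a hypothetical 100 × 100 cell, since the slide leaves p symbolic.

```python
from typing import Sequence


def lower_bound(corner_ads: Sequence[float], width: float, height: float) -> float:
    """Corner-based LB(C) for a width x height cell.

    corner_ads = (AD(c1), AD(c2), AD(c3), AD(c4)), where (c1, c4) and (c2, c3)
    are the two pairs of diagonally opposite corners.
    """
    a1, a2, a3, a4 = corner_ads
    perimeter = 2 * (width + height)
    return max((a1 + a4) / 2, (a2 + a3) / 2) - perimeter / 4


# Slide example with a hypothetical 100 x 100 cell: 3500 - 400/4.
print(lower_bound((1000, 3000, 4000, 2500), width=100, height=100))  # 3400.0
```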

3. LB(C): lower bound for AD(l), for all l ∈ C
• A better lower bound. Theorem: …
• Compared with the previous lower bound:
  – higher quality, since the lower bound is larger;
  – but more computation.

4. The Progressive Algorithm
1. Maintain a heap of cells ordered by LB(). Initially it holds one cell: Q.
2. Maintain the best candidate found so far, lopt.
3. Pick the cell with the minimum LB() and partition it.
4. Compute AD() for the corners of the sub-cells.
5. Compute LB() for the sub-cells.
6. Insert sub-cell ci into the heap if LB(ci) < AD(lopt).
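Here is one way to realize the six steps in code, sketched under stated assumptions. Cells are aligned with the candidate grid lines, so every corner is a candidate. Each step halves a cell along its middle grid line, and AD is computed by a full scan with no R*-tree. LB uses the corner-based bound from the previous slides. The helper names are hypothetical, and this is not the authors' implementation. Each step yields lopt with the interval [lower, AD(lopt)] that must contain the optimal AD.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def l1(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def _halves(a: int, b: int) -> List[Tuple[int, int]]:
    """Split an index range at its middle grid line, if it has interior lines."""
    if b - a > 1:
        m = (a + b) // 2
        return [(a, m), (m, b)]
    return [(a, b)]


def progressive_ol(objects: Sequence[Point], sites: Sequence[Point],
                   q: Rect) -> Iterator[Tuple[Point, float, float]]:
    """Yields (l_opt, AD(l_opt), lower bound on the optimal AD) after each step."""
    n = len(objects)
    dnn = [min(l1(o, s) for s in sites) for o in objects]
    xmin, ymin, xmax, ymax = q
    xs = sorted({xmin, xmax} | {o[0] for o in objects if xmin < o[0] < xmax})
    ys = sorted({ymin, ymax} | {o[1] for o in objects if ymin < o[1] < ymax})
    cache = {}
    best_l: Point = (xmin, ymin)
    best_ad = float("inf")

    def ad(i: int, j: int) -> float:
        """AD at candidate (xs[i], ys[j]); also updates l_opt (step 2)."""
        nonlocal best_l, best_ad
        if (i, j) not in cache:
            l = (xs[i], ys[j])
            cache[(i, j)] = sum(min(d, l1(o, l)) for o, d in zip(objects, dnn)) / n
            if cache[(i, j)] < best_ad:
                best_l, best_ad = l, cache[(i, j)]
        return cache[(i, j)]

    def lb(i0: int, i1: int, j0: int, j1: int) -> float:
        """Steps 4-5: corner ADs, then LB for the cell [xs[i0], xs[i1]] x [ys[j0], ys[j1]]."""
        c1, c2, c3, c4 = ad(i0, j0), ad(i1, j0), ad(i0, j1), ad(i1, j1)
        half_perimeter = (xs[i1] - xs[i0]) + (ys[j1] - ys[j0])
        return max(c1 + c4, c2 + c3) / 2 - half_perimeter / 2  # p/4 = half_perimeter/2

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    heap = [(lb(*root), root)]  # step 1
    while heap and heap[0][0] < best_ad:
        _, (i0, i1, j0, j1) = heapq.heappop(heap)  # step 3
        for a, b in _halves(i0, i1):
            for c, d in _halves(j0, j1):
                bound = lb(a, b, c, d)
                # Step 6: keep sub-cells that have unevaluated candidates and might beat l_opt.
                if (b - a > 1 or d - c > 1) and bound < best_ad:
                    heapq.heappush(heap, (bound, (a, b, c, d)))
        lower = min(best_ad, heap[0][0]) if heap else best_ad
        yield best_l, best_ad, lower
    yield best_l, best_ad, best_ad


objs = [(12.0, 30.0), (25.0, 22.0), (40.0, 35.0), (33.0, 10.0), (18.0, 18.0)]
for l_opt, upper, lower in progressive_ol(objs, [(0.0, 0.0), (50.0, 50.0)], (10.0, 10.0, 40.0, 40.0)):
    print(l_opt, upper, lower)
```

A caller can stop iterating at any point and keep the current lopt together with its interval. That matches the progressiveness described on the following slides.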

Progressiveness
• The algorithm quickly reports a candidate OL with a confidence interval and keeps refining it.
[Figure: over time, AD(real OL) stays inside the interval between LB(Q) and AD(best corner of Q).]

[Figure: the upper end of the interval becomes AD(best candidate).]

[Figure: the lower end of the interval becomes min{ LB(C) | C in heap }.]
• The user may choose to terminate at any time.

Batch Partitioning
• A cell should be partitioned into multiple sub-cells at once.
• Reason: computing AD(l) requires accessing the R*-tree of objects, and each access to the R*-tree should compute multiple AD(l) values.
• Tradeoff: partitioning too finely is wasteful, since some candidates could have been pruned.
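The benefit of batching shows up in a simple cost model. This sketch assumes that one linear pass over the objects stands in for one R*-tree access, during which AD is accumulated for all the new sub-cell corners at once. The uniform kx × ky split and the helper names are illustrative assumptions; the slides do not specify how sub-cells are laid out.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def l1(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def subcell_corners(cell: Rect, kx: int, ky: int) -> List[Point]:
    """Corners of a kx-by-ky uniform partition of cell, each shared corner listed once."""
    x0, y0, x1, y1 = cell
    return [(x0 + (x1 - x0) * i / kx, y0 + (y1 - y0) * j / ky)
            for i in range(kx + 1) for j in range(ky + 1)]


def batch_ad(objects: Sequence[Point], dnn: Sequence[float],
             locations: Sequence[Point]) -> List[float]:
    """AD for many locations using a single pass over the objects."""
    totals = [0.0] * len(locations)
    for o, d in zip(objects, dnn):         # each object is fetched once...
        for k, l in enumerate(locations):  # ...and reused for every location
            totals[k] += min(d, l1(o, l))
    return [t / len(objects) for t in totals]


sites = [(0.0, 0.0)]
objects = [(1.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 1.0)]
dnn = [min(l1(o, s) for s in sites) for o in objects]
corners = subcell_corners((8.0, 0.0, 12.0, 2.0), kx=2, ky=2)
print(len(corners))                          # 9
print(min(batch_ad(objects, dnn, corners)))  # 1.0
```

Larger kx and ky amortize each pass over more corners. As the slide warns, though, some of those corners lie in sub-cells that would have been pruned anyway.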

Performance Setup
• O: 123,593 postal addresses in the Northeastern part of the US, stored in an R*-tree.
• S: 100 sites randomly selected from O.
• Buffer: 128 pages.
• Machine: Dell Pentium IV, 3.2 GHz.
• Query size: 1% in each dimension.


[Chart: Effect of VCU Computation]



[Chart: Comparison of Lower Bounds]

[Chart: Effect of Batch Partitioning]


Progressiveness
• Each step partitions a cell into 40 sub-cells.
• After 200 steps: accurate answer.
• After 20 steps: the answer is within 1% of optimal.

Conclusions
• Introduced the min-dist optimal-location query.
• Proved theorems that limit the number of candidates.
• Presented lower-bound estimators.
• Proposed a progressive algorithm.