Time-focused density-based clustering of trajectories of moving objects
Margherita D’Auria, Mirco Nanni, Dino Pedreschi

Plan of the talk
- Introduction
  - Motivations
  - Problem & context
  - Density-based clustering (OPTICS)
- Density-based clustering on trajectories
  - Trajectory data model
  - Distance measure
  - Results
- Temporal focusing
  - A clustering quality measure
  - Heuristics for optimal temporal interval
- Conclusions & future work

Motivations
- Plenty of current and future sources of spatio-temporal data
- Sophisticated analysis methods are required in order to fully exploit them
  - Data mining methods
  - Which kinds of patterns/models?
- Main objectives
  - A better understanding of the application domain
  - An improvement of private and public services

Problem & context
- A distinguishing case: mobile devices
  - PDAs
  - Mobile phones
  - LBS-enabled devices (may include the two above)
- They (can) yield traces of their movement
- An important problem:
  - Discovering groups of individuals that (approximately) move together in some period of time
  - E.g.: detection of traffic jams during rush hours
- A candidate data mining reformulation of the problem
  - Clustering of individuals’ trajectories

Which kind of clustering?
- Several alternatives are available
- General requirements:
  - Non-spherical clusters should be allowed
    - E.g.: a traffic jam along a road should be represented as a single “snake-shaped” cluster of individuals
  - Tolerance to noise
  - Low computational cost
  - Applicability to complex, possibly non-vectorial data
- A suitable candidate: density-based clustering
  - In particular, we adopt OPTICS

A crash intro to OPTICS
- A density threshold is defined through two parameters:
  - ε: a neighborhood radius
  - MinPts: minimum number of points
- Key concepts:
  - Core objects: objects with an ε-neighborhood that contains at least MinPts objects
  - Reachability-distance reach-d(p, q): (simplified definition) the distance between objects p and q
- Example:
  - Object “q” is a core object if MinPts = 2
  - Object “p” is not
  - Their reach-d() is shown
- [Figure: the ε-neighborhood of q, with reach-d(p, q) drawn between p and q]
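
The two key concepts above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: points are 2-D tuples, the distance is Euclidean, the ε-neighborhood counts the object itself, and the function names are hypothetical. It uses the simplified reachability distance from the slide; the full OPTICS definition additionally takes the core-distance of the reference object into account.

```python
import math

def eps_neighborhood(points, q, eps):
    """Objects of `points` within distance eps of q (q itself included if present)."""
    return [o for o in points if math.dist(o, q) <= eps]

def is_core_object(points, q, eps, min_pts):
    """q is a core object if its eps-neighborhood holds at least min_pts objects."""
    return len(eps_neighborhood(points, q, eps)) >= min_pts

def reach_d_simplified(p, q):
    """Simplified reachability distance from the slide: the plain distance."""
    return math.dist(p, q)

# Toy data (assumed for illustration): q has one close neighbor, p is isolated.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
q, p = points[0], points[2]
print(is_core_object(points, q, eps=1.0, min_pts=2))
print(is_core_object(points, p, eps=1.0, min_pts=2))
print(reach_d_simplified(p, q))
```

With these toy values, q qualifies as a core object for MinPts = 2 while p does not, mirroring the example on the slide.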

A crash intro to OPTICS
The algorithm:
1. Repeatedly choose a non-visited random object, until a core object is selected
2. Select the core object having the smallest reachability distance from all the visited core objects. If none can be found, go to step 1
Output: reach-d() of all visited points, in order of visit (reachability plot)
[Figure: reachability plot over the order of visit, showing a “jump” from the left-hand group (0-9) to the right-hand one (10-18); a reachability threshold separates Cluster 1 from Cluster 2]
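
Reading clusters off a reachability plot with a threshold can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the plot is given as a list of reach-d() values in order of visit, that a value above the threshold marks a "jump" to a new group, and that groups with a single object are treated as noise. The function name and the toy values are hypothetical.

```python
import math

def extract_clusters(reachability, threshold):
    """Split a reachability plot into clusters of positions in the visit order."""
    clusters = []
    current = []
    for position, value in enumerate(reachability):
        if value > threshold:
            # A jump: close the running group and start a new candidate group.
            if len(current) > 1:
                clusters.append(current)
            current = [position]
        else:
            current.append(position)
    if len(current) > 1:
        clusters.append(current)
    return clusters

# Toy plot (assumed values): two dents separated by a jump, then two noise objects.
plot = [math.inf, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 9.0, 8.0]
print(extract_clusters(plot, threshold=3.0))
```

Here the object at the jump is assigned to the group it opens, which is one common reading of the plot; other extraction rules are possible.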

Applying OPTICS to trajectories
- Two key issues have to be solved
  - A suitable representation for trajectories is needed
    - Which data model for trajectories?
  - A means of comparing trajectories has to be provided
    - Which distance between objects?
    - OPTICS needs one in order to perform range queries

A trajectory data model
- Raw input data: each trajectory is represented as a set of time-stamped coordinates
  - T = (t_1, x_1, y_1), …, (t_n, x_n, y_n) => the object position at time t_i was (x_i, y_i)
- Data model
  - Parametric-spaghetti: linear interpolation between consecutive points
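
The linear-interpolation data model can be sketched in a few lines of Python. This is a minimal illustration assuming a trajectory is a list of (t, x, y) tuples sorted by strictly increasing timestamps; the function name `position_at` is hypothetical.

```python
from bisect import bisect_right

def position_at(trajectory, t):
    """Position (x, y) at time t, linearly interpolated between consecutive samples."""
    times = [sample[0] for sample in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the time span of the trajectory")
    i = bisect_right(times, t) - 1  # index of the last sample with timestamp <= t
    if i == len(trajectory) - 1:
        _, x, y = trajectory[-1]
        return (x, y)
    t0, x0, y0 = trajectory[i]
    t1, x1, y1 = trajectory[i + 1]
    w = (t - t0) / (t1 - t0)  # fraction of the segment already travelled
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))

# Toy trajectory (assumed values): east for 10 time units, then north for 10.
trajectory = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 10.0)]
print(position_at(trajectory, 5))
print(position_at(trajectory, 15))
```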

A distance between trajectories
- Adopted distance = average distance
- It is a metric => efficient indexing methods are allowed
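
The slide does not spell out the formula, so the following Python sketch rests on an assumption: the average distance is read as the Euclidean distance between the two interpolated positions, averaged over a common time interval. The integral is approximated with the trapezoidal rule; the function names and the sample count are hypothetical.

```python
import math

def interpolate(trajectory, t):
    """Linearly interpolated (x, y) at time t for a list of (t, x, y) samples."""
    for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    raise ValueError("t is outside the time span of the trajectory")

def average_distance(traj_a, traj_b, t_start, t_end, samples=1000):
    """Trapezoidal estimate of the time-averaged distance over [t_start, t_end]."""
    step = (t_end - t_start) / samples
    total = 0.0
    for k in range(samples + 1):
        t = min(t_end, t_start + k * step)  # guard against rounding past t_end
        d = math.dist(interpolate(traj_a, t), interpolate(traj_b, t))
        total += d * (0.5 if k in (0, samples) else 1.0)
    return total * step / (t_end - t_start)

# Toy trajectories (assumed values): parallel paths kept 3 units apart.
a = [(0, 0.0, 0.0), (10, 10.0, 0.0)]
b = [(0, 0.0, 3.0), (10, 10.0, 3.0)]
print(average_distance(a, b, 0, 10))
```

Both trajectories must be defined over the whole interval; how to handle trajectories with different time spans is left open here.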

A sample dataset
- Set of trajectories forming 4 clusters + noise
- Generated by the CENTRE system (KDDLab software)

OPTICS vs. HAC & K-means
[Figure: clustering results on the sample dataset; panels labelled HAC-average and OPTICS]

Temporal focusing
- Different time intervals can show different behaviours
  - E.g.: objects that are close to each other within a time interval can be far apart in other periods of time
- The time interval becomes a parameter
  - E.g.: rush hours vs. low-traffic times
- Problem: significant time intervals are not always known a priori
  - An automated mechanism is needed to find them

Temporal focusing
The proposed method:
1. Provide a notion of interestingness to be associated with time intervals
   - We define it in terms of the estimated quality of the clustering extracted on the given time interval
2. Formalize the temporal focusing task as an optimization problem
   - Discover the time interval that maximizes the interestingness measure

A quality measure for density-based clustering
- General principle
  - High-density clusters separated by low-density noise are preferred
- The method
  - High-density clusters correspond to low dents in the reachability plot
  - => Evaluate the global quality Q of the clustering output as the negated average reachability within clusters (noise is discarded)
- [Figure: reachability plot with regions labelled HIGH DENSITY, MEDIUM DENSITY, LOW DENSITY]
- Definition: given a reachability threshold ε’ and a dataset D, compute Q_{D,ε’} as:
  Q_{D,ε’} = -R(D, ε’) = -AVG_{o ∈ D’} reach-d(o), where D’ = D - {noise objects}
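
The definition can be sketched in Python as follows. This is a minimal illustration under one loudly stated assumption: noise objects are taken to be those whose reach-d() exceeds the threshold, which the slide does not state explicitly. The function name is hypothetical.

```python
import math

def clustering_quality(reachability, threshold):
    """Q = -(average reach-d over non-noise objects).

    Assumption: an object counts as noise when its reach-d exceeds the threshold.
    """
    clustered = [value for value in reachability if value <= threshold]
    if not clustered:
        return -math.inf  # assumed convention: no clustered objects, worst quality
    return -sum(clustered) / len(clustered)

# Toy plots (assumed values): the denser clustering gets the higher (less negative) Q.
dense = [math.inf, 0.5, 0.5, 0.5, 7.0, 0.5]
sparse = [math.inf, 2.0, 2.0, 2.0, 7.0, 2.0]
print(clustering_quality(dense, threshold=3.0))
print(clustering_quality(sparse, threshold=3.0))
```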

FAQs
- How is Q() computed for a given time interval I?
  - Step 1: trajectory segments outside I are clipped away
  - Step 2: OPTICS is run on the clipped trajectories
  - Step 3: Q(I) is computed on the output reachability plot
- How is the reachability threshold set for each interval?
  - A reachability threshold is needed in order to locate clusters (and noise)
  - The threshold for the largest I is manually set by the user
  - Thresholds for other intervals I’ ⊆ I are computed from the first one by proportionally rescaling w.r.t. average reachability
- Is the optimal Q(I) biased towards tiny intervals?
  - Yes. The problem has been fixed by defining Q’(I) = Q(I) / log |I|
  - => A small decrease in Q(I) is accepted when it yields a much larger I
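
Step 1 (clipping), the threshold rescaling and the normalized measure Q’(I) can be sketched in Python. This is an illustration under assumptions: trajectories are lists of (t, x, y) tuples with strictly increasing timestamps, "proportionally rescaling" is read as multiplying by the ratio of average reachabilities, and |I| > 1 so that log |I| is positive. All function names are hypothetical.

```python
import math

def _interp(p0, p1, t):
    """Point (t, x, y) on the segment p0 -> p1 at time t."""
    (t0, x0, y0), (t1, x1, y1) = p0, p1
    w = (t - t0) / (t1 - t0)
    return (t, x0 + w * (x1 - x0), y0 + w * (y1 - y0))

def clip_trajectory(trajectory, start, end):
    """Keep the part of the trajectory inside [start, end], cutting segments at the borders."""
    clipped = []
    for p0, p1 in zip(trajectory, trajectory[1:]):
        lo, hi = max(p0[0], start), min(p1[0], end)
        if lo > hi:
            continue  # the segment lies entirely outside the interval
        for t in (lo, hi):
            point = _interp(p0, p1, t)
            if not clipped or clipped[-1][0] < point[0]:
                clipped.append(point)  # skip duplicated timestamps
    return clipped

def rescale_threshold(base_threshold, base_avg_reach, avg_reach):
    """Assumed reading of 'proportional rescaling w.r.t. average reachability'."""
    return base_threshold * avg_reach / base_avg_reach

def normalized_quality(q, interval_length):
    """Q'(I) = Q(I) / log |I|, assuming |I| > 1 so that the logarithm is positive."""
    if interval_length <= 1:
        raise ValueError("this sketch assumes |I| > 1")
    return q / math.log(interval_length)

trajectory = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 10.0)]
print(clip_trajectory(trajectory, 5, 15))
print(normalized_quality(-2.0, 100.0), normalized_quality(-2.0, 10.0))
```

Since Q is negative, dividing by a larger log |I| brings Q’ closer to zero, which is how a longer interval can win despite a slightly worse Q.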

Experiments
- A more complex sample dataset (generated by CENTRE)
  - Clear clusters in the central time interval vs. dispersion on the borders

Optimizing Q()
- Find the optimal Q() by plotting values for all time intervals
  - The optimum corresponds to the central time interval

Heuristics for optimum search
- Each Q() value computation requires a run of the OPTICS algorithm
- Computing all O(N^2) values is too expensive (N = |{sub-intervals}|)
- Alternative approaches are needed
- Preliminary tests with a hill-climbing (i.e., greedy) approach
  - [Figure: search space with starting points, local optima and the global optimum marked]
- Test on the same dataset
  - Global optimum found in 70.7% of runs
  - Avg. number of steps: 17
  - Avg. OPTICS runs: 49
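
A greedy search of this kind can be sketched in Python. The slide does not describe the move set, so this sketch assumes one: an interval is a pair (i, j) on a grid of n elementary sub-intervals, and a step moves one endpoint by one grid position. The `quality` callback stands in for an OPTICS run followed by the Q’ computation; here it is replaced by a toy function. All names are hypothetical.

```python
def hill_climb(quality, n, start, max_steps=1000):
    """Greedy search over intervals (i, j) with 0 <= i < j <= n.

    Returns the final interval, the number of steps and the number of
    distinct quality evaluations (each would be one OPTICS run).
    """
    cache = {}

    def evaluate(interval):
        if interval not in cache:
            cache[interval] = quality(interval)
        return cache[interval]

    current = start
    steps = 0
    while steps < max_steps:
        i, j = current
        candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        neighbors = [(a, b) for a, b in candidates if 0 <= a < b <= n]
        if not neighbors:
            break
        best = max(neighbors, key=evaluate)
        if evaluate(best) <= evaluate(current):
            break  # no neighbor improves: local (possibly global) optimum
        current = best
        steps += 1
    return current, steps, len(cache)

def toy_quality(interval):
    """Toy single-peak stand-in for Q', with its optimum at the interval (4, 8)."""
    i, j = interval
    return -(abs(i - 4) + abs(j - 8))

best, steps, evaluations = hill_climb(toy_quality, n=12, start=(0, 12))
print(best, steps, evaluations)
```

On a quality surface with several peaks, the same procedure can stop at a local optimum, which is consistent with the global optimum being found only in a fraction of the runs reported on the slide. Caching evaluations keeps the number of OPTICS runs close to the number of distinct intervals visited.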

Conclusions & future work
- Summary of the work
  - Extension of OPTICS to a trajectory data model & distance
  - Definition of the temporal focusing problem
  - Definition of a clustering quality measure
  - (Preliminary) tests with exhaustive & greedy optimization
- Future work
  - Experimental validation over broader benchmarks
  - Tighter integration between OPTICS and the search strategy
  - Alternative, domain-specific definitions of quality measures