Online Pattern Discovery Applications in Data Streams
• Sensor-less: pairs trading in stock trading (find highly correlated pairs in n log n time)
• Sensor-full: gamma ray detection in astrophysics (burst detection over a large number of window sizes in almost linear time)
• Dennis Shasha (joint work with Yunyue Zhu)
• yunyue, [email protected] nyu.edu

Application 1: Pairs Trading
• Stock price streams
  – The New York Stock Exchange (NYSE)
  – 50,000 securities (streams); 100,000 ticks (trade and quote)
• Pairs trading, a.k.a. correlation trading
• Query: “Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?”
• Example of a trader's reasoning: “XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC …”
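The query above can be answered naively by computing the Pearson correlation of every pair over the most recent window. The sketch below is only an illustration of that baseline, not the method from the slides; the names `pearson` and `correlated_pairs`, the dictionary layout of `streams`, and the sample prices are all assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(streams, window, threshold=0.9):
    """Return (name_a, name_b, r) for every pair whose correlation over
    the last `window` points exceeds `threshold`.

    `streams` is a hypothetical layout: a dict mapping a ticker name to a
    list of prices. The cost is O(N^2 * window) for N streams."""
    recent = {name: prices[-window:] for name, prices in streams.items()}
    result = []
    for a, b in combinations(sorted(recent), 2):
        r = pearson(recent[a], recent[b])
        if r > threshold:
            result.append((a, b, r))
    return result


# Made-up prices, for illustration only.
streams = {
    "XYZ": [10.0, 11.0, 12.0, 13.0, 14.0],
    "ABC": [20.0, 22.0, 24.0, 26.0, 28.5],
    "DEF": [5.0, 3.0, 6.0, 2.0, 7.0],
}
pairs = correlated_pairs(streams, window=5, threshold=0.9)
print([(a, b) for a, b, _ in pairs])
```

Every pair is recomputed from scratch on each call, which is why this baseline does not scale to tens of thousands of streams.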

Online Detection of High Correlation
• Given tens of thousands of high-speed time series data streams, detect high-value correlation, both synchronized and time-lagged, over sliding windows in real time.
• Real time
  – high update frequency of the data stream
  – fixed response time, online

StatStream: Algorithm
• Naive algorithm
  – N: number of streams
  – w: size of sliding window
  – space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2)
• Suppose that the streams are updated every second.
  – With a Pentium 4 PC, the exact method can monitor only 700 streams with a delay of 2 minutes.
• Our approach
  – Discrete Fourier Transform to approximate correlation
  – grid structure to filter out unlikely pairs
  – Our approach can monitor 10,000 streams with a delay of 2 minutes.
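One way to see why a few DFT coefficients can approximate correlation: once each window is normalized to zero mean and unit length, correlation equals 1 - d^2/2, where d is the Euclidean distance between the normalized windows, and a unitary DFT preserves that distance. Keeping only the first m coefficients can only underestimate d, so it overestimates the correlation and never wrongly discards a correlated pair. The sketch below assumes this standard identity; the function names and the choice of m are illustrative, and the slides do not give the exact StatStream procedure.

```python
import cmath
import math


def normalize(x):
    """Shift to zero mean and scale to unit Euclidean length.
    Assumes x is not constant (otherwise the norm is zero)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def dft_coefs(x, m):
    """Coefficients 1..m of the unitary DFT (scaled by 1/sqrt(w)).
    Coefficient 0 is skipped: it is zero for a zero-mean series."""
    w = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / w) for t in range(w))
        / math.sqrt(w)
        for k in range(1, m + 1)
    ]


def correlation(x, y):
    """Exact Pearson correlation via the normalized inner product."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def correlation_upper_bound(x, y, m):
    """Upper bound on the correlation from the first m DFT coefficients.
    Requires m < len(x) / 2. For real input, coefficient w-k is the
    conjugate of coefficient k, so each kept coefficient counts twice."""
    cx = dft_coefs(normalize(x), m)
    cy = dft_coefs(normalize(y), m)
    partial_d2 = 2 * sum(abs(a - b) ** 2 for a, b in zip(cx, cy))
    return 1 - partial_d2 / 2


# Made-up windows, for illustration only.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]
y = [2.0, 2.5, 3.5, 3.0, 6.0, 5.0, 9.0, 8.0, 8.5]
print(correlation(x, y), correlation_upper_bound(x, y, 2))
```

A pair whose upper bound is already below the query threshold can be skipped without computing the exact correlation; only the surviving pairs need the full check.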

StatStream: Stream synoptic data structure
• Three-level time interval hierarchy
  – Time point, basic window, sliding window
• Basic window (the key to our technique)
  – The computation for basic window i must finish by the end of basic window i+1
  – The basic window time is the system response time.
• Digests
  – Basic window digests: sum, DFT coefs
  – Sliding window digests: sum, DFT coefs
[Figure: time points grouped into basic windows, and basic windows grouped into a sliding window, each with its digests]
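The hierarchy above can be illustrated with the simplest digest, the sum. In the sketch below, each completed basic window contributes one number, and the sliding-window digest is updated by adding the newest basic-window digest and dropping the oldest. The class name `SlidingSumDigest` and its parameters are assumptions for this example, and the DFT-coefficient digests mentioned on the slide are deliberately left out.

```python
from collections import deque


class SlidingSumDigest:
    """Sliding-window sum maintained from per-basic-window sums.

    The sliding window covers `num_basic` basic windows of `basic_size`
    time points each. The digest changes only when a basic window
    completes, so the answer lags by at most one basic window."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size
        self.num_basic = num_basic
        self.digests = deque()   # sums of completed basic windows
        self.window_sum = 0.0    # digest of the whole sliding window
        self.current_sum = 0.0   # running sum of the open basic window
        self.current_count = 0

    def append(self, value):
        """Add one time point; return True when a basic window closes."""
        self.current_sum += value
        self.current_count += 1
        if self.current_count < self.basic_size:
            return False
        if len(self.digests) == self.num_basic:
            # The oldest basic window leaves the sliding window.
            self.window_sum -= self.digests.popleft()
        self.digests.append(self.current_sum)
        self.window_sum += self.current_sum
        self.current_sum = 0.0
        self.current_count = 0
        return True


digest = SlidingSumDigest(basic_size=2, num_basic=3)
for value in range(1, 11):  # time points 1, 2, ..., 10
    if digest.append(value):
        print(value, digest.window_sum)
```

Per time point the work is constant, and the per-window update touches only two digests, which is what makes a fixed response time plausible.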

Application 2: elastic burst detection
• Discover time intervals with an unusually large number of events.
  – In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise.
  – In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
• Challenge: to discover the time and duration of a burst, both of which may vary
  – In astrophysics, a burst of high-energy particles associated with a special event might last for a few milliseconds, a few hours, or even a few days.
NB: A similar idea may apply to spatial burst detection.

Application 2: burst detection
• Example [figure not reproduced]

Burst Detection: Problem Statement
• Problem: given a time series of positive numbers x_1, x_2, ..., x_n and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size whose sums are above the thresholds:
  – all window sizes w and start positions i, with 0 < w ≤ n and 0 < i ≤ n - w + 1, such that x_i + x_{i+1} + ... + x_{i+w-1} > f(w)
• Brute force search: O(n^2) time
• Our shift wavelet tree (SWT): O(n+k) time.
  – k is the size of the output, i.e. the number of windows with bursts
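The brute-force search can be sketched as follows: with prefix sums, each of the O(n^2) candidate windows is checked in constant time. The function name, the 0-based `(start, size)` output format, and the sample data and threshold are assumptions made for this illustration.

```python
from itertools import accumulate


def brute_force_bursts(x, f):
    """Return every (start, size) whose window sum exceeds f(size).

    `start` is a 0-based index into x. Prefix sums make each window
    sum O(1), so the whole search is O(n^2)."""
    n = len(x)
    prefix = [0] + list(accumulate(x))  # prefix[i] = x[0] + ... + x[i-1]
    bursts = []
    for size in range(1, n + 1):
        threshold = f(size)
        for start in range(n - size + 1):
            if prefix[start + size] - prefix[start] > threshold:
                bursts.append((start, size))
    return bursts


# Made-up counts and threshold function, for illustration only.
counts = [1, 2, 1, 9, 8, 1, 1, 2]
bursts = brute_force_bursts(counts, lambda w: 7 + 3 * w)
print(bursts)
```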

Burst Detection: Data Structure and Algorithm
– Lemma 1: any subsequence s is included by one window w in the SWT.
– Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold, because w contains s and all values are positive (no false negatives).
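The two lemmas suggest a filter-then-verify search: test the sum of a covering window first, and examine the subsequences inside it only when that sum exceeds the threshold. The slides do not spell out the window layout, so the sketch below assumes one that satisfies Lemma 1: at each level, windows of length 2L starting at every multiple of L, for L = 1, 2, 4, ... Any subsequence of length at most L+1 then fits inside one such window. The function name and level bookkeeping are illustrative, and this simplified version does not claim the O(n+k) bound.

```python
from itertools import accumulate


def swt_bursts(x, f):
    """Find all (start, size) with window sum > f(size), using
    overlapping half-shifted windows as a filter.
    Assumes every value in x is positive, as Lemma 2 requires."""
    n = len(x)
    prefix = [0] + list(accumulate(x))
    bursts = set()
    half = 1       # L: shift between consecutive windows at this level
    smallest = 1   # smallest subsequence size handled at this level
    while smallest <= n:
        largest = min(half + 1, n)
        sizes = range(smallest, largest + 1)
        # A covering window can hide a burst only if its sum exceeds
        # the lowest threshold among the sizes this level handles.
        level_threshold = min(f(w) for w in sizes)
        for start in range(0, n, half):
            end = min(start + 2 * half, n)
            if prefix[end] - prefix[start] <= level_threshold:
                continue  # Lemma 2: nothing inside can be a burst
            # Detailed search: subsequences starting in the first half
            # of this window are fully contained in it (Lemma 1).
            for s in range(start, min(start + half, n)):
                for w in sizes:
                    if s + w > n:
                        break
                    if prefix[s + w] - prefix[s] > f(w):
                        bursts.add((s, w))
        smallest = half + 2
        half *= 2
    return bursts


# Made-up counts and threshold function, for illustration only.
counts = [1, 2, 1, 9, 8, 1, 1, 2]
print(sorted(swt_bursts(counts, lambda w: 7 + 3 * w)))
```

When bursts are rare, most covering windows fail the first test and are skipped, so the detailed search runs only near real bursts.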