F4: Large Scale Automated Forecasting Using Fractals
Deepayan Chakrabarti, Christos Faloutsos
CIKM 2002

Outline
• Introduction/Motivation
• Survey and Lag Plots
• Exact Problem Formulation
• Proposed Method
  • Fractal Dimensions Background
  • Our method
• Results
• Conclusions

General Problem Definition
Given a time series {x_t}, predict its future course, that is, x_{t+1}, x_{t+2}, ...
[Figure: value vs. time, with the future portion marked "?"]

Motivation
Traditional fields
• Financial data analysis
• Physiological data, elderly care
• Weather, environmental studies
Sensor networks (MEMS, "SmartDust")
• Long / "infinite" series
• No human intervention: "black box"

How to forecast?
• ARIMA, but it makes a linearity assumption
• Neural Networks, but they have a large number of parameters and long training times [Wan/1993, Mozer/1993]
• Hidden Markov Models, but they are O(N^2) in the number of nodes N; fixing N is also a problem [Ge+/2000]
• Lag Plots

Lag Plots
• Q0: Interpolation method
• Q1: Lag = ?
• Q2: K = ?
[Figure: lag plot of x_t vs. x_{t-1}. For a new point, its nearest neighbors (4-NN in the illustration) are found and interpolated to get the final prediction.]
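The lag-plot idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `delay_vectors` and `knn_forecast` are made up here, the toy series is a noise-free logistic parabola with an arbitrary a = 3.9, and plain averaging of the neighbors stands in for the interpolation step (Q0). It is not the SVD-based interpolation the talk cites.

```python
import math

def delay_vectors(series, lag):
    """Return (vector, target) pairs: vector = (x[t-1], ..., x[t-lag]), target = x[t]."""
    return [(tuple(series[t - j] for j in range(1, lag + 1)), series[t])
            for t in range(lag, len(series))]

def knn_forecast(series, lag, k):
    """Predict the next value by averaging the targets of the k nearest delay vectors.

    Averaging is an assumption made for this sketch; any interpolation
    scheme could be substituted here.
    """
    pairs = delay_vectors(series, lag)
    query = tuple(series[-j] for j in range(1, lag + 1))  # latest delay vector
    nearest = sorted(pairs, key=lambda p: math.dist(p[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Toy data: noise-free logistic parabola (a = 3.9 is an arbitrary choice).
xs = [0.3]
for _ in range(500):
    xs.append(3.9 * xs[-1] * (1.0 - xs[-1]))

# Forecast the held-out last value from everything before it.
prediction = knn_forecast(xs[:-1], lag=1, k=4)
print(prediction, xs[-1])
```

Sorting all stored vectors costs O(N log N) per query, which is fine for a sketch; the two free parameters `lag` and `k` are exactly the quantities Q1 and Q2 ask about.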

Q0: Interpolation
Using SVD (state of the art) [Sauer/1993]
[Figure: lag plot of x_t vs. x_{t-1}]

Why Lag Plots?
• Based on Takens' Theorem [Takens/1981], which says that delay vectors can be used for predictive purposes

Inside Theory (Extra)
Example: Lotka-Volterra equations
ΔH/Δt = rH - aH*P
ΔP/Δt = bH*P - mP
H is the density of prey; P is the density of predators.
Suppose only H(t) is observed. The internal state is (H, P).
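To make the "only H(t) is observed" setting concrete, here is a minimal sketch that iterates the two difference equations with a small time step and then keeps only the prey series. All parameter values, the step size `dt`, and the function name `simulate_lotka_volterra` are assumptions chosen for illustration; none of them come from the talk.

```python
def simulate_lotka_volterra(h0, p0, r, a, b, m, dt, steps):
    """Iterate the difference equations with time step dt; return the (H, P) trajectory."""
    h, p = h0, p0
    states = [(h, p)]
    for _ in range(steps):
        dh = (r * h - a * h * p) * dt   # ΔH = (rH - aH*P) Δt
        dp = (b * h * p - m * p) * dt   # ΔP = (bH*P - mP) Δt
        h, p = h + dh, p + dp
        states.append((h, p))
    return states

# Illustrative parameters only (assumed, not from the slides).
states = simulate_lotka_volterra(h0=40.0, p0=9.0, r=1.0, a=0.1,
                                 b=0.02, m=0.5, dt=0.01, steps=2000)

observed = [h for h, _ in states]           # only the prey density H(t) is observed
delay = list(zip(observed[1:], observed))   # delay vectors (H(t), H(t-1))
print(len(observed), len(delay))
```

The hidden predator density P never appears in `delay`, yet each delay vector is built from two consecutive observations of a two-dimensional system, which is the situation the delay-vector argument addresses.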

Problem at hand
Given {x_1, x_2, …, x_N}:
• Automatically set parameters
  - L(opt) (from Q1)
  - k(opt) (from Q2)
• in linear time on N
• to minimise the Normalized Mean Squared Error (NMSE) of forecasting

Previous work/Alternatives
• Manual setting: BUT infeasible [Sauer/1992]
• Cross-validation: BUT slow; leave-one-out cross-validation is ~O(N^2 log N) or more
• "False Nearest Neighbors": BUT unstable [Abarbanel/1996]

Intuition
The Logistic Parabola: x_t = a x_{t-1} (1 - x_{t-1}) + noise
Intrinsic Dimensionality ≈ Degrees of Freedom ≈ Information about x_t given x_{t-1}
[Figures: x(t) vs. time; lag plot of x(t) vs. x(t-1)]
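A small sketch can show why x_{t-1} carries almost all the information about x_t for this series. The values a = 3.8, the uniform noise of amplitude 0.001, the seed, and the name `noisy_logistic` are assumptions made for the example.

```python
import random

def noisy_logistic(a, x0, noise, n, seed=0):
    """Generate x_t = a * x_{t-1} * (1 - x_{t-1}) + uniform noise in [-noise, noise]."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        xs.append(a * xs[-1] * (1.0 - xs[-1]) + rng.uniform(-noise, noise))
    return xs

A, NOISE = 3.8, 0.001  # assumed values, chosen so the series stays inside (0, 1)
series = noisy_logistic(A, x0=0.3, noise=NOISE, n=1000)

# Lag plot points (x_{t-1}, x_t): they lie on a parabola, up to the noise.
lag_plot = list(zip(series, series[1:]))
residuals = [abs(x_t - A * x_prev * (1.0 - x_prev)) for x_prev, x_t in lag_plot]
print(max(residuals) <= NOISE + 1e-12)  # prints True
```

Although the series looks erratic over time, every point of the lag plot sits within the noise band of a one-dimensional curve, which is the sense in which the intrinsic dimensionality is about 1.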

Intuition
[Figures: lag plots over x(t), x(t-1), and x(t-2)]

Intuition
To find L(opt):
• Go further back in time (i.e., consider x_{t-2}, x_{t-3}, and so on)
• Until no more information is gained about x_t

Fractal Dimensions
• FD = intrinsic dimensionality
[Figure: "Embedding" dimensionality = 3; intrinsic dimensionality = 1]

Fractal Dimensions
FD = intrinsic dimensionality [Belussi/1995]
Points to note:
• FD can be a non-integer
• There are fast methods to compute it
[Figure: log(# pairs) vs. log(r)]
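The log(# pairs) vs. log(r) plot suggests a direct way to estimate the dimension: count the pairs of points closer than r for several radii and fit the slope. The sketch below does this naively in O(N^2) time, so it is not one of the fast methods the slide refers to; the function names and the choice of radii are assumptions made for illustration.

```python
import math

def pair_counts(points, radii):
    """Count point pairs whose Euclidean distance is at most r, for each radius r."""
    dists = [math.dist(points[i], points[j])
             for i in range(len(points)) for j in range(i + 1, len(points))]
    return [sum(1 for d in dists if d <= r) for r in radii]

def correlation_dimension(points, radii):
    """Least-squares slope of log(# pairs) versus log(r)."""
    counts = pair_counts(points, radii)
    xs = [math.log(r) for r, c in zip(radii, counts) if c > 0]
    ys = [math.log(c) for c in counts if c > 0]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den

# A straight line embedded in 3-D: embedding dimension 3, intrinsic dimension 1.
line = [(i / 400, 2 * i / 400, -i / 400) for i in range(400)]
fd_line = correlation_dimension(line, radii=[0.05, 0.1, 0.2, 0.4])
print(round(fd_line, 2))
```

For the line, the estimate comes out close to 1 even though every point has three coordinates, matching the embedding-versus-intrinsic distinction on the previous slide.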

Q1: Finding L(opt)
Use Fractal Dimensions to find the optimal lag length L(opt)
[Figure: fractal dimension vs. lag (L), with epsilon, f, and L(opt) marked]
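One plausible reading of the plot is: increase the lag until the fractal dimension stops growing by more than epsilon, and take that lag as L(opt). The sketch below encodes that reading. The stopping rule, the function name `choose_lag`, and the FD values are all assumptions; the slide itself does not spell out the exact criterion.

```python
def choose_lag(fd_by_lag, epsilon):
    """Return the smallest lag L whose FD is within epsilon of the FD at lag L + 1.

    fd_by_lag[i] holds the fractal dimension measured at lag i + 1.
    Falls back to the largest lag tried if the curve never flattens.
    """
    for i in range(len(fd_by_lag) - 1):
        if abs(fd_by_lag[i + 1] - fd_by_lag[i]) < epsilon:
            return i + 1
    return len(fd_by_lag)

# Made-up FD values for lags 1..6, for illustration only.
fd_values = [0.98, 1.71, 1.98, 2.03, 2.05, 2.06]
print(choose_lag(fd_values, epsilon=0.1))  # prints 3
```

The plateau value of the curve plays the role of f on the plot. In practice each entry of `fd_by_lag` would be computed from the delay vectors of that lag.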

Q2: Finding k(opt)
To find k(opt):
• Conjecture: k(opt) ~ O(f)
• We choose k(opt) = 2*f + 1
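Since f can be a non-integer while a neighbor count cannot, some rounding is needed when applying k(opt) = 2*f + 1. The slide does not say which; the sketch below rounds up, which is an assumption, as is the name `choose_k`.

```python
import math

def choose_k(fractal_dimension):
    """k(opt) = 2*f + 1, rounded up to a whole number of neighbors.

    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(2 * fractal_dimension + 1)

print(choose_k(1.0))   # prints 3
print(choose_k(2.05))  # prints 6
```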

Datasets
• Logistic Parabola: x_t = a x_{t-1} (1 - x_{t-1}) + noise. Models population of flies [R. May/1976]
• LORENZ: Models convection currents in the air
• LASER: Fluctuations in a laser over time (from the Santa Fe Time Series Competition, 1992)
Error: NMSE = ∑(predicted - true)^2 / σ^2
[Figures: value vs. time for each dataset]
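The error measure can be computed as follows. The slide writes the numerator as a plain sum; this sketch divides by the number of points so that the result is a mean, as the name suggests, and takes σ^2 to be the variance of the true values. Both choices are assumptions.

```python
def nmse(predicted, true):
    """Mean squared error divided by the variance of the true values."""
    n = len(true)
    mean = sum(true) / n
    variance = sum((x - mean) ** 2 for x in true) / n
    mse = sum((p - x) ** 2 for p, x in zip(predicted, true)) / n
    return mse / variance

true_values = [1.0, 2.0, 3.0, 4.0]
print(nmse(true_values, true_values))            # prints 0.0 (perfect forecast)
print(nmse([2.5, 2.5, 2.5, 2.5], true_values))   # prints 1.0 (always predicting the mean)
```

With this normalization, a score of 1 corresponds to a forecaster that always outputs the mean, so useful forecasts should score well below 1.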

Logistic Parabola
• FD vs. L plot flattens out
• L(opt) = 1
[Figures: FD vs. lag; value vs. timesteps]

Logistic Parabola
[Figure: value vs. timesteps, marked "Our prediction from here"]

Logistic Parabola
Comparison of prediction to correct values
[Figure: value vs. timesteps]

Logistic Parabola
Our L(opt) = 1, which exactly minimizes NMSE
[Figure: FD and NMSE vs. lag]

LORENZ
• L(opt) = 5
[Figures: FD vs. lag; value vs. timesteps]

LORENZ
[Figure: value vs. timesteps, marked "Our prediction from here"]

LORENZ
Comparison of prediction to correct values
[Figure: value vs. timesteps]

LORENZ
L(opt) = 5; NMSE is also optimal at Lag = 5
[Figure: FD and NMSE vs. lag]

Laser
• L(opt) = 7
[Figures: FD vs. lag; value vs. timesteps]

Laser
[Figure: value vs. timesteps, marked "Our prediction starts here"]

Laser
Comparison of prediction to correct values
[Figure: value vs. timesteps]

Laser
L(opt) = 7; the corresponding NMSE is close to optimal
[Figure: FD and NMSE vs. lag]

Speed and Scalability
• Preprocessing is linear in N
• Proportional to time taken to calculate FD

Conclusions
Our Method:
✓ Automatically set parameters
  ✓ L(opt) (answers Q1)
  ✓ k(opt) (answers Q2)
✓ In linear time on N

Conclusions
• Black-box non-linear time series forecasting
• Fractal Dimensions give a fast, automated method to set all parameters
• So, given any time series, we can automatically build a prediction system
• Useful in a sensor network setting

Snapshot (Extra)
http://snapdragon.cald.cs.cmu.edu/TSP

Future Work (Extra)
• Feature Selection
• Multi-sequence prediction

Discussion: Some other problems (Extra)
How to forecast? Given:
• x_1, x_2, …, x_N
• L(opt)
• k(opt)
How to find the k(opt) nearest neighbors quickly?
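The slide leaves the nearest-neighbor question open. As one simple illustration for the special case L(opt) = 1, where delay vectors are single numbers, the stored values can be kept sorted and searched with binary search, giving O(log N + k) per query. This is a sketch under that assumption, with a made-up function name; higher lags would need a multi-dimensional index, which is not shown.

```python
import bisect

def k_nearest_sorted(sorted_values, query, k):
    """Return up to k values closest to query from an ascending list."""
    right = bisect.bisect_left(sorted_values, query)  # first value >= query
    left = right - 1                                  # last value < query
    result = []
    while len(result) < k and (left >= 0 or right < len(sorted_values)):
        # Take from the left side if the right side is exhausted,
        # or if the left candidate is at least as close as the right one.
        take_left = right >= len(sorted_values) or (
            left >= 0
            and query - sorted_values[left] <= sorted_values[right] - query)
        if take_left:
            result.append(sorted_values[left])
            left -= 1
        else:
            result.append(sorted_values[right])
            right += 1
    return result

print(k_nearest_sorted([1, 2, 4, 7, 9], 5, 3))  # prints [4, 7, 2]
```

Neighbors are returned in order of increasing distance from the query, with ties going to the smaller value.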

Motivation (Extra)
Forecasting also allows us to
• Find outliers: anything that doesn't match our prediction!
• Find patterns: if different circumstances lead to similar predictions, they may be related.

Motivation (Examples) (Extra)
Traditional
• EEGs: Patterns of electromagnetic impulses in the brain
• Intensity variations of white dwarf stars
• Highway usage over time
Sensors
• "Active Disks": forecasting / prefetching / buffering
• "Smart House": sensors monitor the situation in a house
• Volcano monitoring

General Method (Extra)
L(opt) = ? K(opt) = ?
• Store all the delay vectors {x_{t-1}, …, x_{t-L(opt)}} and the corresponding prediction x_t
• Find the latest delay vector
• Find nearest neighbors
• Interpolate
[Figure: lag plot of x_t vs. x_{t-1}, marked "Interpolate"]

Intuition (Extra)
• The FD vs. L plot does flatten out
• L(opt) = 1
[Figure: fractal dimension vs. lag]

Inside Theory (Extra)
• Internal state may be unobserved
• But the delay vector space is a faithful reconstruction of the internal system state
• So prediction in delay vector space is as good as prediction in state space

Fractal Dimensions (Extra)
• Many real-world datasets have fractional intrinsic dimension
• There exist fast (O(N)) methods to calculate the fractal dimension of a cloud of points [Belussi/1995]

Speed and Scalability (Extra)
• Preprocessing varies as L(opt)^2