3968a94ff4b506bf4b39a654bc837782.ppt
- Number of slides: 20
Space Shuttle Engine Valve Anomaly Detection by Data Compression
Matt Mahoney
Outline • Problem Statement • Related Work • Anomaly Detection by Data Compression • Future Work
Problem: How to Detect Anomalies in Space Shuttle Valves • [Figure: normal solenoid current] • [Figure: abnormal solenoid current]
Current Method • Identify features (zero crossings, peaks…) • Specify correct behavior using SCL rules
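The hand-identified features named above (zero crossings, peaks) can be illustrated in a few lines. This is a hypothetical sketch, not the tooling behind the SCL rules: the function names are invented, and a reading of exactly 0 is treated as non-negative when counting sign changes.

```python
def zero_crossings(samples):
    """Indices i where the sign changes between samples[i-1] and samples[i].
    A value of exactly 0 counts as non-negative (an assumption)."""
    return [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]

def peaks(samples):
    """Indices of strict local maxima: larger than both neighbours."""
    return [
        i for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] > samples[i + 1]
    ]

trace = [-1.0, 0.5, 2.0, 1.0, -0.5, -2.0, 0.3]
print(zero_crossings(trace))  # [1, 4, 6]
print(peaks(trace))           # [2]
```

Rules would then constrain where these features may occur; writing those constraints by hand for each waveform is the workload the Goal slide aims to reduce.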
Labeled Rising Edge Details
Goal • Reduce the human workload in specifying “normal” behavior of time-series data • Rule output should be in Space Command Language (SCL, an expert system language) to allow manual adjustments • Anomaly detection must run in real time (1K to 10K samples per second)
Related Work • Automated waveform segmentation (Gecko, Stan Salvador) • Segment characteristics (level, slope, curvature) identify states • Rules are specified as allowed state transitions • Problem: segmentation is slow
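To make "segment characteristics identify states" and "rules as allowed state transitions" concrete, here is a minimal sketch. It assumes the segments have already been found (the slow step) and uses simple finite differences; Gecko's actual feature definitions are not given here, so the feature formulas, state labels, tolerance, and allowed-transition set are all hypothetical.

```python
def segment_features(seg):
    """(level, slope, curvature) of one segment, via finite differences.
    Assumed definitions: mean value, mean first difference, mean second difference."""
    if len(seg) < 3:
        raise ValueError("a segment needs at least 3 samples")
    d1 = [b - a for a, b in zip(seg, seg[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    return sum(seg) / len(seg), sum(d1) / len(d1), sum(d2) / len(d2)

def state_of(features, flat_tol=0.05):
    """Coarse state label from the slope (hypothetical labels and tolerance)."""
    _, slope, _ = features
    if abs(slope) <= flat_tol:
        return "flat"
    return "rising" if slope > 0 else "falling"

# Hypothetical rule set: the only state changes considered normal.
ALLOWED = {("flat", "rising"), ("rising", "flat"),
           ("flat", "falling"), ("falling", "flat")}

def rule_violations(states, allowed=ALLOWED):
    """Consecutive state changes that are not in the allowed set."""
    return [(a, b) for a, b in zip(states, states[1:])
            if a != b and (a, b) not in allowed]

print(segment_features([0, 1, 4, 9, 16]))              # (6.0, 4.0, 2.0)
print(rule_violations(["flat", "rising", "falling"]))  # [('rising', 'falling')]
```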
Proposal: Modeling using Data Compression • Train model on “normal” time series • Test by measuring goodness of fit to the trained model
Cross Entropy • Measures how well a model M fits a true (but unknown) probability distribution P • Minimized when M = P • Estimated by a data compressor that uses M
$$H_M(P) = \sum_{x \in X} -P(x) \log M(x)$$
• $H_M(P)$ = cross entropy (compressed data size) • $X$ = set of all possible inputs (waveforms) • $P(x)$ = true probability of $x$ • $M(x)$ = probability of $x$ estimated by model M
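A small numeric check of the cross-entropy formula, using toy distributions over three outcomes (not valve data). The logarithm is taken base 2 so the result is in bits per symbol, which is an assumption since the slide does not fix the base. The correct model (M = P) gives the minimum; any other model costs more bits.

```python
import math

def cross_entropy(p, m):
    """H_M(P) = sum over x of -P(x) * log2 M(x), in bits.
    p and m map each outcome x to its probability."""
    return sum(-px * math.log2(m[x]) for x, px in p.items() if px > 0)

P = {"a": 0.5, "b": 0.25, "c": 0.25}
M_good = dict(P)                           # model equals the true distribution
M_bad = {"a": 0.25, "b": 0.25, "c": 0.5}   # model misjudges a and c

print(cross_entropy(P, M_good))  # 1.5  (the entropy of P)
print(cross_entropy(P, M_bad))   # 1.75
```

A compressor driven by M_bad would need about 0.25 extra bits per symbol on data drawn from P, which is why compressed size works as a goodness-of-fit measure.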
Measuring Cross Entropy [Figure: normal and abnormal traces, uncompressed and compressed; panels labeled “Normal 1 or 2”, “Normal 2”, “Abnormal”]
Anomaly Score(y) = (C(xy) - C(x)) / C(y) • x = training (normal) waveform • y = test (possibly abnormal) waveform • xy = concatenation of x and y • C(·) = size after compression • A higher score (worse compression after training) indicates an anomaly
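A minimal sketch of the anomaly score, assuming Python's zlib as the compressor C. zlib implements DEFLATE (LZ77 plus Huffman coding, the same method gzip uses) but is a stand-in, not the GZIP program measured in the experiments. The traces here are synthetic placeholders, not TEK data.

```python
import random
import zlib

def csize(data: bytes) -> int:
    """C(.): size in bytes after compression (zlib DEFLATE as a stand-in)."""
    return len(zlib.compress(data, 9))

def anomaly_score(x: bytes, y: bytes) -> float:
    """(C(xy) - C(x)) / C(y): cost of y after training on x, relative to y alone."""
    return (csize(x + y) - csize(x)) / csize(y)

# Synthetic stand-ins for real traces.
pattern = bytes(range(0, 200, 2))   # one 100-byte "cycle"
train = pattern * 10                # 1000-byte normal trace
normal_test = pattern * 10          # same behavior as training
rng = random.Random(0)
abnormal_test = bytes(rng.randrange(256) for _ in range(1000))

print(anomaly_score(train, normal_test))    # small: y is predictable from x
print(anomaly_score(train, abnormal_test))  # near 1: x does not help compress y
```

Dividing by C(y) normalizes for how compressible y is on its own, so a trace is not flagged merely for being noisy or long.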
Data Compressors • GZIP (Gailly) – LZ77: duplicate strings are replaced by pointers to the previous occurrence • PAQ3 (Mahoney) – Weighted context mixing – Arithmetic coding of next-bit probability • RK 1.04 (Taylor) – PPMZ (models the longest matching context) – Delta coding option for analog data
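Delta coding, listed above as an RK option for analog data, replaces each sample with its difference from the previous one. The sketch below is a generic byte-wise version with wraparound modulo 256; how RK implements its option is not described here, so this is an assumption. A slowly changing analog signal becomes runs of small, repeated differences, which string-matching and context models exploit more easily than the raw levels.

```python
def delta_encode(samples: bytes) -> bytes:
    """Each byte becomes its difference from the previous byte, modulo 256."""
    out = bytearray(len(samples))
    prev = 0
    for i, s in enumerate(samples):
        out[i] = (s - prev) & 0xFF
        prev = s
    return bytes(out)

def delta_decode(deltas: bytes) -> bytes:
    """Inverse of delta_encode: running sum modulo 256."""
    out = bytearray(len(deltas))
    prev = 0
    for i, d in enumerate(deltas):
        prev = (prev + d) & 0xFF
        out[i] = prev
    return bytes(out)

ramp = bytes([10, 12, 14, 16, 15])
print(list(delta_encode(ramp)))  # [10, 2, 2, 2, 255]  (255 encodes -1)
```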
Data • TEK 0, TEK 1 = normal on/off cycle of Marotta valve S/N 37898 • TEK {2, 3, 5, 10, 11, 15, 16, 17} = various forced failures • 1000 solenoid current samples at 1 ms intervals • Range: -3.1 to 7.06 A at 0.04 A resolution • Converted to 1000 8-bit values (1000-byte files)
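The stated range and resolution explain why one byte per sample suffices: (7.06 - (-3.1)) / 0.04 = 10.16 / 0.04 = 254 steps, so 255 levels fit in the 0..255 range of a byte. The exact mapping used to build the files is not given; the sketch below assumes code 0 corresponds to -3.1 A with rounding to the nearest step, and clamps out-of-range readings.

```python
LO_AMPS = -3.1     # bottom of the recorded range (from the slide)
STEP_AMPS = 0.04   # recorder resolution (from the slide)

def to_byte(amps: float) -> int:
    """Map a current reading to an 8-bit code (offset and rounding are assumptions)."""
    code = round((amps - LO_AMPS) / STEP_AMPS)
    return max(0, min(255, code))

def trace_to_bytes(samples) -> bytes:
    """One byte per sample, so 1000 samples become a 1000-byte file."""
    return bytes(to_byte(a) for a in samples)

print(to_byte(-3.1), to_byte(7.06))  # 0 254
```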
Experimental Procedure • Nor 0: Train on TEK 0, test on TEK 1 (normal) • Nor 1: Train on TEK 1, test on TEK 0 (normal) • Ab 0: Train on TEK 0, average of tests on 8 abnormal traces • Ab 1: Train on TEK 1, average of tests on 8 abnormal traces
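The four experiments can be expressed as one function, sketched below. The dictionary layout and function name are hypothetical; `score(train, test)` is meant to be a compression-based score such as (C(xy) - C(x)) / C(y). Anomalies are separable under a given training trace when its Ab value exceeds its Nor value.

```python
from statistics import mean

# Abnormal trace names, as listed on the Data slide.
ABNORMAL = ["TEK 2", "TEK 3", "TEK 5", "TEK 10",
            "TEK 11", "TEK 15", "TEK 16", "TEK 17"]

def run_procedure(traces, score, abnormal=ABNORMAL):
    """traces: dict name -> bytes; score(train, test) -> float.
    Returns the four summary numbers Nor 0, Nor 1, Ab 0, Ab 1."""
    t0, t1 = traces["TEK 0"], traces["TEK 1"]
    return {
        "Nor 0": score(t0, t1),   # train TEK 0, test normal TEK 1
        "Nor 1": score(t1, t0),   # train TEK 1, test normal TEK 0
        "Ab 0": mean(score(t0, traces[n]) for n in abnormal),
        "Ab 1": mean(score(t1, traces[n]) for n in abnormal),
    }

# Toy call with one-byte "traces" and a trivial score, only to show the wiring.
toy = {"TEK 0": b"\x00", "TEK 1": b"\x01", "TEK 2": b"\x05", "TEK 3": b"\x07"}
print(run_procedure(toy, lambda x, y: abs(y[0] - x[0]), ["TEK 2", "TEK 3"]))
```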
Anomaly Scores
Anomaly Scores for TEK 0 (training trace; each column is a test trace)
Compressor | TEK 1 | TEK 2 | TEK 3 | TEK 5 | TEK 10 | TEK 11 | TEK 15 | TEK 16 | TEK 17
GZIP | .716 | .914 | .903 | .937 | .918 | .925 | .763 | .919 | .916
PAQ3 | .773 | 1.087 | 1.091 | 1.121 | 1.094 | 1.117 | .870 | 1.129 | 1.115
RK -mx 3 -fd 1 | .834 | 1.056 | 1.045 | 1.034 | 1.029 | 1.039 | .812 | 1.006 | 1.036
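The detection counts in the summary can be checked against these scores. The sketch assumes a trace counts as detected when its score exceeds that of the normal test trace TEK 1 under the same training trace (TEK 0). The values are copied from the table, taking the first score in each row as TEK 1 and the remaining eight as the abnormal traces in the order listed.

```python
TESTS = ["TEK 1", "TEK 2", "TEK 3", "TEK 5", "TEK 10",
         "TEK 11", "TEK 15", "TEK 16", "TEK 17"]
SCORES = {  # training trace: TEK 0
    "GZIP": [0.716, 0.914, 0.903, 0.937, 0.918, 0.925, 0.763, 0.919, 0.916],
    "PAQ3": [0.773, 1.087, 1.091, 1.121, 1.094, 1.117, 0.870, 1.129, 1.115],
    "RK":   [0.834, 1.056, 1.045, 1.034, 1.029, 1.039, 0.812, 1.006, 1.036],
}

def detections(row):
    """Return (number detected, names missed); row[0] is the normal trace TEK 1."""
    normal, abnormal = row[0], row[1:]
    missed = [TESTS[i + 1] for i, s in enumerate(abnormal) if s <= normal]
    return len(abnormal) - len(missed), missed

for name, row in SCORES.items():
    print(name, detections(row))
# GZIP (8, [])
# PAQ3 (8, [])
# RK (7, ['TEK 15'])
```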
Run Time Performance (750 MHz PC) • Real time = 1K samples/sec • GZIP: 3000K samples/sec • PAQ3: 40K samples/sec • RK -mx 3 -fd 1: 78K samples/sec
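Relative to the 1K samples/sec requirement, these rates give headroom of about 3000x (GZIP), 40x (PAQ3) and 78x (RK); at the 10K samples/sec upper end from the Goal slide, PAQ3 and RK keep only about 4x and 7.8x. The sketch below measures throughput the same way (one byte per sample, so bytes/sec equals samples/sec), using zlib as a stand-in for GZIP. Results depend on the machine and will not reproduce the 750 MHz figures.

```python
import time
import zlib

REAL_TIME_RATE = 1_000  # samples/sec, the real-time requirement from the slide

def samples_per_second(trace: bytes, repeats: int = 50) -> float:
    """Compression throughput for 8-bit samples (one byte per sample)."""
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.compress(trace, 9)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return repeats * len(trace) / elapsed

rate = samples_per_second(bytes(range(256)) * 4)
print(f"{rate:,.0f} samples/sec, {rate / REAL_TIME_RATE:,.0f}x real time")
```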
Summary • Data compression detects anomalies in the TEK valve data (2 normal, 8 abnormal traces) • GZIP and PAQ3 detect anomalies in 8 of 8 cases using either training set • RK detects 7 of 8 anomalies using either training set (TEK 15 appears more “normal” to all 3 compressors)
Future Work • Verify with more data sets (voltage, temperature, plunger blockage) • Identify anomalous points within the trace • Improve modeling of analog data • Translate models to SCL
Work is preliminary. Much needs to be done.
Thank You • For more information: http://cs.fit.edu/~mmahoney/nasa/