

  • Slide count: 29

A Framework for Constructing Features and Models for Intrusion Detection Systems
Authors: Wenke Lee & Salvatore J. Stolfo
Published in ACM Transactions on Information and System Security, Volume 3, Number 4, 2000.
Presented by Suchandra Goswami

Contributions of this work
• Development of automated Intrusion Detection Systems (IDSs) rather than pure knowledge-encoding and engineering approaches
• Provides a novel framework called MADAM ID, for Mining Audit Data for Automated Models for Intrusion Detection
• First work applying data mining and machine learning algorithms to IDSs

Introduction
• Intrusion Detection (ID) is the art of detecting inappropriate, incorrect, or anomalous activity.
• ID systems that operate on a host to detect malicious activity on that host are called host-based ID systems.
• ID systems that operate on network data flows are called network-based ID systems.
• Statistical anomaly detection and pattern matching are the techniques most commonly used for ID.

 • Two main ID techniques: misuse detection and anomaly detection
 • Misuse detection – uses patterns of well-known attacks or weak spots of the system to match and identify known intrusions
   – E.g., a signature rule for the “guessing password” attack could be “there are more than 4 failed login attempts within 2 minutes”
 • Anomaly detection – flags observed activities that deviate significantly from the established normal usage profiles
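A misuse-detection signature like the one above can be sketched as a sliding-window check over failed-login timestamps. This is a minimal illustration, not the paper's implementation; the function name, threshold handling, and record format are assumptions.

```python
from datetime import datetime, timedelta

def is_guess_attack(failed_login_times, threshold=4, window=timedelta(minutes=2)):
    """Return True if some window of `window` length contains more than
    `threshold` failed login attempts (the slide's signature rule)."""
    times = sorted(failed_login_times)
    for i in range(len(times)):
        # count failures falling inside the window that starts at times[i]
        in_window = [t for t in times if times[i] <= t <= times[i] + window]
        if len(in_window) > threshold:
            return True
    return False

base = datetime(2000, 1, 1, 12, 0, 0)
attempts = [base + timedelta(seconds=s) for s in (0, 10, 25, 40, 55)]  # 5 failures in 55 s
print(is_guess_attack(attempts))  # True
```

The same check with five failures spread three minutes apart returns False, since no 2-minute window ever accumulates more than four attempts.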

 • This research takes a data-centric point of view: ID is considered to be a data analysis process
 • Data mining programs are used to compute models that accurately capture actual behavior (patterns)
 • Eliminates the need to manually analyze and encode intrusion patterns
 • Validated with large amounts of audit data

Some Data Mining techniques and their application to IDSs
• Classification – maps data items into one of several pre-defined categories. Algorithms generally output classifiers such as decision trees or rules.
  – E.g., gather sufficient “normal” and “abnormal” audit data, then apply a classification algorithm to learn a classifier that can label unseen audit data as normal or abnormal
• Link analysis – determines relations/correlations between fields in the database records. E.g., “emacs” may be highly associated with “C” files

 • Sequence analysis – models sequential patterns. Algorithms discover which time-based sequences of audit events frequently occur together.
 • Frequent event patterns provide guidelines for incorporating temporal statistical measures into intrusion detection models
 • E.g., patterns from audit data containing network-based DoS attacks suggest that several per-host and per-service measures should be included

Data mining process of building ID models (diagram): raw audit data → packets/events (ASCII) → connection/session records → patterns → features → models, with evaluation feedback flowing back into the process.

Data Mining techniques used in MADAM ID
• Audit data consist of pre-processed, timestamped audit records with a number of features/fields
• ID is considered to be a classification problem
• Given a set of records, where one of the features is a class label, classification algorithms compute a model that uses the most discriminating feature values to describe a concept

Telnet Records

label     service  flag  hot  failed_logins  compromised  root_shell  su  duration  …
normal    telnet   SF    0    0              0            0           0   10.2      …
normal    telnet   SF    0    0              0            3           1   2.1       …
guess     telnet   SF    0    6              0            0           0   26.2      …
normal    telnet   SF    0    0              0            0           0   126.2     …
overflow  telnet   SF    3    0              2            1           0   92.5      …
normal    telnet   SF    0    0              0            0           0   2.1       …
guess     telnet   SF    0    5              0            0           0   13.9      …
overflow  telnet   SF    3    0              2            1           0   92.5      …
normal    telnet   SF    0    0              0            0           0   1248      …

Rule Learning
• RIPPER is a classification rule learning program that generates rules
• Accuracy of the classification model depends on the set of features provided in the training set
• The classification algorithm looks for features with large information gain
• Adding per-host and per-service temporal features resulted in significant improvement in accuracy

RIPPER rule: guess :- failed_logins ≥ 4
Meaning: if the number of failed logins is at least 4, then this telnet connection is “guess”, a guessing-password attack

RIPPER rule: overflow :- hot ≥ 3, compromised ≥ 2, root_shell = 1
Meaning: if the number of hot indicators is at least 3, the number of compromised conditions is at least 2, and a root shell is obtained, then this telnet connection is a buffer overflow attack

RIPPER rule: normal :- true
Meaning: if none of the above, then this connection is “normal”
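Because RIPPER rules are an ordered list where the first matching rule fires, the rule set above can be encoded as a simple if/elif cascade. A minimal sketch, assuming records are dicts keyed by the feature names from the telnet table:

```python
def classify_telnet(record):
    """Apply the slide's RIPPER rules in order; the first match wins.
    `record` is a dict of connection features (assumed representation)."""
    if record["failed_logins"] >= 4:
        return "guess"       # guessing-password attack
    if record["hot"] >= 3 and record["compromised"] >= 2 and record["root_shell"] == 1:
        return "overflow"    # buffer-overflow attack
    return "normal"          # default rule: normal :- true

print(classify_telnet({"failed_logins": 6, "hot": 0, "compromised": 0, "root_shell": 0}))  # guess
```

In the real system these rules are learned from labeled training records rather than hand-coded; the cascade only shows how a learned rule set is applied.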

Meta-classification
• Meta-learning is a mechanism for inductively learning the correlation of predictions made by a number of base classifiers
• Each record in the training data has the true class label and the predictions made by the base classifiers
• The meta-classifier combines the base models to make a final prediction
• An IDS should consist of multiple cooperative lightweight subsystems that monitor separate parts of the network environment
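One simple way to realize a meta-classifier is a decision table learned over tuples of base-classifier predictions. This is a toy stacking sketch with hypothetical data; the paper itself learns the meta-level model with RIPPER, not a lookup table.

```python
from collections import Counter

def train_meta(meta_records):
    """Learn a decision table mapping each tuple of base-classifier
    predictions to the most common true label seen with that tuple."""
    votes = {}
    for base_preds, true_label in meta_records:
        votes.setdefault(tuple(base_preds), Counter())[true_label] += 1
    return {preds: c.most_common(1)[0][0] for preds, c in votes.items()}

def meta_predict(table, base_preds, default="normal"):
    """Combine base predictions into a final label via the learned table."""
    return table.get(tuple(base_preds), default)

# hypothetical meta-level training data: (predictions of 3 base models, true label)
train = [
    (("normal", "normal", "guess"), "normal"),
    (("guess", "guess", "normal"), "guess"),
    (("guess", "guess", "normal"), "guess"),
    (("normal", "normal", "guess"), "normal"),
]
table = train_meta(train)
print(meta_predict(table, ("guess", "guess", "normal")))  # guess
```

The key point is that the meta-level training set treats base predictions as features, so the combiner is learned rather than hand-weighted.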

Association Rules
• Program executions and user activities exhibit frequent correlations among system features
• Goal of mining association rules: derive multi-feature correlations from a database table
• support(X) – the percentage of records that contain itemset X, where each record is a set of items
• Association rule – an expression of the form X → Y, [c, s], where X and Y are itemsets, X ∩ Y = ∅, s = support(X ∪ Y) is the support, and c = support(X ∪ Y) / support(X) is the confidence

time  hostname  command  arg1      arg2
am    pascal    mkdir    1         …
am    pascal    cd       dir1      …
am    pascal    vi       text      …
am    pascal    tex      vi        …
am    pascal    subject  progress  …
am    pascal    vi       text      …
…

vi → time = am, hostname = pascal, arg1 = text, [1.0, 44.4%]
support(vi) = 44.4%
Meaning: when using vi to edit a file, the user is always (i.e., 100% of the time) editing a text file, in the morning, and at host pascal; 44.4% of the command data matches this pattern.
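Support and confidence for a rule like the one above follow directly from the definitions. A sketch over illustrative command data (chosen so that support comes out to 4/9 ≈ 44.4% and confidence to 1.0, matching the slide's numbers; these are not the paper's exact records):

```python
def support(records, itemset):
    """support(X): fraction of records containing every item of X."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(records, lhs, rhs):
    """confidence of lhs -> rhs = support(lhs | rhs) / support(lhs)."""
    return support(records, lhs | rhs) / support(records, lhs)

# illustrative command data: 9 records, 4 of them vi sessions (assumed)
records = [
    {"time=am", "host=pascal", "cmd=vi", "arg1=text"},
    {"time=am", "host=pascal", "cmd=vi", "arg1=text"},
    {"time=am", "host=pascal", "cmd=vi", "arg1=text"},
    {"time=am", "host=pascal", "cmd=vi", "arg1=text"},
    {"time=am", "host=pascal", "cmd=mkdir", "arg1=dir1"},
    {"time=am", "host=pascal", "cmd=cd", "arg1=dir1"},
    {"time=am", "host=pascal", "cmd=tex", "arg1=vi"},
    {"time=pm", "host=pascal", "cmd=mail", "arg1=progress"},
    {"time=pm", "host=pascal", "cmd=latex", "arg1=paper"},
]
lhs = {"cmd=vi"}
rhs = {"time=am", "host=pascal", "arg1=text"}
print(round(support(records, lhs | rhs) * 100, 1))  # 44.4
print(confidence(records, lhs, rhs))                # 1.0
```

Confidence is 1.0 because every record containing vi also contains all items on the right-hand side.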

Frequent Episodes
• A frequent episode rule is an expression X, Y → Z, [c, s, w], where w is the width of the time interval [t1, t2] during which the episode occurs
• Frequent episodes mined from audit data contain associations among features; these are used to construct temporal statistical features for building classifiers

Network Connection Records

timestamp  duration  service  src_host   dst_host  src_bytes  dst_bytes  flag
1.1        0         http     spoofed_1  victim    0          0          S0
1.1        0         http     spoofed_2  victim    0          0          S0
1.1        0         http     spoofed_3  victim    0          0          S0
1.1        0         http     spoofed_4  victim    0          0          S0
1.1        0         http     spoofed_5  victim    0          0          S0
1.1        0         http     spoofed_6  victim    0          0          S0
1.1        0         http     spoofed_7  victim    0          0          S0
…          …         …        …          …         …          …          …
10.1       2         ftp      A          B          200        300       SF

flag = S0, service = http, dst_host = ‘victim’ together describe the SYN flood attack

Frequent episode: dst_host = ‘victim’ → service = ‘http’, src_bytes = 0, dst_bytes = 0, flag = ‘S0’, [1.0, 0.7, 0]

Constructing features from intrusion patterns
• Parse frequent episodes and use three operators – count, percent, and average – to construct statistical features

Procedure:
– E.g., assume F0 (say dst_host) is a reference feature and the width of the episode is w seconds
– Add the following features, which examine only the connections in the past w seconds that share the same value in F0 as the current connection:
– Add a feature that computes “the count of these connections”
– Let F1 be service, src_host, or dst_host other than F0. If the same value of F1 is in all itemsets of the episode, add a feature that computes “% of connections having the same F1 value as the current connection”
– Let V2 be a value (e.g., S0) of a feature F2 (say flag). If V2 is in all the itemsets of the episode, add a feature that computes “% of connections having the same V2”; otherwise, if F2 is a numerical feature, add a feature that computes “the average of the F2 values”

Example to illustrate feature construction
• Suppose record 7 in the network connection records table is our current connection (F0 value = ‘victim’, F1 value = ‘http’); assume w = 0 and F0 is the feature ‘dst_host’
• Count the number of connections in the past w = 0 time units having the same value for feature F0, i.e., dst_host = ‘victim’ (= 7 in this example)
• Create a new feature count_F0 (containing the value 7 in this example)
• Assume F1 is the feature ‘service’
• Compute the % of connections having service = ‘http’ in the past w = 0 time units for the given F0, i.e., dst_host = ‘victim’ (= 100% in this example)
• Create a new feature pcnt_F1_F0 (containing the value 100 in this example)
• Assume V2 = S0
• Compute the % of connections having V2 = ‘S0’ in the past w = 0 time units for the given F0, i.e., dst_host = ‘victim’
• Create a new feature pcnt_V2_F0 (containing the value 100 in this example)
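The worked example above can be sketched end to end. This is a minimal implementation of the count and percent operators over the SYN-flood records; the field names follow the connection-records table, while the function name and dict representation are assumptions.

```python
def add_temporal_features(connections, w=0.0):
    """For each connection, examine earlier connections within the past
    `w` seconds sharing the same dst_host (reference feature F0) and add:
    count_F0, pcnt_F1_F0 (% same service), pcnt_V2_F0 (% with flag S0)."""
    out = []
    for i, c in enumerate(connections):
        window = [p for p in connections[: i + 1]
                  if p["dst_host"] == c["dst_host"]
                  and c["timestamp"] - w <= p["timestamp"] <= c["timestamp"]]
        count = len(window)
        feats = dict(c)
        feats["count_F0"] = count
        feats["pcnt_F1_F0"] = 100.0 * sum(p["service"] == c["service"] for p in window) / count
        feats["pcnt_V2_F0"] = 100.0 * sum(p["flag"] == "S0" for p in window) / count
        out.append(feats)
    return out

# the SYN-flood records from the table (byte counts omitted for brevity)
conns = [{"timestamp": 1.1, "service": "http", "dst_host": "victim", "flag": "S0"}
         for _ in range(7)]
conns.append({"timestamp": 10.1, "service": "ftp", "dst_host": "B", "flag": "SF"})
feats = add_temporal_features(conns, w=0.0)
print(feats[6]["count_F0"], feats[6]["pcnt_F1_F0"], feats[6]["pcnt_V2_F0"])  # 7 100.0 100.0
```

For record 7 (the current connection) this reproduces the slide's values: count_F0 = 7, pcnt_F1_F0 = 100, pcnt_V2_F0 = 100.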

Experimentation
• Experiments were conducted on the 1998 DARPA Intrusion Detection Evaluation Program data and DARPA BSM data
• Algorithms and tools of MADAM ID were used to process audit data, mine patterns, construct features, and build RIPPER classifiers
• DARPA data – 4 gigabytes of compressed tcpdump data from 7 weeks of network traffic
• Data was processed into 5 million connection records of about 100 bytes each
• DoS, R2L, U2R, and PROBING attacks in the training data

Model Complexities

Model         # of features in records  # of rules  # of features used in rules
content       22                        55          11
traffic       20                        26          4+9
host traffic  14                        8           1+5

User Anomaly Detection
• Goal is to determine whether the behavior of a user is normal or not
• It is difficult to classify a single event by a user as normal or abnormal
• A user’s actions during a login session need to be studied as a whole to determine whether he/she is behaving normally
• Approach:
  – Mine the frequent patterns from command data
  – Form the normal usage profile of the user
  – Analyze a login session by comparing its similarity to the profile
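The profile-and-compare approach can be sketched with a toy pattern miner over command sequences. Frequent command bigrams stand in for the paper's frequent episodes, the similarity measure is a simple overlap fraction rather than the paper's scoring, and the session data is hypothetical.

```python
from collections import Counter

def mine_patterns(sessions, min_support=0.5):
    """Toy miner: command bigrams occurring in at least `min_support`
    of the training sessions form the user's normal profile."""
    counts = Counter()
    for s in sessions:
        counts.update({tuple(s[i:i + 2]) for i in range(len(s) - 1)})
    return {b for b, c in counts.items() if c / len(sessions) >= min_support}

def similarity(profile, session):
    """Fraction of profile patterns that also occur in the new session."""
    observed = {tuple(session[i:i + 2]) for i in range(len(session) - 1)}
    return len(profile & observed) / len(profile) if profile else 0.0

# hypothetical login sessions (command sequences) for one user
history = [["cd", "vi", "tex", "mail"],
           ["cd", "vi", "tex", "lpr"],
           ["cd", "vi", "tex", "mail"]]
profile = mine_patterns(history)
print(similarity(profile, ["cd", "vi", "tex", "mail"]))  # 1.0 -> normal
print(similarity(profile, ["ftp", "get", "rm", "rm"]))   # 0.0 -> anomalous
```

A session whose similarity to the profile falls below some threshold would be flagged as anomalous.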

User Descriptions / User Anomaly Descriptions (slide tables not recoverable from the extraction)

Strengths
• Paper is very well written
• Exhaustive experimentation with real-world data
• Developed a simple, intuitive, yet powerful method for feature construction
• First attempt to incorporate data mining algorithms in IDSs
• Experimented with both misuse detection and user anomaly detection
• Models performed better than systems built with knowledge engineering approaches
• Critiqued their own work

Weaknesses
• Results show that the tools are not effective for attacks having large variance in behavior (like DoS and R2L)
• Results depend on the quality and quantity of training data – may lead to overtraining
• Network anomaly detection was not implemented to detect new attacks
• Computationally expensive

Future Improvements
• Develop algorithms to learn network anomaly detection models
• ID models should be sensitive to cost factors such as development cost, operational cost (i.e., needed resources), cost of damage from an intrusion, and cost of detecting and responding to a potential intrusion
• Algorithms should incorporate user-defined factors and policies to compute cost-sensitive ID models