  • Number of slides: 8

Data Mining for Intrusion Detection: A Critical Review
Klaus Julisch
From: Applications of Data Mining in Computer Security (Eds. D. Barbará and S. Jajodia)

Knowledge Discovery from Databases (KDD)
• Five steps
  – (1) Understanding the application domain
  – (2) Data integration and selection
  – (3) Data mining
  – (4) Pattern evaluation
  – (5) Knowledge representation

Data Mining Meets Intrusion Detection
• IDS: misuse detection and anomaly detection
  – Misuse detection: requires a collection of known attacks
  – Anomaly detection: requires a user or system profile
• IDS: host-based and network-based
  – Host-based: analyzes host-bound audit sources such as audit trails, system logs, or application logs
  – Network-based: analyzes packets captured on a network
• MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection), developed at Columbia University, learns classifiers that distinguish between intrusions and normal activities
  – (i) Training connection records are partitioned into normal connection records and intrusion connection records
  – (ii) Frequent episode rules are mined separately for the two categories of training data; the patterns found only in the intrusion data form the intrusion-only patterns
  – (iii) Intrusion-only patterns are used to derive additional attributes that are indicative of intrusive behavior
  – (iv) The initial training records are augmented with the new attributes
  – (v) A classifier is learned that distinguishes normal records from intrusion records; this classifier, a misuse IDS, is the end product of MADAM ID
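
The five MADAM ID steps above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the Columbia implementation: frequent attribute-value patterns stand in for frequent episode rules, a one-line threshold rule stands in for the learned classifier, and the record fields (`service`, `flag`), the sample data, and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(records, min_support):
    """Return patterns of one or two (attribute, value) pairs whose relative
    support is at least min_support. Simplified stand-in for episode mining."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for size in (1, 2):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {p for p, c in counts.items() if c / len(records) >= min_support}


def matches(record, pattern):
    """True if the record satisfies every (attribute, value) pair in the pattern."""
    return all(record.get(attr) == value for attr, value in pattern)


# Hypothetical labelled training data.
training = [
    ({"service": "http", "flag": "SF"}, "normal"),
    ({"service": "http", "flag": "SF"}, "normal"),
    ({"service": "smtp", "flag": "SF"}, "normal"),
    ({"service": "http", "flag": "S0"}, "intrusion"),
    ({"service": "http", "flag": "S0"}, "intrusion"),
]

# (i) Partition the training records by label.
normal = [r for r, label in training if label == "normal"]
intrusion = [r for r, label in training if label == "intrusion"]

# (ii) Mine each partition separately and keep the intrusion-only patterns.
intrusion_only = frequent_patterns(intrusion, 0.5) - frequent_patterns(normal, 0.5)


# (iii) + (iv) Derive a new attribute and augment a record with it.
def augment(record):
    out = dict(record)
    out["n_intrusion_patterns"] = sum(matches(record, p) for p in intrusion_only)
    return out


augmented_training = [(augment(r), label) for r, label in training]


# (v) Stand-in for the learned classifier: any matched intrusion-only pattern
# marks the record as an intrusion.
def classify(record):
    return "intrusion" if augment(record)["n_intrusion_patterns"] > 0 else "normal"
```

In this toy data the pattern `service = http` is frequent in both partitions, so it is discarded in step (ii); only patterns involving `flag = S0` survive as intrusion-only.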

ADAM (Audit Data Analysis and Mining)
• Network-based anomaly detection system
• Learns normal network behavior from attack-free training data and represents it as a set of association rules (the profile)
• At run time, the records of the past δ seconds are continuously mined for new association rules that are not contained in the profile; these rules are sent to a classifier that separates false positives from true positives
• Its association rules are of the form ∏ Ai = vi (a conjunction of attribute = value conditions)
  – Each association rule must have the source host and destination port among its attributes
  – Multi-level association rules have been introduced to capture coordinated and distributed attacks
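
The profile-and-window idea behind ADAM can be sketched as follows. This is a minimal illustration, not ADAM's actual algorithm: frequent itemsets stand in for association rules, the downstream classifier is omitted, and the field names (`time`, `src_host`, `dst_port`, `flag`), the sample records, and the support threshold are assumptions made for the example.

```python
from collections import Counter


def mine_itemsets(records, min_support):
    """Return frequent itemsets (frozensets of (attribute, value) pairs).
    Every itemset contains the source host and the destination port and is
    optionally extended by one further attribute."""
    counts = Counter()
    for rec in records:
        base = frozenset({("src_host", rec["src_host"]),
                          ("dst_port", rec["dst_port"])})
        counts[base] += 1
        for attr, value in rec.items():
            if attr not in ("src_host", "dst_port", "time"):
                counts[base | {(attr, value)}] += 1
    return {s for s, c in counts.items() if c / len(records) >= min_support}


def suspicious_itemsets(stream, profile, now, delta, min_support):
    """Mine the records of the past `delta` seconds and return the frequent
    itemsets that are not part of the profile."""
    window = [r for r in stream if now - delta <= r["time"] <= now]
    if not window:
        return set()
    return mine_itemsets(window, min_support) - profile


# Hypothetical attack-free training data: one host talking to port 80.
training = [
    {"time": t, "src_host": "10.0.0.1", "dst_port": 80, "flag": "SF"}
    for t in range(10)
]
profile = mine_itemsets(training, min_support=0.5)

# Hypothetical run-time traffic: a new host repeatedly hitting port 23.
stream = [
    {"time": 100 + t, "src_host": "10.0.0.9", "dst_port": 23, "flag": "S0"}
    for t in range(5)
]
stream.append({"time": 103, "src_host": "10.0.0.1", "dst_port": 80, "flag": "SF"})

suspicious = suspicious_itemsets(stream, profile, now=105, delta=10, min_support=0.5)
```

In ADAM the itemsets returned here would then be passed to the classifier that separates false positives from true positives.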

Clustering of Unlabeled ID Data
• Main focus: training anomaly detection systems over noisy data
  – The number of normal elements in the training data is assumed to be significantly larger than the number of anomalous elements
  – Anomalous elements are assumed to be qualitatively different from normal ones
  – Thus, anomalies appear as outliers that stand out from the normal data, so explicitly modeling outliers yields anomaly detection
• Use of clustering: normal data are expected to cluster into groups of similar records and intrusive data into other groups; the intrusive clusters will be small because intrusions are rare
• Real-time data is compared with the clusters to determine a classification
• A network-based anomaly detection system has been built
• In addition to the intrinsic attributes (e.g., source host, destination host, start time), connection records also include derived attributes such as the number of failed login attempts, the number of file-creation operations, and various counts and averages over temporally adjacent connection records
• Euclidean distance is used to determine the similarity between connection records
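
A rough sketch of the clustering idea follows. The slide does not say which clustering algorithm was used, so a simple single-pass fixed-width scheme is assumed here for brevity; the two-dimensional feature vectors (say, failed logins and file creations), the cluster width, and the `normal_fraction` parameter are likewise hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clusters(points, width):
    """Single pass: add each point to the nearest cluster if its centre is
    within `width`, otherwise start a new cluster. The centre stays fixed at
    the first point of the cluster."""
    clusters = []
    for p in points:
        best = min(clusters, key=lambda c: euclidean(p, c["centre"]), default=None)
        if best is not None and euclidean(p, best["centre"]) <= width:
            best["members"].append(p)
        else:
            clusters.append({"centre": p, "members": [p]})
    return clusters


def label_clusters(clusters, normal_fraction):
    """Label the largest clusters 'normal' until they cover `normal_fraction`
    of all points; the remaining small clusters are labelled 'anomalous'."""
    total = sum(len(c["members"]) for c in clusters)
    covered = 0
    for c in sorted(clusters, key=lambda c: len(c["members"]), reverse=True):
        c["label"] = "normal" if covered < normal_fraction * total else "anomalous"
        covered += len(c["members"])
    return clusters


def classify(point, clusters):
    """Assign a new record the label of its nearest cluster."""
    return min(clusters, key=lambda c: euclidean(point, c["centre"]))["label"]


# Hypothetical unlabeled training data: mostly normal, one rare outlier.
training = [(0, 1), (1, 1), (0, 2), (1, 2), (0, 1), (1, 0), (0, 0), (1, 1), (9, 7)]
clusters = label_clusters(fixed_width_clusters(training, width=2.0),
                          normal_fraction=0.8)
```

Because the attributes of real connection records have very different scales, a practical system would normalize them before computing Euclidean distances; that step is left out of the sketch.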

Mining the Alarm Stream
• Applying data mining to alarms triggered by IDS
  – (i) Model the normal alarm stream so as to henceforth raise the severity of “abnormal alarms”
  – (ii) Extract predominant alarm patterns that a human expert can understand and act upon, e.g., by writing filters or patching a weak IDS signature
• Manganaris et al.:
  – Models alarms as tuples (t, A), where t is a timestamp and A is an alarm type
  – All other attributes of an alarm are ignored
  – The profile of normal alarm behavior is learned as follows:
    • The time-ordered alarm stream is partitioned into bursts
    • Association rules are mined from the bursts
    • This results in a profile of normal alarms
  – At run time, various tests are carried out to check whether an alarm burst is normal
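
The burst-based profile of Manganaris et al. might be sketched like this. The slide does not say how bursts are delimited or which run-time tests are applied, so the sketch assumes a maximum time gap between consecutive alarms, uses frequent sets of alarm types in place of association rules, and applies a single membership test; the alarm names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_into_bursts(alarms, max_gap):
    """Partition a time-ordered list of (t, A) tuples into bursts. A new burst
    starts whenever the gap to the previous alarm exceeds max_gap."""
    bursts = []
    for t, alarm_type in alarms:
        if bursts and t - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append((t, alarm_type))
        else:
            bursts.append([(t, alarm_type)])
    return bursts


def type_sets(burst):
    """All sets of one or two distinct alarm types that occur in a burst."""
    types = sorted({alarm_type for _, alarm_type in burst})
    return [frozenset(c) for size in (1, 2) for c in combinations(types, size)]


def learn_profile(bursts, min_support):
    """Profile = alarm-type sets that occur in at least min_support of the bursts."""
    counts = Counter()
    for burst in bursts:
        counts.update(type_sets(burst))
    return {s for s, c in counts.items() if c / len(bursts) >= min_support}


def is_normal(burst, profile):
    """A burst is considered normal if every alarm-type set in it is in the profile."""
    return all(s in profile for s in type_sets(burst))


# Hypothetical historical alarm stream of (timestamp, alarm type) tuples.
history = [
    (0, "portscan"), (1, "ping"), (2, "portscan"),
    (60, "portscan"), (61, "ping"),
    (120, "ping"), (121, "portscan"),
]
bursts = split_into_bursts(history, max_gap=5)
profile = learn_profile(bursts, min_support=0.6)
```

With this profile, a new burst of `ping` and `portscan` alarms passes the test, while a burst containing a previously unseen alarm type does not.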

• Clifton and Gengo; Julisch:
  – Mine historical alarm logs to find new knowledge that reduces the future alarm load, e.g., by writing filtering rules that discard false positives
• Tools: frequent episode rules
• Attribute-oriented induction
  – Repeatedly replaces attribute values with more abstract values
    » E.g., IP addresses by networks, timestamps by weekdays, and ports by port ranges; the generalization hierarchies are provided by the user
  – Generalization merges previously distinct alarms into a few classes, so huge alarm logs are condensed into short, comprehensible summaries; this reduces the alarm load by 80%
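
A single generalization step of attribute-oriented induction can be illustrated as below. This is a sketch only: the full technique generalizes repeatedly along user-supplied hierarchies, whereas the code applies one fixed level per attribute. The hierarchies chosen here (a /24 network, a privileged versus non-privileged port range, the weekday), the alarm fields, and the sample alarms are all hypothetical.

```python
import ipaddress
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def generalize_ip(ip, prefix_len=24):
    """Replace an IP address by the network that contains it."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def generalize_port(port):
    """Replace a port number by a coarse port range."""
    return "privileged" if port < 1024 else "non-privileged"


def generalize_time(timestamp):
    """Replace an ISO 8601 timestamp by its weekday."""
    return WEEKDAYS[datetime.fromisoformat(timestamp).weekday()]


def summarize(alarms):
    """Generalize every alarm and count how many alarms fall into each class."""
    counts = Counter()
    for a in alarms:
        key = (a["type"], generalize_ip(a["src_ip"]),
               generalize_port(a["dst_port"]), generalize_time(a["time"]))
        counts[key] += 1
    return counts


# Hypothetical alarm log.
alarms = [
    {"type": "portscan", "src_ip": "192.168.1.5", "dst_port": 22,
     "time": "2024-01-01T10:00:00"},
    {"type": "portscan", "src_ip": "192.168.1.77", "dst_port": 80,
     "time": "2024-01-01T15:30:00"},
    {"type": "portscan", "src_ip": "192.168.1.9", "dst_port": 443,
     "time": "2024-01-01T23:59:00"},
    {"type": "overflow", "src_ip": "10.1.2.3", "dst_port": 8080,
     "time": "2024-01-02T01:00:00"},
]
summary = summarize(alarms)
```

The three port-scan alarms, which differ in source address, port, and time of day, collapse into one generalized class, which is the kind of condensed summary a human analyst could turn into a filtering rule.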

• Isolated application of data mining techniques can be a dangerous activity, leading to the discovery of meaningless or misleading patterns
• Data mining without a proper understanding of the application domain should be avoided
• The validation step is extremely important