
43198afc7a7f835a2114f5397e7887db.ppt
- Number of slides: 51
Network Security Monitoring and Analysis Based on Big Data Technologies
Bingdong Li
August 26, 2013
Outline
§ Motivation
§ Objectives
§ System Design
§ Monitoring and Visualization
§ Network Measurement
§ Classification and Identification of Network Objects
§ Conclusion
§ Future Work
Motivation
§ Traditional security systems assume a static system
§ Network attacks: sophisticated, organized, targeted, persistent, dynamic, external, internal
Motivation
§ Problem: network security is becoming more challenging
§ Resource: a large amount of security data
– Network flow
– Firewall log
– Application log
– Server log
– SNMP
§ Opportunity: Big Data technologies, machine learning
Objectives
§ A network security monitoring and analysis system based on Big Data technologies that provides
– Network measurement
– Real-time continuous monitoring and interactive visualization
– Intelligent classification and identification of network objects based on role behavior as context
Objectives: Network Security, Big Data, Machine Learning
System Design
§ Data Collection
§ Online Real-Time Processing
§ NoSQL Storage
§ User Interfaces
System Design
§ The design supports the following features:
– Real-time continuous monitoring and interactive visualization
– Network measurement
– Classification and identification of network objects
Monitoring and Visualization
§ Real time: responds within a time constraint
§ Interactive: involves user interaction
§ Continuous: “continue to be effective over time in light of the inevitable changes that occur” (NIST)
Monitoring and Visualization
§ Retrieve Data
§ Web User Interfaces
§ Video Demo
Monitoring and Visualization
§ Data retrieval: data are stored with the IP address as the primary key and the time slice as the secondary key in a column
– Accessing these data takes Θ(1) time
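The two-level keying described on this slide can be illustrated with a small in-memory sketch. This is not the system's actual NoSQL schema: the class name `FlowStore`, the 60-second slice length, and the record layout are all assumptions made for illustration. A Python dict stands in for the column store, so each lookup is two hash probes and takes constant time on average.

```python
from collections import defaultdict


class FlowStore:
    """Hypothetical in-memory stand-in for a store keyed by IP, then time slice."""

    def __init__(self, slice_seconds=60):
        # Length of one time slice in seconds (assumed value, not from the slides).
        self.slice_seconds = slice_seconds
        # Primary key: IP address. Secondary key: time-slice index.
        self._rows = defaultdict(dict)

    def _slice(self, timestamp):
        # Map a timestamp (in seconds) to the index of its time slice.
        return int(timestamp) // self.slice_seconds

    def put(self, ip, timestamp, record):
        # Append the record to the (ip, time slice) cell, creating it if needed.
        self._rows[ip].setdefault(self._slice(timestamp), []).append(record)

    def get(self, ip, timestamp):
        # Two dict lookups; returns an empty list when nothing is stored.
        return self._rows.get(ip, {}).get(self._slice(timestamp), [])


store = FlowStore(slice_seconds=60)
store.put("10.0.0.1", 125, {"bytes": 10})
print(store.get("10.0.0.1", 130))  # same 60-second slice as timestamp 125
print(store.get("10.0.0.1", 200))  # a later slice with no records
```

The lookup cost does not depend on how many IPs or slices are stored, which is the property the Θ(1) claim refers to.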
Real Time Querying
Host Network Connection
Network Status
Top N
Demo of Interactivity and Continuity (video demo)
Network Measurement
§ A case study: anonymity technology usage on a campus network, using sFlow
– Geo-location
– Usage of anonymity systems
Geo-location of Anonymity Usage on Campus
§ One instance: Bahamas, Belarus, Belgium, Bulgaria, Cambodia, Chile, Colombia, Estonia, Ghana, Greece, Hungary, Ireland, Israel, Jamaica, Jordan, Korea, Mongolia, Namibia, Nigeria, Pakistan, Panama, Philippines, Slovakia, Turkey, Ukraine, Vietnam, Zimbabwe
§ Two instances: Chad, Czech Republic, Denmark, Hong Kong, Iran, Japan, Kazakhstan, Poland, Romania, Spain, Switzerland
§ Three instances: Austria, France, Singapore
§ Four instances: Australia, Indonesia, Taiwan, Thailand
Usage of Anonymity Systems
System       Packets (%)      Traffic, MB (%)   Observed IPs (%)
Proxies      5,580 (62.65)    8.13 (43.53)      234 (3.23)
Tor          3,129 (35.13)    9.04 (48.37)      152 (0.25)
I2P          190 (2.13)       1.50 (8.02)       23 (1.01)
Commercial   7 (0.08)         0.016 (0.08)      2 (N/A)
Total        8,906 (100)      16.69 (100)       411 (N/A)
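As a quick check on how the percentage column relates to the raw counts, the packet shares can be recomputed from the packet counts in the table. The variable names are illustrative only; the counts are the ones reported on the slide.

```python
# Packet counts per anonymity system, as reported in the table.
packets = {"Proxies": 5580, "Tor": 3129, "I2P": 190, "Commercial": 7}

# Total packet count across all systems.
total = sum(packets.values())

# Share of each system as a percentage of the total, rounded to two decimals.
shares = {name: round(100 * count / total, 2) for name, count in packets.items()}

print(total)
print(shares)
```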
Classification of Host Roles
Data: three months of sFlow data from a large campus
Role               Count
Client             5494
Server             1920
Public Place       784
Personal Office    416
College 1          163
College 2          253
Web Server         56
Web Email Server   25
Classification of Host Roles
§ Algorithms
– Decision Tree
– On-line SVM
Classification of Host Roles
§ Features
– Ad hoc, based on domain knowledge
– Aggregated features for on-line classification
– 24 features, each normalized to the range [0, 1]
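The slides do not say how the 24 features are scaled to [0, 1]. One common choice is min-max scaling, sketched below as an assumption rather than as the method actually used; the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] with min-max scaling (assumed method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Example: three observed values of one numeric feature.
print(min_max_normalize([0, 5, 10]))
```

The smallest observed value maps to 0 and the largest to 1, so the result is inclusive at both ends, matching the range stated on the slide.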
Classification of Host Roles
§ Features: 24 features derived from
– Source/destination IP address
– Source/destination port number
– TTL
– Packet size
– Transport protocol
Classification of Host Roles
§ Ground Truth
– Host information in Active Directory
– Crawler to validate its status
Classification of Host Roles
§ Classifying client vs. server
§ Classifying web server vs. web email server
§ Classifying hosts at personal office vs. public place
§ Classifying hosts at two different colleges
§ Feature contributions
Classifying Client vs. Server
Classifying Web Server vs. Web Email Server
Classifying Hosts at Personal Office vs. Public Place
Classifying Hosts at Two Different Colleges
Accuracy
§ High accuracy of host role classification
Task                                           Accuracy (%)
Client vs. server                              99.2
Regular web server vs. web email server        100
Hosts from personal office vs. public place    93.3
Hosts from two different colleges              93.3
Feature Contribution
Identification of a User
Data: NetFlow data from a large campus
         College 1   College 2
Count    163         253
Identification of a User
§ Algorithms
– Decision Tree
– On-line SVM
§ Ground Truth
– Host information in Active Directory
– Crawler to validate its status
Identification of a User
§ Features: discrete probability distribution function (pdf)
§ An example: system port numbers [6, 8, 9, 11, 14, 30, 80, 1020]
– Outlier fraction (P) is 1%
– 80 is the port of interest (S)
– Number of bins (R) is 4
Identification of a User
§ An example
– (1 − 0.01) × 8 = 7.92, truncated to 7; the 7th value is 80
– Bin slice size = 80 / (4 − 1) ≈ 26.67
– Values: [6, 8, 9, 11, 14, 30, 80, 1020]
– pdf:
  0.625   6, 8, 9, 11, 14
  0.125   30
  0.125   80
  0.125   1020
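The binning in this example can be reproduced with a short sketch. It encodes one reading of the slide, not a confirmed implementation: the cutoff index is (1 − P) × n truncated toward zero, the first R − 1 bins split the range from 0 up to the cutoff value evenly (with the cutoff value itself counted in the last of them), and everything above the cutoff goes into a final outlier bin. The role of the port of interest S is not modeled, and the function name `capped_pdf` is hypothetical.

```python
import math


def capped_pdf(values, outlier_fraction, num_bins):
    """Discrete pdf with R - 1 regular bins up to a cutoff and one outlier bin."""
    ordered = sorted(values)
    n = len(ordered)
    # 1-based rank of the cutoff value: (1 - P) * n, truncated (assumed rounding).
    rank = max(1, math.floor((1 - outlier_fraction) * n))
    cutoff = ordered[rank - 1]
    # Width of each regular bin; the last bin is reserved for outliers.
    width = cutoff / (num_bins - 1)
    counts = [0] * num_bins
    for v in ordered:
        if v > cutoff:
            counts[-1] += 1  # outlier bin
        elif width == 0:
            counts[0] += 1  # degenerate case: cutoff value is 0
        else:
            # Clamp so the cutoff value itself lands in the last regular bin.
            counts[min(int(v / width), num_bins - 2)] += 1
    return [c / n for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
print(capped_pdf(ports, outlier_fraction=0.01, num_bins=4))
```

With the slide's inputs, five ports fall in the first bin and one port in each of the other three.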
Identification of a User
§ An example without P and S
– Bin slice size = 1024 / 4 = 256
– Values: [6, 8, 9, 11, 14, 30, 80, 1020]
– pdf:
  0.875   6, 8, 9, 11, 14, 30, 80
  0       (none)
  0       (none)
  0.125   1020
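For comparison, the plain equal-width binning used in this second example can be sketched as follows. The function name `fixed_width_pdf` and the choice to clamp values at the top of the range into the last bin are assumptions; the range of 1024 and the bin count of 4 come from the slide.

```python
def fixed_width_pdf(values, value_range, num_bins):
    """Discrete pdf over num_bins equal-width bins covering [0, value_range)."""
    width = value_range / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so a value at the top of the range still lands in the last bin.
        counts[min(int(v // width), num_bins - 1)] += 1
    return [c / len(values) for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
print(fixed_width_pdf(ports, value_range=1024, num_bins=4))
```

Seven of the eight ports collapse into the first bin here, which suggests why the variant with an outlier cutoff gives a more informative distribution for the same data.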
Identify a User Among Other Users
Accuracy
§ Identifying a particular user among other users
– Decision Tree: 93.3%
– On-line Support Vector Machine: 78.5%
Feature Contribution
Conclusion
§ Major contributions
– A Big Data analysis system
• a conference paper
– Monitoring and interactive visualization
– Usage of anonymity technologies
• a conference paper and a journal paper
– Models for classification of host roles and identification of users
• a conference paper
Conclusion
§ The Big Data analysis system is high-performance and scalable
§ Real-time continuous network monitoring and interactive visualization are implemented and supported by the high-performance system
Conclusion
§ Proxies and Tor are the main anonymity technologies used on campus
– US, Germany, and China are the top 3 countries
§ Models and features for classification of host roles:
– client vs. server, non-web server vs. web server, personal office vs. public office, hosts from two different colleges
§ Models and features for identification of a particular user among other users
Future Work
§ Improvements to the current work
– More interactive features and better user interfaces
– Further analysis of user identification: features, algorithms (such as deep learning)
Future Work
§ Extensions to the current work
– Define and filter out background traffic
– Detection of operating system fingerprinting
– Identity anonymity
– Fusion with other network security data sources
Future Work
§ Vision: to provide network security as a service for individuals, small businesses, or government offices