Networking Research (UofC), Carey Williamson, iCORE

Networking Research (UofC)
Carey Williamson
iCORE Chair and NSERC/iCORE/TELUS Mobility Industrial Research Chair
Department of Computer Science
University of Calgary

Research Team
- Faculty: Majid Ghaderi, Zongpeng Li, Mea Wang, Carey Williamson
- Research Staff: Martin Arlitt, Niklas Carlsson, Ahmed Obied, Terence Robinson, Hongxia Sun
- Students: Ali Abedi, Ali Dabirmoghaddam, Mostafa Dehghan, Marian Doerk, Mingwei Gong, Ajay Gopinathan, Emir Halepovic, Islam Hegazy, Andreas Hirt, Ali Hosseini, Ibrahim Ismail, Rohit Joshi, Aniket Mahanti, Shreya Maheshwar, Nadim Parvez, Tuan Vu, Song Zhang, ...

Research Overview
- Research area?
  - Wireless networks, Internet protocols, computer systems performance evaluation
- Mission: “Make the Internet go faster”
- Approach?
  - Experimental, simulation, analytical
- Key challenges?
  - Citius, Altius, Fortius!
  - Performance, scalability, robustness

Experimental Facilities
- Wireless Internet Performance Lab (UofC)
  - IEEE 802.11b wireless LAN
  - Sniffer Pro, AiroPeek wireless network analyzers
  - PCs, laptops, PDAs, APs, wireless NICs, sensors
- Experimental Laboratory for Internet Systems and Applications (UofC/UofS, CFI)
  - Geographically distributed Internet testbed between Calgary and Saskatoon
  - Clients, servers, notebooks, routers, switches, Web proxies, network analyzers, 802.11a/b
  - Fully operational since Spring 2004

Research Highlights
- Network Traffic Measurements
  - Martin Arlitt, et al.
- Internet Traffic Classification
  - Jeff Erman, Anirban Mahanti, et al.
- Wireless LAN Traffic Measurements
  - Aniket Mahanti, Martin Arlitt, et al.
- Cellular Network Capacity Planning
  - Yujing Wu, Jingxiang Luo, Hongxia Sun

Network Traffic Measurement
- Collect and analyze packet-level traces from a live network, using special equipment
- Process traces, statistical analysis
- Diagnose performance problems (network, protocol, application)

Network Traffic Measurement
- Continuous monitoring of UofC traffic on the commercial Internet link (100 Mbps), recording TCP SYN/FIN/RST packet headers
- 36 months of data and counting…
- Specific measurement studies to date:
  - TCP reset behaviour (Arlitt)
  - P2P traffic evolution (Madhukar)
  - Internet traffic classification (Erman)
  - Malicious network attacks (Obied)
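The monitoring described above records only TCP SYN, FIN, and RST packet headers. A minimal sketch of such a filter follows, assuming the raw TCP header is already available as bytes. The names `is_connection_event` and `make_header` are hypothetical and are not taken from the group's capture tooling; the flag bit values are the standard ones from the TCP specification.

```python
# Standard TCP flag bits; the flag byte is at offset 13 of the TCP header.
FIN = 0x01
SYN = 0x02
RST = 0x04


def is_connection_event(tcp_header: bytes) -> bool:
    """Return True if the header has SYN, FIN, or RST set (hypothetical helper)."""
    if len(tcp_header) < 20:  # shorter than the minimum TCP header
        return False
    flags = tcp_header[13]
    return bool(flags & (SYN | FIN | RST))


def make_header(flags: int) -> bytes:
    """Build a dummy 20-byte TCP header with the given flag byte (illustration only)."""
    header = bytearray(20)
    header[12] = 5 << 4  # data offset: 5 words, no options
    header[13] = flags
    return bytes(header)


# A SYN+ACK header is kept; a pure ACK header is dropped.
keep_synack = is_connection_event(make_header(0x12))
keep_ack = is_connection_event(make_header(0x10))
```

Keeping only connection setup and teardown packets is one plausible way to make multi-year monitoring of a 100 Mbps link tractable, since data-carrying segments are skipped.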

TCP and HTTP Results

Semi-Supervised Network Traffic Classification
Jeffrey Erman (1), Anirban Mahanti (2), Martin Arlitt (1, 3), Ira Cohen (3), Carey Williamson (1)
(1) Department of Computer Science, University of Calgary
(2) Department of Computer Science and Engineering, Indian Institute of Technology (Delhi)
(3) Enterprise Systems & Software Labs, HP Labs

Introduction
Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially of those with a desire to be undetectable. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to propose a traffic classification methodology that relies on using only flow statistics to classify traffic.
Our proposed technique is a flexible mathematical framework that leverages both labelled and unlabelled flows. This semi-supervised approach to learning a network traffic classifier is a key contribution of this work.
[Diagram labels: Campus, Router, U of Calgary, Internet; Web, Streaming, P2P]

Classification Framework
[Diagram labels: Unlabelled Training Data, Labelled Training Data, Clustering Algorithm, Labelled Clusters, Classifier, Unclassified Flows, Classified Flows]

Step 1: Model Building
A clustering algorithm partitions the training flows into disjoint groups called clusters based on similarity. The advantages are:
- Builds natural clusters.
- The number of training flows needed is small (e.g., 8000).
Training Data: Training data can be a mix of labelled and unlabelled flows. Features include: Average Packet Size, Number of Packets, Payload Bytes, Header Bytes, etc.
- A cluster label is obtained using the labelled flows available in each cluster. These can be obtained through a variety of means: (automated) payload analysis, port numbers, expert knowledge.
- Clusters with no labels can be left as unknown.

Step 2: Classification
- Classifier assigns each new unclassified flow to the nearest cluster using Euclidean distance. This is the maximum likelihood cluster assignment.
- Label of the assigned cluster becomes the classification of the flow.

Semi-Supervised Results
Labelling of training feature vectors is one of the most time-consuming steps of the classification process.
In Figure 1 we test the hypothesis that if a few flows are labelled in each cluster then we have a reasonable basis for creating the cluster-to-application mapping. With as few as two labels per cluster, we attain 94% flow accuracy.
The results in Figure 2 show the effect on the classifier’s precision when we used a fixed number of labelled flows and a varying number of unlabelled flows in the training data set. Our results show that for a fixed number of labelled training flows, increasing the number of unlabelled flows increases the classifier’s precision.
Figure 1: Selective Labelling of Flows
Figure 2: Training with (Un)labelled Flows

Real-Time Classification
A fundamental challenge in the design of the real-time classification system is the need to classify a flow as soon as possible. Unlike offline classification where all discriminating flow statistics are available a priori, in the real-time context we only have partial information on the flow statistics.
Our solution uses a layered classification system based on the idea of packet milestones.
- A packet milestone is reached when the count of the total number of packets a flow has sent or received reaches a specific value.
- Each layer has an independent classifier.
- Flow statistics are monitored in real-time.
- As a flow reaches a packet milestone it is classified/reclassified by the appropriate layer.
This layered approach allows us to revise and potentially improve the classification of flows.
We developed a prototype real-time classifier using Bro [4]. Figures 3 & 4 present example results by using the April 13, 9 am trace we collected from the UofC. We see that the classifier performs well, with byte accuracies typically in the 70% to 90% range.
Figure 3: Performance of Real-time Classifier
Figure 4: Byte Accuracy of Real-time Classifier

Retraining Detection
Although we found that our classifiers remained robust for extended periods of time, a mechanism for determining when the classifier needs updating is still required. We propose using the average distance of new flows to the centroid of the nearest cluster; a significant increase in the average distance indicates the need for an update.
Figure 5: Correlation Between Average Distance and Flow Accuracy

Conclusions
- Fast and accurate classifiers can be obtained by training with a small number of labelled flows mixed with a large number of unlabelled flows.
- High flow and byte accuracy can be achieved for classification offline and real-time.
- Robust classifiers can be built that are immune to transient changes in network conditions.
- Our approach can be integrated with solutions that collect flow statistics.
Full paper available at: http://pages.cpsc.ucalgary.ca/~erman/

References
[1] O. Chapelle, B. Scholkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
[2] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson. Offline/Online Traffic Classification Using Semi-Supervised Learning. To appear in Proc. of IFIP Performance 2007.
[3] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson. Identifying and Discriminating Between Web and Peer-to-Peer Traffic in the Network Core. In WWW’07, Banff, Canada, May 2007.
[4] V. Paxson. Bro: A System for Detecting Network Intruders in Real-time. Computer Networks, 31(23-24): 2435-2463, 1999.

Acknowledgements
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and Informatics Circle of Research Excellence (iCORE) of the province of Alberta, Canada.
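The poster's two-step framework (cluster the training flows, label clusters from the few labelled flows, assign new flows to the nearest centroid) and its retraining signal (average distance to the nearest centroid) can be sketched as follows. This is a toy illustration, not the authors' implementation: plain k-means stands in for whichever clustering algorithm they used, the deterministic initialisation and the majority-vote labelling are assumptions, the two-feature flows (average packet size, number of packets) are invented values, and all function names are hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, point):
    """Return (index, distance) of the centroid closest to point."""
    dists = [distance(c, point) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def kmeans(points, k, iterations=10):
    """Plain k-means; initial centroids are evenly spaced input points (assumption)."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)[0]].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def label_clusters(centroids, labelled):
    """Map each cluster to the majority label of its labelled flows, else 'unknown'."""
    votes = [Counter() for _ in centroids]
    for features, label in labelled:
        votes[nearest(centroids, features)[0]][label] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def classify(centroids, cluster_labels, features):
    """Step 2: a new flow takes the label of its nearest cluster."""
    return cluster_labels[nearest(centroids, features)[0]]


def mean_distance(centroids, flows):
    """Retraining signal: average distance of flows to their nearest centroid."""
    return sum(nearest(centroids, f)[1] for f in flows) / len(flows)


# Invented flows: (average packet size in bytes, number of packets).
unlabelled = [(100, 10), (110, 12), (90, 8), (1400, 200), (1350, 220), (1450, 180)]
labelled = [((105, 11), "web"), ((1380, 210), "p2p")]

centroids = kmeans(unlabelled + [f for f, _ in labelled], k=2)
cluster_labels = label_clusters(centroids, labelled)

small_flow = classify(centroids, cluster_labels, (95, 9))
bulk_flow = classify(centroids, cluster_labels, (1420, 190))
baseline = mean_distance(centroids, [(100, 10), (1400, 200)])
drifted = mean_distance(centroids, [(700, 100), (750, 120)])
```

In this toy data the drifted flows sit far from both centroids, so `drifted` is much larger than `baseline`, which is the kind of increase the poster proposes as a trigger for retraining. The features are left unscaled here for brevity; with real flow statistics some normalisation would likely be needed so that one feature does not dominate the distance.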

Wireless-side Trace Collection
- RFGrabbers were configured to scan channels 1, 6, and 11 to capture AirUC WLAN traffic in the ‘b/g’ mode.
- Over 6 weeks, RFGrabbers captured packets from 97 APs at 9 locations, representing 20% of the UofC WLAN.

CDMA2000 EV-DO Downlink
[Diagram: forward-link PF scheduler serving mobile stations (MS). Labels: flow arrivals; data flow 1, data flow 2, ..., data flow n; schedule queue i at slot t; current feasible rate r(i); C(t); TDM 1.25 ms frame; propagation loss, shadowing, fast fading. Legend: index of the scheduled queue at slot t; maximum feasible rate of queue j at slot t; realized throughput of queue j up to slot t.]
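The quantities named in the diagram (feasible rate of queue j at slot t, realized throughput of queue j up to slot t, index of the scheduled queue) match the usual proportional fair (PF) rule: serve the queue with the largest ratio of feasible rate to realized throughput, then update the throughput averages. The sketch below assumes that textbook rule with an exponential averaging window; the slide does not give the exact formulas, and the names `pf_schedule`, `pf_update`, and `window` are hypothetical.

```python
def pf_schedule(rates, throughputs):
    """Return the index of the queue with the largest rate/throughput ratio.

    rates[j] is the feasible rate of queue j in this slot;
    throughputs[j] is its realized (averaged) throughput so far, assumed > 0.
    """
    return max(range(len(rates)), key=lambda j: rates[j] / throughputs[j])


def pf_update(throughputs, rates, scheduled, window=1000.0):
    """Exponentially averaged throughput after one slot (assumed update rule).

    Only the scheduled queue receives its feasible rate in this slot;
    every other queue's average decays.
    """
    alpha = 1.0 / window
    return [
        (1 - alpha) * t + (alpha * rates[j] if j == scheduled else 0.0)
        for j, t in enumerate(throughputs)
    ]


# Two queues: queue 0 has the better channel but has already been served more.
rates = [2.0, 1.0]
throughputs = [4.0, 1.0]
chosen = pf_schedule(rates, throughputs)
updated = pf_update(throughputs, rates, chosen, window=10.0)
```

Here queue 1 is chosen because its ratio (1.0) exceeds queue 0's (0.5), which illustrates how the PF rule trades raw channel quality against past service.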

Future Plans
- More of the same!
- P2P systems modeling and analysis
- Wireless Internet measurement/modeling
  - WiMAX (IEEE 802.16)
  - QoS in CDMA2000 EV-DO
  - Wireless mesh networks?
- Sensor networks?
- Grid computing?
- Internet security?