Скачать презентацию Automatically Inferring Patterns of Resource Consumption in Network Скачать презентацию Automatically Inferring Patterns of Resource Consumption in Network

a73189072d838b52eacb78dbee25dcdc.ppt

  • Количество слайдов: 27

Automatically Inferring Patterns of Resource Consumption in Network Traffic In Proceedings of SIGCOMM 2003 Automatically Inferring Patterns of Resource Consumption in Network Traffic In Proceedings of SIGCOMM 2003 Reviewed By Michael Attig http: //www. acm. org/sigs/sigcomm 2003/papers. html#p 137 -estan Authors: Cristian Estan, Stefan Savage, George Varghese Note: Portions of this presentation are from the author’s SIGCOMM presentation slides available at: http: //www. cs. ucsd. edu/~cestan/ Attig 1

Organization • Motivation • Traffic Clusters • Algorithms – Unidimensional – Multidimensional • Auto. Organization • Motivation • Traffic Clusters • Algorithms – Unidimensional – Multidimensional • Auto. Focus • Experimental Results • Conclusions Attig 2

Motivation • New applications new traffic patterns • Network Admins need to know how Motivation • New applications new traffic patterns • Network Admins need to know how being used – provide better service – Reconfigure network elements to recognize traffic model • Difficult to know how being used, must infer characteristics • Currently use pre-defined list of patterns to identify – This paper focuses on the standard 5 -tuple – Allows comparison to Cisco’s Flow. Scan Attig 3

Goals • Reduce the amount of detail in reports – Easier to read and Goals • Reduce the amount of detail in reports – Easier to read and act upon • Provide multiple dimensions to report / aggregate on – 1 dimension lose info – Example: If aggregate on protocol or src port fields, may conclude p 2 p in wide use, when only small set of hosts responsible • Need for Automation - detect/contain Worms/Do. S – Network pushback needs automated technique to distinguish malicious flows Attig 4

How to report? • “top ten” – Could lose important info • Aggregate individual How to report? • “top ten” – Could lose important info • Aggregate individual flows into common category (single dimension) – Choose wrong dimension lose important characteristics • Aggregate on multiple fields (multiple dimensions) – Better • Dynamically define traffic clusters to report Attig 5

Traffic Clusters • Traffic clusters are the multidimensional traffic aggregates identified by reports – Traffic Clusters • Traffic clusters are the multidimensional traffic aggregates identified by reports – Dimensionality – Detail – Utility • A cluster is defined by a range for each field • The ranges are from natural hierarchies (e. g. IP prefix hierarchy) – meaningful aggregates • Example – Traffic aggregate: incoming web traffic for a subnet – Traffic cluster: ( Src. IP=*, Dest. IP in 128. 252. 153. 0/24, Proto=TCP, Src. Port=80, Dest. Port=high ) Attig 6

Traffic Reports • A list of clusters to be presented • Restrict what to Traffic Reports • A list of clusters to be presented • Restrict what to report by volume • # bytes • # packets • Could restrict based on other criteria, they focus on volume Attig 7

What Must be done to report? • Compute – Identify clusters with a traffic What Must be done to report? • Compute – Identify clusters with a traffic volume above threshold, H • Compress – Remove cluster C if can infer it’s traffic from C’ • Compare – show traffic changes over time • Prioritize – Sort in terms of potential level of interest – Use unexpectedness metric Attig 8

Algorithms • Core of Auto. Focus tool • k = number of fields • Algorithms • Core of Auto. Focus tool • k = number of fields • Sets for each field form a natural hierarchy – parent always smallest superset of children – Leaves are individual values the field had – Root is set of all possible values (*) • di = depth of hierarchy of the i-th of the k fields • s = T/H, where T = total traffic, H = threshold – Think of this as the number of reports you can have • n = number flow records of input • m = number distinct values of the fields in the n flow records Attig 9

Data Source • Use simplified Net. Flow records as raw data • Has key Data Source • Use simplified Net. Flow records as raw data • Has key that specifies exact values for all 5 fields • Has two counters – One to count packets that matched key during measurement interval – One to count number of bytes in those packets Attig 10

Unidimensional Compute • Clusters can overlap • Bad Can have up to 1+20*25 = Unidimensional Compute • Clusters can overlap • Bad Can have up to 1+20*25 = 501 reports – H = 5% s = 20, hierarchy size is 25+1 • Only consider prefixes of size 32 down to 8 • Number of reports of clusters above H at most 1 + (d-1)s • Two approaches: – When number of sets in hierarchy small, apply a brute force approach: keep count for each set and traverse all n flow records – At end, list clusters who exceed H Attig 11

Unidimensional Compute – Approach 2 • As go through flow records: – Build leaf Unidimensional Compute – Approach 2 • As go through flow records: – Build leaf nodes that correspond to IP addresses that actually appear in trace – Do trace updating counter of all leaf nodes • In second pass (post-order), determine which clusters over threshold H – Add traffic to that of its parent • Memory required is O(md) • Running time O(n + md) Attig 12

Unidimensional report example Threshold=100 Hierarchy 10. 0/28 500 10. 0/29 120 10. 0. 0. Unidimensional report example Threshold=100 Hierarchy 10. 0/28 500 10. 0/29 120 10. 0. 0. 8/29 380 10. 0/30 50 10. 0. 0. 4/30 70 10. 0. 0. 8/30 305 75 10. 0. 0. 12/30 10. 0. 0. 2/31 50 10. 0. 0. 4/31 70 10. 0. 0. 8/31 270 10. 0. 0. 1 35 0/31 75 10. 0. 0. 14/31 15 10. 0. 0. 2 Attig 35 30 40 10. 0. 0. 3 10. 0. 0. 4 10. 0. 0. 5 160 110 10. 0. 0. 8 10. 0. 0. 9 35 75 10. 0. 0. 10 10. 0. 0. 14 13

Unidimensional Compress • Reduce redundant info – Example: a /30 network may have bit Unidimensional Compress • Reduce redundant info – Example: a /30 network may have bit more traffic than /31 it covers, but not much • Reporting the /30 adds nothing to the report • Define compression threshold C as amount by which a cluster can be off (used C=H) • Trade-off: accuracy versus reduced size • The number of clusters above threshold in a non-redundant compressed report is at most s. • Brings reports of previous example down to 20 instead of 501 Attig 14

Unidimensional report example Compression 10. 0/29 120 Attig 10. 0/28 500 Source Traff IP Unidimensional report example Compression 10. 0/29 120 Attig 10. 0/28 500 Source Traff IP ic 10. 0. 0. 8/29 380 -270≥ 100 10. 0/ 120 29 10. 0. 0. 8/30 305 -270<100 10. 0. 0. 8/ 380 29 10. 0. 0. 8/31 270 10. 0. 0. 160 8 160 110 10. 0. 0. 8 10. 0. 0. 9 9 15

Unidimensional Compare • How has structure of traffic changed? • Absolute change – Number Unidimensional Compare • How has structure of traffic changed? • Absolute change – Number bytes or packets by which clusters increase/decrease – Only if measurement interval constant • Relative change – Comparison of traffic mixes over different measurement intervals – Must normalize for comparison • Do a pass over the trace to compute the exact traffic in both intervals • If estimate lower or larger by H than actual change, report - a compression step – Estimate based on reports already added to report Attig 17

Unidimensional Prioritize • Unexpectedness – deviation from an uniform model – Example • 50% Unidimensional Prioritize • Unexpectedness – deviation from an uniform model – Example • 50% of all traffic is web • Node B receives 20% of all traffic • The web traffic received by node B is 15% instead of 50%*20%=10% • unexpectedness label is 15%/10%=150% – This is used to flag interesting data in a report Attig 18

Multidimensional Structure Example Nodes (clusters) have multiple parents Source net Nodes (clusters) overlap Application Multidimensional Structure Example Nodes (clusters) have multiple parents Source net Nodes (clusters) overlap Application All traffic US EU CA CA NY GB DE Web Mail US Web Attig 19

Another way to understand multi-dim. case decreasing x-specificity x hierarchy decreasing y-specificity y hierarchy Another way to understand multi-dim. case decreasing x-specificity x hierarchy decreasing y-specificity y hierarchy “leaf”nodes Attig “root”node 20

Pruning the hierarchy significant 1 d nodes potentially significant 2 d nodes y hierarchy Pruning the hierarchy significant 1 d nodes potentially significant 2 d nodes y hierarchy x hierarchy Attig 21

Multidimensional Compute • For each cluster, examine n flows and report those above threshold Multidimensional Compute • For each cluster, examine n flows and report those above threshold • Can’t do brute force – ~ n P di clusters – Evaluate all clusters generated by n flows? Too many! • Restrict search based on optimizations that prune search space – Consider only if all unidimensional ancestors above H • Solve k unidimensional problems – Combine clusters Attig 22

Greedy compression algorithm • Add estimate counters of children along all dimensions • Set Greedy compression algorithm • Add estimate counters of children along all dimensions • Set this node’s estimate to largest of sums • If above threshold, report Attig 23

System: Auto. Focus Cluster miner names Grapher Traffic parser Web based GUI categories Packet System: Auto. Focus Cluster miner names Grapher Traffic parser Web based GUI categories Packet header trace Attig 24

Attig 25 Attig 25

Experience • Small network exchange point – SD-NAP • Slammer Worm • Worm traffic Experience • Small network exchange point – SD-NAP • Slammer Worm • Worm traffic separated (black) Uncharacterized spike in “other” category indicates something fishy Attig 26

Analysis of unusual events • UCSD to UCLA route change • Sapphire/SQL Slammer worm Analysis of unusual events • UCSD to UCLA route change • Sapphire/SQL Slammer worm Site 2 Attig 27

Conclusions • Multidimensional traffic clusters using natural hierarchies describe traffic aggregates • Traffic reports Conclusions • Multidimensional traffic clusters using natural hierarchies describe traffic aggregates • Traffic reports using thresholding identify automatically conspicuous resource consumption at the right granularity • Compression produces compact traffic reports and unexpectedness labels highlight non-obvious aggregates • Their prototype system, Auto. Focus, provides insights into the structure of regular traffic and unexpected events Attig 28