
Internet Measurement Jennifer Rexford
Outline • Measurement overview – Why measure? Why model measurements? – What to measure? Where to measure? • Internet challenges • Measurement tools – Active: ping, traceroute, and pathchar – Passive: logs, SNMP, packet, and flow monitoring • Operational applications of measurement • Discussion
Why Measure? • The Internet is a man-made system, so why do we need to measure it? – Because we still don’t really understand it – Because sometimes things go wrong • Measurement for network operations – Detecting and diagnosing problems – What-if analysis of future changes • Measurement for scientific discovery – Characterizing a complex system as an organism – Creating accurate models that represent reality – Identifying new features and phenomena
Why Build Models of Measurements? • Compact summary of measurements – Efficient way to represent a large data set – E.g., exponential distribution with mean 100 sec • Expose important properties of measurements – Reveals underlying cause or engineering question – E.g., mean RTT to help explain TCP throughput • Generate random but realistic data as input – Generate new data that agree in key properties – E.g., topology models to feed into simulators “All models are wrong, but some models are useful.” – George Box
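The slide's running example, an exponential distribution with mean 100 sec, can be made concrete. The sketch below is minimal and assumes the measurements are inter-arrival times. It fits the model using the fact that the maximum-likelihood estimate of an exponential mean is the sample mean. It then uses the fitted model to generate new synthetic data that agree in that key property. The function names and the measured values are hypothetical.

```python
import random
import statistics

def fit_exponential_mean(samples):
    """Hypothetical helper: MLE fit of an exponential model; the MLE of the mean is the sample mean."""
    return statistics.fmean(samples)

def generate_synthetic(mean, n, seed=None):
    """Hypothetical helper: draw n synthetic values from the fitted exponential model."""
    rng = random.Random(seed)
    # random.expovariate takes the rate lambda = 1 / mean, not the mean itself
    return [rng.expovariate(1.0 / mean) for _ in range(n)]

measured = [80.0, 120.0, 95.0, 105.0, 100.0]   # hypothetical inter-arrival times (sec)
mean = fit_exponential_mean(measured)           # 100.0
synthetic = generate_synthetic(mean, 10_000, seed=1)
print(mean, statistics.fmean(synthetic))
```

The compact model here is a single number. Whether that number is a *useful* model depends on checking that the data really look exponential, which this sketch does not do.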
What Can Be Measured? • Traffic – Load statistics – Packet or flow traces • Performance of paths – Application performance, e.g., Web download time – Transport performance, e.g., TCP bulk throughput – Network performance, e.g., packet delay and loss • Network structure – Topology, and paths on the topology – Dynamics of the routing protocol
Where to Measure? • Short answer – Anywhere you can! • End hosts – Application logs, e.g., Web server logs – Sending active probes to measure performance • Individual links/routers – Load statistics, packet traces, flow traces – Configuration state – Routing-protocol messages or table dumps – Alarms
Internet Challenges Make Measurement an Art • Stateless routers – Routers do not routinely store packet/flow state – Measurement is an afterthought and adds overhead • IP narrow waist – IP measurements cannot see below the network layer – E.g., link-layer retransmission, tunnels, etc. • Violations of the end-to-end argument – E.g., firewalls, address translators, and proxies – Not directly visible, and may block measurements • Decentralized control – Autonomous Systems may block measurements – No global notion of time
Active Measurement: Ping • Adding traffic for purposes of measurement – Trade-offs between accuracy and overhead – Need careful methods to avoid introducing bias • Ping – Host sends an ICMP ECHO packet to a target – … and captures the ICMP ECHO REPLY – Useful for checking connectivity and RTT – Only requires control of one of the two endpoints • Problems with ping – Round-trip rather than one-way delays – Some hosts might not respond
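Ping reports a round-trip time for each probe and counts the probes that never got an ECHO REPLY. The sketch below computes that summary and is minimal. It assumes the RTTs (in milliseconds) have already been collected and that a lost probe is recorded as None. The function name is hypothetical. Sending raw ICMP itself usually needs elevated privileges, so it is left out.

```python
import statistics
from typing import Optional, Sequence

def summarize_pings(rtts: Sequence[Optional[float]]) -> dict:
    """Hypothetical helper: loss rate plus min/avg/max RTT, like ping's closing report."""
    replies = [r for r in rtts if r is not None]  # None = no ICMP ECHO REPLY received
    summary = {
        "sent": len(rtts),
        "received": len(replies),
        "loss": 1 - len(replies) / len(rtts) if rtts else 0.0,
    }
    if replies:
        summary.update(min=min(replies), avg=statistics.fmean(replies), max=max(replies))
    return summary

print(summarize_pings([20.1, 19.8, None, 25.3]))
```

A missing reply does not tell us which direction lost the packet, or whether the host simply does not answer ICMP. Likewise, a large RTT cannot be attributed to the forward or reverse path.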
Active Measurement: Traceroute • Time-To-Live field in IP packet header – Source sends a packet with a TTL of n – Each router along the path decrements the TTL – “TTL exceeded” sent when TTL reaches 0 • Traceroute tool exploits this TTL behavior [Figure: probes with TTL=1, TTL=2, … travel from source toward destination; the router where each probe’s TTL expires returns “time exceeded”] • Send packets with TTL=1, 2, 3, … and record the source of each “time exceeded” message
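The TTL mechanism can be simulated without touching the network. In this hypothetical sketch, a path is a list of router interface addresses ending with the destination. The addresses come from the documentation ranges and are chosen only for illustration. Each router decrements the TTL, and the router where it reaches 0 answers "time exceeded". A probe that survives all routers reaches the destination. Real traceroute sends UDP or ICMP probes and matches the returned ICMP messages; that part is omitted.

```python
def forward(path, ttl):
    """Hypothetical helper: return (responder, kind) for one probe sent with the given TTL."""
    for hop in path[:-1]:          # intermediate routers
        ttl -= 1                    # each router decrements the TTL
        if ttl == 0:
            return hop, "time exceeded"
    return path[-1], "reached destination"

def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = forward(path, ttl)
        hops.append(responder)
        if kind == "reached destination":
            break
    return hops

path = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.200"]  # last entry = destination
print(traceroute(path))
```

In the simulation, every hop answers and every probe follows the same path. The challenges listed on the next slide are exactly where real traceroute departs from this idealization.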
Active Measurement: Challenges of Traceroute • Measuring multiple paths – Successive probes may traverse different paths • Non-participating network elements – Some routers and firewalls don’t reply • Inaccurate delay information – Includes processing delays on the router CPU • Round-trip vs. one-way measurements – Paths may have asymmetric properties • Interfaces, not routers – Returns IP addresses of interfaces, not routers
Active Measurement: Applications of Traceroute • Network troubleshooting – Identify forwarding loops and black holes – Identify long and convoluted paths – See how far the probe packets get • Network topology inference – Launch traceroute probes from many places – … toward many destinations – Join together to fill in parts of the topology – … though traceroute undersamples the edges
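Joining traceroute paths into a topology amounts to taking the union of consecutive hop pairs. The sketch below is minimal and assumes each path is already a list of interface addresses. Non-responding hops are marked "*", as traceroute prints them. Because the addresses belong to interfaces rather than routers, the result is an interface-level graph unless aliases are resolved separately. The function name and the single-letter hop labels are hypothetical.

```python
from collections import defaultdict

def join_paths(paths):
    """Hypothetical helper: union of links seen in traceroute paths; '*' hops break the chain."""
    adjacency = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a == "*" or b == "*":
                continue  # unknown hop: no link can be inferred across it
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

paths = [["A", "B", "C", "D"], ["A", "B", "E"], ["F", "*", "C", "D"]]
topo = join_paths(paths)
edges = {tuple(sorted((a, b))) for a in topo for b in topo[a]}
print(sorted(edges))
```

Links that no probe crosses never appear. The same is true of links hidden behind silent hops, such as F's attachment here. This is one way a topology built from traceroute ends up incomplete.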
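Active Measurement: Pathchar for Links • Three delay components – Propagation delay d, transmission delay L/c for a packet of L bytes on a link of bandwidth c, and queuing delay • How to infer d and c? – Send probes of many sizes L to hop i and to hop i+1, and keep the min. RTT for each size to filter out queuing delay – Plot rtt(i+1) - rtt(i) of the min. RTT against L: a straight line with slope = 1/c and intercept d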
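The inference step is a straight-line fit. The sketch below is minimal and assumes that, for several probe sizes L (bytes), we already have the minimum RTT to hop i and to hop i+1 over many probes. It fits a least-squares line to the difference: the slope gives 1/c and the intercept gives d. The data are synthetic, generated for a link with c = 125,000 bytes/s (1 Mbit/s) and d = 2 ms. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def pathchar_link(sizes, min_rtt_i, min_rtt_next):
    """Hypothetical helper: estimate (c in bytes/s, d in s) for the link between hop i and i+1."""
    diffs = [b - a for a, b in zip(min_rtt_i, min_rtt_next)]  # rtt(i+1) - rtt(i)
    slope, intercept = fit_line(sizes, diffs)
    return 1.0 / slope, intercept

sizes = [64, 512, 1024, 1500]                                        # probe sizes L (bytes)
rtt_i = [0.010 + L / 1_000_000 for L in sizes]                       # synthetic: hops up to i
rtt_next = [r + 0.002 + L / 125_000 for r, L in zip(rtt_i, sizes)]   # plus the new link
c, d = pathchar_link(sizes, rtt_i, rtt_next)
print(c, d)
```

In this simplified model, d absorbs everything that does not grow with L, including any fixed per-hop processing time. With real, noisy minima the fit is only as good as the filtering of queuing delay.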
Passive Measurement: Logs at Hosts • Web server logs – Host, time, URL, response code, content length, … – E.g., 122.345.131.2 - - [15/Oct/1998:00:25 0400] "GET /images/wwwtlogo.gif HTTP/1.0" 304 - "http://www.aflcio.org/home.htm" "Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-" • DNS logs – Request, response, time • Useful for workload characterization, troubleshooting, etc.
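The fields of a log entry like the one above can be pulled out with a regular expression. The sketch below is minimal and assumes Apache-style Combined Log Format: host, ident, user, [time], "request", status, size, "referer", "user-agent". Trailing fields, such as the final "-" in the sample, are ignored. The time field is kept as an unparsed string. The pattern and the function name are assumptions, not part of any log-analysis library.

```python
import re

# Assumed Combined Log Format; referer and user-agent are optional so Common Log Format also matches
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Hypothetical helper: return a dict of fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # "-" means no body was sent (e.g., a 304 Not Modified response)
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

SAMPLE = ('122.345.131.2 - - [15/Oct/1998:00:25 0400] "GET /images/wwwtlogo.gif HTTP/1.0" 304 - '
          '"http://www.aflcio.org/home.htm" '
          '"Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-"')
print(parse_line(SAMPLE))
```

Workload characterization then reduces to counting over the parsed records, for example requests per status code or bytes per URL.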
Passive Measurement: SNMP • Simple Network Management Protocol – Coarse-grained counters on the router – E.g., byte and packet counts • Polling – Management system can poll the counters – E.g., once every five minutes • Limitations – Extremely coarse-grained statistics – Delivered over UDP! • Advantages: ubiquitous
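Average link utilization over a polling interval follows from two readings of an interface byte counter such as ifInOctets (MIB-II). The sketch below is minimal and assumes 32-bit counters. It also assumes the readings come from whatever poller is in use. The function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32   # SNMP Counter32 values wrap to 0 after 2**32 - 1

def counter_delta(prev, curr, modulus=COUNTER32_MODULUS):
    """Hypothetical helper: increase of a wrapping counter between two polls (assumes at most one wrap)."""
    return (curr - prev) % modulus

def link_utilization(prev_octets, curr_octets, interval_s, link_bps):
    """Hypothetical helper: average utilization over the interval from two byte-counter readings."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return bits / (interval_s * link_bps)

# Hypothetical readings five minutes apart on a 10 Mbit/s link
print(link_utilization(1_000_000, 188_500_000, 300, 10_000_000))   # 0.5
print(counter_delta(2 ** 32 - 1_000, 500))                          # 1500 (counter wrapped)
```

The result is a five-minute average, so any burst shorter than the interval is invisible. This is the coarseness the slide warns about. On a 1 Gbit/s link, a 32-bit byte counter wraps in about 34 seconds, and five-minute polling cannot detect multiple wraps. The 64-bit ifHCInOctets counters exist for that reason.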
Passive Measurement: Packet Monitoring • Tapping a link – Shared media (Ethernet, wireless): a Monitor attached to the same medium as Host A and Host B – Multicast switch: Host A, Host B, Host C, and a Monitor attached to the switch – Splitting a point-to-point link between Router A and Router B, with a Monitor on the split – Line card that does packet sampling inside Router A
Packet Monitoring: Selecting the Traffic • Filter to focus on a subset of the packets – IP addresses/prefixes (e.g., to/from specific Web sites, client machines, DNS servers, mail servers) – Protocol (e.g., TCP, UDP, or ICMP) – Port numbers (e.g., HTTP, DNS, BGP, Napster) • Collect first n bytes of packet (snap length) – Medium access control header (if present) – IP header (typically 20 bytes) – IP+UDP header (typically 28 bytes) – IP+TCP header (typically 40 bytes) – Application-layer message (entire packet)
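With a 40-byte snap length, each captured record holds just the IP and TCP headers. That is enough to apply address, protocol, and port filters like those above. The sketch below is minimal and assumes IPv4 only, with each capture starting at the IP header (no MAC header). The function names are hypothetical. The port numbers occupy the first 4 bytes of both the TCP and UDP headers, so one code path covers both protocols.

```python
import socket
import struct

def parse_headers(pkt):
    """Hypothetical helper: IPv4 header fields plus TCP/UDP ports from a truncated capture."""
    if len(pkt) < 20:
        return None
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", pkt[:20])
    ihl = (ver_ihl & 0x0F) * 4          # IP header length in bytes (20 without options)
    rec = {"src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
           "proto": proto, "len": total_len, "ttl": ttl}
    if proto in (6, 17) and len(pkt) >= ihl + 4:   # 6 = TCP, 17 = UDP
        rec["sport"], rec["dport"] = struct.unpack("!HH", pkt[ihl:ihl + 4])
    return rec

def is_web(rec):
    """Example filter: TCP traffic to or from port 80."""
    return rec.get("proto") == 6 and 80 in (rec.get("sport"), rec.get("dport"))

# Synthetic 40-byte capture: IPv4 header + start of a TCP header (addresses chosen for illustration)
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("135.207.38.125"), socket.inet_aton("192.0.2.80"))
pkt = ip_hdr + struct.pack("!HH", 1043, 80) + bytes(16)
print(parse_headers(pkt), is_web(parse_headers(pkt)))
```

Tools such as tcpdump compile filters like these into BPF so packets are discarded in the kernel. This Python version only shows which header bytes a filter has to look at.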
Tcpdump Output (three-way TCP handshake and HTTP request message) • Annotated fields: timestamp, client address and port #, Web server (port 80) – 23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: S 617756405:617756405(0) win 32120
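In this older tcpdump output format, each host and its port are joined by a final dot. The SYN's sequence range reads first:last(length). The sketch below splits the sample line into its annotated fields. The regular expression is an assumption fitted to this one line, not a grammar for all tcpdump output, and the function name is hypothetical.

```python
import re

LINE = ("23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: "
        "S 617756405:617756405(0) win 32120")

# Assumed pattern for this one line of output, not a general tcpdump grammar
TCPDUMP_RE = re.compile(
    r"(?P<ts>\d\d:\d\d:\d\d\.\d+) (?P<iface>\S+) > (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>[SFPRU.]+) (?P<seq_first>\d+):(?P<seq_last>\d+)\((?P<nbytes>\d+)\)"
    r"(?: win (?P<win>\d+))?"
)

def parse_tcpdump(line):
    """Hypothetical helper: split one tcpdump line into named fields, or return None."""
    m = TCPDUMP_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # "host.port": the port (number or service name) is whatever follows the last dot
    rec["src_host"], rec["src_port"] = rec.pop("src").rsplit(".", 1)
    rec["dst_host"], rec["dst_port"] = rec.pop("dst").rsplit(".", 1)
    return rec

print(parse_tcpdump(LINE))
```

Here "www" is the service name tcpdump printed for port 80. The "(0)" means the SYN carries no payload bytes.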
Analysis of Packet Traces • IP header – Traffic volume by IP addresses or protocol – Burstiness of the stream of packets – Packet properties (e.g., sizes, out-of-order, etc.) • TCP header – Traffic breakdown by application (e.g., Web) – TCP congestion and flow control – Number of bytes and packets per session • Application header – URLs, HTTP headers (e.g., cacheable response?) – DNS queries and responses, user keystrokes, …
Aggregating Packets into IP Flows [Figure: packets on a timeline grouped into flow 1, flow 2, flow 3, and flow 4] • Set of packets that “belong together” – Source/destination IP addresses and port numbers – Same protocol, ToS bits, … – Same input/output interfaces at a router (if known) • Packets that are “close” together in time – Maximum spacing between packets (e.g., 15 sec, 30 sec) – Example: flows 2 and 4 are different flows due to time
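Flow aggregation can be sketched in a few lines. The sketch below is minimal and assumes packet records are already parsed. It uses the 5-tuple as the "belong together" key and leaves out the ToS bits and router interfaces listed on the slide. It also assumes a 15 sec maximum spacing between packets. The record layout and function name are hypothetical. Real flow monitors also expire long-lived flows and bound their memory, which this omits.

```python
from typing import Iterable, NamedTuple

class Packet(NamedTuple):
    ts: float       # arrival time (seconds)
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    size: int       # bytes

def aggregate_flows(packets: Iterable[Packet], timeout: float = 15.0) -> list:
    """Hypothetical helper: group packets by 5-tuple; a gap longer than `timeout` starts a new flow."""
    active = {}      # 5-tuple -> current flow record
    finished = []
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        if flow is not None and p.ts - flow["last"] > timeout:
            finished.append(flow)      # idle too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "first": p.ts, "last": p.ts, "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.size
    return finished + list(active.values())

pkts = [
    Packet(0.0, "10.0.0.1", "192.0.2.80", 1043, 80, 6, 60),
    Packet(1.0, "10.0.0.3", "192.0.2.53", 2000, 53, 17, 70),
    Packet(5.0, "10.0.0.1", "192.0.2.80", 1043, 80, 6, 1500),
    Packet(40.0, "10.0.0.1", "192.0.2.80", 1043, 80, 6, 1500),  # 35 s gap > 15 s: new flow
]
for f in aggregate_flows(pkts):
    print(f["key"], f["first"], f["last"], f["packets"], f["bytes"])
```

The last packet shares its 5-tuple with the first two but still becomes a separate flow because of the time gap. This mirrors the slide's flows 2 and 4.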
Packet vs. Flow Measurement • Basic statistics (available from both techniques) – Traffic mix by IP addresses, port numbers, and protocol – Average packet size • Traffic over time – Both: traffic volumes on a medium-to-large time scale – Packet: burstiness of the traffic on a small time scale • Statistics per TCP connection – Both: number of packets & bytes transferred over the link – Packet: frequency of lost or out-of-order packets, and the number of application-level bytes delivered • Per-packet info (available only from packet traces) – TCP seq/ack #s, receiver window, per-packet flags, … – Probability distribution of packet sizes – Application-level header and body (full packet contents)
Measurement Challenges for Operators • Network-wide view – Crucial for evaluating control actions – Multiple kinds of data from multiple locations • Large scale – Large number of high-speed links and routers – Large volume of measurement data • Poor state-of-the-art – Working within existing protocols and products – Technology not designed with measurement in mind • The “do no harm” principle – Don’t degrade router performance – Don’t require disabling key router features – Don’t overload the network with measurement data
Network Operations Tasks • Reporting of network-wide statistics – Generating basic information about usage and reliability • Performance/reliability troubleshooting – Detecting and diagnosing anomalous events • Security – Detecting, diagnosing, and blocking security problems • Traffic engineering – Adjusting network configuration to the prevailing traffic • Capacity planning – Deciding where and when to install new equipment
Basic Reporting • Producing basic statistics about the network – For business purposes, network planning, ad hoc studies • Examples – Proportion of transit vs. customer-customer traffic – Total volume of traffic sent to/from each private peer – Mixture of traffic by application (Web, Napster, etc.) – Mixture of traffic to/from individual customers – Usage, loss, and reliability trends for each link • Requirements – Network-wide view of basic traffic and reliability statistics – Ability to “slice and dice” measurements in different ways (e.g., by application, by customer, by peer, by link type)
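"Slicing and dicing" comes down to grouping the same records by different keys. The sketch below is minimal and assumes flow records are dicts with hypothetical field names. It classifies applications by well-known destination port, which is only a rough heuristic: 80 for Web, 53 for DNS, 179 for BGP.

```python
from collections import Counter

def slice_bytes(flows, key):
    """Hypothetical helper: total bytes per group, where `key` maps a flow record to its group label."""
    totals = Counter()
    for f in flows:
        totals[key(f)] += f["bytes"]
    return totals

# Hypothetical flow records; the port-to-application map is a simplifying assumption
APP_BY_PORT = {80: "Web", 53: "DNS", 179: "BGP"}
flows = [
    {"customer": "cust1", "dport": 80, "bytes": 5000},
    {"customer": "cust2", "dport": 53, "bytes": 300},
    {"customer": "cust1", "dport": 179, "bytes": 700},
    {"customer": "cust2", "dport": 80, "bytes": 2000},
]
by_app = slice_bytes(flows, lambda f: APP_BY_PORT.get(f["dport"], "other"))
by_customer = slice_bytes(flows, lambda f: f["customer"])
print(by_app, by_customer)
```

Passing the grouping rule in as a function keeps the aggregation code fixed while the "slice" changes: by application, customer, peer, or link type.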
Troubleshooting • Detecting and diagnosing problems – Recognizing and explaining anomalous events • Examples – Why a backbone link is suddenly overloaded – Why the route to a destination prefix is flapping – Why DNS queries are failing with high probability – Why a route processor has high CPU utilization – Why a customer cannot reach certain Web sites • Requirements – Network-wide view of many protocols and systems – Diverse measurements at different protocol levels – Thresholds for isolating significant phenomena
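As an illustration of a threshold for isolating significant phenomena, consider per-interval link load samples, e.g., from five-minute SNMP polls. The sketch below is minimal and applies a hypothetical rule of thumb. It flags a sample when it exceeds the mean of the previous `window` samples by more than k standard deviations. Choosing `window` and `k` is exactly the threshold-tuning problem the slide alludes to. The values here are illustrative, not recommendations.

```python
import statistics

def flag_anomalies(series, window=12, k=3.0):
    """Hypothetical rule: flag index i if series[i] > mean + k * stdev of the previous `window` samples."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = statistics.fmean(history)
        sigma = statistics.pstdev(history)
        if series[i] > mu + k * sigma:
            flagged.append(i)
    return flagged

# Hypothetical link load (Mbit/s) per five-minute interval, with one sudden spike
series = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 100, 250, 101]
print(flag_anomalies(series))
```

After a spike enters the history window, it inflates both the mean and the deviation for the next `window` samples. A second anomaly in that period could therefore be missed, which is one reason real detectors are more careful about the baseline.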
Security • Detecting and diagnosing problems – Recognizing suspicious traffic or disruptions • Examples – Denial-of-service attack on a customer or service – Spread of a worm or virus through the network – Route hijack of an address block by an adversary • Requirements – Detailed measurements from multiple places – Including deep-packet inspection, in some cases – Online analysis of the data – Installing filters to block the offending traffic
Traffic Engineering • Adjusting resource allocation policies – Path selection, buffer management, and link scheduling • Examples – OSPF weights to divert traffic from congested links – BGP policies to balance load on peering links – Link-scheduling weights to reduce delay for “gold” traffic • Requirements – Network-wide view of the traffic carried in the backbone – Timely view of the network topology and configuration – Accurate models to predict impact of control operations (e.g., the impact of RED parameters on TCP throughput)
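OSPF routes each packet along the path with the smallest sum of link weights. Raising one link's weight can therefore move traffic away from it. The sketch below runs Dijkstra's algorithm on a hypothetical four-router network before and after such a change. It simplifies real OSPF in two ways: it returns a single path where OSPF would split traffic across equal-cost paths, and it sets both directions of a link together. The function names are hypothetical, and the destination is assumed to be reachable.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over link weights; graph maps node -> {neighbor: weight}. Assumes dst is reachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def set_weight(graph, a, b, w):
    """Hypothetical helper: OSPF weights are per direction; both are set here for simplicity."""
    graph[a][b] = w
    graph[b][a] = w

net = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 1},
    "D": {"B": 1, "C": 1},
}
print(shortest_path(net, "A", "D"))   # ['A', 'B', 'D'] (cost 2, vs. 3 via C)
set_weight(net, "B", "D", 5)           # make the congested B-D link less attractive
print(shortest_path(net, "A", "D"))   # ['A', 'C', 'D']
```

A weight change moves every flow whose shortest path crossed the link, not just the one in this example. That is why the slide asks for a network-wide traffic view and models that predict the impact before the change is made.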
Capacity Planning • Deciding whether to buy/install new equipment – What? Where? When? • Examples – Where to put the next backbone router – When to upgrade a link to higher capacity – Whether to add/remove a particular peer – Whether the network can accommodate a new customer – Whether to install a caching proxy for cable modems • Requirements – Projections of future traffic patterns from measurements – Cost estimates for buying/deploying the new equipment – Model of the potential impact of the change (e.g., latency reduction and bandwidth savings from a caching proxy)
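A simple projection of future traffic from measurements fits a trend to past utilization. It then asks when that trend crosses an upgrade threshold. The sketch below is minimal and assumes linear growth in monthly peak utilization. The data, the 80% threshold, and the function names are all hypothetical. If growth looks multiplicative instead, fitting the line to the logarithm of utilization is a common alternative.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept). Needs at least two distinct xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def months_until(threshold, utilization):
    """Hypothetical helper: month index at which the linear trend reaches `threshold`,
    or None if the trend is flat or falling."""
    months = list(range(len(utilization)))
    slope, intercept = fit_line(months, utilization)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope

util = [0.40, 0.43, 0.46, 0.49, 0.52, 0.55]   # hypothetical monthly peak utilization (months 0 to 5)
print(months_until(0.80, util))              # about 13.3, i.e., roughly 8 months after the last sample
```

The projection only says *when* under the assumed trend. The cost and impact models listed on the slide are still needed to decide *whether* and *where*.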
Examples of Public Data Sets • Network-wide data – Abilene and GEANT backbones – Netflow, IGP, and BGP traces • CAIDA DatCat – Data catalogue maintained by CAIDA – http://imdc.datcat.org/ • Interdomain routing – RouteViews and RIPE-NCC – BGP routing tables and update messages • Traceroute and looking glass servers – http://www.traceroute.org/ – http://www.nanog.org/lookingglass.html
Discussion • How important is accuracy of the data? • How can we validate measurement studies? (If we know the answer already, why are we measuring?) • How to do controlled experiments with measurement techniques? • Can we move measurement to a science rather than an art? • Can we identify incentives for making measurement possible and data available? • Distributed analysis of measurement data? • An architecture for router or line-card support for traffic and performance measurement? • Trade-offs between security and privacy?