0fcd36cd6b0d99d4e689f88a048639cd.ppt
- Number of slides: 27
SLAC IEPM PingER and BW monitoring & tools
Presented by Les Cottrell, SLAC
At LBNL, Jan 21, 2003
www.slac.stanford.edu/grp/scs/net/talk03/lbl-jan04.ppt

History of the PingER Project
• Early 1990s: SLAC begins pinging nodes around the world to evaluate the quality of Internet connectivity between SLAC and other HEP institutions.
• Around 1996: The PingER project was funded, making it the first Internet end-to-end monitoring tool available to the HEP community.
• Today: Believed to be the most extensive Internet end-to-end performance monitoring tool in the world.

PingER Today
• Today, the PingER Project includes 35 Monitoring-hosts in 12 countries. They are monitoring Remote-hosts in 80 countries. Over 55 remote sites.
• These countries cover 75% of the world population and 99% of the Internet-connected population.
• Just added Pakistan!
[Map: countries with remote PingER hosts are colored, by region]

PingER Architecture
There are three types of hosts:
• Remote-hosts: hosts being monitored
• Monitoring-hosts: make ping measurements to remote hosts
• Archive/Analysis-hosts: gather data from Monitoring-sites, analyze & make reports
[Diagram: an Archive host, Monitoring hosts, and several Remote hosts]

Methodology
• Every 30 mins send 11 * 100 byte followed by 10 * 1000 byte pings from monitor to remote host
• Low impact:
  – By default < 100 bits/s per monitor-remote host pair
  – Can reduce to ~10 bits/s
  – No need for co-scheduling of monitors
• Uses ubiquitous ping
  – No software to install at any of over 500 remote hosts
  – Very important for hosts in developing countries
• By centrally gathering the data, archiving, analyzing and reporting, the requirements for monitoring hosts are minimal (typically 1-2 days to install etc.)

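The "< 100 bits/s" figure can be checked with a little arithmetic. The sketch below is only an estimate under stated assumptions: it treats the quoted 100 and 1000 bytes as the full packet size (no extra header overhead is added unless `header_bytes` is set) and counts the echo replies as well as the requests. The function and parameter names are illustrative, not part of PingER.

```python
def ping_load_bits_per_s(batches, interval_s, header_bytes=0, count_replies=True):
    """Average load in bits/s for ping batches sent once per interval.

    batches: list of (packet_count, packet_bytes) tuples.
    header_bytes: optional per-packet overhead (assumption; 0 by default).
    count_replies: if True, each echo request is doubled by its echo reply.
    """
    total_bytes = sum(n * (size + header_bytes) for n, size in batches)
    if count_replies:
        total_bytes *= 2
    return total_bytes * 8 / interval_s


# Default PingER batch from the slide: 11 x 100 bytes, then 10 x 1000 bytes,
# repeated every 30 minutes.
default_batches = [(11, 100), (10, 1000)]
both_ways = ping_load_bits_per_s(default_batches, interval_s=30 * 60)
one_way = ping_load_bits_per_s(default_batches, interval_s=30 * 60, count_replies=False)
print(f"requests + replies: {both_ways:.1f} bits/s, requests only: {one_way:.1f} bits/s")
```

Under these assumptions the requests plus replies average just under 100 bits/s, which is consistent with the bound quoted on the slide.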
Worldwide performance
• Performance is improving
• Developed world improving by a factor of 10 in 4-5 years
• S.E. Europe and Russia are catching up
• India & Africa are worse off & falling behind
• Developing world is 3-10 years behind
• Many institutes in the developing world have poorer performance than a household in N. America or Europe

Current State – Aug '03 (throughput Mbps)
[Table of throughput by monitoring country and remote region]
• Within-region performance is better
  – E.g. Ca|EDU|GOV-NA, Hu-SE Eu, Eu-Eu, Jp-E Asia, Au-Au, Ru-Ru|Baltics
• Africa, Caucasus, Central & S. Asia all bad
Legend:
  Bad:        < 200 kbits/s (< DSL)
  Poor:       > 200, < 500 kbits/s
  Acceptable: > 500, < 1000 kbits/s
  Good:       > 1000 kbits/s

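The legend's thresholds translate directly into a small classifier. This is a minimal sketch: the slide does not say which category a value of exactly 200, 500 or 1000 kbits/s falls into, so assigning each boundary value to the higher category is an assumption made here, and the function name is hypothetical.

```python
def classify_throughput(kbits_per_s):
    """Map a throughput in kbits/s to the legend's category.

    Assumption: boundary values (200, 500, 1000) go to the higher category.
    """
    if kbits_per_s < 200:
        return "Bad"
    if kbits_per_s < 500:
        return "Poor"
    if kbits_per_s < 1000:
        return "Acceptable"
    return "Good"


for value in (150, 350, 750, 2500):
    print(value, "kbits/s ->", classify_throughput(value))
```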
Network Readiness Index vs Throughput
• NRI from Center for International Development, Harvard U. http://www.cid.harvard.edu/cr/pdf/gitrr2002_ch02.pdf
• Throughput correlates reasonably well with the Network Readiness Index
[Plot annotations: "A&R focus", "Internet for all focus"]
NRI Top 14:
  Finland      5.92
  US           5.79
  Singapore    5.74
  Sweden       5.58
  Iceland      5.51
  Canada       5.44
  UK           5.35
  Denmark      5.33
  Taiwan       5.31
  Germany      5.29
  Netherlands  5.28
  Israel       5.22
  Switzerland  5.18
  Korea        5.10

Typical uses
• Troubleshooting
  § Discerning if a reported problem is network related
  § Identifying the time a problem started
  § Providing quantitative analysis for network specialists
  § Identifying step functions and periodic network behavior, and recognizing problems affecting multiple sites
  § Setting expectations (e.g. SLAs)
  § Identifying need to upgrade
  § Providing quantitative information to policy makers & funding agencies
  § Seeing the effects of upgrades

Pakistan performance
[Plots of loss (%) and RTT (ms); panel labels: NIIT/Rawalpindi, Islamabad, Lahore, Karachi, Rawalpindi]
Routes annotated on the plots:
• ESnet (hops 3-8) - DC ATT (9-21) - Karachi
• ESnet (hops 3-6) - SNV SINGTEL (7-12) - Karachi Pakistan Telecom (annotated on two panels)

NIIT performance from U.S. (SLAC)
Preliminary results; measurements started end of Dec 2003.
Ping RTT & loss:
• NB: heavy losses during congested daytimes
• Avg daily: loss ~1-2%, RTT ~320 ms
Bandwidth measurements using packet-pair dispersion & TCP:
• ABW (pkt-pair dispersion), average: to NIIT ~350 Kbits/s; from NIIT 365 Kbits/s
• Iperf/TCP, average: to NIIT ~320 Kbits/s; from NIIT 40 Kbits/s
• Can also derive throughput (assuming standard TCP) from RTT & loss using:
  BW ~ 1.2 * S(1460 B) / (RTT * sqrt(loss)) ~ 260 Kbits/s
• Nominal path bottleneck capacity: 1 Mbits/s

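The RTT-and-loss formula on this slide is easy to evaluate in code. The sketch below implements exactly BW ~ 1.2 * S / (RTT * sqrt(loss)) with S in bytes, RTT in seconds and loss as a fraction; the function name and the unit conversions are choices made here. The slide does not state the exact RTT and loss values behind its ~260 Kbits/s figure, so the example does not try to reproduce that number.

```python
import math


def tcp_throughput_estimate_bits_per_s(segment_bytes, rtt_s, loss_fraction, constant=1.2):
    """Estimate standard-TCP throughput from RTT and loss.

    Implements BW ~ constant * S / (RTT * sqrt(loss)), returning bits/s.
    loss_fraction must be > 0 (the formula is undefined for zero loss).
    """
    if loss_fraction <= 0 or rtt_s <= 0:
        raise ValueError("rtt_s and loss_fraction must be positive")
    return constant * segment_bytes * 8 / (rtt_s * math.sqrt(loss_fraction))


# Illustrative inputs taken from the slide's daily averages (RTT ~320 ms, loss ~1-2%).
for loss in (0.01, 0.02):
    bw = tcp_throughput_estimate_bits_per_s(1460, 0.320, loss)
    print(f"loss={loss:.0%}: ~{bw / 1000:.0f} kbits/s")
```

Because throughput scales with 1/sqrt(loss), the estimate is sensitive to which loss value is used: halving the loss raises the estimate by about 40%.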
In Summary
PingER provides ongoing support for monitoring and maintaining the quality of Internet connectivity for the worldwide scientific community.
Information is available publicly on the web: http://www-iepm.slac.stanford.edu/cgi-wrap/pingtable.pl
PingER also quantifies the extent of the “Digital Divide” and provides information to policy makers and funding agencies.

IEPM-BW
• Need something for high-performance links
  – 10 pings/30 mins, i.e. minimum measurable loss = 0.21% in a day, or 0.007% in a month (10^-8 BER)
  – Today's better links exceed this
  – Ping losses may not be like TCP losses
• Need for Grid, HENP applications and high-performance network connections
  – Set expectations, planning
  – Trouble-shooting, improving performance
  – Application steering
  – Testing new transports (e.g. FAST, HS-TCP, RBUDP, UDT), applications, monitoring tools (e.g. QIperf, packet-pair techniques …) in production environments
  – Compare with passive measurements, advertised capacities

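The loss-resolution figures above follow from counting how many pings are sent in a period: the smallest non-zero loss that can be observed is one lost packet out of all packets sent. The sketch below works through that arithmetic. It assumes a 30-day month and, for the bit-error-rate comparison, that one lost 1000-byte ping is counted as a single bit error; both are assumptions made here, and the function name is illustrative.

```python
def min_measurable_loss(pings_per_batch, batch_interval_min, days):
    """Smallest non-zero loss fraction observable: 1 lost ping out of all sent."""
    batches = days * 24 * 60 / batch_interval_min
    return 1.0 / (pings_per_batch * batches)


per_day = min_measurable_loss(10, 30, days=1)
per_month = min_measurable_loss(10, 30, days=30)  # assumption: 30-day month
# Assumption: one lost 1000-byte (8000-bit) packet treated as one bit error.
equivalent_ber = per_month / (1000 * 8)

print(f"per day:   {per_day:.2%}")
print(f"per month: {per_month:.4%}")
print(f"equivalent BER: {equivalent_ber:.1e}")
```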
Methodology
• Monitoring host every 90 minutes (+/- randomization) cycles through collaborating hosts at several remote sites:
  – Sends active probes in turn for: bbftp, gridtcp, bbcp, iperf1, iperf, (qiperf), ping, abwe …
• Also measures traceroutes at 15 min intervals
• Uses ssh for code deployment, management and to start & stop servers remotely
  – Deploy server code for iperf, ABwE, bbftp, GridFTP & various utilities
• 10 monitoring sites, each with between 2 and 40 remote hosts monitored
  – Main users SLAC (BaBar) & FNAL (D0, CDF, CMS)
• Data archived, analyzed, displayed at monitoring hosts

Deployment
[Map, Aug '02. Legend: Monitor; HENP; Net research; 100 Mbits/s host; Gbits/s host; 125 measured bw]

Visualization
• Time series:
  – Overplot multiple metrics
  – + route changes
  – Zoom, history
  – Choose individual metrics
• Scatter plots
• Histograms
• Access to data

Traceroutes
• Analyse for unique routes, assign route #s
• Display route # at start, then “.” if no change
• If significant change, then display route # in red
• Links to:
  – History
  – Reverse
  – Single host
  – Raw data
  – Summary for emailing
  – Available BW
  – Topology
[Demo table: one row per host, one column per hour of day; annotation: several routes change simultaneously]

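The route-numbering display described above can be sketched in a few lines. This is an illustration only, not the IEPM code: routes are represented as lists of hop names, route numbers are assigned in order of first appearance, and the function names are hypothetical. The slide does not define what counts as a "significant" change, so the red highlighting is not modeled.

```python
def assign_route_numbers(traceroutes):
    """Give each distinct hop sequence a small integer, in order of first appearance."""
    route_ids = {}
    numbers = []
    for hops in traceroutes:
        key = tuple(hops)
        if key not in route_ids:
            route_ids[key] = len(route_ids)
        numbers.append(route_ids[key])
    return numbers


def render_row(numbers):
    """Show the route number when it differs from the previous one, else '.'."""
    cells = []
    previous = None
    for n in numbers:
        cells.append(str(n) if n != previous else ".")
        previous = n
    return " ".join(cells)


# Five successive traceroutes to one host (made-up hop names).
samples = [
    ["r1", "r2", "dst"],
    ["r1", "r2", "dst"],
    ["r1", "r3", "dst"],
    ["r1", "r3", "dst"],
    ["r1", "r2", "dst"],
]
print(render_row(assign_route_numbers(samples)))
```

Keeping a dictionary from hop sequence to number means a route that reappears later gets its original number back, which makes flapping between two routes easy to spot in a row.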
Topology
• Select times & hosts & direction on table
• Mouse over to see router name
• Click on router to see sub-path below
• Colored by deduced AS
• Click on end nodes to see names of all hops

Performance (ABwE)
• Requires ABwE server (mirror) at remote sites
• Gets performance for both directions
• Low impact: 40 * 1000 byte packets
• Less than a second for result
• Can do “real-time” performance monitoring
[Plot over 24 hours, in Mbits/s: current bottleneck capacity (usually limited by 100 FE), available bandwidth, cross-traffic, and Iperf (90 m)]

ABwE/Iperf match: Hadrian to UFL
[Plot annotations:]
• Heavy load (xtraffic) appeared
• It shows new DBC on the path
• CALREN shows sending traffic 600 Mbits/s
• Normal situation
• IPLS shows traffic 800-900 Mbits/s

Abing CLI
• Demo of the abing command line tool
  – Since it is low impact (40 * 1000 byte packets), it can be run like ping

Navigation
• MonALISA

• For ABwE: prediction, trouble shooting
• Working on auto detection of long-term (many minutes) step changes in bandwidth
  – Developed simple algorithm and qualifying effectiveness
  – Looking at NLANR (McGregor/H-W Braun plateau change detector)
    • http://www.ripe.net/pam2001/Abstracts/talk_03.html
  – Look at correlation between performance & route changes & RTT
  – For significant changes, gather: RTT, routes (fwd/rev, before & after if changed), NDT info, bandwidth info (fwd & rev)
  – Fold in diurnal changes
  – Generate real-time email alerts with filtering
[Demo; plot labels: Diurnal, Predictions]

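To make the idea of detecting long-term step changes concrete, here is a generic sketch. It is not the algorithm developed by the authors, nor the NLANR plateau detector, neither of which is described in these slides. It simply compares the median of a window of samples before each point with the median of the window after it and flags a change when they differ by more than a chosen fraction. The window size, threshold and function name are all assumptions.

```python
from statistics import median


def detect_step_changes(samples, window=6, threshold=0.3):
    """Return indices i where median(samples[i:i+window]) differs from
    median(samples[i-window:i]) by more than `threshold`, expressed as a
    fraction of the earlier median.
    """
    changes = []
    i = window
    while i + window <= len(samples):
        before = median(samples[i - window:i])
        after = median(samples[i:i + window])
        if before > 0 and abs(after - before) / before > threshold:
            changes.append(i)
            i += window  # skip ahead so a single step is reported once
        else:
            i += 1
    return changes


# Synthetic bandwidth series (Mbits/s): a drop from ~100 to ~50.
series = [100] * 10 + [50] * 10
print(detect_step_changes(series))
```

Using medians rather than means makes the comparison less sensitive to isolated outliers, which matters when short bursts of cross-traffic should not trigger an alert. Note that the reported index can fall a few samples before the true step, because the "after" window only needs a majority of post-step samples.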
Program API
• Not realistic to look at thousands of graphs
• Programs also want to look at data. E.g.
  – Data placement for replica servers
  – Analysis, visualization (e.g. MonALISA)
  – Trouble shooting
• Correlate data from many sources when a problem is suspected or spotted
• Publish the data in a standard way
• W3C Web Service, GGF OGSI Grid Service
  – Currently XMLRPC and SOAP servers
  – Using Network Measurement Working Group schema (NM-WG.xsd)
• Demo is mainly a proof of principle, to access IEPM single & multistream iperf, multistream GridFTP & bbftp, ABwE and PingER data
  – Not pushing deployment and use until schema is more solid

IEPM SOAP Client

    #!/usr/local/bin/perl -w
    use SOAP::Lite;

    # Host to query and the time period of interest
    my $node = "node1.cacr.caltech.edu";
    my $timePeriod = "20031201-20031205T143000";

    # Call the IEPM web service described by the WSDL
    my $measurement = SOAP::Lite
        ->service('http://www-iepm.slac.stanford.edu/tools/soap/wsdl/IEPM_profile.wsdl')
        ->GetBandwidthAchievableTCP("$node", "$timePeriod");

    # Print the destination name and address, then the timestamps and values
    print "Host=" . $measurement->{'subject'}->{'destination'}->{'name'}, "\n";
    print $measurement->{'subject'}->{'destination'}->{'address'}->{'IP'}, "\n";
    print "Times:\n" . $measurement->{'path.bandwidth.achievable.TCP'}
        ->{'timestamp'}->{'startTime'}, "\n";
    print "Values:\n" . $measurement->{'path.bandwidth.achievable.TCP'}
        ->{'achievableThroughputResult'}->{'value'}, "\n";

Results

    Host=node1.cacr.caltech.edu
    Not-disclosed
    Times:
    1070528106 1070533504 1070538907 1070544307 1070549706 1070555108 1070560505
    1070565907 1070571306 1070576706 1070582106 1070587506 1070592906 1070598310
    1070603706 1070609111 1070614506 1070619905 1070625306 1070630706 1070636106
    1070641508 1070646905 1070652306 1070657705
    Values:
    183.5 174.3 196.76 188.75 196.67 196.05 195.86 187.69 192.91 152.99 181.85
    193.03 190.21 190.54 168.71 166.79 196.17 172.1 183.77 194.44 195.84 194.01
    192.49 171.55 176.43

For more see: http://www-iepm.slac.stanford.edu/tools/web_services/
Demo: http://www-iepm.slac.stanford.edu/tools/soap/IEPM_client.html

For More Information
• PingER:
  – www-iepm.slac.stanford.edu/pinger/
• ICFA/SCIC Network Monitoring report, Jan 04
  – www.slac.stanford.edu/xorg/icfa-net-paper-jan04.html
• The PingER Project: Active Internet Performance Monitoring for the HENP Community, IEEE Communications Magazine on Network Traffic Measurements and Experiments.
• IEPM-BW
  – http://www-iepm.slac.stanford.edu/bw/
• ABwE: www-iepm.slac.stanford.edu/bw/abwe-cf-iperf.html and http://moat.nlanr.net/PAM2003papers/3781.pdf