53ad540e9ae903c8b810a99072957d62.ppt
- Number of slides: 38
Rethinking the Internet Architecture: Process, Architecture, and Troubleshooting. Scott Shenker (joint work with many people, including Katerina Argyraki, Hari Balakrishnan, David Cheriton, Petros Maniatis, Ion Stoica, Mike Walfish) 1
Process Why are we doing this, anyway? 2
Why the Clean Slate Mania? • Internet in crisis? - lack of functionality not a crucial problem - lack of reliability is most important problem • Research community in crisis? - little practical impact on architecture - narrowed focus, stopped asking the big questions • NSF’s response: FIND and GENI - but not enough by itself... 3
You Can Lead an Academic to Architecture, but... • Normal academic behavior won’t produce architecture - Publication requires differentiation and/or indifference - Architecture comes from critique and synthesis • work on ideas other than your own... • Can’t just design, simulate and abandon - must also experiment and deploy... - ...then discuss and synthesize • Process change harder than technical issues - adoption is much harder than both! 4
Some Thoughts on Architecture material covered in several papers (apologies to those who have heard all this before) not comprehensive architecture, many issues ignored 5
What’s Wrong with the Internet? • Internet is everywhere, used for (almost) everything • Main limiting factor seems to be lack of reliability - can’t do telesurgery, air traffic control, etc. • Hard to improve reliability of packet delivery within current architecture • Vulnerable to attacks, misconfigurations and failures 6
Packet Delivery Problems • Access link failures - multihome • Routing failures - security, policy, configuration, convergence, multipath, ... • Congestion control failures - FQ, XCP, RCP, ... • DoS - default-off, capabilities, filters, ... 7
Packet Delivery Problems • Technical solutions are largely at hand - not perfect, but huge improvement over status quo • No overarching synthetic architecture has emerged - symptom of process failure, or just too early? • But packet delivery won’t be the focus of this talk... - because only experts see it as the major problem 8
Normal User’s Perspective Other forms of failure dominate: • out-of-date email addresses • broken links • misleading URLs and/or inauthentic data • applications blocked by NATs, etc. • email unusable or unreliable due to spam • ... 9
Why? Three Important Changes... 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam 10
Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam 11
Not just host-oriented apps... • Of course, packets always flow from host to host - modulo middleboxes... • But which host are the packets sent to? • This is controlled by what hostname is used • So adjusting to data-oriented apps involves reevaluating the Internet naming system - data, service specified by host/path pair 12
Problems with host/path names • Data movement causes broken links - names should be persistent • Replication unnecessarily difficult - Akamai expensive, and can’t replicate at object granularity - Google, P2P, etc. do this now... • DNS names lead to legal/political battles - increasingly important, witness ICANN debacle • Names don’t facilitate authentication - can’t easily verify that data originated with intended source 13
Fix #1: Name Data/Services Directly • Network locations: IP addresses • Hosts: endpoint identifiers (EIDs) • Data/Services: service identifiers (SIDs) - direct naming supports fine-grained migration/replication • User-level descriptors: - search terms - canonical names (AOL keywords) - ... 14
Fix #2: Use Names in Appropriate Layer • User-level descriptors (e.g., search): app-specific search/lookup returns SIDs • Application (app session): resolves SID to EID, opens transport conns • Transport: binds to EID (HIP), resolves EID to IP • IP • Packet structure: IP hdr, EID, TCP, SID, … 15
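The layering on this slide can be sketched as two separate lookups, one per layer. This is only an illustration under stated assumptions: the tables, the identifier strings, and the function names (`resolve_sid`, `resolve_eid`, `locate`) are hypothetical and do not come from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical resolution tables; every identifier here is made up for illustration.
SID_TO_EIDS: Dict[str, List[str]] = {
    "sid:annual-report": ["eid:host-a", "eid:host-b"],  # object replicated on two hosts
}
EID_TO_IP: Dict[str, str] = {
    "eid:host-a": "192.0.2.10",
    "eid:host-b": "198.51.100.7",
}


def resolve_sid(sid: str) -> List[str]:
    """Application/session layer: map a service identifier to endpoint identifiers."""
    return SID_TO_EIDS.get(sid, [])


def resolve_eid(eid: str) -> str:
    """Transport layer: map an endpoint identifier to its current IP address."""
    return EID_TO_IP[eid]


def locate(sid: str) -> List[Tuple[str, str]]:
    """Chain both lookups; each layer only handles the name type it binds to."""
    return [(eid, resolve_eid(eid)) for eid in resolve_sid(sid)]
```

Because the SID never embeds a host or address, moving a host only changes an `EID_TO_IP` entry, and adding a replica only changes a `SID_TO_EIDS` entry.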
Fix #3: Names Should be Flat! 0xf436f0ab527bac9e8b100afeff394300 • A name can be persistent if and only if it doesn’t embed any mutable information about its referent • Flat names embed no information, so they can be used to persistently name anything - Enables inter-domain migration, etc. • Once you have a large flat namespace, you never need other global handles - no distinction between EIDs, SIDs, etc. 16
Disadvantages of Flat Names • Hard to resolve • No local control • No locality • Not human friendly. All can be handled, but flat names do require new resolution infrastructure 17
Fix #4: Make Names Self-certifying • Name = Hash(pubkey, salt) • Value = <pubkey, salt, data, signature> - can verify name related to pubkey and pubkey signed data • Can receive data from caches or other 3rd parties without worry - much more opportunistic data transfer 18
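The two checks on this slide (the name matches the public key, and the public key signed the data) might look roughly like the sketch below. Assumptions, none of which are in the talk: SHA-256 as the hash, plain concatenation of `pubkey` and `salt`, and a caller-supplied `check_sig` function standing in for a real signature scheme, since the Python standard library has no public-key signatures.

```python
import hashlib
from typing import Callable, NamedTuple


class Value(NamedTuple):
    pubkey: bytes
    salt: bytes
    data: bytes
    signature: bytes


def make_name(pubkey: bytes, salt: bytes) -> str:
    """Name = Hash(pubkey, salt). SHA-256 and simple concatenation are assumptions."""
    return hashlib.sha256(pubkey + salt).hexdigest()


def verify(name: str, value: Value,
           check_sig: Callable[[bytes, bytes, bytes], bool]) -> bool:
    """Return True only if the name binds to the key and the key signed the data."""
    # Check 1: the flat name really is the hash of this pubkey and salt.
    if make_name(value.pubkey, value.salt) != name:
        return False
    # Check 2: delegate to a real signature scheme (hypothetical callback).
    return check_sig(value.pubkey, value.data, value.signature)
```

Since both checks need only the name and the value itself, the receiver does not have to trust whichever cache or third party delivered the value.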
Proposed Naming System • Flat, self-certifying identifiers for all entities • Used in “layered” fashion so that each protocol binds to the correct level of abstraction • Names are persistent, verifiable, and support easy replication and migration • Requirement: industrial-strength flat name resolver - names, key revocation (later, another use) 19
Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam 20
Not just end-to-end... • Middleboxes provide important functionality - NATs, firewalls, proxies, caches, app accelerators, etc. • But processing between endpoints violates pure end-to-end religion, and causes many practical problems - e.g., NATs interfere with many applications • How can architecture support middleboxes better? - eliminate problems and make them architecturally sound 21
Delegation via Resolution • Names usually resolve to “location” of entity • Delegation principle: A network entity should be able to direct resolutions of its name not only to its own location, but also to chosen delegates • Semantics: - “where am I” → “where should packets be sent to reach me” • This allows packets to be directed towards middleboxes in a clean and coherent manner 22
Architecturally-Sound Middleboxes (contrast: Current (Bad) Middleboxes) [Figure: firewall example. Destination EID d; its resolution mapping points to ipf instead of ipd; packet structure is IP hdr (ipf), EID hdr (d), TCP hdr; the firewall holds the mapping EID d → IP ipd.] • Delegate can be anywhere, not necessarily on path • Can apply to app-layer middleboxes • Including SID, EID in packet is crucial 23
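The firewall example can be sketched as follows: destination d publishes the firewall's address as the resolution of its own EID, and the firewall uses the EID carried in the packet to forward to d. The `Packet` class, the two tables, and the `allow` policy callback are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Packet:
    dst_ip: str    # IP header: where the packet is sent next
    dst_eid: str   # EID header: who the packet is ultimately for
    payload: bytes


# Public resolver: d has delegated its name to the firewall f (illustrative values).
PUBLIC_RESOLVER: Dict[str, str] = {"eid-d": "ip-f"}
# Private mapping held only by the firewall: where d really is.
FIREWALL_TABLE: Dict[str, str] = {"eid-d": "ip-d"}


def send(dst_eid: str, payload: bytes) -> Packet:
    """Sender resolves the EID and addresses the packet to whatever comes back."""
    return Packet(PUBLIC_RESOLVER[dst_eid], dst_eid, payload)


def firewall_forward(pkt: Packet,
                     allow: Callable[[Packet], bool]) -> Optional[Packet]:
    """Delegate applies its policy, then readdresses using the EID in the packet."""
    if not allow(pkt):
        return None  # dropped by policy
    return Packet(FIREWALL_TABLE[pkt.dst_eid], pkt.dst_eid, pkt.payload)
```

The sketch also shows why the slide calls the EID in the packet crucial: without `dst_eid`, the firewall could not tell which protected host a packet addressed to `ip-f` was meant for.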
Possible Impacts • More general services: more complex services (like Riverbed, transcoding, etc.) can fit within framework • Remote services, not boxes: since middleboxes need not be on-path, services like firewalls, virus-scanners, etc. can be provided as remote services • Rethinking transport: with intermediaries between endpoints, basic notion of the transport layer should be rethought, combining ideas from DTN, DOT, etc. 24
Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam 25
Restraining Usage • Can’t be at packet level, must be app-dependent • But don’t want separate mechanism for each app - Email, IM, wiki, etc. • Proposal: quota system - quotas allocated in application-dependent manner - quotas enforced through single mechanism • stamp for each usage, canceled through mechanism • see NSDI 06 paper for details... • Uses flat name resolution 26
Summary: Other Forms of Failure... • broken links and pointers: persistent names • inauthentic data: self-certifying names • applications blocked by NATs, etc.: delegation • spam and other clutter: quota enforcement No change to IP or routers! 27
Troubleshooting and Debugging because things inevitably fail... 28
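The split between application-dependent allocation and a single enforcement mechanism can be illustrated with a toy stamp scheme. This is not the design from the NSDI paper the slide cites; the class names, the random stamp format, and the in-memory sets are all assumptions made for this sketch.

```python
import secrets
from typing import Dict, Optional, Set


class QuotaAllocator:
    """Application-dependent side: decides how many stamps each sender gets."""

    def __init__(self, quota_per_sender: int) -> None:
        self.quota = quota_per_sender
        self.issued_count: Dict[str, int] = {}
        self.valid_stamps: Set[str] = set()

    def issue(self, sender: str) -> Optional[str]:
        used = self.issued_count.get(sender, 0)
        if used >= self.quota:
            return None  # sender has exhausted its quota
        self.issued_count[sender] = used + 1
        stamp = secrets.token_hex(16)  # unguessable flat identifier
        self.valid_stamps.add(stamp)
        return stamp


class Enforcer:
    """Shared side: cancels each stamp on first use, whatever the application."""

    def __init__(self, allocator: QuotaAllocator) -> None:
        self.allocator = allocator
        self.cancelled: Set[str] = set()

    def accept(self, stamp: str) -> bool:
        if stamp not in self.allocator.valid_stamps or stamp in self.cancelled:
            return False  # forged or already used
        self.cancelled.add(stamp)
        return True
```

In this toy version the cancelled set is keyed by a flat stamp identifier, which hints at why such a mechanism could sit on top of a flat name resolver.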
User’s Perspective • Want to know who to yell at - identify responsible entity (at appropriate granularity) • Want their complaints to be taken seriously - provide credible and actionable report • Want the problem fixed, now - detailed diagnostic tools - this is traditional focus of troubleshooting 29
User’s Perspective • Want to know who to yell at - identify responsible entity (at appropriate granularity) • Want their complaints to be taken seriously - provide credible and actionable reports • Want the problem fixed - detailed debugging tools - this is traditional focus of work in this area 30
Vision • Incorporate coherent set of monitoring tools into architecture that: - record necessary information - process information to answer relevant questions • Key points: - not just statistics (e.g., Netflow), but answers - focus broader than just detailed diagnostics • Three examples 31
Ex. #1: Monitoring ISPs • Monitor boxes on peering links record packet digests - no internal information revealed • Boxes exchange information to determine where packets are dropped and/or delayed • Information ends up at source ISP or end user • Overhead: ~2-4% of packet bandwidth • Can be applied within enterprises, etc. 32
Ex. #2: Multilayer Tracing • Traceroute is useful, but limited to IP • XTrace (just started) is a generalized version: - operates at multiple layers - follows recursive packet generation (DNS queries, etc.) - can implement policies about when to respond • Requirements: - layer must be able to handle and propagate metadata - module on box to intercept and report on packets 33
Ex. #3: Distributed Debugging • When bugs occur in operation, they can be extremely difficult to locate and reproduce • We are developing liblog, a log-and-replay debugging tool (early) that is always turned on • Lots of log-and-replay debuggers, ours meets a special set of requirements... (not described here) 34
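A minimal sketch of the digest comparison, assuming two monitor boxes at the ingress and egress of one ISP with synchronized clocks. The truncated SHA-256 digest, the `Monitor` class, and the `max_delay` threshold are illustrative choices, not details from the talk.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(packet: bytes) -> str:
    """Short fingerprint of a packet; reveals nothing about the ISP's internals."""
    return hashlib.sha256(packet).hexdigest()[:16]


class Monitor:
    """A box on a peering link that records (digest, timestamp) for each packet."""

    def __init__(self) -> None:
        self.seen: Dict[str, float] = {}

    def record(self, packet: bytes, timestamp: float) -> None:
        self.seen[digest(packet)] = timestamp


def compare(ingress: Monitor, egress: Monitor,
            max_delay: float) -> Tuple[List[str], List[str]]:
    """Return digests dropped inside the ISP and digests delayed beyond max_delay."""
    dropped = sorted(d for d in ingress.seen if d not in egress.seen)
    delayed = sorted(
        d for d, t_in in ingress.seen.items()
        if d in egress.seen and egress.seen[d] - t_in > max_delay
    )
    return dropped, delayed
```

Only digests and timestamps cross the boundary between boxes, which is consistent with the slide's point that no internal information needs to be revealed.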
Logging and Replay 1. Each process logs its execution to a local file 2. Logs are collected at central location and replayed [Figure: on Nodes 1-3, each app runs with liblog and writes a local log (Log 1-3); a replay node replays each app/liblog instance under GDB, driven from a single GDB console] 35
Extensions • liblog generates too much data - hard to sift through for large systems • Next step: setting global watchpoints and breakpoints • Can specify in terms of general expressions (Python) - routing loops, state inconsistencies, etc. • No operational experience yet 36
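The general log-and-replay idea can be shown in a few lines: record the result of every nondeterministic call during the live run, then feed the same results back during replay. This is a generic sketch of the technique, not liblog's interface; `Recorder`, `Replayer`, and `app` are hypothetical names.

```python
import random
from typing import Any, Callable, Iterator, List


class Recorder:
    """Live run: perform each nondeterministic call and log its result."""

    def __init__(self) -> None:
        self.log: List[Any] = []

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        result = fn(*args)
        self.log.append(result)
        return result


class Replayer:
    """Replay run: skip the real call and return the logged result instead."""

    def __init__(self, log: List[Any]) -> None:
        self._results: Iterator[Any] = iter(log)

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        return next(self._results)


def app(env: Any) -> int:
    """Toy application whose only nondeterminism goes through env.call."""
    a = env.call(random.randint, 1, 6)
    b = env.call(random.randint, 1, 6)
    return a + b
```

Running `app(Recorder())` once and then `app(Replayer(recorder.log))` reproduces the same execution, which is what makes it possible to step through a past failure in a debugger.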
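A global watchpoint expressed as a Python predicate might look like the sketch below, which scans replayed state snapshots for a routing loop. It assumes each snapshot is a dict mapping a node to its next hop toward one destination; that representation and both function names are assumptions for illustration, not part of liblog.

```python
from typing import Callable, Dict, List, Optional


def has_routing_loop(next_hop: Dict[str, str], dest: str) -> bool:
    """True if following next hops from any node revisits a node before dest."""
    for start in next_hop:
        visited = set()
        node = start
        while node != dest and node in next_hop:
            if node in visited:
                return True  # came back to a node already on this path
            visited.add(node)
            node = next_hop[node]
    return False


def first_violation(snapshots: List[Dict[str, str]],
                    predicate: Callable[[Dict[str, str]], bool]) -> Optional[int]:
    """Index of the first replayed snapshot where the watchpoint fires, else None."""
    for i, snap in enumerate(snapshots):
        if predicate(snap):
            return i
    return None
```

A debugger could stop replay at the returned index, so the user inspects the system at the moment the global condition first became true rather than sifting through every log.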
Troubleshooting and Debugging • Automated end-user reporting tools would be useful to both users and ISPs - lots of low-hanging fruit • Not clear ISPs will take the lead on troubleshooting - ISPs may not be eager to admit fault - but they should be eager to reduce phonebank expenses • Experience needed with distributed debugger in networking context 37
Summary • Biggest challenge is to get community talking to each other rather than past each other • Reliability more pressing than functionality - have tools to provide better packet delivery - then considered wider set of failure modes - can handle without IP/router involvement • Troubleshooting should be part of “architecture” - nowhere near coherent yet - looking for basic building blocks 38