  • Number of slides: 115

05398 Introduction to Communication, Lectures 3-5: The Transport Layer (Chapter 3 – Transport)
Prof. Amir Herzberg
Based on foils by Kurose & Ross ©, see: http://www.aw.com/kurose-ross/
My site: http://amirherzberg.com
Computer Networking: A Top Down Approach Featuring the Internet, 3rd edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2005.
Transport Layer 1

Today (& Chapter 3): outline
• 3.1 Transport-layer services
• 3.2 Multiplexing and demultiplexing
• 3.3 Connectionless transport: UDP
• 3.4 Principles of reliable data transfer
• 3.5 Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
• 3.6 Principles of congestion control
• 3.7 TCP congestion control

Processes & communication
Process: program running within a host.
• within same host, two processes communicate using inter-process communication (IPC)
  - Not in this course
• processes in different hosts communicate by exchanging messages
  - In this course
Client process: process that initiates communication
Server process: process that waits to be contacted; always on

Transport services and protocols
• communication between application processes (not just hosts)
• Run in end systems
  - send segments via network layer
  - rcv side: reassemble segments to messages
  - Pass to application via socket (buffer)
• Internet transport protocols: TCP and UDP
[Figure: logical end-to-end transport between two end systems (application, transport, network, data link, physical), across routers (network, data link, physical)]

Internet transport-layer protocols
• Connection: TCP
  - Reliable, in-order delivery
  - congestion control
  - flow control
  - connection setup
• Connection-less: UDP
  - Unreliable, unordered delivery
  - "best-effort", like IP
• Neither ensures:
  - delay guarantees
  - bandwidth guarantees
[Figure: logical end-to-end transport between two end systems (application, transport, network, data link, physical), across routers (network, data link, physical)]

Multiplexing of Packets within Host
• Network layer delivers packet to correct host (by IP address)
• Transport layer delivers packet to correct process
• How to identify process (in host)?
  - Use 16-bit port number (in Transport Layer header)
  - Multiplexing: adding port identifiers to packet handed to IP
  - Demultiplexing: deliver packets to correct apps (using ports in packet)
• Server: fixed port # <1024
  - So requests reach the server
  - HTTP server: 80
  - Mail server: 25
• Client: OS-assigned (`random`), dynamic port >1024
  - Multiple client processes can work with same server
• Process queues data of a connection/port via socket

Sockets and Sockets API
• processes send/receive messages via socket API
• API = Application Program Interface
• Socket: in/out buffer
  - Metaphor: mailbox
• Sockets API functions:
  - Send, receive
  - Connection (TCP, not UDP): open, close
  - Fix a few parameters, e.g. buffer size (more on this later)
  - Choose transport service…
[Figure: host (Client / Server): Application Layer process, socket, TCP/UDP; connected over the Internet to the peer's TCP/UDP]

How demultiplexing works
• host receives datagram (Internet layer packet)
  - Datagram carries (one) transport-layer segment
  - Segment has source, destination port numbers
• Use IP addresses & ports to identify socket
• How? Depends…
  - UDP: one socket per port
  - TCP server: one socket per connection
[Layout, 32 bits wide: IP header (source IP address, dest IP address, other Internet layer header fields); transport-layer segment (source port #, dest port #, other transport header fields, application data (message))]

NAPT/NAT: Network Address (Port) Translation
Goal: share IP addresses among multiple hosts
How: identify host by port
1. SrcIP=10.0.0.1 SrcPort=3373 DstPort=80
2. SrcIP=133.44.5.8 SrcPort=6678 DstPort=80
3. DstIP=133.44.5.8 SrcPort=80 DstPort=6678
4. DstIP=10.0.0.1 SrcPort=80 DstPort=3373

NAPT: Network Address/Port Translation
1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345  D: 128.119.40.186, 80
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001  D: 128.119.40.186, 80
3: Reply arrives, dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80  D: 138.76.29.7, 5001
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80  D: 10.0.0.1, 3345
NAT translation table
  WAN side addr        LAN side addr
  138.76.29.7, 5001    10.0.0.1, 3345
  ……                   ……
[Figure: NAT router with WAN address 138.76.29.7 and LAN address 10.0.0.4; LAN hosts 10.0.0.1, 10.0.0.2, 10.0.0.3]
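The four translation steps can be sketched as a small lookup table. This is only an illustration under assumptions: the class and method names (`NaptTable`, `translateOutgoing`, `translateIncoming`) are hypothetical, WAN ports are handed out sequentially, and entries never expire, which a real NAT router would not do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a NAPT translation table (not a real router API).
class NaptTable {
    private final String wanIp;
    private int nextWanPort; // assumption: WAN ports are allocated sequentially
    private final Map<String, Integer> lanToWanPort = new HashMap<>(); // "lanIp:lanPort" -> WAN port
    private final Map<Integer, String> wanPortToLan = new HashMap<>(); // WAN port -> "lanIp:lanPort"

    NaptTable(String wanIp, int firstWanPort) {
        this.wanIp = wanIp;
        this.nextWanPort = firstWanPort;
    }

    // Step 2: rewrite the source of an outgoing datagram, adding a table entry if needed.
    String translateOutgoing(String lanIp, int lanPort) {
        String lanAddr = lanIp + ":" + lanPort;
        Integer wanPort = lanToWanPort.get(lanAddr);
        if (wanPort == null) {
            wanPort = nextWanPort++;
            lanToWanPort.put(lanAddr, wanPort);
            wanPortToLan.put(wanPort, lanAddr);
        }
        return wanIp + ":" + wanPort;
    }

    // Step 4: rewrite the destination of an incoming datagram; null means no entry (drop).
    String translateIncoming(int wanPort) {
        return wanPortToLan.get(wanPort);
    }
}

class NaptDemo {
    public static void main(String[] args) {
        NaptTable nat = new NaptTable("138.76.29.7", 5001);
        System.out.println(nat.translateOutgoing("10.0.0.1", 3345)); // new source address
        System.out.println(nat.translateIncoming(5001));             // restored destination
    }
}
```

The two maps mirror the two columns of the translation table: one is consulted on the way out, the other on the way back in.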

UDP: Connectionless demultiplexing
• Create sockets with port numbers:
    DatagramSocket mySocket1 = new DatagramSocket(9111);
    DatagramSocket mySocket2 = new DatagramSocket();
• UDP socket identified by destination port (and IP)
• When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to socket with that port number
• Regardless of source IP and port

UDP: Connectionless demux (cont)
    DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: client IP: A sends SP: 9157, DP: 6428 and receives SP: 6428, DP: 9157; client IP: B sends SP: 5775, DP: 6428 and receives SP: 6428, DP: 5775; server IP: C; processes P1, P2, P3]
DP = Destination Port, SP = Source Port
Client's SP provides "return address" (server's DP)
Client's DP (destination port on server) known to client (e.g. standard)

TCP: Connection-oriented demux
• TCP socket identified by 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  - each socket identified by its own 4-tuple
• E.g., web servers have different sockets for each connecting client
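The contrast between the two demultiplexing rules can be sketched with two lookup tables. All names here (`DemuxTable`, `bindUdp`, `addTcpConnection`) are hypothetical, and the sketch assumes a host with a single local IP address, so the UDP lookup uses the destination port alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: UDP demux by destination port, TCP demux by 4-tuple.
class DemuxTable {
    private final Map<Integer, String> udpByDestPort = new HashMap<>();
    private final Map<String, String> tcpByFourTuple = new HashMap<>();

    private static String key(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort;
    }

    void bindUdp(int destPort, String socket) {
        udpByDestPort.put(destPort, socket);
    }

    void addTcpConnection(String srcIp, int srcPort, String dstIp, int dstPort, String socket) {
        tcpByFourTuple.put(key(srcIp, srcPort, dstIp, dstPort), socket);
    }

    // UDP: source IP and port are ignored when choosing the socket.
    String demuxUdp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return udpByDestPort.get(dstPort);
    }

    // TCP: all four values select the socket; null if no such connection exists.
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpByFourTuple.get(key(srcIp, srcPort, dstIp, dstPort));
    }
}

class DemuxDemo {
    public static void main(String[] args) {
        DemuxTable t = new DemuxTable();
        t.bindUdp(6428, "udp-server");
        t.addTcpConnection("A", 9157, "C", 80, "conn-A");
        t.addTcpConnection("B", 5775, "C", 80, "conn-B");
        // Both UDP clients reach the same socket; each TCP client gets its own.
        System.out.println(t.demuxUdp("A", 9157, "C", 6428));
        System.out.println(t.demuxUdp("B", 5775, "C", 6428));
        System.out.println(t.demuxTcp("A", 9157, "C", 80));
        System.out.println(t.demuxTcp("B", 5775, "C", 80));
    }
}
```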

TCP: Connection-oriented demux
[Figure: client IP: A sends SP: 9157, DP: 80 and receives SP: 80, DP: 9157; client IP: B sends SP: 5775, DP: 80 and receives SP: 80, DP: 5775; server IP: C; processes P1, P3, P4]
What is different here? Would this work well for web server?

TCP: Connection-oriented demux
Multithreading server may listen to multiple ports
[Figure: client IP: A sends SP: 9157, DP: 80 and receives SP: 80, DP: 9157; client IP: B sends SP: 5775, DP: 80 and receives SP: 80, DP: 5775; server IP: C; processes P1, P2, P3]

Internet transport layer services
TCP service:
• connection-oriented: setup required
  - TCP setup: one round trip
• reliable transport: no loss (but some overhead)
• flow control: feed (send) at receiver's speed
• congestion control: throttle sender when network overloaded
• does not provide: delay, bandwidth guarantees
UDP service:
• Datagram: no setup overhead
• Unreliable transport: loss
• does not provide:
  - Reliability
  - Flow/congestion control
  - Delay or bandwidth guarantees
Others (e.g. RDP)
Q: Would we ever choose UDP?

Choosing transport service…
Reliability, congestion, flow controls
• TCP: yes, UDP: no
• Overhead, slowdown
• Often required
  - E.g. file transfer, e-mail
  - Not always, e.g., audiocast
Setup overhead?
• TCP: yes, UDP: no
• Ignore in long-lived app
  - E.g. file transfer, telnet
• Sometimes significant
  - E.g. DNS queries
Bandwidth & Delay Guarantee
• Not in TCP, UDP
• Ok for "elastic" appl.
  - Use whatever bandwidth & delay they get
  - E.g. e-mail, Web
• Phone & other apps require minimal Quality of Service
Size and complexity
• Code size, complexity are sometimes critical (…UDP)
  - E.g., dust computer, sensor

UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" transport protocol
• "best effort" service, UDP segments may be:
  - lost
  - delivered out of order to app
• connectionless:
  - no handshaking
  - each UDP segment handled independently of others
Why is there a UDP?
• no connection setup (delay)
• simple: no connection state at sender, receiver
• no retransmission
• no congestion control: UDP can blast away as fast as desired (or more…)
• smaller segment header

UDP: header and more
• UDP often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS (why?)
  - SNMP – Simple Network Management Protocol
• UDP Congestion/flow control?
  - By application; flexible
  - Limited buffer per port [notes]
• Reliability over UDP?
  - Checksum detects (most) corruptions [hidden foils]
  - Requires application-specific error recovery!
UDP segment format (32 bits wide):
  source port # | dest port #
  length | checksum
  Application data (message)
length: Length, in bytes, of UDP segment, including header
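As an illustration of how the checksum detects corruption, here is a minimal sketch of the 16-bit one's-complement Internet checksum. It is computed over a plain byte array only; the real UDP checksum also covers a pseudo-header and the UDP header, which this sketch omits. The class and method names are hypothetical.

```java
// Hypothetical sketch of the 16-bit one's-complement Internet checksum.
class InternetChecksum {
    // Sum the data as big-endian 16-bit words, fold carries back in, then complement.
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad odd length with a zero byte
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // end-around carry
        }
        return (int) (~sum & 0xFFFF);
    }

    // Receiver side: recompute and compare with the checksum carried in the segment.
    static boolean verify(byte[] data, int receivedChecksum) {
        return checksum(data) == receivedChecksum;
    }
}

class ChecksumDemo {
    public static void main(String[] args) {
        byte[] msg = {0x00, 0x01, (byte) 0xF2, 0x03};
        int c = InternetChecksum.checksum(msg);
        System.out.println(Integer.toHexString(c));
        msg[3] ^= 0x04; // flip one bit "in transit"
        System.out.println(InternetChecksum.verify(msg, c)); // corruption is detected
    }
}
```

A single flipped bit always changes the sum, but some multi-bit patterns cancel out, which is why the slide says the checksum detects most, not all, corruptions.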

Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

RDT Protocol: Events
• Finite state machine
• Inputs:
  - rdt_send(m): message m from application
  - rdt_rcv(p): packet p from (unreliable) channel
• Outputs:
  - udt_send(p): packet p to (unreliable) channel
  - deliver_data(m): message m to application
  - ready(): to receive another message from application
• Execution:
  - Sequence of input, output events
  - Outputs defined by protocol

Reliable data transfer: plan
We'll:
• Present a few (improving) versions of RDT protocol
  - From strong to weak assumptions on input events (channel)
  - Later, also add more inputs/outputs…
• Consider only unidirectional message transfer
  - but bidirectional packets (& control info) flow
• Define protocol by pseudo-code or state diagram:
  - States S={S1, S2, …}
  - Events E={e1, e2, …}
  - Actions A={a1, a2, …}
  - Transition rules, S × E → A × S
[Figure: state diagram with states S1, S2, S3; event e1 causes transition S1 → S2 and action a1 is taken on the transition; further transitions labeled event e2 / action a2 and event e3 / action a3]

Rdt 1.0: reliable transfer over a reliable channel
• underlying channel (udt) perfectly reliable
  - no bit errors
  - no loss of packets
  - No need to set up and tear down connection (always up)
• separate state machines for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
sender, state "Wait for call from above":
  rdt_send(data): packet = make_pkt(data); udt_send(packet)
receiver, state "Wait for call from below":
  rdt_rcv(packet): extract(packet, data); deliver_data(data)

Rdt 2.0: udt channel with bit errors
• underlying channel may flip bits in packet
  - recall: UDP checksum to detect bit errors
  - Assume: errors detected, no packet loss / reordering
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
• new mechanisms in rdt 2.0 (beyond rdt 1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) rcvr->sender
  - `ready` output [why?]

rdt 2.0: FSM specification
sender
  State "Wait for call from above":
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  State "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): ready; go to "Wait for call from above"
receiver
  State "Wait for call from below":
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt, data); deliver_data(data); udt_send(ACK)

rdt 2.0: operation with no errors
[Figure: the rdt 2.0 sender and receiver FSMs from the specification slide, shown for the error-free case: rdt_send(data), udt_send(sndpkt), notcorrupt(rcvpkt), deliver_data(data), udt_send(ACK), isACK(rcvpkt), ready]

rdt 2.0: error scenario
[Figure: the rdt 2.0 sender and receiver FSMs from the specification slide, shown for the error case: corrupt(rcvpkt), udt_send(NAK), isNAK(rcvpkt), udt_send(sndpkt) again]

Analysis of RDT 2.0
• Goals of RDT 2.0:
  - Delivered messages are prefix of sent messages
  - Eventually, all messages get delivered, `ready` for more
• Assumptions of RDT 2.0:
  - Every packet sent is eventually delivered
  - But: possibly corrupted (always detected)
  - 1-to-1 mapping from packets delivered to sent
• Goals are not met. Why?

rdt 2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• 1st: add checksum… but
• sender still doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
What to do?
• sender ACKs/NAKs receiver's ACK/NAK? What if this ACK/NAK garbled?
• Retransmit on garbled
  - Danger: duplicates
Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK garbled
• receiver discards (doesn't deliver) duplicate pkt
RDT 2.1, aka `stop & wait` protocol: Sender sends one packet, then waits for receiver's response; use (1-bit) sequence number

RDT 2.1 in pseudo-code…
EDC: Error Detection Code (e.g. checksum)
Sender:
  state=ready; i=0;
  On rdt_send(m):
    if state=ready then { p=<m, i, EDC(m, i)>; udt_send(p); state=wait; }
  On rdt_rcv(p'):
    If p'=<"ack", EDC("ack")> then { i=i+1 mod 2; state=ready; }
    else udt_send(p);
Receiver:
  i=0;
  On rdt_rcv(p):
    If p=<m, j, EDC(m, j)> then {
      c=EDC("ack"); udt_send("ack", c);
      if i=j then { deliver(m); i=i+1 mod 2; }
    }
    else udt_send("nack");

Analysis of RDT 2.1 [sketch]
• Goals, Assumptions: same as RDT 2.0
  - Every packet sent is eventually delivered
  - But: possibly corrupted (always detected)
  - 1-to-1 mapping from packets delivered to sent
• Define `legitimate` states: [E=Even, O=Odd]
  1. RE/RO: Ready (to receive E/O message); channel empty
  2. ME/MO: Wait for E/O Ack; channel has new E/O message
  3. AE/AO: Wait for E/O Ack; channel has E/O Ack
  4. MER/MOR: Wait for E/O Ack; channel has E/O message which was already delivered
  5. WEAO: Wait for E Ack but channel has O Ack
  6. WOAE: Wait for O Ack but channel has E Ack
• Prove by induction: system stays in legitimate state!

Analysis of RDT 2.1 [sketch, cont'd]
• Claim: system always in legitimate state
• Base: initial state is (legitimate) RE
• Consider moves from legitimate states:
  1. RE/RO: only to ME/MO
  2. ME/MO: If received Ok, move to AE/AO; otherwise, to WEAO/WOAE
  3. AE/AO: If received Ok, move to RO/RE; otherwise, to MER/MOR
  4. MER/MOR: same as ME/MO!
  5. WEAO/WOAE: move to MER/MOR
     - If received correctly… and also if corrupted!
• System always stays in legit state!! □

RDT 2.1: Discussion
• Nack/Ack, 1-bit counter
• Handles (detectable) packet corruptions
• Assume:
  - No processor faults
  - No connection setup / teardown (always up)
  - All errors in packets detected (by checksum)
  - No packet loss / reordering / duplication
    • Duplication of message is no problem
    • But duplicate ACK can cause loss
    • RDT 2.2: handle duplication: add seq # to ack…

RDT 2.2: duplicate-tolerant protocol
• Same functionality as rdt 2.1, using ACKs only
• Instead of NAK, receiver sends ACK for last pkt received OK
  - Explicitly include seq # of pkt being ACKed
  - Resilient to packet (incl. Ack) duplication
    • Exercise: why does RDT 2.1 fail if Ack is duplicated?
  - Important: allows sender to retransmit… see later!
• Duplicate ACK at sender results in same action as NAK: retransmit current pkt

RDT 2.2: pseudo-code…
Sender:
  state=ready; i=0;
  On rdt_send(m):
    if state=ready then { p=<m, i, EDC(m, i)>; udt_send(p); state=wait; }
  On rdt_rcv(p'):
    Let c=EDC("ack", i);
    If p'=<"ack", i, c> then { i=i+1 mod 2; state=ready; }
    else udt_send(p);
Receiver:
  i=0;
  On rdt_rcv(p'):
    If p'=<m, i, EDC(m, i)> then { p=<"ack", i, EDC("ack", i)>; udt_send(p); deliver(m); i=i+1 mod 2; }
    else udt_send(p);
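The ACK-only scheme can be exercised with a small single-threaded simulation. This is a sketch under assumptions: a `corrupt` flag on each packet stands in for a failed EDC check, the channel never loses or reorders packets, corruption is scripted per transmission, and the receiver's initial "last ACK" (for sequence number 1) is an illustrative choice the slides do not specify. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of an RDT 2.2-style stop-and-wait exchange.
class Rdt22 {
    static final class Packet {
        final String payload;  // message data, or "ack"
        final int seq;         // 1-bit sequence number
        final boolean corrupt; // stands in for a failed EDC check
        Packet(String payload, int seq, boolean corrupt) {
            this.payload = payload; this.seq = seq; this.corrupt = corrupt;
        }
    }

    static final class Receiver {
        int expected = 0;
        Packet lastAck = new Packet("ack", 1, false); // assumption: nothing received yet
        final List<String> delivered = new ArrayList<>();

        // Deliver a good, in-sequence packet; otherwise repeat the last ACK (no NAK).
        Packet onReceive(Packet p) {
            if (!p.corrupt && p.seq == expected) {
                delivered.add(p.payload);
                lastAck = new Packet("ack", expected, false);
                expected = 1 - expected;
            }
            return lastAck;
        }
    }

    // corruptData[k] / corruptAck[k]: is the k-th data packet / ACK garbled in transit?
    static List<String> transfer(List<String> messages, boolean[] corruptData, boolean[] corruptAck) {
        Receiver rcv = new Receiver();
        int seq = 0;
        int tx = 0; // index of the current transmission
        for (String m : messages) {
            while (true) {
                boolean dataBad = tx < corruptData.length && corruptData[tx];
                boolean ackBad = tx < corruptAck.length && corruptAck[tx];
                tx++;
                Packet ack = rcv.onReceive(new Packet(m, seq, dataBad));
                if (!ackBad && ack.seq == seq) { // good ACK for the current packet
                    seq = 1 - seq;
                    break;
                }
                // garbled or duplicate ACK: loop and retransmit the current packet
            }
        }
        return rcv.delivered;
    }

    public static void main(String[] args) {
        List<String> out = transfer(Arrays.asList("a", "b", "c"),
                new boolean[]{true, false, false}, new boolean[]{false, true, false});
        System.out.println(out); // each message delivered once, in order
    }
}
```

In the demo, the first data packet is garbled (the receiver repeats its old ACK) and the second transmission's ACK is garbled (the sender retransmits and the receiver discards the duplicate), yet each message is delivered exactly once.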

RDT 3.0: Alternating Bit Protocol
Handles errors, dup and loss, but no reordering
Allow packet loss: channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions are not enough
Q: how to deal with loss?
  - 1st idea: sender waits maximal round-trip time (RTT), then retransmits
  - Pro: simple, no duplicates
  - Good for fixed, known RTT
  - What for variable RTT??
Better idea: sender waits "reasonable" amount of time for ACK (

Alternating Bit Protocol (RDT 3.0): pseudo-code for sender…
  state=ready; i=0;
  On rdt_send(m):
    if state=ready then { p=<m, i, EDC(m, i)>; udt_send(p); state=wait; sleep("timeout", T); }
  On rdt_rcv(p'):
    If p'=<"ack", i, EDC("ack", i)> then { i=i+1 mod 2; state=ready; abort("timeout"); }
  On WakeUp("timeout"):
    { udt_send(p); sleep("timeout", T); }
Recipient: as in RDT 2.2!
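The sender's three event handlers can be sketched as a small state machine. Assumptions: the timer is modeled as a boolean flag and the caller invokes `onTimeout()` explicitly instead of a real clock firing; `wire` is a log standing in for `udt_send`; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event-driven sketch of an alternating-bit sender.
class AbpSender {
    enum State { READY, WAIT }

    State state = State.READY;
    int seq = 0;                  // 1-bit sequence number
    boolean timerRunning = false; // stands in for sleep("timeout", T)
    final List<String> wire = new ArrayList<>(); // log of packets handed to the channel
    private String pending;       // message awaiting its ACK

    // rdt_send(m): accepted only when ready; returns false if the sender is busy.
    boolean rdtSend(String m) {
        if (state != State.READY) return false;
        pending = m;
        transmit();
        state = State.WAIT;
        return true;
    }

    // rdt_rcv(ack): only an uncorrupted ACK with the current seq # advances the sender.
    void rdtRcvAck(int ackSeq, boolean corrupt) {
        if (state == State.WAIT && !corrupt && ackSeq == seq) {
            seq = 1 - seq;
            state = State.READY;
            timerRunning = false; // abort("timeout")
        }
        // anything else is ignored; the timeout will trigger a retransmission
    }

    // WakeUp("timeout"): retransmit the pending packet and restart the timer.
    void onTimeout() {
        if (state == State.WAIT && timerRunning) transmit();
    }

    private void transmit() {
        wire.add(pending + "#" + seq);
        timerRunning = true;
    }
}

class AbpDemo {
    public static void main(String[] args) {
        AbpSender s = new AbpSender();
        s.rdtSend("x");          // sends x#0
        s.onTimeout();           // packet or ACK lost: x#0 again
        s.rdtRcvAck(0, false);   // ACK 0 arrives: ready, next seq is 1
        s.rdtSend("y");          // sends y#1
        System.out.println(s.wire);
    }
}
```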

Alternating Bit (rdt 3.0) in action
[Figure: sender/receiver timelines]

Alternating bit (rdt 3.0) in action (cont.)
[Figure: sender/receiver timelines]

Alternating bit (rdt 3.0): problems
• No pipelining (at most one packet on link)
  - Performance problem
• Assumes FIFO (no reordering)
  - But datagram networks may not ensure FIFO!
• No setup and teardown of connection
  - Teardown required to allow appl to take corrective/alternate action if not connected
  - And to avoid keeping state indefinitely
  - Also: to handle processor faults

Rest of Transport Layer…
• TCP Overview
• TCP Reliability
• TCP Timeouts
• TCP Flow Control
• 3.6 Principles of congestion control
• 3.7 TCP congestion control
• TCP Fairness

TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• connection-oriented:
  - Handshake before data exchange
  - Teardown (to free resources, detect failure)
• pipelined:
  - TCP congestion and flow control set window size
  - flow control: sender will not overwhelm receiver
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection

TCP segment structure
Segment layout (32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• U (URG): urgent data (generally not used)
• A (ACK): this is an ack of ack-number
• P (PSH): push data (don't aggregate)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)

TCP Services
• TCP connection-based transport service:
  - Setup connection
  - Use connection
  - Close connection
• TCP connection services:
  - Flow control: don't send faster than receiver can absorb [later]
  - Congestion control: slow down if network gets congested [later]
  - Fairness: fair sharing of Net resources among connections [later]
  - Reliable, pipelined stream service: next!

TCP reliable data transfer
• TCP: reliability on top of IP's unreliable service
  - Packet loss / error
  - Duplicates
  - Out of order
  - Setup/teardown
  - Loss of state in proc.
• Pipelined segments
  - hybrid of GBN & SR
  - But a single retransmission timer
  - Substantial flexibility
• Cumulative acks
  - Resend (duplicate) ACK instead of NACK
• Retransmissions are triggered by:
  - timeout events
  - duplicate ACKs (later)
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
• Window of N bytes

Reliable (TCP) Connection
• What is a reliable (TCP) connection?
• Reliable connection management
  - Setup
  - Teardown
• Reliable communication in connection
  - Delivery
  - FIFO…
• Connection fails if and only if packets cannot pass

Reliable connection: More Interface…
• Connection status at both ends:
  - Link Up/Down indicators [Sender: also Ready/Wait]
  - Open/Close Connection commands
  - rdt_send, deliver only when link Up [sender: and Ready]
[Figure: send side with Open/Close commands, Ready / Wait and Link up / down (Ok/Fault) indicators; receive side with Open/Close commands and Link up / down indicator]

TCP Reliability
• Reliable Connection Management
  - Setup (SYN, SYN-ACK), tear-down (later)
• Reliable Communication in Connection
  - Counters, timeout, retransmit [like RDT 3.0]
[Figure: A (Client) and B (Server) timelines: a Syn is lost (X) and resent after Timeout, followed by SynAck and Ack; both sides reach Up; data packets 1 and 2 are acknowledged]

Reliable Connection Management
• Three basic states:
  - Up
  - Down
  - Syn (in progress)
  - Teardown states: later
• Send, receive only when connection is Up
• Connection setup (SYN):
  - Completes in finite time
  - But may fail
• Connection closure (FIN):
  - Completes in finite time

TCP Connection States
• Three basic states:
  - Up
  - Down
  - Syn (in progress)
• Send, receive only when connection Up
[Figure: timelines of A and B alternating between Syn and Up periods; failed attempts marked X]

TCP Connection Teardown
• Ordered and Forced Teardown mechanisms
• Forced Teardown:
  - When `fatal` error detected:
  - Send RST signal and abort (break) connection
  - Same when receiving RST signal
    • If sequence number in RST is within `window` [why?]
• Ordered Teardown:
  - Ensure all messages sent are received
  - Ensure peer will also close connection

TCP Connection Tear-Down
Closing a connection: client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close, FIN (Step 1), ACK and FIN from server (Step 2), client ACK, client in timed wait, server closed]

TCP Connection Tear-Down (cont.)
Step 3: client receives FIN, client replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs (WHY??)
Step 4: server receives ACK. Connection closed.
Exercise: Modify to allow (also) server to initiate TearDown (`simultaneous FINs`)
[Figure: client/server timeline: FIN (Step 1), ACK and FIN (Step 2), ACK (Step 3), client timed wait then closed, server closed (Step 4)]

TCP Reliable Connection Closure
• Fin, Fin-Wait: new states
  - Fin: closure begun; receive OK but don't send
  - Fin-Wait: wait (30 sec), to ensure peer is done
    • Only by closure initiator (usually client)
• Initiator: send FIN, wait for ACK and FIN
• Responder: ACK (to FIN), last messages, FIN
  - Done when receiving ACK to FIN
[Figure: A: Up → Fin → Fin-Wait (30 sec) → Down; B: Up → Fin → Down; messages: FIN from A; Ack, data and FIN from B; Ack from A]
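The initiator's side of this closure can be sketched as a state machine. Assumptions: segments are reduced to their flag names, the 30-second wait is modeled by an explicit `onWaitTimerExpired()` call, and the simultaneous-FIN case from the exercise is deliberately not handled. The class and method names are hypothetical.

```java
// Hypothetical sketch of the closure initiator: Up -> Fin -> Fin-Wait -> Down.
class TeardownInitiator {
    enum State { UP, FIN, FIN_WAIT, DOWN }

    State state = State.UP;
    private boolean finAcked = false;

    // Application closes the connection: send FIN and stop sending data.
    String close() {
        if (state != State.UP) return null;
        state = State.FIN;
        return "FIN";
    }

    // Handle a control segment from the peer; returns the reply to send, or null.
    String onSegment(String flag) {
        if (flag.equals("ACK") && state == State.FIN) {
            finAcked = true; // peer acknowledged our FIN; keep receiving its last data
            return null;
        }
        if (flag.equals("FIN") && state == State.FIN && finAcked) {
            state = State.FIN_WAIT; // peer is done too: ACK it and linger
            return "ACK";
        }
        if (flag.equals("FIN") && state == State.FIN_WAIT) {
            return "ACK"; // peer resent FIN, so our ACK was probably lost: ACK again
        }
        return null;
    }

    // Fin-Wait timer (30 sec in the slides) ran out: release the connection state.
    void onWaitTimerExpired() {
        if (state == State.FIN_WAIT) state = State.DOWN;
    }
}

class TeardownDemo {
    public static void main(String[] args) {
        TeardownInitiator a = new TeardownInitiator();
        System.out.println(a.close());          // FIN
        System.out.println(a.onSegment("ACK")); // null: nothing to send
        System.out.println(a.onSegment("FIN")); // ACK
        a.onWaitTimerExpired();
        System.out.println(a.state);            // DOWN
    }
}
```

Lingering in Fin-Wait is what lets the initiator re-acknowledge a retransmitted FIN, so the responder can also finish.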

TCP Connection: client lifecycle
[Figure: client state diagram; one transition labeled "RST or timeout; send RST"]

TCP Connection: server lifecycle
[Figure: server state diagram; one transition labeled "Receive Ack"]

Reliable Connection Management
• Send, receive only when connection Up
• Setup (SYN): completes/fails in finite time
• Closure (FIN): completes/fails in finite time
• Reset (RST): only due to disconnection
• Each Up at A (B) maps to unique Syn at B (A)
[Figure: timelines of A and B alternating between Syn and Up periods; failed attempts marked X]

Mapping Between Connections
• Each Up at A (B) maps to unique Syn at B (A)
• Use mapping functions MAP_A, MAP_B
• b=MAP_A(a): map from A's Up to B's Syn
• In example: MAP_A is identity, i.e. MAP_A(a)=a
• But: MAP_B(1)=1, MAP_B(2)=4 !
• In general: MAP is monotonic, one-to-one
[Figure: A's timeline with periods Syn (1), Up (1), Syn (2), Up (2), Syn (3), Up (3), Syn (4); B's timeline with periods Syn (1), Up (1), Syn (2), Up (2), Syn (3); failed attempts marked X]

Stateful Connection Management
• Reliable connection management is easy…
• If we count Syn periods:
  - Each party maintains Syn periods counter
  - Send your Syn counter with SYN
  - Echo other party's Syn counter in SYN-ACK
  - Ignore SYN-ACK with old echoed counter
[Figure: timelines of A and B with numbered Syn and Up periods; messages carry counters, e.g. "Syn, 1", "Syn, 1; 1", "Ack, 1", "Fin, 1", "Syn, 2", "Syn, 2; 3", "Ack, 2", "Syn, 3", "Ack, 3", "Syn, 4"; lost messages marked X]

Stateless Connection Management
• Stateful connection management is easy
• But servers, clients may lose state
• And… too much state: should cnn.com keep state for each client (forever)?
• And… How to identify repeating client?
  - IP address: which (DHCP, NAPT)?
  - IP + port: fails for NAPT; ports are temporary
• Need Stateless Connection Management
• Use random numbers instead of counters!

Stateless Connection Management
Use random seq# to identify Syn period
  - Select, send your seq# with SYN, SYN-ACK
  - Echo other party's seq# with SYN-ACK, ACK
  - Ignore SYN responses with wrong (old) echoed identifier
[Figure: timelines of A and B with numbered Syn and Up periods; messages carry random identifiers, e.g. "37", "66; A37", "A66", "91; A42"; lost messages marked X]

Stateless Three Way Handshake
Setup connection
• client: initiator
• server: responding
• Each chooses random seq#
  - Match SYN, SYN-ACK
  - Seq# to ensure no loss of data
• Use flags in TCP header: SYN, ACK, RST, FIN
Three way handshake:
Step 1: client sends TCP SYN segment to server
  - Specify client seq#
  - no data
Step 2: server receives SYN, replies with SYN-ACK segment
  - Allocate buffers
  - Ack client's seq#
  - Send server's seq#
  - no data
Step 3: client receives SYN-ACK, validates #, Acks server # and (optionally) sends data
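The three steps can be sketched as plain functions over a small segment type. Assumptions: the numbers are echoed unchanged, as in these slides (real TCP acknowledges the peer's sequence number plus one), segments carry only flags and two numbers, and all names are hypothetical.

```java
import java.security.SecureRandom;

// Hypothetical sketch of the handshake with random identifiers.
class Handshake {
    static final class Segment {
        final String flags;
        final long seq; // sender's own seq#
        final long ack; // echoed seq# of the other party (0 if unused)
        Segment(String flags, long seq, long ack) {
            this.flags = flags; this.seq = seq; this.ack = ack;
        }
    }

    // Random 32-bit initial sequence number.
    static long randomSeq() {
        return new SecureRandom().nextInt() & 0xFFFFFFFFL;
    }

    // Step 1: client sends SYN carrying its seq#, no data.
    static Segment syn(long clientSeq) {
        return new Segment("SYN", clientSeq, 0);
    }

    // Step 2: server replies with its own seq# and echoes the client's.
    static Segment synAck(Segment syn, long serverSeq) {
        return new Segment("SYN-ACK", serverSeq, syn.seq);
    }

    // Step 3: client accepts only a SYN-ACK echoing its current seq#; otherwise it is ignored.
    static Segment ackIfValid(Segment synAck, long currentClientSeq) {
        if (!synAck.flags.equals("SYN-ACK") || synAck.ack != currentClientSeq) {
            return null; // wrong (old) echoed identifier
        }
        return new Segment("ACK", currentClientSeq, synAck.seq);
    }
}

class HandshakeDemo {
    public static void main(String[] args) {
        long clientSeq = Handshake.randomSeq();
        long serverSeq = Handshake.randomSeq();
        Handshake.Segment reply = Handshake.synAck(Handshake.syn(clientSeq), serverSeq);
        Handshake.Segment ack = Handshake.ackIfValid(reply, clientSeq);
        System.out.println(ack != null && ack.ack == serverSeq); // handshake completes
    }
}
```

Because the identifiers are random rather than counted, neither side has to remember anything about earlier connection attempts in order to reject a stale SYN-ACK.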

Reliable Communication
• Whenever a connection terminates (even with failure)
  - Sequence received is prefix of sequence sent
  - No gaps
  - No reordering
  - No insertions
  - No duplications
• Well-terminated connections:
  - At one or (usually) both ends
    • Can't guarantee to always detect at both ends
  - All bytes delivered
    • Sequence received = sequence sent
    • No truncations

TCP Reliability in Connection
• Sequence numbers kept by both parties
• 32-bit counters, initialized randomly
  - On overflow, `wrap` to begin from zero
• Sent with each packet to the peer:
  - Sequence number of the first byte sent in this packet
  - Seq. number of the next byte expected from the peer
  - All bytes up to this one were received Ok (ACK)
[Figure: exchange between A and B. SYN with seq 37; reply with seq 66, ACK 37; "Hi" with seq 37, ACK 66; "Ready" with seq 66, ACK 39; "Wait" with seq 39, ACK 71; reply with seq 71, ACK 43.]
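The wrap-around rule can be made concrete with modular arithmetic. The helper names `next_seq` and `seq_before` below are hypothetical; the comparison uses the common "half the sequence space" convention, which is an assumption rather than something stated on the slide.

```python
SEQ_SPACE = 2**32  # 32-bit sequence numbers


def next_seq(seq, nbytes):
    """Sequence number of the byte that follows nbytes of data starting at seq."""
    return (seq + nbytes) % SEQ_SPACE


def seq_before(a, b):
    """True if a precedes b, allowing for wrap-around (half-space convention)."""
    return a != b and (b - a) % SEQ_SPACE < SEQ_SPACE // 2


# Values from the exchange in the figure: "Hi" is 2 bytes starting at 37,
# "Ready" is 5 bytes starting at 66, "Wait" is 4 bytes starting at 39.
ack_for_hi = next_seq(37, len("Hi"))
ack_for_ready = next_seq(66, len("Ready"))
ack_for_wait = next_seq(39, len("Wait"))

# Near the top of the range the counter wraps to begin from zero.
wrapped = next_seq(SEQ_SPACE - 1, 2)
```

With these definitions `seq_before(SEQ_SPACE - 1, 1)` holds even though the first number is numerically larger, which is how a receiver can still order bytes after a wrap.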

Loss and Retransmissions
• Packet or Ack is lost
  - Often due to a congested router
• TCP retransmits the lost packet
• Loss detected by timeout
  - Another method later
[Figure: exchange between A and B in which the "Wait" segment (seq 39, ACK 71) is lost (X) and is sent again after a timeout.]

Premature Timeout Duplicates
• What if a packet is delayed, but not lost?
  - Often due to a congested router…
• Delayed too long → TCP retransmits it → duplication
• No harm: TCP discards duplicate packets
  - Sends another ACK, in case the first one was lost
[Figure: exchange between A and B in which a delayed segment is retransmitted after a timeout, so the receiver gets it twice.]

Stop-and-wait [rdt3.0]: inefficiency
[Figure: sender/receiver timeline. First packet bit transmitted at t = 0; last packet bit transmitted at t = L / R; first packet bit arrives; last packet bit arrives and the receiver sends an ACK; the ACK arrives and the sender sends the next packet at t = RTT + L / R.]

Pipeline / Window for Efficiency
• Protocol as described is inefficient:
  - Transmission rate = 10 Mb/sec
  - Max segment size (MSS) = 1250 B = 10,000 bits
  - Round Trip Time (RTT) = 100 msec (includes transmit, propagation, queuing)
  - File size = 125 KB = 1 million bits = 100 segments
  - Time to send = 100 segments * 100 msec = 10 sec
• Idea: send the file without waiting for ACKs?
  - Transmission time = (1 Mb)/(10 Mb/sec) = 100 msec
  - Transmission time + RTT = 200 msec
  - Send 100 segments in a pipeline / window
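The arithmetic on this slide can be checked directly. The sketch below uses integer milliseconds to avoid rounding noise; the variable names are illustrative, and the pipelined total assumes the whole file is sent back-to-back and the sender then waits one RTT for the last ACK.

```python
RATE_BPS = 10_000_000      # transmission rate: 10 Mb/sec
MSS_BITS = 1250 * 8        # max segment size: 1250 B = 10,000 bits
RTT_MS = 100               # round trip time
FILE_BITS = 125_000 * 8    # file size: 125 KB = 1 million bits

segments = FILE_BITS // MSS_BITS

# Stop-and-wait: one segment per round trip.
stop_and_wait_ms = segments * RTT_MS

# Pipelined: push all bits onto the link, then wait one RTT (assumption).
transmit_ms = FILE_BITS * 1000 // RATE_BPS
pipelined_ms = transmit_ms + RTT_MS

speedup = stop_and_wait_ms // pipelined_ms
```

Under these numbers the pipelined transfer finishes in a fraction of a second instead of ten seconds, which is the motivation for the window mechanism that follows.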

Pipelined protocols
Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
  - Needs `real` sequence numbers (not 1 bit); this also deals with out-of-order (old) packets
  - TCP: counts bytes, beginning with random numbers from setup
  - Buffering at sender and (optionally) receiver
Pipelining improves link utilization

Pipelining: increased utilization
[Figure: sender/receiver timeline. First packet bit transmitted at t = 0; last bit transmitted at t = L / R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; first ACK arrives, send next packet at t = RTT + L / R.]

TCP seq. #'s and ACKs with Pipeline
Seq. #'s:
  - byte stream "number" of the first byte in the segment's data
ACKs:
  - seq # of the next byte expected from the other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: ignore old pkts; buffer or ignore future packets
[Figure: simple telnet scenario between Host A and Host B. User sends 'Req': Seq=42, ACK=79, data = 'Req'. Host ACKs receipt of 'Req' and sends back 'Ok': Seq=79, ACK=45, data = 'Ok'. Client ACKs receipt of 'Ok': Seq=45, ACK=81.]

TCP: retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A times out and resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the Seq=92 timer expires early, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase = 120.]

TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; ACK=100 is lost (X) but ACK=120 arrives before the timeout, so nothing is resent; SendBase = 120.]

TCP sender events (simplified):
TCP_send(m):
• segment=<seq#, m>;
  - seq # is the byte-stream number of the first data byte in the segment
• start timer
  - if not already running
  - Timer is for the oldest unacked segment
Timeout:
• retransmit the segment that caused the timeout
• restart timer
Ack rcvd:
• If it acknowledges previously unacked segments:
  - update what is known to be acked
  - start timer if there are outstanding segments

Duplicate Ack & Fast Retransmit
• Time-out period is often relatively long:
  - long delay before resending a lost packet
• Detect lost segments via duplicate ACKs.
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs.
• If the sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend the segment before the timer expires
  - Why not two? Exercise…

Fast retransmit algorithm:

On TCP_RCV(ACK, y)  // and valid checksum
  if (y > SendBase) then {
    remove y-SendBase bytes from Unacked;
    SendBase = y;
    if (NextSeqNum > SendBase) then start timer;
  } else {
    // a duplicate ACK for an already ACKed segment
    increment count of dup ACKs received for y;
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y;  // fast retransmit
    }
  }
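The same ACK handler can be written as a small runnable Python class. This is a sketch, not a TCP implementation: the class name `FastRetransmitSender` and its fields are hypothetical, the retransmission is only recorded in a list, and sequence-number wrap-around is ignored for brevity.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records which segments to resend."""

    def __init__(self, send_base, next_seq_num):
        self.send_base = send_base          # oldest unacknowledged byte
        self.next_seq_num = next_seq_num    # next byte to be sent
        self.dup_acks = 0                   # duplicate ACKs seen for send_base
        self.timer_running = next_seq_num > send_base
        self.retransmitted = []             # seq numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide the window and reset the counter.
            self.send_base = y
            self.dup_acks = 0
            self.timer_running = self.next_seq_num > self.send_base
        else:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # fast retransmit


sender = FastRetransmitSender(send_base=92, next_seq_num=140)
sender.on_ack(100)            # acknowledges bytes 92..99
for _ in range(4):
    sender.on_ack(100)        # the segment starting at 100 appears lost
```

The counter is compared with `== 3` rather than `>= 3`, so a fourth duplicate ACK does not trigger a second resend of the same segment.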

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
  TCP receiver action: delayed ACK. Wait 200 ms (up to 500 ms) for next segment. If no next segment, send ACK.

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending.
  TCP receiver action: immediately send a single cumulative ACK, ACKing both in-order segments.

Event at receiver: arrival of higher-than-expected sequence number (gap).
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte; usually, buffer data.

Pipelining in TCP: Summary
• Cumulative & delayed acknowledgements
  - Ack with seq # n means all bytes up to n were received
  - Send Ack only after receiving 2 MSS or after 200 msec
• Recipient usually buffers out-of-order bytes
  - As long as they are inside the window
  - Optional selective-ack mechanism [RFC 2018; we ignore]
• Sender retransmits
  - On timeout, or on `fast retransmit` (3 dup acks)
  - Usually: resends only the oldest (first) unACKed packet

Rest of Transport Layer…
• TCP Overview
• TCP Reliability
• TCP Flow Control
• 3.6 Principles of congestion control
• TCP Timeouts
• 3.7 TCP congestion control
• TCP Fairness

TCP Header: RcvWindow
• RcvWindow:
  - TCP's flow control
  - Maximal number of un-acked segments the peer can send
[Figure: TCP header layout. Source Port, Destination Port; Sequence Number; Acknowledge Number; Hdr Len, RSV, Flags, RcvWindow; Checksum, Urgent pointer; Options (variable length); Payload (Application Data).]

TCP Flow Control
• Problem: the application on the receiver may be slower than the network → no place for incoming data
• Flow control: prevent receiver buffer overflow
  - By limiting the amount of (unacked) segments in transit
  - Send: RcvWindow = (|buffer|-|used|)/MSS
  - MSS (Max Segment Size): indicated by the initiator at SYN
[Figure: receiver buffer. Data received from the net fills the buffer; the "used" part is read by the application; the free part is RcvWindow (bytes). Segments received out of order are ignored.]

Window Size (1)
• Window size is critical for performance
  - If RcvWindow is too small, performance suffers
  - Even if the network is available and the application clears its buffers
• Example: consider the simple network below
  - Client and server connected by two routers
  - W = Window size (in bits)
  - R = Transmission rate of all links
  - RTT = Round Trip Time (RTT >> MSS/R)

Window Size (2)
[Figure: timeline. The sender transmits a window for W/R, then idles until the Ack arrives one RTT after the start; it then transmits for another W/R.]

Average Rate
Average Rate ≤ [R*(W/R)+0*(RTT-W/R)]/RTT = W/RTT
Average Rate ≤ min(R, W/RTT)
[Figure: the same timeline as before, with transmission periods of length W/R separated by waits for the Ack.]

Optimal Receiver Window
• Average Rate = min{R, W/RTT}
• Flow control is not the bottleneck, as long as W ≥ R * RTT
• ReceiverBuffer ≥ Bandwidth x Delay
• If the window is too small: slows down the connection
• Fat links: huge Bandwidth x Delay (e.g. satellite)
• Aggressive flow control: send a RcvWindow larger than the available buffers
  - By estimating the drain rate; risk of buffer overflow
  - Works well when delay does not change much
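A short calculation shows how the window caps throughput. The function names and the example numbers (a 10 Mb/sec link with a 100 msec RTT) are assumptions chosen for illustration; integer arithmetic in bits and milliseconds keeps the results exact.

```python
def average_rate_bps(link_rate_bps, window_bits, rtt_ms):
    """Average rate = min(R, W/RTT), with RTT given in milliseconds."""
    return min(link_rate_bps, window_bits * 1000 // rtt_ms)


def min_window_bits(link_rate_bps, rtt_ms):
    """Smallest window that keeps the link busy: W = R * RTT."""
    return link_rate_bps * rtt_ms // 1000


R = 10_000_000   # link rate in bits/sec (assumed)
RTT = 100        # round trip time in msec (assumed)

small_window_rate = average_rate_bps(R, 80_000, RTT)   # window of 8 segments of 10,000 bits
bandwidth_delay = min_window_bits(R, RTT)
full_rate = average_rate_bps(R, bandwidth_delay, RTT)
oversized_rate = average_rate_bps(R, 4 * bandwidth_delay, RTT)
```

With the small window the connection uses well under a tenth of the link, while any window at or above the bandwidth-delay product is limited only by R.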

TCP Buffers - Critical Resource
• TCP receiver buffers: critical for performance
• But large buffers limit the number of connections
  - 10,000 connections, 10 KB each → 100 MB
  - Costs and performance implications
  - Abused by SYN-flooding DoS attack (Net Security course)
• Servers should minimize open connections
  - Client should close the connection (and wait 30 sec)
  - Many servers initiate the close, but don't wait 30 sec
    * If the Ack is lost, the client may time out (abnormal close)

Lecture 7 outline
• TCP Overview
• TCP Flow Control
• 3.6 Principles of congestion control
• TCP Timeouts
• 3.7 TCP congestion control
• TCP Fairness

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
  - Cf. flow control (limited by the receiver's buffers)
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
  - Main cause of packet loss
  - Main cause of delay and jitter (changing delay)

Congestion scenario 1: infinite buffers
• two senders, two receivers, one router
• Throughput limit: in-rate, capacity: λout = min(λin, c/2)
• Assume no loss, no retransmissions
• Congestion: λin ≥ c/2
  - Infinite delays
  - Maximal throughput
• What if buffers are finite?
[Figure: Host A and Host B send original data at rate λin through a router with unlimited shared output link buffers; λout is the delivery rate.]

Congestion scenario 2: finite buffers
• one router, finite buffers, λout = λin
• λ'in ≥ λin since the sender retransmits a packet if…
  - Packet or Ack is lost due to buffer overflow (or noise)
  - Excessive delay: if timeout < |queue| ∙ transmit time + AckTime
• c/2 ≥ λ'in ≥ λ'out ≥ λout since delivery is only in order, with no dups
[Figure: Host A sends original data at rate λin and data plus retransmissions at rate λ'in through a router with finite shared buffers to Host B; the router output rate is λ'out and the delivered rate is λout.]

Congestion scenario 2: finite buffers
• one router, finite buffers, λout = λin
• sender retransmits a packet if…
  - Lost due to buffer overflow (or noise)
  - Excessive delay (> timeout)
• No congestion: queue usually not full
  - No retransmission, no loss: λout = λ'out = λ'in = λin
  - Happens if λin << C/2 (capacity), for both senders…
• Congestion: queues often/usually full
  - Packet loss whenever the queue is full (i.e., usually…): λ'in ≥ λ'out
  - Cannot deliver retransmissions or out-of-order packets: λ'out > λout = λin
    * Especially with Go-Back-N!
  - Excessive delay (at least |Q|*TransmitTime)
  - Overhead: retransmissions due to loss and time-out

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A sends original data at rate λin, and original plus retransmitted data at rate λ'in, over multihop paths through routers with finite shared output link buffers to Host B; λout is the delivered rate.]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!
• Causes the sender to retransmit… vicious cycle!
[Figure: λout at Host B as the offered load from Host A grows.]

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from loss and delay observed by the end systems
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems:
  - To the source, e.g. ICMP Source Quench packet
  - Or via the destination, often piggybacked on a message; may use flow control
• From a single `congestion bit` (SNA) to an explicit rate (ATM ABR), see the book
• Main issue: router efficiency; avoid state

Lecture 7 outline
• TCP Overview
• TCP Timeouts
• TCP Flow Control
• 3.6 Principles of congestion control
• 3.7 TCP congestion control
• TCP Fairness

TCP Congestion Control
• end-end control (no network assistance)
• Assume all senders comply
  - Protocols must be `TCP friendly`
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWin)
• Roughly, rate = CongWin / RTT bytes/sec (RTT = Round Trip Time)
• CongWin is dynamic, a function of perceived congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• Sender reduces rate (CongWin) after a loss
Two modes:
  - `Slow start`, with exponential build-up
  - Congestion avoidance (AIMD)

TCP Slow Start
MSS = Maximal Segment Size
• Slow start: when the connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = (8*500)/0.2 = 20 kbps
• Available bandwidth may be >> MSS/RTT
  - Need to increase the congestion window (and rate)
  - By how much? Rapid, exponential build-up
  - Until we detect possible congestion (by loss)…

TCP Slow Start
• Exponential rate build-up until the first loss event:
  - double CongWin every RTT
  - by incrementing CongWin for every ACK received
• Summary: initial rate is slow (CongWin = 1 MSS) but ramps up rapidly
• Loss: change to Congestion Avoidance (AIMD)…
[Figure: Host A to Host B timeline. One segment in the first RTT, two segments in the second, four segments in the third.]
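The doubling follows directly from the per-ACK increment, as the small simulation below illustrates. It is a sketch under simplifying assumptions: the window is counted in whole segments, every segment in a round is ACKed within one RTT, and no loss occurs. The function names are hypothetical.

```python
def rate_bps(cong_win_segments, mss_bytes, rtt_ms):
    """Rough sending rate: CongWin / RTT, converted to bits per second."""
    return cong_win_segments * mss_bytes * 8 * 1000 // rtt_ms


def slow_start(rtts, cong_win=1):
    """Return CongWin (in segments) at the start of each RTT during slow start."""
    history = [cong_win]
    for _ in range(rtts):
        acks_this_round = cong_win       # one ACK per segment sent this RTT
        for _ in range(acks_this_round):
            cong_win += 1                # increment CongWin for every ACK
        history.append(cong_win)
    return history


windows = slow_start(3)
initial_rate = rate_bps(1, mss_bytes=500, rtt_ms=200)
rate_after_three_rtts = rate_bps(windows[-1], mss_bytes=500, rtt_ms=200)
```

With MSS = 500 bytes and RTT = 200 msec, the first value reproduces the 20 kbps initial rate from the example, and each further RTT doubles it until a loss is detected.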

TCP Congestion Avoidance (AIMD: Additive Increase, Multiplicative Decrease)
Additive Increase: increase CongWin by 1 MSS every RTT in the absence of loss events
Multiplicative Decrease: cut CongWin in half after 3 duplicate Acks (of a possibly lost packet) [TimeOut: see later…]
[Figure: congestion window of a long-lived TCP connection over time.]

Response to TimeOut vs. 3 dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half (multiplicative decrease)
  - window then grows linearly (additive increase)
• After Timeout:
  - Restart connection
  - CongWin set to 1 MSS
  - Exponential build-up
  - Up to a threshold, then grows linearly (AIMD)
Philosophy:
  * A duplicate ACK is sent if segments are received out of order (maybe some were lost) → the network delivers some segments
  * A timeout happens if no segments get through → no connection → need a drastic response (restart)
  * Restart is like start, except for the threshold…
  * What's the threshold?

Restart Threshold
Restart is like slow start, except that the exponential ramp-up ends at the threshold (not at failure), and then switches to congestion avoidance (AIMD)
Idea: slow down before reaching the rate (CongWin) which (may have) caused congestion
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

TCP Round Trip Time and Timeouts
RTT = Round Trip Time
Q: how to set the TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary; want the estimated RTT "smoother"
  - average several recent measurements, not just the current SampleRTT
TimeoutInterval = EstimatedRTT + SafetyMargin

Difficulties to `measure delay`
• Delayed (accumulated) Acks
• For retransmitted segments, can't tell whether the acknowledgement is a response to the original transmission or to the retransmission
  - Don't estimate delay from retransmissions
• Network conditions may change suddenly
  - Yet don't overreact to the impact of burst traffic
• Solution applicable to many situations of estimating dynamic quantities!

TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• Exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125
Usually: symmetric, no queuing delay

TCP Timeout: Safety Margin
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin (usually changes in queuing delay)
• First: estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|   (typically, β = 0.25)
Then set the timeout interval:
  TimeoutInterval = EstimatedRTT + SafetyMargin = EstimatedRTT + 4*DevRTT
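The two moving averages and the timeout rule fit in a small estimator class. This is a sketch with labeled assumptions: the class name `RttEstimator` is hypothetical, the initial deviation is set to half of the first sample (as RFC 6298 does; the slides do not say), and the deviation is updated using the EstimatedRTT from before the current sample.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, giving TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = float(first_sample)
        self.dev_rtt = first_sample / 2      # assumption: initial deviation

    def update(self, sample_rtt):
        # Deviation first, measured against the previous estimate (assumption).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100)     # milliseconds
est.update(100)                          # steady sample: deviation shrinks
steady_timeout = est.timeout_interval()
est.update(200)                          # sudden spike: deviation grows
spike_timeout = est.timeout_interval()
```

A single 200 msec spike moves EstimatedRTT only slightly, but the larger DevRTT widens the safety margin, so the timeout reacts to variability without overreacting to one burst. Samples from retransmitted segments should not be fed to `update`.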

Summary: TCP Congestion Control
• Initially: no (or: infinite) Threshold
• When CongWin is below Threshold, the sender is in the slow-start (rapid acceleration) phase and the window grows exponentially (every Ack increments CongWin)
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold (multiplicative decrease)
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS (restart)
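These rules form a small state machine, sketched below per RTT rather than per ACK. The class and method names are hypothetical, CongWin is counted in MSS units, and capping the exponential step at Threshold is an added assumption so that slow start does not jump past the threshold.

```python
class CongestionControl:
    """Per-RTT model of CongWin and Threshold, in units of MSS."""

    def __init__(self):
        self.cong_win = 1.0
        self.threshold = float("inf")    # initially: no threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # Slow start: exponential growth, capped at Threshold (assumption).
            self.cong_win = min(self.cong_win * 2, self.threshold)
        else:
            # Congestion avoidance: additive increase of 1 MSS per RTT.
            self.cong_win += 1

    def on_triple_dup_ack(self):
        # Multiplicative decrease.
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self):
        # Restart from 1 MSS with a new threshold.
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0


cc = CongestionControl()
trace = []
for _ in range(4):
    cc.on_rtt_without_loss()
    trace.append(cc.cong_win)        # slow start
cc.on_triple_dup_ack()
trace.append(cc.cong_win)            # halved
cc.on_rtt_without_loss()
trace.append(cc.cong_win)            # linear growth
cc.on_timeout()
trace.append(cc.cong_win)            # restart
for _ in range(4):
    cc.on_rtt_without_loss()
    trace.append(cc.cong_win)        # exponential up to Threshold, then linear
```

The trace shows the three behaviours side by side: doubling, halving after three duplicate ACKs, and the restart after a timeout that ramps up only as far as half of the window that preceded the loss.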

Lecture 7 outline
• TCP Overview
• TCP Timeouts
• TCP Flow Control
• 3.6 Principles of congestion control
• 3.7 TCP congestion control
• TCP Fairness

TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
New (non-TCP) protocols should be `friendly, well behaved` towards TCP…
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?
Two competing sessions:
• Additive increase gives a slope of 1, as throughput increases
• Multiplicative decrease reduces throughput proportionally
[Figure: Connection 2 throughput (up to R) plotted against Connection 1 throughput (up to R), with the equal bandwidth share line and the total bandwidth limit line. The trajectory through points A, B, C, D alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
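The convergence argument can be checked numerically with a toy model of two synchronized flows. It is a sketch under strong assumptions: both flows see every loss event at the same time, both increase by one unit per step, and the starting rates and capacity are arbitrary illustrative values.

```python
def aimd(x1, x2, capacity, steps):
    """Simulate two AIMD flows; return final rates and the gap after each step."""
    gaps = [abs(x1 - x2)]
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both flows halve (multiplicative decrease).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (additive increase).
            x1, x2 = x1 + 1, x2 + 1
        gaps.append(abs(x1 - x2))
    return x1, x2, gaps


x1, x2, gaps = aimd(10.0, 80.0, capacity=100.0, steps=300)
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the gap shrinks toward zero and the pair drifts onto the equal-share line.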

Steady state throughput area
Whenever the total bandwidth exceeds R, one or both senders identify congestion (by receiving 3 dup acks) and reduce their rate by half.
[Figure: Connection 2 throughput against Connection 1 throughput (each up to R), with the equal bandwidth share line, the total bandwidth limit line and the Threshold line.]

Fairness (more)
Fairness and non-TCP
• Multimedia apps often do not use TCP
  - Admission control instead of fairness
• May use UDP:
  - pump audio/video, e.g. at a constant rate, and tolerate packet loss
• Or: a new transport protocol (e.g. nonstandard TCP)
• Many research areas (e.g. TCP friendly)
Fairness and parallel TCP connections
• An app may open many parallel connections
• Web browsers do this
  - Esp. pipelined, nonpersistent
• Example: link of rate R with 9 connections;
  - new app asks for 1 TCP connection, gets rate R/10
  - new app asks for 9 TCP connections, gets R/2!
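The example's arithmetic follows from per-connection fairness: each connection gets an equal share, so an application's share is proportional to the number of connections it opens. The helper `new_app_share` is hypothetical and assumes all connections share one bottleneck and split it perfectly evenly.

```python
from fractions import Fraction


def new_app_share(existing_connections, opened_by_new_app):
    """Fraction of the link rate R obtained by the new application."""
    total = existing_connections + opened_by_new_app
    return Fraction(opened_by_new_app, total)


one_connection = new_app_share(9, 1)     # share of R with a single connection
nine_connections = new_app_share(9, 9)   # share of R with nine connections
```

Opening nine connections instead of one raises the new application's share from a tenth of the link to half of it, which is why fairness per connection is not the same as fairness per application.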

Summary: TCP Services
• Reliable connection management
  - Setup, teardown
• Reliable connection-based communication
• Flow control
• Congestion control
• Fairness
• Efficiency improvements:
  - Pipelining / Window
  - Fast retransmit
  - Others…

Chapter 3: Summary
• Transport layer services:
  - Multiplexing, demultiplexing
  - Reliable data transfer
  - Flow control
  - Congestion control
  - Fairness
• In the Internet
  - UDP
  - TCP
Next:
• Application layer: HTTP, SMTP, DNS