Lec3 Transport layer
OVERVIEW
Understand principles behind transport layer services:
- multiplexing (at the sender), demultiplexing (at the receiver)
- reliable data transfer
- flow control
- congestion control
Learn about Internet transport layer protocols:
- UDP: connectionless transport
- TCP: connection-oriented reliable transport
- TCP congestion control
Summary
- principles behind transport layer services
- multiplexing, demultiplexing
- reliable data transfer
- flow control
- congestion control
- instantiation, implementation in the Internet
- UDP
- TCP
Transport-layer services
Service and protocols
- provide logical communication between application processes running on different hosts
- transport protocol actions in end systems:
- sender: breaks application messages into segments, passes them to the network layer
- receiver: reassembles segments into messages, passes to application layer
- two transport protocols available to Internet applications
- TCP,UDP
Transport vs. network layer services and protocols
- network layer: logical communication between hosts
- similar to a postal service that delivers a letter to a postal address
Note the difference between hosts and processes:
- transport layer: extends the network service to provide logical communication between processes
- relies on, enhances, network layer services (a finer-grained refinement of what the network layer offers)
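Transport layer actions
Sender:
- is passed an application-layer message
- determines segment header field values
- creates the segment
- passes the segment to IP
Receiver:
- receives the segment from IP
- checks the header values
- extracts the application-layer message
- demultiplexes the message up to the application via its socket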
Two principal Internet transport protocols
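- TCP: Transmission Control Protocol
- reliable, in-order byte-stream delivery
- multiplexing, demultiplexing
- flow control
- congestion control
- connection setup
- UDP: User Datagram Protocol
- unreliable, unordered delivery
- no-frills extension of “best-effort” IP
Neither service can provide delay guarantees or bandwidth (throughput) guarantees.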
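Multiplexing/demultiplexing
Multiplexing happens at the sender, demultiplexing at the receiver
- Multiplexing: handle data from multiple sockets, add transport header (later used for demultiplexing)
- Demultiplexing: use header info to deliver received segments to the correct socket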
How demultiplexing works
- host receives IP datagrams
- each datagram has source IP address, destination IP address
- each datagram carries one transport-layer segment
- each segment has source, destination port number
- host uses IP addresses & port numbers to direct segment to appropriate socket
Connectionless demultiplexing
- when creating a socket, must specify host-local port #:
DatagramSocket mySocket1 = new DatagramSocket(12534);
- when creating a datagram to send into a UDP socket, must specify
- destination IP address
- destination port #
When the receiving host receives a UDP segment:
- checks destination port # in segment
- directs UDP segment to socket with that port #
IP/UDP datagrams with the same dest. port #, but different source IP addresses and/or source port numbers, will be directed to the same socket at the receiving host
- example
Connection-oriented demultiplexing
- TCP socket identified by 4-tuple:
- source IP address
- source port number
- dest IP address
- dest port number
- server may support many simultaneous TCP sockets:
- each socket identified by its own 4-tuple
- each socket associated with a different connecting client
- example
Summary
- multiplexing, demultiplexing: based on segment, datagram header field values
- UDP: demultiplexing using destination port number(only)
- TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers
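The difference between the two demultiplexing rules can be sketched with two lookup tables. This is a toy model, not a real network stack: the class `DemuxSketch`, its method names, the socket labels, and the IP addresses are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy demultiplexer (hypothetical model): maps header fields to a socket label. */
final class DemuxSketch {
    private final Map<Integer, String> udpSockets = new HashMap<>();
    private final Map<String, String> tcpSockets = new HashMap<>();

    void bindUdp(int destPort, String socket) {
        udpSockets.put(destPort, socket);
    }

    void addTcpConnection(String srcIp, int srcPort, String dstIp, int dstPort, String socket) {
        tcpSockets.put(key(srcIp, srcPort, dstIp, dstPort), socket);
    }

    /** UDP: only the destination port selects the socket; the source fields are ignored. */
    String demuxUdp(String srcIp, int srcPort, int dstPort) {
        return udpSockets.get(dstPort);
    }

    /** TCP: the full 4-tuple selects the socket. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(key(srcIp, srcPort, dstIp, dstPort));
    }

    private static String key(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    public static void main(String[] args) {
        DemuxSketch host = new DemuxSketch();
        host.bindUdp(12534, "udpSocket");
        host.addTcpConnection("10.0.0.1", 5000, "10.0.0.9", 80, "tcpSocketA");
        host.addTcpConnection("10.0.0.2", 5000, "10.0.0.9", 80, "tcpSocketB");
        // Two different UDP senders reach the same socket.
        System.out.println(host.demuxUdp("10.0.0.1", 9157, 12534));
        System.out.println(host.demuxUdp("10.0.0.2", 5775, 12534));
        // Two TCP clients using the same dest port reach different sockets.
        System.out.println(host.demuxTcp("10.0.0.1", 5000, "10.0.0.9", 80));
        System.out.println(host.demuxTcp("10.0.0.2", 5000, "10.0.0.9", 80));
    }
}
```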
Connectionless transport: UDP
UDP:User Datagram Protocol
- “no frills,” “bare bones” Internet transport protocol
- “best effort” service, UDP segments may be:
- lost
- delivered out-of-order
- connectionless:
- no handshaking between sender, receiver
- each UDP segment handled independently of others
Why is there a UDP?
- no connection establishment (which can add RTT delay)
- simple: no connection state at sender, receiver
- small header size
- no congestion control
- UDP can blast away as fast as desired
- can function in the face of congestion
UDP: User Datagram Protocol
- UDP use:
- streaming multimedia apps (loss tolerant, rate sensitive)
- DNS
- SNMP
- HTTP/3
- if reliable transfer is needed over UDP (e.g., HTTP/3):
- add needed reliability at application layer
- add congestion control at application layer
- RFC 768
UDP segment header
![](https://pic.imgdb.cn/item/617bad302ab3f51d91fcccd5.png)
UDP checksum
Internet checksum
Goal: detect errors in transmitted segment
Sender:
- treat contents of UDP segment as sequence of 16-bit integers
- checksum: one’s complement of the one’s complement sum of the segment’s 16-bit words
- checksum value put into UDP checksum field
Receiver
- compute checksum of received segment
- check if computed checksum equals checksum field value:
- not equal - error detected
- equal - no error detected. But maybe errors nonetheless? More later ...
- Example
- weak protection!
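The sender and receiver steps above can be sketched as follows. The class name `InternetChecksum` and the two example words are assumptions made for illustration; a real UDP checksum also covers a pseudo-header, which this sketch leaves out. The second example shows why the protection is weak: two bit flips that cancel in the sum go undetected.

```java
final class InternetChecksum {
    /** One's complement sum of 16-bit words, with any carry wrapped around. */
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if (sum > 0xFFFF) {
                sum = (sum & 0xFFFF) + 1; // wrap the carry back into the low bits
            }
        }
        return sum;
    }

    /** Checksum field value: bitwise complement of the one's complement sum. */
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    /** Receiver check: recompute the checksum and compare with the received field. */
    static boolean verify(int[] words, int receivedChecksum) {
        return checksum(words) == receivedChecksum;
    }

    public static void main(String[] args) {
        int[] segment = {0xE666, 0xD555};
        int field = checksum(segment);
        System.out.printf("checksum = 0x%04X%n", field); // prints checksum = 0x4443

        // One bit flipped up in the first word, one flipped down in the second:
        // the sum is unchanged, so the error is not detected.
        int[] corrupted = {0xE667, 0xD554};
        System.out.println("corruption passes check: " + verify(corrupted, field));
    }
}
```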
Summary: UDP
- “no frills” protocol:
- segments may be lost, delivered out of order
- best effort service: “send and hope for the best”
- UDP has its plusses:
- no setup/handshaking needed(no RTT incurred)
- can function when network service is compromised
- helps with reliability(checksum)
- Build additional functionality on top of UDP in application layer(e.g., HTTP/3)
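Principles of reliable data transfer
A reliable data transfer protocol has to be layered on top of the unreliable channel
The complexity of the reliable data transfer protocol depends on the characteristics of the unreliable channel
Sender and receiver do not know each other's state unless it is communicated via a message
Reliable data transfer protocol (rdt): interfaces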
Reliable data transfer: getting started
We will:
- incrementally develop sender, receiver sides of rdt protocol
- consider only unidirectional data transfer
- but control info will flow in both directions
- use finite state machines(FSM) to specify sender, receiver
rdt1.0: reliable transfer over a reliable channel
sender: accepts data from above, packetizes it, and sends it
receiver: receives the packet, extracts the data, and delivers it to the upper layer
- underlying channel perfectly reliable
- no bit errors
- no loss of packets
- separate FSMs for sender, receiver:
- sender sends data into underlying channel
- receiver reads data from underlying channel
rdt2.0: channel with bit errors
sender keeps a copy of the packet; receiver verifies it
- underlying channel may flip bits in packet
- checksum (e.g., Internet checksum) to detect bit errors
- i.e., rdt1.0 plus a checksum
- the question: how to recover from errors?
- acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
- negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
- sender retransmits pkt on receipt of NAK
- stop and wait
- sender sends one packet, then waits for receiver response
FSM specifications
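Note: sender and receiver do not know each other's state unless it is communicated somehow
- which is why a protocol is needed
- sender states:
- wait for call from above
- wait for ACK or NAK
- receiver state:
- wait for call from below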
operation with no errors
corrupted packet scenario
rdt2.0 has a fatal flaw!
what happens if ACK/NAK is corrupted?
- sender doesn’t know what happened at receiver
- can’t just retransmit: possible duplicate
handling duplicates:
- sender retransmits current pkt if ACK/NAK corrupted
- sender adds sequence number to each pkt
- receiver discards(doesn’t deliver up) duplicate pkt
rdt2.1
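sender, handling garbled ACK/NAKs (stop-and-wait protocol)
Sender: starts in wait for call 0 from above -> wait for ACK/NAK 0 (garbled or NAK: retransmit) -> wait for call 1 from above -> wait for ACK/NAK 1 (garbled or NAK: retransmit)
Receiver: wait for 0 from below; on an uncorrupted pkt 0, extract the data and return ACK; on a duplicate, resend the ACK -> wait for 1 from below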
discussion
sender:
- seq # added to pkt
- two seq. #s (0, 1) will suffice: a single bit is enough to tell a new pkt from an old one
- must check if received ACK/NAK is corrupted; if so, retransmit the current pkt
- twice as many states
- state must “remember” whether “expected” pkt should have seq # of 0 or 1
receiver:
- must check if received packet is a duplicate: if so, discard it and resend the ACK
- state indicates whether 0 or 1 is expected pkt seq #
- note: receiver cannot know if its last ACK/NAK was received OK at sender
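The receiver-side duplicate handling described above can be sketched as a two-state machine. This is a minimal illustration, not the FSM from the slides: the class `Rdt21Receiver`, the string feedback values, and the `corrupt` flag (standing in for a failed checksum) are assumptions.

```java
final class Rdt21Receiver {
    private int expectedSeq = 0; // the state: waiting for 0 or waiting for 1
    private final StringBuilder delivered = new StringBuilder();

    /** Handles one arriving packet and returns the feedback to send: "ACK" or "NAK". */
    String onPacket(int seq, String data, boolean corrupt) {
        if (corrupt) {
            return "NAK"; // checksum failed: ask for a retransmission
        }
        if (seq == expectedSeq) {
            delivered.append(data);        // new packet: deliver up
            expectedSeq = 1 - expectedSeq; // flip between 0 and 1
        }
        // A duplicate is not delivered again, but it is still ACKed,
        // because the sender evidently did not get the earlier ACK intact.
        return "ACK";
    }

    String delivered() {
        return delivered.toString();
    }

    public static void main(String[] args) {
        Rdt21Receiver r = new Rdt21Receiver();
        System.out.println(r.onPacket(0, "a", false)); // new packet
        System.out.println(r.onPacket(0, "a", false)); // duplicate after a garbled ACK
        System.out.println(r.onPacket(1, "b", true));  // corrupted packet
        System.out.println(r.onPacket(1, "b", false)); // retransmission
        System.out.println(r.delivered());
    }
}
```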
rdt2.2
a NAK-free protocol
- same functionality as rdt2.1, using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
- receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt
TCP uses this approach to be NAK-free
sender, receiver fragments
rdt3.0: channels with errors and loss
New channel assumption: underlying channel can also lose packets(data, ACKs)
- checksum, sequence #s, ACKs, retransmissions will be of help … but not quite enough
Approach: sender waits “reasonable” amount of time for ACK
- retransmits if no ACK received in this time
- if pkt (or ACK) just delayed (not lost):
- retransmission will be duplicate, but seq #s already handle this!
- receiver must specify seq# of packet being ACKed
- use countdown timer to interrupt after “reasonable” amount of time
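The timeout-and-retransmit idea can be sketched as a small simulation. Everything here is an assumption made for illustration: the class `Rdt30Sketch`, the way losses are scripted (by index of the data transmission or of the ACK), and the modelling of a timeout as simply "no ACK came back, so loop and resend". Bit errors and real timers are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Rdt30Sketch {
    final List<String> delivered = new ArrayList<>();
    int dataTransmissions = 0; // counts every data packet put on the channel
    private int acksSent = 0;

    /** Stop-and-wait transfer over a channel that drops the scripted packets. */
    void transfer(List<String> messages, Set<Integer> lostData, Set<Integer> lostAcks) {
        int senderSeq = 0;   // alternating-bit sequence number at the sender
        int expectedSeq = 0; // sequence number the receiver is waiting for
        for (String msg : messages) {
            boolean acked = false;
            while (!acked) {
                int dataIndex = dataTransmissions++;
                if (lostData.contains(dataIndex)) {
                    continue; // data lost: timer expires, sender retransmits
                }
                if (senderSeq == expectedSeq) {
                    delivered.add(msg); // new packet: deliver up
                    expectedSeq = 1 - expectedSeq;
                } // otherwise a duplicate: discard, but still ACK it
                int ackIndex = acksSent++;
                if (lostAcks.contains(ackIndex)) {
                    continue; // ACK lost: timer expires, retransmission is a duplicate
                }
                acked = true;
            }
            senderSeq = 1 - senderSeq;
        }
    }

    public static void main(String[] args) {
        Rdt30Sketch link = new Rdt30Sketch();
        Set<Integer> lostData = new HashSet<>(Arrays.asList(1));
        Set<Integer> lostAcks = new HashSet<>(Arrays.asList(0));
        link.transfer(Arrays.asList("a", "b", "c"), lostData, lostAcks);
        System.out.println(link.delivered + " after " + link.dataTransmissions + " data transmissions");
    }
}
```

Despite one lost ACK and one lost data packet, each message is delivered exactly once; the cost is the extra transmissions.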
sender
in action
Principles of reliable data transfer - Pipelined protocols
Performance of rdt3.0 (stop-and-wait)
- U_sender: utilization - fraction of time sender busy sending
- example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet
- time to transmit packet into channel: $D_{trans} = L/R = 8000\ \text{bits} / 10^9\ \text{bits/sec} = 8\ \mu s$
stop-and-wait operation
RTT
Pipelined protocols operation
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged packets
- range of sequence numbers must be increased
- buffering at sender and/or receiver
Pipelining: increased utilization
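As a worked version of the example (1 Gbps link, 15 ms propagation delay, 8000-bit packet), the utilization can be computed as below. The assumptions are that RTT is twice the propagation delay, that ACK transmission time is negligible, and, for the pipelined case, a window of 3 packets chosen only for illustration.

$$
\begin{aligned}
\frac{L}{R} &= \frac{8000\ \text{bits}}{10^{9}\ \text{bits/sec}} = 0.008\ \text{ms} \\
RTT &= 2 \times 15\ \text{ms} = 30\ \text{ms} \\
U_{sender}^{\text{stop-and-wait}} &= \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027 \\
U_{sender}^{\text{3-packet pipeline}} &= \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} \approx 0.0008
\end{aligned}
$$

Under these assumptions the sender is busy less than 0.03% of the time with stop-and-wait, and pipelining three packets triples that figure.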
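Go-Back-N
Sw = sender window size, Rw = receiver window size
Sw = 1, Rw = 1: stop-and-wait protocol (rdt3.0)
Pipelined protocols:
Sw > 1, Rw = 1: GBN protocol
Sw > 1, Rw > 1: selective repeat protocol
Send buffer: after sending, the sender keeps each packet in the buffer so it can be retransmitted on error or on timeout
- a region of memory
- allows sending several unacknowledged packets in a row; the upper limit is the buffer size
Send window: a range within the send buffer
- the set of packets sent but not yet acknowledged; its size is <= the send buffer size
- upper bound: the distance between the leading edge and the trailing edge reaches the send buffer size
- on ACK, the trailing edge slides forward -> the buffer slides forward
- the trailing edge can advance at most until it meets the leading edge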
sender
- sender: “window” of up to N, consecutive transmitted but unACKed pkts
- k-bit seq # in pkt header
- cumulative ACK: ACK(n) ACKs all packets up to, including seq # n
- on receiving ACK(n): move window forward to begin at n+1
- timer for oldest in-flight packet
- timeout(n): retransmit packet n and all higher seq # packets in window
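The sender rules above can be sketched as follows. The class `GbnSender` and its method names are hypothetical, and the sketch uses unbounded `int` sequence numbers rather than a k-bit field that wraps around; timers are represented only by the `onTimeout` call.

```java
import java.util.ArrayList;
import java.util.List;

final class GbnSender {
    private final int windowSize; // N
    private int base = 0;         // oldest unACKed seq #
    private int nextSeqNum = 0;   // next seq # to be used

    GbnSender(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Data from above: returns the seq # sent, or -1 if the window is full. */
    int send() {
        if (nextSeqNum >= base + windowSize) {
            return -1; // N packets already in flight: refuse the data
        }
        return nextSeqNum++;
    }

    /** Cumulative ACK(n): everything up to and including n is ACKed. */
    void onAck(int n) {
        if (n >= base && n < nextSeqNum) {
            base = n + 1; // window now begins at n+1
        }
    }

    /** Timeout: go back and retransmit every sent-but-unACKed packet. */
    List<Integer> onTimeout() {
        List<Integer> resend = new ArrayList<>();
        for (int seq = base; seq < nextSeqNum; seq++) {
            resend.add(seq);
        }
        return resend;
    }

    int base() {
        return base;
    }

    public static void main(String[] args) {
        GbnSender sender = new GbnSender(4);
        for (int i = 0; i < 5; i++) {
            System.out.println("send -> " + sender.send());
        }
        sender.onAck(1);
        System.out.println("retransmit on timeout: " + sender.onTimeout());
    }
}
```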
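Receive buffer:
Receive window:
- Rw = 1: slides one packet at a time, and only after the expected packet arrives
- acknowledges packets received in order (cumulative ACK)
- can only receive in order
- Rw > 1
- ACKs each received packet individually
- the receive window slides forward one slot at a time
- can accept any packet inside the window
- slides forward only once the lowest-numbered packet in the window has been received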
receiver
- ACK-only: always send ACK for correctly-received packet so far, with highest in-order seq#
- may generate duplicate ACKs
- need only remember rcv_base
- on receipt of out-of-order packet:
- can discard(don’t buffer) or buffer: an implementation decision
- re-ACK pkt with highest in-order seq#
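A minimal sketch of the GBN receiver, taking the "discard out-of-order packets" option. The class `GbnReceiver` is hypothetical, and returning -1 as the ACK number before anything has arrived in order is a convention assumed only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class GbnReceiver {
    private int expectedSeq = 0; // rcv_base: the only state the receiver keeps
    private final List<String> delivered = new ArrayList<>();

    /**
     * Handles one uncorrupted packet and returns the seq # carried in the ACK:
     * the highest in-order seq # received so far (-1 if none yet).
     */
    int onPacket(int seq, String data) {
        if (seq == expectedSeq) {
            delivered.add(data); // in order: deliver up
            expectedSeq++;
        }
        // Out-of-order packets are discarded here (no buffering);
        // the ACK below is then a duplicate of the previous one.
        return expectedSeq - 1;
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        GbnReceiver receiver = new GbnReceiver();
        System.out.println("ACK " + receiver.onPacket(0, "a"));
        System.out.println("ACK " + receiver.onPacket(2, "c")); // packet 1 missing: duplicate ACK
        System.out.println("ACK " + receiver.onPacket(1, "b"));
        System.out.println("ACK " + receiver.onPacket(2, "c")); // retransmitted packet 2
        System.out.println(receiver.delivered());
    }
}
```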
in action
Selective repeat
- receiver individually acknowledges all correctly received packets
- buffers packets, as needed, for eventual in-order delivery to upper layer
- sender times out/retransmits individually for unACKed packets
sender, receiver, windows
sender and receiver
- sender
- data from above:
- if next available seq# in window, send packet
- timeout(n):
- resend packet n, restart timer
- ACK(n) in [sendbase, sendbase+N-1]:
- mark packet n as received
- if n smallest unACKed packet, advance window base to next unACKed seq#
- receiver
- packet n in [rcvbase, rcvbase+N-1]
- send ACK(n)
- out-of-order: buffer
- in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
- packet n in [rcvbase-N, rcvbase-1]
- ACK(n)
- otherwise
- ignore
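The three receiver cases above can be sketched as follows. The class `SrReceiver` and the boolean return value (true when an ACK(n) is sent, false when the packet is ignored) are assumptions for illustration; sequence numbers are unbounded `int` values rather than a wrapping k-bit field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SrReceiver {
    private final int windowSize; // N
    private int rcvBase = 0;
    private final Map<Integer, String> buffer = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();

    SrReceiver(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Returns true if ACK(n) is sent, false if the packet is ignored. */
    boolean onPacket(int n, String data) {
        if (n >= rcvBase && n <= rcvBase + windowSize - 1) {
            buffer.put(n, data); // inside the window: buffer it
            // Deliver the in-order run starting at rcvBase, advancing the window.
            while (buffer.containsKey(rcvBase)) {
                delivered.add(buffer.remove(rcvBase));
                rcvBase++;
            }
            return true;
        }
        if (n >= rcvBase - windowSize && n <= rcvBase - 1) {
            return true; // already delivered: re-ACK so the sender can move on
        }
        return false; // otherwise: ignore
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        SrReceiver receiver = new SrReceiver(4);
        receiver.onPacket(1, "b"); // out of order: buffered
        System.out.println(receiver.delivered());
        receiver.onPacket(0, "a"); // fills the gap: both delivered
        System.out.println(receiver.delivered());
    }
}
```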
Selective Repeat in action
Connection-oriented transport: TCP
TCP: overview RFCs: 793, 1122, 2018, 5681, 7323
- point-to-point:
- one sender, one receiver
- reliable, in-order byte stream:
- no “message boundaries”
- full duplex data:
- bi-directional data flow in same connection
- MSS: maximum segment size
- cumulative ACKs
- pipelining
- TCP congestion and flow control set window size
- connection-oriented:
- handshaking (exchange of control messages) initializes sender, receiver state before data exchange
- flow controlled:
- sender will not overwhelm receiver
TCP segment structure
TCP sequence numbers, ACKs
Sequence numbers:
- byte stream “number” of first byte in segment’s data
Acknowledgements:
- seq # of next byte expected from other side
- cumulative ACK
How to set the TCP timeout value?
- longer than RTT, but RTT varies!
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
- SampleRTT will vary, want estimated RTT “smoother”
- average several recent measurements, not just current SampleRTT
- exponential weighted moving average (EWMA)
- influence of past samples decreases exponentially fast
timeout interval: EstimatedRTT plus “safety margin”
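A minimal sketch of the EWMA estimator and the "safety margin". The notes do not give the formulas, so the following are assumptions taken from the standard retransmission-timer computation (RFC 6298): gains of 0.125 and 0.25, a deviation term `DevRTT`, a margin of four deviations, and the initialization from the first sample. The class `RttEstimator` is hypothetical.

```java
final class RttEstimator {
    static final double ALPHA = 0.125; // weight of the newest sample in EstimatedRTT
    static final double BETA = 0.25;   // weight of the newest deviation in DevRTT

    private double estimatedRtt;
    private double devRtt;

    /** Initialize from the first measurement (assumed: deviation starts at half the sample). */
    RttEstimator(double firstSampleMs) {
        estimatedRtt = firstSampleMs;
        devRtt = firstSampleMs / 2;
    }

    /** Fold one new SampleRTT into the running averages. */
    void addSample(double sampleRttMs) {
        // Deviation is updated first, using the estimate from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRttMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRttMs;
    }

    double estimatedRtt() {
        return estimatedRtt;
    }

    /** TimeoutInterval = EstimatedRTT + 4 * DevRTT (the safety margin). */
    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimator estimator = new RttEstimator(100.0);
        estimator.addSample(200.0); // one unusually slow sample
        System.out.println("EstimatedRTT = " + estimator.estimatedRtt());
        System.out.println("TimeoutInterval = " + estimator.timeoutInterval());
    }
}
```

A single slow sample moves the estimate only one eighth of the way toward it, which is the smoothing the notes ask for, while the deviation term widens the margin when samples fluctuate.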
Overview
- TCP creates rdt service on top of IP’s unreliable service
- pipelined segments
- cumulative acks
- single retransmission timer
- retransmissions triggered by:
- timeout events
- duplicate acks
- simplified TCP sender:
- ignore duplicate acks
- ignore flow control, congestion control
TCP Sender(simplified)
event: data received from application
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running
- think of timer as for oldest unACKed segment
- expiration interval: TimeOutInterval
event: timeout
- retransmit segment that caused timeout
- restart timer
event: ACK received
- if ACK acknowledges previously unACKed segments
- update what is known to be ACKed
- start timer if there are still unACKed segments
retransmission scenarios
![](https://pic.imgdb.cn/item/617ef3aa2ab3f51d910414ff.png)
A cumulative ACK covers for an earlier lost ACK
TCP Receiver: ACK generation
TCP fast retransmit
if sender receives 3 additional ACKs for same data (“triple duplicate ACKs”), resend unACKed segment with smallest seq #
- likely that unACKed segment lost, so don’t wait for timeout
Receipt of three duplicate ACKs indicates 3 segments received after a missing segment –lost segment is likely. So retransmit!
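The duplicate-ACK counting behind fast retransmit can be sketched as below. The class `FastRetransmit` is hypothetical and tracks only the ACK number; the rest of the sender (timer, send buffer) is left out.

```java
final class FastRetransmit {
    private long lastAck = -1; // most recent ACK number seen
    private int dupCount = 0;  // how many extra times it has been repeated

    /**
     * Processes one incoming ACK number.
     * Returns true exactly when the third duplicate arrives, meaning the
     * segment starting at ackNum should be retransmitted without waiting for timeout.
     */
    boolean onAck(long ackNum) {
        if (ackNum == lastAck) {
            dupCount++;
            return dupCount == 3;
        }
        lastAck = ackNum; // new data ACKed: reset the counter
        dupCount = 0;
        return false;
    }

    public static void main(String[] args) {
        FastRetransmit detector = new FastRetransmit();
        long[] acks = {100, 100, 100, 100, 100, 200};
        for (long ack : acks) {
            System.out.println("ACK " + ack + " -> retransmit now? " + detector.onAck(ack));
        }
    }
}
```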
Flow control
What happens if the network layer delivers data faster than the application layer removes it from the socket buffers?
flow control
- receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
Application removing data from TCP socket buffers
- TCP receiver “advertises” free buffer space in rwnd field in TCP header
- RcvBuffer size set via socket options (typical default is 4096 bytes)
- many operating systems autoadjust RcvBuffer
- sender limits amount of unACKed(“in-flight”) data to received rwnd(free buffer space)
- guarantees receive buffer will not overflow
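A small sketch of the bookkeeping on both sides. `RcvBuffer` and `rwnd` come from the notes; the variable names `lastByteRcvd` and `lastByteRead`, the class `FlowControl`, and the byte counts in the example are assumptions for illustration.

```java
final class FlowControl {
    /** Receiver side: free buffer space to advertise in the rwnd field. */
    static long rwnd(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        long buffered = lastByteRcvd - lastByteRead; // received but not yet read by the app
        return rcvBuffer - buffered;
    }

    /** Sender side: how many more bytes may be sent without exceeding rwnd. */
    static long sendable(long lastByteSent, long lastByteAcked, long rwnd) {
        long inFlight = lastByteSent - lastByteAcked; // unACKed ("in-flight") data
        return Math.max(0, rwnd - inFlight);
    }

    public static void main(String[] args) {
        long window = rwnd(4096, 3000, 1000); // 2000 bytes still waiting in the buffer
        System.out.println("rwnd = " + window);
        System.out.println("sender may send " + sendable(5000, 4000, window) + " more bytes");
    }
}
```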
TCP connection management
before exchanging data, sender/receiver “handshake”:
- agree to establish connection(each knowing the other willing to establish connection)
Agreeing to establish a connection
Will a 2-way handshake always work in a network?
- variable delays
- retransmitted messages due to message loss
- message reordering
- can’t “see” other side
2-way handshake scenarios
TCP 3-way handshake
Closing a TCP connection
- client, server each close their side of connection
- send TCP segment with FIN bit = 1
- respond to received FIN with ACK
- on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled
Principles of congestion control
Approaches towards congestion control
End-end congestion control
- no explicit feedback from network
- congestion inferred from observed loss, delay
- approach taken by TCP
Network-assisted congestion control:
- routers provide direct feedback to sending/receiving hosts with flows passing through congested router
- may indicate congestion level or explicitly set sending rate
- TCP ECN, ATM, DECbit protocols
TCP congestion control
AIMD (Additive Increase & Multiplicative Decrease)
- approach: senders can increase sending rate until packet loss(congestion)occurs, then decrease sending rate on loss event
- Additive Increase: increase sending rate by 1 maximum segment size every RTT until loss detected
- Multiplicative Decrease: cut sending rate in half at each loss event
- sending rate is cut in half on loss detected by triple duplicate ACK(TCP Reno)
- Cut to 1 MSS(maximum segment size ) when loss detected by timeout(TCP Tahoe)
- AIMD sawtooth behavior: probing for bandwidth
- a distributed, asynchronous algorithm - has been shown to:
- optimize congested flow rates network wide!
- have desirable stability properties
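The sawtooth can be sketched by tracking the window once per RTT. Assumptions for illustration: the window is measured in units of MSS, every loss is of the "triple duplicate ACK" kind (so the window is halved rather than reset), the window never drops below 1 MSS, and losses are scripted by hand. The class `Aimd` is hypothetical.

```java
final class Aimd {
    /**
     * Returns the window (in MSS) after each RTT.
     * lossInRtt[i] == true means a loss was detected during RTT i.
     */
    static double[] trace(double initialCwnd, boolean[] lossInRtt) {
        double[] cwndAfterRtt = new double[lossInRtt.length];
        double cwnd = initialCwnd;
        for (int i = 0; i < lossInRtt.length; i++) {
            if (lossInRtt[i]) {
                cwnd = Math.max(1.0, cwnd / 2); // multiplicative decrease
            } else {
                cwnd = cwnd + 1;                // additive increase: +1 MSS per RTT
            }
            cwndAfterRtt[i] = cwnd;
        }
        return cwndAfterRtt;
    }

    public static void main(String[] args) {
        boolean[] losses = {false, false, true, false, false, false, true, false};
        for (double cwnd : trace(10, losses)) {
            System.out.print(cwnd + " ");
        }
        System.out.println();
    }
}
```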
- details
- TCP sending behavior:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
- TCP rate $\approx$ cwnd/RTT bytes/sec
- TCP sender limits transmission
- LastByteSent - LastByteAcked <= cwnd
- cwnd is dynamically adjusted in response to observed network congestion (implementing TCP congestion control)
TCP slow start
- when connection begins, increase rate exponentially until first loss event:
- initially cwnd = 1MSS
- double cwnd every RTT
- done by incrementing cwnd for every ACK received
- summary: initial rate is slow, but ramps up exponentially fast
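Why "one increment per ACK" amounts to doubling per RTT can be seen in a short simulation. Assumptions: cwnd is counted in MSS, there is no loss, every segment sent in an RTT is ACKed individually within that RTT, and no threshold ends the exponential phase. The class `SlowStart` is hypothetical.

```java
final class SlowStart {
    /** Returns cwnd (in MSS) at the start of each of the given number of RTTs. */
    static int[] cwndPerRtt(int rtts) {
        int[] cwndAtStart = new int[rtts];
        int cwnd = 1; // initially cwnd = 1 MSS
        for (int rtt = 0; rtt < rtts; rtt++) {
            cwndAtStart[rtt] = cwnd;
            int acksThisRtt = cwnd; // one ACK comes back per segment sent
            for (int ack = 0; ack < acksThisRtt; ack++) {
                cwnd += 1; // +1 MSS for every ACK received
            }
        }
        return cwndAtStart;
    }

    public static void main(String[] args) {
        for (int cwnd : cwndPerRtt(6)) {
            System.out.print(cwnd + " "); // prints 1 2 4 8 16 32
        }
        System.out.println();
    }
}
```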
TCP: from slow start to congestion avoidance
TCP fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
Is TCP Fair?
Example: two competing TCP sessions
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
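The two-session argument can be checked numerically. This is an idealized sketch under stated assumptions: both sessions have the same RTT, both see every loss at the same moment (whenever their combined rate exceeds the link capacity), and rates are in arbitrary units. The class `AimdFairness` and the starting rates are hypothetical.

```java
final class AimdFairness {
    /** Runs synchronized AIMD for two flows and returns their final rates {x1, x2}. */
    static double[] simulate(double x1, double x2, double capacity, int rounds) {
        for (int i = 0; i < rounds; i++) {
            if (x1 + x2 > capacity) {
                // Loss: both halve, which also halves the gap between them.
                x1 /= 2;
                x2 /= 2;
            } else {
                // No loss: both add 1, which leaves the gap unchanged.
                x1 += 1;
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }

    public static void main(String[] args) {
        double[] rates = simulate(10, 80, 100, 1000);
        System.out.println("flow 1: " + rates[0] + ", flow 2: " + rates[1]);
    }
}
```

Each decrease shrinks the difference between the two rates by half while each increase preserves it, so the rates drift toward the equal-share line regardless of where they start.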
Evolution of transport-layer functionality
Evolving transport-layer functionality
QUIC: Quick UDP Internet Connections
- application-layer protocol, on top of UDP, designed to increase performance of HTTP
- deployed on many Google servers, apps(Chrome, mobile YouTube app)
- adopts approaches we’ve studied in this chapter for connection establishment, error control, congestion control
- error and congestion control: “Readers familiar with TCP’s loss detection and congestion control will find algorithms here that parallel well-known TCP ones.”
- connection establishment: reliability, congestion control, authentication, encryption, state established in one RTT
- multiple application-level “streams” multiplexed over single QUIC connection
- separate reliable data transfer, security
- common congestion control
- Connection establishment