文档地址
https://tools.ietf.org/id/draft-cardwell-iccrg-bbr-congestion-control-00.html
https://datatracker.ietf.org/doc/html/draft-cardwell-iccrg-bbr-congestion-control-00
虽说这只是一个draft,但是总体没差(主要是文档美观,看着心情好)。
Table of Contents
Introduction
Terminology
BBR.send_quantum: The maximum size of a data aggregate scheduled and transmitted together.
关于round trip:
BBR里说到持续一个rtt时间时,需要注意区分究竟是持续一个packet-timed round trip还是持续一个wall clock time round trip。前者是持续一个自然的rtt,也就是收到发出去的包的ack,后者则是计时经过rtt这么久。
在PROBE_BW阶段,只有6个pacing_gain为1的phase是需要保持1个wall clock time的round trip。
对于pacing_gain为1.25的带宽上探阶段,退出条件为:出现丢包或者inflight达到1.25BDP。
对于pacing_gain为0.75的排空阶段,退出条件为:持续时间超过一个wall clock time的rtt或者inflight小于等于BDP。
Desing Overview
Network Path Model
Target Operating Point
Control Parameters
State Machine Design Overview
State Transition Diagram
|
V
+---> Startup ----+
| | |
| V |
| Drain ----+
| | |
| V |
+---> ProbeBW -----+
| ^ | |
| | | |
| +----+ |
| |
+---- ProbeRTT <---+
Algorithm Organization
Initialization
BBROnConnectionInit():
BBRInit()
Per-ACK Steps
BBRUpdateOnACK():
BBRUpdateModelAndState()
BBRUpdateControlParameters()
BBRUpdateModelAndState():
BBRUpdateBtlBw()
BBRCheckCyclePhase()
BBRCheckFullPipe()
BBRCheckDrain()
BBRUpdateRTprop()
BBRCheckProbeRTT()
BBRUpdateControlParameters()
BBRSetPacingRate()
BBRSetSendQuantum()
BBRSetCwnd()
Per-Transmit Steps
BBROnTransmit():
BBRHandleRestartFromIdle()
Environment and Usage
Detailed Algorithm
Maintaining the Network Path Model
BBR.BtlBw
Delivery Rate Samples for Estimating BBR.BtlBw
有一篇专门讲计算Delivery Rate的文档
BBR.BtlBw Max Filter
BBR的瓶颈带宽过期时间是用的packet timed round trip,瓶颈带宽超过10个round才会过期;
Tracking Time for the BBR.BtlBw Max Filter
连接初始化:
BBRInitRoundCounting():
BBR.next_round_delivered = 0
BBR.round_start = false
BBR.round_count = 0
每发送一个包:
packet.delivered = BBR.delivered
每接收一个ACK:
BBRUpdateRound():
BBR.delivered += packet.size
if (packet.delivered >= BBR.next_round_delivered)
BBR.next_round_delivered = BBR.delivered
BBR.round_count++
BBR.round_start = true
else
BBR.round_start = false
BBR.BtlBw and Application-limited Delivery Rate Samples
Updating the BBR.BtlBw Max Filter
BBRUpdateBtlBw()
BBRUpdateRound()
if (rs.delivery_rate >= BBR.BtlBw || ! rs.is_app_limited)
BBR.BtlBw = update_windowed_max_filter(
filter=BBR.BtlBwFilter,
value=rs.delivery_rate,
time=BBR.round_count,
window_length=BtlBwFilterLen)
BBR.RTprop
Round-Trip Time Samples for Estimating BBR.RTprop
对于一个ack确认多个已发送数据包的情形,BBR取最后发送的那一个包的发送时间来计算RTT,以此来逼近minRtt。
The only divergence from RTT estimation for retransmission timeouts is in the case where a given acknowledgment ACKs more than one data packet. In order to be conservative and schedule long timeouts to avoid spurious retransmissions, the maximum among such potential RTT samples is typically used for computing retransmission timeouts; i.e., SRTT is typically calculated using the data packet with the earliest transmission time. By contrast, in order for BBR to try to reach the minimum amount of data in flight to fill the pipe, BBR uses the minimum among such potential RTT samples; i.e., BBR calculates the RTT using the data packet with the latest transmission time.
BBR.RTprop Min Filter
使用10s作为minRtt的失效时间。
Updating the BBR.RTprop Min Filter
记录每个已发送包的发送时间:
packet.send_time = Now()
每次收到ACK时记录接收时间:
packet.rtt = Now() - packet.send_time
更新minRtt:
BBRUpdateRTprop()
BBR.rtprop_expired =
Now() > BBR.rtprop_stamp + RTpropFilterLen
if (packet.rtt >= 0 and
(packet.rtt <= BBR.RTprop or BBR.rtprop_expired))
BBR.RTprop = packet.rtt
BBR.rtprop_stamp = Now()
BBR Control Parameters
Pacing Rate
计算每个包的发送时间:
next_send_time = Now() + packet.size / pacing_rate
设置初始发送速率:
BBRInitPacingRate():
nominal_bandwidth = InitialCwnd / (SRTT ? SRTT : 1ms)
BBR.pacing_rate = BBR.pacing_gain * nominal_bandwidth
每次收到ACK时更新pacing_rate:
BBRSetPacingRateWithGain(pacing_gain):
rate = pacing_gain * BBR.BtlBw
if (BBR.filled_pipe || rate > BBR.pacing_rate)
BBR.pacing_rate = rate
BBRSetPacingRate():
BBRSetPacingRateWithGain(BBR.pacing_gain)
Send Quantum
高带宽情况下实现精准的pacing可能会增加CPU的消耗,因此可以根据带宽调整每次发包的数据量:
BBRSetSendQuantum():
if (BBR.pacing_rate < 1.2 Mbps)
BBR.send_quantum = 1 * MSS
else if (BBR.pacing_rate < 24 Mbps)
BBR.send_quantum = 2 * MSS
else
BBR.send_quantum = min(BBR.pacing_rate * 1ms, 64KBytes)
Congestion Window
Initial Congestion Window
Target cwnd
inflight数总是不能超过target_cwnd。每次收到ACK后都需要更新target_cwnd:
BBRInflight(gain):
if (BBR.RTprop == Inf)
return InitialCwnd /* no valid RTT samples yet */
quanta = 3*BBR.send_quantum
estimated_bdp = BBR.BtlBw * BBR.RTprop
return gain * estimated_bdp + quanta
BBRUpdateTargetCwnd():
BBR.target_cwnd = BBRInflight(BBR.cwnd_gain)
Minimum cwnd for pipelining
为了使得管道被充分利用,BBR提出设置4MSS的cwnd下限。
Modulating cwnd in Loss Recovery
出现重传定时器超时情况时,BBR需要进入loss recovery状态,先将cwnd调整为1,然后再按照下面的Core cwnd Adjustment Mechanism进行调整:
BBR.prior_cwnd = BBRSaveCwnd()
cwnd = 1
进入Fast Recovery时,在第一个round内,保证cwnd与不丢包的发送速率相匹配,从第二个round开始则保证cwnd与2倍不丢包的发送速率相匹配:
BBR.prior_cwnd = BBRSaveCwnd()
cwnd = packets_in_flight + max(packets_delivered, 1)
BBR.packet_conservation = true
在Fast Recovery阶段,每次收到ACK时,
BBRModulateCwndForRecovery():
if (packets_lost > 0)
cwnd = max(cwnd - packets_lost, 1)
if (BBR.packet_conservation)
cwnd = max(cwnd, packets_in_flight + packets_delivered)
上面的packets_delivered表示每次收到ACK时新确认收到的包数,packets_lost为每次收到ACKS新确认的丢包数。
Fast Recovery持续一个round trip后:
BBR.packet_conservation = false
退出Loss Recovery或者Fast Recovery时,需要恢复cwnd:
BBR.packet_conservation = false
BBRRestoreCwnd()
BBRSaveCwnd():
if (not InLossRecovery() and BBR.state != ProbeRTT)
return cwnd
else
return max(BBR.prior_cwnd, cwnd)
BBRRestoreCwnd():
cwnd = max(cwnd, BBR.prior_cwnd)
Modulating cwnd in ProbeRTT
进入ProbeRTT状态时调整cwnd:
BBRModulateCwndForProbeRTT():
if (BBR.state == ProbeRTT)
cwnd = min(cwnd, BBRMinPipeCwnd)
Core cwnd Adjustment Mechanism
BBRSetCwnd():
BBRUpdateTargetCwnd()
BBRModulateCwndForRecovery()
if (not BBR.packet_conservation) {
if (BBR.filled_pipe)
cwnd = min(cwnd + packets_delivered, BBR.target_cwnd)
else if (cwnd < BBR.target_cwnd || BBR.delivered < InitialCwnd)
cwnd = cwnd + packets_delivered
cwnd = max(cwnd, BBRMinPipeCwnd)
}
BBRModulateCwndForProbeRTT()
State Machine
Initialization Steps
连接建立时:
BBRInit():
init_windowed_max_filter(filter=BBR.BtlBwFilter, value=0, time=0)
BBR.rtprop = SRTT ? SRTT : Inf
BBR.rtprop_stamp = Now()
BBR.probe_rtt_done_stamp = 0
BBR.probe_rtt_round_done = false
BBR.packet_conservation = false
BBR.prior_cwnd = 0
BBR.idle_restart = false
BBRInitRoundCounting()
BBRInitFullPipe()
BBRInitPacingRate()
BBREnterStartup()
STARTUP
Startup Dynamics
BBREnterStartup():
BBR.state = Startup
BBR.pacing_gain = BBRHighGain
BBR.cwnd_gain = BBRHighGain
Estimating When Startup has Filled the Pipe
初始化:
BBRInitFullPipe():
BBR.filled_pipe = false
BBR.full_bw = 0
BBR.full_bw_count = 0
在每个round开始时检查是否已经有3个round带宽没有上升25%:
BBRCheckFullPipe():
if BBR.filled_pipe or
not BBR.round_start or rs.is_app_limited
return // no need to check for a full pipe now
if (BBR.BtlBw >= BBR.full_bw * 1.25) // BBR.BtlBw still growing?
BBR.full_bw = BBR.BtlBw // record new baseline level
BBR.full_bw_count = 0
return
BBR.full_bw_count++ // another round w/o much growth
if (BBR.full_bw_count >= 3)
BBR.filled_pipe = true
为什么是3个round?这个和tcp的receive window autotuning机制有关:
BBR waits three rounds in order to have solid evidence that the sender is not detecting a delivery-rate plateau that was temporarily imposed by the receive window. Allowing three rounds provides time for the receiver’s receive-window autotuning to open up the receive window and for the BBR sender to realize that BBR.BtlBw should be higher: in the first round the receive-window autotuning algorithm grows the receive window; in the second round the sender fills the higher receive window; in the third round the sender gets higher delivery-rate samples. This three-round threshold was validated by YouTube experimental data.
Drain
为了减少STARTUP阶段引起的链路堆积,需要进入Drain阶段,在持续一个RTT或者inflight小于等于BDP时退出DRAIN。
BBREnterDrain():
BBR.state = Drain
BBR.pacing_gain = 1/BBRHighGain // pace slowly
BBR.cwnd_gain = bbr_high_gain // maintain cwnd
BBRCheckDrain():
if (BBR.state == Startup and BBR.filled_pipe)
BBREnterDrain()
if (BBR.state == Drain and packets_in_flight <= BBRInflight(1.0))
BBREnterProbeBW() // we estimate queue is drained
问题,为什么DRAIN阶段的pacing_gain是1/BBRHighGain?
ProbeBW
Gain Cycling Dynamics
pacing_gain按照[1.25,0.75,1,1,1,1,1,1]循环。cwnd_gain始终为2。
Gain Cycling Randomization
为了公平性,进入ProbeBW时,从0,2,3,4,5,6,7里随便选一个phase开始。不选pacing_gain为0.75这个phase是因为这个phase是为了减少链路堆积而设计的,如果进入ProbeBW时以0.75的pacing_gain这个phase开始,那么只会降低链路的利用率。
Gain Cycling Algorithm
进入ProbeBW时:
BBREnterProbeBW():
BBR.state = ProbeBW
BBR.pacing_gain = 1
BBR.cwnd_gain = 2
BBR.cycle_index = BBRGainCycleLen - 1 - random_int_in_range(0..6)
BBRAdvanceCyclePhase()
每次收到ACK时,检查是否需要advance cycle phase:
BBRCheckCyclePhase():
if (BBR.sate == ProbeBW and BBRIsNextCyclePhase()
BBRAdvanceCyclePhase()
BBRAdvanceCyclePhase():
BBR.cycle_stamp = Now()
BBR.cycle_index = (BBR.cycle_index + 1) % BBRGainCycleLen
pacing_gain_cycle = [5/4, 3/4, 1, 1, 1, 1, 1, 1]
BBR.pacing_gain = pacing_gain_cycle[BBR.cycle_index]
BBRIsNextCyclePhase():
is_full_length = (Now() - BBR.cycle_stamp) > BBR.RTprop
if (BBR.pacing_gain == 1)
return is_full_length
if (BBR.pacing_gain > 1)
return is_full_length and
(packets_lost > 0 or
prior_inflight >= BBRInflight(BBR.pacing_gain))
else // (BBR.pacing_gain < 1)
return is_full_length or
prior_inflight <= BBRInflight(1)
prior_in_flight和in_flight的区别:prior_in_flight是处理ack之前的in_flight,即prior_in_flight比in_flight要大一点。
Restarting From Idle
从Idle状态恢复时,BBR直接以估计带宽为pacing rate发送数据:
BBRHandleRestartFromIdle():
if (packets_in_flight == 0 and C.app_limited)
BBR.idle_start = true
if (BBR.state == ProbeBW)
BBRSetPacingRateWithGain(1)
ProbeRTT
每次收到ACK,BBR都需要检查是否进入ProbeRTT状态,
BBRCheckProbeRTT():
if (BBR.state != ProbeRTT and
BBR.rtprop_expired and
not BBR.idle_restart)
BBREnterProbeRTT()
BBRSaveCwnd()
BBR.probe_rtt_done_stamp = 0
if (BBR.state == ProbeRTT)
BBRHandleProbeRTT()
BBR.idle_restart = false
BBREnterProbeRTT():
BBR.state = ProbeRTT
BBR.pacing_gain = 1
BBR.cwnd_gain = 1
BBRHandleProbeRTT():
/* Ignore low rate samples during ProbeRTT: */
C.app_limited =
(BW.delivered + packets_in_flight) ? : 1
if (BBR.probe_rtt_done_stamp == 0 and
packets_in_flight <= BBRMinPipeCwnd)
BBR.probe_rtt_done_stamp =
Now() + ProbeRTTDuration
BBR.probe_rtt_round_done = false
BBR.next_round_delivered = BBR.delivered
else if (BBR.probe_rtt_done_stamp != 0)
if (BBR.round_start)
BBR.probe_rtt_round_done = true
if (BBR.probe_rtt_round_done and
Now() > BBR.probe_rtt_done_stamp)
BBR.rtprop_stamp = Now()
BBRRestoreCwnd()
BBRExitProbeRTT()
BBRExitProbeRTT():
if (BBR.filled_pipe)
BBREnterProbeBW()
else
BBREnterStartup()
如果连接是从idle状态恢复的(BBR.idle_restart is true),那么即使minRtt已经expire也不会进入ProbeRTT,因为连接处于idle状态时已经把链路排空了。
近期评论