
NAME
BPFTUNE-TCP-CONN - TCP connection bpftune plugin for auto-selection of congestion control algorithm
DESCRIPTION
The TCP connection tuner sets the congestion control algorithm on TCP sockets. Linux uses cubic
by default, and it works well in a wide range of settings.
However, when losses are observed it can underestimate network capacity, and as a result throughput
can drop excessively. In such cases BBR is a good fit, since it continuously estimates the
bottleneck bandwidth and paces sending to match it.
When we have limited information about a remote host - i.e. we have not had more than
REMOTE_HOST_MIN_INSTANCES connections involving it - the only auto-selection performed is to switch
to BBR when the loss rate exceeds 1/(2^DROP_SHIFT) (~1.5%) of the packets sent; under such loss,
BBR performs better than the other congestion control algorithms.
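As a rough sketch of that check - identifiers and the exact DROP_SHIFT value here are illustrative,
not the tuner's actual code - the comparison can be done with a shift rather than a division:

    /* Illustrative only: losses exceed 1/(2^DROP_SHIFT) of packets sent.
     * With DROP_SHIFT = 6 the threshold is 1/64, i.e. ~1.5%. */
    #define DROP_SHIFT 6

    static int should_switch_to_bbr(unsigned long long segs_out,
                                    unsigned long long drops)
    {
            return drops > (segs_out >> DROP_SHIFT);
    }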
For cases where we connect multiple times, we can try different algorithms and select the best.
In selecting the appropriate congestion control algorithm, a reinforcement learning-based method is
used whereby we choose the congestion control algorithm that best fits the optimal bandwidth delay
product (BDP):
BDP = BottleneckBandwidth * MinRoundTripTime
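As a concrete illustration of that relationship (the numbers below are arbitrary and not taken from
the tuner):

    /* Illustrative BDP calculation: a 100 Mbit/s bottleneck with a 10 ms
     * minimum RTT can hold ~125 KB in flight before queueing builds up. */
    #include <stdio.h>

    int main(void)
    {
            unsigned long long bw_bytes_per_sec = 100ULL * 1000 * 1000 / 8; /* 100 Mbit/s */
            unsigned long long min_rtt_usec = 10 * 1000;                    /* 10 ms */
            unsigned long long bdp_bytes = bw_bytes_per_sec * min_rtt_usec / 1000000;

            printf("BDP = %llu bytes\n", bdp_bytes);                        /* 125000 */
            return 0;
    }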
The algorithm works as follows: BPF maintains a map of metrics keyed by remote IP address. For each
remote IP address, we track the minimum RTT observed across all TCP connections and the max bandwidth
observed. The former tells us - as closely as we can determine - what the true RTT of the link is.
The latter estimates the bandwidth limit of the link. Knowing both of these allows us to determine
the optimum operating state for a congestion control algorithm, where we feed the pipe enough to reach
bandwidth limits but do not overwhelm it.
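A minimal sketch of the kind of per-remote-host BPF map this implies is shown below; the structure,
field names and sizes are illustrative only - the authoritative definitions are in the bpftune
sources:

    /* Illustrative only: a hash map keyed by remote address, tracking the
     * minimum RTT and maximum delivery rate seen across connections. */
    #include <linux/bpf.h>
    #include <linux/in6.h>
    #include <bpf/bpf_helpers.h>

    struct remote_host_metrics {
            __u64 min_rtt_usec;        /* lowest RTT seen for this host */
            __u64 max_delivery_rate;   /* highest delivery rate seen */
            __u64 cong_alg_cost[4];    /* learned cost per candidate algorithm */
    };

    struct {
            __uint(type, BPF_MAP_TYPE_HASH);
            __uint(max_entries, 1024);
            __type(key, struct in6_addr);  /* remote address (v4-mapped or v6) */
            __type(value, struct remote_host_metrics);
    } remote_host_map SEC(".maps");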
Tracking both of these allows us to determine the optimal BDP, so any loss function we use for
optimization should drive us towards congestion control algorithms that realize that optimal BDP by
getting as close as possible to the minimum RTT and as close as possible to the maximum packet
delivery rate. We cannot use raw BDP alone because it is the product of the delivery rate and the
RTT, so instead the metric used is:
(current_min_rtt - overall_min_rtt)*S/overall_min_rtt +
(overall_max_delivery_rate - cong_alg_max_delivery_rate)*S/overall_max_delivery_rate
The numerators are multiplied by a scaling factor S so that the integer divisions do not collapse to zero.
See ../src/tcp_conn_tuner.h for the metric computation.
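A minimal user-space sketch of that metric follows; the variable names and the value of S are
illustrative - the version in ../src/tcp_conn_tuner.h is authoritative:

    /* Illustrative cost metric: zero when the connection sees the overall
     * minimum RTT and the algorithm has reached the overall maximum delivery
     * rate; positive otherwise.  S is an arbitrary scaling factor here. */
    #define S 1000

    static unsigned long long cong_metric(unsigned long long curr_min_rtt,
                                          unsigned long long overall_min_rtt,
                                          unsigned long long alg_max_rate,
                                          unsigned long long overall_max_rate)
    {
            if (!overall_min_rtt || !overall_max_rate)
                    return 0;
            return (curr_min_rtt - overall_min_rtt) * S / overall_min_rtt +
                   (overall_max_rate - alg_max_rate) * S / overall_max_rate;
    }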
Note that while we use the current RTT for the connection, we use the maximum delivery rate observed
for the congestion algorithm to compare with the overall maximum. The reasoning here is that because
the delivery rate fluctuates so much between connections (depending on service type etc.), it is
too unstable to use on a per-connection basis. RTT is less variable across connections, so we can
use the current RTT in the metric calculation.
For a TCP connection with optimal BDP (minimum RTT and maximum delivery rate), the loss function
yields 0; otherwise it yields a positive cost. This cost is used to update the stored cost for that
congestion control algorithm via the usual reinforcement learning update, i.e.:
cong_alg_cost = cong_alg_cost +
learning_rate*(curr_cost - cong_alg_cost)
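Expressed as a user-space sketch with the learning rate as an integer fraction (the rate of 1/4
here is illustrative, not the tuner's actual value):

    /* Illustrative exponentially-weighted update: move the stored cost a
     * fraction of the way towards the newly observed cost. */
    static long long update_cost(long long cong_alg_cost, long long curr_cost)
    {
            return cong_alg_cost + (curr_cost - cong_alg_cost) / 4;
    }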
We use an epsilon-greedy approach, whereby the vast majority of the time the lowest-cost algorithm is
used, but 5% of the time we randomly select an algorithm. This ensures that if network conditions
change we can adapt accordingly - without this, we can get stuck and never discover that another
algorithm is doing better.
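A minimal sketch of that epsilon-greedy selection follows; the 5% exploration rate mirrors the
description above, while the candidate count and random source are illustrative:

    /* Illustrative epsilon-greedy choice: usually pick the lowest-cost
     * algorithm, but ~5% of the time explore a random one. */
    #include <stdlib.h>

    #define NR_ALGS 4

    static int pick_cong_alg(const long long cost[NR_ALGS])
    {
            int i, best = 0;

            if (rand() % 100 < 5)           /* explore ~5% of the time */
                    return rand() % NR_ALGS;
            for (i = 1; i < NR_ALGS; i++)   /* otherwise be greedy */
                    if (cost[i] < cost[best])
                            best = i;
            return best;
    }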
How does this work in practice? To benchmark this, we tested iperf3 performance between network
namespaces on the same system, with a 10% loss rate imposed via netem. What we see is that bpftune
converges to using BBR:
IPAddress        CongAlg   Metric    Count  Greedy  MinRtt  MaxRtDlvr
192.168.168.1    cubic     2338876       9       9       3       1737
192.168.168.1    bbr        923173      61      59       3      10024
192.168.168.1    htcp      2318283       5       4       3        620
192.168.168.1    dctcp     3506360       3       1       9        160
Note that we selected the BBR congestion control algorithm 61 out of 78 times and its associated cost
was less than half that of the other algorithms. This is due to it exhibiting the maximum delivery
rate and the lowest RTT.
iperf3 performance also improved as a result of selecting BBR, from a baseline of 58 Mbit/s (running
the Linux default cubic algorithm) to 490 Mbit/s running bpftune and auto-selecting BBR.
So this approach appears to find the right answer and converge quickly under loss conditions; what
about normal network conditions?
We might worry that grounding our model in assumptions closely tied to BBR's design might unduly
favour BBR in all circumstances; do we see this in practice outside of conditions where BBR is
optimal?
Thankfully no; we see a convergence to dctcp as the optimal congestion control algorithm; again it has
the maximum delivery rate and minimum RTT:
IPAddress        CongAlg   Metric    Count  Greedy  MinRtt  MaxRtDlvr
192.168.168.1    cubic     1710535       6       4       3       8951
192.168.168.1    bbr       2309881       1       1       7        206
192.168.168.1    htcp      3333333       3       3       3       8784
192.168.168.1    dctcp     1466296      71      70       3       9377
Note however that it is a close-run thing; cubic's metric is close, it matches dctcp's minimum RTT
(3us), and its maximum delivery rate is nearly as high (8951 for cubic versus 9377 for dctcp).
References:
BBR: Congestion-Based Congestion Control
<https://queue.acm.org/detail.cfm?id=3022184>
SEE ALSO
bpf(2), bpftune(8)