Optimizing the TCP congestion avoidance parameters for gigabit networks
TCP is an old protocol: its original RFC dates back to 1981. Over time it has been tweaked and tuned to keep up with ever-increasing demands. In this post I describe how one small tweak can make a big difference in performance when using TCP on a 10 Gbit/s Ethernet link.
Using an experimental build of the ByteBlower, I configured a TCP flow with settings that allow for maximum throughput:
- Initial Receive Window: 65535 bytes
- Receive Window Scale Factor: 8 (i.e. multiply by 2^8 = 256)
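To put these settings in perspective, the sketch below computes the effective receive window (the advertised value shifted left by the scale factor) and the throughput ceiling that window imposes, since at most one window of data can be in flight per round trip. The round-trip times in the loop are hypothetical examples, not values measured in this test.

```python
# Sketch: effective receive window for the settings above and the
# throughput ceiling it imposes. The RTT values are hypothetical.

MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field


def effective_window(advertised: int, scale_shift: int) -> int:
    """Receive window in bytes when the Window Scale option is in use."""
    return advertised << scale_shift


def throughput_ceiling_bps(window_bytes: int, rtt_s: float) -> float:
    """At most one full window can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


window = effective_window(MAX_UNSCALED_WINDOW, 8)
print(window)  # 16776960 bytes, about 16 MiB

for rtt_ms in (0.1, 1.0, 10.0):  # hypothetical round-trip times
    gbps = throughput_ceiling_bps(window, rtt_ms / 1000) / 1e9
    print(f"RTT {rtt_ms} ms: window allows at most {gbps:.1f} Gbit/s")
```

Even at a 10 ms round trip this window allows roughly 13 Gbit/s, so on a LAN the receive window is not what holds the flow back.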
Running this test on a fast LAN produced the following graph:
As you can see, the throughput increases very slowly. In fact, it takes a full minute to reach the peak of 6 Gbit/s. Why does the speed increase so slowly?
The answer lies in the congestion avoidance parameters used by ByteBlower's current TCP implementation (to be precise, versions before 2.1).
ByteBlower implements TCP congestion control according to RFC 5681, which combines four algorithms: slow start, congestion avoidance, fast retransmit and fast recovery. During slow start (the initial phase), the congestion window grows exponentially, roughly doubling every round trip, until it reaches a threshold called the "slow-start threshold" (ssthresh). From then on the second algorithm, congestion avoidance, takes over, and the congestion window grows linearly, by roughly one full-sized segment per round trip.
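The per-ACK rules behind these two phases can be sketched as follows. This is a simplified model of the RFC 5681 update rules (slow start adds at most one SMSS per ACK; congestion avoidance uses the common cwnd += SMSS*SMSS/cwnd approximation), not ByteBlower's actual code. The 1460-byte SMSS is an assumed typical Ethernet value, and one ACK per full-sized segment (no delayed ACKs) is assumed.

```python
# Simplified RFC 5681 congestion window growth (no loss handling).
# SMSS of 1460 bytes is an assumed typical Ethernet value.
SMSS = 1460


def on_ack(cwnd: int, ssthresh: int, newly_acked: int, smss: int = SMSS) -> int:
    """Return the congestion window (bytes) after an ACK that covers new data."""
    if cwnd < ssthresh:
        # Slow start: at most one SMSS per ACK, so cwnd roughly doubles per RTT.
        return cwnd + min(newly_acked, smss)
    # Congestion avoidance: SMSS*SMSS/cwnd per ACK adds about one SMSS per RTT.
    # RFC 5681 says a zero increment should be rounded up to 1 byte.
    return cwnd + max(1, smss * smss // cwnd)


def one_rtt(cwnd: int, ssthresh: int, smss: int = SMSS) -> int:
    """Apply one round trip of ACKs, assuming one ACK per full-sized segment."""
    for _ in range(max(1, cwnd // smss)):
        cwnd = on_ack(cwnd, ssthresh, smss, smss)
    return cwnd


cwnd = 3 * SMSS  # RFC 5681 initial window for an SMSS in this range
for rtt in range(6):
    cwnd = one_rtt(cwnd, ssthresh=65535)
    print(f"after RTT {rtt + 1}: cwnd = {cwnd} bytes")
```

Running it shows cwnd doubling for the first few round trips and then creeping up by only about 1460 bytes per round trip once it passes 65535.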
It turns out that in the graph above, the slow-start phase had already finished before the first measurement! The climbing line shows the linear growth of the congestion window during the congestion avoidance phase. Also note that no loss occurred, so the fast retransmit and fast recovery algorithms never came into play.
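A back-of-the-envelope model shows why this linear phase is so slow. Assuming one ACK per full-sized segment, congestion avoidance adds about one SMSS to cwnd per round trip, and throughput is roughly cwnd divided by the RTT. Writing $W_0$ for the window at the start of a phase and $W$ for the target window:

```latex
$$
\frac{d\,\mathrm{cwnd}}{dt} \approx \frac{\mathrm{SMSS}}{\mathrm{RTT}}
\qquad\Longrightarrow\qquad
\frac{d\,\mathrm{throughput}}{dt} \approx \frac{\mathrm{SMSS}}{\mathrm{RTT}^{2}}
$$

$$
t_{\mathrm{CA}}(W_0 \to W) \approx \frac{W - W_0}{\mathrm{SMSS}}\,\mathrm{RTT}
\qquad\text{versus}\qquad
t_{\mathrm{SS}}(W_0 \to W) \approx \mathrm{RTT}\,\log_2\frac{W}{W_0}
$$
```

The time spent in congestion avoidance grows linearly with the target window, while slow start needs only a logarithmic number of round trips, so the point at which the sender switches phases matters a great deal on a fast link.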
Basically, the problem is that we switched to congestion avoidance too soon. We want to stay in slow start a little longer, which can be achieved by increasing the initial slow-start threshold.
ByteBlower uses an initial slow-start threshold of 65535 bytes. Historically, many TCP implementations used this value because it is the largest window that can be advertised without the TCP Window Scale option. However, RFC 5681 does not impose it as a hard limit:
The initial value of ssthresh SHOULD be set arbitrarily high (e.g., to the size of the largest possible advertised window), but ssthresh MUST be reduced in response to congestion. Setting ssthresh as high as possible allows the network conditions, rather than some arbitrary host limit, to dictate the sending rate. In cases where the end systems have a solid understanding of the network path, more carefully setting the initial ssthresh value may have merit (e.g., such that the end host does not create congestion along the path).
In fact, a little research reveals that more modern TCP implementations often set the initial slow-start threshold to "infinite" (in practice a very large value such as 2^31). With an infinite threshold, the congestion window grows exponentially until the first loss occurs, so the sender discovers the available capacity much more quickly.
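The effect of the initial threshold can be illustrated with a coarse per-round-trip model: cwnd doubles each RTT while below ssthresh and grows by one SMSS per RTT above it. The sketch counts the round trips needed to reach a target window. The 1460-byte SMSS, the 3-segment initial window and the 1 MB target are assumptions for illustration, and the model ignores losses and the per-ACK details of RFC 5681.

```python
# Coarse per-RTT model of cwnd growth (no losses, no per-ACK detail).
SMSS = 1460                # assumed segment size in bytes
INITIAL_CWND = 3 * SMSS    # RFC 5681 initial window for this SMSS


def rtts_to_reach(target: int, ssthresh: int,
                  cwnd: int = INITIAL_CWND, smss: int = SMSS) -> int:
    """Count round trips until cwnd (bytes) reaches the target window."""
    rtts = 0
    while cwnd < target:
        if cwnd < ssthresh:
            cwnd *= 2      # slow start: exponential growth
        else:
            cwnd += smss   # congestion avoidance: linear growth
        rtts += 1
    return rtts


TARGET = 1_000_000  # hypothetical target window of about 1 MB
print(rtts_to_reach(TARGET, ssthresh=65535))  # 641 round trips
print(rtts_to_reach(TARGET, ssthresh=2**31))  # 8 round trips
```

With the classic 65535-byte threshold the window needs hundreds of round trips to reach 1 MB, while an "infinite" threshold gets there in a handful.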
What happens if we raise ByteBlower's internal slow-start threshold from 65535 to infinite (actually 2^31)? Here is the resulting graph:
As you can see, the peak throughput is now reached almost instantly!
As mentioned, I ran all of the above with an experimental ByteBlower software version. But there is good news: this feature will be added to the upcoming ByteBlower release 2.1 (expected around the end of April). The ByteBlower GUI will allow you to configure a slow-start threshold above the current limit of 65535, mimicking the exponentially growing congestion window of modern TCP implementations.
I should mention that there will still be situations where a smaller slow-start threshold is useful. For example, when starting a group of TCP flows, you might want each of them to grow its congestion window more slowly to avoid overwhelming the network right away.
If you notice that your TCP traffic takes a long time to reach peak throughput, consider increasing the slow-start threshold.