December 7, 2011

Proportional Rate Reduction for TCP

Community

This year marks the 25th anniversary of an event that was once known to all computing enthusiasts, but nowadays is just a dim memory: the NSFnet "congestion collapse". The collapse was not actually a single incident, but rather a period during which the network was so saturated with retransmissions that almost no actual content was being successfully transferred across the net.

Reflecting on the event many years later, Professor Dave Mills, who wrote much of the software in the early routers of the Internet, recalled:

The NSFnet routers ran my code, which was horribly overrun by supercomputer traffic. I found the best way to deal with the problem was to find the supercomputer elephants and shoot them. [ ... ]

The NSFnet meltdown occurred primarily because the fuzzball routers used smart interfaces that retransmitted when either an error occurred or the receiver ran dry of buffers. The entire network locked up for a time because all the buffers in all six machines filled up with retransmit traffic and nothing could get in or out.

If you're wondering what a "fuzzball router" is, you can read more about it here; essentially, it was a package of software that turned a DEC PDP-11 into a node on the Internet.

Even in 1986, the issue of network congestion was known to be complex and important; several years earlier, John Nagle had written, in RFC 896, that:

Congestion control is a recognized problem in complex networks. We have discovered that the Department of Defense's Internet Protocol (IP), a pure datagram protocol, and Transmission Control Protocol (TCP), a transport layer protocol, when used together, are subject to unusual congestion problems caused by interactions between the transport and datagram layers. In particular, IP gateways are vulnerable to a phenomenon we call "congestion collapse", especially when such gateways connect networks of widely different bandwidth.

But though Nagle had clearly identified the problem and proposed several possible approaches, it was the NSFnet collapse that brought widespread attention to it and stimulated a large and ongoing effort to implement a solution.

Congestion control is a surprisingly hard problem, but it is easy to state:

A program should use the network as efficiently as possible, so long as doing so does not unfairly impact other users.

From the point of view of the program, what this means is:

Send data as fast as possible, unless and until you get an indication that there is a problem on the network; then slow down. Don't forget to speed back up again once the problem goes away!
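To make that intuition a little more concrete, here is a toy sketch (mine, not any particular TCP implementation) of the classic additive-increase/multiplicative-decrease rule that grew out of the post-collapse work: creep the sending rate up a little for every acknowledgment that arrives, and cut it sharply whenever a loss suggests the network is congested.

/* Toy sketch of additive-increase / multiplicative-decrease (AIMD).
 * Purely illustrative -- a real TCP stack tracks far more state than this. */

#define MSS 1460u  /* maximum segment size in bytes (a typical Ethernet value) */

struct toy_cc {
    unsigned int cwnd;  /* congestion window: how many bytes may be in flight */
};

/* Each acknowledgment that arrives while things are going well lets us
 * speed up, but only gradually (roughly one extra segment per round trip). */
static void on_ack(struct toy_cc *cc)
{
    cc->cwnd += (MSS * MSS) / cc->cwnd;
}

/* A loss indication (duplicate ACKs, a timeout) is the network telling us
 * to slow down, so cut the window in half -- but never below two segments. */
static void on_loss(struct toy_cc *cc)
{
    cc->cwnd /= 2;
    if (cc->cwnd < 2 * MSS)
        cc->cwnd = 2 * MSS;
}

Because the increases are gentle and the decreases are drastic, competing connections tend to settle toward a fair share of the bottleneck rather than piling retransmissions on top of each other.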

In practice, however, this has been exquisitely hard to implement. Over the last quarter century, few algorithms have been attacked as persistently as this one; after all, if you get this right, all you do is make the Internet work! An indication of just how hard the problem is comes from a recent review of a set of tools for evaluating congestion control algorithms:

There are 13 possible congestion control algorithms available in the current 2.6.22 kernel: Reno, Vegas, BIC, CUBIC, Westwood, HTCP, High speed, Hybla, Scalable, Yeah, VENO, Illinois, and Low Priority. Some algorithms are loss-based and focus on measuring packet loss as a way of detecting congestion (BIC, CUBIC, H-TCP, High speed, Scalable); others measure changes in delay to avoid congestion (Vegas) and the newer algorithms attempt to use both loss and delay variance to handle congestion (Yeah, Illinois, VENO).

Many of these algorithms are quite sophisticated, and we casual users of the modern Internet can relax, secure in the knowledge that, in general, congestion control works, and data is in fact transferred.
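If you're curious which of these algorithms your own Linux machine is using, the pluggable framework makes it easy to check: the system-wide default lives in /proc/sys/net/ipv4/tcp_congestion_control, and individual sockets can read or override it with the TCP_CONGESTION socket option, which has been available since the pluggable framework appeared around 2.6.13. Here's a small example with error handling kept to a minimum; the option and the algorithm names are real, but the choice of "vegas" is just for illustration.

/* Query (and optionally change) the congestion control algorithm for one socket.
 * Requires a reasonably recent Linux kernel (the pluggable framework is 2.6.13+). */

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* Ask the kernel which algorithm this socket would use by default. */
    char name[16];
    socklen_t len = sizeof(name);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &len) == 0)
        printf("default congestion control: %.*s\n", (int)len, name);

    /* Request a different algorithm for this socket only; this fails unless
     * the corresponding module (here, tcp_vegas) is available on the system. */
    const char *wanted = "vegas";
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, wanted, strlen(wanted)) != 0)
        perror("setsockopt(TCP_CONGESTION)");

    close(fd);
    return 0;
}

On most modern distributions the answer to the first query will be "cubic", which has been the Linux default since 2.6.19.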

But a problem this hard is worth studying further, and so it was with considerable interest that I read this recent work by a team at Google: Proportional Rate Reduction for TCP. Here's the abstract:

Packet losses increase latency for Web users. Fast recovery is a key mechanism for TCP to recover from packet losses. In this paper, we explore some of the weaknesses of the standard algorithm described in RFC 3517 and the non-standard algorithms implemented in Linux. We find that these algorithms deviate from their intended behavior in the real world due to the combined effect of short flows, application stalls, burst losses, acknowledgment (ACK) loss and reordering, and stretch ACKs. Linux suffers from excessive congestion window reductions while RFC 3517 transmits large bursts under high losses, both of which harm the rest of the flow and increase Web latency. Our primary contribution is a new design to control transmission in fast recovery called proportional rate reduction (PRR). PRR recovers from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs. In addition to PRR, we evaluate the TCP early retransmit (ER) algorithm which lowers the duplicate acknowledgment threshold for short transfers, and show that delaying early retransmissions for a short interval is effective in avoiding spurious retransmissions in the presence of a small degree of reordering. PRR and ER reduce the TCP latency of connections experiencing losses by 3-10% depending on the response size. Based on our instrumentation on Google Web and YouTube servers in U.S. and India, we also present key statistics on the nature of TCP retransmissions.
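The heart of the proposal, as I read it, is to spread the congestion window reduction out across the recovery episode: each arriving ACK earns the sender a small allowance to transmit, proportional to the fraction of the window the congestion control algorithm wants to keep, so the amount of data in flight glides down to the target instead of being slashed all at once (or, at the other extreme, being blasted back out in a burst). Here is a very rough sketch of that pacing rule; the names loosely follow the paper's pseudocode, but this is my paraphrase, not the actual Linux patch.

/* Very rough sketch of the Proportional Rate Reduction pacing rule,
 * paraphrased from the paper; this is not the real Linux implementation. */

struct prr_state {
    long ssthresh;       /* window the congestion control algorithm wants after
                            recovery (e.g. half the old cwnd for Reno) */
    long recover_fs;     /* FlightSize at the moment recovery began */
    long prr_delivered;  /* total bytes reported delivered since recovery began */
    long prr_out;        /* total bytes we have sent since recovery began */
    long mss;            /* maximum segment size */
};

/* Integer ceiling of a/b, for positive a and b. */
static long div_ceil(long a, long b) { return (a + b - 1) / b; }
static long max_l(long a, long b) { return a > b ? a : b; }
static long min_l(long a, long b) { return a < b ? a : b; }

/* Called for each ACK that arrives during fast recovery.
 * 'delivered' is the data this ACK newly reports as delivered (cumulative + SACKed),
 * 'pipe' is the current estimate of data still in flight.
 * Returns how many bytes we may (re)transmit right now. */
static long prr_on_ack(struct prr_state *s, long delivered, long pipe)
{
    long sndcnt;

    s->prr_delivered += delivered;

    if (pipe > s->ssthresh) {
        /* Proportional phase: pace out roughly ssthresh/recover_fs bytes per
         * byte delivered, so the data in flight glides down toward ssthresh
         * over the whole recovery instead of being cut all at once. */
        sndcnt = div_ceil(s->prr_delivered * s->ssthresh, s->recover_fs) - s->prr_out;
    } else {
        /* Once the data in flight has fallen to the target, grow back toward
         * ssthresh, but no faster than slow start would. */
        long limit = max_l(s->prr_delivered - s->prr_out, delivered) + s->mss;
        sndcnt = min_l(s->ssthresh - pipe, limit);
    }

    if (sndcnt < 0)
        sndcnt = 0;
    s->prr_out += sndcnt;  /* assume the caller actually sends this much */
    return sndcnt;
}

The appeal is that the sender gets to transmit something on nearly every ACK, avoiding both the excessive window reductions that the abstract attributes to Linux's existing recovery code and the large bursts it attributes to RFC 3517, while still ending recovery with the window the congestion control algorithm asked for.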

With a 25-year-old problem, advances come slowly. The Google team proposes an incremental refinement of the current algorithms, but by making careful adjustments in critical areas the authors have identified several changes that they believe can make a substantial difference.
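The early retransmit piece of the work is easier to picture. Ordinarily TCP waits for three duplicate ACKs before fast-retransmitting, but a very short transfer may never generate three, leaving it stuck waiting for a retransmission timeout; early retransmit lowers the threshold when only a few segments are outstanding, and the paper's refinement is to wait a short interval before acting so that mild reordering doesn't trigger spurious retransmissions. A toy version of the threshold calculation (the function name is invented, and the delay mechanism is not shown):

/* Toy version of the early-retransmit threshold: with only a handful of
 * segments in flight, three duplicate ACKs may never arrive, so lower the bar.
 * Illustrative only. */

static int dupack_threshold(int segs_outstanding)
{
    if (segs_outstanding >= 4)
        return 3;                    /* the classic fast-retransmit threshold */
    if (segs_outstanding >= 2)
        return segs_outstanding - 1; /* early retransmit for short transfers */
    return 3;                        /* a lone segment can't generate duplicate
                                        ACKs at all; only the timer helps */
}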

One problem with trying to improve the congestion control algorithms is that, at this point, there are several billion devices interoperating on the Internet, and that's a lot of code to upgrade! The Google team is optimistic that the new algorithm will improve overall network behavior, but first it has to be deployed. The latest word is that "this code is being deployed more widely on Google's servers; it also has been accepted for merging during the 3.2 development cycle."

In the field of computer science, some problems are simply solved, while others have proved stubbornly resistant. Congestion control is one of those tough problems; we keep chewing away at it, making progress slowly and carefully. It will be fascinating to follow the deployment of the new congestion control algorithms to see what the results are.