Sender side congestion avoidance detection

with round-trip times (RTT)

TCPState: Why do I care?

Buffer Bloat

Sub-optimal Speeds

Unusable VOIP

Gamers Lag-out

TCPState: What does it do?

Passively monitors connections

Detects Congestion Control state

     Slow start & Congestion Avoidance

Monitors Round Trip Time (RTT)

TCPState: How does it work?

Reads a "tcpdump" stream...

6:33:38.070777 IP > Flags [S], seq 2752874908, win 29200, options [mss 1452,sackOK,TS val 124077882 ecr 0,nop,wscale 7], length 0

6:33:38.070815 IP > Flags [S.], seq 2463797398, ack 2752874909, win 14480, options [mss 1460,sackOK,TS val 3216011654 ecr 124077882,nop,wscale 7], length 0

6:33:38.126658 IP > Flags [.], ack 1, win 229, options [nop,nop,TS val 124077938 ecr 3216011654], length 0

This is a 3-way handshake

TCPState: What was all that?

Basic TCP connection information:

6:33:38.126658 IP > Flags [.], ack 1, win 229, options [nop,nop,TS val 124077938 ecr 3216011654], length 0

Fields, in order: Time, Source Address, Destination Address, Flags, Ack sent, Window Size, TCP Options (TS val & ecr), Length

TCPState: What TCPDUMP information is used?


     Source/Destination address


     TCP Options

          TS val (Transmitting host time stamp)

          ecr (Last seen time stamp from the remote host)
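A minimal Java sketch of pulling those fields out of one tcpdump text line. This is an illustration, not TCPState's actual parser: the class name `TcpdumpLineSketch` and its methods are hypothetical, and it only handles the text layout shown on the earlier slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not TCPState's code): extracts fields from one tcpdump text line. */
final class TcpdumpLineSketch {
    private static final Pattern TIME =
            Pattern.compile("^(\\d{1,2}):(\\d{2}):(\\d{2})\\.(\\d{6})");
    private static final Pattern FLAGS = Pattern.compile("Flags \\[([^\\]]*)\\]");
    private static final Pattern TIMESTAMPS = Pattern.compile("TS val (\\d+) ecr (\\d+)");

    /** Capture time in microseconds since midnight, or -1 if the line has no time stamp. */
    static long timeMicros(String line) {
        Matcher m = TIME.matcher(line);
        if (!m.find()) return -1;
        long seconds = (Long.parseLong(m.group(1)) * 60 + Long.parseLong(m.group(2))) * 60
                + Long.parseLong(m.group(3));
        return seconds * 1_000_000L + Long.parseLong(m.group(4));
    }

    /** Flag characters between the brackets, e.g. "S", "S." or ".", or null if absent. */
    static String flags(String line) {
        Matcher m = FLAGS.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** TS val and ecr are unsigned 32-bit values, so they are held in a long. -1 if absent. */
    static long tsVal(String line) { return timestampField(line, 1); }

    static long ecr(String line) { return timestampField(line, 2); }

    private static long timestampField(String line, int group) {
        Matcher m = TIMESTAMPS.matcher(line);
        return m.find() ? Long.parseLong(m.group(group)) : -1;
    }

    public static void main(String[] args) {
        String ack = "6:33:38.126658 IP > Flags [.], ack 1, win 229, "
                + "options [nop,nop,TS val 124077938 ecr 3216011654], length 0";
        System.out.println(timeMicros(ack) + " [" + flags(ack) + "] val=" + tsVal(ack)
                + " ecr=" + ecr(ack));
    }
}
```

Each field has its own pattern, so a line without the time stamp option still yields time and flags.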

TCPState: How to detect state?

1. After 3-way handshake, Slow Start is assumed (not always the case)

2. While in Slow Start, Transition to Congestion Avoidance 

     a. If the count of packets (TCPDump entries) in the current RTT is no greater than the count in the previous RTT

     b. senderPacketsPerRtt <= lastSenderPacketsPerRtt

3. While in Congestion Avoidance, Transition to Slow Start

     a. If the number of sequences sent in the current RTT is greater than the previous RTT's sequence count squared

     b. sequenceCountPerRtt > lastSequenceCountPerRtt * lastSequenceCountPerRtt
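The three rules above can be sketched as a small state machine. This is an illustration under assumptions, not TCPState's source: the class `StateDetectorSketch` is hypothetical, and it assumes the caller has already counted packets and sequences for each RTT-long interval.

```java
/** Hypothetical sketch of the transition rules; not TCPState's actual implementation. */
final class StateDetectorSketch {
    enum State { HANDSHAKE, SLOW_START, CONGESTION_AVOIDANCE }

    private State state = State.HANDSHAKE;
    private long lastSenderPacketsPerRtt = -1;  // -1 means no previous RTT interval yet
    private long lastSequenceCountPerRtt = -1;

    /** Rule 1: after the 3-way handshake, Slow Start is assumed. */
    void handshakeCompleted() {
        state = State.SLOW_START;
    }

    /** Feed the counts for one finished RTT interval; returns the state afterwards. */
    State onRttInterval(long senderPacketsPerRtt, long sequenceCountPerRtt) {
        boolean havePrevious = lastSenderPacketsPerRtt >= 0;
        if (havePrevious) {
            if (state == State.SLOW_START
                    && senderPacketsPerRtt <= lastSenderPacketsPerRtt) {
                // Rule 2: the packet count stopped growing.
                state = State.CONGESTION_AVOIDANCE;
            } else if (state == State.CONGESTION_AVOIDANCE
                    && sequenceCountPerRtt > lastSequenceCountPerRtt * lastSequenceCountPerRtt) {
                // Rule 3: the sequence count exceeded the previous count squared.
                state = State.SLOW_START;
            }
        }
        lastSenderPacketsPerRtt = senderPacketsPerRtt;
        lastSequenceCountPerRtt = sequenceCountPerRtt;
        return state;
    }

    public static void main(String[] args) {
        StateDetectorSketch detector = new StateDetectorSketch();
        detector.handshakeCompleted();
        long[] counts = {2, 4, 8, 8, 9};
        for (long count : counts) {
            System.out.println(count + " -> " + detector.onRttInterval(count, count));
        }
    }
}
```

The first interval after the handshake has nothing to compare against, so the sketch only records its counts.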

TCPState: How is RTT calculated?

Two RTT measurements are used: Initial and On-going

Initial: The difference in the time between the SYN-ACK and ACK (3-way handshake)

On-going (Requires TCP Time stamp option to be enabled):

     1. Receiver transmits an ACK carrying its local time (TCP Option: TS val)

     2. Sender receives the packet and echoes the time stamp back (TCP Option: ecr)

     3. TCPState subtracts the time of event #1 from the time of event #2

Note: An RTT update is shown only when the RTT changes by 50% or more
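A rough Java sketch of both measurements and the reporting filter. The class `RttSketch` and its method names are hypothetical. Two points are assumptions and not confirmed by the slides: only the first sighting of a TS val and its first echo are used, and the 50% threshold is compared against the last reported RTT.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch of the two RTT measurements; not TCPState's actual code. */
final class RttSketch {
    // Receiver TS val -> capture time (microseconds) when it was first seen.
    private final Map<Long, Long> receiverTsValSeenAt = new HashMap<>();
    private long lastReportedRtt = -1;  // -1 means nothing reported yet

    /** Initial RTT: time between the SYN-ACK and the ACK of the 3-way handshake. */
    static long handshakeRttMicros(long synAckTimeMicros, long ackTimeMicros) {
        return ackTimeMicros - synAckTimeMicros;
    }

    /** Event #1: the receiver sends a packet carrying its TS val. */
    void onReceiverPacket(long captureTimeMicros, long tsVal) {
        receiverTsValSeenAt.putIfAbsent(tsVal, captureTimeMicros);
    }

    /** Event #2: the sender echoes that value in ecr; returns the RTT if it matches. */
    OptionalLong onSenderPacket(long captureTimeMicros, long ecr) {
        Long seenAt = receiverTsValSeenAt.remove(ecr);
        return seenAt == null
                ? OptionalLong.empty()
                : OptionalLong.of(captureTimeMicros - seenAt);
    }

    /** Assumed filter: report when the change is at least 50% of the last reported RTT. */
    boolean shouldReport(long rttMicros) {
        boolean report = lastReportedRtt < 0
                || Math.abs(rttMicros - lastReportedRtt) * 2 >= lastReportedRtt;
        if (report) {
            lastReportedRtt = rttMicros;
        }
        return report;
    }

    public static void main(String[] args) {
        // SYN-ACK at 6:33:38.070815 and ACK at 6:33:38.126658, as microseconds since midnight.
        System.out.println(handshakeRttMicros(23_618_070_815L, 23_618_126_658L));
    }
}
```

Applied to the handshake shown earlier, the initial RTT is 126658 - 70815 = 55,843 microseconds.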

TCPState: What do I get?

00:50:11.450030 : Handshake [Initial SYN]

00:50:11.693338 : Slow Start [Completion ACK]

00:50:11.693338 : Handshake RTT: 243,263 microseconds

00:50:11.757824 : Estimated RTT: 57,304 microseconds

00:50:12.045537 : Congestion Avoidance [Window Stabilized]

00:50:12.136465 : Estimated RTT: 89,987 microseconds

00:50:12.750778 : Estimated RTT: 459,172 microseconds

00:50:12.807348 : Estimated RTT: 56,516 microseconds

00:50:13.166491 : Estimated RTT: 122,550 microseconds

00:50:13.206913 : Estimated RTT: 61,031 microseconds

00:50:13.286681 : Estimated RTT: 127,131 microseconds

00:50:13.644705 : Estimated RTT: 58,571 microseconds

00:50:18.045444 : Estimated RTT: 90,860 microseconds

00:50:42.469547 : Closing [Sender FIN]

00:50:42.474204 : Closed [Receiver FIN]

TCPState: Prove it!

Sender-side JProbe(s) Log:




          -Name depends on Congestion Control in use

TCPState: What does JProbe get us?

Apr 16 14:50:11 advip kernel: [57901.315806] bictcp_cong_avoid

Apr 16 14:50:11 advip kernel: [57901.315971] tcp_slow_start

We get the time and control function used

TCPState: How does that prove anything?

Synchronize clocks and merge TCPDump & JProbe

19:49:40.657956 : Handshake [Initial SYN]

19:49:40.718236 : Slow Start [Completion ACK]

19:49:40.718236 : Handshake RTT: 60,231 microseconds

19:49:40.781933--------tcp_slow_start (09:49:40.398704[08482])

19:49:40.781933--------tcp_slow_start (09:49:40.398705[84034])

19:49:40.781933--------bictcp_cong_avoid (09:49:40.398705[84208])

19:49:40.963454 : Congestion Avoidance [Window Stabilized]

19:49:41.298996 : Estimated RTT: 90,428 microseconds

19:50:11.093473 : Closing [Sender FIN]

19:50:11.098121 : Closed [Receiver FIN]

19:50:11.154345--------tcp_cong_avoid_ai (09:50:11.399007[76566])

TCPState: Sprinkle in some static analysis...


Match(Slow Start): 0 microseconds after sender state change [Completion ACK]

Match(Congestion Avoidance): 245,217 microseconds after sender state change [Window Stabilized]

Total microseconds in wrong state: 245,217 (Permitted: 1,524,819 microseconds of 30,496,389 total connection time)

Accuracy: 99.196%

Note: This test permitted no more than 5% error
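The accuracy figure above can be reproduced with simple arithmetic. The following Java sketch is an illustration only; the class `AccuracySketch` is hypothetical, and rounding the permitted time down is an assumption.

```java
/** Hypothetical sketch of the grading arithmetic; not TCPState's actual code. */
final class AccuracySketch {
    /** Time the prediction may be wrong, e.g. allowedErrorFraction = 0.05 for 5%. */
    static long permittedMicros(long totalMicros, double allowedErrorFraction) {
        return (long) Math.floor(totalMicros * allowedErrorFraction);
    }

    /** Share of the connection time spent in the correct state, as a percentage. */
    static double accuracyPercent(long wrongStateMicros, long totalMicros) {
        return 100.0 * (totalMicros - wrongStateMicros) / totalMicros;
    }

    static boolean passes(long wrongStateMicros, long totalMicros, double allowedErrorFraction) {
        return wrongStateMicros <= permittedMicros(totalMicros, allowedErrorFraction);
    }

    public static void main(String[] args) {
        long wrong = 245_217L;      // microseconds in the wrong state (from the slide)
        long total = 30_496_389L;   // total connection time in microseconds (from the slide)
        System.out.println("Permitted: " + permittedMicros(total, 0.05));
        System.out.println("Accuracy: " + accuracyPercent(wrong, total));
        System.out.println("Pass: " + passes(wrong, total, 0.05));
    }
}
```

With the slide's numbers this gives 1,524,819 permitted microseconds and an accuracy that rounds to 99.196%.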

TCPState: Did someone say testing?


     VMware running a Linux 4.4 kernel

     DSL with 50 Mbit/s downstream and 20 Mbit/s upstream


     OpenVZ VM running Linux 2.6.x kernel

     Bandwidth unknown (>> 20 Mbit/s)

TCPState: Congestion Controls




HighSpeed TCP










TCPState: Test Structure

Each Congestion Control module was used for:

1. A 30 second iperf run

2. A 60 second iperf run

3. A 600 second iperf run

TCPState: Test Measurements

TCPState was graded on time out-of-state

     The amount of time its predicted state did not match the actual state of the sender

For example:


On a 100 second test, TCPState was allowed a 5% margin of error. 

     To pass, TCPState must be accurate for no less than 95 seconds of the complete connection

Scores were calculated as a percentage of the total time: 96 / 100 = 96%

TCPState: Scores

Congestion Control    30 Seconds    60 Seconds    90 Seconds

BIC                   99.263%       99.809%       99.972%
CUBIC                 99.43%        99.715%       99.971%
DCTCP                 99.454%       99.618%       99.971%
HighSpeed TCP         99.24%        99.413%       99.971%
HTCP                  99.25%        99.517%       99.971%
Hybla                 99.084%       99.713%       99.921%
Illinois              96.492%       98.887%       99.961%
Reno                  99.436%       99.62%        99.962%
Scalable              98.855%       99.626%       99.951%
Vegas                 99.427%       99.71%        99.951%
Veno                  99.43%        99.622%       99.952%
Westwood              99.251%       99.705%       99.971%
Yeah                  99.254%       99.615%       99.961%

TCPState: Perfect?!

Not quite....

     More robust testing required

          Different networks

          More congestion (real or simulated)

          Compound TCP (Windows)




TCPState: Application

Written in Java

     Easier for me to model and explore

     Approx. 926 lines of code

          Most of this is cruft for testing and validation

JProbes in C

TCPState: Source Code

TCPState: Results Data


Thank you!


by idahoincolorado


Public - 4/30/16, 8:27 PM