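This article uses packetdrill to test the TCP Nagle algorithm. The scripts simulate the client side of the connection.
Base script
The base script performs the TCP three-way handshake from the client side. For a detailed explanation of the script, see 《TCP 基础之三次握手续》 (TCP Basics: The Three-Way Handshake, Continued).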
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
TCP Nagle
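What is the TCP Nagle algorithm? In one sentence: at any given moment, a connection may have at most one small packet in flight that has not yet been acknowledged. A small packet here means a segment shorter than the MSS.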
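The one-sentence rule can be written down as a tiny predicate. This is only a minimal sketch of the rule as stated here, not the kernel's implementation; the function name `nagle_allows_send` and its parameters are made up for illustration.

```python
def nagle_allows_send(segment_len: int, mss: int, small_packet_in_flight: bool) -> bool:
    """Sketch of the one-sentence Nagle rule (hypothetical helper, not a kernel API).

    segment_len            -- payload size of the segment we would like to send
    mss                    -- maximum segment size negotiated for the connection
    small_packet_in_flight -- True if a sub-MSS segment is sent but not yet ACKed
    """
    if segment_len >= mss:
        # A full-sized segment is not a "small packet", so the rule does not hold it back.
        return True
    # A small segment may go out only if no other small segment is still unacknowledged.
    return not small_packet_in_flight
```

With an MSS of 1000, as in the scripts of this article, a 100-byte segment is allowed when nothing small is in flight and held back otherwise, while a 1000-byte segment is always allowed.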
Experiments
The base script sets TCP_NODELAY, which in effect turns the Nagle algorithm off. Modify the script to write two 100-byte packets back to back.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
+0.1 write(3, ..., 100) = 100
+0 write(3, ..., 100) = 100
+0 `sleep 100`
#
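Run the script and capture the packets with tcpdump. As the capture below shows, the client sends both 100-byte packets back to back. Because no ACK ever arrives, the first packet is then retransmitted again and again.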
# packetdrill tcp_nagle_001.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:31:45.124870 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [S], seq 1588170278, win 65535, options [mss 1460,sackOK,TS val 3595831910 ecr 0,nop,wscale 8], length 0
22:31:45.134999 tun0 In IP 192.0.2.1.8080>192.168.219.34.55810: Flags [S.], seq 0, ack 1588170279, win 10000, options [mss 1000], length 0
22:31:45.135043 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [.], ack 1, win 65535, length 0
22:31:45.235154 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:31:45.235171 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [P.], seq 101:201, ack 1, win 65535, length 100: HTTP
22:31:45.448577 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:31:45.892570 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:31:46.756575 tun0 Out IP 192.168.219.34.55810>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
#
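The retransmissions of seq 1:101 in the capture above are not evenly spaced. The short sketch below computes the gaps from the four timestamps copied from that capture. The helper name `retransmit_gaps` is made up for illustration, and the timestamps are the only input data.

```python
from datetime import datetime
from typing import List


def retransmit_gaps(timestamps: List[str]) -> List[float]:
    """Return the gaps in seconds between consecutive HH:MM:SS.ffffff timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [round((b - a).total_seconds(), 6) for a, b in zip(times, times[1:])]


# First transmission of seq 1:101 followed by its three retransmissions,
# copied from the tcpdump output above.
SEQ_1_101 = [
    "22:31:45.235154",
    "22:31:45.448577",
    "22:31:45.892570",
    "22:31:46.756575",
]

gaps = retransmit_gaps(SEQ_1_101)
print(gaps)  # [0.213423, 0.443993, 0.864005]
```

Each gap is roughly twice the previous one, which is what one would expect from the exponential backoff of the TCP retransmission timer.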
Modify the script so that the client connection uses the Nagle algorithm, that is, do not set TCP_NODELAY, and again write two 100-byte packets back to back.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
+0.1 write(3, ..., 100) = 100
+0 write(3, ..., 100) = 100
+0 `sleep 100`
#
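Run the script, interrupt it after a while, and capture the packets with tcpdump. The result is shown below.
After the first small 100-byte packet is sent, no ACK arrives for it, so the second small 100-byte packet is never sent on its own. Only when the script is interrupted and the connection is closed is that data finally sent, merged with the FIN (see the last packet).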
# packetdrill tcp_nagle_002.pkt
^Ctcp_nagle_002.pkt:13: error executing `sleep 100` command: got signal 2 (Interrupt)
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:41:46.936863 tun0 Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [S], seq 218153149, win 65535, options [mss 1460,sackOK,TS val 2553113564 ecr 0,nop,wscale 8], length 0
22:41:46.946948 tun0 In IP 192.0.2.1.8080>192.168.58.251.59226: Flags [S.], seq 0, ack 218153150, win 10000, options [mss 1000], length 0
22:41:46.946968 tun0 Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [.], ack 1, win 65535, length 0
22:41:47.047062 tun0 Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:41:47.260567 tun0 Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:41:47.716577 tun0 Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:41:48.580568 tun0 Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:41:49.197787 ? Out IP 192.168.58.251.59226>192.0.2.1.8080: Flags [FP.], seq 101:201, ack 1, win 65535, length 100: HTTP
#
Extended tests
Modify the script so that, with the Nagle algorithm enabled, it writes one 100-byte packet followed by one MSS-sized packet, that is, 1000 bytes.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
+0.1 write(3, ..., 100) = 100
+0 write(3, ..., 1000) = 1000
+0 `sleep 100`
#
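Run the script, interrupt it after a while, and capture the packets with tcpdump. The result is shown below.
After the first small 100-byte packet is sent, no ACK arrives for it, yet the second packet of 1000 bytes (a full MSS) is still sent normally.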
# packetdrill tcp_nagle_003.pkt
^Ctcp_nagle_003.pkt:13: error executing `sleep 100` command: got signal 2 (Interrupt)
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:52:35.696885 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [S], seq 731055498, win 65535, options [mss 1460,sackOK,TS val 2616717475 ecr 0,nop,wscale 8], length 0
22:52:35.706997 tun0 In IP 192.0.2.1.8080>192.168.13.103.51662: Flags [S.], seq 0, ack 731055499, win 10000, options [mss 1000], length 0
22:52:35.707028 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [.], ack 1, win 65535, length 0
22:52:35.807138 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:52:35.807153 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [P.], seq 101:1101, ack 1, win 65535, length 1000: HTTP
22:52:36.020586 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:52:36.452583 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:52:37.316571 tun0 Out IP 192.168.13.103.51662>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
#
Modify the script so that, with the Nagle algorithm enabled, it writes two 100-byte packets back to back and then closes the connection right away.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
+0.1 write(3, ..., 100) = 100
+0 write(3, ..., 100) = 100
+0 close(3) = 0
+0 `sleep 100`
#
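Run the script, interrupt it after a while, and capture the packets with tcpdump. The result is shown below.
After the first small 100-byte packet is sent, no ACK arrives for it, yet the second 100-byte packet can still go out together with the FIN.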
# packetdrill tcp_nagle_004.pkt
^Ctcp_nagle_004.pkt:16: error executing `sleep 100` command: got signal 2 (Interrupt)
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:45:08.816874 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [S], seq 3702960120, win 65535, options [mss 1460,sackOK,TS val 2389888012 ecr 0,nop,wscale 8], length 0
22:45:08.826980 tun0 In IP 192.0.2.1.8080>192.168.202.173.36094: Flags [S.], seq 0, ack 3702960121, win 10000, options [mss 1000], length 0
22:45:08.826999 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [.], ack 1, win 65535, length 0
22:45:08.927074 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:45:08.927092 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [FP.], seq 101:201, ack 1, win 65535, length 100: HTTP
22:45:09.140559 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:45:09.572574 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
22:45:10.440580 tun0 Out IP 192.168.202.173.36094>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
#
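3. If TCP_CORK is not set and TCP_NODELAY is set, sending is allowed.
This case was already covered by the first experiment, which kept the base script's TCP_NODELAY setting: the client sent both 100-byte packets back to back.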
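In application code, the counterpart of the script line `setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4)` might look like the sketch below. It uses Python's standard `socket` module; the helper names `open_tcp_socket` and `nagle_disabled` are made up for illustration, and no connection is opened.

```python
import socket


def open_tcp_socket(disable_nagle: bool) -> socket.socket:
    """Create an unconnected TCP socket, optionally with TCP_NODELAY set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    if disable_nagle:
        # Setting TCP_NODELAY turns the Nagle algorithm off for this socket.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Read the option back; a non-zero value means TCP_NODELAY is set."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Leaving the option untouched, as the later scripts in this article do, keeps the default behavior in which Nagle is active.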
Modify the script so that, with the Nagle algorithm enabled, it writes two 100-byte packets back to back and, a short time later, receives an ACK for the first packet.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
+0.1 write(3, ..., 100) = 100
+0 write(3, ..., 100) = 100
+0.01 < . 1:1(0) ack 101 win 10000
+0 `sleep 100`
#
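Run the script, interrupt it after a while, and capture the packets with tcpdump. The result is shown below.
After the first small 100-byte packet is sent, the second 100-byte packet is held back. As soon as the ACK for the first packet arrives, the second packet is sent.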
# packetdrill tcp_nagle_005.pkt
^Ctcp_nagle_005.pkt:16: error executing `sleep 100` command: got signal 2 (Interrupt)
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:38:37.444865 tun0 Out IP 192.168.65.54.37824>192.0.2.1.8080: Flags [S], seq 932173338, win 65535, options [mss 1460,sackOK,TS val 114121758 ecr 0,nop,wscale 8], length 0
21:38:37.454949 tun0 In IP 192.0.2.1.8080>192.168.65.54.37824: Flags [S.], seq 0, ack 932173339, win 10000, options [mss 1000], length 0
21:38:37.454973 tun0 Out IP 192.168.65.54.37824>192.0.2.1.8080: Flags [.], ack 1, win 65535, length 0
21:38:37.555095 tun0 Out IP 192.168.65.54.37824>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
21:38:37.565177 tun0 In IP 192.0.2.1.8080>192.168.65.54.37824: Flags [.], ack 101, win 10000, length 0
21:38:37.565198 tun0 Out IP 192.168.65.54.37824>192.0.2.1.8080: Flags [P.], seq 101:201, ack 1, win 65535, length 100: HTTP
21:38:37.780604 tun0 Out IP 192.168.65.54.37824>192.0.2.1.8080: Flags [P.], seq 101:201, ack 1, win 65535, length 100: HTTP
21:38:38.212618 tun0 Out IP 192.168.65.54.37824>192.0.2.1.8080: Flags [P.], seq 101:201, ack 1, win 65535, length 100: HTTP
#
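Modify the script once more so that, with the Nagle algorithm enabled, it writes three 100-byte packets back to back and, a short time later, receives an ACK for the first packet.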
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0.01 < S. 0:0(0) ack 1 win 10000 <mss 1000>
+0 > . 1:1(0) ack 1
+0.1 write(3, ..., 100) = 100
+0 write(3, ..., 100) = 100
+0 write(3, ..., 100) = 100
+0.01 < . 1:1(0) ack 101 win 10000
+0 `sleep 100`
#
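Run the script, interrupt it after a while, and capture the packets with tcpdump. The result is shown below.
After the first small 100-byte packet is sent, neither the second nor the third 100-byte packet can go out. Once the ACK for the first packet arrives, the data of the second and third writes is merged and sent as a single 200-byte segment (seq 101:301).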
# packetdrill tcp_nagle_006.pkt
^Ctcp_nagle_006.pkt:17: error executing `sleep 100` command: got signal 2 (Interrupt)
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:41:15.336857 tun0 Out IP 192.168.144.154.55528>192.0.2.1.8080: Flags [S], seq 2947873166, win 65535, options [mss 1460,sackOK,TS val 2974714415 ecr 0,nop,wscale 8], length 0
21:41:15.346940 tun0 In IP 192.0.2.1.8080>192.168.144.154.55528: Flags [S.], seq 0, ack 2947873167, win 10000, options [mss 1000], length 0
21:41:15.346962 tun0 Out IP 192.168.144.154.55528>192.0.2.1.8080: Flags [.], ack 1, win 65535, length 0
21:41:15.447061 tun0 Out IP 192.168.144.154.55528>192.0.2.1.8080: Flags [P.], seq 1:101, ack 1, win 65535, length 100: HTTP
21:41:15.457109 tun0 In IP 192.0.2.1.8080>192.168.144.154.55528: Flags [.], ack 101, win 10000, length 0
21:41:15.457134 tun0 Out IP 192.168.144.154.55528>192.0.2.1.8080: Flags [P.], seq 101:301, ack 1, win 65535, length 200: HTTP
21:41:15.672561 tun0 Out IP 192.168.144.154.55528>192.0.2.1.8080: Flags [P.], seq 101:301, ack 1, win 65535, length 200: HTTP
21:41:16.100599 tun0 Out IP 192.168.144.154.55528>192.0.2.1.8080: Flags [P.], seq 101:301, ack 1, win 65535, length 200: HTTP
#
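The behavior seen across these captures can be reproduced with a toy sender model. The sketch below is a deliberately simplified illustration, not the kernel's logic: the class `NagleSender` and all of its fields are invented for this example, retransmissions are ignored, and the sequence number consumed by the FIN itself is not tracked.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Segment = Tuple[int, int, bool]  # (first seq, end seq exclusive, FIN flag)


@dataclass
class NagleSender:
    mss: int = 1000
    nodelay: bool = False
    snd_una: int = 1      # oldest unacknowledged sequence number
    snd_nxt: int = 1      # next sequence number to send
    queued: int = 0       # bytes written by the application but not yet sent
    small_end: int = 0    # end seq of the last small segment sent (0 = none yet)
    closing: bool = False
    fin_sent: bool = False
    sent: List[Segment] = field(default_factory=list)

    def _small_in_flight(self) -> bool:
        # A small segment is still unacknowledged if its end lies beyond snd_una.
        return self.small_end > self.snd_una

    def _push(self) -> None:
        while self.queued > 0 or (self.closing and not self.fin_sent):
            length = min(self.queued, self.mss)
            fin = self.closing and length == self.queued
            small = length < self.mss
            if small and not fin and not self.nodelay and self._small_in_flight():
                break  # Nagle: hold the small segment until the ACK arrives
            start = self.snd_nxt
            self.snd_nxt += length
            self.queued -= length
            if small and length > 0:
                self.small_end = self.snd_nxt
            if fin:
                self.fin_sent = True
            self.sent.append((start, self.snd_nxt, fin))

    def write(self, nbytes: int) -> None:
        self.queued += nbytes
        self._push()

    def ack(self, ack_seq: int) -> None:
        self.snd_una = max(self.snd_una, ack_seq)
        self._push()

    def close(self) -> None:
        self.closing = True
        self._push()
```

Calling `write(100)` three times on a fresh `NagleSender()` leaves only `(1, 101, False)` in `sent`; a following `ack(101)` appends `(101, 301, False)`, which mirrors the merged 200-byte segment in the last capture. Constructing the sender with `nodelay=True` lets every write go out immediately, as in the first experiment.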
Originally published on the WeChat official account Echo Reply: Wireshark & Packetdrill | TCP Nagle Algorithm