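Think of everything in the simplest terms; carry it out with the greatest care.
Purpose
Using a packetdrill script for the TCP three-way handshake, find out where the value of the Win field comes from.
Base Script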
# cat tcp_3hs_000.pkt
// TCP basics: the three-way handshake
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
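Test 1
Because < marks a packet that the script constructs and injects, the Win value of the SYN is easy to understand: it is whatever the script specifies.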
+0 < S 0:0(0) win 10000 <mss 1460>
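// +0: time offset of this line relative to the previous line.
// <: an inbound packet, i.e. one that packetdrill injects.
// S: a SYN packet.
// 0:0(0): start sequence number:end sequence number (payload length).
// win 10000: a receive window of 10000 bytes.
// <>: TCP options; mss 1460 sets the MSS to 1460 bytes.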
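The field layout described above can be made concrete with a small parser. This is only a sketch: parse_packet_line and its regular expression are hypothetical helpers written for this article, not part of packetdrill, and they cover only the packet forms used in these scripts.

```python
import re

# Hypothetical helper (not part of packetdrill): split one packet line of a
# .pkt script into the fields described in the comments above.
PACKET_RE = re.compile(
    r"^(?P<time>\+?[\d.]+)\s+"            # time offset, e.g. +0 or +0.01
    r"(?P<direction>[<>])\s+"             # < inbound (injected), > outbound (expected)
    r"(?P<flags>[A-Z.]+)\s+"              # S = SYN, . = ACK, S. = SYN/ACK
    r"(?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\)"  # start:end(payload length)
    r"(?:\s+ack\s+(?P<ack>\d+))?"         # optional acknowledgment number
    r"(?:\s+win\s+(?P<win>\d+))?"         # optional receive window
    r"(?:\s+<(?P<options>[^>]*)>)?\s*$"   # optional TCP options
)


def parse_packet_line(line):
    """Return the fields of a packet line as a dict of strings (or None)."""
    match = PACKET_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a packet line: {line!r}")
    return match.groupdict()


fields = parse_packet_line("+0 < S 0:0(0) win 10000 <mss 1460>")
print(fields["direction"], fields["flags"], fields["win"], fields["options"])
```

For the outbound SYN/ACK line of the script, the win group comes back as None, which matches the fact that the script does not assert a window value there.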
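The first four fields leave little room for variation. Can the win field simply be omitted? Removing it from the script and running it produces a semantic error: inbound packets must specify a window.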
# cat tcp_3hs_win_000.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
# packetdrill tcp_3hs_win_000.pkt
tcp_3hs_win_000.pkt:6: semantic error: window must be specified for inbound packets
#
The window field is 16 bits wide, so valid values lie in [0, 65535]. Trying 65536 produces an out-of-range error.
# cat tcp_3hs_win_001.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 65536 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
# packetdrill tcp_3hs_win_001.pkt
tcp_3hs_win_001.pkt:6: semantic error: TCP window value out of range
#
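The out-of-range error follows from the Win field being a 16-bit field in the TCP header. A minimal sketch of that constraint is shown here; pack_window is a hypothetical name, and the error text is merely copied from the packetdrill message above, not taken from its source.

```python
import struct

U16_MAX = 0xFFFF  # largest value a 16-bit header field can carry


def pack_window(win):
    """Encode a window value as the 16-bit big-endian TCP header field."""
    if not 0 <= win <= U16_MAX:
        raise ValueError("TCP window value out of range")
    return struct.pack("!H", win)


print(pack_window(10000).hex())  # 2710
print(pack_window(65535).hex())  # ffff
try:
    pack_window(65536)
except ValueError as err:
    print(err)
```

A value of 0 also fits in the field; it is the zero window that a receiver advertises when it has no buffer space left.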
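Test 2
The > packet, i.e. the SYN/ACK, is generated automatically by the protocol stack, so how is its Win value determined?
Start again from the base script and check the actual TCP Win value in a packet capture. The script was run three times in a row, and tcpdump shows win 64240 in the SYN/ACK every time.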
# packetdrill tcp_3hs_000.pkt
# packetdrill tcp_3hs_000.pkt
# packetdrill tcp_3hs_000.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:38:31.029045 ? In IP 192.0.2.1.55593 > 192.168.250.140.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
22:38:31.029067 ? Out IP 192.168.250.140.8080 > 192.0.2.1.55593: Flags [S.], seq 781521628, ack 1, win 64240, options [mss 1460], length 0
22:38:31.039193 ? In IP 192.0.2.1.55593 > 192.168.250.140.8080: Flags [.], ack 1, win 10000, length 0
22:38:31.039277 ? Out IP 192.168.250.140.8080 > 192.0.2.1.55593: Flags [F.], seq 1, ack 1, win 64240, length 0
22:38:31.039288 ? In IP 192.0.2.1.55593 > 192.168.250.140.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
22:38:43.497042 ? In IP 192.0.2.1.55299 > 192.168.254.86.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
22:38:43.497064 ? Out IP 192.168.254.86.8080 > 192.0.2.1.55299: Flags [S.], seq 2546121855, ack 1, win 64240, options [mss 1460], length 0
22:38:43.507167 ? In IP 192.0.2.1.55299 > 192.168.254.86.8080: Flags [.], ack 1, win 10000, length 0
22:38:43.507237 ? Out IP 192.168.254.86.8080 > 192.0.2.1.55299: Flags [F.], seq 1, ack 1, win 64240, length 0
22:38:43.507246 ? In IP 192.0.2.1.55299 > 192.168.254.86.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
22:38:50.429058 ? In IP 192.0.2.1.60131 > 192.168.7.214.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
22:38:50.429087 ? Out IP 192.168.7.214.8080 > 192.0.2.1.60131: Flags [S.], seq 861267356, ack 1, win 64240, options [mss 1460], length 0
22:38:50.439225 ? In IP 192.0.2.1.60131 > 192.168.7.214.8080: Flags [.], ack 1, win 10000, length 0
22:38:50.439329 ? Out IP 192.168.7.214.8080 > 192.0.2.1.60131: Flags [F.], seq 1, ack 1, win 64240, length 0
22:38:50.439342 ? In IP 192.0.2.1.60131 > 192.168.7.214.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
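How is win 64240 obtained? The size of this TCP receive window naturally depends on the local end's own logic, so the answer can only be found in the Linux kernel source. My grasp of the kernel code is superficial, so what follows is only a brief outline of how the relevant values are decided. Interested readers can look up further material themselves, or wait until I have mastered reading the source and come back to explain it piece by piece. Hopefully that day will come. 🤣🤣🤣
Client SYN
In the packetdrill scripts above, the SYN is a < packet, i.e. a constructed one, so Win 10000 is simply the value written in the script.
For a real client, the SYN Win is built by a few related functions: tcp_connect_init initializes the TCP connection and, as part of that, calls tcp_select_initial_window to set up the initial window.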
static void tcp_connect_init(struct sock *sk)
{
...
tcp_select_initial_window(sk, tcp_full_space(sk),
tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
&tp->rcv_wnd,
&tp->window_clamp,
sock_net(sk)->ipv4.sysctl_tcp_window_scaling,
&rcv_wscale,
rcv_wnd);
...
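Next comes tcp_select_initial_window. Its __space argument comes from tcp_full_space(sk), which is typically half of the tcp_rmem default value. space is then rounded down to a whole multiple of the MSS, and finally the smaller of that value and U16_MAX is taken. With the default tcp_rmem of 131072 this gives 65536, rounded down to 44 * 1460 = 64240, so the window is normally 64240.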
/* Determine a window scaling and initial window to offer.
* Based on the assumption that the given amount of space
* will be offered. Store the results in the tp structure.
* NOTE: for smooth operation initial space offering should
* be a multiple of mss if possible. We assume here that mss >= 1.
* This MUST be enforced by all callers.
*/
void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
__u32 *rcv_wnd, __u32 *window_clamp,
int wscale_ok, __u8 *rcv_wscale,
__u32 init_rcv_wnd)
{
/* Make sure the space is not negative. */
unsigned int space = (__space < 0 ? 0 : __space);
/* If no clamp set the clamp to the max possible scaled window */
/* If clamp is not set, set it to 65535 * 2^14 = 1073725440, the theoretical maximum a scaled TCP window can reach. */
/* space is then set to the smaller of space and *window_clamp using min(). */
if (*window_clamp == 0)
(*window_clamp) = (U16_MAX << TCP_MAX_WSCALE);
space = min(*window_clamp, space);
/* Quantize space offering to a multiple of mss if possible. */
/* Make sure space is a whole multiple of mss. */
if (space > mss)
space = rounddown(space, mss);
/* NOTE: offering an initial window larger than 32767
* will break some buggy TCP stacks. If the admin tells us
* it is likely we could be speaking with such a buggy stack
* we will truncate our initial window offering to 32K-1
* unless the remote has sent us a window scaling option,
* which we interpret as a sign the remote TCP is not
* misinterpreting the window field as a signed quantity.
*/
/* Set the receive window rcv_wnd according to whether ipv4.sysctl_tcp_workaround_signed_windows is set. */
if (sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)
(*rcv_wnd) = min(space, MAX_TCP_WINDOW);
else
(*rcv_wnd) = min_t(u32, space, U16_MAX);
/* If init_rcv_wnd is given, cap rcv_wnd at init_rcv_wnd * mss. */
if (init_rcv_wnd)
*rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
/* Compute the receive window scale factor rcv_wscale. */
*rcv_wscale = 0;
if (wscale_ok) {
/* Set window scaling on max possible window */
space = max_t(u32, space, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
*rcv_wscale = clamp_t(int, ilog2(space) - 15,
0, TCP_MAX_WSCALE);
}
/* Set the clamp no higher than max representable value */
/* Limit window_clamp to the maximum allowed by the computed window scale factor rcv_wscale. */
(*window_clamp) = min_t(__u32, U16_MAX << (*rcv_wscale), *window_clamp);
}
EXPORT_SYMBOL(tcp_select_initial_window);
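To make the arithmetic in the listing easier to follow, here is a rough Python model of it. It is a sketch of the listing only, not kernel code: select_initial_window and its parameter names are made up for illustration, tcp_rmem_max defaults to the 6291456 shown by sysctl in this article, and the rmem_max default of 212992 is an assumption about the test machine.

```python
U16_MAX = 0xFFFF
TCP_MAX_WSCALE = 14
MAX_TCP_WINDOW = 32767


def select_initial_window(space, mss, window_clamp=0, wscale_ok=False,
                          init_rcv_wnd=0, workaround_signed_windows=False,
                          tcp_rmem_max=6291456, rmem_max=212992):
    """Return (rcv_wnd, rcv_wscale, window_clamp), following the listing."""
    space = max(space, 0)
    if window_clamp == 0:
        window_clamp = U16_MAX << TCP_MAX_WSCALE
    space = min(window_clamp, space)
    if space > mss:
        space -= space % mss  # rounddown(space, mss)
    limit = MAX_TCP_WINDOW if workaround_signed_windows else U16_MAX
    rcv_wnd = min(space, limit)
    if init_rcv_wnd:
        rcv_wnd = min(rcv_wnd, init_rcv_wnd * mss)
    rcv_wscale = 0
    if wscale_ok:
        # Scale for the largest window the buffers could ever allow.
        space = max(space, tcp_rmem_max, rmem_max)
        space = min(space, window_clamp)
        ilog2 = space.bit_length() - 1
        rcv_wscale = min(max(ilog2 - 15, 0), TCP_MAX_WSCALE)
    window_clamp = min(U16_MAX << rcv_wscale, window_clamp)
    return rcv_wnd, rcv_wscale, window_clamp


# Baseline of this article: full_space 65536, MSS 1460, no window scaling
# because the injected SYN carries no wscale option.
print(select_initial_window(65536, 1460))  # (64240, 0, 65535)
```

The baseline call reproduces the win 64240 seen in the capture. The model ignores everything outside this one function, so it says nothing about how the window evolves after the handshake.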
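Server SYN/ACK
The > packet, i.e. the SYN/ACK, is generated automatically by the protocol stack, so Win 64240 comes out of roughly the following logic.
Building the Win of the server's SYN/ACK involves the call chain tcp_v4_conn_request -> tcp_conn_request -> tcp_openreq_init_rwin.
tcp_openreq_init_rwin is shown below. Its main job is to choose the arguments for tcp_select_initial_window and then call it to initialize the receive window information.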
void tcp_openreq_init_rwin(struct request_sock *req,
const struct sock *sk_listener,
const struct dst_entry *dst)
{
struct inet_request_sock *ireq = inet_rsk(req);
const struct tcp_sock *tp = tcp_sk(sk_listener);
/* Call tcp_full_space to get the window space available from the listening socket's receive buffer and assign it to full_space. */
int full_space = tcp_full_space(sk_listener);
u32 window_clamp;
__u8 rcv_wscale;
u32 rcv_wnd;
int mss;
/* Compute mss from the advertised MSS of the destination route and the listening socket's limit. */
mss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
/* Read the listening socket's window_clamp value. */
window_clamp = READ_ONCE(tp->window_clamp);
/* Set this up on the first call only */
/* Use window_clamp if it is set; otherwise use the route's window metric as the request socket's window clamp. */
req->rsk_window_clamp = window_clamp ? : dst_metric(dst, RTAX_WINDOW);
/* limit the window selection if the user enforce a smaller rx buffer */
/* If the user has locked in a smaller receive buffer, limit the window selection to that buffer size. */
if (sk_listener->sk_userlocks & SOCK_RCVBUF_LOCK &&
(req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0))
req->rsk_window_clamp = full_space;
/* Let BPF set the initial window, if configured. */
rcv_wnd = tcp_rwnd_init_bpf((struct sock *)req);
if (rcv_wnd == 0)
rcv_wnd = dst_metric(dst, RTAX_INITRWND);
else if (full_space < rcv_wnd * mss)
full_space = rcv_wnd * mss;
/* tcp_full_space because it is guaranteed to be the first packet */
tcp_select_initial_window(sk_listener, full_space,
mss - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
&req->rsk_rcv_wnd,
&req->rsk_window_clamp,
ireq->wscale_ok,
&rcv_wscale,
rcv_wnd);
ireq->rcv_wscale = rcv_wscale;
}
EXPORT_SYMBOL(tcp_openreq_init_rwin);
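The argument selection in this function can be sketched in Python as follows. This models only the listing above: openreq_window_params and its parameter names are hypothetical, the socket clamp, route metrics and BPF value are passed in as plain integers, and TCPOLEN_TSTAMP_ALIGNED is taken to be 12 bytes.

```python
TCPOLEN_TSTAMP_ALIGNED = 12  # timestamp option length, padded to 4 bytes


def openreq_window_params(sock_window_clamp, route_window, route_initrwnd,
                          full_space, mss, rcvbuf_locked=False,
                          bpf_rwnd=0, tstamp_ok=False):
    """Return (space, mss, window_clamp, init_rcv_wnd) for the window selection."""
    # Socket-level clamp wins; otherwise fall back to the route's window metric.
    window_clamp = sock_window_clamp or route_window
    # A user-locked receive buffer bounds the clamp.
    if rcvbuf_locked and (window_clamp > full_space or window_clamp == 0):
        window_clamp = full_space
    # BPF value first, then the route's initrwnd metric.
    init_rcv_wnd = bpf_rwnd
    if init_rcv_wnd == 0:
        init_rcv_wnd = route_initrwnd
    elif full_space < init_rcv_wnd * mss:
        full_space = init_rcv_wnd * mss
    if tstamp_ok:
        mss -= TCPOLEN_TSTAMP_ALIGNED
    return full_space, mss, window_clamp, init_rcv_wnd


# No socket clamp, no route metrics: everything is left to full_space.
print(openreq_window_params(0, 0, 0, 65536, 1460))      # (65536, 1460, 0, 0)
# A route window metric becomes the clamp.
print(openreq_window_params(0, 32768, 0, 65536, 1460))  # (65536, 1460, 32768, 0)
# A route initrwnd metric becomes init_rcv_wnd.
print(openreq_window_params(0, 0, 8, 65536, 1460))      # (65536, 1460, 0, 8)
```

The sketch shows two knobs that can be reached without touching the socket itself: the route's window metric and its initrwnd metric.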
Test 3
To test the Win value in the SYN/ACK, first change the tcp_rmem default to 65536, which makes full_space half of that, i.e. 32768.
Default tcp_rmem value: 131072
# sysctl -a | grep tcp_rmem
net.ipv4.tcp_rmem = 4096 131072 6291456
#
Modified tcp_rmem value: 65536
# sysctl -q net.ipv4.tcp_rmem="4096 65536 6291456"
# sysctl -a | grep tcp_rmem
net.ipv4.tcp_rmem = 4096 65536 6291456
#
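Run the script with packetdrill again; tcpdump now shows win 32120 in the SYN/ACK. In space = min(*window_clamp, space), the space value 32768 is the smaller one. It is then rounded down to a whole multiple of the MSS, 22 * 1460 = 32120, and the smaller of that and U16_MAX gives rcv_wnd = 32120.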
# packetdrill tcp_3hs_000.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
20:44:14.653044 ? In IP 192.0.2.1.51855 > 192.168.63.230.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
20:44:14.653057 ? Out IP 192.168.63.230.8080 > 192.0.2.1.51855: Flags [S.], seq 3972782229, ack 1, win 32120, options [mss 1460], length 0
20:44:14.663146 ? In IP 192.0.2.1.51855 > 192.168.63.230.8080: Flags [.], ack 1, win 10000, length 0
20:44:14.663202 ? Out IP 192.168.63.230.8080 > 192.0.2.1.51855: Flags [F.], seq 1, ack 1, win 32120, length 0
20:44:14.663210 ? In IP 192.0.2.1.51855 > 192.168.63.230.8080: Flags [R.], seq 1, ack 1, win 10000, length
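The step from tcp_rmem to full_space can be sketched as well. The model below assumes a kernel in which tcp_win_from_space is driven by the tcp_adv_win_scale sysctl with a value of 1, and that the listening socket's receive buffer starts at the tcp_rmem default; both are assumptions about the test machine, and newer kernels compute this differently. initial_win is a hypothetical helper name.

```python
U16_MAX = 0xFFFF


def tcp_win_from_space(space, tcp_adv_win_scale=1):
    """Share of the receive buffer usable as window (adv_win_scale model)."""
    if tcp_adv_win_scale <= 0:
        return space >> (-tcp_adv_win_scale)
    return space - (space >> tcp_adv_win_scale)


def initial_win(tcp_rmem_default, mss, tcp_adv_win_scale=1):
    """Window in the SYN/ACK when only tcp_rmem and the MSS matter."""
    full_space = tcp_win_from_space(tcp_rmem_default, tcp_adv_win_scale)
    if full_space > mss:
        full_space -= full_space % mss  # round down to a multiple of the MSS
    return min(full_space, U16_MAX)


for rmem in (131072, 65536):
    print(rmem, tcp_win_from_space(rmem), initial_win(rmem, 1460))
# 131072 65536 64240
# 65536 32768 32120
```

Both rows agree with the captures: 64240 with the default tcp_rmem and 32120 after lowering it to 65536.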
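Test 4
Continuing with the Win value in the SYN/ACK: first restore the tcp_rmem default to 131072, then try to work with the listening socket's window_clamp value through the TCP_WINDOW_CLAMP socket option. This does not work; the script fails with unknown symbol: 'TCP_WINDOW_CLAMP'. The error is raised by packetdrill's symbol lookup, so the packetdrill build used here apparently does not know this option name. Instead, change the window metric read through dst_metric, which also feeds req->rsk_window_clamp.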
# cat tcp_3hs_win_002.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 getsockopt(3, SOL_TCP, TCP_WINDOW_CLAMP, [0], [4]) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
#
# packetdrill tcp_3hs_win_002.pkt
tcp_3hs_win_002.pkt:6: runtime error in getsockopt call: unknown symbol: 'TCP_WINDOW_CLAMP'
#
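Set the route's window value to 32768 from within the packetdrill .pkt file. After running the script, tcpdump again shows win 32120 in the SYN/ACK. In space = min(*window_clamp, space), the window_clamp value 32768 is the smaller one. space is then rounded down to a whole multiple of the MSS, 22 * 1460 = 32120, and the smaller of that and U16_MAX gives rcv_wnd = 32120.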
# cat tcp_3hs_win_003.pkt
`ip route change 192.0.2.0/24 dev tun0 window 32768`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
#
# packetdrill tcp_3hs_win_003.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:02:51.097035 ? In IP 192.0.2.1.34619 > 192.168.45.48.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
11:02:51.097055 ? Out IP 192.168.45.48.8080 > 192.0.2.1.34619: Flags [S.], seq 619636816, ack 1, win 32120, options [mss 1460], length 0
11:02:51.107151 ? In IP 192.0.2.1.34619 > 192.168.45.48.8080: Flags [.], ack 1, win 10000, length 0
11:02:51.107209 ? Out IP 192.168.45.48.8080 > 192.0.2.1.34619: Flags [F.], seq 1, ack 1, win 32120, length 0
11:02:51.107217 ? In IP 192.0.2.1.34619 > 192.168.45.48.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
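Test 5
Continuing with the Win value in the SYN/ACK: this time change init_rcv_wnd, which caps rcv_wnd at init_rcv_wnd * mss when that is the smaller value.
Set initrwnd to 8 on the route from within the packetdrill .pkt file. After running the script, tcpdump shows win 11680 in the SYN/ACK, because init_rcv_wnd * mss is 8 * 1460 = 11680, so rcv_wnd is 11680.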
# cat tcp_3hs_win_004.pkt
`ip route change 192.0.2.0/24 dev tun0 initrwnd 8`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
# packetdrill tcp_3hs_win_004.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:21:54.989040 ? In IP 192.0.2.1.33625 > 192.168.59.145.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
11:21:54.989063 ? Out IP 192.168.59.145.8080 > 192.0.2.1.33625: Flags [S.], seq 3877902910, ack 1, win 11680, options [mss 1460], length 0
11:21:54.999160 ? In IP 192.0.2.1.33625 > 192.168.59.145.8080: Flags [.], ack 1, win 10000, length 0
11:21:54.999220 ? Out IP 192.168.59.145.8080 > 192.0.2.1.33625: Flags [F.], seq 1, ack 1, win 11680, length 0
11:21:54.999229 ? In IP 192.0.2.1.33625 > 192.168.59.145.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
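Since the cap is a simple min(), initrwnd only matters while initrwnd * mss stays below the window that would be chosen anyway. The sketch below tabulates this for a few values. Only the initrwnd 8 row was measured in this article; the other rows are predictions from the kernel listing, and win_with_initrwnd is a hypothetical helper.

```python
MSS = 1460
BASE_WIN = 64240  # SYN/ACK window observed without an initrwnd metric


def win_with_initrwnd(initrwnd, mss=MSS, base_win=BASE_WIN):
    """Window after the init_rcv_wnd cap; 0 means the metric is not set."""
    if initrwnd == 0:
        return base_win
    return min(base_win, initrwnd * mss)


for initrwnd in (0, 8, 20, 44, 100):
    print(initrwnd, win_with_initrwnd(initrwnd))
# 0 64240
# 8 11680
# 20 29200
# 44 64240
# 100 64240
```

With an MSS of 1460, the cap stops having any effect at 44 segments, because 44 * 1460 is already 64240.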
Originally published on the WeChat official account Echo Reply: Wireshark & Packetdrill | TCP Three-Way Handshake: The Win Field