The Four Ways to Write 茴 -- Four Common Causes of 502 Errors Explained


Background

When an Nginx reverse proxy hits a connection timeout against an upstream, by default it waits for the kernel's TCP retries and only moves on to the next server once those retries are exhausted. A client request that runs into this can therefore wait anywhere from 3 to 63 seconds, or even longer if the client / CDN side does not time out on its own, which badly hurts the user experience. On the Nginx side, the long-hanging connections likewise add considerable system overhead, and in the worst case can drag down the whole machine, affect the cluster, and trigger a cascading failure.

The sections below walk through four common timeout errors and how each one arises.
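As an aside, the wait itself can also be capped from the Nginx side; the write-up below focuses on root causes, so the following is only a hedged sketch of a common mitigation, with an illustrative 3-second value:

# in the http/server/location block that proxies to the upstream
proxy_connect_timeout 3s;            # stop waiting for the TCP handshake after 3 seconds
proxy_next_upstream error timeout;   # then fail over to the next upstream server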


Scenarios

Scenario 1: 110: Connection timed out

Scenario logs

This error typically shows up in the following two kinds of logs:

  • access log

    192.168.7.52:64609 80 - [20/Nov/2021:22:57:07 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.33:8811 502, 200 7.017, 0.002 7.018 disaster.nestealin.com "- - -" "192.168.7.52"
  • error log

    2021/11/20 22:57:07 [error] 3942#0: *24 connect() failed (110: Connection timed out) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"

More detail can be found in the debug log:

2021/11/20 22:57:00 [debug] 3942#0: accept on 0.0.0.0:80, ready: 0
2021/11/20 22:57:00 [debug] 3942#0: posix_memalign: 0000000002879080:512 @16
2021/11/20 22:57:00 [debug] 3942#0: *24 accept: 192.168.7.52:64609 fd:4
...
2021/11/20 22:57:00 [debug] 3942#0: *24 http header: "Host: disaster.nestealin.com"
...
2021/11/20 22:57:00 [debug] 3942#0: *24 connect to 192.168.7.35:8833, fd:6 #25
2021/11/20 22:57:00 [debug] 3942#0: *24 http upstream connect: -2
...
2021/11/20 22:57:07 [error] 3942#0: *24 connect() failed (110: Connection timed out) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
2021/11/20 22:57:07 [debug] 3942#0: *24 http next upstream, 2
2021/11/20 22:57:07 [debug] 3942#0: *24 free rr peer 3 4
2021/11/20 22:57:07 [warn] 3942#0: *24 upstream server temporarily disabled while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"

From the above we can roughly see that Nginx accepted the client HTTP request at 22:57:00 and forwarded it to the upstream within the same second, but then there is a 7-second gap with no log output until 22:57:07, when the error connect() failed (110: Connection timed out) appears. What happened during those 7 seconds is not visible from these logs.
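As a side note, debug entries like the ones above only appear when debug-level error logging is enabled; a minimal sketch, assuming Nginx was built with --with-debug and using an illustrative log path:

error_log /var/log/nginx/error.log debug;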

Packet capture analysis

  • access log

    192.168.7.52:54402 80 - [21/Nov/2021:18:18:37 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.34:8822 502, 200 7.014, 0.002 7.015 disaster.nestealin.com "- - -" "192.168.7.52"
  • error log

    2021/11/21 18:18:37 [error] 6193#0: *249 connect() failed (110: Connection timed out) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
  • packet capture

    18:18:30.144430 IP localhost.localdomain.37503 > 192.168.7.35.8833: Flags [S], seq 484384411, win 29200, options [mss 1460], length 0
    18:18:31.145372 IP localhost.localdomain.37503 > 192.168.7.35.8833: Flags [S], seq 484384411, win 29200, options [mss 1460], length 0
    18:18:33.149337 IP localhost.localdomain.37503 > 192.168.7.35.8833: Flags [S], seq 484384411, win 29200, options [mss 1460], length 0

From the capture we can tell that the first attempt to reach the backend was made at 18:18:30 with a SYN packet, which got no response; the SYN was retransmitted at 18:18:31 and again at 18:18:33. With the backend still silent, Nginx finally gave up at 18:18:37, retried the next backend (next upstream), and completed the client request there.
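For reference, a capture like the one above can be reproduced by filtering on the upstream address and port; the interface name here is an assumption:

tcpdump -i eth0 host 192.168.7.35 and port 8833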

Troubleshooting approach

  1. The timeout produced a 502, not a 499 from the client closing the connection early, so the problem lies on the server side, i.e. Nginx and the upstream service;
  2. After Nginx forwarded the request to the backend, nothing was logged for 7 seconds and the eventual message was a connection timeout; none of the Nginx timeout directives match this value, so the delay is unlikely to come from Nginx itself;
  3. Next, check the kernel's TCP timeout related settings:
    1. net.ipv4.tcp_synack_retries
    2. net.ipv4.tcp_syn_retries
    3. net.ipv4.tcp_fin_timeout
  4. Combined with the capture above, the connection is still in the SYN phase of the TCP handshake with the upstream, which rules out 3.1 and 3.3 and points at 3.2;
  5. sysctl -a shows that 3.2 is set to net.ipv4.tcp_syn_retries = 2 on the Nginx machine, i.e. two retries during the SYN phase, which matches the capture (see the command sketch after this list);
  6. So the behaviour is roughly pinned down to the local TCP SYN retry setting.
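To read just this value instead of scanning the full sysctl -a output, it can be queried directly; the two forms below are equivalent:

sysctl net.ipv4.tcp_syn_retries
cat /proc/sys/net/ipv4/tcp_syn_retries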

Root cause analysis

Practice is the sole criterion for testing truth.

Searching through the kernel settings turns up no parameter that controls the SYN timeout duration directly, so for now the retry count itself is modified, first of all to confirm whether the timeout Nginx observes really is tied to this setting.

One retry

net.ipv4.tcp_syn_retries = 1
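For these experiments the value can be applied at runtime as sketched below (the same form is used for the other retry counts in these tests); a change made this way does not survive a reboot unless it is also written to /etc/sysctl.conf:

sysctl -w net.ipv4.tcp_syn_retries=1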

Nginx log

192.168.7.52:52011 80 - [21/Nov/2021:00:12:05 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.33:8811 502, 200 3.008, 0.002 3.009 disaster.nestealin.com "- - -" "192.168.7.52"

Packet capture

00:12:02.750466 IP localhost.localdomain.37375 > 192.168.7.35.8833: Flags [S], seq 642605387, win 29200, options [mss 1460], length 0
00:12:03.753312 IP localhost.localdomain.37375 > 192.168.7.35.8833: Flags [S], seq 642605387, win 29200, options [mss 1460], length 0
  • This time there was a single SYN retransmission, and Nginx gave up and tried the next backend after only 3 seconds.
  • The interval between the first SYN (00:12:02) and the retry (00:12:03) was 1 second.
Five retries

net.ipv4.tcp_syn_retries = 5

Nginx log (access log)

192.168.7.52:51875 80 - [21/Nov/2021:00:09:19 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.33:8811 502, 200 63.106, 0.002 63.107 disaster.nestealin.com "- - -" "192.168.7.52"

Packet capture

00:08:16.820611 IP localhost.localdomain.37365 > 192.168.7.35.8833: Flags [S], seq 1405433760, win 29200, options [mss 1460], length 0
00:08:17.821363 IP localhost.localdomain.37365 > 192.168.7.35.8833: Flags [S], seq 1405433760, win 29200, options [mss 1460], length 0
00:08:19.825355 IP localhost.localdomain.37365 > 192.168.7.35.8833: Flags [S], seq 1405433760, win 29200, options [mss 1460], length 0
00:08:23.829367 IP localhost.localdomain.37365 > 192.168.7.35.8833: Flags [S], seq 1405433760, win 29200, options [mss 1460], length 0
00:08:31.845332 IP localhost.localdomain.37365 > 192.168.7.35.8833: Flags [S], seq 1405433760, win 29200, options [mss 1460], length 0
00:08:47.861366 IP localhost.localdomain.37365 > 192.168.7.35.8833: Flags [S], seq 1405433760, win 29200, options [mss 1460], length 0
  • This time there were five SYN retransmissions, and Nginx only moved on to the next backend after 63 seconds.
  • 1 second between the first SYN (00:08:16) and the first retry (00:08:17)
  • 2 seconds between the first retry (00:08:17) and the second (00:08:19)
  • 4 seconds between the second retry (00:08:19) and the third (00:08:23)
  • 8 seconds between the third retry (00:08:23) and the fourth (00:08:31)
  • 16 seconds between the fourth retry (00:08:31) and the fifth (00:08:47)
  • 32 seconds between the fifth retry (00:08:47) and Nginx logging the timeout (00:09:19)
  • 63 seconds in total

At this point it is clear that how long Nginx waits before returning the 502 is governed by the kernel's TCP SYN retry setting, net.ipv4.tcp_syn_retries.

Meanwhile, putting the two-retry and five-retry captures together shows that the kernel doubles the interval between successive TCP SYN retransmissions, i.e. it grows as powers of two.

That is: 1 s (2^0), 2 s (2^1), 4 s (2^2), 8 s (2^3), 16 s (2^4) ...

In summary, because the Nginx host has its TCP SYN retry count set to 2, it retransmits the SYN twice towards the backend, for a total of 1 + 2 + 4 = 7 seconds, which is exactly why Nginx has to wait 7 seconds before retrying the next upstream server.
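As a quick sanity check of the arithmetic, the total SYN wait for a given retry count is 2^(retries+1) - 1 seconds; a small shell sketch (the variable name is arbitrary):

# total SYN wait in seconds for a given net.ipv4.tcp_syn_retries value
retries=2
echo $(( (1 << (retries + 1)) - 1 ))   # 2 -> 7, 1 -> 3, 5 -> 63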


Scenario 2: 113: No route to host

Scenario logs

This error typically shows up in the following two kinds of logs:

  • access log

    192.168.7.52:63792 80 - [20/Nov/2021:22:37:39 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.33:8811 502, 200 3.011, 0.002 3.013 disaster.nestealin.com "- - -" "192.168.7.52"
  • error log

    2021/11/20 22:37:39 [error] 3914#0: *9 connect() failed (113: No route to host) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"

More detail can be found in the debug log:

2021/11/20 22:37:36 [debug] 3914#0: accept on 0.0.0.0:80, ready: 0
2021/11/20 22:37:36 [debug] 3914#0: posix_memalign: 000000000286FF00:512 @16
2021/11/20 22:37:36 [debug] 3914#0: *9 accept: 192.168.7.52:63792 fd:3
...
2021/11/20 22:37:36 [debug] 3914#0: *9 http header: "Host: disaster.nestealin.com"
...
2021/11/20 22:37:36 [debug] 3914#0: *9 epoll add connection: fd:9 ev:80002005
2021/11/20 22:37:36 [debug] 3914#0: *9 connect to 192.168.7.35:8833, fd:9 #10
...
2021/11/20 22:37:39 [error] 3914#0: *9 connect() failed (113: No route to host) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
2021/11/20 22:37:39 [debug] 3914#0: *9 http next upstream, 2
2021/11/20 22:37:39 [debug] 3914#0: *9 free rr peer 3 4
2021/11/20 22:37:39 [warn] 3914#0: *9 upstream server temporarily disabled while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"

From the above we can roughly see that Nginx accepted the client HTTP request at 22:37:36 and forwarded it to the upstream within the same second, but then there is a 3-second gap with no log output until 22:37:39, when the error connect() failed (113: No route to host) appears. Again, what happened in between is not visible from these logs.

Packet capture analysis

Nginx log (access log)

192.168.7.52:58889 80 - [23/Nov/2021:00:09:45 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.33:8811 502, 200 3.006, 0.002 3.009 disaster.nestealin.com "- - -" "192.168.7.52"

error log

2021/11/23 00:09:45 [error] 21442#0: *478 connect() failed (113: No route to host) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"

debug error log

2021/11/24 12:42:34 [debug] 24043#0: *629 http process request header line
...
2021/11/24 12:42:34 [debug] 24043#0: *629 http script var: "disaster.nestealin.com"
...
2021/11/24 12:42:34 [debug] 24043#0: *629 connect to 192.168.7.35:8833, fd:5 #630
...
2021/11/24 12:42:37 [error] 24043#0: *629 connect() failed (113: No route to host) while connecting to upstream, client: 192.168.7.55, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
2021/11/24 12:42:37 [debug] 24043#0: *629 http next upstream, 2
2021/11/24 12:42:37 [debug] 24043#0: *629 free rr peer 3 4
2021/11/24 12:42:37 [warn] 24043#0: *629 upstream server temporarily disabled while connecting to upstream, client: 192.168.7.55, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
2021/11/24 12:42:37 [debug] 24043#0: *629 free rr peer failed: 00000000027FA028 0
2021/11/24 12:42:37 [debug] 24043#0: *629 close http upstream connection: 5

The packet capture looks like this:

12:42:34.847441 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
12:42:35.849338 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
12:42:36.851311 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
12:42:37.853542 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
12:42:38.855311 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
12:42:39.857319 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28

Filtering the capture shows that:

  • After the first request, the local ARP table has no MAC address entry for the backend, so an ARP request is sent (12:42:34);
  • An ARP request is then repeated once per second, six in total (12:42:34 - 12:42:39);
  • Yet the moment Nginx decided the backend host was unreachable was 12:42:37;
  • So why are there still ARP requests after the timeout?

Notes:

  1. View the local ARP table

    arp -n

  2. Delete a single local ARP entry

    arp -d ${IP_ADDRESS}

  • curl the backend directly to test

    [root@test ~]# date ; curl http://192.168.7.35:8833 -v ; date
    Sun Nov 28 16:02:45 CST 2021
    * About to connect() to 192.168.7.35 port 8833 (#0)
    *   Trying 192.168.7.35...
    * No route to host
    * Failed connect to 192.168.7.35:8833; No route to host
    * Closing connection 0
    curl: (7) Failed connect to 192.168.7.35:8833; No route to host
    Sun Nov 28 16:02:48 CST 2021

    Packet capture result

    16:02:45.492425 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
    16:02:46.493310 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
    16:02:47.495344 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
    16:02:48.497477 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
    16:02:49.499292 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
    16:02:50.501340 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28

The curl request times out after 3 seconds, and from the capture we already know this is closely tied to the ARP resolution timing out; searching the related settings narrows it down to two kernel parameters, mcast_solicit and retrans_time_ms.

Keeping the default per-attempt timeout (retrans_time_ms) unchanged, try raising the maximum number of retries to 8.

On Linux everything is a file, so the corresponding kernel setting can be changed by writing to its file under /proc directly.

cd /proc/sys/net/ipv4/neigh/eth0

# check the default value
cat mcast_solicit
>>>
3

# raise the retry count to 8
echo "8" > mcast_solicit

Alternatively, the maximum ARP retry count for the interface (eth0) can be set through the sysctl config file:

vim /etc/sysctl.conf

# cap ARP retries at 5
net.ipv4.neigh.eth0.mcast_solicit = 5

# after saving and exiting, load the settings
sysctl -p

Further notes:

  1. Maximum number of multicast/broadcast address resolution attempts before an entry is marked unreachable; default 3

    net.ipv4.neigh.eth0.mcast_solicit = 3

  2. Number of milliseconds to wait before retransmitting an ARP request; default 1000 ms

    net.ipv4.neigh.${INTERFACE}.retrans_time_ms = 1000

  3. View the current kernel parameters

    sysctl -a

  4. The value that ultimately takes effect is whatever the kernel file holds (once the kernel file has been modified by hand with echo, the change made in the config file will not take effect!); see the verification sketch after this list.
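To confirm which value is actually in effect, it can be read back both through /proc and through sysctl; a sketch using the eth0 interface from the examples above:

cat /proc/sys/net/ipv4/neigh/eth0/mcast_solicit
sysctl net.ipv4.neigh.eth0.mcast_solicit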

Request again

[root@test ~]# date ; curl http://192.168.7.35:8833 -v ; date
Sun Nov 28 16:08:41 CST 2021
* About to connect() to 192.168.7.35 port 8833 (#0)
*   Trying 192.168.7.35...
* Connection timed out
* Failed connect to 192.168.7.35:8833; Connection timed out
* Closing connection 0
curl: (7) Failed connect to 192.168.7.35:8833; Connection timed out
Sun Nov 28 16:08:48 CST 2021

Packet capture result

16:08:41.059939 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:42.061321 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:43.063341 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:44.065328 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:45.067302 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:46.069329 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:47.071314 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:08:48.073288 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28

This time the timeout is 7 seconds, but the curl result has changed: instead of No route to host it now reports Connection timed out, which looks a lot like the problem in the previous scenario.

Following the previous scenario, set tcp_syn_retries=5 to stretch the TCP SYN timeout out to 63 seconds.

Keep testing the request

[root@test ~]# date ; curl http://192.168.7.35:8833 -v ; date
Sun Nov 28 16:17:40 CST 2021
* About to connect() to 192.168.7.35 port 8833 (#0)
*   Trying 192.168.7.35...
* No route to host
* Failed connect to 192.168.7.35:8833; No route to host
* Closing connection 0
curl: (7) Failed connect to 192.168.7.35:8833; No route to host
Sun Nov 28 16:17:48 CST 2021

Packet capture result

16:17:40.397450 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:41.399361 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:42.401302 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:43.403311 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:44.405311 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:45.407330 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:46.409310 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:47.411319 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:48.413517 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:49.415348 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:50.417352 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:51.419326 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:52.421316 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:53.423333 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:54.425327 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28
16:17:55.427313 ARP, Request who-has 192.168.7.35 tell localhost.localdomain, length 28

This time the timeout is 8 seconds, and the curl result is "back" to No route to host.

Requesting through Nginx again at this point confirms that the 502 timeouts caused by "No route to host" are tied to this kernel setting.

  • access log
192.168.7.52:56549 80 - [22/Nov/2021:00:48:46 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.34:8822 502, 200 8.010, 0.003 8.014 disaster.nestealin.com "- - -" "192.168.7.52"

Summary

  1. Whether the system declares the destination unreachable comes down to whichever of two kernel timeouts expires first (a weakest-link effect)

    If either of the following timeouts is hit, No route to host is triggered:

    1. ARP timeout
    2. SYN timeout
  2. The ARP timeout is controlled by two key kernel parameters (a short worked example follows this list):

    1. mcast_solicit – maximum number of ARP request retries
    2. retrans_time_ms – timeout for a single ARP request
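As a rough worked example under the defaults seen above: on the ARP side, mcast_solicit × retrans_time_ms = 3 × 1000 ms = 3 s; on the SYN side, tcp_syn_retries = 2 gives 1 + 2 + 4 = 7 s. The 3-second ARP limit expires first, which matches the 3-second "No route to host" 502 observed at the start of this scenario.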

Scenario 3: 111: Connection refused

Scenario logs

  • access log

    192.168.7.52:62812 80 - [28/Nov/2021:21:14:48 +0800] "GET / HTTP/1.1" 200 35 86 "-" "curl/7.64.1" - 192.168.7.35:8833, 192.168.7.34:8822 502, 200 0.001, 0.002 0.003 disaster.nestealin.com "- - -" "192.168.7.52"
  • error log

    2021/11/28 21:14:48 [error] 24043#0: *699 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
  • debug error log

    2021/11/28 21:14:48 [debug] 24043#0: *699 http process request header line
    2021/11/28 21:14:48 [debug] 24043#0: *699 http header: "Host: disaster.nestealin.com"
    ...
    2021/11/28 21:14:48 [debug] 24043#0: *699 connect to 192.168.7.35:8833, fd:10 #700
    2021/11/28 21:14:48 [debug] 24043#0: *699 http upstream connect: -2
    ...
    2021/11/28 21:14:48 [error] 24043#0: *699 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
    2021/11/28 21:14:48 [debug] 24043#0: *699 http next upstream, 2
    2021/11/28 21:14:48 [debug] 24043#0: *699 free rr peer 3 4
    2021/11/28 21:14:48 [warn] 24043#0: *699 upstream server temporarily disabled while connecting to upstream, client: 192.168.7.52, server: disaster.nestealin.com, request: "GET / HTTP/1.1", upstream: "http://192.168.7.35:8833/", host: "disaster.nestealin.com"
    2021/11/28 21:14:48 [debug] 24043#0: *699 free rr peer failed: 00000000027FA028 0
    2021/11/28 21:14:48 [debug] 24043#0: *699 close http upstream connection: 10

Packet capture analysis

  • packet capture

    21:14:48.737691 IP localhost.localdomain.38023 > 192.168.7.35.8833: Flags [S], seq 4109790988, win 29200, options [mss 1460], length 0
    21:14:48.738432 IP 192.168.7.35.8833 > localhost.localdomain.38023: Flags [R.], seq 0, ack 4109790989, win 0, length 0

Filtering the capture shows that:

  • On the first attempt, a route is found and a SYN packet is sent to the target host and port;
  • But the peer immediately replies with an RST packet, closing the connection on the spot;
  • Nginx accordingly treats the connection as abnormally closed and immediately retries the next backend.

This one is fairly obvious: nothing on the backend is listening on that port, so Nginx receives an RST reset packet the moment it tries to connect.
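A quick way to confirm this is to check on the backend host whether anything is actually listening on the port; a sketch, using the port from the example above:

ss -lntp | grep 8833    # or: netstat -lntp | grep 8833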


Scenario 4: 14077410:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

Scenario logs

108.162.215.99:45784 443 - [23/Aug/2021:20:12:55 +0800] "GET / HTTP/2.0" 502 552 588 "https://abc.nestealin.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36" 112.94.5.214 104.21.47.158:443, 172.67.148.224:443, [2606:4700:3031::6815:2f9e]:443, [2606:4700:3033::ac43:94e0]:443 502, 502, 502, 502 0.004, 0.025, 0.000, 0.000 0.030 abc.nestealin.com "- - -" "112.94.5.214"

Cause and solution
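The original notes leave this section blank, so what follows is only an assumption based on the access log above: the upstreams are HTTPS endpoints on port 443 behind a CDN, and a frequent cause of this particular handshake failure is that Nginx does not send SNI to the upstream by default, while such endpoints require it. A minimal, hypothetical sketch:

# illustrative directives in the proxying server/location block
proxy_ssl_server_name on;              # send SNI during the TLS handshake with the upstream
proxy_ssl_name $host;                  # assumed: use the requested hostname as the SNI value
proxy_ssl_protocols TLSv1.2 TLSv1.3;   # avoid the legacy SSLv3-era negotiation path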



