Bug 1448170
Summary: RHEL6.9: sunrpc reconnect logic now may trigger a SYN storm when a TCP connection drops and a burst of RPC commands hit the transport
Product: Red Hat Enterprise Linux 6
Component: kernel
kernel sub component: NFS
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 6.9
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-2.6.32-704.el6
Reporter: Dave Wysochanski <dwysocha>
Assignee: Dave Wysochanski <dwysocha>
QA Contact: Yongcheng Yang <yoyang>
CC: alau, apanagio, arawat, bfields, bjellema, dhoward, dwysocha, fsorenso, jinjian.1, jiyin, jreznik, khuynh, kolga, m.c.dixon, mvermaes, nmurray, ptalbert, redhat, ssahsrab, tbecker, tgummels, tthakur, woodard, xzhou, yoyang
Keywords: Patch, Regression, Reproducer, ZStream
Doc Type: If docs needed, set a value
Last Closed: 2018-06-19 04:56:31 UTC
Type: Bug
Bug Depends On: 1374441
Bug Blocks: 1450850
Description
Dave Wysochanski
2017-05-04 16:40:48 UTC
Confirmed that commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f returns the reconnect behavior to that of RHEL6.8, with only one SYN packet sent.

[root@rhel6u9-node1 ~]# uname -r
2.6.32-696.1.1.el6.sf01836153.1.x86_64
[root@rhel6u9-node1 ~]# tshark -ntad -r /tmp/tcpdump3.pcap -R 'tcp.flags.syn == 1 || tcp.flags.fin == 1'
Running as user "root" and group "root". This could be dangerous.
1 2017-05-04 13:09:04.253093 192.168.122.18 -> 192.168.122.35 TCP 74 887 > 2049 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=1168942 TSecr=0 WS=128
2 2017-05-04 13:09:04.254084 192.168.122.35 -> 192.168.122.18 TCP 74 2049 > 887 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4103485175 TSecr=1168942 WS=64
[root@rhel6u9-node1 ~]#

We might need to take this patch too:

commit 8b71798c0d389d4cadc884fc7d68c61ee8cd4f45
Author: Trond Myklebust <Trond.Myklebust>
Date: Thu Sep 26 10:18:04 2013 -0400

    SUNRPC: Only update the TCP connect cookie on a successful connect

    Signed-off-by: Trond Myklebust <Trond.Myklebust>

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 208a763..9928ba1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1511,6 +1511,7 @@ static void xs_tcp_state_change(struct sock *sk)
 			transport->tcp_copied = 0;
 			transport->tcp_flags =
 				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+			xprt->connect_cookie++;
 
 			xprt_wake_pending_tasks(xprt, -EAGAIN);
 		}
@@ -2164,7 +2165,6 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	case 0:
 	case -EINPROGRESS:
 		/* SYN_SENT! */
-		xprt->connect_cookie++;
 		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
 			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
 	}

There are multiple missing patches (circa 2013) involving connect_cookie (at least commits 8b71798c0d389d4cadc884fc7d68c61ee8cd4f45 and 0a6605213040dd2fb479f0d1a9a87a1d7fa70904). At this point I don't think we should derail this bug, as it has a clear test case and a commit which fixes it.

As a separate effort we should consider backporting the other connect_cookie patches to RHEL6, as other issues may be present due to the omissions.

Patch(es) committed on kernel repository and kernel is undergoing testing

Patch(es) available on kernel-2.6.32-704.el6

Created attachment 1279835 [details]
Sample test case which tests NFSv3, v4.0, and v4.1 and checks for count of SYN packets
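The attached test case itself is not reproduced in this bug, so the following is only a minimal sketch of how a SYN-count check of this kind could work on tshark's one-line-per-packet text output shown above. The function name `count_client_syns` and the regular expression are illustrative assumptions, not taken from the attachment; the only thing borrowed from this report is the line format and the NFS port 2049.

```python
import re

# Matches the TCP flag list tshark prints in brackets, e.g. "[SYN]" or "[SYN, ACK]".
# Other bracketed notes such as "[TCP Port numbers reused]" do not match.
FLAGS_RE = re.compile(
    r"\[((?:SYN|ACK|FIN|RST|PSH)(?:, (?:SYN|ACK|FIN|RST|PSH))*)\]"
)


def count_client_syns(lines, server_port=2049):
    """Count pure SYN packets (SYN without ACK) sent to the NFS server port.

    Hypothetical helper: it assumes the summary-line format seen in this bug,
    where the port pair is printed as "<sport> > <dport> [FLAGS]".
    """
    count = 0
    for line in lines:
        match = FLAGS_RE.search(line)
        if match is None:
            continue
        flags = match.group(1).split(", ")
        if flags == ["SYN"] and f"> {server_port} " in line:
            count += 1
    return count


# The two frames captured on the patched kernel: one SYN, one SYN,ACK.
patched_capture = [
    "1 2017-05-04 13:09:04.253093 192.168.122.18 -> 192.168.122.35 "
    "TCP 74 887 > 2049 [SYN] Seq=0 Win=14600 Len=0",
    "2 2017-05-04 13:09:04.254084 192.168.122.35 -> 192.168.122.18 "
    "TCP 74 2049 > 887 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0",
]
print(count_client_syns(patched_capture))
```

A test built on this would compare the count for each reconnect against an expected maximum and fail when a burst of SYNs pushes it higher.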
Created attachment 1279836 [details]
Sample output from the test on a patched kernel. Note that NFSv4.1 fails because the TCP connection is not dropped after 10 minutes for some reason; this might be a separate bug, not sure.
Created attachment 1279846 [details]
Sample output from the test on an unpatched kernel. Note the failure on NFSv4.0: the SYN packet count is 8, which is more than the expected 3. However, for some reason NFSv3 passed the test (there was no burst of SYNs) and I cannot explain why. I didn't see this often, so I'm not sure about it.
This is all in our kbase https://access.redhat.com/solutions/3018371, but FWIW, this bug can trigger a DoS of an NFS mount point in multiple ways, and iptables does not need to be enabled for that to happen.

In one of my reproduction environments I saw a partial DoS, described as follows. The NFS transport's TCP 3-way handshake runs into problems due to the multiple SYNs from the NFS client. In the trace below, the NFS server's SYN,ACK answers the first SYN after the client has already sent a second one, which confuses the NFS client's TCP stack. The following sequence occurs:

1) Frames 48-49: NFS client sends a duplicate SYN; the first one has Seq=3677241340, and the second one has Seq=3677245016
2) Frame 50: NFS server responds with Ack=3677241341, which is a response to the first SYN from the NFS client
3) Frame 51: NFS client responds with RST and Seq=3677241341, indicating the Ack packet in frame 50 is not understood and the connection should be reset
4) Frame 52: NFS server responds with RST, ACK, indicating it has reset the connection
5) Frames 53-57: The sequence in 1-4 repeats
6) Frames 58-62: The sequence in 1-4 repeats

~~~
48 2017-05-17 20:24:50.684597 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677241340 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966741 TSecr=0 WS=128
49 2017-05-17 20:24:50.684833 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677245016 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966742 TSecr=0 WS=128
50 2017-05-17 20:24:50.685595 192.168.122.16 -> 192.168.122.18 TCP 74 2049 > 815 [SYN, ACK] Seq=1365940280 Ack=3677241341 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1741462612 TSecr=966741 WS=128
51 2017-05-17 20:24:50.685625 192.168.122.18 -> 192.168.122.16 TCP 54 815 > 2049 [RST] Seq=3677241341 Win=0 Len=0
52 2017-05-17 20:24:50.685654 192.168.122.16 -> 192.168.122.18 TCP 54 2049 > 815 [RST, ACK] Seq=0 Ack=3677245017 Win=0 Len=0
53 2017-05-17 20:24:50.689371 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677316102 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966746 TSecr=0 WS=128
54 2017-05-17 20:24:50.689452 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677317394 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966746 TSecr=0 WS=128
55 2017-05-17 20:24:50.689666 192.168.122.16 -> 192.168.122.18 TCP 74 2049 > 815 [SYN, ACK] Seq=1366005876 Ack=3677316103 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1741462616 TSecr=966746 WS=128
56 2017-05-17 20:24:50.689688 192.168.122.18 -> 192.168.122.16 TCP 54 815 > 2049 [RST] Seq=3677316103 Win=0 Len=0
57 2017-05-17 20:24:50.689711 192.168.122.16 -> 192.168.122.18 TCP 54 2049 > 815 [RST, ACK] Seq=0 Ack=3677317395 Win=0 Len=0
58 2017-05-17 20:24:50.689766 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677322298 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966746 TSecr=0 WS=128
59 2017-05-17 20:24:50.689826 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677323273 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966747 TSecr=0 WS=128
60 2017-05-17 20:24:50.689968 192.168.122.16 -> 192.168.122.18 TCP 74 2049 > 815 [SYN, ACK] Seq=1366010498 Ack=3677322299 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1741462617 TSecr=966746 WS=128
61 2017-05-17 20:24:50.689980 192.168.122.18 -> 192.168.122.16 TCP 54 815 > 2049 [RST] Seq=3677322299 Win=0 Len=0
62 2017-05-17 20:24:50.690001 192.168.122.16 -> 192.168.122.18 TCP 54 2049 > 815 [RST, ACK] Seq=0 Ack=3677323274 Win=0 Len=0
~~~

Created attachment 1280516 [details]
Sample test case which tests NFSv3, v4.0, and v4.1 and checks for count of SYN packets, v2
Changes from previous
- only a 5-minute sleep is necessary for idle (see XS_IDLE_DISC_TO)
- State which NFS version is
Test still fails on NFSv4.1 as it expects a disconnect after idle. However, in 4.1 we never go idle after mount since SEQUENCE ops are sent every second to renew the clientid (still needs a good code-level explanation of difference between 4.0 and 4.1 in this regard).
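The duplicate-SYN signature from the frame 48-52 trace earlier in this bug can also be checked mechanically. The sketch below is an assumption-laden illustration, not part of the attached test: the `Frame` type and `find_stale_synacks` are hypothetical names, and the rule it applies (a SYN,ACK is "stale" when its Ack does not equal the most recent client SYN's Seq + 1) is a simplification of what the client's TCP stack actually does.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """Hypothetical, hand-decoded view of one captured TCP segment."""
    number: int          # frame number in the capture
    from_client: bool    # True if sent by the NFS client
    flags: frozenset     # e.g. frozenset({"SYN"}) or frozenset({"SYN", "ACK"})
    seq: int
    ack: Optional[int]   # None when the ACK flag is not set


def find_stale_synacks(frames: List[Frame]) -> List[int]:
    """Return frame numbers of SYN,ACKs that answer an outdated client SYN."""
    last_syn_seq = None
    stale = []
    for frame in frames:
        if frame.from_client and frame.flags == {"SYN"}:
            # Remember only the newest SYN; the client has abandoned older ones.
            last_syn_seq = frame.seq
        elif not frame.from_client and frame.flags == {"SYN", "ACK"}:
            expected_ack = None if last_syn_seq is None else (last_syn_seq + 1) % 2**32
            if expected_ack is not None and frame.ack != expected_ack:
                stale.append(frame.number)
    return stale


# Frames 48-52 from the trace, with sequence numbers copied from the capture.
trace = [
    Frame(48, True, frozenset({"SYN"}), 3677241340, None),
    Frame(49, True, frozenset({"SYN"}), 3677245016, None),
    Frame(50, False, frozenset({"SYN", "ACK"}), 1365940280, 3677241341),
    Frame(51, True, frozenset({"RST"}), 3677241341, None),
    Frame(52, False, frozenset({"RST", "ACK"}), 0, 3677245017),
]
print(find_stale_synacks(trace))
```

For these five frames the check flags frame 50, whose Ack answers the SYN in frame 48 rather than the newer SYN in frame 49. A capture from a fixed kernel, with a single SYN per reconnect, should produce an empty list.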
If you are looking for another test case, we're affected by this. The NFS server is Isilon. We have seen the issue on at least two RHEL6.9 clients. It is relatively easy to reproduce; the client is LAN-connected to the Isilon, and disabling iptables INVALID blocking worked around the problem.

Another troubled client traverses a Check Point firewall to get to the Isilon. It's the only desktop Linux user we have, so it is the only client that traverses this firewall. It's not entirely clear to me what's going on here, since the firewall is dropping packets from the Isilon to the client - not the other way around. Or, at least, it's not logging those drops; it is possible that it's dropping them for "invalid" reasons but not logging it. I don't manage the firewall, so I can't look closely.

Anyway, I don't need a response, but if you want some tcpdumps let me know.

One of my affected clients has now had the problem without the DROP INVALID rule in place.

(In reply to Dan Pritts from comment #50)
> One of my affected clients has now had the problem without the DROP INVALID
> rule in place.

Yes, it's possible for this bug to occur without any iptables rules. The main 'signature' is that the second part of the TCP handshake (the SYN,ACK coming from the NFS server) is rejected by the NFS client, either by an iptables rule or by the TCP stack itself, which considers the packet invalid. As a result the 3-way handshake won't complete and the NFS TCP connection remains down, possibly indefinitely. See https://bugzilla.redhat.com/show_bug.cgi?id=1448170#c40 for more info on a typical signature without iptables.

Moving to VERIFIED according to the test logs of comment #47. Will include this case as a regression test in the future.

Does Linux V3.X also have this issue?

I took a look at https://elixir.free-electrons.com/linux/v3.19.8/source/net/sunrpc/xprtsock.c.

It seems it also needs the patch.

(In reply to jinjian.1 from comment #57)
> Does Linux V3.X also have this issue?
>
> I took a look at
> https://elixir.free-electrons.com/linux/v3.19.8/source/net/sunrpc/xprtsock.c.
>
> Seems also need to patch.

As far as I know 3.19.8 is not a supported Red Hat kernel. If you have a question about whether this bug applies to a specific Red Hat kernel, please open a support case.

Was this bug tagged "Fixed In Version: kernel-2.6.32-704.el6" by mistake?

The RPM changelog has this entry instead:

* Wed May 17 2017 Denys Vlasenko <dvlasenk> [2.6.32-696.5.1.el6]
- [fs] sunrpc: Ensure that we wait for connections to complete before retrying (Dave Wysochanski) [1450850 1448170]

and RHEL 6.9 only appears to be up to 2.6.32-696.30.1.el6 right now.

(In reply to Andrew Lau from comment #61)
> Was this bug tagged "Fixed In Version: kernel-2.6.32-704.el6" by mistake?
>

No, it's not a mistake.

> The RPM changelog has this entry instead:
>
> * Wed May 17 2017 Denys Vlasenko <dvlasenk> [2.6.32-696.5.1.el6]
> - [fs] sunrpc: Ensure that we wait for connections to complete before
> retrying (Dave Wysochanski) [1450850 1448170]
>
> and RHEL 6.9 only appears to be up to 2.6.32-696.30.1.el6 right now.

That is the RHEL6.9.z kernel (a backport of the same patch). This bug is for the RHEL6.10 kernel (Y-stream). The Y-stream undergoes much more testing but takes longer to release. For critical bugs that affect many customers, we have z-stream backports that release much faster but have less testing. You're seeing the difference between the two release streams here.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1854
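The z-stream versus Y-stream distinction above can be made concrete with a small version check. This is a hedged sketch only: the helper names are hypothetical, and the thresholds are simply the two releases named in this bug (2.6.32-696.5.1.el6 from the quoted changelog for RHEL6.9.z, and kernel-2.6.32-704.el6 from "Fixed In Version"). It reports whether a stock el6 release string is at or past one of those builds. It does not say whether an older kernel is affected, and custom test builds such as the .sf01836153 kernel used earlier are outside its scope.

```python
import re

# Captures the build number after "2.6.32-", e.g. "696.5.1" or "704".
RELEASE_RE = re.compile(r"^2\.6\.32-(\d+(?:\.\d+)*)\.el6")


def parse_release(release):
    """Turn '2.6.32-696.5.1.el6' into (696, 5, 1) and '2.6.32-704.el6' into (704,)."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"not a RHEL6 2.6.32 kernel release: {release!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_syn_storm_fix(release):
    """Hypothetical check based only on the two build numbers cited in this bug."""
    parts = parse_release(release)
    if parts[0] == 696:
        # RHEL6.9 z-stream: fixed from the 696.5.1 build onward.
        return parts >= (696, 5, 1)
    # Y-stream: fixed from the 704 build onward.
    return parts[0] >= 704


for kernel in ("2.6.32-696.el6", "2.6.32-696.5.1.el6",
               "2.6.32-696.30.1.el6", "2.6.32-704.el6"):
    print(kernel, has_syn_storm_fix(kernel))
```

Tuple comparison handles the mixed lengths here: `(696,)` sorts before `(696, 5, 1)`, so the original RHEL6.9 kernel is reported as lacking the fix while every later 696.x z-stream build from 696.5.1 onward is reported as having it.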