Bug 1763454
| Summary: | libslirp sends RST to app in response to arriving FIN when containerized socket is shutdown() with SHUT_WR | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Robb Manes <rmanes> |
| Component: | slirp4netns | Assignee: | Jindrich Novy <jnovy> |
| Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.0 | CC: | ajia, ddarrah, dornelas, dwalsh, gscrivan, jligon, jnovy, lsm5, mheon, ptalbert, tsweeney |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | slirp4netns-0.4.2-1.git21fdece.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-04-28 15:47:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1186913, 1734579 | ||
I ran gdb to determine at what point slirp4netns resets the connection. I replicated the test from the description under gdb with a breakpoint on outgoing traffic and inspected each outgoing packet until we hit the reset. This is what I found:
$ gdb --args slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
(gdb) b vendor/libslirp/src/slirp.c:1143
Breakpoint 1 at 0x407ae0: file vendor/libslirp/src/slirp.c, line 1143.
(gdb) run
Starting program: /home/rmanes/Source/slirp4netns/slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-22.fc30.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 30547]
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU: 65520
* Network: 10.0.2.0
* Netmask: 255.255.255.0
* Gateway: 10.0.2.2
* DNS: 10.0.2.3
* Recommended IP: 10.0.2.100
The breakpoint is hit several times; this is the hit immediately before the RST is sent:
Breakpoint 1, slirp_send_packet_all (slirp=0x435490, buf=0x7ffffffece30, len=54) at vendor/libslirp/src/slirp.c:1143
1143 ssize_t ret = slirp->cb->send_packet(buf, len, slirp->opaque);
(gdb) s
libslirp_send_packet (pkt=0x7ffffffece30, pkt_len=54, opaque=0x7fffffffd210) at slirp4netns.c:27
27 struct libslirp_data *data = (struct libslirp_data *)opaque;
(gdb) s
28 return write(data->tapfd, pkt, pkt_len);
(gdb) s
0x00007ffff7e368e0 in write () from /lib64/libpthread.so.0
With this stack:
(gdb) bt
#0 0x00007ffff7e368e0 in write () from /lib64/libpthread.so.0
#1 0x0000000000407afd in slirp_send_packet_all (slirp=<optimized out>, buf=<optimized out>, len=54) at vendor/libslirp/src/slirp.c:1143
#2 0x0000000000407ef8 in if_encap (slirp=slirp@entry=0x435490, ifm=ifm@entry=0x456cb0) at vendor/libslirp/src/slirp.c:970
#3 0x0000000000411067 in if_start (slirp=0x435490) at vendor/libslirp/src/if.c:183
#4 0x0000000000411230 in if_output (so=so@entry=0x430150, ifm=ifm@entry=0x456cb0) at vendor/libslirp/src/if.c:128
#5 0x00000000004139f3 in ip_output (so=so@entry=0x430150, m0=m0@entry=0x456cb0) at vendor/libslirp/src/ip_output.c:81
#6 0x000000000040c35f in tcp_output (tp=tp@entry=0x42a190) at vendor/libslirp/src/tcp_output.c:462
#7 0x000000000040d08e in tcp_drop (tp=0x42a190, err=<optimized out>) at vendor/libslirp/src/tcp_subr.c:310
#8 0x00000000004088ce in soread (so=so@entry=0x430150) at vendor/libslirp/src/socket.c:211
#9 0x000000000040759f in slirp_pollfds_poll (slirp=slirp@entry=0x435490, select_error=<optimized out>, get_revents=get_revents@entry=0x404b80 <libslirp_get_revents>, opaque=opaque@entry=0x42ca00) at vendor/libslirp/src/slirp.c:628
#10 0x000000000040516d in do_slirp (tapfd=tapfd@entry=5, readyfd=readyfd@entry=-1, exitfd=exitfd@entry=-1, api_socket=api_socket@entry=0x0, cfg=cfg@entry=0x7fffffffd2b0) at slirp4netns.c:393
#11 0x0000000000404405 in parent (target_pid=<optimized out>, cfg=0x7fffffffd2b0, api_socket=0x0, exit_fd=<optimized out>, ready_fd=<optimized out>, sock=3) at main.c:295
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:741
And right here, we send the RST after this instruction:
(gdb) s
Single stepping until exit from function write,
which has no line number information.
slirp_send_packet_all (slirp=<optimized out>, buf=<optimized out>, len=54) at vendor/libslirp/src/slirp.c:1145
1145 if (ret < 0) {
Stepping past the above instruction sends the RST, since it comes down to a plain write(). This is the code that calls write() on the RST frame:
(gdb) s
28 return write(data->tapfd, pkt, pkt_len);
(gdb) p data
$1 = (struct libslirp_data *) 0x7fffffffd210
(gdb) p pkt
$2 = (const void *) 0x7ffffffece30
Obviously, without context this doesn't help us much, so I am going through the frames to determine the state of the socket, but I'm unfamiliar with libslirp and am having to learn as I go. It looks like soread makes the decision to call tcp_drop, which in turn sends a RST.
(gdb) frame 7
#7 0x000000000040d08e in tcp_drop (tp=0x42a190, err=<optimized out>) at vendor/libslirp/src/tcp_subr.c:310
310 (void)tcp_output(tp);
(gdb) l
305 DEBUG_ARG("tp = %p", tp);
306 DEBUG_ARG("errno = %d", errno);
307
308 if (TCPS_HAVERCVDSYN(tp->t_state)) {
309 tp->t_state = TCPS_CLOSED;
310 (void)tcp_output(tp);
311 }
312 return (tcp_close(tp));
313 }
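The 54-byte buffer passed to write() above is the size of an Ethernet header (14 bytes) plus option-less IPv4 (20 bytes) and TCP (20 bytes) headers, and it matches the 54-byte [RST, ACK] frame in the tap0 capture from the description. One way to confirm what the buffer holds would be to dump it from gdb (for example with x/54xb pkt) and decode the TCP flags. The following is a minimal Python sketch of that decoding. The helper names tcp_flags and example_rst_frame are hypothetical, and the example frame is synthetic, not data taken from this session.

```python
import struct

# TCP flag bits in the 14th byte (offset 13) of the TCP header.
TCP_FLAG_NAMES = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
    (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG"),
]

ETH_HEADER_LEN = 14


def tcp_flags(frame: bytes) -> list:
    """Return the TCP flag names set in an Ethernet/IPv4/TCP frame."""
    if len(frame) < ETH_HEADER_LEN + 20:
        raise ValueError("frame too short for Ethernet + IPv4")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4       # IPv4 header length in bytes
    if frame[ETH_HEADER_LEN + 9] != 6:             # IPv4 protocol field, 6 = TCP
        raise ValueError("not a TCP segment")
    tcp_offset = ETH_HEADER_LEN + ihl
    if len(frame) < tcp_offset + 20:
        raise ValueError("frame too short for TCP header")
    flags = frame[tcp_offset + 13]
    return [name for bit, name in TCP_FLAG_NAMES if flags & bit]


def example_rst_frame() -> bytes:
    """Build a synthetic 54-byte frame with RST and ACK set (illustration only)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)    # zeroed MACs, IPv4 ethertype
    ip = bytearray(20)
    ip[0] = 0x45                                   # version 4, IHL 5 (20 bytes)
    ip[9] = 6                                      # protocol = TCP
    tcp = bytearray(20)
    tcp[12] = 5 << 4                               # data offset 5 (20 bytes)
    tcp[13] = 0x04 | 0x10                          # RST | ACK
    return eth + bytes(ip) + bytes(tcp)


if __name__ == "__main__":
    frame = example_rst_frame()
    print(len(frame), tcp_flags(frame))
```

To use it on real data, the bytes dumped from gdb would be pasted into a bytes object and passed to tcp_flags() in place of the synthetic frame.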
is there a reason why this bug is set to private?

The bug must be fixed in: https://gitlab.freedesktop.org/slirp/libslirp

@Robb, I think your reproducer is great and we should use it for reporting the upstream issue as well. Would you mind to report the issue at https://gitlab.freedesktop.org/slirp/libslirp/issues or make the bug not private so that I can link to it?

This is my proposed fix:

diff --git a/vendor/libslirp/src/socket.c b/vendor/libslirp/src/socket.c
index d96d8c4..2f20028 100644
--- a/vendor/libslirp/src/socket.c
+++ b/vendor/libslirp/src/socket.c
@@ -195,7 +195,9 @@ int soread(struct socket *so)
         err = errno;
     if (nn == 0) {
-        if (getpeername(so->s, paddr, &alen) < 0) {
+        int shutdown_wr = so->so_state & SS_FCANTSENDMORE;
+
+        if (!shutdown_wr && getpeername(so->s, paddr, &alen) < 0) {
             err = errno;
         } else {
             getsockopt(so->s, SOL_SOCKET, SO_ERROR, &err, &elen);

Once the fix is merged, we will need to revendor the library into slirp4netns.

(In reply to Giuseppe Scrivano from comment #5)
> is there a reason why this bug is set to private?

Bad habit of mine. Will adjust now.

> @Robb, I think your reproducer is great and we should use it for reporting
> the upstream issue as well. Would you mind to report the issue at
> https://gitlab.freedesktop.org/slirp/libslirp/issues or make the bug not
> private so that I can link to it?

Definitely, thank you very much, and I'll link this BZ to the upstream issue once it's made. Do you want me to send you the issue directly once it's made so you can reference your commit to it?

yes please, I will follow up with a PR

by the way, could you please confirm if the patch solves the issue for you as well?

(In reply to Giuseppe Scrivano from comment #8)
> by the way, could you please confirm if the patch solves the issue for you
> as well?
Can confirm patch resolves the issue:

$ git diff
diff --git a/vendor/libslirp/src/socket.c b/vendor/libslirp/src/socket.c
index d96d8c4..3e02357 100644
--- a/vendor/libslirp/src/socket.c
+++ b/vendor/libslirp/src/socket.c
@@ -195,7 +195,9 @@ int soread(struct socket *so)
         err = errno;
     if (nn == 0) {
-        if (getpeername(so->s, paddr, &alen) < 0) {
+        int shutdown_wr = so->so_state & SS_FCANTSENDMORE;
+
+        if (!shutdown_wr && getpeername(so->s, paddr, &alen) < 0) {
             err = errno;
         } else {
             getsockopt(so->s, SOL_SOCKET, SO_ERROR, &err, &elen);

$ ./slirp4netns --configure --mtu=65520 --disable-host-loopback 67134 tap0
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU: 65520
* Network: 10.0.2.0
* Netmask: 255.255.255.0
* Gateway: 10.0.2.2
* DNS: 10.0.2.3
* Recommended IP: 10.0.2.100

$ ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether d2:9f:2a:16:cd:11 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::d09f:2aff:fe16:cd11/64 scope link
       valid_lft forever preferred_lft forever

$ python ~/Downloads/client.py
b'Accepted.. sending 3 pulses and closing the socket...'
b'Pulse 0'
b'Pulse 1'
b'Pulse 2'

It would have thrown an exception there if not working properly.

Upstream issue is also made: https://gitlab.freedesktop.org/slirp/libslirp/issues/12

Thank you very much!
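For context on the getpeername() check that the patch touches, the sketch below recreates the socket state on plain loopback TCP, without slirp4netns: a client half-closes with shutdown(SHUT_WR), the server then closes, and the client calls getpeername() after reading end-of-file. It is an assumption, not something confirmed in this bug, that on Linux getpeername() fails with ENOTCONN at that point, which would explain why the unpatched soread() treated an ordinary FIN as an error. The function name probe_half_closed_eof is hypothetical.

```python
import errno
import socket
import threading


def probe_half_closed_eof():
    """Half-close a client, let the server close, then probe getpeername().

    Returns (data, err): data is what recv() returned at EOF, err is the
    errno raised by getpeername() or None if it succeeded.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # any free loopback port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        conn.recv(1024)                      # returns b"" once the client's FIN arrives
        conn.close()                         # sends the server's FIN

    server = threading.Thread(target=serve)
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.shutdown(socket.SHUT_WR)          # client sends FIN, can still receive
    data = client.recv(1024)                 # b"" when the server's FIN is seen
    try:
        client.getpeername()
        err = None
    except OSError as exc:
        # Assumption: Linux is expected to report ENOTCONN here.
        err = exc.errno
    client.close()
    server.join()
    listener.close()
    return data, err


if __name__ == "__main__":
    data, err = probe_half_closed_eof()
    name = errno.errorcode.get(err, err) if err is not None else "no error"
    print("recv at EOF:", data, "getpeername:", name)
```

If getpeername() does fail here, that is the situation the shutdown_wr guard avoids: once the write side has been shut down, the result of getpeername() is no longer used to decide whether the EOF was an error.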
fix merged upstream: https://github.com/rootless-containers/slirp4netns/pull/160

Verified this bug as SanityOnly, because I can't reproduce it in slirp4netns-0.4.0-2.module+el8.2.0+4570+2418d40d.x86_64, and no exception such as 'Connection reset by peer' is thrown with slirp4netns-0.4.2-1.git21fdece.el8 or slirp4netns-0.4.2-3.git21fdece.module+el8.2.0+5658+9a15711d.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1650
Description of problem:

When podman runs a rootless application in a container serving as a TCP client, and that client calls shutdown() with SHUT_WR so that it no longer sends data but can still receive it, a FIN sent by the server side of the TCP connection is received by the application in the container as a RST.

This is easily reproduced in upstream slirp4netns outside of podman.

To reproduce, create a small python client application that calls shutdown() but continues to read from the socket, ensuring it sends a FIN:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("myserver.example.com", 9999))
sock.shutdown(socket.SHUT_WR)
while True:
    data = sock.recv(1024)
    print(data)

On the server system, create a server application that blindly sends data:

import time
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 9999))
sock.listen(128)
while True:
    client, address = sock.accept()
    client.send("Accepted.. sending 3 pulses and closing the socket...")
    for i in range(2):
        client.send("Pulse " + str(i))
        time.sleep(5)
    client.close()

After the applications are in place, create a network namespace independent of the root namespace and run slirp4netns on it.

Make the namespace as a non-root user:

$ unshare --user --map-root-user --net --mount

Get the PID of the bash process in the namespace:

$ echo $$
15307

In a separate terminal, run slirp4netns to set up the userspace TCP/IP stack for that PID and create the tap device:

$ slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU: 65520
* Network: 10.0.2.0
* Netmask: 255.255.255.0
* Gateway: 10.0.2.2
* DNS: 10.0.2.3
* Recommended IP: 10.0.2.100

Start the server application on the server side:

root@myserver # python server.py

Perform packet captures to observe traffic inside the slirp network namespace and in the root namespace on the container host:

# tcpdump -i eth0 port 9999 -w /tmp/host.pcap
# nsenter -t 15307 -n tcpdump -i tap0 port 9999 -w /tmp/slirp.pcap

Inside the namespace as the non-root user, run the client application and observe the exception caused by the TCP RST:

# python client.py
Accepted.. sending 3 pulses and closing the socket...
Pulse 0
Pulse 1
Traceback (most recent call last):
  File "client.py", line 7, in <module>
    data = sock.recv(1024)
socket.error: [Errno 104] Connection reset by peer

Inspect the packet captures. The host-side capture shows a FIN being received (frame 12):

$ tshark -te -r /tmp/host.pcap
1 1571428959.091848 10.3.117.71 → 10.10.93.144 TCP 60 43108 → 9999 [SYN] Seq=0 Win=64680 Len=0 MSS=1320 SACK_PERM=1 TSval=784810176 TSecr=0 WS=128
2 1571428959.160007 10.10.93.144 → 10.3.117.71 TCP 60 9999 → 43108 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1320 SACK_PERM=1 TSval=268662887 TSecr=784810176 WS=128
3 1571428959.160151 10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=784810245 TSecr=268662887
4 1571428959.160657 10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [FIN, ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=784810245 TSecr=268662887
5 1571428959.228354 10.10.93.144 → 10.3.117.71 TCP 105 9999 → 43108 [PSH, ACK] Seq=1 Ack=1 Win=29056 Len=53 TSval=268662955 TSecr=784810245
6 1571428959.228440 10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=54 Win=64768 Len=0 TSval=784810313 TSecr=268662955
7 1571428959.228489 10.10.93.144 → 10.3.117.71 TCP 52 9999 → 43108 [ACK] Seq=54 Ack=2 Win=29056 Len=0 TSval=268662956 TSecr=784810245
8 1571428959.296303 10.10.93.144 → 10.3.117.71 TCP 59 9999 → 43108 [PSH, ACK] Seq=54 Ack=2 Win=29056 Len=7 TSval=268663023 TSecr=784810313
9 1571428959.296401 10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=61 Win=64768 Len=0 TSval=784810381 TSecr=268663023
10 1571428964.233675 10.10.93.144 → 10.3.117.71 TCP 59 9999 → 43108 [PSH, ACK] Seq=61 Ack=2 Win=29056 Len=7 TSval=268667961 TSecr=784810381
11 1571428964.233781 10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=68 Win=64768 Len=0 TSval=784815318 TSecr=268667961
12 1571428969.238628 10.10.93.144 → 10.3.117.71 TCP 52 9999 → 43108 [FIN, ACK] Seq=68 Ack=2 Win=29056 Len=0 TSval=268672966 TSecr=784815318
13 1571428969.238662 10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=69 Win=64768 Len=0 TSval=784820323 TSecr=268672966

But in the slirp packet capture, the connection is reset:

$ tshark -te -r /tmp/slirp.pcap
1 1571428959.090986 10.0.2.100 → 10.10.93.144 TCP 74 51074 → 9999 [SYN] Seq=0 Win=65480 Len=0 MSS=65480 SACK_PERM=1 TSval=2681814203 TSecr=0 WS=128
2 1571428959.160236 10.10.93.144 → 10.0.2.100 TCP 58 9999 → 51074 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=65480
3 1571428959.160324 10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=1 Ack=1 Win=65480 Len=0
4 1571428959.160532 10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [FIN, ACK] Seq=1 Ack=1 Win=65480 Len=0
5 1571428959.160709 10.10.93.144 → 10.0.2.100 TCP 54 9999 → 51074 [ACK] Seq=1 Ack=2 Win=65535 Len=0
6 1571428959.228560 10.10.93.144 → 10.0.2.100 TCP 107 9999 → 51074 [PSH, ACK] Seq=1 Ack=2 Win=65535 Len=53
7 1571428959.228619 10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=54 Win=65427 Len=0
8 1571428959.296468 10.10.93.144 → 10.0.2.100 TCP 61 9999 → 51074 [PSH, ACK] Seq=54 Ack=2 Win=65535 Len=7
9 1571428959.296524 10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=61 Win=65420 Len=0
10 1571428964.233847 10.10.93.144 → 10.0.2.100 TCP 61 9999 → 51074 [PSH, ACK] Seq=61 Ack=2 Win=65535 Len=7
11 1571428964.233903 10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=68 Win=65413 Len=0
12 1571428969.238721 10.10.93.144 → 10.0.2.100 TCP 54 9999 → 51074 [RST, ACK] Seq=68 Ack=2 Win=65535 Len=0

Either slirp4netns tears the connection down too soon because of the early shutdown, so that the client receives a RST, or there is some other issue related to receiving a FIN after shutdown() is called. While all data is received by the client application in this scenario, client applications that are shut down gracefully do see an exception.

Version-Release number of selected component (if applicable):

Upstream slirp4netns, compiled from source with vendored libslirp. The actual customer is testing on RHEL7; I verified this exists in RHEL8 and in upstream compiled on Fedora 30 directly from GitHub.

How reproducible:

Every time.

Steps to Reproduce:

Reproduction instructions are provided in the description.

Actual results:

When the server side sends a FIN, the client receives a RST.

Expected results:

The FIN should be carried through to the application, so that it does not appear to the client that a RST has arrived from the server.

Additional info:

More debugging notes to follow.
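As a baseline for the expected result, here is a minimal Python 3 sketch of the same exchange over loopback, with no slirp4netns in the path. The function name run_half_close_demo and the shortened messages are illustrative and not part of the original reproducer, and the sleeps are omitted so that it finishes immediately. The client should read all of the data and then see end-of-file rather than a reset.

```python
import socket
import threading


def run_half_close_demo(pulses=2):
    """Client half-closes; server sends data then closes.

    Returns (received_bytes, outcome) where outcome is "eof" if the client
    saw a clean end-of-file, or "reset" if recv() raised ConnectionResetError.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))           # any free loopback port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn:                            # closing the socket sends the FIN
            conn.sendall(b"Accepted\n")
            for i in range(pulses):
                conn.sendall(b"Pulse %d\n" % i)

    server = threading.Thread(target=serve)
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.shutdown(socket.SHUT_WR)           # stop sending, keep receiving
    chunks = []
    outcome = "eof"
    try:
        while True:
            data = client.recv(1024)
            if not data:                      # b"" means the peer's FIN arrived
                break
            chunks.append(data)
    except ConnectionResetError:              # what the bug looks like to the client
        outcome = "reset"
    finally:
        client.close()
        server.join()
        listener.close()
    return b"".join(chunks), outcome


if __name__ == "__main__":
    received, outcome = run_half_close_demo()
    print(outcome, received)
```

Running the same client logic inside the namespace against a remote server would be one way to tell a fixed slirp4netns from an affected one: an "eof" outcome corresponds to the expected results, a "reset" outcome to the actual results reported here.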