Bug 1763454
Summary: | libslirp sends RST to app in response to arriving FIN when containerized socket is shutdown() with SHUT_WR | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Robb Manes <rmanes> |
Component: | slirp4netns | Assignee: | Jindrich Novy <jnovy> |
Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 8.0 | CC: | ajia, ddarrah, dornelas, dwalsh, gscrivan, jligon, jnovy, lsm5, mheon, ptalbert, tsweeney |
Target Milestone: | rc | ||
Target Release: | 8.2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | slirp4netns-0.4.2-1.git21fdece.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-04-28 15:47:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1186913, 1734579 |
Description
Robb Manes
2019-10-19 22:29:11 UTC
I ran gdb to determine at what point slirp4netns resets the connection. Replicating the above test under gdb with a breakpoint on outgoing traffic, and inspecting each send until we hit the reset, I was able to find the following:

$ gdb --args slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
(gdb) b vendor/libslirp/src/slirp.c:1143
Breakpoint 1 at 0x407ae0: file vendor/libslirp/src/slirp.c, line 1143.
(gdb) run
Starting program: /home/rmanes/Source/slirp4netns/slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-22.fc30.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 30547]
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU: 65520
* Network: 10.0.2.0
* Netmask: 255.255.255.0
* Gateway: 10.0.2.2
* DNS: 10.0.2.3
* Recommended IP: 10.0.2.100

The breakpoint is hit a few times, but we catch it immediately before the RST is sent, like this:

Breakpoint 1, slirp_send_packet_all (slirp=0x435490, buf=0x7ffffffece30, len=54) at vendor/libslirp/src/slirp.c:1143
1143 ssize_t ret = slirp->cb->send_packet(buf, len, slirp->opaque);
(gdb) s
libslirp_send_packet (pkt=0x7ffffffece30, pkt_len=54, opaque=0x7fffffffd210) at slirp4netns.c:27
27 struct libslirp_data *data = (struct libslirp_data *)opaque;
(gdb) s
28 return write(data->tapfd, pkt, pkt_len);
(gdb) s
0x00007ffff7e368e0 in write () from /lib64/libpthread.so.0

With this stack:

(gdb) bt
#0 0x00007ffff7e368e0 in write () from /lib64/libpthread.so.0
#1 0x0000000000407afd in slirp_send_packet_all (slirp=<optimized out>, buf=<optimized out>, len=54) at vendor/libslirp/src/slirp.c:1143
#2 0x0000000000407ef8 in if_encap (slirp=slirp@entry=0x435490, ifm=ifm@entry=0x456cb0) at vendor/libslirp/src/slirp.c:970
#3 0x0000000000411067 in if_start (slirp=0x435490) at vendor/libslirp/src/if.c:183
#4 0x0000000000411230 in if_output (so=so@entry=0x430150, ifm=ifm@entry=0x456cb0) at vendor/libslirp/src/if.c:128
#5 0x00000000004139f3 in ip_output (so=so@entry=0x430150, m0=m0@entry=0x456cb0) at vendor/libslirp/src/ip_output.c:81
#6 0x000000000040c35f in tcp_output (tp=tp@entry=0x42a190) at vendor/libslirp/src/tcp_output.c:462
#7 0x000000000040d08e in tcp_drop (tp=0x42a190, err=<optimized out>) at vendor/libslirp/src/tcp_subr.c:310
#8 0x00000000004088ce in soread (so=so@entry=0x430150) at vendor/libslirp/src/socket.c:211
#9 0x000000000040759f in slirp_pollfds_poll (slirp=slirp@entry=0x435490, select_error=<optimized out>, get_revents=get_revents@entry=0x404b80 <libslirp_get_revents>, opaque=opaque@entry=0x42ca00) at vendor/libslirp/src/slirp.c:628
#10 0x000000000040516d in do_slirp (tapfd=tapfd@entry=5, readyfd=readyfd@entry=-1, exitfd=exitfd@entry=-1, api_socket=api_socket@entry=0x0, cfg=cfg@entry=0x7fffffffd2b0) at slirp4netns.c:393
#11 0x0000000000404405 in parent (target_pid=<optimized out>, cfg=0x7fffffffd2b0, api_socket=0x0, exit_fd=<optimized out>, ready_fd=<optimized out>, sock=3) at main.c:295
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:741

And right here, we send the RST after this instruction:

(gdb) s
Single stepping until exit from function write, which has no line number information.
slirp_send_packet_all (slirp=<optimized out>, buf=<optimized out>, len=54) at vendor/libslirp/src/slirp.c:1145
1145 if (ret < 0) {

When we step past the above instruction the RST is sent, as it falls through to a regular write(). Here is the section that calls write() on the RST frame:

(gdb) s
28 return write(data->tapfd, pkt, pkt_len);
(gdb) p data
$1 = (struct libslirp_data *) 0x7fffffffd210
(gdb) p pkt
$2 = (const void *) 0x7ffffffece30

Obviously, without context this doesn't help us much, so I am going through the frames to determine the state of the socket, but I'm unfamiliar with libslirp and am having to learn as I go. It looks like in soread we make the decision to call tcp_drop, which in turn sends a RST.

(gdb) frame 7
#7 0x000000000040d08e in tcp_drop (tp=0x42a190, err=<optimized out>) at vendor/libslirp/src/tcp_subr.c:310
310 (void)tcp_output(tp);
(gdb) l
305     DEBUG_ARG("tp = %p", tp);
306     DEBUG_ARG("errno = %d", errno);
307
308     if (TCPS_HAVERCVDSYN(tp->t_state)) {
309         tp->t_state = TCPS_CLOSED;
310         (void)tcp_output(tp);
311     }
312     return (tcp_close(tp));
313 }

is there a reason why this bug is set to private?

The bug must be fixed in: https://gitlab.freedesktop.org/slirp/libslirp

@Robb, I think your reproducer is great and we should use it for reporting the upstream issue as well. Would you mind reporting the issue at https://gitlab.freedesktop.org/slirp/libslirp/issues or making the bug not private so that I can link to it?

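A note on the tcp_drop() listing in the description: the reason that forcing the state to TCPS_CLOSED and then calling tcp_output() produces a RST can be modelled in a few lines. The sketch below is only an illustration in Python, not libslirp code. It assumes libslirp keeps the BSD-style state numbering and per-state output flag table (tcp_outflags), which should be checked against the vendored tcp.h and tcp_output.c. The function name flags_sent_by_drop is hypothetical.

```python
# Illustrative model only. The state numbering and the per-state flag table
# are ASSUMED to match the BSD-derived definitions libslirp inherits; verify
# against vendor/libslirp/src before relying on them.
TH_FIN, TH_SYN, TH_RST, TH_ACK = 0x01, 0x02, 0x04, 0x10

(TCPS_CLOSED, TCPS_LISTEN, TCPS_SYN_SENT, TCPS_SYN_RECEIVED,
 TCPS_ESTABLISHED, TCPS_CLOSE_WAIT, TCPS_FIN_WAIT_1, TCPS_CLOSING,
 TCPS_LAST_ACK, TCPS_FIN_WAIT_2, TCPS_TIME_WAIT) = range(11)

# Flags tcp_output() puts on a segment, indexed by connection state.
TCP_OUTFLAGS = [
    TH_RST | TH_ACK,  # CLOSED
    0,                # LISTEN
    TH_SYN,           # SYN_SENT
    TH_SYN | TH_ACK,  # SYN_RECEIVED
    TH_ACK,           # ESTABLISHED
    TH_ACK,           # CLOSE_WAIT
    TH_FIN | TH_ACK,  # FIN_WAIT_1
    TH_FIN | TH_ACK,  # CLOSING
    TH_FIN | TH_ACK,  # LAST_ACK
    TH_ACK,           # FIN_WAIT_2
    TH_ACK,           # TIME_WAIT
]


def have_rcvd_syn(state):
    """Model of TCPS_HAVERCVDSYN: true once a SYN has been received."""
    return state >= TCPS_SYN_RECEIVED


def flags_sent_by_drop(state):
    """Hypothetical helper: flags of the segment tcp_drop() would emit.

    Mirrors lines 308-311 of the listing: if a SYN was seen, the state is
    forced to CLOSED and tcp_output() runs, so the CLOSED entry of the flag
    table applies. Returns None when nothing is sent.
    """
    if have_rcvd_syn(state):
        state = TCPS_CLOSED
        return TCP_OUTFLAGS[state]
    return None
```

Under these assumptions any connection past the SYN exchange that reaches tcp_drop() emits RST|ACK, which would be consistent with the 54-byte frame seen at the breakpoint.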
This is my proposed fix:

diff --git a/vendor/libslirp/src/socket.c b/vendor/libslirp/src/socket.c
index d96d8c4..2f20028 100644
--- a/vendor/libslirp/src/socket.c
+++ b/vendor/libslirp/src/socket.c
@@ -195,7 +195,9 @@ int soread(struct socket *so)

             err = errno;
             if (nn == 0) {
-                if (getpeername(so->s, paddr, &alen) < 0) {
+                int shutdown_wr = so->so_state & SS_FCANTSENDMORE;
+
+                if (!shutdown_wr && getpeername(so->s, paddr, &alen) < 0) {
                     err = errno;
                 } else {
                     getsockopt(so->s, SOL_SOCKET, SO_ERROR, &err, &elen);

Once the fix is merged, we will need to revendor the library into slirp4netns.

(In reply to Giuseppe Scrivano from comment #5)
> is there a reason why this bug is set to private?

Bad habit of mine. Will adjust now.

> @Robb, I think your reproducer is great and we should use it for reporting
> the upstream issue as well. Would you mind to report the issue at
> https://gitlab.freedesktop.org/slirp/libslirp/issues or make the bug not
> private so that I can link to it?

Definitely, thank you very much, and I'll link this BZ to the upstream issue once it's made. Do you want me to send you the issue once made directly so you can reference your commit to it?

yes please, I will followup with a PR

by the way, could you please confirm if the patch solves the issue for you as well?

(In reply to Giuseppe Scrivano from comment #8)
> by the way, could you please confirm if the patch solves the issue for you
> as well?

Can confirm patch resolves the issue:

$ git diff
diff --git a/vendor/libslirp/src/socket.c b/vendor/libslirp/src/socket.c
index d96d8c4..3e02357 100644
--- a/vendor/libslirp/src/socket.c
+++ b/vendor/libslirp/src/socket.c
@@ -195,7 +195,9 @@ int soread(struct socket *so)

             err = errno;
             if (nn == 0) {
-                if (getpeername(so->s, paddr, &alen) < 0) {
+                int shutdown_wr = so->so_state & SS_FCANTSENDMORE;
+
+                if (!shutdown_wr && getpeername(so->s, paddr, &alen) < 0) {
                     err = errno;
                 } else {
                     getsockopt(so->s, SOL_SOCKET, SO_ERROR, &err, &elen);

$ ./slirp4netns --configure --mtu=65520 --disable-host-loopback 67134 tap0
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU: 65520
* Network: 10.0.2.0
* Netmask: 255.255.255.0
* Gateway: 10.0.2.2
* DNS: 10.0.2.3
* Recommended IP: 10.0.2.100

$ ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether d2:9f:2a:16:cd:11 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::d09f:2aff:fe16:cd11/64 scope link
       valid_lft forever preferred_lft forever

$ python ~/Downloads/client.py
b'Accepted.. sending 3 pulses and closing the socket...'
b'Pulse 0'
b'Pulse 1'
b'Pulse 2'

It would have thrown an exception there if not working properly.

Upstream issue is also made: https://gitlab.freedesktop.org/slirp/libslirp/issues/12

Thank you very much!

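For readers without the attached reproducer, the half-close pattern this bug is about can be sketched as follows. This is not the client.py used above (that script is not shown in this report). It is a minimal stand-in that runs client and server over plain loopback with no slirp4netns in between, and the names pulse_server, half_close_client and run_demo are hypothetical. It shows the behaviour that should hold end to end: after shutdown(SHUT_WR) the client can still read until the server closes.

```python
import socket
import threading


def pulse_server(listener, pulses=3):
    """Accept one client, wait for its FIN, then send a few pulses."""
    conn, _ = listener.accept()
    with conn:
        # recv() returns b"" once the client has shut down its write side.
        while conn.recv(1024):
            pass
        # The server-to-client direction is still open after the half-close.
        for i in range(pulses):
            conn.sendall(f"Pulse {i}\n".encode())
    # Leaving the with-block closes the socket, sending the server's FIN.


def half_close_client(addr):
    """Send a request, half-close with SHUT_WR, and read until EOF."""
    received = []
    with socket.create_connection(addr) as sock:
        sock.sendall(b"hello")
        sock.shutdown(socket.SHUT_WR)  # sends FIN, reading is still allowed
        with sock.makefile("rb") as reader:
            for line in reader:
                received.append(line.rstrip(b"\n"))
    return received


def run_demo():
    """Run server and client on 127.0.0.1 and return what the client read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        server = threading.Thread(target=pulse_server, args=(listener,))
        server.start()
        result = half_close_client(listener.getsockname())
        server.join()
    return result


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

To exercise the bug itself, the client side would have to run inside the network namespace served by slirp4netns and the server on the host. With an affected build the read loop would be expected to fail with a connection reset instead of reaching EOF.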
fix merged upstream: https://github.com/rootless-containers/slirp4netns/pull/160

Verified this bug as SanityOnly, because I can't reproduce the bug in slirp4netns-0.4.0-2.module+el8.2.0+4570+2418d40d.x86_64, and no exception such as 'Connection reset by peer' is thrown with slirp4netns-0.4.2-1.git21fdece.el8 or slirp4netns-0.4.2-3.git21fdece.module+el8.2.0+5658+9a15711d.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1650