Bug 1763454

Summary:	libslirp sends RST to app in response to arriving FIN when containerized socket is shutdown() with SHUT_WR
Product:	Red Hat Enterprise Linux 8	Reporter:	Robb Manes <rmanes>
Component:	slirp4netns	Assignee:	Jindrich Novy <jnovy>
Status:	CLOSED ERRATA	QA Contact:	atomic-bugs <atomic-bugs>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	8.0	CC:	ajia, ddarrah, dornelas, dwalsh, gscrivan, jligon, jnovy, lsm5, mheon, ptalbert, tsweeney
Target Milestone:	rc
Target Release:	8.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	slirp4netns-0.4.2-1.git21fdece.el8	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-04-28 15:47:44 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1186913, 1734579

Description Robb Manes 2019-10-19 22:29:11 UTC

Description of problem:
When podman runs a rootless container whose application acts as a TCP client, and that client calls shutdown() with SHUT_WR (so that it no longer sends data but can still receive it), a FIN subsequently sent by the server side of the TCP connection is delivered to the application in the container as a RST.

This is easily reproduced in upstream slirp4netns outside of podman. 

To reproduce, create a small python client application that calls shutdown but continues to read from the socket, ensuring it sends a FIN:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("myserver.example.com", 9999))
sock.shutdown(socket.SHUT_WR)
while True:
	data = sock.recv(1024)
	if not data:
		# Empty read: the server's FIN arrived, so stop reading
		break
	print(data)
	
On the server system, create a server application that blindly sends data:

import time
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 9999))
sock.listen(128)
while True:
	client, address = sock.accept()
	client.send("Accepted.. sending 3 pulses and closing the socket...")
	for i in range(2):
		client.send("Pulse " + str(i))
		time.sleep(5)
	client.close()

After the applications are in place, create a network namespace independent of the root network namespace and run slirp4netns on it:

Make the namespace as a non-root user:
$ unshare --user --map-root-user --net --mount

Get the PID of the bash process in the namespace:
$ echo $$
15307

In a separate terminal, run slirp4netns to set up the userspace TCP/IP stack for that PID and create the tap device:
$ slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU:             65520
* Network:         10.0.2.0
* Netmask:         255.255.255.0
* Gateway:         10.0.2.2
* DNS:             10.0.2.3
* Recommended IP:  10.0.2.100

Start the server application on the server side:

root@myserver # python server.py

Perform packet captures to observe traffic inside of the slirp network namespace, and in the root namespace on the container host:

# tcpdump -i eth0 port 9999 -w /tmp/host.pcap

# nsenter -t 15307 -n tcpdump -i tap0 port 9999 -w /tmp/slirp.pcap

Inside the namespace as the non-root user, run the client application and observe the exception raised when the TCP connection is reset:

# python client.py 
Accepted.. sending 3 pulses and closing the socket...
Pulse 0
Pulse 1
Traceback (most recent call last):
  File "client.py", line 7, in <module>
	data = sock.recv(1024)
socket.error: [Errno 104] Connection reset by peer

Inspect the packet captures. The host-side capture shows a FIN being received from the server (frame 12):
$ tshark -te -r /tmp/host.pcap
	1 1571428959.091848  10.3.117.71 → 10.10.93.144 TCP 60 43108 → 9999 [SYN] Seq=0 Win=64680 Len=0 MSS=1320 SACK_PERM=1 TSval=784810176 TSecr=0 WS=128
	2 1571428959.160007 10.10.93.144 → 10.3.117.71  TCP 60 9999 → 43108 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1320 SACK_PERM=1 TSval=268662887 TSecr=784810176 WS=128
	3 1571428959.160151  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=784810245 TSecr=268662887
	4 1571428959.160657  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [FIN, ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=784810245 TSecr=268662887
	5 1571428959.228354 10.10.93.144 → 10.3.117.71  TCP 105 9999 → 43108 [PSH, ACK] Seq=1 Ack=1 Win=29056 Len=53 TSval=268662955 TSecr=784810245
	6 1571428959.228440  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=54 Win=64768 Len=0 TSval=784810313 TSecr=268662955
	7 1571428959.228489 10.10.93.144 → 10.3.117.71  TCP 52 9999 → 43108 [ACK] Seq=54 Ack=2 Win=29056 Len=0 TSval=268662956 TSecr=784810245
	8 1571428959.296303 10.10.93.144 → 10.3.117.71  TCP 59 9999 → 43108 [PSH, ACK] Seq=54 Ack=2 Win=29056 Len=7 TSval=268663023 TSecr=784810313
	9 1571428959.296401  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=61 Win=64768 Len=0 TSval=784810381 TSecr=268663023
   10 1571428964.233675 10.10.93.144 → 10.3.117.71  TCP 59 9999 → 43108 [PSH, ACK] Seq=61 Ack=2 Win=29056 Len=7 TSval=268667961 TSecr=784810381
   11 1571428964.233781  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=68 Win=64768 Len=0 TSval=784815318 TSecr=268667961
   12 1571428969.238628 10.10.93.144 → 10.3.117.71  TCP 52 9999 → 43108 [FIN, ACK] Seq=68 Ack=2 Win=29056 Len=0 TSval=268672966 TSecr=784815318
   13 1571428969.238662  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=69 Win=64768 Len=0 TSval=784820323 TSecr=268672966
   
But in the slirp packet capture, the connection is reset:

$ tshark -te -r /tmp/slirp.pcap
	1 1571428959.090986   10.0.2.100 → 10.10.93.144 TCP 74 51074 → 9999 [SYN] Seq=0 Win=65480 Len=0 MSS=65480 SACK_PERM=1 TSval=2681814203 TSecr=0 WS=128
	2 1571428959.160236 10.10.93.144 → 10.0.2.100   TCP 58 9999 → 51074 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=65480
	3 1571428959.160324   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=1 Ack=1 Win=65480 Len=0
	4 1571428959.160532   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [FIN, ACK] Seq=1 Ack=1 Win=65480 Len=0
	5 1571428959.160709 10.10.93.144 → 10.0.2.100   TCP 54 9999 → 51074 [ACK] Seq=1 Ack=2 Win=65535 Len=0
	6 1571428959.228560 10.10.93.144 → 10.0.2.100   TCP 107 9999 → 51074 [PSH, ACK] Seq=1 Ack=2 Win=65535 Len=53
	7 1571428959.228619   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=54 Win=65427 Len=0
	8 1571428959.296468 10.10.93.144 → 10.0.2.100   TCP 61 9999 → 51074 [PSH, ACK] Seq=54 Ack=2 Win=65535 Len=7
	9 1571428959.296524   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=61 Win=65420 Len=0
   10 1571428964.233847 10.10.93.144 → 10.0.2.100   TCP 61 9999 → 51074 [PSH, ACK] Seq=61 Ack=2 Win=65535 Len=7
   11 1571428964.233903   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=68 Win=65413 Len=0
   12 1571428969.238721 10.10.93.144 → 10.0.2.100   TCP 54 9999 → 51074 [RST, ACK] Seq=68 Ack=2 Win=65535 Len=0

Either slirp4netns tears the connection down too soon because of the client's early shutdown(), so that the client receives a RST, or there is some other issue related to receiving a FIN after shutdown() has been called.
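
The difference between the two captures can also be checked mechanically. The following is a minimal sketch, not part of the original reproducer: it assumes the tshark summary format shown above, and the helper names (tcp_flags, last_server_flags) are made up for illustration. The sample lines are frames from the captures above with the TCP timestamp options trimmed.

```python
import re

# Matches the bracketed flag list tshark prints, e.g. "[FIN, ACK]".
FLAGS_RE = re.compile(r"\[([A-Z]+(?:, [A-Z]+)*)\]")


def tcp_flags(line):
    """Return the set of TCP flags found in one tshark summary line."""
    match = FLAGS_RE.search(line)
    return set(match.group(1).split(", ")) if match else set()


def last_server_flags(lines, server_port=9999):
    """Return the flags of the last segment whose source port is server_port."""
    # In the summary format the source port follows "TCP <frame length>".
    from_server = re.compile(r"TCP\s+\d+\s+%d\s" % server_port)
    flags = set()
    for line in lines:
        if from_server.search(line):
            flags = tcp_flags(line)
    return flags


# Last frames of the host-side capture (timestamp options trimmed).
HOST_TAIL = [
    "12 1571428969.238628 10.10.93.144 → 10.3.117.71  TCP 52 9999 → 43108 [FIN, ACK] Seq=68 Ack=2 Win=29056 Len=0",
    "13 1571428969.238662  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=69 Win=64768 Len=0",
]

# Last frames of the capture taken inside the slirp namespace.
SLIRP_TAIL = [
    "11 1571428964.233903   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=68 Win=65413 Len=0",
    "12 1571428969.238721 10.10.93.144 → 10.0.2.100   TCP 54 9999 → 51074 [RST, ACK] Seq=68 Ack=2 Win=65535 Len=0",
]

if __name__ == "__main__":
    print("host :", sorted(last_server_flags(HOST_TAIL)))
    print("slirp:", sorted(last_server_flags(SLIRP_TAIL)))
```

With full captures, the output of "tshark -te -r" can be fed in line by line; a FIN on the host side paired with a RST on the slirp side is the signature of this bug.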

While all data is received by the client application in this scenario, the RST raises exceptions in client applications that are shutting down gracefully.
	
Version-Release number of selected component (if applicable):
Upstream slirp4netns, compiled from source with vendored libslirp.  The customer is testing on RHEL7; I verified this exists in RHEL8 and in upstream compiled on Fedora 30 and directly from GitHub.

How reproducible:
Every time.

Steps to Reproduce:
Reproduction instructions are provided in the description.

Actual results:
When the server-side sends a FIN, the client receives a RST.

Expected results:
FIN should be carried to application, so it does not appear to the client that a RST has arrived from the server.
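
For comparison, the expected behavior can be seen without slirp4netns in the path. This is a minimal sketch, assuming a plain loopback connection on the same machine; the function names are invented for illustration and it is not the reproducer above. After shutdown(SHUT_WR) the client keeps reading, and the server's FIN shows up as an empty read rather than as an exception.

```python
import socket
import threading


def serve_once(listener, payloads):
    """Accept one connection, send every payload, then close (sends a FIN)."""
    conn, _ = listener.accept()
    with conn:
        for chunk in payloads:
            conn.sendall(chunk)


def half_close_read(payloads):
    """Connect over loopback, shut down the write side, and read until EOF."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener, payloads))
    server.start()

    received = b""
    with socket.create_connection(listener.getsockname()) as sock:
        sock.shutdown(socket.SHUT_WR)  # client sends its FIN but keeps reading
        while True:
            data = sock.recv(1024)
            if not data:  # b"" means the server's FIN arrived: orderly EOF
                break
            received += data

    server.join()
    listener.close()
    return received


if __name__ == "__main__":
    print(half_close_read([b"Pulse 0", b"Pulse 1"]))
```

Run through an affected slirp4netns, the same read loop instead ends with "Connection reset by peer", as in the traceback above.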

Additional info:
More debugging notes to follow.

Comment 1 Robb Manes 2019-10-19 22:29:41 UTC

I ran gdb to determine at what point slirp4netns resets the connection.  Replicating the above test under gdb with a breakpoint on outgoing traffic, and inspecting each packet sent to the client until we hit the reset, I was able to find the following:

$ gdb --args slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0

(gdb) b vendor/libslirp/src/slirp.c:1143
Breakpoint 1 at 0x407ae0: file vendor/libslirp/src/slirp.c, line 1143.

(gdb) run
Starting program: /home/rmanes/Source/slirp4netns/slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-22.fc30.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 30547]
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU:             65520
* Network:         10.0.2.0
* Netmask:         255.255.255.0
* Gateway:         10.0.2.2
* DNS:             10.0.2.3
* Recommended IP:  10.0.2.100

Eventually it breaks a few times, but we catch it immediately before RST is sent like this:

Breakpoint 1, slirp_send_packet_all (slirp=0x435490, buf=0x7ffffffece30, len=54) at vendor/libslirp/src/slirp.c:1143
1143        ssize_t ret = slirp->cb->send_packet(buf, len, slirp->opaque);
(gdb) s
libslirp_send_packet (pkt=0x7ffffffece30, pkt_len=54, opaque=0x7fffffffd210) at slirp4netns.c:27
27          struct libslirp_data *data = (struct libslirp_data *)opaque;
(gdb) s
28          return write(data->tapfd, pkt, pkt_len);
(gdb) s
0x00007ffff7e368e0 in write () from /lib64/libpthread.so.0

With this stack:

(gdb) bt
#0  0x00007ffff7e368e0 in write () from /lib64/libpthread.so.0
#1  0x0000000000407afd in slirp_send_packet_all (slirp=<optimized out>, buf=<optimized out>, len=54) at vendor/libslirp/src/slirp.c:1143
#2  0x0000000000407ef8 in if_encap (slirp=slirp@entry=0x435490, ifm=ifm@entry=0x456cb0) at vendor/libslirp/src/slirp.c:970
#3  0x0000000000411067 in if_start (slirp=0x435490) at vendor/libslirp/src/if.c:183
#4  0x0000000000411230 in if_output (so=so@entry=0x430150, ifm=ifm@entry=0x456cb0) at vendor/libslirp/src/if.c:128
#5  0x00000000004139f3 in ip_output (so=so@entry=0x430150, m0=m0@entry=0x456cb0) at vendor/libslirp/src/ip_output.c:81
#6  0x000000000040c35f in tcp_output (tp=tp@entry=0x42a190) at vendor/libslirp/src/tcp_output.c:462
#7  0x000000000040d08e in tcp_drop (tp=0x42a190, err=<optimized out>) at vendor/libslirp/src/tcp_subr.c:310
#8  0x00000000004088ce in soread (so=so@entry=0x430150) at vendor/libslirp/src/socket.c:211
#9  0x000000000040759f in slirp_pollfds_poll (slirp=slirp@entry=0x435490, select_error=<optimized out>, get_revents=get_revents@entry=0x404b80 <libslirp_get_revents>, opaque=opaque@entry=0x42ca00) at vendor/libslirp/src/slirp.c:628
#10 0x000000000040516d in do_slirp (tapfd=tapfd@entry=5, readyfd=readyfd@entry=-1, exitfd=exitfd@entry=-1, api_socket=api_socket@entry=0x0, cfg=cfg@entry=0x7fffffffd2b0) at slirp4netns.c:393
#11 0x0000000000404405 in parent (target_pid=<optimized out>, cfg=0x7fffffffd2b0, api_socket=0x0, exit_fd=<optimized out>, ready_fd=<optimized out>, sock=3) at main.c:295
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:741

And right here, we send the RST after this instruction:

(gdb) s
Single stepping until exit from function write,
which has no line number information.
slirp_send_packet_all (slirp=<optimized out>, buf=<optimized out>, len=54) at vendor/libslirp/src/slirp.c:1145
1145        if (ret < 0) {

When we step past the above instruction, the RST is sent, since it falls through to a regular write(). Here is the section that calls write() on the RST frame:

(gdb) s
28          return write(data->tapfd, pkt, pkt_len);

(gdb) p data
$1 = (struct libslirp_data *) 0x7fffffffd210

(gdb) p pkt
$2 = (const void *) 0x7ffffffece30

Obviously, without context this doesn't help us much, so I am going through the frames to determine the state of the socket, but I'm unfamiliar with libslirp and am having to learn as I go.  It looks like in soread we make the decision to call tcp_drop, which in turn sends a RST.

(gdb) frame 7
#7  0x000000000040d08e in tcp_drop (tp=0x42a190, err=<optimized out>) at vendor/libslirp/src/tcp_subr.c:310
310             (void)tcp_output(tp);

(gdb) l
305         DEBUG_ARG("tp = %p", tp);
306         DEBUG_ARG("errno = %d", errno);
307
308         if (TCPS_HAVERCVDSYN(tp->t_state)) {
309             tp->t_state = TCPS_CLOSED;
310             (void)tcp_output(tp);
311         }
312         return (tcp_close(tp));
313     }
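
Why tcp_drop ends in a RST can be modeled in a few lines. This is a sketch under assumptions: it supposes that libslirp keeps the classic BSD layout, in which tcp_output picks the segment flags from a per-state table and TCPS_HAVERCVDSYN means "state is SYN_RECEIVED or later". The state numbers and the table below follow the BSD convention and were not taken from the libslirp source in this report; the Python names are illustrative only.

```python
# TCP state numbers in the BSD convention (assumed to match libslirp).
TCPS_CLOSED = 0
TCPS_LISTEN = 1
TCPS_SYN_SENT = 2
TCPS_SYN_RECEIVED = 3
TCPS_ESTABLISHED = 4
TCPS_CLOSE_WAIT = 5
TCPS_FIN_WAIT_1 = 6
TCPS_CLOSING = 7
TCPS_LAST_ACK = 8
TCPS_FIN_WAIT_2 = 9
TCPS_TIME_WAIT = 10

# Flags tcp_output would put on a segment, indexed by connection state
# (modeled on the BSD tcp_outflags table; an assumption for libslirp).
OUTFLAGS = {
    TCPS_CLOSED: {"RST", "ACK"},
    TCPS_LISTEN: set(),
    TCPS_SYN_SENT: {"SYN"},
    TCPS_SYN_RECEIVED: {"SYN", "ACK"},
    TCPS_ESTABLISHED: {"ACK"},
    TCPS_CLOSE_WAIT: {"ACK"},
    TCPS_FIN_WAIT_1: {"FIN", "ACK"},
    TCPS_CLOSING: {"FIN", "ACK"},
    TCPS_LAST_ACK: {"FIN", "ACK"},
    TCPS_FIN_WAIT_2: {"ACK"},
    TCPS_TIME_WAIT: {"ACK"},
}


def have_rcvd_syn(state):
    """Model of TCPS_HAVERCVDSYN: true once a SYN has been received."""
    return state >= TCPS_SYN_RECEIVED


def flags_sent_by_drop(state):
    """Model of tcp_drop(): flags of the segment it emits, or None if silent."""
    if have_rcvd_syn(state):
        state = TCPS_CLOSED     # tp->t_state = TCPS_CLOSED
        return OUTFLAGS[state]  # tcp_output(tp) now uses the CLOSED entry
    return None


if __name__ == "__main__":
    # After the guest's FIN, slirp's side of the guest connection would be
    # in CLOSE_WAIT (assumption based on the capture above).
    print(sorted(flags_sent_by_drop(TCPS_CLOSE_WAIT)))
```

Under these assumptions any call to tcp_drop on a connection past the handshake produces [RST, ACK], which matches frame 12 of the slirp capture. The open question is therefore why soread chooses tcp_drop for what is an orderly FIN.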

Comment 5 Giuseppe Scrivano 2019-11-19 10:07:39 UTC

is there a reason why this bug is set to private?

The bug must be fixed in: https://gitlab.freedesktop.org/slirp/libslirp

@Robb, I think your reproducer is great and we should use it for reporting the upstream issue as well.  Would you mind to report the issue at https://gitlab.freedesktop.org/slirp/libslirp/issues or make the bug not private so that I can link to it?

This is my proposed fix:

diff --git a/vendor/libslirp/src/socket.c b/vendor/libslirp/src/socket.c
index d96d8c4..2f20028 100644
--- a/vendor/libslirp/src/socket.c
+++ b/vendor/libslirp/src/socket.c
@@ -195,7 +195,9 @@ int soread(struct socket *so)
 
             err = errno;
             if (nn == 0) {
-                if (getpeername(so->s, paddr, &alen) < 0) {
+                int shutdown_wr = so->so_state & SS_FCANTSENDMORE;
+
+                if (!shutdown_wr && getpeername(so->s, paddr, &alen) < 0) {
                     err = errno;
                 } else {
                     getsockopt(so->s, SOL_SOCKET, SO_ERROR, &err, &elen);

Once the fix is merged, we will need to revendor the library into slirp4netns.
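
The condition the patch guards against can be probed on an ordinary socket. The sketch below is an illustration under assumptions and is not libslirp code: it assumes that the host-side socket slirp uses has already been shut down for writing when SS_FCANTSENDMORE is set, and it expects, on Linux, getpeername() to fail with ENOTCONN once both FINs have been exchanged. Other platforms may report something different, so the function only returns what it observed.

```python
import errno
import socket


def probe_after_full_close():
    """Return (data, getpeername_errno) seen after both sides have sent a FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    sock = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()

    sock.shutdown(socket.SHUT_WR)  # our FIN (what SS_FCANTSENDMORE records)
    peer.recv(1)                   # returns b"" once our FIN has arrived
    peer.close()                   # the peer's FIN

    data = sock.recv(1024)         # b"": orderly EOF, the nn == 0 case in soread
    try:
        sock.getpeername()
        err = 0
    except OSError as exc:
        err = exc.errno            # expected on Linux: errno.ENOTCONN

    sock.close()
    listener.close()
    return data, err


if __name__ == "__main__":
    data, err = probe_after_full_close()
    name = errno.errorcode.get(err, "no error")
    print("recv returned %r, getpeername: %s" % (data, name))
```

If getpeername() does fail this way, the unpatched soread would take the failure as the reason for the empty read and treat the connection as broken, even though the peer closed cleanly. Skipping getpeername() when the write side is already shut down sends the code to the SO_ERROR check instead, where an orderly close should report no error.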

Comment 6 Robb Manes 2019-11-19 14:48:30 UTC

(In reply to Giuseppe Scrivano from comment #5)
> is there a reason why this bug is set to private?

Bad habit of mine.  Will adjust now.

> @Robb, I think your reproducer is great and we should use it for reporting
> the upstream issue as well.  Would you mind to report the issue at
> https://gitlab.freedesktop.org/slirp/libslirp/issues or make the bug not
> private so that I can link to it?

Definitely, thank you very much, and I'll link this BZ to the upstream issue once it's made.  Do you want me to send you the issue once made directly so you can reference your commit to it?

Comment 7 Giuseppe Scrivano 2019-11-19 14:52:42 UTC

yes please, I will followup with a PR

Comment 8 Giuseppe Scrivano 2019-11-19 14:53:39 UTC

by the way, could you please confirm if the patch solves the issue for you as well?

Comment 9 Robb Manes 2019-11-19 15:05:36 UTC

(In reply to Giuseppe Scrivano from comment #8)
> by the way, could you please confirm if the patch solves the issue for you
> as well?

Can confirm patch resolves the issue:

$ git diff
diff --git a/vendor/libslirp/src/socket.c b/vendor/libslirp/src/socket.c
index d96d8c4..3e02357 100644
--- a/vendor/libslirp/src/socket.c
+++ b/vendor/libslirp/src/socket.c
@@ -195,7 +195,9 @@ int soread(struct socket *so)
 
             err = errno;
             if (nn == 0) {
-                if (getpeername(so->s, paddr, &alen) < 0) {
+                int shutdown_wr = so->so_state & SS_FCANTSENDMORE;
+
+                if (!shutdown_wr && getpeername(so->s, paddr, &alen) < 0) {
                     err = errno;
                 } else {
                     getsockopt(so->s, SOL_SOCKET, SO_ERROR, &err, &elen);


$ ./slirp4netns --configure --mtu=65520 --disable-host-loopback 67134 tap0
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU:             65520
* Network:         10.0.2.0
* Netmask:         255.255.255.0
* Gateway:         10.0.2.2
* DNS:             10.0.2.3
* Recommended IP:  10.0.2.100


$ ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether d2:9f:2a:16:cd:11 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::d09f:2aff:fe16:cd11/64 scope link 
       valid_lft forever preferred_lft forever


$ python ~/Downloads/client.py 
b'Accepted.. sending 3 pulses and closing the socket...'
b'Pulse 0'
b'Pulse 1'
b'Pulse 2'

It would have thrown an exception there if not working properly.

Upstream issue is also made:

https://gitlab.freedesktop.org/slirp/libslirp/issues/12

Thank you very much!

Comment 10 Giuseppe Scrivano 2019-11-25 09:40:31 UTC

fix merged upstream: https://github.com/rootless-containers/slirp4netns/pull/160

Comment 15 Alex Jia 2020-03-30 04:27:47 UTC

Verified this bug as SanityOnly, because I can't reproduce the bug in slirp4netns-0.4.0-2.module+el8.2.0+4570+2418d40d.x86_64,
and no exception such as 'Connection reset by peer' is thrown with slirp4netns-0.4.2-1.git21fdece.el8 or
slirp4netns-0.4.2-3.git21fdece.module+el8.2.0+5658+9a15711d.x86_64.

Comment 17 errata-xmlrpc 2020-04-28 15:47:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1650