Bug 1205258

Summary: Busy loop in recv(MSG_PEEK|MSG_WAITALL)
Product: Red Hat Enterprise Linux 7
Component: kernel
kernel sub component: tcp
Version: 7.0
Reporter: Enrico Scholz <rh-bugzilla>
Assignee: Sabrina Dubroca <sdubroca>
QA Contact: Hangbin Liu <haliu>
CC: dan, fweimer, hsowa, jstancek, network-qe, rkhan, sdubroca
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-3.10.0-306.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-11-19 21:48:24 UTC

Description Enrico Scholz 2015-03-24 14:18:42 UTC
Description of problem:

The kernel goes into a busy loop while waiting for more data in a recv(MSG_PEEK|MSG_WAITALL) call.

E.g.

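------
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
	struct sockaddr_in	addr = {
		.sin_family	= AF_INET,
		.sin_port	= htons(1234),
		.sin_addr	= { htonl(INADDR_ANY) }
	};
	int			conn;
	char			buf[16];

	int			s = socket(AF_INET, SOCK_STREAM, 0);

	/* error handling omitted to keep the reproducer short */
	bind(s, (void *)&addr, sizeof addr);
	listen(s, 1);

	conn = accept(s, NULL, NULL);

	/* the peer sends fewer than sizeof buf bytes, so this call keeps waiting */
	recv(conn, buf, sizeof buf, MSG_PEEK|MSG_WAITALL);

	return EXIT_SUCCESS;
}
------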
$ gcc x.c
$ ./a.out &

$ nc 127.0.0.1 1234
1234<enter>

--> 'a.out' consumes 100% CPU


'a.out' stays alive and keeps consuming CPU when the 'nc' connection is closed uncleanly (e.g. without a TCP FIN/RST).  This can be used for DoS attacks.
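
For anyone who wants to check a kernel without watching top by hand, here is a minimal sketch that turns the steps above into a single measurement. It is not part of the original report. The function names (peek_waitall_child, peek_waitall_cpu_ms) are made up for illustration, and it uses the loopback address with a kernel-chosen port instead of port 1234 so it does not clash with anything else. A forked child blocks in recv(MSG_PEEK|MSG_WAITALL) after receiving only 5 of the 16 bytes it asked for; the parent kills it after a delay and reads its CPU time from wait4().

```c
#define _DEFAULT_SOURCE		/* for wait4() under strict -std= modes */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: accept one connection and sit in the recv() under test. */
static void peek_waitall_child(int lsock)
{
	char	buf[16];
	int	conn = accept(lsock, NULL, NULL);

	if (conn >= 0)
		recv(conn, buf, sizeof buf, MSG_PEEK | MSG_WAITALL);
	_exit(0);
}

/*
 * Returns the CPU time (user + system, in milliseconds) the child used
 * during roughly wait_ms milliseconds of waiting, or -1 if setup failed.
 */
long peek_waitall_cpu_ms(int wait_ms)
{
	struct sockaddr_in	addr;
	socklen_t		alen = sizeof addr;
	struct rusage		ru;
	pid_t			pid;
	int			lsock, csock;

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	lsock = socket(AF_INET, SOCK_STREAM, 0);
	if (lsock < 0 ||
	    bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
	    listen(lsock, 1) < 0 ||
	    getsockname(lsock, (struct sockaddr *)&addr, &alen) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		peek_waitall_child(lsock);	/* never returns */

	/* Parent plays the role of 'nc': connect and send a short line. */
	csock = socket(AF_INET, SOCK_STREAM, 0);
	if (csock < 0 ||
	    connect(csock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
	    send(csock, "1234\n", 5, 0) != 5) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}

	poll(NULL, 0, wait_ms);		/* plain sleep while the child waits */

	kill(pid, SIGKILL);
	if (wait4(pid, NULL, 0, &ru) < 0)
		return -1;
	close(csock);
	close(lsock);

	return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000L +
	       (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1000L;
}
```

Calling peek_waitall_cpu_ms(1000) from a small main() and printing the result should give a value close to zero on a kernel where the child really sleeps, and something approaching the wait time on an affected kernel. The threshold between the two is a judgment call, and the exact numbers will vary by machine.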


Version-Release number of selected component (if applicable):

kernel-3.10.0-123.20.1.el7.x86_64

How reproducible:

100%

Comment 2 Jiri Pirko 2015-04-08 08:30:18 UTC
The upstream kernel behaves the same. MSG_WAITALL tells the kernel to wait until the whole buffer can be filled. It looks like the combination with MSG_PEEK is not handled properly in tcp_recvmsg:

                if (copied >= target) {
                        /* Do not sleep, just process backlog. */
                        release_sock(sk);
                        lock_sock(sk);
                } else
                        sk_wait_data(sk, &timeo);

In case both MSG_PEEK and MSG_WAITALL are there, sk_wait_data is not called.

Comment 3 Hannes Frederic Sowa 2015-04-13 12:08:45 UTC
lock_sock (the only lock taken at that moment) is preemptible in process context, so it should not lead to a DoS situation. That said, maybe we can do better and handle the situation where both flags are set more intelligently?

Comment 4 Dan Searle 2015-07-24 11:21:10 UTC
What's the status of this bug? Is it being worked on? Is there any way a fix can be expedited?

Comment 7 Sabrina Dubroca 2015-07-31 15:02:25 UTC
This bug has been fixed upstream (in David Miller's net tree):
https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=dfbafc995304ebb9a9b03f65083e6e9cea143b20
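
A rough userspace model of what the fix changes, for readers who do not want to open the commit. This is my reading of the linked commit, not something stated in this thread, and it differs slightly from the analysis in comment 2: sk_wait_data used to wait for "receive queue not empty", which is already true when peeked data is still queued, so the wait returns at once. The fix passes in the last buffer seen and waits until the queue tail differs from it. Everything below (toy_queue, woke_old, woke_new, spins_without_sleep) is a hypothetical stand-in, not kernel code.

```c
#include <stdbool.h>

/* Toy stand-in for the receive queue: only what the wait condition checks. */
struct toy_queue {
	int	nskb;		/* number of queued buffers */
	int	tail_id;	/* identity of the newest buffer, 0 if none */
};

/* Old-style wake condition: "the queue is not empty". */
static bool woke_old(const struct toy_queue *q)
{
	return q->nskb > 0;
}

/* New-style wake condition: "the tail is not the buffer we saw last". */
static bool woke_new(const struct toy_queue *q, int last_seen)
{
	return q->tail_id != last_seen;
}

/*
 * Count how many times the receive loop goes around without sleeping when
 * 4 of the 16 requested bytes are queued and no further data ever arrives.
 */
int spins_without_sleep(bool use_new, int max_iters)
{
	struct toy_queue	q = { .nskb = 1, .tail_id = 1 };
	const int		target = 16, queued_bytes = 4;
	int			spins = 0;

	for (int i = 0; i < max_iters; i++) {
		int	copied = queued_bytes;	/* MSG_PEEK leaves data queued */
		int	last_seen = q.tail_id;

		if (copied >= target)
			break;			/* request satisfied */
		if (use_new ? woke_new(&q, last_seen) : woke_old(&q))
			spins++;		/* wait returned at once */
		else
			break;			/* real code would sleep here */
	}
	return spins;
}
```

With the old condition the model spins for every iteration it is allowed; with the new one it stops on the first pass, which corresponds to the task going to sleep until a new buffer is queued.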

Comment 8 Dan Searle 2015-07-31 15:05:06 UTC
Many thanks!

Comment 9 Rafael Aquini 2015-08-18 13:03:35 UTC
Patch(es) available on kernel-3.10.0-306.el7

Comment 11 Dan Searle 2015-08-18 13:48:49 UTC
Hi, thanks for the update.

Will this patch go into the 3.13.0 kernel branch? I'm hoping to get Ubuntu to suck it into their LTS kernel packages which seem to be built from the 3.13.0 branch.

Comment 14 errata-xmlrpc 2015-11-19 21:48:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2152.html