Bug 1205258

Summary:	Busy loop in recv(MSG_PEEK|MSG_WAITALL)
Product:	Red Hat Enterprise Linux 7	Reporter:	Enrico Scholz <rh-bugzilla>
Component:	kernel	Assignee:	Sabrina Dubroca <sdubroca>
kernel sub component:	tcp	QA Contact:	Hangbin Liu <haliu>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	dan, fweimer, hsowa, jstancek, network-qe, rkhan, sdubroca
Version:	7.0
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	kernel-3.10.0-306.el7	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-11-19 21:48:24 UTC	Type:	Bug

Description Enrico Scholz 2015-03-24 14:18:42 UTC

Description of problem:

The kernel goes into a busy loop while waiting for more data in a recv(MSG_PEEK|MSG_WAITALL) call.

E.g.

------
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

int main(void)
{
	struct sockaddr_in	addr = {
		.sin_family	= AF_INET,
		.sin_port	= htons(1234),
		.sin_addr	= { INADDR_ANY }
	};
	int			conn;
	char			buf[16];

	int			s = socket(AF_INET, SOCK_STREAM, 0);

	/* listen on 0.0.0.0:1234 and take the first connection */
	bind(s, (void *)&addr, sizeof addr);
	listen(s, 1);

	conn = accept(s, NULL, NULL);

	/* peek until all 16 bytes are available; spins when fewer arrive */
	recv(conn, buf, sizeof buf, MSG_PEEK|MSG_WAITALL);
}
----
$ gcc x.c
$ ./a.out &

$ nc 127.0.0.1 1234
1234<enter>

--> 'a.out' consumes 100% CPU


'a.out' stays alive and keeps consuming CPU when the 'nc' connection is closed uncleanly (e.g. without a TCP FIN/RST).  This can be used for DDoS attacks.
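
One way to put a number on the "100% CPU" observation is to compare CPU time with wall-clock time around a call: a call that really sleeps should give a fraction near 0, a busy loop a fraction near 1. The sketch below only shows that comparison. The helper names (cpu_fraction, sleep_200ms) are hypothetical and not part of the reproducer above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Convert a timespec to seconds as a double. */
static double ts_to_sec(struct timespec t)
{
	return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: run fn(arg) and return the fraction of wall-clock
 * time that the calling thread spent on the CPU (user + system time).
 */
static double cpu_fraction(void (*fn)(void *), void *arg)
{
	struct timespec w0, w1, c0, c1;
	double wall, cpu;

	clock_gettime(CLOCK_MONOTONIC, &w0);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0);
	fn(arg);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);
	clock_gettime(CLOCK_MONOTONIC, &w1);

	wall = ts_to_sec(w1) - ts_to_sec(w0);
	cpu  = ts_to_sec(c1) - ts_to_sec(c0);

	return wall > 0.0 ? cpu / wall : 0.0;
}

/* Example workload that genuinely sleeps, for comparison. */
static void sleep_200ms(void *unused)
{
	struct timespec d = { .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000 };

	(void)unused;
	nanosleep(&d, NULL);
}
```

Calling cpu_fraction(sleep_200ms, NULL) should give a small value. Since the recv() in the reproducer does not return while the kernel is spinning, measuring it would have to happen from another thread or process; that part is left out here.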


Version-Release number of selected component (if applicable):

kernel-3.10.0-123.20.1.el7.x86_64

How reproducible:

100%

Comment 2 Jiri Pirko 2015-04-08 08:30:18 UTC

The upstream kernel behaves the same. MSG_WAITALL tells the kernel to wait until the whole buffer can be filled. It looks like the combination with MSG_PEEK is not handled properly in tcp_recvmsg:

                if (copied >= target) {
                        /* Do not sleep, just process backlog. */
                        release_sock(sk);
                        lock_sock(sk);
                } else
                        sk_wait_data(sk, &timeo);

When both MSG_PEEK and MSG_WAITALL are set, sk_wait_data is not called.

Comment 3 Hannes Frederic Sowa 2015-04-13 12:08:45 UTC

lock_sock (the only lock taken at that moment) is preemptible in process context, so it should not lead to a DoS situation. That said, maybe we can do better and handle the case where both flags are set more intelligently?

Comment 4 Dan Searle 2015-07-24 11:21:10 UTC

What's the status of this bug? Is it being worked on? Is there any way a fix can be expedited?

Comment 7 Sabrina Dubroca 2015-07-31 15:02:25 UTC

This bug has been fixed upstream (in David Miller's net tree):
https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=dfbafc995304ebb9a9b03f65083e6e9cea143b20
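
For readers following along, here is a rough userspace model of the idea behind that commit, as I read its description: with MSG_PEEK the peeked data stays on the receive queue, so a wait condition of "queue not empty" is already true and the wait returns at once. The fix instead waits until the tail of the queue differs from the last skb already examined. The struct and function names below (toy_queue, old_wait_done, new_wait_done, toy_enqueue, toy_self_check) are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a socket receive queue; NOT the kernel's sk_buff_head. */
struct toy_queue {
	int	nr_segments;	/* segments currently queued */
	int	tail_id;	/* id of the newest segment, 0 if empty */
};

/* Old wake-up condition (assumed): "the receive queue is not empty". */
static bool old_wait_done(const struct toy_queue *q)
{
	return q->nr_segments > 0;
}

/* New wake-up condition (assumed): "the tail is not the segment we saw last". */
static bool new_wait_done(const struct toy_queue *q, int last_seen_id)
{
	return q->tail_id != last_seen_id;
}

/* A new segment arrives from the peer. */
static void toy_enqueue(struct toy_queue *q)
{
	q->nr_segments++;
	q->tail_id++;
}

static void toy_self_check(void)
{
	struct toy_queue	q = { 0, 0 };
	int			last_seen;

	/* 4 of the 16 wanted bytes arrive; MSG_PEEK leaves them queued. */
	toy_enqueue(&q);
	last_seen = q.tail_id;

	/* Old condition is already satisfied: the wait returns immediately. */
	assert(old_wait_done(&q));
	/* New condition is not satisfied: the caller would really sleep. */
	assert(!new_wait_done(&q, last_seen));

	/* Only genuinely new data ends the wait under the new condition. */
	toy_enqueue(&q);
	assert(new_wait_done(&q, last_seen));
}
```

Calling toy_self_check() passes silently. The model ignores timeouts, signals and locking, so it says nothing about those paths.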

Comment 8 Dan Searle 2015-07-31 15:05:06 UTC

Many thanks!

Comment 9 Rafael Aquini 2015-08-18 13:03:35 UTC

Patch(es) available on kernel-3.10.0-306.el7

Comment 11 Dan Searle 2015-08-18 13:48:49 UTC

Hi, thanks for the update.

Will this patch go into the 3.13.0 kernel branch? I'm hoping to get Ubuntu to suck it into their LTS kernel packages which seem to be built from the 3.13.0 branch.

Comment 14 errata-xmlrpc 2015-11-19 21:48:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2152.html