1163889 – rpc.mountd can be blocked by a bad client

Summary: rpc.mountd can be blocked by a bad client

Keywords:
Status:	CLOSED DUPLICATE of bug 1163891
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	nfs-utils
Sub Component:
Version:	7.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Steve Dickson
QA Contact:	Filesystem QE
Docs Contact:
URL:
Whiteboard:
Depends On:	1163886
Blocks:

Reported:	2014-11-13 16:05 UTC by Steve Dickson
Modified:	2014-11-13 16:10 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	1163886
Environment:
Last Closed:	2014-11-13 16:10:02 UTC
Target Upstream Version:
Embargoed:

Description Steve Dickson 2014-11-13 16:05:44 UTC

+++ This bug was initially created as a clone of Bug #1163886 +++

Description of problem:
From https://bugzilla.suse.com/show_bug.cgi?id=901628:

A few weeks ago we had trouble with an NFS server at a customer site. Most of the time the clients could not mount any shares; only in rare cases did a mount succeed.

We found that while mounts were failing, rpc.mountd hung in a write() to a TCP socket. netstat showed that Send-Q was full and Recv-Q was growing slowly. After a long time the write ended with an error ("TCP timeout" IIRC), and rpc.mountd worked normally for a short while until it hung in write() again for the same reason. The root cause was a misconfigured MTU size. So a single bad client (or as many bad clients as rpc.mountd has threads) can block rpc.mountd entirely.

But what happens if someone intentionally sends RPC requests but doesn't read() the answers? I wrote a small tool to test this situation. It fires DUMP requests at rpc.mountd as fast as possible but never reads from the socket. The result is the same as with the problem above: rpc.mountd hangs in write() and no longer responds to other requests, and in this case no TCP timeout ever ends the stall. So it is quite easy to block rpc.mountd remotely, even intentionally.
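
The underlying effect can be reproduced locally without touching rpc.mountd. The following C sketch is not the reporter's tool. It assumes a local AF_UNIX socketpair as a stand-in for the TCP connection, and the names fill_send_queue and demo_unread_peer are hypothetical. It sends with MSG_DONTWAIT so that the demo returns at the point where a blocking write() would hang.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Push data into fd until the kernel refuses to queue more.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
long fill_send_queue(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), MSG_DONTWAIT);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return total;   /* queue full: a blocking write() would hang here */
            return -1;
        }
        total += n;
    }
}

/*
 * One end writes, the other end never reads.
 * Returns the number of bytes queued before the writer would stall.
 */
long demo_unread_peer(void)
{
    int sv[2];
    long queued;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    queued = fill_send_queue(sv[0]);    /* sv[1] is deliberately never read */
    close(sv[0]);
    close(sv[1]);
    return queued;
}
```

The returned byte count depends on the kernel's socket buffer sizes, so it differs between systems. The relevant point is that it is finite: once the peer stops reading, a server thread using blocking writes is stuck after that many bytes.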

I've done some further investigation. I tested rpcbind to see whether it has the same weakness. But rpcbind uses rpc_control(SVCSET_CONNMAXREC) to switch libtirpc to its nonblocking mode. That nonblocking mode has two positive effects:
- an attacker sending requests as fast as possible to rpcbind will have no
  success. As soon as rpcbind/libtirpc finds more than one request readable
  on the socket, it closes the connection.
- if the socket buffer is full, the write() fails with -EAGAIN. libtirpc
  uses a loop to retry the write for at most 2 seconds. Then it closes the
  connection.

Unfortunately the write retry loop in libtirpc has a bug: on each failed
write() it increments the length of the retry buffer and decrements the
pointer to it. I sent a patch to libtirpc-devel about 3 weeks ago, but haven't
received a response yet. (I'll attach the patch.)

Regarding rpc.mountd, I've found that using multiple processes (e.g. -t 4) doesn't work well. When using libtirpc, or when not using libtirpc but setting the -p xxxx option, the listening sockets (TCP listener and UDP socket) are not in non-blocking mode. Thus, if a single connection request comes in, all threads wake up from select(), but only one accept() succeeds. All other threads then wait in accept() for further connection requests.
If an RPC request comes in via UDP, what happens is very similar: all threads wake up, one thread handles the request, and all others wait in read() for further UDP requests.
As TCP connections are assigned to specific threads, all connections handled by one thread are blocked for as long as that thread waits in accept() or read(). Thus, I've written two patches (attached) that set all listeners to non-blocking in support/nfs/*. A version of the patches for 1.3.1 was sent to linux-nfs, but I have received no reply yet.
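
The idea behind the non-blocking listeners can be sketched as follows. This is not the attached patch. The helper names set_nonblocking and try_accept are hypothetical, and the sketch only uses standard fcntl() and accept() calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Called by a worker after select() reports the listener as readable.
 * Returns the connected descriptor, or -1 with errno set. With a
 * non-blocking listener, errno == EAGAIN or EWOULDBLOCK means another
 * worker already took the connection, and this worker can return to
 * select() instead of sleeping in accept().
 */
int try_accept(int listen_fd)
{
    int fd;

    do {
        fd = accept(listen_fd, NULL, NULL);
    } while (fd < 0 && errno == EINTR);
    return fd;
}
```

The same set_nonblocking() call applies to the UDP socket: a worker that loses the race then gets EAGAIN from its read instead of waiting for the next datagram.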

A further patch (attached) inserts rpc_control(SVCSET_CONNMAXREC) into nfs_svc_create() in support/nfs/svc_create.c for the libtirpc case.
That patch hardens rpc.mountd against DoS attacks (and probably also statd,
as it also uses nfs_svc_create()). Please treat this patch as experimental only. I'm not sure whether setting MAXREC might have negative side effects, as I'm not an RPC expert.

Bodo

--- Additional comment from Steve Dickson on 2014-11-13 11:02:41 EST ---



--- Additional comment from Steve Dickson on 2014-11-13 11:03:17 EST ---



--- Additional comment from Steve Dickson on 2014-11-13 11:04:14 EST ---

Comment 2 Steve Dickson 2014-11-13 16:10:02 UTC


*** This bug has been marked as a duplicate of bug 1163891 ***
