Bug 1575956

Summary: rpc.rquotad version 4.04 burns 100% CPU
Product: [Fedora] Fedora Reporter: Bert DeKnuydt <Bert.Deknuydt>
Component: quotaAssignee: Petr Pisar <ppisar>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 28CC: ovasik, ppisar, steved
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: https://sourceforge.net/p/linuxquota/feature-requests/16/
Whiteboard:
Fixed In Version: quota-4.04-6.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-03 11:36:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Patch that fixes the problem none

Description Bert DeKnuydt 2018-05-08 12:03:24 UTC
Description of problem:

The quota daemon, rpc.rquotad consumes 100% CPU on one core.
It does answer correctly to queries though.

Version-Release number of selected component (if applicable):

quota-rpc-4.04-5.fc28.x86_64

How reproducible:

100% and immediately.

Steps to Reproduce:
1. dnf install quota-rpc
2. systemctl start rpc-rquotad.services
3. top

Actual results:

You see 90+% CPU usage from rpc.rquotad

Expected results:

Minimal CPU usage from rpc.rquotad

Additional info:

1) Downgrade to the last version of Fedora 27, quota-rpc-4.03-9.fc27.x86_64
   makes everything okay again.

   dnf -y downgrade quota-rpc --releasever=27
 
2) strace on the daemon spits out endless repeats of:

poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=5,   
  events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, 
  events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, 
  events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, -1) = 2 ([{fd=5, 
  revents=POLLHUP}, {fd=7, revents=POLLHUP}])
accept(5, 0x7ffe61698700, [128])        = -1 EINVAL (Invalid argument)
accept(7, 0x7ffe61698700, [128])        = -1 EINVAL (Invalid argument)

Comment 1 Petr Pisar 2018-05-09 09:43:34 UTC
Thanks for the report. I can reproduce it.

Comment 2 Petr Pisar 2018-05-09 10:24:12 UTC
This happens inside tirpc library:

#0  0x00007f86ab2e3254 in accept () from /lib64/libc.so.6
#1  0x00007f86ab5c1274 in rendezvous_request () from /lib64/libtirpc.so.3
#2  0x00007f86ab5bd6b8 in svc_getreq_common () from /lib64/libtirpc.so.3
#3  0x00007f86ab5bd947 in svc_getreq_poll () from /lib64/libtirpc.so.3
#4  0x00007f86ab5c016e in svc_run () from /lib64/libtirpc.so.3
#5  0x000055e6f0a3d6c7 in main (argc=2, argv=0x7ffc9afd7d18) at rquota_svc.c:538

The polled descriptors are:

rpc.rquot 21981 root    4u  IPv4 80449159      0t0      UDP *:rquotad 
rpc.rquot 21981 root    5u  sock      0,9      0t0 80449162 protocol: TCP
rpc.rquot 21981 root    6u  IPv6 80449165      0t0      UDP *:rquotad 
rpc.rquot 21981 root    7u  sock      0,9      0t0 80449168 protocol: TCPv6

One can use "/usr/sbin/rpc.rquotad -F" command to prevent from daemonization.

Fedora 27 is not affected, because it does not use tirpc there. I can also reproduce it on Fedora 29.

Either it's a bug in libtirpc-1.0.3-1.rc1.fc28.x86_64 that wants to accept raw sockets, or rpc.rquotad setups the tirpc service inappropriately. Maybe libtirpc maintainer will know more. rpc.rquotad code is in rquota_svc.c

Comment 3 Steve Dickson 2018-05-09 18:38:01 UTC
(In reply to Petr Pisar from comment #2)
> 
> Either it's a bug in libtirpc-1.0.3-1.rc1.fc28.x86_64 that wants to accept
> raw sockets, or rpc.rquotad setups the tirpc service inappropriately. Maybe
> libtirpc maintainer will know more. rpc.rquotad code is in rquota_svc.c
Well it is spinning in rendezvous_request due to this loop:

again:
    len = sizeof addr;
    sock = accept(xprt->xp_fd, (struct sockaddr *)(void *)&addr, &len);
    if (sock < 0) {
        if (errno == EINTR)
            goto again;
        return (FALSE);
    }
There was the same loop in the old glibc code as well
so nothing there has changed there... 

It definitly has something to do with creating the 
TCP listening socket since the UDP socket seems to 
be fine:

# rpcinfo -T udp localhost 100011
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting
# rpcinfo -T tcp localhost 100011
rpcinfo: RPC: Remote system error - Connection refused

Comment 4 Steve Dickson 2018-05-10 19:07:26 UTC
In Comment 3 I was mistaken with that again loop since
the errno being returned from accept is EINVAL not 
EINTR.

The reason for this looping and accept returning 
EINVAL is a listen(2) call is not being done the
the fd used to create the SVCXPRT pointer.

The reason a listen is not done is because
svc_tli_create() is given pre-created socket
and a bound socket. In this cause svc_tli_create()
will not call listen(). 

In the glibc code, a listen was called, I proved 
that with strace, but I'm not sure from where 
since I can not find a svc_til_create() in the 
older (Fedora 27) glibc code.

Since there does not seem to be an upstream 
git tree, I can not tell when that call 
was added. Was that call recently added?

So the assumption is made by the tirpc version 
of svc_tli_create() is when handed a socket that 
is bound, its assumed the listen call has also 
been made. In other apps using it (in nfs-utils)
this is the case.

Comment 5 Steve Dickson 2018-05-12 19:24:20 UTC
Created attachment 1435510 [details]
Patch that fixes the problem

The attached patch adds a listen() call to svc_create_sock()
routine which is needed with libtirpc version of svc_tli_create()
as well as a needed IPv6 setsockopt().

It appears the glibc version of that call must have done
the listen() call. Not clear how that ever worked since
other versions of svc_tli_create() (aka BSD) don't 
do the bind either plus the IPV6_V6ONLY is needed
to allow the binds to work.

Comment 6 Bert DeKnuydt 2018-05-14 09:00:10 UTC
I can conform that this solves the issue, at least here 
(no ipv6 in use).  

Thanks for the quick fix ... Now wait for the rpms...

Comment 7 Petr Pisar 2018-05-22 08:56:36 UTC
The git tree is at <git://git.code.sf.net/p/linuxquota/code>. Fedora 28 rebased quota to 4.04 that migrated from NIS RPC to tirpc. Hence Fedora 28 is the first one that uses tirpc and supports IPv6.

Thanks for the patch. I will review it, forward to the upstream and apply to Fedora.

Comment 8 Petr Pisar 2018-05-22 12:13:23 UTC
The patch looks good and I confirm it fixes this issue. The RPC calls still work over both IPv4 and IPv6. I sent the patch to <https://sourceforge.net/p/linuxquota/feature-requests/16/>

Comment 9 Fedora Update System 2018-05-22 13:01:26 UTC
quota-4.04-6.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-4adeec12ed

Comment 10 Fedora Update System 2018-05-22 19:39:02 UTC
quota-4.04-6.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-4adeec12ed

Comment 11 Fedora Update System 2018-05-30 14:09:04 UTC
quota-4.04-6.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Ben Cotton 2019-05-02 21:40:40 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.