Bug 1325019 - rpc.gssd uses 100% CPU and lots of I/O when Kerberos ticket expires.
Summary: rpc.gssd uses 100% CPU and lots of I/O when Kerberos ticket expires.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils
Version: 6.9
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-07 21:12 UTC by Ender
Modified: 2023-09-14 03:20 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-06 11:51:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Close descriptor when return value is POLLERR. (931 bytes, patch)
2016-04-07 21:12 UTC, Ender

Description Ender 2016-04-07 21:12:26 UTC
Created attachment 1144898 [details]
Close descriptor when return value is POLLERR.

Description of problem:

Under some circumstances, rpc.gssd spikes to 100% CPU when a Kerberos context disappears from disk, due to a corner case in how the results returned by poll() are handled internally.

The internet is full of reports of this issue, admittedly several years old; it seems to no longer be a problem upstream after the big refactor that took out the inotify and poll logic. Still, it's a problem for us because we use RHEL and CentOS 6.

After spending some quality time with gdb, I found that rpc.gssd hits a corner case when the clntXXX/gssd named pipe is deleted while rpc.gssd still has it open (but the containing directory is still there). In that situation, poll() in the main loop returns POLLERR|POLLHUP in revents for that descriptor, but scan_poll_results() has no handling for POLLERR other than re-reading all the contents of /var/lib/nfs/rpc_pipefs/ for changes. Sadly, the containing directory still exists (merely empty), so the deletion logic is never triggered and the problem remains: the daemon spins at 100% CPU, reading those directories over and over and over again, poll()'ing in the meantime.


Version-Release number of selected component (if applicable):

All of them, up to the latest release (nfs-utils-1.2.3-64).


How reproducible:
So far this has happened to us only with mosh/screen/tmux sessions, so I suspect that something in these triggers the behaviour (probably the user's session is still there but the ticket has expired). It happens when the directory clntXXX is there and rpc.gssd has an fd pointing to the corresponding gssd named pipe (see fd 24):

[...]
lrwx------. 1 root root 64 Mar 25 10:26 2 -> /dev/null
lr-x------. 1 root root 64 Mar 25 10:26 20 -> /var/lib/nfs/rpc_pipefs/nfs/clnt14e1
lr-x------. 1 root root 64 Mar 25 10:26 21 -> /var/lib/nfs/rpc_pipefs/nfs/clnt173b
lr-x------. 1 root root 64 Mar 25 10:26 22 -> /var/lib/nfs/rpc_pipefs/nfs/clnt1aea
lr-x------. 1 root root 64 Mar 25 10:26 23 -> /var/lib/nfs/rpc_pipefs/nfs/clnt18c4
lrwx------. 1 root root 64 Mar 25 10:26 24 -> /var/lib/nfs/rpc_pipefs/nfs/clnt14e1
lr-x------. 1 root root 64 Mar 25 10:26 25 -> /var/lib/nfs/rpc_pipefs/gssd/clntXX
lrwx------. 1 root root 64 Mar 25 10:26 26 -> /var/lib/nfs/rpc_pipefs/gssd/clntXX/gssd
lrwx------. 1 root root 64 Mar 25 10:26 27 -> /var/lib/nfs/rpc_pipefs/nfs/clnt14e1/gssd (deleted)
lr-x------. 1 root root 64 Mar 28 11:42 28 -> /var/lib/nfs/rpc_pipefs/nfs/clnt1aec
[...]

To make sure, I ran a "memset(&pollarray[i], 0, sizeof(struct pollfd))" under gdb and watched rpc.gssd return to normal operation (strace showed everything fine; ls -l /proc/PID/fd didn't show anything odd).

The attached patch applies to nfs-utils-1.2.3-64.el6. I haven't seen a single occurrence of this bug since I patched our internal binary.

Comment 2 Steve Dickson 2016-08-24 19:39:10 UTC
Could you please post the proposed patch to the NFS upstream at
    linux-nfs.org

Using the patch format described in 
   https://www.kernel.org/doc/Documentation/SubmittingPatches

especially the Signed-off-by, subject line, and description.

tia!

Comment 3 Ender 2016-09-05 17:26:05 UTC
Sorry, I missed your note, Steve.  I'll do so as soon as I have a moment.  Thanks!

Comment 4 Jan Kurik 2017-12-06 11:51:07 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Comment 5 Red Hat Bugzilla 2023-09-14 03:20:48 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.

