Bug 786463 - nfs mount hangs when kerberos ticket expires
Summary: nfs mount hangs when kerberos ticket expires
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Scott Mayhew
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-01 14:10 UTC by Jonathan Underwood
Modified: 2019-07-11 07:34 UTC (History)

Fixed In Version: kernel-2.6.32-489.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-14 05:08:18 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 902253 | 0 | None | None | None | Never
Red Hat Product Errata | RHSA-2014:1392 | 0 | normal | SHIPPED_LIVE | Important: kernel security, bug fix, and enhancement update | 2014-10-14 01:28:44 UTC

Description Jonathan Underwood 2012-02-01 14:10:34 UTC
Description of problem:
I have user home directories on kerberized nfs4 automounted on login. If a user remains logged in after the time the ticket expires (but is still renewable), the logs fill up with endless messages of the form:

Jan 29 03:19:05 tiber kernel: Error: state manager failed on NFSv4 server oax.theory.phys.ucl.ac.uk with error 13

[multiple per second]

At this point, other users aren't able to log in either, and usually rpc.gssd is using a lot of CPU. Basically, an expired kerberos ticket results in a denial-of-service.
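
To gauge how fast the log is filling, the messages can be tallied per second of log time. This is only a minimal sketch: it assumes a syslog-format file such as /var/log/messages, the helper name top_error_seconds is hypothetical, and it matches only the message text quoted above.

```shell
#!/bin/sh
# Tally "state manager failed" kernel messages per second of log time.
# top_error_seconds is a hypothetical helper name; pass the log file as $1.
top_error_seconds() {
    awk '/state manager failed on NFSv4 server/ {
             # Fields 1-3 of a syslog line are month, day, and HH:MM:SS.
             n[$1 " " $2 " " $3]++
         }
         END { for (t in n) print n[t], t }' "$1" |
        sort -rn | head -n 5
}
```

Calling `top_error_seconds /var/log/messages` would print up to five lines, busiest second first, each with the message count followed by the timestamp.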

This debian bug report outlines the same issue:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648155

and has a patch which at least seems to stop the problem with other users logging in (in limited testing).


Version-Release number of selected component (if applicable):
nfs-utils-1.2.3-15.el6.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Log a user in, allow their kerberos ticket to expire.
2. Boom. Other users can't log in, logs fill up.
Actual results:
Other users can't log in, logs fill up.

Expected results:
Other users should still be able to log in. Original user should be able to return and renew the ticket.

Additional info:

Comment 2 Jonathan Underwood 2012-02-08 11:47:17 UTC
Any comment on this? It seems to be a rather serious security bug?

Comment 3 Steve Dickson 2012-02-08 12:22:08 UTC
(In reply to comment #2)
> Any comment on this? It seems to be a rather serious security bug?

We will be looking into this at this year's Connectathon. We are
hoping that the new sssd daemon (part of the IPA package) can
be configured to renew tickets for long running process and cron
jobs.

Comment 4 John Hodrien 2012-02-08 16:11:41 UTC
sssd renewing tickets certainly improves matters, but only so far I think.

I'm running against an AD domain, giving us kerberos tickets with a 7 days max renewable lifetime.  You've surely still got the same problem at the end of those 7 days?

Comment 5 Steve Dickson 2012-02-08 18:07:05 UTC
(In reply to comment #4)
> I'm running against an AD domain, giving us kerberos tickets with a 7 days max
> renewable lifetime.  You've surely still got the same problem at the end of
> those 7 days?
Point... Leaving NFS aside... Is there a way today for one entity to get a new  kerberos ticket for another entity?

Comment 6 John Hodrien 2012-02-08 20:18:08 UTC
(In reply to comment #5)
> Point... Leaving NFS aside... Is there a way today for one entity to get a new 
> kerberos ticket for another entity?

I certainly don't know of any way that could happen.   sssd does have a mode whereby it can actually stash your password in the kernel keyring for the case where you authenticate to a cached credential offline so that it can get a ticket later when it does come online.  Possibly that can be extended to allow infinite renewal of kerberos tickets?  The option is called krb5_store_password_if_offline if that's of any interest.

There'd clearly be some mild security concerns, but possibly it'd be acceptable in some cases?

Comment 7 Jonathan Underwood 2012-02-09 15:29:58 UTC
There are two aspects to this bug:

1) Automatic renewal of Kerberos tickets

2) An expired ticket causes a DOS for all other users (and also fills the log with messages, eventually crashing the machine).

Certainly (1) will mitigate (2). But I still think (2) needs solving in the near future, independent of what sssd does to sort out (1).

[Aside: the kstart (k5start, krenew) package is helpful for starting a long-running process and renewing tickets while it runs.]
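
As a hedged illustration of the aside: krenew (from the kstart package) can wrap a long-running command and keep renewing the ticket while it runs, for as long as the ticket stays renewable. The wrapper name run_with_renewal and the KRENEW override variable are assumptions made for this sketch; -K is krenew's wake-up interval in minutes.

```shell
#!/bin/sh
# Run a command under krenew so its Kerberos ticket is renewed while it runs.
# run_with_renewal and KRENEW are hypothetical names used only in this sketch.
run_with_renewal() {
    interval_minutes="$1"   # how often krenew wakes up to check the ticket
    shift
    # Renewal stops working once the renewable lifetime is exhausted.
    "${KRENEW:-krenew}" -K "$interval_minutes" -- "$@"
}
```

For example, `run_with_renewal 30 ./long_job.sh` (the script name is a placeholder). This only addresses aspect (1); it does nothing about the DOS in aspect (2).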

Comment 8 Jonathan Underwood 2012-03-09 15:23:17 UTC
Should this be moved to the kernel component? This DOS is totally killing our machines.

Comment 9 Steve Dickson 2012-03-10 22:17:01 UTC
We believe the solution to this will be to move over to the IPA information management service. In that service there is a daemon called sssd that will automatically renew Kerberos tickets. So the solution will be at the user level.

Comment 10 Jonathan Underwood 2012-03-10 23:08:22 UTC
(In reply to comment #9)
> We believe the solution to this will be to move over to the IPA information
> management service. In the service there is a daemon call sssd that will
> automatically go renew Kerberos tickets. So the solution will be at the user
> level.

I am already using sssd (1.5.1 as shipped with RHEL 6.2), which already has this functionality (via the krb5_renew_interval option), but the point is: when a ticket fails to renew, through sssd or otherwise, rpc.gssd starts consuming large amounts of CPU, the kernel fills the logs with error messages, and no other user can mount their home directories (presumably because rpc.gssd has gone into some sort of loop). Result: DOS.

Comment 11 Jonathan Underwood 2012-05-09 12:07:05 UTC
Any update on this - this is a really serious problem here (when already using sssd) - an expired kerberos ticket shouldn't bring down a box - this does need to be fixed kernel side, it is a kernel bug.

Comment 12 John Hodrien 2012-05-09 12:17:40 UTC
(In reply to comment #9)
> We believe the solution to this will be to move over to the IPA information
> management service. In the service there is a daemon call sssd that will
> automatically go renew Kerberos tickets. So the solution will be at the user
> level.

This is *not* a solution, as it's merely reducing your exposure to the problem.  Renewal is not infinite, so you're going to hit a point where you can no longer renew and then hit this DOS.  With a lot of users on a system you're going to be hitting this rather regularly.  I assume the bug will manifest itself at this point.  You just can't blame this on the presence of a ticket, the system needs to cope with it.

Comment 13 Steve Dickson 2012-05-10 08:08:01 UTC
Someone just pointed me to a daemon called krenew (kstart-3.16.1.el6). They claim it works well for renewing user credentials... Unfortunately I have not had any cycles to look into it... yet...

Comment 14 John Hodrien 2012-05-10 08:39:12 UTC
(In reply to comment #13)
> Some just pointed me to a daemon call krenew (kstart-3.16.1.el6). They claim it
> works well for renewing user credentials... Unfortunately I have not had any
> cycles to look into it... yet...

Does this problem occur when a ticket fully expires (i.e. it has reached the end of its renewable lifetime)?  If so, no amount of renewing can fix this serious bug.

Comment 15 Jonathan Underwood 2012-05-10 13:09:41 UTC
(In reply to comment #13)
> Some just pointed me to a daemon call krenew (kstart-3.16.1.el6). They claim it
> works well for renewing user credentials... Unfortunately I have not had any
> cycles to look into it... yet...

Steve, with respect, you're missing the point. The bug here is not with the ticket expiring - that happens. krenew, k5start, sssd etc. are userspace ways of renewing the ticket which work for as long as the ticket is renewable. The bug is that when ticket expiration happens, which is a legit thing to happen, it causes a DOS for all other users. That DOS is the bug.

Comment 16 John Hodrien 2012-05-10 13:17:30 UTC
(In reply to comment #15)
> (In reply to comment #13)
> > Some just pointed me to a daemon call krenew (kstart-3.16.1.el6). They claim it
> > works well for renewing user credentials... Unfortunately I have not had any
> > cycles to look into it... yet...
> 
> Steve, with respect, you're missing the point. The bug here is not with the
> ticket expiring - that happens. krenew, k5init, sssd etc are userspace ways of
> renewing the ticket which work for as long as the ticket is renewable. The bug
> is that when ticket expiration happens, which is a legit thing to happen, it
> causes a DOS for all other users. That DOS is the bug.

100% agree.  If I leave my machine logged in and go away for a week, my ticket *will* expire whatever any daemon tries to do to renew it.  If the result of that is that nobody can use the shared machine I'm logged in to at the time (which in my case may have 100 users on it) that's not going to go down well.

Any mention of renewal is not solving the problem.

Comment 17 Steve Dickson 2012-05-10 14:05:26 UTC
(In reply to comment #15)
> (In reply to comment #13)
> > Some just pointed me to a daemon call krenew (kstart-3.16.1.el6). They claim it
> > works well for renewing user credentials... Unfortunately I have not had any
> > cycles to look into it... yet...
> 
> Steve, with respect, you're missing the point. The bug here is not with the
> ticket expiring - that happens. krenew, k5init, sssd etc are userspace ways of
> renewing the ticket which work for as long as the ticket is renewable. The bug
> is that when ticket expiration happens, which is a legit thing to happen, it
> causes a DOS for all other users. That DOS is the bug.
No... I do understand the point... Once the ticket is completely expired there is no way to grant another ticket that can be renewed by the assorted user level daemons. 

Let's open this up to a wider audience... Andy, Simo, any thoughts?

Comment 18 Simo Sorce 2012-05-10 14:32:15 UTC
Steve, you certainly need to gracefully handle a case where user credentials are expired. The problem is that it is difficult to handle this properly.
We may need a notification mechanism that allows user space to tell the kernel to stop asking for user X and another mechanism by which a login process (either a pam module or sssd) can tell the kernel it can now spam user space again with requests for user X.

The reason we need a notification mechanism is that you need to allow access to NFS immediately after login for NFS-mounted home dirs, so a time-based negative cache won't work.

How to implement this notification mechanism?
I do not know at this stage.

Comment 19 Simo Sorce 2012-05-10 14:33:44 UTC
Ah another thought, whatever we do it shouldn't be NFS specific as cifs.ko will have exactly the same issue.

Adding Jeff so he can chime in on this as well.

Comment 22 kfu 2014-04-30 21:37:08 UTC
Any progress on this bug?  Just tried on the latest RHEL 6.5 with a kerberized NFS automounted home directory: when the user's ticket expires, that user can't log in to the server (home NFS mount hung).

Right now I just use the following quick-and-dirty hourly cron job to clean up any expired
ticket caches; at least this allows the user to log in again and acquire a new ticket.

for i in /tmp/krb5cc_*; do KRB5CCNAME="$i" klist -l | grep Expired | awk '{print $2}' | cut -d: -f2; done | grep krb5cc | xargs rm -f

Can anyone comment on whether clearing expired tickets from /tmp with a more reliable watch daemon would be a viable solution?
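
A somewhat more defensive variant of the cron job above, as a minimal sketch only. Instead of parsing `klist -l` output, it swaps in the exit status of `klist -s`, which MIT Kerberos documents as non-zero when a cache cannot be read or is expired. The function name and its two optional arguments (cache directory and klist command) are assumptions added so the logic can be exercised without touching /tmp.

```shell
#!/bin/sh
# Remove FILE credential caches that klist reports as unreadable or expired.
# purge_expired_caches is a hypothetical helper, not part of any package.
#   $1: directory holding krb5cc_* files (defaults to /tmp)
#   $2: klist command to run (defaults to klist)
purge_expired_caches() {
    cache_dir="${1:-/tmp}"
    klist_cmd="${2:-klist}"
    for cache in "$cache_dir"/krb5cc_*; do
        # Skip the unexpanded pattern and anything that is not a regular file.
        [ -f "$cache" ] || continue
        # klist -s is silent; a non-zero status means unreadable or expired.
        # Run as root, or caches owned by other users count as unreadable.
        if ! "$klist_cmd" -s -c "$cache" 2>/dev/null; then
            rm -f -- "$cache"
        fi
    done
}
```

Calling `purge_expired_caches` with no arguments from an hourly cron job would mirror the one-liner. Like the one-liner, it only lets the user log in again and get a new ticket; it does not change how NFS behaves while the expired cache is still present.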

Comment 23 Steve Dickson 2014-05-02 10:40:19 UTC
(In reply to kfu from comment #22)
> Any progress on this bug?  Just tried on latest RHEL 6.5 with kerberos NFS
> auto home directory,  when the user's ticket expires, that user can't login
> to the server (home nfs mount hung). 
> 
> Right now I just use following quick and dirty hourly cron to clean up any
> expired
> ticket cache, at least this will allow the user login again and acquire a
> new ticket.  
> 
>  for i in `ls /tmp/krb5cc_*`; do KRB5CCNAME=$i klist -l |grep Expired |awk
> '{print $2}'|cut -d: -f2 ;  done | grep krb5cc |xargs rm -f
> 
> Can anyone comment if the expired tickets can be cleared from /tmp by a more
> reliable watch daemon , will this be an viable solution?

Well yes and no... From an NFS standpoint, no: when a ticket expires NFS will still hang. But using the sssd daemon from the ipa-client package, which renews users' tickets automatically, does avoid the problem.

Comment 24 John Hodrien 2014-05-02 10:49:41 UTC
(In reply to Steve Dickson from comment #23)
> 
> Well yes and no... From  NFS stand point no, when a ticket expires NFS will
> still hang. But using the sssd daemon from the ipa-client package which
> renews users automatically does avoid the problem.

Avoid, or just extend from being a 12 hour to a 7 day issue?

Comment 25 Steve Dickson 2014-05-02 11:07:52 UTC
(In reply to John Hodrien from comment #24)
> (In reply to Steve Dickson from comment #23)
> > 
> > Well yes and no... From  NFS stand point no, when a ticket expires NFS will
> > still hang. But using the sssd daemon from the ipa-client package which
> > renews users automatically does avoid the problem.
> 
> Avoid, or just extend from being a 12 hour to a 7 day issue?

The sssd daemon, part of the ipa-client package, will renew tickets
automatically for users that are created by ipa user-add.
See slides 17 through 19 in
  http://people.redhat.com/steved/Summits/Summit13/Summit_Handout13.pdf

Comment 26 John Hodrien 2014-05-02 11:10:03 UTC
(In reply to Steve Dickson from comment #25)
> 
> the sssd deamon, part of the ipa-client package, will renew tickets
> automatically for user that are created by the ipa user-add
> See slices 17 through 19 in 
>   http://people.redhat.com/steved/Summits/Summit13/Summit_Handout13.pdf

Yes, up to the maximum renewal time of the ticket (as shown by klist), which for Active Directory [probably the most common case], is not more than 7 days.

Comment 27 Steve Dickson 2014-05-02 14:00:46 UTC
I'm not sure how much further we can go with this bz, since
nfs-utils itself does not have a utility to renew tickets.
So I'm going to close this as CANTFIX.

Comment 28 John Hodrien 2014-05-02 14:06:27 UTC
(In reply to Steve Dickson from comment #27)
> I'm not sure how much further we can go with this bz since
> nfs-utils itself does not have a utility to renew tickets
> So I'm going to close this as CANTFIX

This has nothing to do with not being able to renew, and everything to do with the behaviour being crappy when it does (since ticket expiration is just a fact of life).  But if the kernel / nfs-utils can't cope with an expired ticket, then yes, we're stuffed.

Comment 29 kfu 2014-05-02 14:38:38 UTC
For the case of a user automount (nfs4 sec=krb5p) in an IPA domain, if we can clean up expired tickets (after they exhaust their renewable lifetime) in /tmp/krb5cc_* on the client, it should at least allow the user to log in again and acquire a new ticket, and thus regain the nfs4 home automount.

For "nfs4 -o sec=krb5p" mounts in /etc/fstab, the above won't help to regain the NFS mount, but my understanding is that one can get a keytab for nfs/client.fqdn and put it in the NFS server's keytab to avoid the need for ticket renewal.

Are the above thoughts correct?

Comment 30 Dmitri Pal 2014-05-08 02:36:59 UTC
In the latest Fedora/RHEL versions there is a component called GSS-proxy that was created to solve this problem, among others. If given a keytab, it can be configured to renew the ticket on behalf of the user indefinitely, as it can use constrained delegation if this is required/configured.

https://ssimo.org/slides/devconf-2013-gss-proxy.pdf
http://fedoraproject.org/wiki/Features/gss-proxy
http://k5wiki.kerberos.org/wiki/Projects/ProxyGSSAPI
https://fedorahosted.org/gss-proxy/

Comment 31 John Hodrien 2014-05-08 07:39:12 UTC
(In reply to Dmitri Pal from comment #30)
> In latest Fedora/RHEL versions there is a component called GSS-proxy that is
> created to solve among others this problem too. If given a keytab it can be
> configured to renew the ticket on behalf of the user indefinitely as it can
> use constrained delegation if this is required/configured.

If given a user keytab, sure.  But that's a decidedly atypical case.  If it's still the case that it behaves as the original reporter describes, none of these solutions address the problem, and you're still in a pretty grim place.  I think we should just revisit this with RHEL7 and go on from there.

Comment 32 Dmitri Pal 2014-05-08 13:00:20 UTC
(In reply to John Hodrien from comment #31)
> (In reply to Dmitri Pal from comment #30)
> > In latest Fedora/RHEL versions there is a component called GSS-proxy that is
> > created to solve among others this problem too. If given a keytab it can be
> > configured to renew the ticket on behalf of the user indefinitely as it can
> > use constrained delegation if this is required/configured.
> 
> If given a user keytab, sure.  But that's a decidedly atypical case.  If
> it's still the case that it behaves as the original reported describes, none
> of these solutions address the problem, and you're still in a pretty grim
> place.  I think just revisit this with RHEL7 and go on from there.

You can use a user keytab. For sure that would work. But I was talking about a different use case. You can have a keytab issued for GSS proxy. GSS proxy can be configured to do constrained delegation (subject to server side policy enforcement). This means that GSS proxy can be told to use s4u2self + s4u2proxy.
What happens is that it will first acquire a ticket on behalf of the user for itself (s4u2self) and then use it to acquire a ticket for the NFS server (s4u2proxy). If GSS proxy is configured to do this (impersonate the user) and KDC policies (if any) allow it, the user ticket will be acquired on demand when needed, solving the issue of ticket expiration.

Please try RHEL7.

Comment 35 RHEL Program Management 2014-06-26 20:40:44 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 38 Rafael Aquini 2014-07-08 02:19:31 UTC
Patch(es) available on kernel-2.6.32-489.el6

Comment 42 errata-xmlrpc 2014-10-14 05:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html

