Red Hat Bugzilla – Bug 442680
Better support for Kerberos ticket cache management
Last modified: 2013-03-13 02:21:46 EDT
Description of problem:
Many companies want to move to NFSv4 with Kerberos. This requires the OS to
ensure, to the maximum extent possible, that a valid ticket cache is always
available. If not, processes may not be able to access NFS.
At the moment, krb5-auth-dialog is able to renew and/or refresh Kerberos tickets
for an X11 session. This is the only option in RHEL and Fedora for the
management of ticket caches. This functionality is too limited, however. It does
not provide ticket management for the following scenarios:
- remote logons through ssh.
- console logons
- nohup jobs
- cron and at jobs
Also krb5-auth-dialog only supports renewal and refresh. Two further options
should be added:
- refresh a ticket if a key is found in the keytab (important for cron jobs)
- provide a warning to the user but take no action (for console logons)
Requirements definition for this new functionality:
- The solution should provide ticket management for all processes running on a
system, no matter how they were started. This should include at least:
* X11 logons
* remote ssh logons
* console logons
* nohup jobs
* cron jobs
* at jobs.
- The solution should provide at least the following options when a ticket has
* renew it (if the ticket is renewable)
* refresh it by asking the user for a password (if possible)
* refresh it by looking for a matching key in the system keytab
* display a warning (last resort)
- The solution should provide bullet-proof reference counting of ticket caches
such that when a cache is no longer needed it is discarded. This is particularly
important with renewable tickets. If you do not properly detect that a ticket
cache is not required anymore, and continue to renew it, you potentially leave a
valid ticket cache on a system up until the maximum renewal time (which in many
configurations is weeks). This would be a serious security issue.
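
The fallback options listed in the requirements above (renew, refresh by password, refresh from keytab, warn) could be ordered roughly as in the sketch below. This is only an illustration: the `TicketState` fields and the `choose_action` function are hypothetical names, not part of any existing tool, and placing the password prompt before the keytab lookup simply follows the order in which the options are listed.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RENEW = "renew"
    PROMPT_PASSWORD = "prompt_password"
    KEYTAB_REFRESH = "keytab_refresh"
    WARN = "warn"


@dataclass
class TicketState:
    # Hypothetical inputs that a ticket manager would have to gather itself.
    renewable: bool        # ticket is renewable and its renew-till time is in the future
    can_prompt_user: bool  # an interactive channel to the user exists (e.g. an X11 dialog)
    keytab_has_key: bool   # the system keytab holds a key for the ticket's principal


def choose_action(state: TicketState) -> Action:
    """Pick the first applicable option, with the warning as last resort."""
    if state.renewable:
        return Action.RENEW
    if state.can_prompt_user:
        return Action.PROMPT_PASSWORD
    if state.keytab_has_key:
        return Action.KEYTAB_REFRESH
    return Action.WARN
```

For a cron job with a non-renewable ticket and a keytab entry, `choose_action(TicketState(False, False, True))` selects the keytab refresh; a console logon with neither a prompt channel nor a keytab entry falls through to the warning.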
Implementation suggestions (suggestions only!)
- Single system-wide daemon. Solaris implements most of the functionality above
using a single, system-wide daemon (ktkt_warnd). I think this is the right approach,
for two reasons:
* It is difficult to ensure that the per-user or per-session daemons are
always started up and shut down correctly. PAM could provide session startup
hooks, but this precludes per-user ticket caches, and not every application
is PAM aware.
* On certain workloads (e.g. HPC submission nodes, terminal servers) there
could be hundreds of these daemons lying around, wasting system resources.
- Use kernel keyring as the default ticket cache store. The kernel keyring
provides bullet-proof reference counting, something which can only be
approximated from user space.
- Provide the option to have per user ticket caches (maybe even make this the
default). Per session ticket caches cannot be guaranteed to work for all
applications. Problem areas are expected to be custom applications that implement
remote command execution (common in the HPC space) or do their own
daemonization. A superfluous setsid() call could nuke the session keyring and
make an application stop working.
Version-Release number of selected component (if applicable):
This bug is filed against krb5-auth-dialog but it is likely that multiple
components will need to be changed in order to implement this functionality.
I would like to add two real-world use cases where the functionality above is
absolutely required. Without it these would not work.
- You're running server software that keeps its storage on NFS. This is a common
deployment scenario for e.g. Oracle. With the new ticket management
functionality you would have a keytab entry for the oracle account in
/etc/krb5.keytab. Before starting oracle you would "kinit -k". You can now
assume that any further renewal will be done by the daemon.
- HPC workloads with storage on NFS. Again the functionality described in this
bugzilla will make this use case work. All job schedulers and MPI
implementations that I know of can start up remote processes over ssh. By
enabling ticket forwarding in ssh, you make sure that the remote jobs will have
access to a valid ticket. The renewal daemon on those remote hosts makes sure
this access is prolonged until the maximum renewal time. Configuring a renewal
time up to a few weeks or so should not be a big issue for enterprises (as
opposed to changing the standard ticket lifetime, which would be problematic). This should
be enough to run 99% of all your HPC jobs. Normally jobs take at most a few days
to limit the impact if a job fails. Another solution is to deploy keytabs on the
HPC compute nodes. That is feasible only if the jobs run under a single
functional account instead of the end-user account. But both approaches require
active ticket management on the compute nodes.
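
For the first use case above, the initial "kinit -k" step before starting the service could be wrapped as in the following sketch. It assumes the MIT `kinit` command is on the PATH, and uses its `-k` (use a keytab) and `-t` (keytab path) options; the function names and the `oracle` principal are illustrative assumptions. Any later renewal would still be the job of the proposed ticket management, not of this wrapper.

```python
import subprocess
from typing import List


def kinit_keytab_argv(principal: str, keytab: str = "/etc/krb5.keytab") -> List[str]:
    """Build the kinit command line for a keytab-based (password-less) login."""
    # -k: obtain the ticket using a key from a keytab instead of prompting
    # -t: path of the keytab to read the key from
    return ["kinit", "-k", "-t", keytab, principal]


def obtain_initial_ticket(principal: str, keytab: str = "/etc/krb5.keytab") -> None:
    """Run kinit and raise CalledProcessError if it fails."""
    subprocess.run(kinit_keytab_argv(principal, keytab), check=True)
```

A startup script would call `obtain_initial_ticket("oracle")` once, before launching the server process.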
> - Single system-wide daemon. Solaris implements most of the functionality
> above using a single, system-wide daemon (ktkt_warnd).
What would this actually do? Just issue ticket warnings? Invoke the refresh
or whatever routine? Actually hold tickets? Note that a single system-wide
daemon may not be feasible for a couple of reasons: firstly, the upcoming
container/namespace stuff; and secondly, SELinux.
> I think this is the right approach, for two reasons:
> * It is difficult to ensure that the per-user or per-session daemons are
> always started up and shut down correctly. PAM could provide session startup
> hooks, but this precludes per-user ticket caches, and not every
> application is PAM aware.
Does having a single, system-wide daemon actually make per-user or per-session
maintenance any easier? The daemon would have to monitor the process list to
determine whether a session is still extant, but it might not be able to gain
access to that list (SELinux).
You could have PAM's session teardown code, for example, signal the daemon,
but what if that signal doesn't happen - for instance if the process that
holds the session SEGV's?
> - Use kernel keyring as the default ticket cache store.
I think that's the right way to do things, which probably won't surprise you.
This does have an upcall infrastructure that could be used to deal with this.
The question is, I suppose, at what point should the failing ticket be
detected and renewed? How much time do you have from it lapsing to having to
renew it?
Whatever we work out for NFS can also be applied to AFS and other FS's.
> A superfluous setsid() call could nuke the session keyring and
> make an application stop working.
Wouldn't that then be a bug?
Note that setsid() won't nuke the session keyring.
The idea to use a system-wide daemon for renewal was mostly a suggestion. The
real requirements are that a) the solution is *extremely* robust (if a ticket
expires and a critical service can't access its data anymore this is serious),
b) reasonably efficient (one daemon per session scares me in that respect) and
c) available to all processes that run on the system (not only gui apps).
It may be entirely true that one daemon is not the right approach if this will
break with containers or selinux.
> > - Single system-wide daemon. Solaris implements most of the functionality
> > above using a single, system-wide daemon (ktkt_warnd).
> What would this actually do? Just issue ticket warnings? Invoke the refresh
> or whatever routine? Actually hold tickets? Note that a single system-wide
> daemon may not be feasible for a couple of reasons: firstly, the upcoming
> container/namespace stuff; and secondly, SELinux.
The daemon would try to renew/refresh and only if nothing else is possible issue
a warning. It would not hold the ticket caches. These would be in the kernel
keyring (which gets you your reference counting -- so I see no need to scan the
process list to see which sessions are still active or actively notify the
daemon that a session has exited).
> > - Use kernel keyring as the default ticket cache store.
> I think that's the right way to do things, which probably won't surprise you.
> This does have an upcall infrastructure that could be used to deal with this.
Do you mean that you could set a timer on a keyring entry and have it generate
an upcall to user space to renew the ticket on behalf of the user? That would be
a great solution in my view. Does this take care of the container / selinux
concerns?
> The question is, I suppose, at what point should the failing ticket be
> detected and renewed? How much time do you have from it lapsing to having to
> renew it?
These should be configurable by policy. A good standard policy would in my view
be to renew as soon as half your ticket life time is over.
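
As a rough illustration of such a policy, the renewal time can be computed from the ticket's start and end times. The function name `renewal_due` and the `fraction` parameter are hypothetical; the default of 0.5 corresponds to the "half your ticket life time" suggestion.

```python
from datetime import datetime


def renewal_due(start: datetime, end: datetime, fraction: float = 0.5) -> datetime:
    """Return the time at which a ticket valid from start to end should be renewed."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if end <= start:
        raise ValueError("ticket end time must be after its start time")
    # Renew once the configured share of the ticket lifetime has elapsed.
    return start + (end - start) * fraction
```

For a ticket valid from 08:00 to 18:00 on the same day, the default policy schedules renewal at 13:00.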
> Whatever we work out for NFS can also be applied to AFS and other FS's.
> > A superfluous setsid() call could nuke the session keyring and
> > make an application stop working.
> Wouldn't that then be a bug?
> Note that setsid() won't nuke the session keyring.
I was (I think mistakenly) under the impression that a session keyring is bound
to the Unix session ID (getsid()). If it isn't, and the "session" concept you
talk about is purely something from user space, then this concern does not apply.
> Do you mean that you could set a timer on a keyring entry and have it
> generate an upcall to user space to renew the ticket on behalf of the user?
> That would be a great solution in my view. Does this take care of the
> container / selinux concerns?
The key upcall mechanism can create a temporary daemon for you as and when it
is required. This will then have the appropriate security features (UID, GID,
This would need some work to make keys container aware - something that hasn't
been done yet - but it shouldn't be too hard.
We could set a timer on a key to renew it, though there are probably more
efficient ways of doing things than that. We only really need one timer, and
a list of keys that need renewing; that, however, is an implementation detail.
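
The "one timer plus a list of keys" idea can be sketched in user space with a priority queue ordered by renewal time. This is not kernel code and not how the keyring is implemented; `RenewalQueue` and its methods are hypothetical names used only to show why a single timer suffices: it only ever needs to be armed for the earliest deadline.

```python
import heapq
from typing import List, Optional, Tuple


class RenewalQueue:
    """Keys that need renewing, ordered by the time at which renewal is due."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def add(self, key_id: str, renew_at: float) -> None:
        """Register a key together with its renewal deadline (seconds)."""
        heapq.heappush(self._heap, (renew_at, key_id))

    def next_deadline(self) -> Optional[float]:
        """Time for which the single timer should be armed, or None if idle."""
        return self._heap[0][0] if self._heap else None

    def pop_due(self, now: float) -> List[str]:
        """Remove and return all keys whose renewal time has been reached."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

When the timer fires, the caller renews every key returned by `pop_due(now)`, re-adds each with its new deadline, and re-arms the timer for `next_deadline()`.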
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release. Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release. This request is not yet committed for
inclusion.
Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.
*** Bug 479360 has been marked as a duplicate of this bug. ***
Verified using ssh logon and x11(gdm) logon.
Version: # rpm -qi sssd | head
Name        : sssd                              Relocations: (not relocatable)
Version     : 1.5.1                             Vendor: Red Hat, Inc.
Release     : 13.el6                            Build Date: Tue 08 Mar 2011 11:55:44 AM EST
Install Date: Wed 09 Mar 2011 01:29:22 AM EST   Build Host: x86-005.build.bos.redhat.com
Group       : Applications/System               Source RPM: sssd-1.5.1-13.el6.src.rpm
Size        : 3418301                           License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.