Bug 442680
| Summary: | Better support for Kerberos ticket cache management | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Geert Jansen <gjansen> |
| Component: | sssd | Assignee: | Stephen Gallagher <sgallagh> |
| Status: | CLOSED ERRATA | QA Contact: | Jenny Severance <jgalipea> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0 | CC: | borgan, bugzilla, dhowells, dpal, grajaiya, jgalipea, jhrozek, j, k.georgiou, michael.eisler, ricardo.labiaga, riek, rpacheco, sputhenp, stephan.wiesand, steved |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | 6.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | sssd-1.5.0-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-19 11:40:01 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 579778, 580566 | | |
Description
Geert Jansen
2008-04-16 09:09:11 UTC
I would like to add two real-world use cases where the functionality above is absolutely required. Without it, these would not work.

- You're running server software that keeps its storage on NFS. This is a common deployment scenario for e.g. Oracle. With the new ticket management functionality you would have a keytab entry for the oracle account in /etc/krb5.keytab. Before starting Oracle you would run "kinit -k". You can then assume that any further renewal will be done by the daemon. (A concrete sketch of this flow appears at the end of this exchange.)

- HPC workloads with storage on NFS. Again, the functionality described in this bugzilla will make this use case work. All job schedulers and MPI implementations that I know of can start up remote processes over ssh. By enabling ticket forwarding in ssh, you make sure that the remote jobs will have access to a valid ticket. The renewal daemon on those remote hosts makes sure this access is prolonged up to the maximum renewal time. Configuring a renewal time of up to a few weeks or so should not be a big issue for enterprises (as opposed to changing the standard ticket lifetime, which would be problematic). This should be enough to run 99% of all HPC jobs; normally jobs take at most a few days, to limit the impact if a job fails. Another solution is to deploy keytabs on the HPC compute nodes. That is feasible only if the jobs run under a single functional account instead of the end-user account. Both approaches require active ticket management on the compute nodes.

---

> - Single system-wide daemon. Solaris implements most of the functionality
> above using a single, system-wide daemon (ktkt_warnd).

What would this actually do? Just issue ticket warnings? Invoke the refresh or whatever routine? Actually hold tickets?

Note that a single system-wide daemon may not be feasible for a couple of reasons: firstly, the upcoming container/namespace stuff; and secondly, SELinux.

> I think this is the right approach, for two reasons:
> * It is difficult to ensure that the per-user or per-session daemons are
> always started up and shut down correctly. PAM could provide session startup
> hooks, but this precludes per-user ticket caches, and not every
> application is PAM aware.

Does having a single, system-wide daemon actually make per-user or per-session maintenance any easier? The daemon would have to monitor the process list to determine whether a session is still extant, but it might not be able to gain access to that list (SELinux). You could have PAM's session teardown code, for example, signal the daemon, but what if that signal doesn't happen - for instance if the process that holds the session SEGVs?

> - Use kernel keyring as the default ticket cache store.

I think that's the right way to do things, which probably won't surprise you. This does have an upcall infrastructure that could be used to deal with this.

The question is, I suppose, at what point should the failing ticket be detected and renewed? How much time do you have from it lapsing to having to renew it?

Whatever we work out for NFS can also be applied to AFS and other FSs.

> A superfluous setsid() call could nuke the session keyring and
> make an application stop working.

Wouldn't that then be a bug?

Note that setsid() won't nuke the session keyring.
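
To ground the first use case above: the manual flow that exists today, and that the requested daemon would automate, looks roughly like this. A minimal sketch; the service principal name is hypothetical, and the renewable lifetime actually granted is capped by KDC policy:

```sh
# One-time setup: the service principal's key lives in the default
# keytab (/etc/krb5.keytab), so no password is needed to get a ticket.
# -r requests a renewable ticket (subject to the KDC's maxrenewlife).
kinit -k -r 7d oracle/db01.example.com

# Inspect the ticket flags and times; 'R' in the flags means renewable.
klist -f

# What the proposed daemon would do periodically: renew the TGT before
# it expires, up to the maximum renewable lifetime.
kinit -R
```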

---

The idea to use a system-wide daemon for renewal was mostly a suggestion. The real requirements are that a) the solution is *extremely* robust (if a ticket expires and a critical service can't access its data anymore, that is serious), b) reasonably efficient (one daemon per session scares me in that respect), and c) available to all processes that run on the system (not only GUI apps). It may be entirely true that one daemon is not the right approach if it will break with containers or SELinux.

> > - Single system-wide daemon. Solaris implements most of the functionality
> > above using a single, system-wide daemon (ktkt_warnd).
>
> What would this actually do? Just issue ticket warnings? Invoke the refresh
> or whatever routine? Actually hold tickets? Note that a single system-wide
> daemon may not be feasible for a couple of reasons: firstly, the upcoming
> container/namespace stuff; and secondly, SELinux.

The daemon would try to renew/refresh, and only if nothing else is possible issue a warning. It would not hold the ticket caches. These would be in the kernel keyring (which gets you your reference counting, so I see no need to scan the process list to see which sessions are still active, or to actively notify the daemon that a session has exited).

> > - Use kernel keyring as the default ticket cache store.
>
> I think that's the right way to do things, which probably won't surprise you.
>
> This does have an upcall infrastructure that could be used to deal with this.

Do you mean that you could set a timer on a keyring entry and have it generate an upcall to user space to renew the ticket on behalf of the user? That would be a great solution in my view. Does this take care of the container / SELinux concerns?

> The question is, I suppose, at what point should the failing ticket be
> detected and renewed? How much time do you have from it lapsing to having to
> renew it?

These should be configurable by policy. A good standard policy would, in my view, be to renew as soon as half your ticket lifetime is over.

> Whatever we work out for NFS can also be applied to AFS and other FSs.

Yup.

> > A superfluous setsid() call could nuke the session keyring and
> > make an application stop working.
>
> Wouldn't that then be a bug?
>
> Note that setsid() won't nuke the session keyring.

I was (I think mistakenly) under the impression that a session keyring is bound to the Unix session ID (getsid()). If it isn't, and the "session" concept you talk about is purely something from user space, then this concern does not apply.

---

> Do you mean that you could set a timer on a keyring entry and have it
> generate an upcall to user space to renew the ticket on behalf of the user?
> That would be a great solution in my view. Does this take care of the
> container / selinux concerns?
The key upcall mechanism can create a temporary daemon for you as and when it
is required. This will then have the appropriate security features (UID, GID,
keyrings, etc.).
This would need some work to make keys container aware - something that hasn't
been done yet - but it shouldn't be too hard.
We could set a timer on a key to renew it, though there are probably more
efficient ways of doing things than that. We only really need one timer, and
a list of keys that need renewing; that, however, is an implementation detail.
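
To make the keyring-plus-timer idea concrete, the pieces already exist in user space. A minimal sketch using the keyutils CLI; the principal and cache name are hypothetical, the exact KEYRING: ccache syntax varies between krb5 releases, and $KEYID stands in for an ID taken from the `keyctl show` output:

```sh
# Put the credential cache in the kernel session keyring instead of a
# file in /tmp (requires keyring ccache support in libkrb5).
export KRB5CCNAME=KEYRING:session:mycc
kinit alice@EXAMPLE.COM

# The tickets are now ordinary keys; list them.
keyctl show @s

# A timer can be attached to any key: here, expire key $KEYID after one
# hour. An upcall-driven renewer would refresh the ticket before the
# timer fires.
keyctl timeout "$KEYID" 3600
```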

---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

---

*** Bug 479360 has been marked as a duplicate of this bug. ***

---

Verified using ssh logon and x11 (gdm) logon. Version:

```
# rpm -qi sssd | head
Name        : sssd                 Relocations: (not relocatable)
Version     : 1.5.1                Vendor: Red Hat, Inc.
Release     : 13.el6               Build Date: Tue 08 Mar 2011 11:55:44 AM EST
Install Date: Wed 09 Mar 2011 01:29:22 AM EST
Build Host  : x86-005.build.bos.redhat.com
Group       : Applications/System  Source RPM: sssd-1.5.1-13.el6.src.rpm
Size        : 3418301              License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon
```

---

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0560.html
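
For reference, the automatic renewal delivered in sssd-1.5.0 is driven by a handful of krb5 provider options in the domain section. A minimal sketch with hypothetical domain, server, and realm names; consult the sssd-krb5 man page shipped with your release for the exact option set and units:

```ini
[domain/example.com]
id_provider = ldap
auth_provider = krb5
krb5_server = kdc.example.com
krb5_realm = EXAMPLE.COM
# Ask for a 24-hour ticket that stays renewable for a week
# (the KDC's own policy still caps both values).
krb5_lifetime = 24h
krb5_renewable_lifetime = 7d
# Check every hour (value in seconds) whether cached TGTs need
# renewing; 0 disables automatic renewal.
krb5_renew_interval = 3600
```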