Bug 658689

Summary: On screen unlock, pam_sss doesn't renew Kerberos ticket
Product: [Fedora] Fedora
Reporter: Bojan Smojver <bojan>
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: dpal, jhrozek, sbose, sgallagh, ssorce
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: sssd-1.5.0-1.fc14
Doc Type: Bug Fix
Last Closed: 2011-01-10 21:30:14 UTC

Description Bojan Smojver 2010-12-01 01:04:58 UTC
Description of problem:
If the screen is locked in GNOME (e.g. on suspend or hibernate), the Kerberos ticket is not renewed on unlock.

Version-Release number of selected component (if applicable):
sssd-1.4.1-3.fc14.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Suspend or hibernate the machine in Gnome (make sure screen gets locked).
2. Resume, unlock.
3. Hover over krb5-auth-dialog applet and check ticket expiry (or run klist).
  
Actual results:
Ticket not renewed.

Expected results:
The ticket should be renewed whenever the backend is online, even when cached credentials are used.

Additional info:
Before switch to sssd, this worked.

When I say renewed, I mean:

1. If ticket is renewable, extend its lifetime.
2. If ticket is not renewable, get a fresh one.

Comment 1 Bojan Smojver 2010-12-01 06:11:44 UTC
(In reply to comment #0)

> 2. If ticket is not renewable, get a fresh one.

Actually, just doing this on unlock should be OK as well.

Comment 2 Bojan Smojver 2010-12-01 07:33:01 UTC
If I'm understanding things correctly in the sssd code, if cache_credentials is true, cached data will be preferred to actually going to the backend.

Shouldn't the code be changed to go to the backend first and fall back to cached credentials if nothing is found there? This way, Kerberos tickets would always be renewed.

Or am I misunderstanding how this works?

Comment 3 Sumit Bose 2010-12-01 08:26:38 UTC
SSSD first tries to get a new ticket from the KDC. But there is some kind of race condition when going back online where the backend might still think that it is offline. This is documented upstream in https://fedorahosted.org/sssd/ticket/655.

Can you try to wait 1-2 minutes after unlocking the screen and then check again if the ticket got renewed? Maybe you need to trigger sssd to go back online by calling 'id' or 'getent passwd username' in a terminal.

Comment 4 Bojan Smojver 2010-12-01 09:51:39 UTC
The reason I started noticing this problem is the fact that krb5-auth-dialog started telling me my ticket expired. This never happened before. So, I'm pretty sure waiting for a while doesn't have the desired effect, because after I unlock the screen I usually spend hours working.

My tickets are issued for 1 day, with renewal period of 2 days, if this somehow matters.

Anyway, not sure how calling id or anything else from the command line would be a workaround. Regular users won't know how to do this.

Is there some debug info I can give you that would help fix this?

Comment 5 Sumit Bose 2010-12-01 10:54:35 UTC
Yes, debug logs and your (sanitized) sssd.conf are most welcome. Debug logs can be created by starting sssd as 'sssd -f -D -d 9' and can be found in /var/log/sssd.

Are you using the 'krb5_store_password_if_offline = true' config option?

Comment 6 Bojan Smojver 2010-12-01 11:32:10 UTC
No, not using that option.

Will get you the data in the morning (not on my box right now).

Comment 7 Stephen Gallagher 2010-12-01 12:25:09 UTC
Bojan, I've seen this behavior before. The problem is that the credential cache that your session and krb5-auth-dialog are seeing has one value, and SSSD has a record of you using a different one.

This can happen sometimes if you have manually purged your cache while logged in.

To prove this:
type 'klist' at a shell prompt

You should see something like:
Ticket cache: FILE:/tmp/krb5cc_13041_dCUq3y

Then try (as root, requires the ldb-tools package):
ldbsearch -H /var/lib/sss/db/cache_redhat.com.ldb name=<username> ccacheFile

If the ccacheFile that SSSD recognizes doesn't agree with klist, then they're out of sync. This means that whenever you unlock your screen, SSSD is actually updating the wrong credential cache with your new ticket information.


You should be able to get yourself back in sync by doing the following:
1) rm -f /tmp/krb5cc_<UID>*
2) Log completely out of all active sessions of that user
3) Log back in

Then doing klist and the ldbsearch above should show the same ccacheFile. From here on, unlocking the screen should renew the ticket.

Please let us know if this works.
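
The comparison described above can be scripted. What follows is a minimal sketch, not part of SSSD: the function names `klist_ccache`, `sysdb_ccache` and `ccache_in_sync` are made up for illustration, and the parsing assumes the `Ticket cache:` and `ccacheFile:` line formats that appear in this report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the line formats are
# assumed to match the klist and ldbsearch output shown in this report.

# Print the cache name from klist-style output read on stdin.
klist_ccache() {
    sed -n 's/^Ticket cache: //p' | head -n 1
}

# Print the ccacheFile value from ldbsearch-style output read on stdin.
sysdb_ccache() {
    sed -n 's/^ccacheFile: //p' | head -n 1
}

# Succeed only when both names are non-empty and identical.
ccache_in_sync() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example use (the ldbsearch call needs root and the ldb-tools package;
# the cache file name depends on the SSSD domain name):
#   session=$(klist | klist_ccache)
#   stored=$(ldbsearch -H /var/lib/sss/db/cache_<domain>.ldb \
#                name=<username> ccacheFile | sysdb_ccache)
#   ccache_in_sync "$session" "$stored" && echo "in sync" || echo "out of sync"
```

An empty result from either side is treated as "out of sync", since that usually means the wrong cache file or user name was queried.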

Comment 8 Stephen Gallagher 2010-12-01 12:39:29 UTC
Also, please include your /etc/pam.d/system-auth and /etc/pam.d/password-auth files. I'd like to see what your PAM stack looks like.

We can also see this behavior if both pam_sss and pam_krb5 are listed in the stack (since the latter might be supplanting SSSD's credential cache)

Comment 9 Bojan Smojver 2010-12-01 20:36:32 UTC
Before I forget, this just happened after I shut down the system, turned it on and logged in. I got authenticated with cached credentials, although I'm connected to the network.

My pam stack was generated by authconfig.

I'll get you all those details shortly.

Comment 11 Bojan Smojver 2010-12-01 22:55:10 UTC
(In reply to comment #7)
 
> Then try (as root, requires the ldb-tools package):
> ldbsearch -H /var/lib/sss/db/cache_redhat.com.ldb name=<username> ccacheFile

Strange:
---------------------------
[root@shrek ~]# ldbsearch -H /var/lib/sss/db/cache_redhat.com.ldb name=bojan ccacheFile
# returned 0 records
# 0 entries
# 0 referrals
---------------------------

klist has:
---------------------------
[bojan@shrek ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_500_zJgMVp
Default principal: bojan

Valid starting     Expires            Service principal
12/01/10 18:49:05  12/02/10 18:49:05  krbtgt/REXURSIVE.COM
	renew until 12/03/10 18:49:05
12/01/10 19:16:25  12/02/10 18:49:05  imap/beauty.rexursive.com
	renew until 12/03/10 18:49:05
---------------------------

WTF?

Comment 12 Bojan Smojver 2010-12-01 23:07:54 UTC
(In reply to comment #11)
 
> Strange:
> ---------------------------
> [root@shrek ~]# ldbsearch -H /var/lib/sss/db/cache_redhat.com.ldb name=bojan
> ccacheFile
> # returned 0 records
> # 0 entries
> # 0 referrals
> ---------------------------

Ah, hang on - wrong file:
---------------------------
[root@shrek ~]# ldbsearch -H /var/lib/sss/db/cache_default.com.ldb name name=bojan ccacheFile
# record 1
dn: @BASEINFO

# returned 1 records
# 1 entries
# 0 referrals
---------------------------

However, if I pull strings on that file, I can see myself in there and the path to the ticket looks good (it appears 3 times). Maybe I should zap that file, just to be sure what's going on.

Comment 13 Bojan Smojver 2010-12-01 23:49:06 UTC
(In reply to comment #12)
 
> Ah, hang on - wrong file:
> ---------------------------
> [root@shrek ~]# ldbsearch -H /var/lib/sss/db/cache_default.com.ldb name
> name=bojan ccacheFile
> # record 1
> dn: @BASEINFO
> 
> # returned 1 records
> # 1 entries
> # 0 referrals
> ---------------------------

Still wrong file, actually:
---------------------------
[root@shrek ~]# ldbsearch -H /var/lib/sss/db/cache_default.ldb name=bojan ccacheFile
asq: Unable to register control with rootdse!
# record 1
dn: name=bojan,cn=groups,cn=default,cn=sysdb

# record 2
dn: name=bojan,cn=users,cn=default,cn=sysdb
ccacheFile: FILE:/tmp/krb5cc_500_lGuboo
---------------------------

What you see here is ccacheFile after I've blown away all .ldb files in /var/lib/sss/db and restarted sssd. So, of course, ccacheFile is different. I'll reboot later on and clean the db again.

I'm getting a distinct feeling here that the problem is that somehow SSSD can't talk to LDAP over SSL:
---------------------------
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [sss_dp_send_acct_req_create] (4): Sending request for [default][4097][1][name=bojan]
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [sbus_add_timeout] (8): 0x2311d40
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [sbus_remove_timeout] (8): 0x2311d40
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [sbus_dispatch] (9): dbus conn: 230DFF0
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [sbus_dispatch] (9): Dispatching.
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [sss_dp_get_reply] (4): Got reply (1, 11, Fast reply - offline) from Data Provider
(Thu Dec  2 10:21:38 2010) [sssd[nss]] [nss_cmd_getpwnam_dp_callback] (2): Unable to get information from Data Provider
Error: 1, 11, Fast reply - offline
Will try to return what we have in cache
---------------------------

When I switch my config from:
---------------------------
[sssd]
config_file_version = 2
reconnection_retries = 3
sbus_timeout = 30
services = nss, pam
domains = default
[nss]
filter_groups = root
filter_users = root
reconnection_retries = 3
[pam]
reconnection_retries = 3
[domain/default]
ldap_tls_reqcert = allow
ldap_id_use_start_tls = false
cache_credentials = True
auth_provider = krb5
debug_level = 0
krb5_kpasswd = auth.rexursive.com
krb5_realm = REXURSIVE.COM
ldap_search_base = dc=rexursive,dc=com
chpass_provider = krb5
id_provider = ldap
min_id = 500
ldap_uri = ldaps://auth.rexursive.com/
krb5_kdcip = auth.rexursive.com
ldap_tls_cacertdir = /etc/openldap/cacerts
---------------------------

To (the only line changed):
---------------------------
ldap_uri = ldap://auth.rexursive.com/
---------------------------

I start getting results.

Something is not right again with my OpenLDAP over SSL, I think.

ldapsearch is whining about /etc/openldap/cacerts directory:
---------------------------
TLS: did not find any valid CA certificates in /etc/openldap/cacerts
TLS: could perform TLS system initialization.
TLS: error: could not initialize moznss security context - error -5939:No more entries in the directory
TLS: can't create ssl handle.
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
---------------------------

So, when I remove this line:
---------------------------
ldap_tls_cacertdir = /etc/openldap/cacerts
---------------------------

Then sssd can fetch stuff again.

OK, let me clean the lot, reboot and see if that does it.

Comment 14 Bojan Smojver 2010-12-02 00:06:43 UTC
OK, after cleanup and reboot, without ldap_tls_cacertdir = /etc/openldap/cacerts line, it seems to be picking up the account properly again.

I know what happened - I ran authconfig-tui. It put that line back into my sssd.conf file, but it did not create nss cert database there. So, sssd got stuck (again), because it could not talk to the LDAP server over SSL.

I've hit an exact same problem before, with another LDAP client (the details of which escape me). I think I'll just create that nss database there by hand, just to be on the safe side.

So, not really a bug in sssd. Closing.
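
A quick pre-flight check for this failure mode might look like the sketch below. It is an illustration only: the function names `cacertdir_from_conf` and `cacertdir_has_entries` are hypothetical, and a non-empty directory does not prove that it holds valid CA certificates or an NSS database. It only catches the missing or empty case described above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. This detects a missing or
# empty CA directory; it does not validate the certificates themselves.

# Print the value of ldap_tls_cacertdir from an sssd.conf-style file.
cacertdir_from_conf() {
    sed -n 's/^[[:space:]]*ldap_tls_cacertdir[[:space:]]*=[[:space:]]*//p' "$1" |
        head -n 1
}

# Succeed only if the directory exists and contains at least one entry.
cacertdir_has_entries() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Example use (as root, since sssd.conf is not world-readable):
#   dir=$(cacertdir_from_conf /etc/sssd/sssd.conf)
#   if [ -n "$dir" ] && ! cacertdir_has_entries "$dir"; then
#       echo "ldap_tls_cacertdir points at an empty or missing directory: $dir"
#   fi
```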

Comment 15 Bojan Smojver 2010-12-02 06:35:42 UTC
And, I forgot, apologies to everyone for wasting their time on this :-(

Comment 16 Bojan Smojver 2010-12-02 22:32:23 UTC
Interestingly enough, this morning when I resumed my system (with X crashed for me: bug #657816), I had to login again. And what do you know, I got authenticated with cached credentials. My ticket had 8+ hrs till expiry (it should have had 24 hrs).

I did id for a user I know never logs into this box and this came back as no such user. Only users that used this machine would come back (from cache).

However, restart of sssd promptly fixed all that, so all id invocations worked and lock/unlock gave me a ticket valid for 24 hrs.

So, somehow sssd thought that we were still offline. Unfortunately, I was not running sssd in debugging mode when all this happened, so I cannot show you the logs.

Just FYI etc.

Comment 17 Bojan Smojver 2010-12-03 09:27:05 UTC
(In reply to comment #16)
> Interestingly enough, this morning when I resumed my system

Just went through another suspend/resume cycle and it did it again. One before that didn't do it. So, intermittent.

I think I'll put sssd reload in my thaw/resume scripts for now.

Comment 18 Sumit Bose 2010-12-03 09:46:41 UTC
I still think that the current work on improving the going-online detection (https://fedorahosted.org/sssd/changeset/c8708cd958c633cc3c57a3460bdb15391200e1e1) might help here.

Comment 19 Bojan Smojver 2010-12-03 09:59:35 UTC
(In reply to comment #18)
> I still think that the current work on improving the going-online detection
> (https://fedorahosted.org/sssd/changeset/c8708cd958c633cc3c57a3460bdb15391200e1e1)
> might help here.

Excellent. Feel free to patch, build and point me to it. I'll be happy to test!

Comment 20 Bojan Smojver 2010-12-03 23:01:22 UTC
(In reply to comment #17)

> I think I'll put sssd reload in my thaw/resume scripts for now.

Nope, that's no good. I think something in /etc/NetworkManager/dispatcher.d may be more appropriate.

Comment 21 Bojan Smojver 2010-12-03 23:39:44 UTC
I put this in for now:
------------------------
$ cat /etc/NetworkManager/dispatcher.d/90-sssd 
#!/bin/sh

if [ "$2" = "up" ]; then
	/sbin/service sssd reload || :
fi
------------------------

Will see how it fares.

Comment 22 Bojan Smojver 2010-12-04 04:01:50 UTC
(In reply to comment #21)
> I put this in for now:
> ------------------------
> $ cat /etc/NetworkManager/dispatcher.d/90-sssd 
> #!/bin/sh
> 
> if [ "$2" = "up" ]; then
>  /sbin/service sssd reload || :
> fi
> ------------------------
> 
> Will see how it fares.

Marginally better than before. On resume/unlock, I still get the old ticket. If I leave the box for a while, lock the screen and unlock, indeed the ticket gets renewed (because sssd gets reloaded in the meantime, once NM does all its stuff).

I guess the problem really is this:

Dec  4 14:50:18 shrek sssd[be[default]]: LDAP connection error: (null)

Comment 23 Sumit Bose 2010-12-06 19:27:26 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > I still think that the current work on improving the going-online detection
> > (https://fedorahosted.org/sssd/changeset/c8708cd958c633cc3c57a3460bdb15391200e1e1)
> > might help here.
> 
> Excellent. Feel free to patch, build and point me to it. I'll be happy to test!

Please find a scratch build of sssd with all the current offline-online detection improvements at http://koji.fedoraproject.org/koji/taskinfo?taskID=2646979 .

It would be nice if you can test it and give us some feedback if it works for you.

Comment 24 Bojan Smojver 2010-12-06 20:24:25 UTC
(In reply to comment #23)
 
> Please find a scratch build of sssd with all the current offline-online
> detection improvements at
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2646979 .
> 
> It would be nice if you can test it and give us some feedback if it works for
> you.

Shall install now. Thank you for providing the build!

Comment 25 Bojan Smojver 2010-12-06 22:59:24 UTC
(In reply to comment #23)

> It would be nice if you can test it and give us some feedback if it works for
> you.

Suspended once thus far and got a fresh ticket on unlock. Of course, I'll have to keep this running for a few days to know for sure.

Comment 26 Sumit Bose 2010-12-07 07:50:12 UTC
Thank you, this is good news. Please keep us informed if something unexpected happens again.

Comment 27 Bojan Smojver 2010-12-07 22:20:23 UTC
(In reply to comment #26)
> Thank you, this is good news. Please keep us informed if something unexpected
> happens again.

Just resumed again. Got a ticket from last night. So, there are still problems. Let me lock the screen now...

On unlock, I get a fresh one (note that I removed that NetworkManager script, so it's just sssd now).

So, there is still some kind of race between resume and sssd coming online.

Comment 28 Bojan Smojver 2010-12-08 20:46:31 UTC
Yep, another morning resume gave me last night's ticket. Let's lock... Got fresh on unlock.

So, seems that sssd 1.5.0 eventually detects that we're online. Just that first bit, just after resume, is still a problem. Obviously, if one's screen locks any time during the day (see - that's why taking a lunch break is a must :-), a new ticket will be issued.

Comment 31 Stephen Gallagher 2010-12-08 20:55:54 UTC
Bojan, ok, I think it's working like this:

SSSD triggers off of the routing table or resolv.conf changing in order to retry its online status. So if neither of these events occurs during a resume, SSSD will operate for one full minute as if it were offline. After that, the next authentication attempt will be performed online.

Bojan, try also setting the option
krb5_store_password_if_offline = true

What will happen here is that if SSSD performs an offline auth, it will hang onto your kerberos password until it detects that it's back online, and then perform a kinit on your behalf at that moment to refresh your ticket.

So it might take as much as a minute from when you resume and sign in, but your ticket will be renewed automatically.
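
To confirm the option is actually set for a domain, something like the following sketch can help. It is not an SSSD tool: `conf_get` and `store_password_enabled` are hypothetical helpers, and the parser assumes a simple INI layout (section headers at the start of a line with no trailing whitespace, one `key = value` per line) like the sssd.conf quoted earlier in this report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and the parser assumes a
# simple INI layout like the sssd.conf quoted earlier in this report.

# conf_get FILE SECTION KEY: print the value of KEY inside [SECTION].
conf_get() {
    awk -F= -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)          # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim the value
                print v
                exit
            }
        }' "$1"
}

# store_password_enabled FILE DOMAIN: succeed if the option is "true"
# (case-insensitive). An unset option counts as disabled, matching the
# documented default of false.
store_password_enabled() {
    value=$(conf_get "$1" "domain/$2" krb5_store_password_if_offline |
                tr '[:upper:]' '[:lower:]')
    [ "$value" = "true" ]
}

# Example use (as root):
#   store_password_enabled /etc/sssd/sssd.conf default \
#       && echo "offline password storage is enabled"
```

The option belongs in the `[domain/...]` section, next to `auth_provider = krb5`; sssd has to be restarted for the change to take effect.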

Comment 33 Bojan Smojver 2010-12-08 21:08:06 UTC
(In reply to comment #31)
 
> Bojan, try also setting the option
> krb5_store_password_if_offline = true

OK, will try that. Thanks.

Comment 34 Bojan Smojver 2010-12-09 00:54:34 UTC
(In reply to comment #33)
 
> OK, will try that. Thanks.

Indeed that does the trick. Just did a quick suspend/resume and on unlock, I had an old ticket. Within seconds, I got a fresh one.

Just out of curiosity, once the new ticket is obtained, the password is purged from sssd, correct?

Comment 35 Stephen Gallagher 2010-12-09 12:02:27 UTC
(In reply to comment #34) 
> Just out of curiosity, once the new ticket is obtained, the password is purged
> from sssd, correct?

The key is actually stored in the kernel keyring, not SSSD (this is more secure). However we do purge the key from the kernel keyring when we authenticate against the KDC (on both success and failure, as long as the answer came from the authoritative source instead of cached auth).

So yes, your key is only saved locally on the machine until an online auth occurs.

Comment 36 Bojan Smojver 2010-12-09 12:41:35 UTC
Ah, so that's the reason it's a Linux-only feature. Thanks for the info!

Comment 37 Stephen Gallagher 2010-12-09 12:59:28 UTC
(In reply to comment #36)
> Ah, so that's the reason it's a Linux-only feature. Thanks for the info!

No, SSSD *can* store it in a reversible hash in SSSD memory on non-Linux systems, but it uses the more secure storage of the kernel keyring on Linux.

Availability of this feature is determined at compile-time as part of the configure script.

Comment 38 Bojan Smojver 2010-12-09 20:49:17 UTC
(In reply to comment #37)
> (In reply to comment #36)
> > Ah, so that's the reason it's a Linux-only feature. Thanks for the info!
> 
> No, SSSD *can* store it in a reversible hash in SSSD memory on non-Linux
> systems, but it uses the more secure storage of the kernel keyring on Linux.
> 
> Availability of this feature is determined at compile-time as part of the
> configure script.

I guess the manual page should be changed then (and it has a typo in the second last line):
---------------------------------------
       krb5_store_password_if_offline (boolean)
           Store the password of the user if the provider is offline and use
           it to request a TGT when the provider gets online again.

           Please note that this feature currently only available on a Linux
           plattform.

           Default: false
---------------------------------------

Comment 39 Bojan Smojver 2010-12-12 22:59:55 UTC
(In reply to comment #23)
 
> Please find a scratch build of sssd with all the current offline-online
> detection improvements at
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2646979

This has been working very well for me for some time now. With krb5_store_password_if_offline it's been rather consistent - the new ticket is issued usually within the first minute of unlocking the screen after resume. Thank you.

Given the solution is based on the as yet unreleased sssd 1.5.0, is this something that will wait until F-15 or be backported? Just curious.

Comment 40 Stephen Gallagher 2010-12-13 12:09:01 UTC
As noted at https://fedorahosted.org/sssd/roadmap, our plan is to release 1.5.0 on 12/21. I do intend to release 1.5.0 packages for Fedora 14, since there are no incompatible changes from 1.4.0.

Comment 41 Stephen Gallagher 2010-12-13 12:21:24 UTC
I'm going to reopen this (probably should have a while ago) so we can mark it as fixed when the 1.5.0 packages are released.

While the original issue isn't strictly a bug, it's an annoyance that we can declare fixed.

Comment 42 Bojan Smojver 2010-12-13 20:14:59 UTC
Thanks, great news! Much appreciated.

Comment 43 Fedora Update System 2010-12-23 18:45:23 UTC
sssd-1.5.0-1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/sssd-1.5.0-1.fc14

Comment 44 Fedora Update System 2010-12-25 00:22:30 UTC
sssd-1.5.0-1.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update sssd'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/sssd-1.5.0-1.fc14

Comment 45 Fedora Update System 2011-01-10 21:29:54 UTC
sssd-1.5.0-1.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.