Created attachment 608698
sssd_$DOMAIN log at debug_level 8

Description of problem:
sudo is failing in Fedora 18 (development) with identities via LDAP and authorization via Kerberos.

Version-Release number of selected component (if applicable):
# rpm -qa | egrep 'krb5|systemd|sssd'
systemd-libs-188-3.fc18.i686
systemd-188-3.fc18.i686
systemd-sysv-188-3.fc18.i686
sssd-1.9.0-19.fc18.beta6.i686
pam_krb5-2.3.14-3.fc18.i686
sssd-client-1.9.0-19.fc18.beta6.i686
krb5-libs-1.10.2-7.fc18.i686

How reproducible:
always

Actual results:
User POV:
sudo date
[sudo] password for my_user_name:
Sorry, try again.
[sudo] password for my_user_name:
sudo: 1 incorrect password attempt

Log POV:
==> /var/log/messages <==
Aug 31 13:39:29 test-host [sssd[krb5_child[10593]]]: Credential cache directory /run/user/my_uid/ccdir does not exist

==> /var/log/secure <==
Aug 31 13:39:29 test-host sudo: pam_sss(sudo:auth): system info: [Credential cache directory /run/user/my_uid/ccdir does not exist]

No AVCs reported. See attachment.
krb5 1.10 doesn't create the directory for applications. Could SSSD be getting tripped up by that? The Kerberos libraries will start to create the final component of the path, if necessary, in krb5 1.11, but krb5 1.11 is slated to land during the F19 cycle, so I would recommend against depending on it. There are also some leftovers in the directory after krb5_cc_destroy() is used to remove it, but that's probably less of an issue here.
Upstream ticket: https://fedorahosted.org/sssd/ticket/1512
(In reply to comment #1)
> krb5 1.10 doesn't create the directory for applications. Could SSSD be
> getting tripped up by that?
>
> The Kerberos libraries will start to create the final component of the path,
> if necessary, in krb5 1.11, but krb5 1.11 is slated to land during the F19
> cycle, so I would recommend against depending on it.
>

The SSSD would attempt to create the last directory component of the dircache if it doesn't exist (and only the last component, we don't pretend we're mkdir -p)

> There are also some leftovers in the directory after krb5_cc_destroy() is
> used to remove it, but that's probably less of an issue here.
You are running sudo you I suspect you were able to log in fine with that user. Can you paste the output of "klist" when logged in? I assume the login was performed offline when the LDAP server was down. This debug message indicates that it was: (Fri Aug 31 16:18:33 2012) [sssd[be[default]]] [sdap_process_result] (0x0040): ldap_result error: [Can't contact LDAP server] What is the value of cache_credentials in your sssd.conf? Does the same error happen even if the remote server is up? Can you also attach the file /var/log/sssd/krb5_child.log ?
(In reply to comment #4) > You are running sudo you I suspect you were able to log in fine with that > user. Can you paste the output of "klist" when logged in? > Err, sorry, my English skills have apparently reached a new low. The sentence should have read "You are running sudo so I suspect you were able to log in fine with as that user".
(In reply to comment #3)
> (In reply to comment #1)
> > krb5 1.10 doesn't create the directory for applications. Could SSSD be
> > getting tripped up by that?
> >
> > The Kerberos libraries will start to create the final component of the path,
> > if necessary, in krb5 1.11, but krb5 1.11 is slated to land during the F19
> > cycle, so I would recommend against depending on it.
>
> The SSSD would attempt to create the last directory component of the
> dircache if it doesn't exist (and only the last component, we don't pretend
> we're mkdir -p)

Good, that's in line with what 1.11 will be doing.
(In reply to comment #6)
> (In reply to comment #3)
> > (In reply to comment #1)
> > > krb5 1.10 doesn't create the directory for applications. Could SSSD be
> > > getting tripped up by that?
> > >
> > > The Kerberos libraries will start to create the final component of the path,
> > > if necessary, in krb5 1.11, but krb5 1.11 is slated to land during the F19
> > > cycle, so I would recommend against depending on it.
> >
> > The SSSD would attempt to create the last directory component of the
> > dircache if it doesn't exist (and only the last component, we don't pretend
> > we're mkdir -p)
>
> Good, that's in line with what 1.11 will be doing.

OK, I checked the SSSD code again after reading the discussion in #848228 and I was wrong about calling mkdir. We *do* create the whole directory, we just attempt to be clever:
 1) the SSSD expands the template specified by krb5_ccname_template and krb5_ccachedir to get an absolute path name
 2) then the SSSD process (running as root) creates the path up to the last component
 3) the last component is created when the ccache is primed in a separate process running with user's credentials mostly to avoid TOCTOU vulnerabilities between creating the directory and chowning it.

We've been creating the directory also in order to get around the problem of systemd creating the directory too late during session, whereas we need the ccache to be stored in a directory during the auth phase.
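The split described in this comment, where a privileged process creates everything up to the last path component and the final component is left to a process running as the user, can be sketched roughly as below. This is only an illustration in Python, not SSSD's C code: `expand_template` and `split_ccache_dir` are hypothetical helpers, the handled sequences (%u, %U, %d, %%) are a simplified subset assumed for the example, and the user name "jdoe" is made up.

```python
import os


def expand_template(template, username, uid, ccachedir):
    """Expand a simplified subset of ccache template sequences.

    Hypothetical helper: handles only %u (login name), %U (numeric UID),
    %d (the ccache directory setting) and %% (a literal percent sign).
    """
    values = {"u": username, "U": str(uid), "d": ccachedir, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%":
            if i + 1 >= len(template):
                raise ValueError("dangling '%' at end of template")
            key = template[i + 1]
            if key not in values:
                raise ValueError("unsupported sequence %" + key)
            out.append(values[key])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_ccache_dir(path):
    """Return (parent, last): the part a privileged process would create
    and the final component left for the user-owned process."""
    if not os.path.isabs(path):
        raise ValueError("ccache directory must be an absolute path")
    parent, last = os.path.split(path.rstrip("/"))
    return parent, last


if __name__ == "__main__":
    # Assumed example values; only the UID 13041 appears in this report.
    path = expand_template("/run/user/%U/krb5cc", "jdoe", 13041, "/tmp")
    print(split_ccache_dir(path))
```

Keeping the two halves separate mirrors the stated reason for the design: the user-owned process creates the final component itself, so there is no window between a root-owned mkdir and a later chown.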
(In reply to comment #4)
> You are running sudo you I suspect you were able to log in fine with that
> user. Can you paste the output of "klist" when logged in?

[translated ... :-)]
> "You are running sudo so I suspect you were able to log in fine with as that user".

Yes, my normal login (via ssh) worked/works just fine.

> I assume the login was performed offline when the LDAP server was down. This
> debug message indicates that it was:
> (Fri Aug 31 16:18:33 2012) [sssd[be[default]]] [sdap_process_result]
> (0x0040): ldap_result error: [Can't contact LDAP server]

I have to agree with your assessment, but would be surprised by that. However, I cannot confirm/deny it at this point. I can say that this LDAP server is a rather critical corporate component so there should have been lots of complaining if it had been down. Also, I would have required the same server for my ssh login only moments earlier and for identity purposes with my home directory and given the ridiculous amount of crap^Wcustomization in my .bashrc I can tell you that my user login experience is frightful when $HOME is fouled up by LDAP problems -- none of which I recall in this case.

> What is the value of cache_credentials in your sssd.conf?

Hmmm, not what I seem to recall:

# grep cache_credentials /etc/sssd/sssd.conf
cache_credentials = True

I would also say this has been true for the life of the host too, since I have puppet managing all of this and those resources are under git and I've not changed the master sssd.conf since 2011-12-05.

Given that caching is enabled, maybe there was a short-term problem with the LDAP server, or it was being administered with a ham-fisted approach. There are many admin-capable folks in my team and inter-personnel communications could be much better so it's really impossible for me to say for certain.

> Does the same error happen even if the remote server is up?

I'm not seeing it now, so I'd say no.

> Can you also attach the file /var/log/sssd/krb5_child.log ?
Coming right up.
Created attachment 611834
krb5_child.log messages resulting from running "sudo date"
Just applied lots of updates (after posting the above messages) and sudo seems to be working for me now. Packages that look involved are:

# rpm -qa --last | awk '/sss|sudo|pam.*Tue 11 Sep 2012/'
sudo-1.8.6-1.fc18.i686                        Tue 11 Sep 2012 11:43:55 AM EDT
libsss_sudo-1.9.0-21.fc18.beta7.i686          Tue 11 Sep 2012 11:43:35 AM EDT
sssd-1.9.0-21.fc18.beta7.i686                 Tue 11 Sep 2012 11:40:47 AM EDT
libsss_idmap-1.9.0-21.fc18.beta7.i686         Tue 11 Sep 2012 11:40:46 AM EDT
sssd-client-1.9.0-21.fc18.beta7.i686          Tue 11 Sep 2012 11:40:41 AM EDT

Thanks guys!
(In reply to comment #7)
> OK, I checked the SSSD code again after reading the discussion in #848228
> and I was wrong about calling mkdir. We *do* create the whole directory, we
> just attempt to be clever:
> 1) the SSSD expands the template specified by krb5_ccname_template
> and krb5_ccachedir to get an absolute path name
> 2) then the SSSD process (running as root) creates the path up to the
> last component
> 3) the last component is created when the ccache is primed in a separate
> process running with user's credentials mostly to avoid TOCTOU
> vulnerabilities between creating the directory and chowning it.
>
> We've been creating the directory also in order to get around the problem of
> systemd creating the directory too late during session, whereas we need the
> ccache to be stored in a directory during the auth phase.

I've resigned myself to having to do the same thing in pam_krb5.
There has been no activity here for quite some time, the setup now seems to work for the original reporter, and the corresponding upstream ticket has been closed as WORKSFORME as well.
I'm running into this as well, and after an extensive debugging session on IRC, sgallagh asked me to reopen this ticket. I can trivially reproduce this and would be happy to provide any information you request.
I'm using stock F18 with recent updates:
systemd-197-1.fc18.1.x86_64
sssd-1.9.3-1.fc18.x86_64
pam_krb5-2.4.1-1.fc18.x86_64
krb5-libs-1.10.3-5.fc18.x86_64
I haven't been able to figure out why the directory isn't available in the first place for Jason, but I can trivially reproduce the effects as follows:

1) Log in to the system after a fresh boot. (This part works for me just fine).
2) rm -Rf /run/user/$UID/krb5cc
3) Attempt to 'su - <username>'

Expected results:
The krb5cc directory should be recreated by SSSD and the user should be logged in.

Actual results:
(Wed Jan 23 14:27:46 2013) [[sssd[krb5_child[9809]]]] [create_ccache_in_dir] (0x0040): 495: [-1765328189][Credential cache directory /run/user/13041/krb5cc does not exist]

It looks like the problem is that, if the user has a DIR::/path/to/ccfile (note the two colons and that the path ends with the file, not the directory), SSSD assumes that the directory MUST already exist and then it jumps to krb5_cc_resolve() which returns KRB5_FCC_NOFILE and we then fail.

So there are three things contributing here:
1) Something has apparently removed this directory from Jason's running system. I have no idea what caused this.
2) The LDB cache is storing the DIR::/path/to/ccfile version of the cache instead of DIR:/path/to/cachedir. I'm not sure if that's intentional.
3) If we get passed the DIR:: (two colon) version of the ccache to create_ccache_in_dir(), we attempt to reuse the file in a nonexistent directory and fail.

My recommendation here is that we should check for KRB5_FCC_NOFILE as a return code from krb5_cc_resolve() and switch to attempting to create the directory. This probably means linking to libpath_utils from ding-libs and taking advantage of the get_basename() routine so we can truncate the file at the end of the DIR:: path.
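The recommendation in this comment amounts to deriving the cache directory from either form of the name and recreating it when it is missing. The Python sketch below illustrates only that path handling. It is not SSSD code and makes no krb5 calls: `ccache_dir_from_name` and `ensure_ccache_dir` are hypothetical helpers, the 0700 mode is an assumption, and "tkt" is a made-up file name.

```python
import os
import tempfile


def ccache_dir_from_name(ccname):
    """Map a DIR-type ccache name to the directory that must exist.

    "DIR:/path/to/cachedir" names the directory itself.
    "DIR::/path/to/ccfile"  names a file inside it, so the final
                            component is dropped.
    """
    # Test the two-colon form first, since it also starts with "DIR:".
    if ccname.startswith("DIR::"):
        return os.path.dirname(ccname[len("DIR::"):])
    if ccname.startswith("DIR:"):
        return ccname[len("DIR:"):]
    raise ValueError("not a DIR-type ccache name: " + ccname)


def ensure_ccache_dir(ccname, mode=0o700):
    """Create the final directory component if it is missing.

    Only the last component is created; a missing parent raises
    FileNotFoundError instead of being created implicitly.
    Returns True if the directory was created, False if it existed.
    """
    path = ccache_dir_from_name(ccname)
    try:
        os.mkdir(path, mode)
        return True
    except FileExistsError:
        if not os.path.isdir(path):
            raise
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        name = "DIR::" + os.path.join(base, "krb5cc", "tkt")
        print(ensure_ccache_dir(name))  # directory is missing on the first call
        print(ensure_ccache_dir(name))  # second call finds it already in place
```

The sketch deliberately creates only the last component, matching the reproduction steps where only krb5cc was removed. A purged parent such as /run/user/$UID would need separate handling.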
Jason noted on IRC that it appears that he suffers this issue after he has logged out of all sessions of his user on the system. That's believable, as I think systemd now purges /run/user/$UID when the last session is terminated. This just makes it even more urgent to ensure that we can recreate this cache.
*** Bug 890062 has been marked as a duplicate of this bug. ***
Thank you for the extensive information in comments #15 and #16. That should be enough for a fix. I've reopened upstream #1512 as well.
Just a note; Stephen asked that I try after a reboot, and I went ahead and tried it. Unfortunately it seems to be random; on some boots the problem appears and on some it doesn't. I'm at a complete loss as to why. /run is on tmpfs so it can't be holding any state.
sssd-1.9.4-2.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/sssd-1.9.4-2.fc18
Package sssd-1.9.4-2.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.9.4-2.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-1795/sssd-1.9.4-2.fc18
then log in and leave karma (feedback).
sssd-1.9.4-2.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.