Bug 607233
| Field | Value |
|---|---|
| Summary | SSSD users cannot log in through GDM |
| Product | Red Hat Enterprise Linux 6 |
| Component | sssd |
| Version | 6.0 |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | low |
| Reporter | Ray Strode [halfline] <rstrode> |
| Assignee | Stephen Gallagher <sgallagh> |
| QA Contact | Chandrasekar Kannan <ckannan> |
| CC | benl, cmeadors, grajaiya, jgalipea, jkoten, jlaska, jmccann, notting, overholt, roland, rstrode, sbose, sgallagh, syeghiay |
| Target Milestone | rc |
| Hardware | All |
| OS | Linux |
| Fixed In Version | sssd-1.2.1-21.el6 |
| Doc Type | Bug Fix |
| Clone Of | 578303 |
| Clones | 621700 |
| Last Closed | 2010-11-10 21:40:00 UTC |
| Bug Depends On | 578303 |
| Bug Blocks | 579775, 599016 |

Description
Ray Strode [halfline]
2010-06-23 15:13:01 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

It might be prudent to update the summary of this bug. SSSD users CAN log into GDM, however they may occasionally be denied because of EINTR interactions.

-4 had an issue, should be set in -5

NOT fixed, issue still present in gdm-2.30.4-5.el6. Actual results same as comment #0.

Hi jiri, Just to be sure. You have done a full reboot after installing the latest gdm package?

I did further testing and log in as a SSSD user works after gdm is restarted (killall gdm-binary). Configuring authentication, logout (switch users) and then login as a SSSD user reproduces the issue.

More info.
<sgallagh> Basically, getpwnam can return EINTR if it receives a signal during execution
<sgallagh> This means they're supposed to call it again until it returns normal success or failure
Of course we do that now (as of comment 5), and it's still not working for Jiri, so something else is up. Needs more investigation.

Created attachment 433829
Reconnect to sssd if it has gone away
So I looked at this today with sgallagher.
The problem is that the sssd nsswitch module doesn't handle sssd going away very well. Instead of trying to reconnect to the server it just fails. This means after running authconfig (which restarts sssd) all existing processes that have talked to sssd before it was restarted will fail the next time they do a getpwnam() call (or whatever).
The above patch seems to fix the problem for me. I'm not sure if it's the "right" fix though. Will need someone on the sssd team to look over.
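
The reconnect-on-failure idea can be illustrated with a small Python sketch. This is not the patch from attachment 433829 (the sss client code is C); it only shows the general pattern of dropping a dead connection and retrying once on a fresh one. The class name and the `connect` factory are hypothetical, introduced purely for illustration.

```python
class ReconnectingClient:
    """Retry a request once on a fresh connection if the old one has died."""

    def __init__(self, connect):
        # `connect` is a hypothetical factory: it returns an object with
        # request(payload) and close() methods, standing in for a client
        # connection to a daemon.
        self._connect = connect
        self._conn = None

    def request(self, payload):
        if self._conn is None:
            self._conn = self._connect()
        try:
            return self._conn.request(payload)
        except OSError:
            # The daemon went away (for example, it was restarted).
            # Drop the stale connection and try exactly once more; a
            # second failure propagates to the caller.
            self._drop()
            self._conn = self._connect()
            return self._conn.request(payload)

    def _drop(self):
        try:
            self._conn.close()
        except OSError:
            pass
        self._conn = None
```

Retrying only once keeps a caller from spinning forever when the daemon is genuinely down, while still surviving a single restart between two lookups.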
I have created an upstream bug to track this issue https://fedorahosted.org/sssd/ticket/571 and have posted a slightly modified version of the patch (reconnect on all errors and no goto) to sssd-devel https://fedorahosted.org/pipermail/sssd-devel/2010-July/004236.html

I can confirm the modified patch works, too.

*** Bug 621255 has been marked as a duplicate of this bug. ***

Created attachment 436600
messages log
Still not working for me. Reproducing "the first login" by changing auth. conf. to local account only, removing cached credentials (# rm /var/lib/sss/db/cache_default.ldb), rebooting and then setting auth. conf. to ldap+kerberos and switching user.
sssd-1.2.1-21.el6.x86_64
gdm-2.30.4-13.el6.x86_64

Moving back to ASSIGNED based on comment#19.

Sumit: Is the procedure Jiri notes in comment#19 a correct way to reset the system configuration to reproduce this failure?

That set of reproduction steps is incomplete. It doesn't describe where in those steps they are attempting logins. Let me try to explain what's happening at each step here.

1) Change auth conf to local account only
This removes sss from nsswitch.conf and pam_sss from /etc/pam.d/[system|password]-auth as well as shutting the daemon down. As a result, after this step, no user identity is looked up from SSSD.

2) Removing cached credentials
It is safe to purge the cache at this time, since the SSSD is not running. Be aware that this is also removing the cached user identities, so if the system is not online after this, it will not be able to return user information.

3) Rebooting
This step could be shortened to dropping into runlevel 3 and then returning to runlevel 5. I assume that the goal here is just to restart gdm.

I'm assuming that at this point the engineer is logging in using a local user account. At this time, no activity happens related to the SSSD. The sss client libraries aren't in use, thus this is NOT a valid test of this bug.

4) Setting auth conf to ldap+kerberos
This adds sss back into nsswitch.conf and starts up the SSSD daemon processes.

5) Switching user
This would actually be the first lookup to the SSSD. If this is failing at this time, then it's most likely that SSSD cannot reach the LDAP server or is experiencing a similar failure that is resulting in it not answering the request. For this, I'd need to see the /var/log/sssd/sssd_default.log (and I'd prefer that the debug_level be set to 9 in the sssd.conf)

So this approach is NOT testing the specific fix.
Testing this specific fix is actually very easy. First use authconfig to set LDAP+Kerberos, then:
1) telinit 3
2) Log in as root on the local console
3) service sssd stop
4) rm -f /var/lib/sss/db/cache_default.ldb
5) service sssd start
6) telinit 5
7) Log in to GDM as an LDAP user with the appropriate Kerberos password
8) service sssd restart
9) Log out of the logged-in user and log in again
Before this fix, that would crash. After this fix it should go smoothly.

Your steps may be good to test the specific fix in sssd, but they don't reproduce the steps from comment 0. The problem is in step 7 - you start GDM with sssd already configured. But that worked even before the fix - see comment 11. The use case is that you use a local user to configure sssd through authconfig-gtk and then you switch to an sssd user (i.e. without restarting gdm).
Steps to reproduce:
1) Change auth. conf to local account only
2) telinit 3
3) telinit 5
4) Log in as a local user
5) Use authconfig to set LDAP+Kerberos
6) Switch user
7) Log in to GDM as an LDAP user with the appropriate Kerberos password
It seems to me that the problem is in gdm - feel free to clone this bug against gdm.

Okay, so sgallagh and I spent a few hours looking into this today.
What's going on is that one of gdm's processes is very long running. This process runs before sssd is configured in nsswitch.conf and continues to run after sssd is configured in nsswitch.conf. The problem is, it seems that glibc will only read the list of modules from nsswitch.conf once for the lifetime of a process (the first time the process calls getpwnam()), so it doesn't notice that the system has been updated. This gives the long running gdm process an inconsistent view of the world compared to the shorter running gdm processes, and gdm doesn't handle that inconsistency very robustly.
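
The once-per-process behaviour described above can be modelled in a few lines of Python. This is a toy model only, not glibc's implementation: `passwd_sources`, `OnceOnlyResolver` and `demo` are hypothetical names, and the "config file" is a temporary file rather than the real /etc/nsswitch.conf.

```python
import os
import tempfile


def passwd_sources(text):
    """Return the tokens on the `passwd:` line of nsswitch.conf-style text.

    Simplification: action items such as [NOTFOUND=return] are returned
    as ordinary tokens rather than being interpreted.
    """
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("passwd:"):
            return line[len("passwd:"):].split()
    return []


class OnceOnlyResolver:
    """Toy model of a process that reads its config on first use only."""

    def __init__(self, path):
        self._path = path
        self._sources = None

    def sources(self):
        if self._sources is None:
            with open(self._path) as f:
                self._sources = passwd_sources(f.read())
        return self._sources


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nsswitch.conf")
        with open(path, "w") as f:
            f.write("passwd: files\n")
        long_running = OnceOnlyResolver(path)  # stands in for the old gdm process
        long_running.sources()                 # first lookup caches the list
        with open(path, "w") as f:
            f.write("passwd: files sss\n")     # stands in for authconfig's edit
        fresh = OnceOnlyResolver(path)         # stands in for a new process
        return long_running.sources(), fresh.sources()
```

In this model `demo()` returns two different source lists for the same file, which is the inconsistent view between long-running and short-running processes that the comment describes.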
There are a few possibilities on what we could do next:
1) Fix glibc to automatically detect when nsswitch.conf is updated and clear its cache
2) Add a new function to glibc a la res_init() but for nsswitch.conf instead of resolv.conf and make gdm call that function before doing getpwnam().
3) Make gdm fork a helper process any time it wants to call getpwnam() to ensure that getpwnam() always returns current information
4) Make gdm fail instead of crash. This would prevent users from being able to log in with sssd until they reboot, but at least wouldn't show a crash message in their syslog.
5) Make authconfig tell the user they need to reboot for changes to take effect.
6) Release note this limitation
1 and 2 would be the nicest fixes for me, but they may not be feasible on the glibc side. From a code aesthetics point of view, 3 loses, but it has the advantage of making everything work out of the box. 4, 5, and 6 all lose from a "Just Works" point of view.

The other workaround that comes to mind is using nscd. If nscd is available, then libc won't look at nsswitch.conf at all. The first time a call is tried when nscd is gone, it should fall back to looking at nsswitch.conf. So you could do some dance where nscd runs in a default configuration to begin with, and then when sssd is enabled you write nsswitch.conf first and then "service nscd stop". That should work, but it seems pretty fragile. The #3 sort of approach is really the only thing that is surely going to be robust without relying on intricate details of libc internals. For adding or changing libc behavior (#2 is plausible, #1 won't happen), you need to consult the upstream glibc maintainers, i.e. drepper.

Given nscd has other side-effects (although some of those orthogonal side-effects would be a good thing!), it's probably not appropriate to do at this stage in the development cycle. So it sounds like 2 would potentially be a better long term option, but 3 is better for rhel 6.
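
Option 3 can be sketched in Python as a lookup that runs in a newly started process, so the lookup never depends on name-service state cached by the long-running caller. This is an illustration of the idea, not the gdm workaround itself; `fresh_getpwnam` and `_HELPER` are hypothetical names, and the sketch assumes a Unix system where `sys.executable` points at a usable Python interpreter.

```python
import json
import subprocess
import sys

# Source of the short-lived helper. It performs one lookup and prints the
# result as JSON ("null" when the user does not exist).
_HELPER = r"""
import json, pwd, sys
try:
    p = pwd.getpwnam(sys.argv[1])
    print(json.dumps({"name": p.pw_name, "uid": p.pw_uid, "gid": p.pw_gid,
                      "dir": p.pw_dir, "shell": p.pw_shell}))
except KeyError:
    print("null")
"""


def fresh_getpwnam(name):
    """Look up `name` in a newly exec'd process; return a dict or None."""
    out = subprocess.run(
        [sys.executable, "-c", _HELPER, name],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

The sketch starts a new program rather than only forking, on the assumption that a forked child without an exec would inherit whatever name-service configuration the parent had already loaded. The cost is one process launch per lookup, which matches the "loses on code aesthetics" trade-off noted above.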

I'll add the workaround to GDM. This bug is getting a little crowded though, given it's covering two different issues, one that's already fixed in sssd and this new one that we need to work around in gdm. I'll clone this bug to cover the gdm work.

I'd like to add that this issue is only visible if authconfig is run and sets up SSSD and then a new login occurs before GDM has been restarted (e.g. reboot). I think a sufficient answer to this issue in RHEL 6 is a release note stating that a reboot is required after making changes to authconfig.

The gdm clone for this bug is bug 621700. I'm moving this bug back to MODIFIED. We'll use this bug to cover the specific sssd fix mentioned in comment 21

Verified the fix mentioned in comment #21. Version: sssd-1.2.1-26.el6.x86_64.

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.