Bug 607233 - SSSD users cannot log in through GDM
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sssd
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Stephen Gallagher
QA Contact: Chandrasekar Kannan
Duplicates: 621255
Depends On: 578303
Blocks: 579775 599016
 
Reported: 2010-06-23 11:13 EDT by Ray Strode [halfline]
Modified: 2015-01-04 18:42 EST
CC List: 14 users

Fixed In Version: sssd-1.2.1-21.el6
Doc Type: Bug Fix
Clone Of: 578303
Clones: 621700
Last Closed: 2010-11-10 16:40:00 EST


Attachments
Reconnect to sssd if it has gone away (676 bytes, patch)
2010-07-22 18:45 EDT, Ray Strode [halfline]
messages log (11.12 KB, text/plain)
2010-08-04 12:36 EDT, Jiri Koten

Description Ray Strode [halfline] 2010-06-23 11:13:01 EDT
+++ This bug was initially created as a clone of Bug #578303 +++

Description of problem:
Network-provided users can't log in through GDM. They stall after accepting the password and are never presented with a desktop.

Version-Release number of selected component (if applicable):
gdm-2.29.6-1.fc13

How reproducible:
Every time

Steps to Reproduce:
Install sssd >= 1.1.0 and authconfig >= 6.1.2
Run authconfig-gtk and set up the following settings (valid during the SSSDByDefault Test Day)

User Account Database: LDAP
LDAP Search Base DN: dc=fedoraproject,dc=org
LDAP Server: ldaps://fedoraproject.org
(download an appropriate certificate. For the test day it's http://jlaska.fedorapeople.org/sssd/cacert.asc)
Leave "Use TLS to encrypt connections" unchecked

Authentication Method: LDAP


Hit Apply. Now log out (or switch users) and attempt to log in as a user provided by LDAP (e.g. user 'sssdtest10016' with password 'sssdtest').

Actual results:
Password is accepted and GDM begins the process to load the desktop environment, but it hangs with nothing on the screen but the mouse cursor and the background visible.

Expected results:
The user should be presented with a functional desktop environment.

Additional info:

From /var/log/messages:

Mar 30 15:15:06 dhcp-100-3-105 gdm-binary[1619]: DEBUG(+): GdmDisplay: Adding authorization for user:sssdtest10017 on display :1
Mar 30 15:15:06 dhcp-100-3-105 gdm-binary[1619]: DEBUG(+): GdmDisplay: Adding user authorization for sssdtest10017
Mar 30 15:15:06 dhcp-100-3-105 gdm-simple-slave[3576]: WARNING: Failed to add user authorization: could not find user "sssdtest10017" on system
Mar 30 15:15:06 dhcp-100-3-105 gdm[3649]: ******************* START **********************************
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: [Thread debugging using libthread_db enabled]
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: [New Thread 0x7f5d9fbf7710 (LWP 3581)]
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: 0x00007f5da7cbc59d in waitpid () from /lib64/libpthread.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #0  0x00007f5da7cbc59d in waitpid () from /lib64/libpthread.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #1  0x000000000041f11b in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #2  0x000000000041f1c7 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #3  <signal handler called>
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #4  0x00007f5da71fb955 in raise () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #5  0x00007f5da71fd135 in abort () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #6  0x00007f5da75af184 in g_assertion_message () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #7  0x00007f5da75af730 in g_assertion_message_expr ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]:    from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #8  0x000000000041af1c in dbus_message_append_args ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #9  0x00007f5da7588db2 in g_main_context_dispatch ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]:    from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #10 0x00007f5da758cb98 in ?? () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #11 0x00007f5da758d0a5 in g_main_loop_run () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #12 0x00000000004072d5 in dbus_message_append_args ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #13 0x00007f5da71e6d2d in __libc_start_main () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #14 0x0000000000406f79 in dbus_message_append_args ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #15 0x00007fff1e939148 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #16 0x000000000000001c in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #17 0x0000000000000003 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #18 0x00007fff1e939da9 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #19 0x0000000000000000 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: 
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: Thread 2 (Thread 0x7f5d9fbf7710 (LWP 3581)):
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #0  0x00007f5da7cbba4d in read () from /lib64/libpthread.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #1  0x00007f5da7589f9b in ?? () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #2  0x00007f5da75b2164 in ?? () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #3  0x00007f5da7cb3e11 in start_thread () from /lib64/libpthread.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: one () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: 
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: Thread 1 (Thread 0x7f5da8872700 (LWP 3576)):
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #0  0x00007f5da7cbc59d in waitpid () from /lib64/libpthread.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #1  0x000000000041f11b in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #2  0x000000000041f1c7 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #3  <signal handler called>
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #4  0x00007f5da71fb955 in raise () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #5  0x00007f5da71fd135 in abort () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #6  0x00007f5da75af184 in g_assertion_message () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #7  0x00007f5da75af730 in g_assertion_message_expr ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]:    from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #8  0x000000000041af1c in dbus_message_append_args ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #9  0x00007f5da7588db2 in g_main_context_dispatch ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]:    from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #10 0x00007f5da758cb98 in ?? () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #11 0x00007f5da758d0a5 in g_main_loop_run () from /lib64/libglib-2.0.so.0
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #12 0x00000000004072d5 in dbus_message_append_args ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #13 0x00007f5da71e6d2d in __libc_start_main () from /lib64/libc.so.6
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #14 0x0000000000406f79 in dbus_message_append_args ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #15 0x00007fff1e939148 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #16 0x000000000000001c in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #17 0x0000000000000003 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #18 0x00007fff1e939da9 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #19 0x0000000000000000 in ?? ()
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: No symbol table info available.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: A debugging session is active.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: 
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: #011Inferior 1 [process 3576] will be detached.
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: 
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: Quit anyway? (y or n) [answered Y; input not from terminal]
Mar 30 15:15:07 dhcp-100-3-105 gdm[3649]: ******************* END **********************************

From http://git.gnome.org/browse/gdm/tree/daemon/gdm-display-access-file.c#n212
static gboolean
_get_uid_and_gid_for_user (const char *username,
                           uid_t      *uid,
                           gid_t      *gid)
{
        struct passwd *passwd_entry;

        g_assert (username != NULL);
        g_assert (uid != NULL);
        g_assert (gid != NULL);

        errno = 0;
        passwd_entry = getpwnam (username);

        if (passwd_entry == NULL) {
                return FALSE;
        }

        *uid = passwd_entry->pw_uid;
        *gid = passwd_entry->pw_gid;

        return TRUE;
}


The problem here is that if getpwnam(username) returns NULL, errno needs to be checked for EINTR. This indicates that a signal was received while waiting for the blocking call to return, and getpwnam() should be retried.
Comment 2 RHEL Product and Program Management 2010-06-23 11:32:53 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 3 Stephen Gallagher 2010-06-30 09:44:48 EDT
It might be prudent to update the summary of this bug. SSSD users CAN log in through GDM; however, they may occasionally be denied because of EINTR interactions.
Comment 5 Ray Strode [halfline] 2010-07-02 12:05:22 EDT
The -4 build had an issue; the fix should be in -5.
Comment 7 Jiri Koten 2010-07-14 06:35:03 EDT
NOT fixed, issue still present in gdm-2.30.4-5.el6. Actual results same as comment #0.
Comment 9 Ray Strode [halfline] 2010-07-14 09:28:40 EDT
Hi jiri,

Just to be sure: have you done a full reboot after installing the latest gdm package?
Comment 11 Jiri Koten 2010-07-15 08:18:49 EDT
I did further testing, and logging in as an SSSD user works after gdm is restarted (killall gdm-binary). Configuring authentication, logging out (or switching users), and then logging in as an SSSD user reproduces the issue.
Comment 12 Cameron Meadors 2010-07-21 14:20:22 EDT
More info.

<sgallagh> Basically, getpwnam can return EINTR if it receives a signal during execution
<sgallagh> This means they're supposed to call it again until it returns normal success or failure
Comment 13 Ray Strode [halfline] 2010-07-21 15:12:56 EDT
Of course, we do that now (as of comment 5), and it's still not working for Jiri, so something else is going on. This needs more investigation.
Comment 14 Ray Strode [halfline] 2010-07-22 18:45:23 EDT
Created attachment 433829 [details]
Reconnect to sssd if it has gone away

So I looked at this today with sgallagher.

The problem is that the sssd nsswitch module doesn't handle sssd going away very well.  Instead of trying to reconnect to the server it just fails.  This means after running authconfig (which restarts sssd) all existing processes that have talked to sssd before it was restarted will fail the next time they do a getpwnam() call (or whatever).

The above patch seems to fix the problem for me.  I'm not sure if it's the "right" fix though.  Will need someone on the sssd team to look over.
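The reconnect-on-failure idea described in this comment (and the "reconnect on all errors" variant in comment 15) can be illustrated with a generic UNIX-socket client. This is a minimal sketch under stated assumptions, not the attached patch or SSSD's real client code: struct nss_client, connect_unix() and client_send() are hypothetical names, and the socket path is whatever the caller supplies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical client state, not SSSD's actual data structures. */
struct nss_client {
        int         fd;    /* connected socket, or -1 when disconnected */
        const char *path;  /* UNIX socket path of the daemon (caller-supplied) */
};

/* Open a new stream connection to the UNIX socket at PATH; -1 on failure. */
static int
connect_unix (const char *path)
{
        struct sockaddr_un addr;
        int fd = socket (AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;
        memset (&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy (addr.sun_path, path, sizeof addr.sun_path - 1);
        if (connect (fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
                close (fd);
                return -1;
        }
        return fd;
}

/* Write all of BUF, retrying on EINTR. */
static bool
send_all (int fd, const char *buf, size_t len)
{
        while (len > 0) {
                ssize_t n = send (fd, buf, len, MSG_NOSIGNAL);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return false;
                }
                buf += n;
                len -= (size_t) n;
        }
        return true;
}

/* Send a request. On any failure (e.g. the daemon was restarted and the old
 * socket is dead), drop the connection, reconnect once, and try again. */
static bool
client_send (struct nss_client *c, const char *buf, size_t len)
{
        assert (c != NULL && buf != NULL);

        for (int attempt = 0; attempt < 2; attempt++) {
                if (c->fd < 0) {
                        c->fd = connect_unix (c->path);
                        if (c->fd < 0)
                                return false;
                }
                if (send_all (c->fd, buf, len))
                        return true;
                close (c->fd);
                c->fd = -1;
        }
        return false;
}
```

MSG_NOSIGNAL is used so that writing to a socket whose server has exited yields an EPIPE error the client can react to, instead of a SIGPIPE that would kill the calling process.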
Comment 15 Sumit Bose 2010-07-23 09:58:34 EDT
I have created an upstream bug to track this issue

https://fedorahosted.org/sssd/ticket/571

and have posted a slightly modified version of the patch (reconnect on all errors and no goto) to sssd-devel

https://fedorahosted.org/pipermail/sssd-devel/2010-July/004236.html
Comment 16 Ray Strode [halfline] 2010-07-23 10:40:43 EDT
I can confirm the modified patch works, too.
Comment 18 Stephen Gallagher 2010-08-04 11:46:04 EDT
*** Bug 621255 has been marked as a duplicate of this bug. ***
Comment 19 Jiri Koten 2010-08-04 12:36:50 EDT
Created attachment 436600 [details]
messages log

Still not working for me. I reproduced "the first login" by changing the auth. config to local accounts only, removing cached credentials (# rm /var/lib/sss/db/cache_default.ldb), rebooting, and then setting the auth. config to LDAP+Kerberos and switching user.

sssd-1.2.1-21.el6.x86_64
gdm-2.30.4-13.el6.x86_64
Comment 20 James Laska 2010-08-04 16:39:32 EDT
Moving back to ASSIGNED based on comment#19.  

Sumit: Is the procedure Jiri notes in comment#19 a correct way to reset the system configuration to reproduce this failure?
Comment 21 Stephen Gallagher 2010-08-05 07:50:54 EDT
That set of reproduction steps is incomplete. It doesn't describe where in those steps they are attempting logins.

Let me try to explain what's happening at each step here.

1) Change auth conf to local account only
This removes sss from nsswitch.conf and pam_sss from /etc/pam.d/[system|password]-auth as well as shutting the daemon down.

As a result, after this step, no user identity is looked up from SSSD.

2) Removing cached credentials
It is safe to purge the cache at this time, since the SSSD is not running. Be aware that this is also removing the cached user identities, so if the system is not online after this, it will not be able to return user information.

3) Rebooting
This step could be shortened to dropping into runlevel 3 and then returning to runlevel 5. I assume that the goal here is just to restart gdm.

I'm assuming that at this point the engineer is logging in using a local user account. At this time, no activity happens related to the SSSD. The sss client libraries aren't in use, thus this is NOT a valid test of this bug.

4) Setting auth conf to ldap+kerberos
This adds sss back into nsswitch.conf and starts up the SSSD daemon processes.

5) Switching user
This would actually be the first lookup to the SSSD. If this is failing at this time, then it's most likely that SSSD cannot reach the LDAP server or is experiencing a similar failure that results in it not answering the request. For this, I'd need to see /var/log/sssd/sssd_default.log (and I'd prefer that debug_level be set to 9 in sssd.conf).


So this approach is NOT testing the specific fix.

Testing this specific fix is actually very easy:
1) Use authconfig to set LDAP+Kerberos
1) telinit 3
2) Log in as root on the local console
3) service sssd stop
4) rm -f /var/lib/sss/db/cache_default.ldb
5) service sssd start
6) telinit 5
7) Log in to GDM as an LDAP user with the appropriate Kerberos password
8) service sssd restart
9) Log out of the logged-in user and log in again

Before this fix, that would crash. After this fix it should go smoothly.
Comment 22 Jiri Koten 2010-08-05 09:35:45 EDT
Your steps may be good for testing the specific fix in sssd, but they don't reproduce the steps from comment 0. The problem is in step 7: you start GDM with sssd already configured. But that worked even before the fix; see comment 11.

The use case is that you use a local user to configure sssd through authconfig-gtk and then switch to an sssd user (i.e. without restarting gdm).

Steps to reproduce:
1) Change auth. conf to local account only
2) telinit 3
3) telinit 5
4) Log in as a local user
5) Use authconfig to set LDAP+Kerberos
6) Switch user
7) Log in to GDM as an LDAP user with the appropriate Kerberos password

It seems to me that the problem is in gdm - feel free to clone this bug against gdm.
Comment 24 Ray Strode [halfline] 2010-08-05 16:15:58 EDT
Okay, so sgallagh and I spent a few hours looking into this today.

What's going on is that one of gdm's processes is very long-running. This process starts before sssd is configured in nsswitch.conf and continues to run after sssd is configured in nsswitch.conf.

The problem is, it seems that glibc will only read the list of modules from nsswitch.conf once for the lifetime of a process (the first time the process calls getpwnam()), so it doesn't notice that the system has been updated.  This gives the long running gdm process an inconsistent view of the world compared to the shorter running gdm processes, and gdm doesn't handle that inconsistency very robustly.

There are a few possibilities on what we could do next:

1) Fix glibc to automatically detect when nsswitch.conf is updated and clear its cache
2) Add a new function to glibc ala res_init() but for nsswitch.conf instead of resolv.conf and make gdm call that function before doing getpwnam().
3) Make gdm fork a helper process any time it wants to call getpwnam() to ensure that getpwnam() always returns current information
4) Make gdm fail instead of crash. This would prevent users from being able to log in with sssd until they reboot, but at least it wouldn't show a crash message in their syslog.
5) Make authconfig tell the user they need to reboot for changes to take effect.
6) Release note this limitation

1 and 2 would be the nicest fixes for me, but they may not be feasible on the glibc side. From a code-aesthetics point of view, 3 loses, but it has the advantage of making everything work out of the box. 4, 5, and 6 all lose from a "Just Works" point of view.
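Option 3 can be sketched roughly as follows. This is not the workaround that went into gdm (tracked in bug 621700); the helper name lookup_user_via_getent is hypothetical, and the sketch assumes the getent(1) utility is on PATH. It exec()s a new program rather than merely forking, because a forked child inherits the parent's already-loaded NSS module list and would see the same stale view of nsswitch.conf.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: resolve USERNAME by running "getent passwd USERNAME"
 * in a freshly exec'd child. The new process image has no cached NSS state,
 * so it reads nsswitch.conf as it is now. */
static bool
lookup_user_via_getent (const char *username, uid_t *uid, gid_t *gid)
{
        char buf[1024];
        size_t used = 0;
        int fds[2], status;
        pid_t pid;

        assert (username != NULL && uid != NULL && gid != NULL);

        if (pipe (fds) < 0)
                return false;
        pid = fork ();
        if (pid < 0) {
                close (fds[0]);
                close (fds[1]);
                return false;
        }
        if (pid == 0) {
                /* Child: route getent's stdout into the pipe, then exec it. */
                close (fds[0]);
                if (dup2 (fds[1], STDOUT_FILENO) < 0)
                        _exit (127);
                if (fds[1] != STDOUT_FILENO)
                        close (fds[1]);
                execlp ("getent", "getent", "passwd", username, (char *) NULL);
                _exit (127);
        }

        /* Parent: collect the child's output (a single passwd line). */
        close (fds[1]);
        while (used < sizeof buf - 1) {
                ssize_t n = read (fds[0], buf + used, sizeof buf - 1 - used);
                if (n < 0 && errno == EINTR)
                        continue;
                if (n <= 0)
                        break;
                used += (size_t) n;
        }
        close (fds[0]);
        buf[used] = '\0';

        while (waitpid (pid, &status, 0) < 0) {
                if (errno != EINTR)
                        return false;
        }
        /* getent exits non-zero when the user is not found. */
        if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
                return false;

        /* Parse "name:passwd:uid:gid:gecos:dir:shell". */
        char *p = strchr (buf, ':');
        if (p == NULL || (p = strchr (p + 1, ':')) == NULL)
                return false;
        char *end;
        unsigned long u = strtoul (p + 1, &end, 10);
        if (end == p + 1 || *end != ':')
                return false;
        char *gstart = end + 1;
        unsigned long g = strtoul (gstart, &end, 10);
        if (end == gstart || *end != ':')
                return false;

        *uid = (uid_t) u;
        *gid = (gid_t) g;
        return true;
}
```

With this approach, a call such as lookup_user_via_getent ("sssdtest10016", &uid, &gid) would see an LDAP user as soon as authconfig has rewritten nsswitch.conf, at the cost of one fork/exec per lookup.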
Comment 25 Roland McGrath 2010-08-05 16:34:51 EDT
The other workaround that comes to mind is using nscd.  If nscd is available, then libc won't look at nsswitch.conf at all.  The first time a call is tried when nscd is gone, it should fall back to looking at nsswitch.conf.  So you could do some dance where nscd runs in a default configuration to begin with, and then when sssd is enabled you write nsswitch.conf first and then "service nscd stop".  That should work, but it seems pretty fragile.  The #3 sort of approach is really the only thing that is surely going to be robust without relying on intricate details of libc internals.

For adding or changing libc behavior (#2 is plausible, #1 won't happen), you need to consult the upstream glibc maintainers, i.e. drepper.
Comment 26 Ray Strode [halfline] 2010-08-05 16:45:43 EDT
Given nscd has other side-effects (although some of those orthogonal side-effects would be a good thing!), it's probably not appropriate to do at this stage in the development cycle.

So it sounds like 2 would potentially be a better long-term option, but 3 is better for RHEL 6.

I'll add the workaround to GDM.

This bug is getting a little crowded though, given it's covering two different issues, one that's already fixed in sssd and this new one that we need to work around in gdm.

I'll clone this bug to cover the gdm work.
Comment 27 Stephen Gallagher 2010-08-05 16:50:56 EDT
I'd like to add that this issue is only visible if authconfig is run and sets up SSSD and then a new login occurs before GDM has been restarted (e.g. reboot).

I think a sufficient answer to this issue in RHEL 6 is a release note stating that a reboot is required after making changes to authconfig.
Comment 28 Ray Strode [halfline] 2010-08-05 17:50:01 EDT
The gdm clone for this bug is bug 621700.

I'm moving this bug back to MODIFIED.  We'll use this bug to cover the specific sssd fix mentioned in comment 21
Comment 30 Gowrishankar Rajaiyan 2010-08-20 04:36:47 EDT
Verified the fix mentioned in comment #21. 
Version: sssd-1.2.1-26.el6.x86_64.
Comment 31 releng-rhel@redhat.com 2010-11-10 16:40:00 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
