Description of problem:
I'm not sure if this is a bug in the kernel or an issue with sssd, but the problem is exhibited in sssd, so I'm starting there. Please reassign as necessary.

When I log into a system using a password with kerberos auth, it will succeed on the first attempt, but fail on subsequent attempts (once a ccache entry exists). It fails in get_uid_from_pid() (find_uid.c), specifically in the call to strtouint32(): while looping through processes and checking the Uid line in /proc/<pid>/status, it encounters a UID of -1.

    num = strtouint32(p, &endptr, 10);
    error = errno;
    if (error != 0) {
        DEBUG(1, ("strtol failed [%s].\n", strerror(error)));
        return error;
    }

(Tue Feb 28 14:44:46 2012) [sssd[be[EMPLOYEES]]] [get_uid_from_pid] (1): strtol failed [Numerical result out of range].
(Tue Feb 28 14:44:46 2012) [sssd[be[EMPLOYEES]]] [get_active_uid_linux] (1): get_uid_from_pid failed.
(Tue Feb 28 14:44:46 2012) [sssd[be[EMPLOYEES]]] [check_if_uid_is_active] (1): get_uid_table failed.
(Tue Feb 28 14:44:46 2012) [sssd[be[EMPLOYEES]]] [check_if_ccache_file_is_used] (1): check_if_uid_is_active failed.
(Tue Feb 28 14:44:46 2012) [sssd[be[EMPLOYEES]]] [krb5_auth_send] (1): check_if_ccache_file_is_used failed.

It's encountering a Uid of -1 because an nrpe process is defaulting to a UID of (2^32 - 1), which as far as I can tell is a perfectly acceptable UID since it's in the unsigned 32-bit range. With a UID of 4294967295, /proc/<pid>/status shows -1 instead of 4294967295.

[root@host tmp]$ ps -ef | grep nrpe
4294967295 32590 1 0 Feb28 ? 00:00:01 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
[root@host tmp]$ grep ^Uid /proc/32590/status
Uid: -1 -1 -1 -1

Version-Release number of selected component (if applicable):
kernel-2.6.32-220.el6.x86_64
sssd-1.5.1-66.el6_2.3.x86_64

How reproducible:

Steps to Reproduce:
1. Run a process with a UID of 2^32-1
2. While using kerberos for authentication, log in to the host twice

Actual results:
Login fails.

Expected results:
Login succeeds.
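For anyone following along, the lookup described above can be sketched roughly as follows. This is not SSSD's get_uid_from_pid(); first_uid_field() is a hypothetical helper that only shows how the first value on the Uid line of a status buffer might be isolated before it is handed to a numeric conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not SSSD code): copy the first value on the "Uid:"
 * line of a /proc/<pid>/status buffer into out.
 * Returns 0 on success, -1 if there is no Uid line or out is too small. */
static int first_uid_field(const char *status, char *out, size_t outlen)
{
    const char *p = status;
    size_t len;

    assert(status != NULL && out != NULL);

    /* Walk line by line until one starts with "Uid:". */
    while (p != NULL && strncmp(p, "Uid:", 4) != 0) {
        p = strchr(p, '\n');
        if (p != NULL) {
            p++;                    /* step past the newline */
        }
    }
    if (p == NULL) {
        return -1;
    }

    p += 4;
    p += strspn(p, " \t");          /* skip the separator after the label */
    len = strcspn(p, " \t\n");      /* the real UID is the first field */
    if (len == 0 || len >= outlen) {
        return -1;
    }

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

With the status contents shown in this report, the extracted field is the string "-1", which is the input that the unsigned conversion then rejects.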
I checked the nrpe source, and it's defaulting to calling setuid(-1) when it drops privileges and the 'nrpe' user (more specifically, the nrpe_user as defined in nrpe.cfg) doesn't exist on the system. So the -1 in /proc/<pid>/status makes sense.
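To illustrate why the two numbers refer to the same UID, here is a minimal sketch. It assumes uid_t is a 32-bit unsigned type (as on Linux with glibc); the function names are hypothetical and only show the integer conversions, not anything nrpe or the kernel actually calls.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* What setuid(-1) asks for once the argument is converted to uid_t. */
static uint32_t uid_from_setuid_arg(int arg)
{
    /* Assumption: uid_t is 32 bits wide and unsigned, as on Linux/glibc. */
    assert(sizeof(uid_t) == sizeof(uint32_t));
    return (uint32_t)(uid_t)arg;
}

/* The same 32 bits read back as a signed integer, which matches how the
 * Uid line in /proc/<pid>/status appears in this report. */
static int32_t uid_as_signed(uint32_t uid)
{
    /* Relies on two's complement wraparound, which mainstream compilers
     * provide for this conversion. */
    return (int32_t)uid;
}
```

Under those assumptions, uid_from_setuid_arg(-1) yields 4294967295 (what ps shows), and uid_as_signed() of that value yields -1 (what the status file shows).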
OK, the problem here is that SSSD assumes the UIDs it reads from /proc/<pid>/status are unsigned 32-bit integers, but the kernel prints them there as *signed* 32-bit integers. We're using strtouint32(), which internally converts the string to a signed long long and then checks that it's > 0. Apparently we were working under the faulty assumption that UIDs were guaranteed to be positive. I'll switch this conversion to use strtol32() instead of strtouint32() (and then cast the result to uint32_t). Thanks for the bug report!
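A rough sketch of that idea, for illustration only: this is not the actual SSSD patch and it does not use SSSD's conversion helpers. parse_proc_uid() is a hypothetical function built on the standard strtoll(); it accepts the signed form seen in this report, and as an extra assumption also accepts the unsigned form in case a kernel prints 4294967295 directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not SSSD code): parse one Uid field as printed in
 * /proc/<pid>/status. Returns 0 on success or an errno value on failure. */
static int parse_proc_uid(const char *field, uint32_t *uid)
{
    char *endptr = NULL;
    long long num;

    assert(field != NULL && uid != NULL);

    errno = 0;
    num = strtoll(field, &endptr, 10);
    if (errno != 0) {
        return errno;
    }
    if (endptr == field) {
        return EINVAL;              /* no digits were consumed */
    }
    if (num < INT32_MIN || num > UINT32_MAX) {
        return ERANGE;              /* does not fit in 32 bits either way */
    }

    *uid = (uint32_t)num;           /* -1 wraps to 4294967295 */
    return 0;
}
```

Parsing into a wider signed type first and range-checking before the cast keeps genuinely out-of-range input an error, while "-1" and "4294967295" both map to the same uint32_t value.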
Upstream ticket: https://fedorahosted.org/sssd/ticket/1216
Created attachment 566574
Tool to reproduce the issue

You can use the attached file nobody.c to reproduce this issue.

Build it with:
    gcc -o nobody nobody.c

To run it:
    setenforce 0
    ./nobody

If it works, you will see a message telling you that it's going into an infinite loop.

So, to reproduce this issue:
1) Configure SSSD for Kerberos auth
2) Start SSSD (do not start ./nobody until later)
3) Log in online with a Kerberos user
4) Start the "nobody" tool
5) Try to restart SSSD

Actual results:
SSSD fails to start completely, and the following log message appears in sssd_DOMAIN.log:
(Tue Feb 28 14:44:46 2012) [sssd[be[DOMAIN]]] [get_uid_from_pid] (1): strtol failed [Numerical result out of range].

Expected results:
SSSD should start as expected.

I wasn't able to duplicate the original situation where the login would fail (this might be due to differences between SSSD on RHEL 6.2 and 6.3), but the same behavior causes issues with restart, which would cause an outage if the monitor had to restart the sssd_be process.
Verified in version sssd-1.8.0-25

Output of Beaker automation run:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: verify bz 798655
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ PASS ] :: Running '> /var/log/sssd/sssd_LDAP-KRB5.log'
Stopping sssd: [ OK ]
:: [ PASS ] :: Running 'service sssd stop'
:: [ PASS ] :: Running 'rm -fr /var/lib/sss/db/*.ldb'
Starting sssd: [ OK ]
[ OK ]
:: [ PASS ] :: Running 'service sssd start'
:: [ PASS ] :: napping for 5 secs...
:: [ PASS ] :: Running 'restart_clearing_cache'
spawn ssh -q -l puser1 localhost echo 'login successful'
puser1@localhost's password:
login successful
:: [ PASS ] :: Authentication successful, as expected
:: [ PASS ] :: Running 'auth_success puser1 12345678'
:: [ PASS ] :: Running 'gcc -o /root/nobody /root/nobody.c'
:: [ PASS ] :: Running '/root/nobody &'
spawn ssh -q -l puser1 localhost echo 'login successful'
puser1@localhost's password:
login successful
:: [ PASS ] :: Authentication successful, as expected
:: [ PASS ] :: Running 'auth_success puser1 12345678'
:: [ PASS ] :: File '/var/log/sssd/sssd_LDAP-KRB5.log' should not contain 'strtol failed \[Numerical result out of range\]'
./bugzilla-automation.sh: line 257: 21804 Killed /root/nobody
Stopping sssd: [ OK ]
:: [ PASS ] :: Running 'service sssd stop'
:: [ PASS ] :: Running 'rm -fr /var/lib/sss/db/*.ldb'
Starting sssd: [ OK ]
[ OK ]
:: [ PASS ] :: Running 'service sssd start'
:: [ PASS ] :: napping for 5 secs...
:: [ PASS ] :: Running 'restart_clearing_cache'
'6cf818f6-cb75-4699-8445-dc11feb60f90'
verify-bz-798655 result: PASS
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: No documentation required
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0747.html