Bug 988068

Summary: getpwnam_r fails for non-existing users when sssd is not running
Product: [Fedora] Fedora
Reporter: Jochen De Smet <jochen.redhatbugs>
Component: sssd
Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 21
CC: abokovoy, codonell, jakub, jhrozek, jochen.redhatbugs, law, lslebodn, mkosek, pbrezina, pfrankli, preichl, sbose, schwab, sgallagh, spoyarek, ssorce, stefw
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-02-14 14:43:51 UTC
Type: Bug
Attachments:
SSS client hacky workaround (flags: none)

Description Jochen De Smet 2013-07-24 16:32:34 UTC
Description of problem:
In Bug 867473, "sss" was added to the default nsswitch.conf. This causes getpwnam_r to report an error when queried for a non-existing user, instead of just returning "user not found".  This breaks things like postfix's luser_relay functionality.

Version-Release number of selected component (if applicable):
sssd-1.10.0-16.fc19.armv7hl


How reproducible:
Always.

Steps to Reproduce:
1. Install F19; notice that sssd is installed by default, but not enabled
2. Compile this short test program:
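#include <sys/types.h>
#include <pwd.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
  struct passwd pwd;
  char buf[4096];
  int err;
  struct passwd *res;

  /* Require a user name so argv[1] is never NULL. */
  if (argc < 2) {
    fprintf(stderr, "usage: %s <username>\n", argv[0]);
    return 1;
  }

  /* Documented to return 0 on success and on "not found", an error number otherwise. */
  err = getpwnam_r(argv[1], &pwd, buf, sizeof buf, &res);

  printf("<%s> err: <%d>\n", argv[1], err);
  return 0;
}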


3. Run it with a non-existing user:  ./t unknown-user

Actual results:
# ./t unknown-user
<unknown-user> err: <2>


Expected results:
# ./t unknown-user
<unknown-user> err: <0>


Additional info:
Removing sss from nsswitch.conf results in the expected behaviour.
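For anyone reproducing this, a slightly more informative check looks at both the return value and the result pointer, since the man page distinguishes "found", "not found" and "error" that way. The following is only a sketch; the names lookup_outcome, classify_lookup and lookup_user are made up for illustration and are not part of any library.

```c
#include <sys/types.h>
#include <pwd.h>
#include <stddef.h>

/* Hypothetical names, used only for this illustration. */
enum lookup_outcome { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_ERROR };

/* Interpret getpwnam_r()'s return value and result pointer the way the
 * man page documents them: non-NULL result means found, 0 with a NULL
 * result means "no such user", anything else is an error number. */
enum lookup_outcome classify_lookup(int err, const struct passwd *res)
{
    if (res != NULL)
        return LOOKUP_FOUND;
    if (err == 0)
        return LOOKUP_NOT_FOUND;
    return LOOKUP_ERROR;
}

/* Look up a user by name and classify the outcome. */
enum lookup_outcome lookup_user(const char *name)
{
    struct passwd pwd;
    struct passwd *res = NULL;
    char buf[4096];
    int err = getpwnam_r(name, &pwd, buf, sizeof buf, &res);

    return classify_lookup(err, res);
}
```

With the behaviour reported here, lookup_user("unknown-user") would come back as LOOKUP_ERROR while sssd is stopped, rather than LOOKUP_NOT_FOUND.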

Comment 1 Lukas Slebodnik 2013-07-25 10:09:08 UTC
I don't think that this is an sssd bug.

I reproduced it successfully with nss-pam-ldapd

Steps to Reproduce:
1. Install F19
2. Install nss-pam-ldapd, but leave the nslcd service inactive
3. Configure NSS to use nss-pam-ldapd:
   the file /etc/nsswitch.conf must contain the line "passwd:     files ldap"
4. Compile the short test program (from the bug description)
5. Run it with a non-existing user:  ./t unknown-user

Results:
# ./t unknown-user
<unknown-user> err: <2>

Additional info:
Removing ldap from nsswitch.conf results in the behaviour you expected.

Comment 2 Lukas Slebodnik 2013-07-25 10:23:16 UTC
According to the NSS section of the glibc manual:
http://www.gnu.org/software/libc/manual/html_node/NSS-Modules-Interface.html

While the user-level function returns a pointer to the result, the reentrant functions return an enum nss_status value:
NSS_STATUS_TRYAGAIN (numeric value -2)
NSS_STATUS_UNAVAIL (numeric value -1)
NSS_STATUS_NOTFOUND (numeric value 0)
NSS_STATUS_SUCCESS (numeric value 1)

In case the interface function has to return an error it is important that the correct error code is stored in *errnop. Some return status values have only one associated error code, others have more.

NSS_STATUS_UNAVAIL    ENOENT    A necessary input file cannot be found.

In your case, sssd (_nss_sss_getpwnam_r) returned NSS_STATUS_UNAVAIL and set *errnop to ENOENT. ENOENT is defined as 2, and that is the value getpwnam_r returned in your example code.
sssd behaves exactly as described in the NSS-Modules-Interface documentation.
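To make the status/errnop pairing concrete, here is a minimal sketch of an NSS module entry point that reports "service unavailable". It assumes glibc's <nss.h>; the module name "example" and the flag example_daemon_running are invented for illustration and this is not the sssd client code.

```c
#include <errno.h>
#include <nss.h>
#include <pwd.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for "can the client reach its daemon?". */
bool example_daemon_running = false;

/* Entry point following the _nss_<service>_getpwnam_r interface from the
 * glibc manual. The service name "example" is an assumption. */
enum nss_status _nss_example_getpwnam_r(const char *name, struct passwd *result,
                                        char *buffer, size_t buflen, int *errnop)
{
    (void)name; (void)result; (void)buffer; (void)buflen;

    if (!example_daemon_running) {
        /* Backend not reachable: report UNAVAIL paired with ENOENT. */
        *errnop = ENOENT;
        return NSS_STATUS_UNAVAIL;
    }

    /* A real module would query its backend and fill in *result here;
     * this sketch simply reports that no entry matched. */
    *errnop = ENOENT;
    return NSS_STATUS_NOTFOUND;
}
```

Note that both branches store ENOENT in *errnop; only the status value tells the caller whether the service was unavailable or the entry was missing.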

Possible solutions:
* NSS_STATUS_UNAVAIL should be handled in nss code
* manual pages of getpwnam_r should be updated

Comment 3 Jakub Hrozek 2013-07-25 10:32:53 UTC
Seems like a glibc issue. I'm adding the glibc maintainer to the CC list.

Comment 4 Jochen De Smet 2013-07-25 13:09:33 UTC
To be clear, my main issue is that the default F19 configuration breaks postfix.

Whether there's an actual bug in sssd/postfix/glibc, or if it's simply a matter of needing to remove or properly configure sssd in the default install, I'll leave to you to decide.

Comment 5 Simo Sorce 2013-07-25 16:19:44 UTC
(In reply to Jochen De Smet from comment #4)
> To be clear, my main issue is that the default F19 configuration breaks
> postfix.

Understood.

> Whether there's an actual bug in sssd/postfix/glibc, or if it's simply a
> matter of needing to remove or properly configure sssd in the default
> install, I'll leave to you to decide.

You can certainly remove sss locally to work around the issue. However, this bug seems to be primarily a glibc inconsistency and, to a lesser degree, a postfix issue, in the sense that postfix is being a little too strict.

This is what the man page says about return values:

RETURN VALUE
       The  getpwnam()  and  getpwuid() functions return a pointer to a passwd
       structure, or NULL if the matching entry  is  not  found  or  an  error
       occurs.   If an error occurs, errno is set appropriately.  If one wants
       to check errno after the call, it should be  set  to  zero  before  the
       call.
[..]
       On  success, getpwnam_r() and getpwuid_r() return zero, and set *result
       to pwd.  If no matching password  record  was  found,  these  functions
       return  0 and store NULL in *result.  In case of error, an error number
       is returned, and NULL is stored in *result.

Now, for getpwnam_r() it is true that the doc says 0 is returned if the user is not found, and this is where I think glibc's bug is: it does not respect that when sss returns NSS_STATUS_UNAVAIL / ENOENT.
However, postfix could also be a little more lenient and treat 0 and ENOENT the same, as this is also in the man page:

ERRORS
       0 or ENOENT or ESRCH or EBADF or EPERM or ...
              The given name or uid was not found.
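A lenient caller along those lines might look like the sketch below. It treats the error values listed in the ERRORS excerpt above as "no such user". The function name lenient_user_missing is hypothetical, and this is not how postfix is actually written.

```c
#include <errno.h>
#include <pwd.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether a getpwnam_r() outcome should be
 * treated as "user does not exist", accepting the values the man page
 * lists under ERRORS for that case. */
bool lenient_user_missing(int err, const struct passwd *res)
{
    if (res != NULL)
        return false;            /* an entry was found */

    switch (err) {
    case 0:
    case ENOENT:
    case ESRCH:
    case EBADF:
    case EPERM:
        return true;             /* documented "not found" values */
    default:
        return false;            /* anything else stays a real error */
    }
}
```

The trade-off is that a genuinely unavailable service that reports ENOENT would also be read as "no such user", so a caller that needs to tell the two apart cannot use this approach.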

Comment 6 Carlos O'Donell 2013-08-20 06:15:48 UTC
I've read and re-read this issue a couple of times now and I come up with the same answer each time: this is a problem with the nss module for sss.

If you are the last lookup in a list of chained lookups and you return NSS_STATUS_UNAVAIL / ENOENT, that error will be propagated to the caller.

The error might have been hidden if you had a long list of lookups. The interface provides no way to inspect the failures of the lookups in the middle of the list. Thus if sss is in the middle of a list of lookups with the last lookup returning NSS_STATUS_NOTFOUND / ENOENT, then no error is returned.
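The propagation described above can be modelled roughly as follows. This is a deliberately simplified sketch and not glibc's implementation: it assumes the default actions (stop on success, otherwise continue) and assumes only the status of the last service consulted decides what the caller sees. The struct fake_service and the function chained_lookup are invented for illustration.

```c
#include <errno.h>
#include <nss.h>
#include <stddef.h>

/* Hypothetical stand-in for one configured service and what it reports. */
struct fake_service {
    const char *name;
    enum nss_status status;
    int err;                     /* value the module would store in *errnop */
};

/* Simplified model of a chained lookup: stop at the first SUCCESS,
 * otherwise move on to the next service. Returns what the caller of
 * getpwnam_r() would see: 0 or an error number. */
int chained_lookup(const struct fake_service *chain, size_t n)
{
    enum nss_status status = NSS_STATUS_UNAVAIL;
    int err = ENOENT;

    for (size_t i = 0; i < n; i++) {
        status = chain[i].status;
        err = chain[i].err;
        if (status == NSS_STATUS_SUCCESS)
            break;
    }

    /* SUCCESS and NOTFOUND are both "no error" for the caller. */
    if (status == NSS_STATUS_SUCCESS || status == NSS_STATUS_NOTFOUND)
        return 0;
    return err;
}
```

In this model, "files" reporting NOTFOUND followed by "sss" reporting UNAVAIL / ENOENT yields ENOENT, while the same two services in the opposite order yield 0, which matches the behaviour described in this comment.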

There are many ways to resolve this problem, but someone needs to make a choice amongst them. I see no problem with glibc's behaviour. If you do, please explain why you think there is a problem and what is inconsistent about it.

I think that it is correct for the sss nss modules to return NSS_STATUS_UNAVAIL / ENOENT since that is exactly the problem. The sssd daemon is not running and the service is unavailable and that is a helpful diagnostic.

Comments?

Comment 13 Carlos O'Donell 2013-08-20 15:58:48 UTC
I'm going to take this issue upstream to get a policy decision made and the documentation updated to clarify the exact situation we are facing here. Once we have a policy decision we can file specific bugs to fix issues.

In the meantime the glibc team is going to look at:

* Can we make glibc more conservative in Fedora and not return an error for service failures? The API would instead return no result and no error. Errors would only be returned for critical internal failures.

I suggest others look at:

* What would it take to work around this in the nss_sss module, e.g. do the wrong thing for the right reasons and not return NSS_STATUS_UNAVAIL / ENOENT?

* Have a default config for sssd such that the service can be started right away.

Both of these solutions would be temporary until we can fix glibc.
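One possible shape for the module-side workaround is a small remapping step applied just before the module returns. This is only a sketch assuming glibc's <nss.h>; the function name soften_unavailable is hypothetical, and it is not claimed to match the attached patch or any change made in sssd.

```c
#include <errno.h>
#include <nss.h>

/* Hypothetical remapping step: report "service unavailable because the
 * backend is missing" as "entry not found", so that callers of
 * getpwnam_r() see 0 instead of ENOENT. All other results pass through. */
enum nss_status soften_unavailable(enum nss_status status, int *errnop)
{
    if (status == NSS_STATUS_UNAVAIL && *errnop == ENOENT) {
        /* *errnop stays ENOENT, the code the manual pairs with NOTFOUND. */
        return NSS_STATUS_NOTFOUND;
    }
    return status;
}
```

The cost is the loss of the diagnostic: after the remapping, a stopped daemon is indistinguishable from a user that really does not exist.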

Comment 14 Carlos O'Donell 2013-08-20 16:00:54 UTC
I don't have an ETA for fixing this as we have quite a bit of other work, but I'll keep this issue updated.

Comment 15 Stephen Gallagher 2013-08-20 17:36:19 UTC
Created attachment 788567 [details]
SSS client hacky workaround

I'm not necessarily condoning this approach, but if we *do* decide to hack together a workaround in nss_sss.so.2, the attached patch should cover it.

Comment 19 Lukas Slebodnik 2014-12-18 10:58:02 UTC
After a long discussion, the workaround was merged in sssd upstream:
https://fedorahosted.org/sssd/ticket/2439

Therefore reassigning to sssd

Comment 20 Lukas Slebodnik 2014-12-18 11:00:02 UTC
The question is whether we want the patch in Fedora 19 or whether the ticket should be moved to Fedora 20.

Fedora 19 will reach end of life in a month.

Comment 21 Jakub Hrozek 2015-01-07 16:23:41 UTC
Fedora 19 is EOL now.

But I think it still makes sense to track that this issue was resolved in sssd upstream.

Comment 22 Jakub Hrozek 2015-01-07 16:24:17 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/2439

Comment 23 Fedora Update System 2015-01-19 13:34:42 UTC
sssd-1.12.3-3.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/sssd-1.12.3-3.fc21

Comment 24 Fedora Update System 2015-01-20 21:00:57 UTC
Package sssd-1.12.3-3.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.12.3-3.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-0900/sssd-1.12.3-3.fc21
then log in and leave karma (feedback).

Comment 25 Fedora Update System 2015-01-22 10:41:08 UTC
sssd-1.12.3-4.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/sssd-1.12.3-4.fc21

Comment 26 Fedora Update System 2015-02-02 17:21:01 UTC
sssd-1.12.3-4.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.