Bug 71546 - nss severely broken when using multiple name services
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 9
Hardware: i386 Linux
Priority: high  Severity: high
Assigned To: Nalin Dahyabhai
Jay Turner
Blocks: 66682
Reported: 2002-08-14 19:40 EDT by Chris Ricker
Modified: 2015-01-07 18:59 EST
Doc Type: Bug Fix
Last Closed: 2003-04-24 20:50:10 EDT
Description Chris Ricker 2002-08-14 19:40:52 EDT
nss does not appear to support simultaneous usage of multiple name services.
If two services are listed for a particular database, nss does not try the
first, then try the second.  Instead, it appears to try both to make sure they
both work, and only then actually query the first source, and then the
second if the first did not work.

To see this in action, configure two systems.  Make one an LDAP server, and the
other an LDAP client.  Verify that both local accounts (users in /etc/passwd &&
/etc/shadow, but not in LDAP) and distributed accounts (users in LDAP but not in
/etc/passwd) can log in on the client.  So far, so good.  Now, disable network
access on the client to the LDAP server (bring down the interface, for example;
I actually discovered this accidentally by misconfiguring resolv.conf, which
has the same effect if the LDAP server is specified by hostname and not by IP
address in /etc/ldap.conf).  Now, try to log in locally.  You can't do it!

The basic problem is this:  /etc/nsswitch.conf lists the services in the order
in which they should be tried.  In my client nsswitch.conf, the databases in
question are listed as:

passwd:  files ldap
shadow:  files ldap
group:  files ldap

That is, nss should use files, and if that returns anything other than SUCCESS,
then it should move on to try ldap.

However, that's not what appears to happen.  If LDAP is unavailable due to any
sort of network problem (downed interface, misconfigured resolv.conf, unplugged
cable, whatever), nss should still allow local, valid users to log in, since
files is listed before LDAP.  It doesn't do this, however; instead, even though
valid local files exist and contain valid entries for the users, it erroneously
will not allow authentication since LDAP is unavailable.  The error messages
logged are:

login[825]:  pam_ldap: ldap_simple_bind Can't contact LDAP server
login[825]:  Authentication service cannot retrieve authentication info.

so it looks like even though nsswitch.conf specifies files ldap, LDAP is still
being tried before files.  Even explicitly forcing nsswitch to ignore the second
source if the first source works (which shouldn't be necessary, since it's the
documented behavior for nss) by doing:

passwd:  files [SUCCESS=return] ldap
shadow:  files [SUCCESS=return] ldap
group:  files [SUCCESS=return] ldap

does not solve the problem; users who should be authenticated successfully
against files still cannot log in simply because ldap is not available.
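
For reference, the lookup order that nsswitch.conf documents can be modelled roughly as below. This is only an illustrative sketch, not glibc's implementation: the backends are stubs, the user "alice" and the helper names parse_sources and lookup are hypothetical, and negated criteria such as [!UNAVAIL=return] are not handled. The default actions (return on SUCCESS, continue otherwise) follow nsswitch.conf(5).

```python
# Illustrative model of the documented nsswitch.conf lookup order.
# This is NOT glibc code; all names and data here are hypothetical.
import re

# Default actions per nsswitch.conf(5): stop on SUCCESS, otherwise continue.
DEFAULT_ACTIONS = {
    "SUCCESS": "return",
    "NOTFOUND": "continue",
    "UNAVAIL": "continue",
    "TRYAGAIN": "continue",
}


def parse_sources(spec):
    """Parse e.g. 'files [SUCCESS=return] ldap' into [(source, actions), ...]."""
    entries = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            # A bracketed criterion modifies the source that precedes it.
            if not entries:
                raise ValueError("action list without a preceding source")
            for item in token[1:-1].split():
                status, action = item.split("=")
                entries[-1][1][status.upper()] = action.lower()
        else:
            entries.append((token, dict(DEFAULT_ACTIONS)))
    return entries


def lookup(spec, key, backends):
    """Query each source in order; stop when a source's action says 'return'."""
    for source, actions in parse_sources(spec):
        status, value = backends[source](key)
        if actions[status] == "return":
            return value if status == "SUCCESS" else None
    return None


# Stub backends standing in for /etc/passwd and an unreachable LDAP server.
local_users = {"alice": "alice:x:1000:1000::/home/alice:/bin/bash"}


def files_backend(name):
    if name in local_users:
        return "SUCCESS", local_users[name]
    return "NOTFOUND", None


def ldap_backend(name):
    return "UNAVAIL", None  # simulates "Can't contact LDAP server"


backends = {"files": files_backend, "ldap": ldap_backend}
print(lookup("files ldap", "alice", backends))  # found in files, ldap never asked
print(lookup("files ldap", "bob", backends))    # not in files, ldap unavailable
```

Under this model a local user is returned from files without ldap ever being queried, which is the behavior expected of the configurations shown above.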

Obviously, this is a major bug.  Just to make matters interesting, it's NOT
buggy for all nss databases, since at least hosts is handled correctly.  If
nsswitch.conf lists

hosts:  files dns

and dns is unavailable, files is still correctly queried to resolve hosts
(simulated by unplugging network cable, preventing access to nameserver; hosts
listed in /etc/hosts were still correctly resolved after this).  Because it
works for hosts but not for user authentications, I'm not sure if the bug is in
nss or in pam, or a combination of the two.
Comment 1 Chris Ricker 2002-08-22 22:41:39 EDT
This is still true with null
Comment 2 Chris Ricker 2003-01-07 00:54:09 EST
See also

Bug 63631: local users never authenticated if ldap server down
Bug 58568: nis for host files always used, regardless of nsswitch.conf
Bug 66682: nis for user files always used, regardless of nsswitch.conf
Comment 3 Chris Ricker 2003-04-03 09:27:59 EST
I still see this on RHL 9 with glibc-2.3.2-11.9 -- even with

passwd: files ldap

in nsswitch.conf, local files are never read if ldap is unavailable
Comment 4 Ulrich Drepper 2003-04-17 02:37:32 EDT
I do not think this is a problem in glibc.  If you strace getent such as

  strace getent passwd <user-in-/etc/passwd>

you'll see that the request won't hit the wire; no contact with the LDAP server
is made.
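
The same check can be made from a script that goes through the libc passwd lookup only, with no PAM involved. The following is a minimal sketch, assuming Python's standard pwd module (which calls getpwnam) is available; the helper name nss_passwd_lookup and the choice of "root" as the test user are just for illustration.

```python
# Minimal sketch: query the passwd database through libc's getpwnam(),
# which consults the sources listed in /etc/nsswitch.conf. PAM is not involved.
import pwd


def nss_passwd_lookup(name):
    """Return the passwd entry for `name`, or None if no source has it."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        return None


entry = nss_passwd_lookup("root")  # "root" is only an example user
if entry is None:
    print("root: not found by any configured source")
else:
    print(f"{entry.pw_name}: uid={entry.pw_uid} home={entry.pw_dir}")
```

Running it under strace for a user who exists in /etc/passwd should show whether any connection to the LDAP server is attempted.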

Instead I fear this is a problem in PAM and the way it does the authentication.
PAM most likely has some code which, accidentally or not, requires all system
services to succeed.

Somebody who actually knows PAM will have to comment.  I'll try to get somebody
to look at this and if necessary will reassign the bug.
Comment 5 Ulrich Drepper 2003-04-24 01:42:23 EDT
See last comment.  I cannot reproduce any problem with just the functions in libc.
Comment 6 Chris Ricker 2003-04-24 10:30:03 EDT
PAM may be involved, but nss in glibc is definitely broken as well.

Do the following.

1. set up a workstation (I used 10.100.0.7 as the IP)
2. on it, put

hosts: files dns

in nsswitch.conf

3. in /etc/resolv.conf, use a nameserver across the network (I used 10.100.0.254)

4. put a bogus host in /etc/hosts; I used

192.168.0.7 slartibartfast.example.com slartibartfast

5. do

getent hosts slartibartfast

with the network cable unplugged, and discover that /etc/hosts is never consulted

look at

strace getent hosts slartibartfast

with the network cable unplugged, and discover that glibc is still trying to
connect to the name server

6. do

getent hosts slartibartfast

with the network cable plugged back in, and discover that /etc/hosts is consulted


Comment 7 Ulrich Drepper 2003-04-24 20:50:10 EDT
You should look at the getent sources.  The DNS server contact is necessary
since getent first looks for an IPv6 address.  Your /etc/hosts file doesn't
contain any and therefore the next service (DNS) is used.  Therefore the NSS
behavior is 100% correct.

Now, how to avoid unnecessary IPv6 lookups.  First, don't use gethostbyname2. 
Use getaddrinfo.  The current code, when used with AF_UNSPEC should already not
use DNS.  It should find the entry in /etc/hosts and be done.

The one remaining problem is the case where AF_UNSPEC is used (since the program
is IPv6 enabled, which makes this frequent) but the system doesn't need IPv6
addresses since it has no such interfaces.  The POSIX standard already has a
solution for this: the AI_ADDRCONFIG flag.  I have implemented this now and
it'll be in the next release.
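
As a rough illustration of the getaddrinfo approach described above, here is a sketch using Python's socket module, which wraps the libc getaddrinfo. The helper name resolve and its use_addrconfig parameter are hypothetical. With AI_ADDRCONFIG set, the resolver is asked to return addresses of a family only if the system has an address of that family configured.

```python
# Sketch: resolve a host name with getaddrinfo and AF_UNSPEC, optionally
# passing AI_ADDRCONFIG so address families the system has no configured
# address for are skipped. Helper name and parameters are illustrative.
import socket


def resolve(host, use_addrconfig=True):
    """Return the sorted, de-duplicated addresses for `host` ([] on failure)."""
    flags = socket.AI_ADDRCONFIG if use_addrconfig else 0
    try:
        infos = socket.getaddrinfo(
            host,
            None,                      # no service/port needed for a name lookup
            family=socket.AF_UNSPEC,   # accept IPv4 and IPv6 results
            type=socket.SOCK_STREAM,   # avoid one result per socket type
            flags=flags,
        )
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


print(resolve("localhost"))
```

Comparing strace output for this against a gethostbyname2-based lookup is one way to see which calls actually reach the name server.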

But to summarize: this is *not* the problem.  If there is any application which
hangs it either means the application itself is buggy/badly written (e.g., using
gethostbyname2) or PAM is buggy.

I just implemented a new lookup mode for getent: "ahosts".  It works just like
hosts, but it uses getaddrinfo.  Look at the strace output when you use this new
version.  Then you can see that I'm right.


I'm closing this bug with CURRENTRELEASE.  The new getaddrinfo feature isn't
necessary, it's just a bonus.
Comment 8 Chris Ricker 2003-04-25 07:06:31 EDT
Thanks for the detailed explanation -- sorry about the confusion on my part.

You're definitely right about host lookups honoring NSS, and I was wrong. If I
only have an IPv4 addr in /etc/hosts, the getent lookup of it does:

open("/etc/nsswitch.conf", O_RDONLY)    = 3
close(3)                                = 0
...
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("10.100.0.254")}, 28) = 0
close(3)                                = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("10.100.0.254")}, 28) = 0
close(3)                                = 0
...
open("/etc/hosts", O_RDONLY)            = 3
close(3)                                = 0

Once I add an IPv6 site-local addr for the same host, getent just does

open("/etc/nsswitch.conf", O_RDONLY)    = 3
close(3)                                = 0
...
open("/etc/hosts", O_RDONLY)            = 3
close(3)                                = 0

and never tries to contact the name server....

However, the original bug report here is *not* fixed. It just looks like it's
due to something in PAM, not in NSS. Should it be reopened and reassigned (back
;-) to pam?
Comment 9 Ulrich Drepper 2003-04-25 13:49:04 EDT
The problem is that confirmation is needed.  Without evidence of the smoking gun
this is an unattractive problem to pick up.  Ideally you would be able to look
at the exact PAM services and also the program itself and point to the problem.
 If you cannot do this, by all means, reopen and reassign the bug.
Comment 10 Darren Gamble 2003-08-25 15:03:04 EDT
We also have this problem.  We are trying to set up a machine to fail back to a
password file if LDAP fails.  Users out of either the password file or the LDAP
directory can authenticate if the directory server is running, but if the LDAP
service is shut off or otherwise unavailable, no users can authenticate, even
those who exist only in the password file.

This is a huge problem for us, and we'd really like to get it fixed.  The status
of the bug right now is "CLOSED - CURRENTRELEASE", but I don't see how this can
be since RH9 is (currently) the most recent release.  Is there a rawhide package
that fixes this, or has the bug just been put into the wrong status?

If the bug has not actually been fixed, is there something that we could try in
order to isolate and identify the problem?
Comment 11 Chris Ricker 2003-08-25 15:18:15 EDT
It's closed b/c what I was filing it about (NSS in glibc being broken) in fact
is fixed.

The underlying problem which led me to believe incorrectly that NSS was broken
(local logins using local account data failing if distributed information
service is down) is still there. Rather than re-opening this bug, you're
probably better off looking at Bug #63631 which is open, has fixes, etc.
