Bug 71546

Summary:	nss severely broken when using multiple name services
Product:	[Retired] Red Hat Linux
Reporter:	Chris Ricker <chris.ricker>
Component:	glibc
Assignee:	Nalin Dahyabhai <nalin>
Status:	CLOSED CURRENTRELEASE
QA Contact:	Jay Turner <jturner>
Severity:	high
Priority:	high
Version:	9
CC:	aleksey, drepper, fweimer, srevivo
Hardware:	i386
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2003-04-25 00:50:10 UTC
Bug Blocks:	66682

Description Chris Ricker 2002-08-14 23:40:52 UTC

nss does not appear to support simultaneous usage of multiple name services.
If two services are listed for a particular database, nss does not try the
first and then fall back to the second.  Instead, it appears to contact both to
make sure they both work, and only then queries the first source, followed by
the second if the first did not succeed.

To see this in action, configure two systems.  Make one an LDAP server, and the
other an LDAP client.  Verify that both local accounts (users in /etc/passwd &&
/etc/shadow, but not in LDAP) and distributed accounts (users in LDAP but not in
/etc/passwd) can log in on the client.  So far, so good.  Now, disable network
access on the client to the LDAP server (bring down the interface, for example;
I actually discovered this accidentally by misconfiguring resolv.conf, which
has the same effect if the LDAP server is specified by hostname rather than by
IP address in /etc/ldap.conf).  Now, try to log in locally.  You can't do it!

The basic problem is this:  /etc/nsswitch.conf lists the services in the order
in which they should be tried.  In my client nsswitch.conf, the databases in
question are listed as:

passwd:  files ldap
shadow:  files ldap
group:  files ldap

That is, nss should use files, and if that returns anything other than SUCCESS,
then it should move on to try ldap.
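The expected ordering can be sketched as a toy model.  This is not glibc code:
the status names mirror the ones nsswitch.conf uses, but the source functions,
the sample account "alice", and the lookup() helper are all hypothetical and
exist only to illustrate "try files first, move on to ldap only if files did
not return SUCCESS".

```python
from typing import Callable, List, Optional, Tuple

# Status names as used in nsswitch.conf; the rest of this file is illustrative.
SUCCESS, NOTFOUND, UNAVAIL, TRYAGAIN = "SUCCESS", "NOTFOUND", "UNAVAIL", "TRYAGAIN"

Source = Callable[[str], Tuple[str, Optional[str]]]


def lookup(key: str, chain: List[Tuple[str, Source]]):
    """Walk the sources in order and stop at the first one returning SUCCESS.

    Returns (entry, names_of_sources_tried).
    """
    tried = []
    for name, source in chain:
        tried.append(name)
        status, entry = source(key)
        if status == SUCCESS:
            return entry, tried  # later sources are never contacted
    return None, tried


def files_source(key: str):
    # Hypothetical stand-in for /etc/passwd.
    local = {"alice": "alice:x:500:500::/home/alice:/bin/bash"}
    return (SUCCESS, local[key]) if key in local else (NOTFOUND, None)


def ldap_source_down(key: str):
    # Hypothetical stand-in for an unreachable LDAP server.
    return (UNAVAIL, None)


chain = [("files", files_source), ("ldap", ldap_source_down)]
print(lookup("alice", chain))  # only "files" is tried
print(lookup("bob", chain))    # "files" misses, so "ldap" is tried as well
```

In this model a local user is resolved without the unavailable ldap source
ever being touched, which is the behavior the report expects from
"passwd:  files ldap".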

However, that's not what appears to happen.  If LDAP is unavailable due to any
sort of network problem (downed interface, misconfigured resolv.conf, unplugged
cable, whatever), nss should still allow local, valid users to log in, since
files is listed before LDAP.  It doesn't do this, however; instead, even though
valid local files exist and contain valid entries for the users, it erroneously
will not allow authentication since LDAP is unavailable.  The error messages
logged are:

login[825]:  pam_ldap: ldap_simple_bind Can't contact LDAP server
login[825]:  Authentication service cannot retrieve authentication info.

so it looks like even though nsswitch.conf specifies files ldap, LDAP is still
being tried before files.  Even explicitly forcing nsswitch to ignore the second
source if the first source works (which shouldn't be necessary, since it's the
documented behavior for nss) by doing:

passwd:  files [SUCCESS=return] ldap
shadow:  files [SUCCESS=return] ldap
group:  files [SUCCESS=return] ldap

does not solve the problem; users who should be authenticated successfully
against files still cannot login simply because ldap is not available.
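To make the structure of those lines concrete, here is a minimal sketch of
splitting an nsswitch.conf entry into its sources and bracketed actions.  The
function name parse_nsswitch_line is hypothetical, and the parser deliberately
ignores parts of the real syntax (such as negated statuses written with "!");
it is meant only to show that [SUCCESS=return] attaches to the source that
precedes it.

```python
import re


def parse_nsswitch_line(line):
    """Split 'db: src [STATUS=action] src ...' into (db, [(src, {STATUS: action})])."""
    database, _, rest = line.partition(":")
    sources = []
    # Tokens are either a bracketed action list or a bare source name.
    for token in re.findall(r"\[[^\]]*\]|\S+", rest):
        if token.startswith("["):
            if not sources:
                raise ValueError("action list appears before any source")
            for item in token[1:-1].split():
                status, _, action = item.partition("=")
                sources[-1][1][status.upper()] = action.lower()
        else:
            sources.append((token, {}))
    return database.strip(), sources


print(parse_nsswitch_line("passwd:  files [SUCCESS=return] ldap"))
print(parse_nsswitch_line("passwd:  files ldap"))
```

Since returning on SUCCESS is already the documented default, both forms
should describe the same lookup order, which is why adding the explicit
action was not expected to change anything.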

Obviously, this is a major bug.  Just to make matters interesting, it does NOT
affect all nss databases, since at least hosts is handled correctly.  If
nsswitch.conf lists

hosts:  files dns

and dns is unavailable, files is still correctly queried to resolve hosts
(simulated by unplugging network cable, preventing access to nameserver; hosts
listed in /etc/hosts were still correctly resolved after this).  Because it
works for hosts but not for user authentications, I'm not sure if the bug is in
nss or in pam, or a combination of the two.

Comment 1 Chris Ricker 2002-08-23 02:41:39 UTC

This is still true with null

Comment 2 Chris Ricker 2003-01-07 05:54:09 UTC

See also

Bug 63631: local users never authenticated if ldap server down
Bug 58568: nis for host files always used, regardless of nsswitch.conf
Bug 66682: nis for user files always used, regardless of nsswitch.conf

Comment 3 Chris Ricker 2003-04-03 14:27:59 UTC

I still see this on RHL 9 with glibc-2.3.2-11.9 -- even with

passwd: files ldap

in nsswitch.conf, local files are never read if ldap is unavailable

Comment 4 Ulrich Drepper 2003-04-17 06:37:32 UTC

I do not think this is a problem in glibc.  If you strace getent, for example

  strace getent passwd <user-in-/etc/passwd>

you'll see that the request never hits the wire: no contact with the LDAP
server is made.
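The same check can be made from a small program instead of getent.  A minimal
sketch, assuming a Python interpreter is available: the standard pwd module
goes through the C library's passwd lookup, so it follows nsswitch.conf the
same way.  The helper name lookup_passwd and the user name "root" are only
illustrative.

```python
import pwd


def lookup_passwd(name):
    """Return the passwd entry for name via the C library, or None if no source has it."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        return None


entry = lookup_passwd("root")
print(entry.pw_dir if entry else "no such user")
```

Running the script under strace with -e trace=network should show whether any
network connection is attempted while a user from /etc/passwd is looked up.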

Instead I fear this is a problem in PAM and the way it does the authentication.
PAM most likely has some code which, accidentally or not, requires all system
services to succeed.

Somebody who actually knows PAM will have to comment.  I'll try to get somebody
to look at this and if necessary will reassign the bug.

Comment 5 Ulrich Drepper 2003-04-24 05:42:23 UTC

See last comment.  I cannot reproduce any problem with just the functions in libc.

Comment 6 Chris Ricker 2003-04-24 14:30:03 UTC

PAM may be involved, but nss in glibc is definitely broken as well.

Do the following.

1. set up a workstation (I used 10.100.0.7 as the IP)
2. on it, put

hosts: files dns

in nsswitch.conf

3. in /etc/resolv.conf, use a nameserver across the network (I used 10.100.0.254)

4. put a bogus host in /etc/hosts; I used

192.168.0.7 slartibartfast.example.com slartibartfast

5. do

getent hosts slartibartfast

with the network cable unplugged, and discover that /etc/hosts is never consulted

look at

strace getent hosts slartibartfast

with the network cable unplugged, and discover that glibc is still trying to
connect to the name server

6. do

getent hosts slartibartfast

with the network cable plugged back in, and discover that /etc/hosts is consulted

Comment 7 Ulrich Drepper 2003-04-25 00:50:10 UTC

You should look at the getent sources.  The DNS server contact is necessary
since getent first looks for an IPv6 address.  Your /etc/hosts file doesn't
contain any and therefore the next service (DNS) is used.  Therefore the NSS
behavior is 100% correct.
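The point about the missing IPv6 entry can be checked with a small sketch that
reports which address families a hosts file offers for a name.  This is an
illustration only, not how glibc parses /etc/hosts; the function name
families_in_hosts and the sample fec0::7 address are assumptions made for the
example.

```python
import ipaddress


def families_in_hosts(hosts_text, name):
    """Return the address families ('AF_INET', 'AF_INET6') listed for name."""
    wanted = name.lower()
    found = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) < 2 or wanted not in (f.lower() for f in fields[1:]):
            continue
        try:
            # Strip an optional "%scope" suffix before parsing the address.
            addr = ipaddress.ip_address(fields[0].split("%", 1)[0])
        except ValueError:
            continue  # not a valid address, skip the line
        found.add("AF_INET6" if addr.version == 6 else "AF_INET")
    return found


sample = "192.168.0.7 slartibartfast.example.com slartibartfast\n"
print(families_in_hosts(sample, "slartibartfast"))
```

With only the IPv4 line present the result contains AF_INET alone, so an
IPv6 request for that name finds nothing in files and moves on to the next
configured service.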

Now, how to avoid unnecessary IPv6 lookups.  First, don't use gethostbyname2.
Use getaddrinfo.  The current code, when used with AF_UNSPEC, should already not
use DNS.  It should find the entry in /etc/hosts and be done.

The one remaining problem is the case where AF_UNSPEC is used (since the program
is IPv6 enabled, which makes this frequent) but the system doesn't need IPv6
addresses since it has no such interfaces.  The POSIX standard already has a
solution for this: the AI_ADDRCONFIG flag.  I have implemented this now and
it'll be in the next release.
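From the caller's side, the AF_UNSPEC plus AI_ADDRCONFIG combination might look
like the following sketch.  It uses Python's socket.getaddrinfo wrapper rather
than the C call directly; the helper name resolve and its addrconfig parameter
are assumptions made for the example, and the exact results depend on the
resolver and the interfaces configured on the machine.

```python
import socket


def resolve(host, addrconfig=True):
    """Resolve host for any address family; return [(family_name, address)].

    With addrconfig=True the AI_ADDRCONFIG flag asks the resolver to return
    only families for which the system has a configured address.
    """
    flags = socket.AI_ADDRCONFIG if addrconfig else 0
    try:
        infos = socket.getaddrinfo(host, None, family=socket.AF_UNSPEC,
                                   type=socket.SOCK_STREAM, flags=flags)
    except socket.gaierror:
        return []  # name could not be resolved
    return [(socket.AddressFamily(family).name, sockaddr[0])
            for family, _type, _proto, _canon, sockaddr in infos]
```

Calling resolve("slartibartfast") on the test workstation and watching the
process with strace is one way to compare its behavior with the
gethostbyname2-based lookup.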

But to summarize: this is *not* the problem.  If there is any application which
hangs it either means the application itself is buggy/badly written (e.g., using
gethostbyname2) or PAM is buggy.

I just implemented a new lookup mode for getent: "ahosts".  It works just like
hosts but uses getaddrinfo.  Look at the strace output when you use this new
version.  Then you can see that I'm right.


I'm closing this bug with CURRENTRELEASE.  The new getaddrinfo feature isn't
necessary, it's just a bonus.

Comment 8 Chris Ricker 2003-04-25 11:06:31 UTC

Thanks for the detailed explanation -- sorry about the confusion on my part.

You're definitely right about host lookups honoring NSS, and I was wrong. If I
only have an IPv4 addr in /etc/hosts, the getent lookup of it does:

open("/etc/nsswitch.conf", O_RDONLY)    = 3
close(3)                                = 0
...
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("10.100.0.254")}, 28) = 0
close(3)                                = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("10.100.0.254")}, 28) = 0
close(3)                                = 0
...
open("/etc/hosts", O_RDONLY)            = 3
close(3)                                = 0

Once I add an IPv6 site-local addr for the same host, getent just does

open("/etc/nsswitch.conf", O_RDONLY)    = 3
close(3)                                = 0
...
open("/etc/hosts", O_RDONLY)            = 3
close(3)                                = 0

and never tries to contact the name server....

However, the original bug report here is *not* fixed. It just looks like it's
due to something in PAM, not in NSS. Should it be reopened and reassigned (back
;-) to pam?

Comment 9 Ulrich Drepper 2003-04-25 17:49:04 UTC

The problem is that confirmation is needed.  Without evidence of the smoking gun
this is an unattractive problem to pick up.  Ideally you would be able to look
at the exact PAM services and also the program itself and point to the problem.
If you cannot do this, by all means, reopen and reassign the bug.

Comment 10 Darren Gamble 2003-08-25 19:03:04 UTC

We also have this problem.  We are trying to set up a machine to fall back to a
password file if LDAP fails.  Users out of either the password file or the LDAP
directory can authenticate if the directory server is running, but if the LDAP
service is shut off or otherwise unavailable, no users can authenticate, even
those who exist only in the password file.

This is a huge problem for us, and we'd really like to get it fixed.  The status
of the bug right now is "CLOSED - CURRENTRELEASE", but I don't see how this can
be since RH9 is (currently) the most recent release.  Is there a rawhide package
that fixes this, or has the bug just been put into the wrong status?

If the bug has not actually been fixed, is there something that we could try in
order to isolate and identify the problem?

Comment 11 Chris Ricker 2003-08-25 19:18:15 UTC

It's closed b/c what I was filing it about (NSS in glibc being broken) in fact
is fixed.

The underlying problem which led me to believe incorrectly that NSS was broken
(local logins using local account data failing if distributed information
service is down) is still there. Rather than re-opening this bug, you're
probably better off looking at Bug #63631 which is open, has fixes, etc.