Bug 71546
| Summary: | nss severely broken when using multiple name services | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Chris Ricker <chris.ricker> |
| Component: | glibc | Assignee: | Nalin Dahyabhai <nalin> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jay Turner <jturner> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 9 | CC: | aleksey, drepper, fweimer, srevivo |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2003-04-25 00:50:10 UTC | Type: | --- |
| Bug Blocks: | 66682 | ||
Description
Chris Ricker
2002-08-14 23:40:52 UTC
This is still true with null.

See also
Bug 63631: local users never authenticated if ldap server down
Bug 58568: nis for host files always used, regardless of nsswitch.conf
Bug 66682: nis for user files always used, regardless of nsswitch.conf

I still see this on RHL 9 with glibc-2.3.2-11.9 -- even with
passwd: files ldap
in nsswitch.conf, local files are never read if ldap is unavailable.

I do not think this is a problem in glibc. If you strace getent, such as
strace getent passwd <user-in-/etc/passwd>
you'll see that the request won't hit the wire: no contact to the LDAP server is made. Instead I fear this is a problem in PAM and the way it does the authentication. PAM most likely has some code which, accidentally or not, requires all system services to succeed. Somebody who actually knows PAM will have to comment. I'll try to get somebody to look at this and if necessary will reassign the bug.

See last comment. I cannot reproduce any problem with just the functions in libc.

PAM may be involved, but nss in glibc is definitely broken as well. Do the following.
1. set up a workstation (I used 10.100.0.7 as the IP)
2. on it, put
   hosts: files dns
   in nsswitch.conf
3. in /etc/resolv.conf, use a nameserver across the network (I used 10.100.0.254)
4. put a bogus host in /etc/hosts; I used
   192.168.0.7 slartibartfast.example.com slartibartfast
5. do getent hosts slartibartfast with the network cable unplugged, and discover that /etc/hosts is never consulted
   look at strace getent hosts slartibartfast with the network cable unplugged, and discover that glibc is still trying to connect to the name server
6. do getent hosts slartibartfast with the network cable plugged back in, and discover that /etc/hosts is consulted

You should look at the getent sources. The DNS server contact is necessary since getent first looks for an IPv6 address. Your /etc/hosts file doesn't contain any and therefore the next service (DNS) is used. Therefore the NSS behavior is 100% correct.
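
The fallthrough described above can be pictured with a small toy model. This is only a sketch, not glibc code: the function names, the in-memory stand-in for /etc/hosts, and the fec0::7 address are all made up for illustration, and the IPv6-then-IPv4 order in getent_hosts is an assumption based on the statement that getent first looks for an IPv6 address.

```python
# Toy model of the lookup order described above. Every name and address
# here is illustrative; none of this is glibc's actual implementation.

def make_files_service(table):
    """Return a lookup over an in-memory stand-in for /etc/hosts."""
    def lookup(name, family):
        return table.get((name, family))
    return lookup

def dns_unreachable(name, family):
    """Stand-in for DNS with the network cable unplugged: never answers."""
    return None

def nss_lookup(name, family, services):
    """Walk the services in nsswitch.conf order until one of them answers."""
    consulted = []
    for label, lookup in services:
        consulted.append(label)
        address = lookup(name, family)
        if address is not None:
            return address, consulted
    return None, consulted

def getent_hosts(name, services):
    """Ask for an IPv6 address first, then fall back to IPv4 (assumed order)."""
    trace = []
    for family in ("inet6", "inet"):
        address, consulted = nss_lookup(name, family, services)
        trace.append((family, consulted))
        if address is not None:
            return address, trace
    return None, trace

# /etc/hosts with only the IPv4 entry from the reproduction steps.
IPV4_ONLY = {("slartibartfast", "inet"): "192.168.0.7"}
# Same file with a hypothetical IPv6 site-local address added.
WITH_IPV6 = {**IPV4_ONLY, ("slartibartfast", "inet6"): "fec0::7"}

if __name__ == "__main__":
    for table in (IPV4_ONLY, WITH_IPV6):
        services = [("files", make_files_service(table)),
                    ("dns", dns_unreachable)]
        print(getent_hosts("slartibartfast", services))
```

With the IPv4-only table, the IPv6 query falls through files to dns before the IPv4 query is answered from files. With the IPv6 entry present, files answers the first query and dns is never consulted.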
Now, how to avoid unnecessary IPv6 lookups. First, don't use gethostbyname2. Use getaddrinfo. The current code, when used with AF_UNSPEC should already not use DNS. It should find the entry in /etc/hosts and be done.

The one remaining problem is the case where AF_UNSPEC is used (since the program is IPv6 enabled, which makes this frequent) but the system doesn't need IPv6 addresses since it has no such interfaces. The POSIX standard already has a solution for this: the AI_ADDRCONFIG flag. I have implemented this now and it'll be in the next release.

But to summarize: this is *not* the problem. If there is any application which hangs it either means the application itself is buggy/badly written (e.g., using gethostbyname2) or PAM is buggy.

I just implemented a new lookup mode for getent: "ahosts". Just like hosts, just uses getaddrinfo. Look at the strace output when you use this new version. Then you can see that I'm right.

I'm closing this bug with CURRENTRELEASE. The new getaddrinfo feature isn't necessary, it's just a bonus.

Thanks for the detailed explanation -- sorry about the confusion on my part.
You're definitely right about host lookups honoring NSS, and I was wrong. If I
only have an IPv4 addr in /etc/hosts, the getent lookup of it does:
open("/etc/nsswitch.conf", O_RDONLY) = 3
close(3) = 0
...
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("10.100.0.254")}, 28) = 0
close(3) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("10.100.0.254")}, 28) = 0
close(3) = 0
...
open("/etc/hosts", O_RDONLY) = 3
close(3) = 0
Once I add an IPv6 site-local addr for the same host, getent just does
open("/etc/nsswitch.conf", O_RDONLY) = 3
close(3) = 0
...
open("/etc/hosts", O_RDONLY) = 3
close(3) = 0
and never tries to contact the name server....
However, the original bug report here is *not* fixed. It just looks like it's
due to something in PAM, not in NSS. Should it be reopened and reassigned (back
;-) to pam?
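
For application authors, the earlier advice in this thread (use getaddrinfo with AF_UNSPEC and AI_ADDRCONFIG rather than gethostbyname2) might look roughly like the sketch below. It is written in Python, whose socket.getaddrinfo wraps the C library call. The resolve helper and its resolver parameter are hypothetical names introduced here, and whether AI_ADDRCONFIG actually suppresses IPv6 lookups depends on the C library version in use.

```python
import socket

def resolve(host, port=None, resolver=socket.getaddrinfo):
    """Resolve host for any address family the local system is configured for.

    resolver is injectable only so the sketch can be exercised without a
    network; by default it is the real socket.getaddrinfo.
    """
    # AI_ADDRCONFIG asks the resolver to skip families for which no
    # address is configured locally. Fall back to 0 if the constant is missing.
    flags = getattr(socket, "AI_ADDRCONFIG", 0)
    infos = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, flags)

    # Keep (family, address) pairs in the order returned, dropping duplicates.
    results = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        entry = (family, sockaddr[0])
        if entry not in results:
            results.append(entry)
    return results

if __name__ == "__main__":
    try:
        print(resolve("localhost"))
    except socket.gaierror as exc:
        print("lookup failed:", exc)
```

Running this under strace, in the same way as the getent traces above, is one way to check which name services a given system actually contacts.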

The problem is that confirmation is needed. Without evidence of the smoking gun this is an unattractive problem to pick up. Ideally you would be able to look at the exact PAM services and also the program itself and point to the problem. If you cannot do this, by all means, reopen and reassign the bug.

We also have this problem. We are trying to set up a machine to fall back to a password file if LDAP fails. Users from either the password file or the LDAP directory can authenticate if the directory server is running, but if the LDAP service is shut off or otherwise unavailable, no users can authenticate, even those who exist only in the password file. This is a huge problem for us, and we'd really like to get it fixed.

The status of the bug right now is "CLOSED - CURRENTRELEASE", but I don't see how this can be since RH9 is (currently) the most recent release. Is there a rawhide package that fixes this, or has the bug just been put into the wrong status? If the bug has not actually been fixed, is there something that we could try in order to isolate and identify the problem?

It's closed b/c what I was filing it about (NSS in glibc being broken) in fact is fixed. The underlying problem which led me to believe incorrectly that NSS was broken (local logins using local account data failing if distributed information service is down) is still there. Rather than re-opening this bug, you're probably better off looking at Bug #63631 which is open, has fixes, etc.