From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041020 Firefox/0.10.1

Description of problem:
Testing Fedora Core 3 final release on a workstation (Ed) that authenticates over LDAP with TLS. When nscd is running, users in LDAP cannot log in. When nscd is stopped, users in LDAP can log in without any problems.

Version-Release number of selected component (if applicable):
nscd-2.3.3-74

How reproducible:
Always

Steps to Reproduce:
1. log in as root
2. # authconfig
3. enable "Use LDAP", "Use LDAP Authentication", and "Cache Information", fill in LDAP server information
4. attempt to log in as [username] in the LDAP directory on another console
5. # finger [username]
6. go back to step 2, disable "Cache Information", and repeat

Actual Results:
Login will fail the first time through, but finger still provides the correct information from the LDAP server. Login will succeed only after "Cache Information" is disabled, or nscd is stopped manually.
# /sbin/service nscd restart
and
# /sbin/service nscd reload
don't help either.

Expected Results:
nscd did not conflict with LDAP in previous Fedora releases and was useful for caching user information to prevent constantly searching the LDAP server.

Additional info:
The following appears in /var/log/messages:
Nov 11 16:29:29 Ed unix_chkpwd[26476]: check pass; user unknown
Nov 11 16:29:29 Ed login(pam_unix)[26270]: authentication failure; logname=LOGIN uid=0 euid=0 tty=tty1 ruser= rhost=
Nov 11 16:29:29 Ed login(pam_unix)[26270]: could not identify user (from getpwnam(damian))
Nov 11 16:29:29 Ed login[26270]: User not known to the underlying authentication module
I'd just like to confirm that I'm seeing the same problem with Fedora Core 3. I'm using LDAP over TLS with self-signed certs, so I've got "TLS_REQCERT allow" set in /etc/openldap/ldap.conf, but otherwise everything is at its default. Version numbers are nscd-2.3.3-74, openldap-2.2.13-2, and nss_ldap-220-3.
Removing /var/db/nscd/* and '/sbin/service nscd restart' at least eased the pain a bit. I'm not quite sure if it is the solution but it works for me.
I hit the same problem. Solution: remove /var/db/nscd/* and restart nscd. Another strange thing: there are no problems at all if the user has never logged in on this system. Login failed only for regular (=everyday) users.
So what is the status? Is it in all cases just a corruption of the database? This is something which can be cured by removing the /var/db/nscd/* files. In this case there is nothing linking this problem with LDAP.
We have the same problem, but find some differences from machine to machine that I did not understand. For that reason I just did a fresh install:
- install "Personal Workstation" with standard packages
- configure yum for local mirror and do "yum update"
- configure ldap without TLS, activate nscd during firstboot-config
- getent passwd user1 => shows entry of user1
- authconfig: enabling TLS
- nscd -i passwd
- getent passwd user1 => empty
- service nscd stop
- getent passwd user1 => shows entry of user1
- service nscd start
- nscd -i passwd
- getent passwd user1 => empty
- authconfig: disabling TLS
- service nscd start
- nscd -i passwd
- getent passwd user1 => shows entry of user1

This can be repeated several times. It works with either TLS or nscd but not with both together. No need to delete the cache files by hand. It is enough to refresh by nscd -i. We obtain the same behaviour for "group".

But when editing /etc/ldap.conf by hand things run differently. Disabling TLS by authconfig and enabling it in ldap.conf (ssl start_tls) leads to a working configuration. No idea why.
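The getent checks in the steps above can also be scripted, which makes it easier to repeat the test after each configuration change. The following is only a minimal sketch in Python; `user_resolves` is a hypothetical helper name, not part of any package mentioned in this report. It relies on `pwd.getpwnam`, which goes through the C library's name service lookup and so should consult nscd when the daemon is running, much as getent does.

```python
import pwd


def user_resolves(name):
    """Return True if the system's name service lookup finds `name`.

    pwd.getpwnam raises KeyError when no passwd entry is found,
    which corresponds to the "empty" getent result described above.
    """
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False
```

Calling `user_resolves("user1")` once with nscd running and once with it stopped should show the same difference as the getent output in the list above, assuming the behaviour reported here.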
I think there is more than one issue.

First question: does everybody who has problems have SELinux enabled and, if yes, does /var/log/messages show any auditing messages? I suspect that we are missing a few entries in the nscd SELinux description wrt files the program is allowed to read. The LDAP config or, more likely, the files the SSL code needs might not be readable. So, show the entries you find, please.

For those with the problem, can you try disabling SELinux handling of nscd? This should be possible with

  setsebool nscd_disable_trans true

As for those who can recover after removing the /var/db/nscd/* files, how was the nscd process shut down? The files should not be easy to corrupt unless the system crashes and pending disk flushes do not happen. If this happens, do not remove the files but instead start nscd by hand with three -v parameters. Then run id under strace, like

  strace id some-local-user
"does everybody who has problems have SELinux enabled" Yes. "does /var/log/messages show any auditing messages?" Actually, yes, don't know why I missed these the first time. Dec 7 17:21:59 localhost nscd: 19694 Access Vector Cache (AVC) started Dec 7 17:21:59 localhost nscd: nscd startup succeeded Dec 7 17:21:59 localhost kernel: audit(1102458119.966:0): avc: denied { read } for pid=19694 exe=/usr/sbin/nscd name=urandom dev=tmpfs ino=932 scontext=root:system_r:nscd_t tcontext=system_u:object_r:urandom_device_t tclass=chr_file Dec 7 17:21:59 localhost kernel: audit(1102458119.966:0): avc: denied { read } for pid=19694 exe=/usr/sbin/nscd name=random dev=tmpfs ino=931 scontext=root:system_r:nscd_t tcontext=system_u:object_r:random_device_t tclass=chr_file Dec 7 17:22:00 localhost kernel: audit(1102458120.058:0): avc: denied { write } for pid=19694 exe=/usr/sbin/nscd name=nscd dev=dm-0 ino=8781825 scontext=root:system_r:nscd_t tcontext=root:object_r:var_t tclass=dir Dec 7 17:22:00 localhost nscd: 19694 cannot create /var/db/nscd/passwd; no persistent database used Dec 7 17:22:00 localhost kernel: audit(1102458120.058:0): avc: denied { write } for pid=19694 exe=/usr/sbin/nscd name=nscd dev=dm-0 ino=8781825 scontext=root:system_r:nscd_t tcontext=root:object_r:var_t tclass=dir Dec 7 17:22:00 localhost nscd: 19694 cannot create /var/db/nscd/group; no persistent database used Dec 7 17:22:00 localhost kernel: audit(1102458120.059:0): avc: denied { write } for pid=19694 exe=/usr/sbin/nscd name=nscd dev=dm-0 ino=8781825 scontext=root:system_r:nscd_t tcontext=root:object_r:var_t tclass=dir Dec 7 17:22:00 localhost nscd: 19694 cannot create /var/db/nscd/hosts; no persistent database used "setsebool nscd_disable_trans true" I have no idea what that does, but it fixed my problem. (Or does it introduce a gaping security hole?) "As for those who can recover after removing the /var/db/nscd/* files..." This never worked for me, so may be related to a separate bug.
The audit messages suggest the following:

~ we need to add

    allow nscd_t random_device_t:chr_file read;

  to the domains/program/nscd.te file. Perhaps you can try that.

~ your /var/db/nscd directory does not exist or has wrong contexts. What does

    ls -alZ /var/db/nscd

  show? The directory itself should look like this:

    drwxr-xr-x root root system_u:object_r:nscd_var_run_t .

As for setsebool: this disables nscd from being handled securely. You'll have to set the boolean back to false to re-enable it.
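For anyone wondering how a denial in the log relates to a policy rule like the one suggested above, here is a rough sketch in Python. It is an illustration only: `parse_avc` and `allow_rule` are hypothetical helper names, the regular expression assumes the message layout seen in the logs in this report, and the generated rule is a starting point for review rather than something to install blindly. The audit2allow utility from the SELinux policy tools does this job properly.

```python
import re

# Matches the "avc: denied { perms } for key=value ..." part of a kernel log line.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}\s+for\s+(?P<rest>.*)"
)


def parse_avc(line):
    """Return a dict of fields from an AVC denial line, or None if it is not one."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    fields = {"perms": match.group("perms").split()}
    for token in match.group("rest").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def allow_rule(fields):
    """Build an 'allow' rule from parsed fields (the type is the third context part)."""
    source_type = fields["scontext"].split(":")[2]
    target_type = fields["tcontext"].split(":")[2]
    perms = fields["perms"]
    perm_text = perms[0] if len(perms) == 1 else "{ " + " ".join(perms) + " }"
    return f"allow {source_type} {target_type}:{fields['tclass']} {perm_text};"
```

Feeding the `name=random` denial from the log into `parse_avc` and then `allow_rule` gives the same rule that is proposed above for nscd.te.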
"~ your /var/db/nscd directory does not exist or has wrong contexts." oops, I had deleted it in addition to its sub directories when trying the solution above. I deleted it again with rm -rf /var/db/nscd , and recreated the orginal directory with a fresh install of nscd (rpm -e --nodeps nscd; yum install nscd). Now /var/log/messages just shows: Dec 7 18:13:50 localhost nscd: nscd startup succeeded Dec 7 18:13:50 localhost kernel: audit(1102461230.424:0): avc: denied { read } for pid=20507 exe=/usr/sbin/nscd name=urandom dev=tmpfs ino=932 scontext=root:system_r:nscd_t tcontext=system_u:object_r:urandom_device_t tclass=chr_file Dec 7 18:13:50 localhost kernel: audit(1102461230.424:0): avc: denied { read } for pid=20507 exe=/usr/sbin/nscd name=random dev=tmpfs ino=931 scontext=root:system_r:nscd_t tcontext=system_u:object_r:random_device_t tclass=chr_file and logins still fail as before.
The problems occur with SELinux disabled here. After clearing the cache directory everything has worked like a charm. Perhaps the cache should be cleared at boot time?
> Now /var/log/messages just shows:
> ...

See bug 142184. Please get the appropriate policy and install it. This should get rid of these warnings.
> After clearing the cache directory everything has worked like charm.

I've asked before: how was the system shut down when this happened? And what do you see when you start nscd by hand with -v -v -v on the command line?
(In reply to comment #12)
> > After clearing the cache directory everything has worked like charm.
>
> I've asked before: how was the system shut down when this happened? And what do
> you see when you start nscd by hand with -v -v -v on the command line?

With /sbin/poweroff

nscd: invalid option -- v
Try `nscd --help' or `nscd --usage' for more information.
I meant -d -d -d. And I cannot believe that this really happens when you shut down cleanly. Unless of course your disks are total crap and they don't write the content to disk. nscd flushes the entire memory to disk before it terminates. Anyway, the debug output will show more. If running 'id' shows the problem, also attach the output of running

  strace id WHATEVER
Last week I experienced something that might or might not be related. I have a number of identical FC3 x86_64 boxes that I kickstarted with the same identical setup. One of them, for some reason, had one CPU pegged to 100%. The rogue process was nscd. I tried "service nscd restart". No luck. I rebooted the box twice. No luck. I tried restarting nscd again and noticed that the CPU didn't get pegged immediately when nscd started. It always took 10-20 seconds before it started going crazy. I tried the -i trick with all three databases. No luck. Eventually, out of desperation, I removed the database files manually. That fixed it. I'll keep your "-d -d -d" suggestion in mind and try it when (if) this happens again.
Here's a hint for anyone who has seen this bug, has disabled nscd, but still cannot get users to log in (while the log files still show what is generally described above): check the ACLs in your slapd.conf file. There are some where the syntax has changed. The slaptest command that was added in FC3 to /etc/rc.d/init.d/ldap finds a few, but one that it missed was the change of

  by peername="IP=127\.0\.0\.1" read

to

  by peername.ip=127.0.0.1 read

Correcting the above permitted me to get my system working with nscd disabled (that is, I still see the problem described in the original submission).

FWIW, I'm seeing this bug on a system that started life as RH7.3, and was subsequently upgraded in stages through to FC1. It was then taken to FC2 and immediately to FC3 + current official patches.
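If there are many such ACL lines, the rewrite described above could be automated. The following Python sketch is an assumption-laden illustration: `modernize_peername` is a hypothetical helper, and it only handles the exact old form quoted above (an escaped dotted IPv4 address with nothing else inside the quotes). Any other peername pattern is left untouched and would still need to be checked by hand.

```python
import re

# Old form: peername="IP=127\.0\.0\.1" (dots escaped because it was a regex).
OLD_PEERNAME = re.compile(r'peername="IP=((?:\d{1,3}\\\.){3}\d{1,3})"')


def modernize_peername(line):
    """Rewrite the old peername regex form into the peername.ip form."""
    def repl(match):
        # Drop the backslashes that escaped the dots in the old regex.
        return "peername.ip=" + match.group(1).replace("\\.", ".")
    return OLD_PEERNAME.sub(repl, line)
```

Running each line of slapd.conf through this function and comparing the result with the original is a quick way to spot lines that still use the old form.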
Has anyone else noticed that a very recent update of nscd has now caused LDAP queries to be of the form:

  uid=nscd or uid=root

instead of:

  uid=<whatever_user_nss_ldap_needs_to_look_up>?

Very curious!

Peter Dohm
Created attachment 111855 NSCD configuration file
I had the same user recognition problem. The issue is that NSCD's default configuration uses persistent tables. That is why restarting the service does not solve the problem, but erasing the db does. This option can be disabled in /etc/nscd.conf. Change:

  persistent passwd yes

to:

  persistent passwd no

That solves our problem partially. On the other hand, there are many NSCD-LDAP discussions. The issue is that nscd is not designed to operate hand in hand with LDAP, so when there is a modification in the LDAP directory, NSCD has no way of knowing there was a change. To solve this problem (works for me) you have to force the reloading of NSCD automatically, with the non-persistent db option enabled. In addition, it is a good idea to reduce the caching time of NSCD. I attached my nscd.conf file.

PS: Another issue is that the NSCD documentation is incomplete.
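For those who need to make the nscd.conf change described above on several machines, it can be scripted. The sketch below is a Python illustration under stated assumptions: `set_nscd_option` is a hypothetical helper that works on the text of the file, it assumes the usual "option database value" layout of nscd.conf lines, and it does not write the file back or restart nscd. The values used are examples, not recommendations.

```python
def set_nscd_option(conf_text, option, database, value):
    """Return conf_text with `option database` set to `value`.

    An existing uncommented line for the same option and database is
    replaced; if there is none, a new line is appended at the end.
    """
    new_line = f"\t{option}\t{database}\t{value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == option and parts[1] == database:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

The same helper could be applied to the positive-time-to-live option to shorten the caching time, for example `set_nscd_option(text, "positive-time-to-live", "passwd", "60")`, where 60 seconds is an arbitrary illustrative value.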
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-251.html
The errata does not fix the issue; the SELinux-related stuff is just the tip of the iceberg. The real problem is the persistent support in nscd, and it is not limited to the passwd database (I think comment #19 got really close) but is also present in the hosts database. We had a host name change, and while going directly to DNS (or with nscd switched off) returned the expected name, going via nscd still returned the old name. It seems the entries in the persistent cache can occasionally get a wrong lifetime (I have not seen any tools to dump the nscd cache; this might aid the debugging). Please reopen this report. Perhaps the summary "nscd persistent cache broken" is more appropriate.
That sounds like #150748, which is already fixed for FC4 and will be fixed in the next FC3 and RHEL4 updates.