Bug 138911 - nscd breaks LDAP authentication
Summary: nscd breaks LDAP authentication
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact:
URL:
Whiteboard:
Depends On: 142184
Blocks:
Reported: 2004-11-11 21:47 UTC by Damian Christey
Modified: 2007-11-30 22:10 UTC
CC List: 8 users

Fixed In Version: RHBA-2005-251
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-06-09 13:05:51 UTC
Type: ---
Embargoed:


Attachments
NSCD configuration file (1.63 KB, text/plain)
2005-03-10 14:55 UTC, Nicolas Troncoso Carrere


Links
System: Red Hat Product Errata
ID: RHBA-2005:251
Private: 0
Priority: low
Status: SHIPPED_LIVE
Summary: selinux-policy-targeted bug fix update
Last Updated: 2005-06-09 04:00:00 UTC

Description Damian Christey 2004-11-11 21:47:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041020
Firefox/0.10.1

Description of problem:
Testing Fedora Core 3 final release on a workstation (Ed) that
authenticates over LDAP with TLS.

When nscd is running, users in LDAP cannot log in.

When nscd is stopped, users in LDAP can log in without any problems.



Version-Release number of selected component (if applicable):
nscd-2.3.3-74

How reproducible:
Always

Steps to Reproduce:
1. log in as root
2. # authconfig
3. enable "Use LDAP", "Use LDAP Authentication", and "Cache
Information", fill in LDAP server information
4. attempt to log in as [username] in the LDAP directory on another
console
5. # finger [username]
6. go back to step 2, disable "Cache Information", and repeat
    

Actual Results:  Login will fail the first time through, but finger
still provides the correct information from the LDAP server.  

Login will succeed only after "Cache Information" is disabled, or nscd
is stopped manually.

# /sbin/service nscd restart
and
# /sbin/service nscd reload 
don't help either.

Expected Results:  nscd did not conflict with LDAP in previous Fedora
releases and was useful for caching user information to prevent
constantly searching the LDAP server.

Additional info:

The following appears in /var/log/messages:

Nov 11 16:29:29 Ed unix_chkpwd[26476]: check pass; user unknown
Nov 11 16:29:29 Ed login(pam_unix)[26270]: authentication failure;
logname=LOGIN uid=0 euid=0 tty=tty1 ruser= rhost=
Nov 11 16:29:29 Ed login(pam_unix)[26270]: could not identify user
(from getpwnam(damian))
Nov 11 16:29:29 Ed login[26270]: User not known to the underlying
authentication module

Comment 1 Matthew West 2004-11-16 17:19:59 UTC
I'd just like to confirm that I'm seeing the same problem with Fedora Core 3. I'm using TLS
ldap with self-signed certs, so I've got "TLS_REQCERT allow" set in
/etc/openldap/ldap.conf, but otherwise everything is as default. Version numbers are
nscd-2.3.3-74, openldap-2.2.13-2, and nss_ldap-220-3.

Comment 2 Niilo Kajander 2004-11-25 11:12:34 UTC
Removing /var/db/nscd/* and '/sbin/service nscd restart' at least
eased the pain a bit. I'm not quite sure if it is the solution but it
works for me.

Comment 3 Alar Suija 2004-11-25 11:23:24 UTC
I hit the same problem. Solution: remove /var/db/nscd/* and restart nscd.

Another strange thing: there are no problems at all if the user has never
logged in on this system.
Login failed only for regular (everyday) users.

Comment 4 Ulrich Drepper 2004-11-27 19:52:57 UTC
So what is the status?  Is it in all cases just a corruption of the database? 
This is something which can be cured by removing the /var/db/nscd/* files.  In
this case there is nothing linking this problem with LDAP.

Comment 5 Frank Mueller 2004-12-02 11:39:15 UTC
We have the same problem, but found some differences from machine to
machine that I did not understand. For that reason I just did a fresh
install:

- install "Personal Workstation" with standard packages
- configure yum for local mirror and do "yum update"
- configure ldap without TLS, activate nscd during firstboot-config
- getent passwd user1 => shows entry of user1

- authconfig: enabling TLS
- nscd -i passwd
- getent passwd user1 => empty
- service nscd stop
- getent passwd user1 => shows entry of user1
- service nscd start
- nscd -i passwd
- getent passwd user1 => empty

- authconfig: disabling TLS
- service nscd start
- nscd -i passwd
- getent passwd user1 => shows entry of user1

This can be repeated several times. It works with either TLS or nscd
but not with both together. No need to delete the cache files by hand.
It is enough to refresh with nscd -i. We see the same behaviour for
"group".

But when editing /etc/ldap.conf by hand things run differently.
Disabling TLS by authconfig and enabling it in ldap.conf (ssl
start_tls) leads to a working configuration. No idea why. 
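
The lookups in the sequence above could be wrapped in a small helper so that each step is reported the same way. This is only a sketch: check_lookup and the GETENT override are names made up here for illustration, and user1 is the placeholder account from the steps above.

```shell
# Hypothetical helper: report whether a passwd lookup returns an entry.
# GETENT may point at another command for testing; it defaults to getent.
check_lookup() {
    if ${GETENT:-getent} passwd "$1" >/dev/null 2>&1; then
        echo "$1: entry found"
    else
        echo "$1: empty"
    fi
}

# Example (as root), mirroring the steps above:
#   nscd -i passwd;     check_lookup user1
#   service nscd stop;  check_lookup user1
```

Relying on the exit status of getent rather than on its output keeps the check independent of what the passwd entry looks like.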




Comment 6 Ulrich Drepper 2004-12-07 21:45:11 UTC
I think there is more than one issue.

First question is, does everybody who has problems have SELinux
enabled and if yes, does /var/log/messages show any auditing messages?
 I suspect that we are missing a few entries in the nscd SELinux
description wrt files the program is allowed to read.  The LDAP
config, or more likely, the files the SSL code needs, might not be
readable.  So, show the entries you find, please.
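
One possible way to pull the relevant entries out of the log is sketched below. The function name nscd_avc_denials is hypothetical, and the sketch assumes that each audit record sits on a single line in the log file and carries the exe=, name= and tcontext= fields in that order.

```shell
# Hypothetical helper: list SELinux AVC denials logged for nscd.
# Prints "<permission> <name> <target context>" for each denial.
# Assumes one audit record per line; the log path defaults to /var/log/messages.
nscd_avc_denials() {
    grep 'avc: *denied.*exe=/usr/sbin/nscd' "${1:-/var/log/messages}" |
        sed -n 's/.*denied *{ *\([a-z_ ]*[a-z_]\) *} .* name=\([^ ]*\) .* tcontext=\([^ ]*\) .*/\1 \2 \3/p'
}
```

Each output line names the permission that was denied and the object it was denied on, which is the information needed to decide what the policy is missing.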

For those with the problem, can you try disabling SELinux handling of nscd? 
This should be possible with

   setsebool nscd_disable_trans true


As for those who can recover after removing the /var/db/nscd/* files,
how was the nscd process shut down?  The files should not be easy to
corrupt unless the system crashes and pending disk flushes do not
happen.  If this happens, do not remove the files but instead start
nscd by hand with three -v parameters.  Then run id under strace, like

  strace id some-local-user

Comment 7 Damian Christey 2004-12-07 22:30:59 UTC
"does everybody who has problems have SELinux
enabled" 
Yes.

"does /var/log/messages show any auditing messages?"
Actually, yes, don't know why I missed these the first time.

Dec  7 17:21:59 localhost nscd: 19694 Access Vector Cache (AVC) started
Dec  7 17:21:59 localhost nscd: nscd startup succeeded
Dec  7 17:21:59 localhost kernel: audit(1102458119.966:0): avc: 
denied  { read } for  pid=19694 exe=/usr/sbin/nscd name=urandom
dev=tmpfs ino=932 scontext=root:system_r:nscd_t
tcontext=system_u:object_r:urandom_device_t tclass=chr_file
Dec  7 17:21:59 localhost kernel: audit(1102458119.966:0): avc: 
denied  { read } for  pid=19694 exe=/usr/sbin/nscd name=random
dev=tmpfs ino=931 scontext=root:system_r:nscd_t
tcontext=system_u:object_r:random_device_t tclass=chr_file
Dec  7 17:22:00 localhost kernel: audit(1102458120.058:0): avc: 
denied  { write } for  pid=19694 exe=/usr/sbin/nscd name=nscd dev=dm-0
ino=8781825 scontext=root:system_r:nscd_t tcontext=root:object_r:var_t
tclass=dir
Dec  7 17:22:00 localhost nscd: 19694 cannot create
/var/db/nscd/passwd; no persistent database used
Dec  7 17:22:00 localhost kernel: audit(1102458120.058:0): avc: 
denied  { write } for  pid=19694 exe=/usr/sbin/nscd name=nscd dev=dm-0
ino=8781825 scontext=root:system_r:nscd_t tcontext=root:object_r:var_t
tclass=dir
Dec  7 17:22:00 localhost nscd: 19694 cannot create
/var/db/nscd/group; no persistent database used
Dec  7 17:22:00 localhost kernel: audit(1102458120.059:0): avc: 
denied  { write } for  pid=19694 exe=/usr/sbin/nscd name=nscd dev=dm-0
ino=8781825 scontext=root:system_r:nscd_t tcontext=root:object_r:var_t
tclass=dir
Dec  7 17:22:00 localhost nscd: 19694 cannot create
/var/db/nscd/hosts; no persistent database used

"setsebool nscd_disable_trans true"
I have no idea what that does, but it fixed my problem.  (Or does it
introduce a gaping security hole?)

"As for those who can recover after removing the /var/db/nscd/* files..."
This never worked for me, so may be related to a separate bug.



Comment 8 Ulrich Drepper 2004-12-07 22:50:40 UTC
The audit messages suggest the following:

~ we need to add
allow nscd_t random_device_t:chr_file read;
  to the domains/program/nscd.te file.  Perhaps you can try that.

~ your /var/db/nscd directory does not exist or has wrong contexts.  What does
ls -alZ /var/db/nscd show?  The directory itself should look like this:

drwxr-xr-x  root     root     system_u:object_r:nscd_var_run_t .

As for setsebool: this disables nscd from being handled securely.  You'll have
to set the boolean back to false to re-enable it.

As



Comment 9 Damian Christey 2004-12-07 23:21:06 UTC
"~ your /var/db/nscd directory does not exist or has wrong contexts."

Oops, I had deleted it in addition to its subdirectories when trying the
solution above.  I deleted it again with rm -rf /var/db/nscd, and recreated the
original directory with a fresh install of nscd (rpm -e --nodeps nscd; yum
install nscd).  Now /var/log/messages just shows:

Dec  7 18:13:50 localhost nscd: nscd startup succeeded
Dec  7 18:13:50 localhost kernel: audit(1102461230.424:0): avc:  denied  { read
} for  pid=20507 exe=/usr/sbin/nscd name=urandom dev=tmpfs ino=932
scontext=root:system_r:nscd_t tcontext=system_u:object_r:urandom_device_t
tclass=chr_file
Dec  7 18:13:50 localhost kernel: audit(1102461230.424:0): avc:  denied  { read
} for  pid=20507 exe=/usr/sbin/nscd name=random dev=tmpfs ino=931
scontext=root:system_r:nscd_t tcontext=system_u:object_r:random_device_t
tclass=chr_file

and logins still fail as before.

Comment 10 Niilo Kajander 2004-12-08 06:47:42 UTC
The problems occur with SELinux disabled here. After clearing the cache
directory everything has worked like charm. Perhaps the cache should be cleared
on boot time?

Comment 11 Ulrich Drepper 2004-12-10 10:37:23 UTC
> Now /var/log/messages just shows:
> ...

See bug 142184.  Please get the appropriate policy and install it.  This should
get rid of these warnings.

Comment 12 Ulrich Drepper 2004-12-10 10:41:05 UTC
> After clearing the cache directory everything has worked like charm.

I've asked before: how was the system shut down when this happened?  And what do
you see when you start nscd by hand with -v -v -v on the command line?

Comment 13 Niilo Kajander 2004-12-10 10:48:42 UTC
(In reply to comment #12)
> > After clearing the cache directory everything has worked like charm.
> 
> I've asked before: how was the system shut down when this happened?  And what do
> you see when you start nscd by hand with -v -v -v on the command line?

With /sbin/poweroff

nscd: invalid option -- v
Try `nscd --help' or `nscd --usage' for more information.


Comment 14 Ulrich Drepper 2004-12-10 10:58:57 UTC
I meant -d -d -d.

And I cannot believe that this really happens when you shut down cleanly. 
Unless of course your disks are total crap and they don't write the content to
disk.  nscd flushes the entire memory to disk before it terminates.

Anyway, the debug output will show more.  If running 'id' shows the problem,
also attach running

  strace id WHATEVER

Comment 15 Rudi Chiarito 2005-01-17 15:48:08 UTC
Last week I experienced something that might or might not be related.

I have a number of identical FC3 x86_64 boxes that I kickstarted with
the same identical setup. One of them, for some reason, had one CPU
pegged to 100%. The rogue process was nscd. I tried "service nscd
restart". No luck. I rebooted the box twice. No luck. I tried
restarting nscd again and noticed that the CPU didn't get pegged
immediately when nscd started. It always took 10-20 seconds before it
started going crazy.

I tried the -i trick with all three databases. No luck. Eventually,
out of desperation, I removed the database files manually. That fixed it.

I'll keep your "-d -d -d" suggestion in mind and try it when (if) this
happens again.

Comment 16 Devin Reade 2005-01-31 04:23:49 UTC
Here's a hint for anyone who has seen this bug, has disabled nscd,
but still cannot get users to login (but the log files still show
what is generally described above):

Check the ACLs in your slapd.conf file.  There are some where the
syntax has changed.  The slaptest command that was added in FC3
to /etc/rc.d/init.d/ldap finds a few, but one that it missed was
the change of
    by peername="IP=127\.0\.0\.1" read
to
    by peername.ip=127.0.0.1 read
Correcting the above permitted me to get my system working with
nscd disabled (that is, I still see the problem described in
the original submission).
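
A quick way to look for leftover ACL lines of the older form is sketched below. The function name find_old_peername_acls is hypothetical, the default path /etc/openldap/slapd.conf is an assumption about where the file lives, and the pattern only covers the peername="IP=..." form quoted above, so matches are candidates for manual review rather than confirmed errors.

```shell
# Hypothetical helper: print slapd.conf lines that still use the older
# peername="IP=..." ACL form, with line numbers, for manual review.
# The default path is an assumption; pass the real file as the first argument.
find_old_peername_acls() {
    grep -n 'peername="\{0,1\}IP=' "${1:-/etc/openldap/slapd.conf}"
}
```

Lines already converted to the peername.ip= form are not reported.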

FWIW, I'm seeing this bug on a system that started life as RH7.3,
and was subsequently upgraded in stages through to FC1.  It was
then taken to FC2 and immediately to FC3 + current official patches.


Comment 17 Peter J. Dohm 2005-02-11 05:49:19 UTC
has anyone else noticed that a very recent update of nscd has now
caused LDAP queries to be of the form:

uid=nscd

or

uid=root

instead of:

uid=<whatever_user_nss_ldap_needs_to_look_up>?

very curious!

Peter Dohm

Comment 18 Nicolas Troncoso Carrere 2005-03-10 14:55:27 UTC
Created attachment 111855 [details]
NSCD configuration file

Comment 19 Nicolas Troncoso Carrere 2005-03-10 14:56:40 UTC
I had the same user recognition problem. The issue is that NSCD's
default configuration is to have persistent tables. That is why
restarting the service does not solve the problem, but erasing the
db does. This option can be disabled in /etc/nscd.conf
change:
persistent              passwd          yes
to:
persistent              passwd          no

That solves our problem partially.

On the other hand there are many NSCD-LDAP discussions. The issue is
that nscd is not designed to operate hand in hand with LDAP. So when
there is a modification in the LDAP directory, NSCD has no way of knowing
there was a change.

To solve this problem (works for me) you have to force the reloading of
NSCD (automatically, and with the non-persistent db option enabled). In
addition, it is a good idea to reduce the caching time of NSCD.

I attached my nscd.conf file.

ps: Another issue is that the NSCD documentation is incomplete.
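
The persistent change described above could be applied to every database at once along these lines. This is a sketch only: disable_persistent is a hypothetical name, and the pattern assumes the "persistent <database> yes" layout shown above with nothing else on the line.

```shell
# Hypothetical helper: print nscd.conf with every active
# "persistent <database> yes" line switched to "no".
# Writes to stdout so the result can be reviewed before it replaces the file.
disable_persistent() {
    sed 's/^\([[:space:]]*persistent[[:space:]]\{1,\}[a-z]\{1,\}[[:space:]]\{1,\}\)yes[[:space:]]*$/\1no/' "${1:-/etc/nscd.conf}"
}

# Example: disable_persistent /etc/nscd.conf > /tmp/nscd.conf.new
```

Commented-out lines and other options such as enable-cache are left untouched, and nscd still has to be restarted after the reviewed file is put in place.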

Comment 20 Tim Powers 2005-06-09 13:05:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-251.html


Comment 21 Pawel Salek 2005-06-23 16:10:04 UTC
The errata does not fix the issue; the SELinux-related stuff is just the tip of the
iceberg. The real problem is the persistent cache support in nscd, and it is not limited
to the passwd database (I think comment #19 got really close) but is also present in
hosts. We had a host name change, and while going directly to DNS (or with
nscd switched off) returned the expected name, going via nscd still returned the
old name. It seems the entries in the persistent cache can occasionally get the
wrong lifetime (I have not seen any tools to dump the nscd cache; this might
aid the debugging).

Please reopen this report. Perhaps the summary "nscd persistent cache broken" is
more appropriate.

Comment 22 Jakub Jelinek 2005-06-23 16:14:43 UTC
That sounds like #150748, which is already fixed for FC4 and will be fixed
in the next FC3 and RHEL4 updates.

