Bug 181528

Summary: nscd caches incorrect user shell information
Product: Red Hat Enterprise Linux 3
Reporter: Ian McLeod <imcleod>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: aoliva, drepper, tao
Hardware: All   
OS: Linux   
Fixed In Version: 2.3.2-95.44
Doc Type: Bug Fix
Last Closed: 2006-11-20 16:28:20 UTC

Description Ian McLeod 2006-02-14 20:29:44 UTC
Description of problem:

nscd seems to be caching an incorrect value for a user's shell.  On a RHEL3 host
configured as follows:

/etc/nsswitch.conf:

passwd:    compat

/etc/passwd (relevant bits):

+mqm::::::
+::0:0:::/bin/false

(The mqm entry in NIS contains a shell of /bin/csh)

On a regular basis, attempts to su to the mqm user fail.  When this happens,
"getent passwd mqm" returns /bin/false for the user shell; however, a ypcat of
the passwd map in NIS shows the actual user shell to be /bin/csh.
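
The comparison described above can be scripted.  The following is a minimal
sketch, not a tool used in this report: it assumes two passwd-format lines have
already been captured (for example from "getent passwd mqm" and "ypmatch mqm
passwd"), and the uid, gid, and home directory in the sample lines are made up.

```python
def shell_of(passwd_line: str) -> str:
    """Return the shell (7th colon-separated field) of a passwd-format line."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {passwd_line!r}")
    return fields[6]


def shells_disagree(nss_line: str, nis_line: str) -> bool:
    """True when the NSS answer and the NIS map give different shells."""
    return shell_of(nss_line) != shell_of(nis_line)


# Hypothetical sample lines; only the shell values come from the report.
nss_line = "mqm:x:501:501::/home/mqm:/bin/false"  # e.g. from `getent passwd mqm`
nis_line = "mqm:x:501:501::/home/mqm:/bin/csh"    # e.g. from `ypmatch mqm passwd`

print(shells_disagree(nss_line, nis_line))
```

Running such a check periodically and logging the time of each mismatch would
make it easier to measure how long an episode lasts.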

Further analysis reveals that these episodes last almost exactly 10 minutes,
which happens to be the positive-time-to-live value for passwd entry caching in
nscd.

The issue can always be resolved by restarting nscd.

The issue can be avoided by turning off passwd caching in nscd.

As a result, we suspect that the issue resides in nscd.


Version-Release number of selected component (if applicable):

nscd-2.3.2-95.30

How reproducible:

We cannot reproduce this on demand.  However, it does occur regularly on hosts
that do frequent su and sudo commands as part of custom applications.


Comment 2 Ian McLeod 2006-08-02 16:31:00 UTC
In revisiting this bug I've discovered an error in how I described the problem.
It seems that when this happens, the username in question is always referenced
indirectly in the passwd file via +@ notation and the netgroup NIS database.
So, the passwd snippet above should really read:

+@mq_acts::::::
+::0:0:::/bin/false

Where the netgroup "mq_acts" contains an entry for the "mqm" user.
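
To make the intended behaviour of these two entries concrete, here is a
simplified model of how they are expected to resolve.  It is a sketch under
stated assumptions, not the nss_compat implementation: it handles only the "+@"
and "+" forms, considers only the shell field, and the function name and the
user "alice" are hypothetical.

```python
def resolve_shell(user, compat_entries, nis_shells, netgroups):
    """Walk the compat entries in order; the first entry that matches wins.

    compat_entries is a list of (selector, shell_override) pairs, where the
    selector is "+@group" (members of a netgroup) or "+" (every NIS user).
    """
    if user not in nis_shells:
        return None  # user unknown to NIS
    for selector, shell_override in compat_entries:
        if selector.startswith("+@"):
            matched = user in netgroups.get(selector[2:], set())
        elif selector == "+":
            matched = True
        else:
            matched = False  # other compat forms are out of scope for this sketch
        if matched:
            # An empty override keeps the value stored in NIS.
            return shell_override or nis_shells[user]
    return None


entries = [("+@mq_acts", ""), ("+", "/bin/false")]
nis_shells = {"mqm": "/bin/csh", "alice": "/bin/bash"}  # "alice" is hypothetical
netgroups = {"mq_acts": {"mqm"}}

print(resolve_shell("mqm", entries, nis_shells, netgroups))    # /bin/csh
print(resolve_shell("alice", entries, nis_shells, netgroups))  # /bin/false
```

In this model mqm only receives /bin/false if the netgroup check fails to match,
which is the situation observed during the bad episodes.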

All other details of the problem remain the same.  (And this is something we
continue to see on occasion on both AS 2.1 and RHEL 3.)

Comment 3 Jakub Jelinek 2006-08-03 12:58:57 UTC
If it is one of the getXXbyYY{,_r} lookups rather than getXXent{,_r}, then it
might be because nss_compat uses the innetgr function in several places to see
whether a particular user is in a netgroup or not.  Now, innetgr has no error
reporting; it only returns 1 if the netgroup contains the machine/user/domain
triple and 0 otherwise.
So, 0 can be returned both when there really is no such triple and when some
error occurred (such as a transient failure due to a busy NIS server).
Not sure what's better: to keep the code as is, or to use some other function
instead of innetgr and fail the whole request just because the netgroup lookup
failed.
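
The two options can be illustrated with a small model.  This is only a sketch of
the trade-off, not glibc code: the Membership enum and all function names are
hypothetical, and the only behaviour taken from the comment above is that an
innetgr-style check returns 1 for a match and 0 for everything else.

```python
import enum


class Membership(enum.Enum):
    MEMBER = "member"
    NOT_MEMBER = "not member"
    UNAVAILABLE = "lookup failed"  # e.g. a transient NIS failure


def innetgr_like(result: Membership) -> int:
    """Mimic the innetgr convention: 1 for a match, 0 for everything else."""
    return 1 if result is Membership.MEMBER else 0


def shell_with_boolean_check(result, nis_shell, fallback_shell):
    # A failed lookup is indistinguishable from "not a member", so the
    # catch-all entry's shell is returned as if it were a valid answer.
    return nis_shell if innetgr_like(result) else fallback_shell


def shell_with_tristate_check(result, nis_shell, fallback_shell):
    # Alternative: fail the whole request when the netgroup lookup failed,
    # so that no wrong answer is produced.
    if result is Membership.UNAVAILABLE:
        raise RuntimeError("netgroup lookup failed; try again")
    return nis_shell if result is Membership.MEMBER else fallback_shell


print(shell_with_boolean_check(Membership.UNAVAILABLE, "/bin/csh", "/bin/false"))
try:
    shell_with_tristate_check(Membership.UNAVAILABLE, "/bin/csh", "/bin/false")
except RuntimeError as exc:
    print("request failed:", exc)
```

With the boolean check, a transient failure yields /bin/false as a successful
result, which a cache such as nscd could then keep for its positive
time-to-live; that would be consistent with the roughly 10-minute episodes
described in the report.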