Bug 1435615

Summary: nscd is not caching ldap netgroup data properly, hangs on nscd -i netgroup
Product: Red Hat Enterprise Linux 7
Reporter: Deepu K S <dkochuka>
Component: glibc
Assignee: DJ Delorie <dj>
Status: CLOSED ERRATA
QA Contact: Sergey Kolosov <skolosov>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.2
CC: arawat, ashankar, codonell, cww, dkochuka, edward.goodwin, fweimer, glibc-bugzilla, kludhwan, mcermak, mnewsome, pfrankli, qe-baseos-tools-bugs, skolosov
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glibc-2.17-201.el7
Doc Type: Bug Fix
Doc Text:
  Cause: Incorrect use of locks in nscd.
  Consequence: On systems where netgroups are cached by nscd, nscd may occasionally hang, resulting in a failure to notice updates in cached information.
  Fix: nscd has been patched to properly release its internal locks when handling cache timeouts.
  Result: Cache data should properly update now.
Story Points: ---
Clone Of: 1277672
Environment:
Last Closed: 2018-04-10 13:58:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1277672
Bug Blocks: 1420851, 1473718

Comment 8 DJ Delorie 2017-09-26 15:56:18 UTC
To reproduce the core of this issue, the following steps must happen:

* Configure a service that nscd can cache and that provides netgroup maps.  I use LDAP, but others should also work.
* Ensure that at least one "in net group" query returns a positive result (this may require other maps).
* Perform multiple identical "in net group" queries within the TTL.  If the cached service is remote, you know you have done enough when the network traffic stops.  I used: getent netgroup QAUsers "" testuser23461 ""
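
The repeated-query step can be scripted. What follows is a minimal sketch and not part of the original reproducer: repeat_query is a hypothetical helper name, and the repetition count of 5 is an assumed value. The getent command line is the one quoted in the step itself.

```shell
#!/bin/sh
# repeat_query COUNT CMD...: run the same command COUNT times in a row.
# (Hypothetical helper; the variable names are illustrative.)
repeat_query() {
    rq_count=$1
    shift
    rq_i=0
    while [ "$rq_i" -lt "$rq_count" ]; do
        # Ignore the exit status so the loop always completes.
        "$@" || true
        rq_i=$((rq_i + 1))
    done
}

# Example, on a test host with nscd and a netgroup source configured
# (5 is an assumed count; run it within the positive TTL):
# repeat_query 5 getent netgroup QAUsers "" testuser23461 ""
```

If the backing service is remote, watching the network while this runs should show the traffic stopping once nscd answers from its cache.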

(The bug has now been triggered; the next steps show its effects.)

* Wait for the entry to time out (the positive TTL).
* If you watch the logs (with debugging enabled), you can see the message where the cleanup task tries to purge the entry.  Alternatively, once the timeout has passed, you can use "nscd -i netgroup" to force a purge.

If you let the cleanup task run, you can attach to nscd with gdb and run "info threads"; one thread will be waiting for a write lock.
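
The thread inspection can also be done non-interactively. This is a sketch under assumptions: gdb is installed, the caller is allowed to attach to nscd, and pidof is available to find the process. dump_threads and the GDB override variable are hypothetical names introduced here.

```shell
#!/bin/sh
# dump_threads PID: attach gdb in batch mode, list the threads,
# print a backtrace for each, then exit (which detaches).
# GDB may be set to use a different debugger binary (hypothetical override).
dump_threads() {
    "${GDB:-gdb}" -p "$1" -batch -ex 'info threads' -ex 'thread apply all bt'
}

# Example (as root on the affected host):
# dump_threads "$(pidof nscd)"
```

The thread blocked on the write lock should then be identifiable from its backtrace.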

If you use "nscd -i netgroup" it will hang.
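
To observe the hang without losing the terminal, the invalidate command can be wrapped in a deadline. This is a sketch assuming the coreutils timeout utility is available; run_with_deadline is a hypothetical helper, and the 30-second deadline is an arbitrary choice.

```shell
#!/bin/sh
# run_with_deadline SECONDS CMD...: run CMD under timeout(1) and report
# whether it finished or had to be killed at the deadline.
run_with_deadline() {
    rd_deadline=$1
    shift
    timeout "$rd_deadline" "$@"
    rd_status=$?
    # timeout(1) exits with 124 when the command ran past the deadline.
    if [ "$rd_status" -eq 124 ]; then
        echo "hung: no result after ${rd_deadline}s"
        return 1
    fi
    echo "finished with status $rd_status"
}

# Example (as root, after the positive TTL has expired):
# run_with_deadline 30 nscd -i netgroup
```

Note that the deadline only ends the waiting client command; it does nothing to the nscd daemon itself.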

To "reset" this test, you need to stop nscd *and* manually remove the persistent cached databases it uses.
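
The reset can be sketched as below. The report does not say where the persistent databases live; /var/db/nscd is an assumption about the local layout and should be checked before use. Stopping the service with systemctl is likewise assumed, and reset_nscd_cache and STOP_NSCD are hypothetical names.

```shell
#!/bin/sh
# reset_nscd_cache [DB_DIR]: stop nscd, then delete its persistent
# database files. DB_DIR defaults to /var/db/nscd (an assumption).
# STOP_NSCD may hold an alternative stop command (hypothetical override).
reset_nscd_cache() {
    rc_dir=${1:-/var/db/nscd}
    # Do not touch the files unless the daemon was stopped successfully.
    ${STOP_NSCD:-systemctl stop nscd} || return 1
    rm -f -- "$rc_dir"/*
}

# Example (as root):
# reset_nscd_cache && systemctl start nscd
```

The files are removed only after the stop command succeeds, so a running daemon never has its databases deleted from under it.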

Comment 13 errata-xmlrpc 2018-04-10 13:58:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0805