Bug 1277672
| Summary: | nscd is not caching ldap netgroup data properly, hangs on nscd -i netgroup | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Edward Goodwin <edward.goodwin> |
| Component: | glibc | Assignee: | glibc team <glibc-bugzilla> |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-tools-bugs |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.4 | CC: | alanm, al, arawat, ashankar, codonell, cww, dkochuka, edward.goodwin, fweimer, jwright, kludhwan, mnewsome, pfrankli |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1435615 | Environment: | |
| Last Closed: | 2017-11-08 09:20:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1374441, 1435615, 1461138 | | |
The nss package is Mozilla Network Security Services. The Name Service Switch (both nscd and /etc/nsswitch.conf) is part of glibc. You can see that either by asking the rpm database which package owns the file (`# rpm -qf /etc/nsswitch.conf` prints `glibc-2.12-1.166.el6.x86_64`) or which source package nscd comes from (`# rpm -qi nscd | grep Source` prints `Group : System Environment/Daemons Source RPM: glibc-2.12-1.166.el6.src.rpm`).

I'm confirming that the following versions of glibc are installed: glibc-2.12-1.166.el6_7.3.i686 and glibc-2.12-1.166.el6_7.3.x86_64.

My workaround for this issue has been to disable the netgroup cache in /etc/nscd.conf, where it is enabled by default:
enable-cache netgroup no
positive-time-to-live netgroup 28800
negative-time-to-live netgroup 20
suggested-size netgroup 211
check-files netgroup yes
persistent netgroup yes
shared netgroup yes
max-db-size netgroup 33554432

Any help on this issue is appreciated; it's a major annoyance in our environment.
Please contact customer support and point them to this bug; this will increase its priority.

I also encountered problems with `nscd -i` while solving netgroup cache issues. I'll attach to this ticket the patch I used to solve the hang when running `nscd -i`.

Created attachment 1267982
patch to fix nscd -i command hang
Also, see https://bugzilla.redhat.com/show_bug.cgi?id=1436335 for the probable reason the netgroup cache is failing with sudo.

Sorry, after another look at this I found that my patch testing was contaminated. It seems the patch I posted to ticket 1436335 was somehow preventing the condition that causes nscd to hang when invalidating the cache. When I remove the other patch, nscd hangs even with this one applied.

Created attachment 1269025
patch to fix nscd -i command hang

This patch corrects the orphaned read locks on the netgroup database in nscd.
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. This issue does not qualify, but we are considering incorporating the upstream change into Red Hat Enterprise Linux 7; see bug 1435615.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

I find it extremely discouraging that Red Hat lets the bug sit for two years until the release transitions to Prod Phase 3 and then declines to fix it; a very convenient way to get out of fixing downstream bugs.

(In reply to Edward Goodwin from comment #20)
> I find it extremely discouraging that Red Hat lets the bug sit for two years
> until the release transitions to Prod Phase 3 and then declines to fix it; a
> very convenient way to get out of fixing downstream bugs.

Edward,

Thank you for your feedback; we do appreciate it. We only just managed to reproduce and fix this issue in September, and the backport risk exceeded what we could do in RHEL 6.10. Please keep in mind that our customers expect very little change at this point in the RHEL 6 release. We have scheduled a fix for this issue in the upcoming RHEL 7.5 release (Bug 1435615). So while progress was slow, we had not forgotten about this issue, and we strive to improve subsequent versions of RHEL. Thank you again.

Why would it take 5 months to reproduce and fix this issue when I posted the patch I've been using for 9 months now without issue, showing exactly where the orphaned readlocks were back in April of 2017?

(In reply to Al Heisner from comment #22)
> Why would it take 5 months to reproduce and fix this issue when I posted the
> patch I've been using for 9 months now without issue, showing exactly where
> the orphaned readlocks were back in April of 2017?

Al,

Thank you also for your feedback.
All changes going into Red Hat Enterprise Linux are verified and validated. This includes looking at broader issues with the locks in nscd and running our own tests on a number of different configurations. Although the resulting change is only a few lines of code, the underlying work that goes into accepting those changes is not small. In this case the work also ran against the deadline for the transition from Phase 2 to Phase 3, and we appreciate the difficulty that brings for some customers. As I mentioned in comment #21, the fix is scheduled for RHEL 7.5.
Description of problem:
nscd is not properly caching netgroup data from LDAP. It doesn't appear to update the cached data from the LDAP server, and attempting to invalidate the netgroup cache with `nscd -i netgroup` hangs. As a result, sudoRole LDAP sudo records that reference netgroups with the "+" syntax are not honored. Stopping nscd resolves the issue.

Version-Release number of selected component (if applicable):
nscd-2.12-1.166.el6_7.3.x86_64
nss-3.19.1-3.el6_6.x86_64

How reproducible:
Always

Steps to Reproduce:

--- /etc/nscd.conf ---
enable-cache netgroup yes
positive-time-to-live netgroup 28800
negative-time-to-live netgroup 20
suggested-size netgroup 211
check-files netgroup yes
persistent netgroup yes
shared netgroup yes
max-db-size netgroup 33554432

----------------------

--- /etc/nsswitch.conf ---
passwd: compat
passwd_compat: ldap
shadow: compat
group: files ldap
hosts: files dns
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: ldap
publickey: nisplus
automount: files nisplus
aliases: files nisplus
sudoers: files ldap

1. Run `getent netgroup webusers`:
   webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, )
2. Update the netgroup on the LDAP side. getent still returns the same result:
   webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, )
3. Stop nscd with `service nscd stop`.
4. getent now returns the up-to-date values:
   webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, ) ( ,jpate, )
5. Restart nscd. It returns the cached values:
   webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, )
6. Attempt to invalidate the netgroup cache with `nscd -i netgroup`.

Actual results:
nscd hangs in step 6, and the nscd netgroup cache never updates properly.

Expected results:
`nscd -i netgroup` invalidates the cache, which is then repopulated from the source.

Additional info:
Able to reproduce this on several RHEL 6.4 servers.