1277672 – nscd is not caching ldap netgroup data properly, hangs on nscd -i netgroup

Summary: nscd is not caching ldap netgroup data properly, hangs on nscd -i netgroup

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	6.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	glibc team
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1374441 1435615 1461138

Reported:	2015-11-03 19:36 UTC by Edward Goodwin
Modified:	2020-12-11 11:58 UTC
CC List:	13 users
Fixed In Version:
Doc Text:
Clone Of:
Clones:	1435615
Environment:
Last Closed:	2017-11-08 09:20:22 UTC
Target Upstream Version:
Embargoed:

Attachments
patch to fix nscd -i command hang (573 bytes, patch) 2017-03-31 20:53 UTC, Al Heisner	no flags
patch to fix nscd -i command hang (333 bytes, patch) 2017-04-05 16:05 UTC, Al Heisner	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Debian BTS	800523	0	None	None	None	Never
Sourceware	22161	0	P2	RESOLVED	nscd cache prune for netgroups hangs after timeout bump	2021-02-10 11:14:55 UTC

Description Edward Goodwin 2015-11-03 19:36:19 UTC

Description of problem:

nscd is not properly caching netgroup data from LDAP. It doesn't appear to refresh the cached data from the LDAP server, and attempting to invalidate the netgroup cache with `nscd -i netgroup` hangs. As a result, sudoRole LDAP sudo records that reference netgroups with "+" syntax are not being honored. Stopping nscd resolves the issue.


Version-Release number of selected component (if applicable):

nscd-2.12-1.166.el6_7.3.x86_64
nss-3.19.1-3.el6_6.x86_64

How reproducible: always


Steps to Reproduce:

--- /etc/nscd.conf ---
        enable-cache            netgroup        yes
        positive-time-to-live   netgroup        28800
        negative-time-to-live   netgroup        20
        suggested-size          netgroup        211
        check-files             netgroup        yes
        persistent              netgroup        yes
        shared                  netgroup        yes
        max-db-size             netgroup        33554432


----------------------

--- /etc/nsswitch.conf ---
passwd:     compat
passwd_compat:    ldap
shadow:     compat
group:      files ldap
hosts:      files dns

bootparams: nisplus [NOTFOUND=return] files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   ldap
publickey:  nisplus
automount:  files nisplus
aliases:    files nisplus
sudoers:  files ldap
--------------------------

1. run getent netgroup webusers

webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, )

2. update the netgroup on the LDAP side; getent still returns the same result

webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, )

3. stop nscd with `service nscd stop`

4. getent returns the up-to-date values

webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, ) ( ,jpate, )

5. restart nscd; getent returns the stale cached values again

webusers: ( ,egoodwin, ) ( ,jsmith, ) ( ,aflanders, )

6. attempt to invalidate the netgroup cache with `nscd -i netgroup`
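
Steps 2 to 5 compare the getent output by eye. A minimal sketch of doing that comparison in shell, assuming the `( ,user, )` triple layout shown in the output above. The function names `netgroup_users` and `netgroup_added` are hypothetical helpers for illustration, not part of nscd or glibc:

```shell
# Hypothetical helper: print the user field of every "(host,user,domain)"
# triple found in one line of `getent netgroup` output, sorted, one per line.
netgroup_users() {
    # $1: a line such as "webusers: ( ,egoodwin, ) ( ,jsmith, )"
    printf '%s\n' "$1" \
        | grep -o '([^)]*)' \
        | awk -F',' '{ gsub(/[() \t]/, "", $2); if ($2 != "") print $2 }' \
        | sort -u
}

# Hypothetical helper: print users that appear in the second snapshot ($2)
# but not in the first ($1).
netgroup_added() {
    old=$(netgroup_users "$1")
    # -F fixed strings, -x whole-line match, -v invert; grep exits 1 when
    # nothing is printed, so "|| true" keeps the function's status at 0.
    netgroup_users "$2" | grep -vxF -e "$old" || true
}

# Example, using the group name from this report:
#   cached=$(getent netgroup webusers)     # with nscd running
#   service nscd stop
#   direct=$(getent netgroup webusers)     # straight from LDAP
#   netgroup_added "$cached" "$direct"     # members the cache is missing
```

Any name printed by `netgroup_added` is a member that LDAP reports but the nscd cache does not.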



Actual results:

nscd will hang in step 6. The nscd netgroup cache will never update properly. 
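
To keep a test shell from blocking on the hang, the invalidation can be wrapped in a time limit. This is only a sketch: it assumes the coreutils `timeout` command is installed, and `run_with_limit` is a hypothetical helper name. The 30-second limit in the example is arbitrary.

```shell
# Hypothetical helper: run a command under a time limit (in seconds) and
# report whether it finished. coreutils `timeout` exits with status 124
# when the limit is reached and it had to stop the command.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG: '$*' did not finish within ${limit}s"
    else
        echo "finished with status $status"
    fi
    return "$status"
}

# Example (as root on an affected host):
#   run_with_limit 30 nscd -i netgroup
```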


Expected results:

`nscd -i netgroup` invalidates the cache and nscd repopulates it from the source.


Additional info:
Able to reproduce this on several RHEL 6.4 servers.

Comment 2 Hubert Kario 2015-11-04 11:57:22 UTC

The nss package is Mozilla Network Security Services.

Name Service Switch (both nscd and /etc/nsswitch.conf) is part of glibc.

You can see that by either asking the rpm db which package owns the file:
# rpm -qf /etc/nsswitch.conf 
glibc-2.12-1.166.el6.x86_64

or which source package is the origin of nscd:
# rpm -qi nscd | grep Source
Group       : System Environment/Daemons    Source RPM: glibc-2.12-1.166.el6.src.rpm

Comment 3 Edward Goodwin 2015-11-04 15:54:29 UTC

I'm confirming the following versions of glibc are installed

glibc-2.12-1.166.el6_7.3.i686
glibc-2.12-1.166.el6_7.3.x86_64

Comment 6 Edward Goodwin 2016-02-29 11:16:56 UTC

My workaround for this issue has been to disable the netgroup cache in /etc/nscd.conf. It is enabled by default.

        enable-cache            netgroup        no
        positive-time-to-live   netgroup        28800
        negative-time-to-live   netgroup        20
        suggested-size          netgroup        211
        check-files             netgroup        yes
        persistent              netgroup        yes
        shared                  netgroup        yes
        max-db-size             netgroup        33554432

Any help on this issue is appreciated; it's a major annoyance in our environment.
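
A minimal sketch of applying that workaround from a script rather than by hand. It assumes the `enable-cache netgroup yes` line layout shown above. `disable_netgroup_cache` is a hypothetical helper name, and the `.bak` suffix is an arbitrary choice for the backup copy:

```shell
# Hypothetical helper: flip "enable-cache netgroup yes" to "no" in an
# nscd.conf-style file, keeping a backup copy next to it as <file>.bak.
disable_netgroup_cache() {
    conf=$1
    cp -p "$conf" "$conf.bak" &&
    # Read from the backup and rewrite the original in place; only the
    # netgroup enable-cache line is changed, whitespace is preserved.
    sed 's/^\([[:space:]]*enable-cache[[:space:]]\{1,\}netgroup[[:space:]]\{1,\}\)yes/\1no/' \
        "$conf.bak" > "$conf"
}

# Example (as root); nscd reads its configuration at startup, so restart it:
#   disable_netgroup_cache /etc/nscd.conf && service nscd restart
```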

Comment 7 Hubert Kario 2016-02-29 13:00:39 UTC

Please contact customer support and point them to this bug, this will increase its priority.

Comment 12 Al Heisner 2017-03-31 20:52:04 UTC

I also encountered problems with nscd -i while solving netgroup cache issues. I'll attach the patch I used to resolve the hang when running nscd -i.

Comment 13 Al Heisner 2017-03-31 20:53:34 UTC

Created attachment 1267982
patch to fix nscd -i command hang

Comment 14 Al Heisner 2017-03-31 22:03:17 UTC

Also, see https://bugzilla.redhat.com/show_bug.cgi?id=1436335 for the probable reason the netgroup cache is failing with sudo.

Comment 15 Al Heisner 2017-04-04 19:35:52 UTC

Sorry, after another look at this I found my patch testing was contaminated. It seems the patch I posted to ticket 1436335 was preventing the condition that causes nscd to hang when trying to invalidate the cache. When I remove the other patch, nscd hangs even with this one applied.

Comment 16 Al Heisner 2017-04-05 16:05:49 UTC

Created attachment 1269025
patch to fix nscd -i command hang

This patch corrects orphaned readlocks on netgroup db in nscd

Comment 18 Florian Weimer 2017-11-08 09:20:15 UTC

Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017.  During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

This issue does not qualify, but we are considering incorporating the upstream change into Red Hat Enterprise Linux 7; see bug 1435615.

Comment 19 Red Hat Bugzilla Rules Engine 2017-11-08 09:20:22 UTC

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

Comment 20 Edward Goodwin 2018-01-10 22:55:22 UTC

I find it extremely discouraging that RedHat lets the bug sit for two years until the release transitions to Prod Phase 3 and then declines to fix it; a very convenient way to get out of fixing downstream bugs.

Comment 21 Carlos O'Donell 2018-01-11 03:59:59 UTC

(In reply to Edward Goodwin from comment #20)
> I find it extremely discouraging that RedHat lets the bug sit  for two years
> until the release transitions to Prod Phase 3 and then declines to fix it; a
> very convenient way to get out of fixing downstream bugs.

Edward,

Thank you for your feedback, we do appreciate it.

We only just managed to reproduce and fix this issue in September, and the backport risk exceeded what we could do in RHEL 6.10. Please keep in mind our customers expect very little change at this point in the RHEL 6 release.

We have scheduled a fix for this issue in the upcoming RHEL 7.5 release (Bug 1435615). So while progress was slow we had not forgotten about this issue and we strive to improve subsequent versions of RHEL.

Thank you again.

Comment 22 Al Heisner 2018-01-11 04:16:50 UTC

Why would it take 5 months to reproduce and fix this issue when I posted the patch I've been using for 9 months now without issue, showing exactly where the orphaned readlocks were back in April of 2017?

Comment 23 Carlos O'Donell 2018-01-11 04:38:29 UTC

(In reply to Al Heisner from comment #22)
> Why would it take 5 months to reproduce and fix this issue when I posted the
> patch I've been using for 9 months now without issue, showing exactly where
> the orphaned readlocks were back in April of 2017?

Al,

Thank you also for your feedback.

All changes going into Red Hat Enterprise Linux are verified and validated. This includes looking at broader issues with the locks in nscd and running our own tests on a number of different configurations. Despite the outcome being a small number of lines of code change, the underlying work that goes into accepting those changes is not small. In this case the work also ran against the deadline for transition from Phase 2 to Phase 3 and we appreciate the difficulty that brings for some customers.

As I mentioned in comment #21 the fix is scheduled for RHEL 7.5.
