Description of problem:
nscd on EL5.5 occasionally doesn't report all groups for a user (oracle). Because of this, oracle processes fail intermittently. The problem occurs unpredictably, so it is hard to reproduce.

Version-Release number of selected component (if applicable):
kernel: 2.6.18-194.17.4.0.1.el5
nscd: nscd-2.5-49

How reproducible:
Unable to reproduce on demand. Attaching two straces: one with the expected result and one with missing groups.

Steps to Reproduce:
1.
2.
3.

Actual results:
id oracle
uid=300(oracle) gid=300(dba) groups=300(dba)

Expected results:
id oracle
uid=300(oracle) gid=300(dba) groups=300(dba),400(oinstall),402(asmdba)

Additional info:
Thanks for the bug report. I'm afraid I've been unable to replicate this problem. Are you still running into it? Can you please attach your nsswitch.conf, nscd.conf, and nscd logs for a session in which both correct and incorrect results are displayed? passwd and group files might also be important to try to figure this out. Thanks in advance,
Wild guess: do you still get the problem if you set threads and max-threads to 1 in nscd.conf? I suspect the problem might be multiple threads calling getgrouplist concurrently, interfering with each other while iterating over the group list using the not-really-reentrant getgrent_r.
Created attachment 556249 [details] Patch that avoids multi-threaded issues with initgroups compat_call

This problem doesn't occur with glibc trunk because nss_files provides _nss_files_initgroups_dyn, which opens and reads /etc/group itself, whereas glibc 2.5 falls back to compat_call, which uses getgrent_r and is therefore vulnerable to other threads' concurrent use of the same enumeration. This patch improves glibc's handling of any nss implementation that lacks initgroups_dyn by stopping multiple threads from interfering with each other within the compat_call that iterates over the group list supplied by the implementation. I'll submit this improvement upstream.
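For readers following along, here is a minimal sketch of the general idea of serializing the enumeration. It is not the attached patch and not glibc's internal code: the function name collect_groups_locked and the mutex group_enum_lock are made up for illustration, and it assumes every caller in the process goes through this one function. It only uses the public setgrent/getgrent_r/endgrent calls, whose enumeration position is shared by all threads in a process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <grp.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical lock (not a glibc symbol): serializes whole enumerations,
   so one thread's setgrent()/getgrent_r() calls cannot move the shared
   position underneath another thread that is still iterating. */
static pthread_mutex_t group_enum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Scan the group database and store up to max gids of groups that list
   user as a member. Returns the number of matching groups seen, which
   may be larger than max. The primary group from passwd is not added. */
static int collect_groups_locked(const char *user, gid_t *out, int max)
{
    struct group grp;
    struct group *res = NULL;
    char buf[65536];   /* fixed size for the sketch; an entry larger than
                          this makes getgrent_r fail and ends the scan */
    int count = 0;

    assert(user != NULL);

    pthread_mutex_lock(&group_enum_lock);
    setgrent();        /* rewind the process-wide enumeration */
    while (getgrent_r(&grp, buf, sizeof buf, &res) == 0 && res != NULL) {
        for (char **m = grp.gr_mem; *m != NULL; ++m) {
            if (strcmp(*m, user) == 0) {
                if (count < max)
                    out[count] = grp.gr_gid;
                ++count;
                break;
            }
        }
    }
    endgrent();
    pthread_mutex_unlock(&group_enum_lock);
    return count;
}
```

Without the lock, two threads running this loop at the same time share one position in the group list, so each can skip entries the other has already consumed, which would match the symptom of some supplementary groups going missing. Calling it as collect_groups_locked("oracle", gids, 64) from several threads should then give each caller a full scan.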
*** Bug 766786 has been marked as a duplicate of this bug. ***
Created attachment 559373 [details] /etc/group file from system on which problem is occurring
Created attachment 559374 [details] Hourly output of groups command for userid cladmin - most recent first
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Hello, Do you still see this behaviour if you set threads and max-threads to 1 in nscd.conf? Thank You Joe Kachuck
If they're getting permission denied, then this is likely a different problem related to leaking file descriptors. See 795674 and the bugs linked from it. If they are only seeing some groups not being reported, then having them test their system with threads and max-threads both set to 1 in nscd.conf would be greatly appreciated.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1308.html