Bug 429702
Field | Value
---|---
Summary | multiple nscd problems (with ldap)
Product | Red Hat Enterprise Linux 4
Component | nss_ldap
Version | 4.8
Hardware | i386
OS | Linux
Status | CLOSED WONTFIX
Severity | high
Priority | low
Target Milestone | rc
Reporter | Joel Eidsath <thras>
Assignee | Nalin Dahyabhai <nalin>
CC | bgmilne, frank.gruellich, jjneely, jplans, michael.hagmann
Doc Type | Bug Fix
Last Closed | 2012-06-20 16:57:37 UTC
Description
Joel Eidsath
2008-01-22 15:06:19 UTC
The problem with the message bus, I believe, is related to having 'ldap' listed as a source for protocols in your /etc/nsswitch.conf file. Remove ldap from that line and see if that helps. I'm also curious about the rest of the ldap/nscd issues. Do they look like Bug #428837?

We've removed ldap from protocols in /etc/nsswitch.conf. The next time either our user server or our mail server is rebooted (hopefully not for a while) we'll let you know if that was the fix.

This issue does not look at all like #428837. I've never seen nscd hit 100% of the CPU. The two nscd issues that we are having are:

1) nscd crashes at random intervals (usually after running for a week) -- this behavior has existed for a number of months
2) nscd returns bad or missing information for random users -- this behavior has existed for 1 or 2 months.

I attempted to debug the first issue, but I was never able to capture a crash with nscd in debug mode -- it generates a lot of debug data.

Removing ldap from the protocols line in /etc/nsswitch.conf did not fix the messagebus dependency.

What does your /etc/ldap.conf look like? Do you use "nss_initgroups_ignoreusers" by any chance in it?

Yes, we've got the following line:

    nss_initgroups_ignoreusers root,ldap

Also, I've finally been able to record a failure in the ldap logs. My username is "thras" with uid 4954. With nscd on, I ran 'id thras' a couple of times from the command line, and it returned no such user. Then I turned nscd off and 'id thras' worked. I wasn't able to reproduce this again with myself or any other users.
But here is (I think -- there aren't any timestamps) the relevant portion of the nscd log:

    2906: handle_request: request received (Version = 2) from PID 9515
    2906: GETFDPW
    2906: provide access to FD 6, for passwd
    2906: handle_request: request received (Version = 2) from PID 9515
    2906: GETFDGR
    2906: provide access to FD 8, for group
    2906: handle_request: request received (Version = 2) from PID 9515
    2906: GETGRBYGID (4954)
    2906: Haven't found "4954" in group cache!
    2906: handle_request: request received (Version = 2) from PID 9529
    2906: GETFDPW
    2906: provide access to FD 6, for passwd
    2906: handle_request: request received (Version = 2) from PID 9529
    2906: GETFDGR
    2906: provide access to FD 8, for group
    2906: handle_request: request received (Version = 2) from PID 9529
    2906: GETGRBYGID (4954)
    2906: Haven't found "4954" in group cache!
    2906: pruning hosts cache; time 1203108144

About 2000 lines (~3 minutes) earlier in the log this shows up, but I don't think that's when 'id' failed:

    2906: considering INITGROUPS entry "thras", timeout 1200467811
    2906: Reloading "thras" in group cache!

There was a typo above. It should read "I've finally been able to record a failure in the nscd logs."

I'll try adding nscd to nss_initgroups_ignoreusers and see if that corrects the system message bus problem.

For the dbus hang-up, see #431301 (https://bugzilla.redhat.com/show_bug.cgi?id=431301). Downgrading nss_ldap from nss_ldap-226-20 to nss_ldap-226-18 solved the problem for me. The downgrade is also reported to fix some other issues (https://bugzilla.redhat.com/show_bug.cgi?id=426155 and https://bugzilla.redhat.com/show_bug.cgi?id=427189). (In my case, nss_initgroups_ignoreusers didn't help. In fact, it isn't even supported in nss_ldap-226-18.)

We were never able to solve the dbus problem. Currently, downgrading nss_ldap seems to fix all sorts of problems. I don't know what sort of testing process is going on with this package before release, but it may need some modifications.
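The excerpt above has no per-line timestamps, but two of its lines carry large integers (`time 1203108144` and `timeout 1200467811`). A minimal sketch for decoding them follows, on the assumption (not confirmed anywhere in this thread) that these values are seconds since the Unix epoch. The `epoch_fields` helper and the `LOG_LINES` list are illustrative names, not part of nscd.

```python
import re
from datetime import datetime, timezone

# Two lines copied from the nscd debug excerpt quoted in this report.
LOG_LINES = [
    '2906: pruning hosts cache; time 1203108144',
    '2906: considering INITGROUPS entry "thras", timeout 1200467811',
]

# Assumption: the integer after "time" or "timeout" is Unix epoch seconds.
EPOCH_FIELD = re.compile(r"\b(time|timeout) (\d+)\b")


def epoch_fields(line):
    """Return (field_name, UTC datetime) pairs found in one log line."""
    return [
        (name, datetime.fromtimestamp(int(value), tz=timezone.utc))
        for name, value in EPOCH_FIELD.findall(line)
    ]


for line in LOG_LINES:
    for name, when in epoch_fields(line):
        print(f"{name:8s} {when.isoformat()}  <- {line}")
```

Under that assumption the two values decode to 2008-02-15 20:42:24 UTC and 2008-01-16 07:16:51 UTC respectively, which may help line the excerpt up against other logs.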
We've seen similar problems on RHEL4 since about 17 February, when our updates updated nss_ldap and nscd. We had not seen this before on RHEL4. We also see it on some 5.3 boxes, but they weren't in production on anything before 5.3. However, we have seen problems enumerating local users (e.g. 'getent passwd root' fails), so I suspect this is an nscd bug, and not an nss_ldap bug. E.g., we have about 10 servers which are very similar software-wise; one of these did not get the updates at the same time, and this host is not seeing the problem. (I don't agree with the nss_initgroups_ignoreusers workaround; we use 'bind_policy soft' to restore the older nss_ldap behaviour.)

Is anyone seeing this problem without nscd?

This could just be bug #495515 ...

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
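The lookup failures reported in this thread ('id thras' returning no such user, 'getent passwd root' failing) were intermittent and hard to reproduce by hand. A minimal sketch of a repeated-lookup probe follows. It is not something used by anyone in this thread; the `probe` function and its parameters are hypothetical names. It relies on Python's standard `pwd.getpwnam`, which raises `KeyError` when the name cannot be resolved through the system's passwd lookup.

```python
import pwd
import time


def probe(name, lookup=pwd.getpwnam, attempts=5, delay=0.0):
    """Call lookup(name) repeatedly.

    Returns the list of attempt indexes (0-based) on which the lookup
    raised KeyError, i.e. the user could not be resolved.
    """
    failures = []
    for attempt in range(attempts):
        try:
            lookup(name)
        except KeyError:
            failures.append(attempt)
        if delay:
            time.sleep(delay)  # space the attempts out over time
    return failures


if __name__ == "__main__":
    # An empty list means every lookup of "root" succeeded.
    print(probe("root", attempts=10, delay=0.1))
```

Running the same probe once with nscd running and once with it stopped would give a rough comparison of the two cases; a longer `delay` and more `attempts` make it more likely to catch a sporadic failure.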