Bug 495701

Summary: LDAP queries fail entirely on a (temporarily) slow server
Product: Red Hat Enterprise Linux 5
Reporter: Albert Flügel <albert.fluegel>
Component: openldap
Assignee: Jan Zeleny <jzeleny>
Status: CLOSED ERRATA
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: high
Priority: medium
Version: 5.3
CC: jplans, omoris, ovasik, timlank
Target Milestone: beta
Keywords: Reopened
Hardware: x86_64
OS: Linux
Fixed In Version: nss_ldap-253-22
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:05:24 UTC
Attachments:
Patch fixing this issue (flags: none)

Description Albert Flügel 2009-04-14 13:14:38 UTC
Description of problem:
When the LDAP server used by this client is slow, a query for an account (ou=People) or a host (ou=Hosts) fails after about 10 seconds, and the calling program (e.g. getent) aborts with the error message:
getent: ../../../libraries/libldap/error.c:273: ldap_parse_result: Assertion `r != ((void *)0)' failed.

Version-Release number of selected component (if applicable):
2.3.43-3.el5

How reproducible:
Make the LDAP server used by this client slow in some way, e.g. renice it to a low priority and keep the server machine busy so that the LDAP service is really slow.

Steps to Reproduce:
1. getent passwd <whatever-valid-account-name-in-ldap>
  
Actual results:
getent: ../../../libraries/libldap/error.c:273: ldap_parse_result: Assertion `r != ((void *)0)' failed.

Expected results:
normal getent output

Additional info:
Does not seem to happen on RHEL 4 or 3.
Fallback to another LDAP server does not seem to work in this situation.
When the slow LDAP server is back to normal speed, the client with this openldap version stays in the failed state. nscd must be restarted, otherwise the problem persists.

Comment 2 Jan Zeleny 2009-04-28 11:53:07 UTC
I didn't manage to reproduce this bug, although I tried several hundred times:

1. Renice slapd to +19
2. Stress the machine (stress --cpu 20 --vm 20 or stress --cpu 10 --vm 10 --io 10 --hdd 10).
3. for i in `seq 1 100`; do getent passwd <user>; echo $i; done

I tried this on both a 32-bit system and a 64-bit system. I also tried setting smaller timeout values in /etc/ldap.conf. In extreme cases getent hung until the system was no longer stressed, but there was not a single crash.
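For anyone repeating this test, the loop in step 3 can be wrapped so that failed runs are counted instead of being spotted by eye. This is only a sketch: the function name count_failures is made up for illustration, and it treats any non-zero exit status as a failure (a failed assertion aborts the process, so it would be counted).

```shell
#!/bin/sh
# count_failures RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times and prints how
# many runs ended with a non-zero exit status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Output is discarded; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (replace <user> with a valid LDAP account):
#   count_failures 100 getent passwd <user>
```

Note that getent also exits non-zero when the account simply is not found, so the count is only meaningful for an account that exists in LDAP.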

Do you have any more info about this issue? If not, I'm sorry, but I'm going to close this bug as WORKSFORME.

Comment 3 Albert Flügel 2009-05-06 08:47:43 UTC
Sorry for this issue. It seems we quite often have problems that no one else
in the world has :-( . Probably due to a relatively large-scale and heavily
loaded environment. Sorry, I have no more info about this problem. Probably
with a newer openldap release this is gone anyway (?).

Comment 4 Jan Zeleny 2009-05-06 09:03:15 UTC
Ok, for now I'm closing this bug. Please re-open this issue if problems persist in newer versions of openldap.

Comment 5 timlank 2009-06-24 19:04:33 UTC
We're experiencing this also - Red Hat SR#1930570

While in an NFS-mounted directory, I ran "ls -l" and got the following:
ls: ../../../libraries/libldap/error.c:273: ldap_parse_result: Assertion `r != ((void *)0)' failed

Subsequent invocations of "ls -l" ran fine.

Our master OpenLDAP doesn't seem to be taxed much though.

# rpm -qa | grep ldap
nss_ldap-253-17.el5
openldap-2.3.43-3.el5
mozldap-6.0.5-1.el5
python-ldap-2.2.0-2.1
nss_ldap-253-17.el5
openldap-2.3.43-3.el5
[root@ai13-07 /]# uname -a
Linux ourserver 2.6.18-128.1.10.el5 #1 SMP Wed Apr 29 13:53:08 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
#

Comment 6 timlank 2009-07-03 12:31:41 UTC
This seems to be resolved by increasing the nss_ldap bind_timeout parameter.  I had it at 2 from a RHEL4 environment.  Increasing this to 120, while it may be overkill, resolved the problem.  We'll likely end up reducing this to a comfortable level that does not invoke the error.
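To check which timeout a client is actually configured with, something like the following could be used. It is a rough sketch: the helper name ldap_conf_get is hypothetical, and it assumes the usual "key value" line format of /etc/ldap.conf. As far as I know, the nss_ldap(5) man page spells the bind timeout option bind_timelimit, so that is the key used in the example.

```shell
#!/bin/sh
# ldap_conf_get KEY [FILE]
# Hypothetical helper: prints the value of the last uncommented "KEY value"
# line in FILE (default: /etc/ldap.conf). Prints nothing if KEY is not set.
ldap_conf_get() {
    awk -v key="$1" '
        tolower($1) == tolower(key) { val = $2 }
        END { if (val != "") print val }
    ' "${2:-/etc/ldap.conf}"
}

# Example:
#   ldap_conf_get bind_timelimit
#   ldap_conf_get timelimit /etc/ldap.conf
```

Commented-out lines are skipped because their first field starts with "#" and so never equals the key.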

Comment 7 Albert Flügel 2009-07-03 12:42:54 UTC
In any case, this is a workaround and does not solve the actual problem. The client should try to bind to the next configured server. Once it has tried all servers, it should issue a warning and probably fail only later. With this behaviour, an additional configuration parameter might be useful, e.g. allservers_fail_timeout.
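The behaviour asked for here can be sketched in shell. This is only an illustration of the idea, not how nss_ldap or libldap implement failover: probe_server and first_reachable are hypothetical names, and the probe assumes that the coreutils timeout command and the OpenLDAP ldapsearch client are installed.

```shell
#!/bin/sh
# probe_server URI SECONDS
# Hypothetical probe: succeeds if an anonymous root DSE search against URI
# completes within SECONDS.
probe_server() {
    timeout "$2" ldapsearch -x -H "$1" -s base -b "" -LLL namingContexts \
        >/dev/null 2>&1
}

# first_reachable SECONDS URI...
# Tries each URI in order and prints the first one whose probe succeeds.
# Only after every server has failed does it warn and return non-zero.
first_reachable() {
    per_server_timeout=$1
    shift
    for uri in "$@"; do
        if probe_server "$uri" "$per_server_timeout"; then
            echo "$uri"
            return 0
        fi
    done
    echo "warning: no LDAP server answered within ${per_server_timeout}s" >&2
    return 1
}

# Example (server names are placeholders):
#   first_reachable 5 ldap://ldap1.example.com ldap://ldap2.example.com
```

The point of the sketch is the ordering: a slow first server costs at most one per-server timeout before the next one is tried, and a failure is reported only when the whole list is exhausted.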

Comment 8 Jan Zeleny 2009-07-03 12:47:20 UTC
Since this error still seems to occur, I'm reopening this bug and will investigate it.

Comment 10 Jan Zeleny 2009-10-19 14:02:01 UTC
Created attachment 365239
Patch fixing this issue

I backported a patch from upstream; it should eliminate the issue.

Comment 13 Jan Zeleny 2009-11-16 09:48:55 UTC
Patch is in CVS, changing status to MODIFIED.

Comment 15 Ondrej Moriš 2010-01-29 12:29:28 UTC
The bug reported in the description is not caused by openldap.

It is an nss_ldap bug (see BZ499302). That bug was fixed in 5.4.z, so it is not expected to be reproducible unless you downgrade to nss_ldap < 253-22. The NFS problem reported in Comment #5 is very probably the same nss_ldap problem.

Please note that this problem was fixed in nss_ldap itself, which is built with a static copy of libldap.

It's reasonable to apply the proposed patch to openldap, but of course it can't affect nss_ldap now.

Hence I suggest closing this bug as NOTABUG, CURRENTRELEASE or DUPLICATE (BZ499302).

Comment 16 Ondrej Moriš 2010-01-29 22:19:26 UTC
Please see Comment 15 first. 

You should have nss_ldap-253-22 installed to avoid the described bugs (getent, NFS).
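A quick way to check whether a client already has a fixed nss_ldap could look like the following. It is a sketch: the function name nss_ldap_has_fix is made up, and it assumes the version string has the form VERSION-RELEASE (e.g. 253-17.el5) with numeric leading parts.

```shell
#!/bin/sh
# nss_ldap_has_fix VERSION-RELEASE
# Hypothetical helper: succeeds if the given nss_ldap version-release is
# 253-22 or newer, comparing only the leading numeric parts.
nss_ldap_has_fix() {
    version=${1%%-*}                  # "253-17.el5" -> "253"
    release=${1#*-}                   # "253-17.el5" -> "17.el5"
    release_num=${release%%[!0-9]*}   # "17.el5"     -> "17"
    [ "$version" -gt 253 ] ||
        { [ "$version" -eq 253 ] && [ "$release_num" -ge 22 ]; }
}

# Example (head -n 1 because multilib systems list the package twice):
#   ver=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' nss_ldap | head -n 1)
#   if nss_ldap_has_fix "$ver"; then echo "fixed"; else echo "update nss_ldap"; fi
```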

Since this bug is not directly caused by openldap, we performed sanity checks only: the patched openldap must work correctly without any regression.

Sanity verification successful on RHEL5.5-{Client,Server}-20100129.nightly.

Comment 18 errata-xmlrpc 2010-03-30 08:05:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0198.html