Bug 1257543 - slapd crash in do_search
Summary: slapd crash in do_search
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: openldap
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Matus Honek
QA Contact: Patrik Kis
URL:
Whiteboard:
Duplicates: 1318904
Depends On:
Blocks: 1272422 1316450 1665441
Reported: 2015-08-27 09:55 UTC by Dennis van Dok
Modified: 2019-11-14 06:54 UTC (History)

Fixed In Version: openldap-2.4.40-8.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1316450
Environment:
Last Closed: 2016-05-11 00:59:16 UTC
Target Upstream Version:
Embargoed:


Attachments
slapd conf (4.06 KB, text/plain)
2015-10-06 13:15 UTC, Andrea
another slapd conf file generating crashes (2.57 KB, text/plain)
2015-10-08 14:19 UTC, Andrea


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1318904 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Product Errata | RHBA-2016:0943 | 0 | normal | SHIPPED_LIVE | openldap bug fix update | 2016-05-10 22:55:28 UTC

Internal Links: 1318904

Description Dennis van Dok 2015-08-27 09:55:54 UTC
Description of problem:

slapd crashes under certain high load conditions.

Version-Release number of selected component (if applicable):

openldap-servers-2.4.40-5.el6

How reproducible:

The crashes coincide with the introduction of this version of openldap-servers; the previous version, from 6.6, did not have this issue. Reproducing it, however, requires setting up a top-level BDII, which is not trivial.

http://www.eu-emi.eu/releases/emi-3-montebianco/products/-/asset_publisher/5dKm/content/bdii-top-2

Steps to Reproduce:
1. Install a top BDII on CentOS 6.7
2. Wait for it to crash


Additional info:

An ABRT report was created, but it's too big to attach here. Here is the URL:

http://www.nikhef.nl/~dennisvd/ccpp-2015-08-27-11:10:14-27199.tar.gz

Comment 2 Andrea 2015-10-06 08:47:27 UTC
Now that a new version of openldap-servers,

openldap-servers-2.4.40-6.el6

has been released as a security fix, analysing this issue is becoming quite urgent, because the new version of openldap-servers is going to be installed automatically as a security update on all our installations.

Could you please check this problem ASAP?

Comment 3 Andreas Haupt 2015-10-06 10:18:56 UTC
A recent (bzipped) core dump (with openldap-servers-2.4.40-6.el6):

https://desycloud.desy.de/public.php?service=files&t=4bef5af0a89f1d9d73011fbc44907bb9&download

Comment 4 Matus Honek 2015-10-06 10:37:59 UTC
I went through the ABRT report and there is an openldap config file missing (I presume this is due to its non-standard location); it should be in /etc/bdii according to the command that was run.
What is more, I am quite confused about what the software you mention (BDII-top) is or does. The crash might also, quite likely, depend on the data that is transferred, which in turn depends on the software's own configuration (which I cannot find in the sosreport).

Comment 5 Andrea 2015-10-06 13:15:09 UTC
Created attachment 1080241 [details]
slapd conf

This is a standard slapd configuration that we have on our top BDII.

Comment 6 Dennis van Dok 2015-10-06 13:34:53 UTC
To elaborate a bit: the BDII is a software package used in the scientific grid computing communities to collect information about available grid resources around the world. The underlying implementation is an openldap directory. There is a hierarchy of BDIIs holding information from local resources, through local sites, up to the global level. The last category is called the 'top BDII' and is the largest, as it holds all information. There are dozens of top BDIIs around the world that contain the same information.

More background here: http://gridinfo.web.cern.ch/information-system-sys-admins

One of the grid admins has already narrowed down the problem to the use of the relay backend, with the o=shadow vs. o=grid massaging. Removing that from the config seems to result in no more crashes.
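
To check whether a given slapd configuration uses the combination discussed here (a relay database together with the rwm overlay), something like the following could be used. It is only a minimal sketch: it assumes a slapd.conf-style file with one directive per line, ignores continuation lines and include files, and the sample configuration is made up for illustration rather than taken from the attached files.

```python
import shlex

def find_relay_rwm_sections(conf_text):
    """Return the suffixes of 'database relay' sections that load 'overlay rwm'."""
    sections = []    # one dict per 'database' section, in file order
    current = None   # stays None while we are still in the global section
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            parts = shlex.split(line)   # honours double-quoted arguments
        except ValueError:
            parts = line.split()        # fall back on odd quoting
        keyword, args = parts[0].lower(), parts[1:]
        if keyword == "database" and args:
            current = {"type": args[0].lower(), "suffix": None, "overlays": []}
            sections.append(current)
        elif current is not None and keyword == "suffix" and args:
            current["suffix"] = args[0]
        elif current is not None and keyword == "overlay" and args:
            current["overlays"].append(args[0].lower())
    return [s["suffix"] for s in sections
            if s["type"] == "relay" and "rwm" in s["overlays"]]

# Hypothetical sample; the suffixes and the massage direction are illustrative.
SAMPLE_CONF = '''
database hdb
suffix "o=grid"

database relay
suffix "o=shadow"
overlay rwm
rwm-suffixmassage "o=shadow" "o=grid"
'''

print(find_relay_rwm_sections(SAMPLE_CONF))  # ['o=shadow']
```

A non-empty result only says that the configuration contains the relay/rwm combination reported in this bug; it does not prove that the server will crash.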

Comment 7 Andrea 2015-10-08 14:16:53 UTC
Hi,
we have reports of slapd crashes only when the slapd conf includes relay databases with overlay rwm, so it looks like it does not depend on the suffixmassage. I'm going to attach the other configuration file which leads to a slapd crash.
thanks
Andrea

Comment 8 Andrea 2015-10-08 14:19:39 UTC
Created attachment 1081029 [details]
another slapd conf file generating crashes

Comment 9 Andrea 2015-10-14 14:27:57 UTC
Hi,
did you have the possibility to look at this problem after we attached the config file?
thanks
Andrea

Comment 10 Matus Honek 2015-10-14 18:30:08 UTC
Hi Andrea,

looks like I have found something. Thanks to the hint about back-relay, and notably thanks to the core dumps above, I have found an upstream commit between versions 2.4.39 and 2.4.40 that introduces a function which is called in the core dumps (both of which show it called from back-relay). In some cases this function seems to call a pointer as a function, which results in SIGSEGV (note that it is not a null pointer, so it got mangled somehow).

If I built packages that would lack this arbitrary function, would you be willing to test this for me? As this is a case of a search only, it should not be dangerous but, please, be aware this might break something. Best would be to use a testing environment of yours.

Thank you.

Comment 11 Andrea 2015-10-14 21:01:59 UTC
Hi Matus, 
thanks a lot for your effort. Yes, if you provide us the packages we will be really happy to try them in our testing instances.

cheers
Andrea

Comment 12 Dennis van Dok 2015-10-14 21:13:13 UTC
Hi Matus,

yes, we can test this in a testbed setup. It would take some time to determine the stability, as the crash usually takes some time (between a few hours and a few days) to manifest.

Thanks,

Dennis

Comment 13 Andreas Haupt 2015-10-15 06:22:55 UTC
Hi Dennis,

I'm quite surprised it takes so long to get your BDII segfaulting. I can easily make it coredump by running a search query - like lcg-infosites (that's how I produced the core dump).

Cheers,
Andreas

Comment 14 Dennis van Dok 2015-10-15 14:01:37 UTC
(In reply to Andreas Haupt from comment #13)
> Hi Dennis,
> 
> I'm quite surprised it takes so long to get your BDII segfaulting. I can
> easily make it coredump by running a search query.

OK, I didn't try that. I just witness 'spontaneous' crashes but they will probably be due to queries run from the outside.
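
For a testbed, repeated search queries could be scripted along these lines. This is a rough sketch with several assumptions: it uses the plain OpenLDAP `ldapsearch` client instead of lcg-infosites, it assumes the usual BDII port 2170 and base `o=grid`, and whether such a generic search actually goes through the code path that crashes is not confirmed anywhere in this report. The function names are made up for the example.

```python
import subprocess

def build_search_command(host, port=2170, base="o=grid"):
    """Build an anonymous ldapsearch that walks every entry under `base`."""
    return [
        "ldapsearch",
        "-x",                            # simple authentication (anonymous here)
        "-LLL",                          # plain LDIF output
        "-H", f"ldap://{host}:{port}",   # server URI (port is an assumption)
        "-b", base,                      # search base (assumption)
        "(objectClass=*)",               # match every entry
        "1.1",                           # request no attributes, DNs only
    ]

def server_answers(host, rounds=10, run=subprocess.run):
    """Run the search `rounds` times; return False at the first failure."""
    cmd = build_search_command(host)
    for _ in range(rounds):
        try:
            result = run(cmd, stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL, timeout=120)
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0:
            return False   # e.g. ldapsearch could not contact the server
    return True
```

Calling `server_answers("bdii.example.org")` (a placeholder host name) returns `False` as soon as one query fails, which would be the moment to look for a fresh core dump on the server.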

Comment 17 Oxana Smirnova 2015-10-23 17:14:42 UTC
(In reply to Matus Honek from comment #10)

Hi, I've got two questions/requests:

The bug severely affects our software (see http://bugzilla.nordugrid.org/show_bug.cgi?id=3504) so we'd love to test the potential fix as well. Where can one find the test build?

And another question: is there a reason why it is not reported to upstream? Or is it? (Couldn't find anything resembling it in the OpenLDAP ITS).

Cheers,
Oxana

Comment 19 Andrea 2015-11-09 13:02:23 UTC
Hi Matus,
do you have an update on this? We would like to understand whether it's possible to have an rpm with the fix you mentioned for testing. We are evaluating possible workarounds, but we are still stuck and we cannot move our installations to openldap v2.4.40.
thanks
cheers
Andrea

Comment 20 Andrea 2015-12-04 09:35:12 UTC
Hi all,
we got privately from Matus a new build of openldap with a workaround/fix to this problem.
We have installed it at CERN on a TopBDII and it looks fine

Dennis, Andreas if you have time could you also test the new rpms on your testing nodes ?

https://drive.google.com/file/d/0B0VkVqWTkgPjblczMkM4dWZPZkE/view?usp=sharing

thanks!
cheers
Andrea

Comment 21 Andreas Haupt 2015-12-04 13:54:05 UTC
Hi Andrea,

updated the packages on our test bdii node. Also reverted back to the original bdii configuration (with "o=shadow" relay db enabled again).

It looks really promising so far! I ran a couple of 'lcg-infosites' requests against the patched top-level bdii without any crashes. These resulted in slapd segfaults with the broken version.

Cheers,
Andreas

Comment 22 Andrea 2015-12-04 15:15:03 UTC
Hi Andreas,
thanks a lot!
I also got confirmation from ARC that this build of openldap fixes the crash on ARC-CE installations.

Matus, do you think this patch can be released, and how long will it take?

thanks!
cheers
Andrea

Comment 23 Andrea 2015-12-14 08:51:30 UTC
Hi Matus,
we have been testing the new rpms quite intensively these days, and we can definitely say that our services are working fine with that build of openldap.

Can we do something in order to speed up the integration and the release of that version by Red Hat?
thanks
cheers
Andrea

Comment 24 Andrea 2016-01-19 12:48:36 UTC
Hello, 
sorry to bother you again, is there any news regarding the integration and release of this change in openldap?
Related to this, I have also seen that a new version of openldap for RHEL 7 (2.4.40-8.el7) is now in CentOS 7 (we are starting to support this OS as well). I haven't tested it yet, so I don't know whether this problem also appears there, but this change could also be applied to the openldap released in RHEL 7.
thanks
cheers
Andrea

Comment 25 Matus Honek 2016-01-19 13:46:58 UTC
Hello Andrea,

I am sorry for not answering sooner. The fix is proposed for rhel-6.8 and should be included with its release.
Also, I should clone this bugzilla for rhel-7.

Regards,
Matus

Comment 33 errata-xmlrpc 2016-05-11 00:59:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0943.html

Comment 34 Matus Honek 2016-05-30 12:31:40 UTC
*** Bug 1318904 has been marked as a duplicate of this bug. ***

