Bug 1257543
Summary: | slapd crash in do_search
---|---
Product: | Red Hat Enterprise Linux 6
Reporter: | Dennis van Dok <dennisvd>
Component: | openldap
Assignee: | Matus Honek <mhonek>
Status: | CLOSED ERRATA
QA Contact: | Patrik Kis <pkis>
Severity: | urgent
Priority: | urgent
Version: | 6.7
CC: | andrea.manzi, andreas.haupt, anrussel, dennisvd, mhonek, nkinder, oxana.smirnova, pkis, sgadekar, simon.fayer05, skremen
Target Milestone: | rc
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | openldap-2.4.40-8.el6
Doc Type: | Bug Fix
Clone Of: |
: | 1316450
Last Closed: | 2016-05-11 00:59:16 UTC
Type: | Bug
Bug Blocks: | 1272422, 1316450, 1665441

Description
Dennis van Dok
2015-08-27 09:55:54 UTC
Now that a new version of openldap-servers (openldap-servers-2.4.40-6.el6) has been released as a security fix, analysing this issue is becoming quite urgent, because the new version is going to be installed automatically as a security update on all our installations. Could you please check this problem ASAP?

A recent (bzipped) core dump (with openldap-servers-2.4.40-6.el6): https://desycloud.desy.de/public.php?service=files&t=4bef5af0a89f1d9d73011fbc44907bb9&download

I went through the ABRT report and an openldap config file is missing (I presume because of its non-standard location); according to the command that was run, it should be in /etc/bdii. What is more, I am quite confused about what the software you mention (BDII-top) is or does. The crash might also, quite likely, depend on the data being transferred, which in turn depends on the software's configuration (which I cannot find in the sosreport).

Created attachment 1080241
slapd conf
This is a standard slapd configuration that we have in our top BDII.

To elaborate a bit: the BDII is a software package used in the scientific grid computing communities to collect information about available grid resources around the world. The underlying implementation is an OpenLDAP directory. There is a hierarchy of BDIIs holding information from local resources, through local sites, up to the global level. The last category is called the 'top BDII' and is the largest, as it holds all information. There are dozens of top BDIIs around the world that contain the same information. More background here: http://gridinfo.web.cern.ch/information-system-sys-admins

One of the grid admins has already narrowed down the problem to the use of the relay backend, with the o=shadow vs. o=grid massaging. Removing that from the config seems to result in no more crashes.

Hi, we have reports of slapd crashes only where the slapd conf includes relay databases with overlay rwm, so it looks like the crash does not depend on the suffixmassage. I'm going to attach the other configuration file which leads to a slapd crash.
Thanks, Andrea

Created attachment 1081029
another slapd conf file generating crashes

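For readers unfamiliar with the setup under discussion, a relay database with the rwm overlay has roughly the following shape in slapd.conf. This is only a minimal sketch and not the content of either attachment: the directives are the ones documented in slapd-relay(5) and slapo-rwm(5), while the pairing of o=shadow as the virtual naming context with o=grid as the real one is an assumption drawn from the comments above.

```
# Assumption: a regular database serving "o=grid" is defined elsewhere
# in this file; the stanza below only adds the virtual view on top of it.
database          relay
suffix            "o=shadow"

# The rwm overlay rewrites DNs between the virtual and the real naming context.
overlay           rwm
# Arguments: <virtual naming context> <real naming context>
rwm-suffixmassage "o=shadow" "o=grid"
```

With a stanza of this kind, a search under o=shadow is rewritten and handed to the database serving o=grid, which is the back-relay code path that the core dumps in this report point to.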
Hi, have you had a chance to look at this problem since we attached the config file?
Thanks, Andrea

Hi Andrea, it looks like I have found something. Thanks to the hint about back-relay, and notably thanks to the core dumps above, I have found an upstream commit between versions 2.4.39 and 2.4.40 that introduces a function which is called in the core dumps (both of which show it called from back-relay). In some cases this function seems to call a pointer as a function, which results in SIGSEGV (note that it is not a null pointer, so it got mangled somehow). If I built packages without this function, would you be willing to test them for me? As this only affects searches, it should not be dangerous, but please be aware that it might break something. It would be best to use a testing environment of yours. Thank you.

Hi Matus, thanks a lot for your effort. Yes, if you provide us with the packages we will be really happy to try them in our testing instances.
Cheers, Andrea

Hi Matus, yes, we can test this in a testbed setup. It would take some time to determine the stability, as the crash usually takes some time (between a few hours and a few days) to manifest.
Thanks, Dennis

Hi Dennis, I'm quite surprised it takes so long to get your BDII segfaulting. I can easily make it coredump by running a search query, such as lcg-infosites (that's how I produced the core dump).
Cheers, Andreas

(In reply to Andreas Haupt from comment #13)
> Hi Dennis,
>
> I'm quite surprised it takes so long to get your BDII segfaulting. I can
> easily make it coredump by running a search query.

OK, I didn't try that. I have only witnessed 'spontaneous' crashes, but they are probably due to queries run from the outside.

(In reply to Matus Honek from comment #10)
Hi, I've got two questions/requests: The bug severely affects our software (see http://bugzilla.nordugrid.org/show_bug.cgi?id=3504), so we'd love to test the potential fix as well. Where can one find the test build?
And another question: is there a reason why it has not been reported upstream? Or has it? (I couldn't find anything resembling it in the OpenLDAP ITS.)
Cheers, Oxana

Hi Matus, do you have an update on this? We would like to understand whether it is possible to get an rpm with the fix you mentioned for testing. We are evaluating possible workarounds, but we are still stuck and cannot move our installations to openldap v2.4.40.
Thanks and cheers, Andrea

Hi all, we privately received from Matus a new build of openldap with a workaround/fix for this problem. We have installed it at CERN on a TopBDII and it looks fine. Dennis, Andreas, if you have time, could you also test the new rpms on your testing nodes? https://drive.google.com/file/d/0B0VkVqWTkgPjblczMkM4dWZPZkE/view?usp=sharing
Thanks and cheers, Andrea

Hi Andrea, I updated the packages on our test bdii node and also reverted to the original bdii configuration (with the "o=shadow" relay db enabled again). It looks really promising so far! I ran a couple of 'lcg-infosites' requests against the patched top-level bdii without any crashes. The same requests resulted in slapd segfaults with the broken version.
Cheers, Andreas

Hi Andreas, thanks a lot! I have also got confirmation from ARC that this build of openldap fixes the crash on ARC-CE installations. Matus, do you think this patch can be released, and how long will it take?
Thanks and cheers, Andrea

Hi Matus, we have been testing the new rpms quite intensively these days, and we can definitely say that our services are working fine with that build of openldap. Can we do something to speed up the integration and release of that version by Red Hat?
Thanks and cheers, Andrea

Hello, sorry to bother you again: is there any news regarding the integration and release of this change in openldap?
Related to this, I have also seen that a new version of openldap for RHEL 7 (2.4.40-8.el7) is now in CentOS 7 (we are starting to support this OS as well). I haven't tested it yet, so I don't know whether this problem also appears there, but this change may need to be applied to the openldap released in RHEL 7 too.
Thanks and cheers, Andrea

Hello Andrea, I am sorry for not answering sooner. The fix is proposed for rhel-6.8 and should be included with its release. Also, I should clone this bugzilla for rhel-7.
Regards, Matus

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0943.html

*** Bug 1318904 has been marked as a duplicate of this bug. ***