Bug 1466441

Summary: ns-slapd: segfault during server shutdown while there are cleanAllRuv tasks
Product: Red Hat Enterprise Linux 7 Reporter: Orion Poplawski <orion>
Component: 389-ds-base Assignee: mreynolds
Status: CLOSED ERRATA QA Contact: RHDS QE <ds-qe-bugs>
Severity: high Docs Contact: Marc Muehlfeld <mmuehlfe>
Priority: unspecified    
Version: 7.3 CC: bsmejkal, cobrown, dpal, glamb, gparente, hartsjc, jvilicic, lkrispen, mkosek, mreynolds, mrhodes, msauton, nkinder, orion, pasik, pkis, rharwood, rmeggins, spichugi, striker, tbordaz, tmihinto, vashirov
Target Milestone: rc Keywords: Reopened
Target Release: 7.7   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.9.1-7.el7 Doc Type: Bug Fix
Doc Text:
.Directory Server no longer crashes when shutting down the service while a `cleanAllRUV` task is running
Previously, stopping the Directory Server service while a `cleanAllRUV` task was running freed resources the task was using. As a consequence, the service terminated unexpectedly. With this update, Directory Server increments a reference counter that enables the task to complete before the service shutdown process proceeds. As a result, the server no longer crashes in the mentioned scenario.
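
The reference-counting idea in the Doc Text can be illustrated with a minimal C sketch, assuming a counter guarded by a mutex and a condition variable. All names here (`task_refcnt`, `task_refcnt_inc`, `task_refcnt_dec`, `task_refcnt_wait_idle`, `demo_shutdown_waits`) are hypothetical and are not taken from the 389-ds-base source; the actual fix may be structured differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical counter of running tasks; not the 389-ds-base data structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t idle;   /* signalled when active drops to zero */
    int active;            /* number of tasks still using shared resources */
} task_refcnt;

void task_refcnt_init(task_refcnt *rc)
{
    pthread_mutex_init(&rc->lock, NULL);
    pthread_cond_init(&rc->idle, NULL);
    rc->active = 0;
}

/* Called before a task starts using shared resources. */
void task_refcnt_inc(task_refcnt *rc)
{
    pthread_mutex_lock(&rc->lock);
    rc->active++;
    pthread_mutex_unlock(&rc->lock);
}

/* Called when a task no longer touches shared resources. */
void task_refcnt_dec(task_refcnt *rc)
{
    pthread_mutex_lock(&rc->lock);
    assert(rc->active > 0);
    if (--rc->active == 0)
        pthread_cond_broadcast(&rc->idle);
    pthread_mutex_unlock(&rc->lock);
}

/* Called by the shutdown path: block until every task has finished. */
void task_refcnt_wait_idle(task_refcnt *rc)
{
    pthread_mutex_lock(&rc->lock);
    while (rc->active > 0)
        pthread_cond_wait(&rc->idle, &rc->lock);
    pthread_mutex_unlock(&rc->lock);
}

typedef struct {
    task_refcnt *rc;
    int *resource;
} demo_task;

static void *demo_task_main(void *arg)
{
    demo_task *t = arg;
    *t->resource = 42;        /* the task uses the shared resource */
    task_refcnt_dec(t->rc);   /* then drops its reference */
    return NULL;
}

/* Starts one task, then "shuts down": waits for the task before freeing.
 * Returns the value the task wrote, or -1 if the thread could not start. */
int demo_shutdown_waits(void)
{
    task_refcnt rc;
    demo_task t;
    pthread_t tid;
    int seen;
    int *resource = malloc(sizeof(*resource));

    if (resource == NULL)
        return -1;
    *resource = 0;
    task_refcnt_init(&rc);
    t.rc = &rc;
    t.resource = resource;

    task_refcnt_inc(&rc);     /* take the reference before the task runs */
    if (pthread_create(&tid, NULL, demo_task_main, &t) != 0) {
        task_refcnt_dec(&rc);
        free(resource);
        return -1;
    }

    task_refcnt_wait_idle(&rc);   /* shutdown waits here */
    seen = *resource;
    pthread_join(tid, NULL);
    free(resource);               /* safe: no task still uses it */
    pthread_cond_destroy(&rc.idle);
    pthread_mutex_destroy(&rc.lock);
    return seen;
}
```

In this sketch the reference is taken by the creator before the thread starts, so the shutdown path cannot observe a zero count while a task is about to run. Build with `-pthread`.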
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-06 12:58:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Orion Poplawski 2017-06-29 15:32:43 UTC
Description of problem:

This morning ns-slapd crashed with:

Jun 29 08:58:28 europa ns-slapd: ns-slapd: ../../../include/k5-thread.h:384: k5_mutex_lock: Assertion `r == 0' failed.

Version-Release number of selected component (if applicable):
389-ds-base-1.3.5.10-21.el7_3.x86_64

How reproducible:
Just once so far

Comment 2 Orion Poplawski 2017-06-29 15:36:38 UTC
Also, I think /usr/lib/systemd/system/dirsrv@.service would benefit from the addition of:

Restart=on-failure

or similar.

Comment 3 wibrown@redhat.com 2017-06-30 03:03:59 UTC
Hi,

That crash is in MIT krb5, not Directory Server. I'm going to re-assign the problem to them.

Additionally, we have talked about restart on-failure, and we have chosen not to use it due to concerns about systemd and issues like this.

Thanks,

Comment 4 Orion Poplawski 2017-06-30 15:07:00 UTC
Happened again last evening:

Jun 29 17:26:25 europa ns-slapd: ns-slapd: ../../../include/k5-thread.h:384: k5_mutex_lock: Assertion `r == 0' failed.

Seems like restart on-failure would have at least got my ldap server back up and running instead of being down all evening.  I'll add it to /etc/sysconfig/dirsrv.systemd and see if it helps.

Recent changes:
Jun 29 04:21 kernel 3.10.0-514.26.1.el7
Jun 20 03:49:59 Updated: glibc.x86_64 2.17-157.el7_3.4

Comment 5 Orion Poplawski 2017-06-30 15:20:01 UTC
Looks like the trigger was the clock being ~281 seconds off.

Comment 6 Robbie Harwood 2017-06-30 15:41:05 UTC
k5_mutex_lock is a very thin wrapper around pthread_mutex_lock that calls assert() if the latter returns nonzero.  Please 1) update to rhel-7.4 and then 2) provide me a full backtrace (coredump is preferred, but I understand if you don't want to give me that).
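
For readers unfamiliar with the pattern, here is a minimal C sketch of such a wrapper. It is not the krb5 source: `checked_mutex_lock`, `checked_mutex_unlock`, and `relock_error_code` are hypothetical names used only for illustration. The helper at the end shows one well-defined way `pthread_mutex_lock` can return nonzero (relocking an error-checking mutex from the same thread); it is not a claim about what triggered the assertion in this bug.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the wrapper described above. */
void checked_mutex_lock(pthread_mutex_t *m)
{
    int r = pthread_mutex_lock(m);
    assert(r == 0);   /* aborts the process if locking fails */
}

void checked_mutex_unlock(pthread_mutex_t *m)
{
    int r = pthread_mutex_unlock(m);
    assert(r == 0);
}

/* Returns the raw error code from locking an error-checking mutex a
 * second time on the same thread, without going through the assert. */
int relock_error_code(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int r;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    checked_mutex_lock(&m);       /* first lock succeeds */
    r = pthread_mutex_lock(&m);   /* second lock fails with an error code */
    checked_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return r;
}
```

Had the second lock gone through `checked_mutex_lock`, the process would have aborted with an `Assertion ... failed` message of the same general shape as the one in the log above.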

Comment 11 Robbie Harwood 2017-09-22 14:46:51 UTC
*** Bug 1478619 has been marked as a duplicate of this bug. ***

Comment 44 bsmejkal 2019-06-13 15:11:32 UTC
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.6.3, pytest-4.6.3, py-1.8.0, pluggy-0.12.0 -- /opt/rh/rh-python36/root/usr/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.6.3', 'Platform': 'Linux-3.10.0-1053.el7.x86_64-x86_64-with-redhat-7.7-Maipo', 'Packages': {'pytest': '4.6.3', 'py': '1.8.0', 'pluggy': '0.12.0'}, 'Plugins': {'metadata': '1.8.0', 'html': '1.20.0'}}
389-ds-base: 1.3.9.1-9.el7
nss: 3.44.0-4.el7
nspr: 4.21.0-1.el7
openldap: 2.4.44-21.el7_6
cyrus-sasl: 2.1.26-23.el7
FIPS: disabled
rootdir: /mnt/tests/rhds/tests/upstream/ds/dirsrvtests, inifile: pytest.ini
plugins: metadata-1.8.0, html-1.20.0
collected 9 items / 8 deselected / 1 selected                                                                                                                                                                     

cleanallruv_test.py::test_clean_shutdown_crash PASSED                                                                                                                                                       [100%]

============================================================================== 1 passed in 137.35 seconds =========================================================================================================


Marking as VERIFIED, SanityOnly

Comment 48 errata-xmlrpc 2019-08-06 12:58:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2152