Hide Forgot
Description of problem: The problem occurs because of two threads trying to acquire locks in the opposite order Version-Release number of selected component (if applicable): Since 0.56.0-13 How reproducible: A high update rate with an high number of updates to trim in the retroCL Steps to Reproduce: 1. disable retroCL trim (maxage=-1) 2. fill a high number of dummy updates (to have lot of records in retroCL) 3. enable retroCL triming let's say with 1h interval and a maxage=5s to give time to apply load (step 4) and have a lot of entries to trim 4. apply dummy updates load Actual results: sometime it deadlocks Expected results: Should not deadlock Additional info:
This bug is a side effect 1435663. 1435663 acquires the slapi-nis lock from bepreop --> bepostop, so it is held during possible updates triggered by betxn plugins.
Slapi-nis deadlock can happen between two threads when sync-repl is enabled. It happens when a retroCL trimming thread is running at the same time that an update is registered in the retroCL. An example is Thread 19 (Thread 0x7ff6e88e7700 (LWP 27945)): #0 0x00007ff775e98a35 in pthread_cond_wait@@GLIBC_2.3.2 (..) at /lib64/libpthread.so.0 #1 0x00007ff7764ef483 in PR_EnterMonitor (..) at /lib64/libnspr4.so #2 0x00007ff76a346f10 in dblayer_lock_backend (..) at ldap/servers/slapd/back-ldbm/dblayer.c:3537 #3 0x00007ff76a34c046 in dblayer_txn_begin (..) at ldap/servers/slapd/back-ldbm/dblayer.c:3255 #4 0x00007ff76a36cc71 in ldbm_back_add (..) at ldap/servers/slapd/back-ldbm/ldbm_add.c:268 #5 0x00007ff7786e47c1 in op_shared_add (..) at ldap/servers/slapd/add.c:679 #6 0x00007ff7786e4d39 in add_internal_pb (..) at ldap/servers/slapd/add.c:407 #7 0x00007ff7786e5a71 in slapi_add_internal_pb (..) at ldap/servers/slapd/add.c:332 #8 0x00007ff7689e1b13 in write_replog_db (..) at ldap/servers/plugins/retrocl/retrocl_po.c:364 #9 0x00007ff7689e1b13 in retrocl_postob (..) at ldap/servers/plugins/retrocl/retrocl_po.c:682 #10 0x00007ff778741028 in plugin_call_func (..) at ldap/servers/slapd/plugin.c:2028 #11 0x00007ff7787412e3 in plugin_call_list (..) at ldap/servers/slapd/plugin.c:1972 #12 0x00007ff7787412e3 in plugin_call_plugins (..) at ldap/servers/slapd/plugin.c:442 #13 0x00007ff76a388ac5 in ldbm_back_modify (..) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:861 #14 0x00007ff77872d5a6 in op_shared_modify (..) at ldap/servers/slapd/modify.c:1021 #15 0x00007ff77872eacb in do_modify (..) at ldap/servers/slapd/modify.c:391 #16 0x0000561a8a0a2684 in connection_dispatch_operation (..) at ldap/servers/slapd/connection.c:625 #17 0x0000561a8a0a2684 in connection_threadmain (..) at ldap/servers/slapd/connection.c:1788 #18 0x00007ff7764f4bfb in _pt_root (..) at /lib64/libnspr4.so #19 0x00007ff775e94ea5 in start_thread (..) at /lib64/libpthread.so.0 #20 0x00007ff7755408cd in clone at /lib64/libc.so.6 Thread 4 (Thread 0x7ff778dbb700 (LWP 28477)): #0 0x00007ff775e9839e in pthread_rwlock_wrlock (..) at /lib64/libpthread.so.0 #1 0x00007ff7683a7639 in backend_be_pre_write_cb (..) at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007ff778741028 in plugin_call_func (..) at ldap/servers/slapd/plugin.c:2028 #3 0x00007ff7787412e3 in plugin_call_list (..) at ldap/servers/slapd/plugin.c:1972 #4 0x00007ff7787412e3 in plugin_call_plugins (..) at ldap/servers/slapd/plugin.c:442 #5 0x00007ff76a37bc28 in ldbm_back_delete (..) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:373 #6 0x00007ff7786f1feb in op_shared_delete (..) at ldap/servers/slapd/delete.c:324 #7 0x00007ff7786f21c8 in delete_internal_pb (..) at ldap/servers/slapd/delete.c:208 #8 0x00007ff7786f2443 in slapi_delete_internal_pb (..) at ldap/servers/slapd/delete.c:151 #9 0x00007ff7689e231e in delete_changerecord (..) at ldap/servers/plugins/retrocl/retrocl_trim.c:89 #10 0x00007ff7689e2605 in trim_changelog (..) at ldap/servers/plugins/retrocl/retrocl_trim.c:290 #11 0x00007ff7689e2605 in changelog_trim_thread_fn (..) at ldap/servers/plugins/retrocl/retrocl_trim.c:333 #12 0x00007ff7764f4bfb in _pt_root (..) at /lib64/libnspr4.so #13 0x00007ff775e94ea5 in start_thread (..) at /lib64/libpthread.so.0 #14 0x00007ff7755408cd in clone (..) at /lib64/libc.so.6 RetroCL being a BETXN_POSTOP, thread 19 holds the slapi-nis map rwlock (write) and then waits for retroCL backend lock. Thread 4 holds retroCL backend lock and then waits for slapi-nis map rwlock (write) The two threads are trying to acquire the locks in the opposite order
As a side effect of 1435663, this bug exists since 0.56.0-12
A valid workaround is to disable retroCL trimming dn: cn=Retro Changelog Plugin,cn=plugins,cn=config changetype: modify replace: nsslapd-changelogmaxage nsslapd-changelogmaxage: -1
Thank you taking your time and submitting this request for Red Hat Enterprise Linux 7. Unfortunately, this bug cannot be kept even as a stretch goal and was postponed to RHEL8.
patch being pushed upstream (0.56.4) -> POST
The patch is incomplete as like describe below => ASSIGNED Thread 18 (Thread 0x7f9378fff700 (LWP 774)): #0 0x00007f939141dd45 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0 #1 0x00007f939147cfcb in PR_EnterMonitor () at /lib64/libnspr4.so #2 0x00007f938cfa65fe in dblayer_txn_begin () at /usr/lib64/dirsrv/plugins/libback-ldbm.so #3 0x00007f938cfb7a03 in ldbm_back_add () at /usr/lib64/dirsrv/plugins/libback-ldbm.so #4 0x00007f9391709fb5 in op_shared_add () at /usr/lib64/dirsrv/libslapd.so.0 #5 0x00007f939170a42f in add_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0 #6 0x00007f939170b1c7 in slapi_add_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0 #7 0x00007f938ce1efc5 in retrocl_postob () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #8 0x00007f939176cb09 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0 #9 0x00007f939176cd4b in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0 #10 0x00007f938cfd43d0 in ldbm_back_modify () at /usr/lib64/dirsrv/plugins/libback-ldbm.so #11 0x00007f9391758ac7 in op_shared_modify () at /usr/lib64/dirsrv/libslapd.so.0 #12 0x00007f9391759b0f in do_modify () at /usr/lib64/dirsrv/libslapd.so.0 #13 0x0000558aeaf2d4ed in connection_threadmain () #14 0x00007f9391482b34 in _pt_root () at /lib64/libnspr4.so #15 0x00007f93914174e2 in start_thread () at /lib64/libpthread.so.0 #16 0x00007f9391291693 in clone () at /lib64/libc.so.6 Thread 43 (Thread 0x7f9380e4f700 (LWP 43762)): #0 0x00007f939141c1ec in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007f938cdf2bd1 in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007f938cde0337 in backend_write_cb.isra.0.part () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #3 0x00007f938cde041c in backend_betxn_pre_write_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #4 0x00007f939176cb09 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0 #5 0x00007f939176cd4b in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0 #6 0x00007f938cfc6686 in ldbm_back_delete () at /usr/lib64/dirsrv/plugins/libback-ldbm.so #7 0x00007f939171960c in op_shared_delete () at /usr/lib64/dirsrv/libslapd.so.0 #8 0x00007f939171981e in delete_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0 #9 0x00007f938ce1f83f in delete_changerecord () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #10 0x00007f938ce1fa05 in changelog_trim_thread_fn () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #11 0x00007f9391482b34 in _pt_root () at /lib64/libnspr4.so #12 0x00007f93914174e2 in start_thread () at /lib64/libpthread.so.0 #13 0x00007f9391291693 in clone () at /lib64/libc.so.6 RetroCL being a BETXN_POSTOP, thread 18 holds the slapi-nis map rwlock (write) and then waits for retroCL backend lock. Thread 43 holds retroCL backend lock and then waits for slapi-nis map rwlock (write). The two threads are trying to acquire the locks in the opposite order Thread 43 waits for slapi-nis map rwlock in BETXN callback (backend_betxn_pre_write_cb). The original deadlock occurred with the same pattern but with BE callback (backend_be_pre_write_cb) and the fix that applied to BE also applies to BETXN :(
Sanity check with the following 2 tests: version: slapi-nis-0.56.5-2.module+el8.3.0+6580+328a3362.x86_64 ipa-server-4.8.7-8.module+el8.3.0+7513+a375844a.x86_64 1. 2020-08-08T02:26:49 [ci-vm-10-0-106-77.ho] +-----------------------------------------------------------------------------------------+ 2020-08-08T02:26:49 [ci-vm-10-0-106-77.ho] Test:[/ipa-server/rhel80/ipa-nis-integration/root]: [ Pass(164/164): 100% ] 2020-08-08T02:26:49 [ci-vm-10-0-106-77.ho] +-----------------------------------------------------------------------------------------+ 2. 2020-08-08T02:09:26 [ci-vm-10-0-107-225.h] +-----------------------------------------------------------------------------------------+ 2020-08-08T02:09:26 [ci-vm-10-0-107-225.h] Test:[/ipa-server/rhel80/ds-migration-functional/root]: [ Pass(1/1): 100% ] 2020-08-08T02:09:26 [ci-vm-10-0-107-225.h] +-----------------------------------------------------------------------------------------+ 3. 2020-08-08T02:15:40+0000 [ci-vm-10-0-107-79.ho] +-----------------------------------------------------------------------------------------+ 2020-08-08T02:15:40+0000 [ci-vm-10-0-107-79.ho] Test:[/ipa-server/rhel80/ds-migration/root]: [ Pass(1/1): 100% ] 2020-08-08T02:15:40+0000 [ci-vm-10-0-107-79.ho] +-----------------------------------------------------------------------------------------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: idm:DL1 and idm:client security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4670