RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1751295 - When sync-repl is enabled, slapi-nis can deadlock during retrochanglog trimming
Summary: When sync-repl is enabled, slapi-nis can deadlock during retrochanglog trimming
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: slapi-nis
Version: ---
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: thierry bordaz
QA Contact: ipa-qe
URL:
Whiteboard:
Depends On:
Blocks: 1820124
TreeView+ depends on / blocked
 
Reported: 2019-09-11 16:15 UTC by thierry bordaz
Modified: 2023-12-15 16:48 UTC (History)
14 users (show)

Fixed In Version: slapi-nis-0.56.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1820124 (view as bug list)
Environment:
Last Closed: 2020-11-04 02:50:13 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FREEIPA-10647 0 None None None 2023-12-15 16:48:01 UTC

Description thierry bordaz 2019-09-11 16:15:46 UTC
Description of problem:
The problem occurs because of two threads trying to acquire locks in the opposite order

Version-Release number of selected component (if applicable):
Since 0.56.0-13 

How reproducible:
A high update rate with an high number of updates to trim in the retroCL

Steps to Reproduce:
1. disable retroCL trim (maxage=-1)
2. fill a high number of dummy updates (to have lot of records in retroCL)
3. enable retroCL triming let's say with 1h interval and a maxage=5s to give time to apply load (step 4) and have a lot of entries to trim
4. apply dummy updates load

Actual results:
sometime it deadlocks


Expected results:
Should not deadlock

Additional info:

Comment 2 thierry bordaz 2019-09-11 16:22:18 UTC
This bug is a side effect 1435663. 1435663 acquires the slapi-nis lock from bepreop --> bepostop, so it is held during possible updates triggered by betxn plugins.

Comment 3 thierry bordaz 2019-09-11 16:23:01 UTC
    Slapi-nis deadlock can happen between two threads when sync-repl is enabled.
    It happens when a retroCL trimming thread is running at the same time that
    an update is registered in the retroCL.
    An example is
Thread 19 (Thread 0x7ff6e88e7700 (LWP 27945)):
#0  0x00007ff775e98a35 in pthread_cond_wait@@GLIBC_2.3.2 (..) at /lib64/libpthread.so.0
#1  0x00007ff7764ef483 in PR_EnterMonitor (..) at /lib64/libnspr4.so
#2  0x00007ff76a346f10 in dblayer_lock_backend (..) at ldap/servers/slapd/back-ldbm/dblayer.c:3537
#3  0x00007ff76a34c046 in dblayer_txn_begin (..) at ldap/servers/slapd/back-ldbm/dblayer.c:3255
#4  0x00007ff76a36cc71 in ldbm_back_add (..) at ldap/servers/slapd/back-ldbm/ldbm_add.c:268
#5  0x00007ff7786e47c1 in op_shared_add (..) at ldap/servers/slapd/add.c:679
#6  0x00007ff7786e4d39 in add_internal_pb (..) at ldap/servers/slapd/add.c:407
#7  0x00007ff7786e5a71 in slapi_add_internal_pb (..) at ldap/servers/slapd/add.c:332
#8  0x00007ff7689e1b13 in write_replog_db (..) at ldap/servers/plugins/retrocl/retrocl_po.c:364
#9  0x00007ff7689e1b13 in retrocl_postob (..) at ldap/servers/plugins/retrocl/retrocl_po.c:682
#10 0x00007ff778741028 in plugin_call_func (..) at ldap/servers/slapd/plugin.c:2028
#11 0x00007ff7787412e3 in plugin_call_list (..) at ldap/servers/slapd/plugin.c:1972
#12 0x00007ff7787412e3 in plugin_call_plugins (..) at ldap/servers/slapd/plugin.c:442
#13 0x00007ff76a388ac5 in ldbm_back_modify (..) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:861
#14 0x00007ff77872d5a6 in op_shared_modify (..) at ldap/servers/slapd/modify.c:1021                                        
#15 0x00007ff77872eacb in do_modify (..) at ldap/servers/slapd/modify.c:391
#16 0x0000561a8a0a2684 in connection_dispatch_operation (..) at ldap/servers/slapd/connection.c:625
#17 0x0000561a8a0a2684 in connection_threadmain (..) at ldap/servers/slapd/connection.c:1788
#18 0x00007ff7764f4bfb in _pt_root (..) at /lib64/libnspr4.so
#19 0x00007ff775e94ea5 in start_thread (..) at /lib64/libpthread.so.0
#20 0x00007ff7755408cd in clone  at /lib64/libc.so.6

Thread 4 (Thread 0x7ff778dbb700 (LWP 28477)):
#0  0x00007ff775e9839e in pthread_rwlock_wrlock (..) at /lib64/libpthread.so.0
#1  0x00007ff7683a7639 in backend_be_pre_write_cb (..) at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007ff778741028 in plugin_call_func (..) at ldap/servers/slapd/plugin.c:2028
#3  0x00007ff7787412e3 in plugin_call_list (..) at ldap/servers/slapd/plugin.c:1972
#4  0x00007ff7787412e3 in plugin_call_plugins (..) at ldap/servers/slapd/plugin.c:442
#5  0x00007ff76a37bc28 in ldbm_back_delete (..) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:373
#6  0x00007ff7786f1feb in op_shared_delete (..) at ldap/servers/slapd/delete.c:324
#7  0x00007ff7786f21c8 in delete_internal_pb (..) at ldap/servers/slapd/delete.c:208
#8  0x00007ff7786f2443 in slapi_delete_internal_pb (..) at ldap/servers/slapd/delete.c:151
#9  0x00007ff7689e231e in delete_changerecord (..) at ldap/servers/plugins/retrocl/retrocl_trim.c:89
#10 0x00007ff7689e2605 in trim_changelog (..) at ldap/servers/plugins/retrocl/retrocl_trim.c:290
#11 0x00007ff7689e2605 in changelog_trim_thread_fn (..) at ldap/servers/plugins/retrocl/retrocl_trim.c:333
#12 0x00007ff7764f4bfb in _pt_root (..) at /lib64/libnspr4.so
#13 0x00007ff775e94ea5 in start_thread (..) at /lib64/libpthread.so.0
#14 0x00007ff7755408cd in clone (..) at /lib64/libc.so.6

    RetroCL being a BETXN_POSTOP, thread 19 holds the slapi-nis map rwlock (write) and then waits for retroCL backend lock.
    Thread 4 holds retroCL backend lock and then waits for slapi-nis map rwlock (write)
    The two threads are trying to acquire the locks in the opposite order

Comment 4 thierry bordaz 2019-09-11 16:24:12 UTC
As a side effect of 1435663, this bug exists since 0.56.0-12

Comment 5 thierry bordaz 2019-09-12 07:40:45 UTC
A valid workaround is to disable retroCL trimming

dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: -1

Comment 10 Florence Blanc-Renaud 2020-02-14 15:42:08 UTC
Thank you taking your time and submitting this request for Red Hat Enterprise Linux 7. Unfortunately, this bug cannot be kept even as a stretch goal and was postponed to RHEL8.

Comment 11 thierry bordaz 2020-02-21 16:39:43 UTC
patch being pushed upstream (0.56.4) -> POST

Comment 14 thierry bordaz 2020-03-13 10:45:00 UTC
The patch is incomplete as like describe below => ASSIGNED


Thread 18 (Thread 0x7f9378fff700 (LWP 774)):
#0  0x00007f939141dd45 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f939147cfcb in PR_EnterMonitor () at /lib64/libnspr4.so
#2  0x00007f938cfa65fe in dblayer_txn_begin () at /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f938cfb7a03 in ldbm_back_add () at /usr/lib64/dirsrv/plugins/libback-ldbm.so
#4  0x00007f9391709fb5 in op_shared_add () at /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f939170a42f in add_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f939170b1c7 in slapi_add_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f938ce1efc5 in retrocl_postob () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#8  0x00007f939176cb09 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f939176cd4b in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0
#10 0x00007f938cfd43d0 in ldbm_back_modify () at /usr/lib64/dirsrv/plugins/libback-ldbm.so
#11 0x00007f9391758ac7 in op_shared_modify () at /usr/lib64/dirsrv/libslapd.so.0
#12 0x00007f9391759b0f in do_modify () at /usr/lib64/dirsrv/libslapd.so.0
#13 0x0000558aeaf2d4ed in connection_threadmain ()
#14 0x00007f9391482b34 in _pt_root () at /lib64/libnspr4.so
#15 0x00007f93914174e2 in start_thread () at /lib64/libpthread.so.0
#16 0x00007f9391291693 in clone () at /lib64/libc.so.6 

Thread 43 (Thread 0x7f9380e4f700 (LWP 43762)):
#0  0x00007f939141c1ec in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007f938cdf2bd1 in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007f938cde0337 in backend_write_cb.isra.0.part () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007f938cde041c in backend_betxn_pre_write_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#4  0x00007f939176cb09 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f939176cd4b in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f938cfc6686 in ldbm_back_delete () at /usr/lib64/dirsrv/plugins/libback-ldbm.so
#7  0x00007f939171960c in op_shared_delete () at /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f939171981e in delete_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f938ce1f83f in delete_changerecord () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#10 0x00007f938ce1fa05 in changelog_trim_thread_fn () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#11 0x00007f9391482b34 in _pt_root () at /lib64/libnspr4.so
#12 0x00007f93914174e2 in start_thread () at /lib64/libpthread.so.0
#13 0x00007f9391291693 in clone () at /lib64/libc.so.6 

    RetroCL being a BETXN_POSTOP, thread 18 holds the slapi-nis map rwlock (write) and then waits for retroCL backend lock.
    Thread 43 holds retroCL backend lock and then waits for slapi-nis map rwlock (write).
    The two threads are trying to acquire the locks in the opposite order

   Thread 43 waits for slapi-nis map rwlock in BETXN callback (backend_betxn_pre_write_cb).
   The original deadlock occurred with the same pattern but with BE callback (backend_be_pre_write_cb) and the fix that applied to BE also applies to BETXN :(

Comment 24 Mohammad Rizwan 2020-08-10 07:39:23 UTC
Sanity check with the following 2 tests:

version:
slapi-nis-0.56.5-2.module+el8.3.0+6580+328a3362.x86_64
ipa-server-4.8.7-8.module+el8.3.0+7513+a375844a.x86_64

1.
2020-08-08T02:26:49 [ci-vm-10-0-106-77.ho]  +-----------------------------------------------------------------------------------------+
2020-08-08T02:26:49 [ci-vm-10-0-106-77.ho]      Test:[/ipa-server/rhel80/ipa-nis-integration/root]: [ Pass(164/164): 100% ] 
2020-08-08T02:26:49 [ci-vm-10-0-106-77.ho]  +-----------------------------------------------------------------------------------------+

2.
2020-08-08T02:09:26 [ci-vm-10-0-107-225.h]  +-----------------------------------------------------------------------------------------+
2020-08-08T02:09:26 [ci-vm-10-0-107-225.h]      Test:[/ipa-server/rhel80/ds-migration-functional/root]: [ Pass(1/1): 100% ] 
2020-08-08T02:09:26 [ci-vm-10-0-107-225.h]  +-----------------------------------------------------------------------------------------+

3.
2020-08-08T02:15:40+0000 [ci-vm-10-0-107-79.ho]  +-----------------------------------------------------------------------------------------+
2020-08-08T02:15:40+0000 [ci-vm-10-0-107-79.ho]      Test:[/ipa-server/rhel80/ds-migration/root]: [ Pass(1/1): 100% ] 
2020-08-08T02:15:40+0000 [ci-vm-10-0-107-79.ho]  +-----------------------------------------------------------------------------------------+

Comment 28 errata-xmlrpc 2020-11-04 02:50:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: idm:DL1 and idm:client security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4670


Note You need to log in before you can comment on or make changes to this bug.