This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47976

A deadlock occurs in a single thread processing a DEL operation.
The deadlock is dynamic and there is no clear testcase.

The deadlock occurs on a database lock: the DEL operation, running under a transaction, holds a db_lock in write and then issues an internal DEL without the parent txn. The internal DEL needs the same db_lock in read, but because it is not using the same transaction, the lock is not granted.

Thread 6 (Thread 0x7f52c1feb700 (LWP 17418)):
#0  0x00007f52f2c0c705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f52ed55d2f3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007f52ed55c640 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007f52ed606cea in __lock_get_internal () from /lib64/libdb-5.3.so
#4  0x00007f52ed6077d0 in __lock_get () from /lib64/libdb-5.3.so
#5  0x00007f52ed633112 in __db_lget () from /lib64/libdb-5.3.so
#6  0x00007f52ed57a5f5 in __bam_search () from /lib64/libdb-5.3.so
#7  0x00007f52ed565256 in __bamc_search () from /lib64/libdb-5.3.so
#8  0x00007f52ed566d0f in __bamc_get () from /lib64/libdb-5.3.so
#9  0x00007f52ed61fc56 in __dbc_iget () from /lib64/libdb-5.3.so
#10 0x00007f52ed62c843 in __db_get () from /lib64/libdb-5.3.so
#11 0x00007f52ed630123 in __db_get_pp () from /lib64/libdb-5.3.so
#12 0x00007f52e97535fb in id2entry (be=0x1cc48b0, id=id@entry=4, txn=txn@entry=0x0, err=err@entry=0x7f52c1fe6638) at ldap/servers/slapd/back-ldbm/id2entry.c:323
#13 0x00007f52e9777c20 in ldbm_back_delete (pb=0x7f5284061cf0) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:463
#14 0x00007f52f4c1d380 in op_shared_delete (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:364
#15 0x00007f52f4c1d532 in delete_internal_pb (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:242
#16 0x00007f52f4c1d7e3 in slapi_delete_internal_pb (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:185
#17 0x00007f52e907c3cd in mep_del_post_op (pb=<optimized out>) at ldap/servers/plugins/mep/mep.c:2556
#18 0x00007f52f4c638c5 in plugin_call_func (list=0x1bcae30, operation=operation@entry=563, pb=pb@entry=0x3184f90, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1453
#19 0x00007f52f4c63a78 in plugin_call_list (pb=0x3184f90, operation=563, list=<optimized out>) at ldap/servers/slapd/plugin.c:1415
#20 plugin_call_plugins (pb=pb@entry=0x3184f90, whichfunction=whichfunction@entry=563) at ldap/servers/slapd/plugin.c:398
#21 0x00007f52e9776f07 in ldbm_back_delete (pb=0x3184f90) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:1091
#22 0x00007f52f4c1d380 in op_shared_delete (pb=pb@entry=0x3184f90) at ldap/servers/slapd/delete.c:364
#23 0x00007f52f4c1d643 in do_delete (pb=0x3184f90) at ldap/servers/slapd/delete.c:128
#24 0x0000000000417688 in connection_dispatch_operation (pb=<optimized out>, op=0x2dbc8e0, conn=0x7f52e02b5190) at ldap/servers/slapd/connection.c:587
#25 connection_threadmain () at ldap/servers/slapd/connection.c:2372
#26 0x00007f52f32679eb in _pt_root () from /lib64/libnspr4.so
#27 0x00007f52f2c08df3 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f52f293601d in clone () from /lib64/libc.so.6

The delete is done under a backend transaction and holds a write lock on id2entry page 2:

800006b9 dd= 8 locks held 185 write locks 85 pid/thread 17379/139993418741504 flags 0 priority 100
800006b9 READ 30 HELD ipaca/ancestorid.db page 1
...
800006b9 WRITE 1 HELD userRoot/id2entry.db page 2
...

Then the internal delete hangs, waiting for a read lock on this same page:

2 dd=103 locks held 0 write locks 0 pid/thread 17379/139993418741504 flags 0 priority 100
2 READ 1 WAIT userRoot/id2entry.db page 2

The test case I used is the following, but I was only able to reproduce once:
1 - install ipa server + replica
2 - stop the replica (I reproduced without any replication session)
3 - run the script lot_mods.sh

note: I was testing on 1.3.1, where MOD/REPL kept the deleted values in the entry, so the entry kept growing. This explains why the test creates/deletes the test entry. The growth of the entry can help to reproduce. In 1.3.3 and later, MOD/REPL does not keep deleted values.

The deadlock occurred during the 'ipa user-del'.
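
The lock conflict described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the Berkeley DB or 389-ds code: the LockTable class, the internal_read helper and the demo function are hypothetical names invented for illustration, and the model returns "WAIT" instead of blocking. It assumes that an internal operation started with txn=0x0 (as in frame #12) gets a fresh locker id, which is what the "2 READ 1 WAIT" line in the lock dump suggests.

```python
from typing import Dict, List, Optional, Tuple

Page = Tuple[str, int]  # (database file, page number)


class LockTable:
    """Toy page-lock table (hypothetical, NOT the Berkeley DB implementation)."""

    def __init__(self) -> None:
        self.held: Dict[Page, List[Tuple[int, str]]] = {}
        self.next_locker = 2  # arbitrary starting id for non-transactional lockers

    def new_locker(self) -> int:
        locker = self.next_locker
        self.next_locker += 1
        return locker

    def request(self, locker: int, page: Page, mode: str) -> str:
        """Return "HELD" if the lock is granted, "WAIT" if it conflicts."""
        for owner, owner_mode in self.held.get(page, []):
            # A locker never conflicts with itself, and READ/READ is compatible.
            if owner != locker and "WRITE" in (mode, owner_mode):
                return "WAIT"
        self.held.setdefault(page, []).append((locker, mode))
        return "HELD"


def internal_read(table: LockTable, page: Page, txn: Optional[int]) -> str:
    # txn=None mimics the internal DEL calling id2entry with txn=0x0:
    # the read is issued under a new locker id instead of the parent txn.
    locker = txn if txn is not None else table.new_locker()
    return table.request(locker, page, "READ")


PARENT_TXN = 0x800006B9                 # locker id taken from the lock dump
PAGE = ("userRoot/id2entry.db", 2)      # page taken from the lock dump


def demo() -> Tuple[str, str, str]:
    table = LockTable()
    outer = table.request(PARENT_TXN, PAGE, "WRITE")       # outer DEL under txn
    without_txn = internal_read(table, PAGE, txn=None)      # internal DEL, no parent txn
    with_txn = internal_read(table, PAGE, txn=PARENT_TXN)   # internal DEL reusing the txn
    return outer, without_txn, with_txn


if __name__ == "__main__":
    print(demo())
```

In the model, the read issued without the parent txn has to wait on a lock that only the same thread can release, which matches the self-deadlock seen in the stack trace, while the read issued under the parent txn is granted.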
added "Internal Whiteboard" set to GSSApproved for rhel-7.2.z
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.
Build tested: 389-ds-base-1.3.5.10-5.el7.x86_64

CI test can be found at the upstream repository: dirsrvtests/tickets/ticket47976_test.py

========================== test session starts ==========================
platform linux2 -- Python 2.7.5, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /usr/bin/python
cachedir: tickets/.cache
DS build: 1.3.5.10 B2016.197.39
389-ds-base: 1.3.5.10-5.el7
nss: 3.21.0-17.el7
nspr: 4.11.0-1.el7_2
openldap: 2.4.40-12.el7
svrcore: 4.1.2-1.el7
rootdir: /export/tests/tickets, inifile:
plugins: beakerlib-0.5, html-1.9.0, cov-2.3.0
collected 4 items

tickets/ticket47976_test.py::test_ticket47976_init PASSED
tickets/ticket47976_test.py::test_ticket47976_1 PASSED
tickets/ticket47976_test.py::test_ticket47976_2 PASSED
tickets/ticket47976_test.py::test_ticket47976_3 PASSED

======================= 4 passed in 30.17 seconds =======================

Marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2594.html