Bug 1278585

Summary: deadlock in mep delete post op
Product: Red Hat Enterprise Linux 6 Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-baseAssignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: urgent Docs Contact: Petr Bokoc <pbokoc>
Priority: urgent    
Version: 6.0CC: amsharma, jgalipea, msauton, nhosoi, nkinder, pbokoc, rmeggins, spichugi, sramling
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.15-68.el6 Doc Type: Bug Fix
Doc Text:
Fixed a deadlock during backend deletion in Directory Server Previously, transaction information was not passed to one of the database helper functions during backend deletion. This could result in a deadlock if a plug-in attempted to access data in the area locked by the transaction. With this update, transaction info is passed to all necessary database helper functions, and a deadlock no longer occurs in the described situation.
Story Points: ---
Clone Of:
: 1279564 (view as bug list) Environment:
Last Closed: 2016-05-10 19:22:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1279564    

Description Noriko Hosoi 2015-11-05 21:17:25 UTC
A deadlock occurs in a single thread processing a DEL operation. The deadlock is dynamic and there is no clear testcase.

The deadlock occurs in the database lock while the DEL operation under transaction holds a db_lock in write, then issue an internal DEL without the parent txn. The internal DEL needs the db_lock in read, but as it is not using the same transaction the lock is not granted


{{{
Thread 6 (Thread 0x7f52c1feb700 (LWP 17418)):
#0  0x00007f52f2c0c705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f52ed55d2f3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007f52ed55c640 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007f52ed606cea in __lock_get_internal () from /lib64/libdb-5.3.so
#4  0x00007f52ed6077d0 in __lock_get () from /lib64/libdb-5.3.so
#5  0x00007f52ed633112 in __db_lget () from /lib64/libdb-5.3.so
#6  0x00007f52ed57a5f5 in __bam_search () from /lib64/libdb-5.3.so
#7  0x00007f52ed565256 in __bamc_search () from /lib64/libdb-5.3.so
#8  0x00007f52ed566d0f in __bamc_get () from /lib64/libdb-5.3.so
#9  0x00007f52ed61fc56 in __dbc_iget () from /lib64/libdb-5.3.so
#10 0x00007f52ed62c843 in __db_get () from /lib64/libdb-5.3.so
#11 0x00007f52ed630123 in __db_get_pp () from /lib64/libdb-5.3.so
#12 0x00007f52e97535fb in id2entry (be=0x1cc48b0, id=id@entry=4, txn=txn@entry=0x0, err=err@entry=0x7f52c1fe6638) at ldap/servers/slapd/back-ldbm/id2entry.c:323
#13 0x00007f52e9777c20 in ldbm_back_delete (pb=0x7f5284061cf0) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:463
#14 0x00007f52f4c1d380 in op_shared_delete (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:364
#15 0x00007f52f4c1d532 in delete_internal_pb (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:242
#16 0x00007f52f4c1d7e3 in slapi_delete_internal_pb (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:185
#17 0x00007f52e907c3cd in mep_del_post_op (pb=<optimized out>) at ldap/servers/plugins/mep/mep.c:2556
#18 0x00007f52f4c638c5 in plugin_call_func (list=0x1bcae30, operation=operation@entry=563, pb=pb@entry=0x3184f90, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1453
#19 0x00007f52f4c63a78 in plugin_call_list (pb=0x3184f90, operation=563, list=<optimized out>) at ldap/servers/slapd/plugin.c:1415
#20 plugin_call_plugins (pb=pb@entry=0x3184f90, whichfunction=whichfunction@entry=563) at ldap/servers/slapd/plugin.c:398
#21 0x00007f52e9776f07 in ldbm_back_delete (pb=0x3184f90) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:1091
#22 0x00007f52f4c1d380 in op_shared_delete (pb=pb@entry=0x3184f90) at ldap/servers/slapd/delete.c:364
#23 0x00007f52f4c1d643 in do_delete (pb=0x3184f90) at ldap/servers/slapd/delete.c:128
#24 0x0000000000417688 in connection_dispatch_operation (pb=<optimized out>, op=0x2dbc8e0, conn=0x7f52e02b5190) at ldap/servers/slapd/connection.c:587
#25 connection_threadmain () at ldap/servers/slapd/connection.c:2372
#26 0x00007f52f32679eb in _pt_root () from /lib64/libnspr4.so
#27 0x00007f52f2c08df3 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f52f293601d in clone () from /lib64/libc.so.6

}}}


The delete is done under backend-transcation and owns in write a id2entry page (2):

{{{

    800006b9 dd= 8 locks held 185  write locks 85   pid/thread 17379/139993418741504 flags 0    priority 100
    800006b9 READ         30 HELD    ipaca/ancestorid.db       page          1
    ...
    800006b9 WRITE         1 HELD    userRoot/id2entry.db      page          2
    ...

}}}


Then the internal delete is hanging waiting for this same page lock to do a read.


{{{
       2 dd=103 locks held 0    write locks 0    pid/thread 17379/139993418741504 flags 0    priority 100
       2 READ          1 WAIT    userRoot/id2entry.db      page          2

}}}


The test case I did is the following but I was only able to reproduce once:
 1 - install ipa server + replica
 2 - stop the replica (I reproduced without any replication session)
 3 - run the script lot_mods.sh
     note: I was testing on 1.3.1 where MOD/REPL kept the delete values in the entry. So the entry was growing. This explain why the test create/delete the test entry. The growth of the entry can help to reproduce. In 1.3.3 and after MOD/REPL do not keep deleted values. The deadlock occured during the 'ipa user-del'

Comment 1 Marc Sauton 2015-11-05 22:47:11 UTC
added "Internal Whiteboard" set to GSSApproved for rhel-6.8.0

Comment 4 Sankar Ramalingam 2015-11-24 11:45:26 UTC
Hi Noriko, where can I find lot_mods.sh file?

Comment 5 Amita Sharma 2015-11-24 11:52:30 UTC
(In reply to Sankar Ramalingam from comment #4)
> Hi Noriko, where can I find lot_mods.sh file?

Hi Sankar, you can use https://bugzilla.redhat.com/show_bug.cgi?id=1280210#c5 to verify bugzilla.

test case ticket47976_test.py is developed by Theirry for this fix.

Regards,
Ami

Comment 6 Simon Pichugin 2016-01-18 16:09:03 UTC
[root@qeos-22 ds]# rpm -qa | grep 389-ds-base
389-ds-base-libs-1.2.11.15-71.el6.x86_64
389-ds-base-1.2.11.15-71.el6.x86_64

We need to:
1. Download the patch from upstream ticket #47976. - https://fedorahosted.org/389/ticket/47976
2. Apply the patch
3. Run dirsrvtests/tickets/ticket47976_test.py

[root@qeos-22 ds]# py.test -v dirsrvtests/tickets/ticket47976_test.py
============================================================================================================= test session starts =============================================================================================================
platform linux2 -- Python 2.7.11, pytest-2.8.5, py-1.4.31, pluggy-0.3.1 -- /usr/local/miniconda/bin/python
cachedir: dirsrvtests/tickets/.cache
rootdir: /tmp/ds/dirsrvtests/tickets, inifile:
collected 4 items

dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_init PASSED
dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_1 PASSED
dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_2 PASSED
dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_3 PASSED

========================================================================================================= 4 passed in 118.91 seconds ==========================================================================================================

Comment 8 errata-xmlrpc 2016-05-10 19:22:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0737.html