Bug 1278585 - deadlock in mep delete post op
deadlock in mep delete post op
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: 389-ds-base (Show other bugs)
6.0
Unspecified Unspecified
urgent Severity urgent
: rc
: ---
Assigned To: Noriko Hosoi
Viktor Ashirov
Petr Bokoc
: ZStream
Depends On:
Blocks: 1279564
  Show dependency treegraph
 
Reported: 2015-11-05 16:17 EST by Noriko Hosoi
Modified: 2016-05-10 15:22 EDT (History)
9 users (show)

See Also:
Fixed In Version: 389-ds-base-1.2.11.15-68.el6
Doc Type: Bug Fix
Doc Text:
Fixed a deadlock during backend deletion in Directory Server Previously, transaction information was not passed to one of the database helper functions during backend deletion. This could result in a deadlock if a plug-in attempted to access data in the area locked by the transaction. With this update, transaction info is passed to all necessary database helper functions, and a deadlock no longer occurs in the described situation.
Story Points: ---
Clone Of:
: 1279564 (view as bug list)
Environment:
Last Closed: 2016-05-10 15:22:05 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Noriko Hosoi 2015-11-05 16:17:25 EST
A deadlock occurs in a single thread processing a DEL operation. The deadlock is dynamic and there is no clear testcase.

The deadlock occurs in the database lock while the DEL operation under transaction holds a db_lock in write, then issue an internal DEL without the parent txn. The internal DEL needs the db_lock in read, but as it is not using the same transaction the lock is not granted


{{{
Thread 6 (Thread 0x7f52c1feb700 (LWP 17418)):
#0  0x00007f52f2c0c705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f52ed55d2f3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007f52ed55c640 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007f52ed606cea in __lock_get_internal () from /lib64/libdb-5.3.so
#4  0x00007f52ed6077d0 in __lock_get () from /lib64/libdb-5.3.so
#5  0x00007f52ed633112 in __db_lget () from /lib64/libdb-5.3.so
#6  0x00007f52ed57a5f5 in __bam_search () from /lib64/libdb-5.3.so
#7  0x00007f52ed565256 in __bamc_search () from /lib64/libdb-5.3.so
#8  0x00007f52ed566d0f in __bamc_get () from /lib64/libdb-5.3.so
#9  0x00007f52ed61fc56 in __dbc_iget () from /lib64/libdb-5.3.so
#10 0x00007f52ed62c843 in __db_get () from /lib64/libdb-5.3.so
#11 0x00007f52ed630123 in __db_get_pp () from /lib64/libdb-5.3.so
#12 0x00007f52e97535fb in id2entry (be=0x1cc48b0, id=id@entry=4, txn=txn@entry=0x0, err=err@entry=0x7f52c1fe6638) at ldap/servers/slapd/back-ldbm/id2entry.c:323
#13 0x00007f52e9777c20 in ldbm_back_delete (pb=0x7f5284061cf0) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:463
#14 0x00007f52f4c1d380 in op_shared_delete (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:364
#15 0x00007f52f4c1d532 in delete_internal_pb (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:242
#16 0x00007f52f4c1d7e3 in slapi_delete_internal_pb (pb=pb@entry=0x7f5284061cf0) at ldap/servers/slapd/delete.c:185
#17 0x00007f52e907c3cd in mep_del_post_op (pb=<optimized out>) at ldap/servers/plugins/mep/mep.c:2556
#18 0x00007f52f4c638c5 in plugin_call_func (list=0x1bcae30, operation=operation@entry=563, pb=pb@entry=0x3184f90, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1453
#19 0x00007f52f4c63a78 in plugin_call_list (pb=0x3184f90, operation=563, list=<optimized out>) at ldap/servers/slapd/plugin.c:1415
#20 plugin_call_plugins (pb=pb@entry=0x3184f90, whichfunction=whichfunction@entry=563) at ldap/servers/slapd/plugin.c:398
#21 0x00007f52e9776f07 in ldbm_back_delete (pb=0x3184f90) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:1091
#22 0x00007f52f4c1d380 in op_shared_delete (pb=pb@entry=0x3184f90) at ldap/servers/slapd/delete.c:364
#23 0x00007f52f4c1d643 in do_delete (pb=0x3184f90) at ldap/servers/slapd/delete.c:128
#24 0x0000000000417688 in connection_dispatch_operation (pb=<optimized out>, op=0x2dbc8e0, conn=0x7f52e02b5190) at ldap/servers/slapd/connection.c:587
#25 connection_threadmain () at ldap/servers/slapd/connection.c:2372
#26 0x00007f52f32679eb in _pt_root () from /lib64/libnspr4.so
#27 0x00007f52f2c08df3 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f52f293601d in clone () from /lib64/libc.so.6

}}}


The delete is done under backend-transcation and owns in write a id2entry page (2):

{{{

    800006b9 dd= 8 locks held 185  write locks 85   pid/thread 17379/139993418741504 flags 0    priority 100
    800006b9 READ         30 HELD    ipaca/ancestorid.db       page          1
    ...
    800006b9 WRITE         1 HELD    userRoot/id2entry.db      page          2
    ...

}}}


Then the internal delete is hanging waiting for this same page lock to do a read.


{{{
       2 dd=103 locks held 0    write locks 0    pid/thread 17379/139993418741504 flags 0    priority 100
       2 READ          1 WAIT    userRoot/id2entry.db      page          2

}}}


The test case I did is the following but I was only able to reproduce once:
 1 - install ipa server + replica
 2 - stop the replica (I reproduced without any replication session)
 3 - run the script lot_mods.sh
     note: I was testing on 1.3.1 where MOD/REPL kept the delete values in the entry. So the entry was growing. This explain why the test create/delete the test entry. The growth of the entry can help to reproduce. In 1.3.3 and after MOD/REPL do not keep deleted values. The deadlock occured during the 'ipa user-del'
Comment 1 Marc Sauton 2015-11-05 17:47:11 EST
added "Internal Whiteboard" set to GSSApproved for rhel-6.8.0
Comment 4 Sankar Ramalingam 2015-11-24 06:45:26 EST
Hi Noriko, where can I find lot_mods.sh file?
Comment 5 Amita Sharma 2015-11-24 06:52:30 EST
(In reply to Sankar Ramalingam from comment #4)
> Hi Noriko, where can I find lot_mods.sh file?

Hi Sankar, you can use https://bugzilla.redhat.com/show_bug.cgi?id=1280210#c5 to verify bugzilla.

test case ticket47976_test.py is developed by Theirry for this fix.

Regards,
Ami
Comment 6 Simon Pichugin 2016-01-18 11:09:03 EST
[root@qeos-22 ds]# rpm -qa | grep 389-ds-base
389-ds-base-libs-1.2.11.15-71.el6.x86_64
389-ds-base-1.2.11.15-71.el6.x86_64

We need to:
1. Download the patch from upstream ticket #47976. - https://fedorahosted.org/389/ticket/47976
2. Apply the patch
3. Run dirsrvtests/tickets/ticket47976_test.py

[root@qeos-22 ds]# py.test -v dirsrvtests/tickets/ticket47976_test.py
============================================================================================================= test session starts =============================================================================================================
platform linux2 -- Python 2.7.11, pytest-2.8.5, py-1.4.31, pluggy-0.3.1 -- /usr/local/miniconda/bin/python
cachedir: dirsrvtests/tickets/.cache
rootdir: /tmp/ds/dirsrvtests/tickets, inifile:
collected 4 items

dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_init PASSED
dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_1 PASSED
dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_2 PASSED
dirsrvtests/tickets/ticket47976_test.py::test_ticket47976_3 PASSED

========================================================================================================= 4 passed in 118.91 seconds ==========================================================================================================
Comment 8 errata-xmlrpc 2016-05-10 15:22:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0737.html

Note You need to log in before you can comment on or make changes to this bug.