Bug 1273550

Summary: Deadlock between two MODs on the same entry between entry cache and backend lock
Product: Red Hat Enterprise Linux 7
Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-base
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA
QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0
CC: gparente, kbanerje, nkinder, rmeggins, tbordaz
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 389-ds-base-1.3.5.2-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 20:33:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Noriko Hosoi 2015-10-20 16:49:36 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47978

The deadlock occurs when two MODs target the same entry.
One MOD locks the entry in the entry cache (possibly in find_entry_internal) and then tries to acquire the backend lock (in dblayer_txn_begin).
The second MOD has acquired the backend lock (in dblayer_txn_begin) and hangs while trying to lock the entry in the entry cache (cache_lock_entry).
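This is a lock-ordering deadlock: each MOD holds one lock and waits for the other. A common general remedy for this kind of deadlock is to make every code path acquire the two locks in the same order. The bash sketch below only illustrates that idea with flock(1) from util-linux. The lock files and the simulated_mod and worker functions are hypothetical stand-ins (389-ds itself uses NSPR monitors, as the PR_EnterMonitor frames in the backtraces show), and this is not necessarily how the upstream fix for ticket 47978 was implemented.

```bash
#!/bin/bash
# Illustration only: two lock files stand in for the backend lock and the
# entry-cache lock. These names are hypothetical, not 389-ds internals.
set -euo pipefail

lockdir=$(mktemp -d)
backend_lock="$lockdir/backend.lock"
entry_lock="$lockdir/entry.lock"
counter="$lockdir/entry_version"
echo 0 > "$counter"

# One simulated MOD. Every caller takes the backend lock first and the entry
# lock second. If one path reversed this order, two concurrent callers could
# each hold one lock and block forever on the other, as in this bug.
simulated_mod() {
  (
    flock 8                       # 1st: backend lock (fd 8)
    (
      flock 9                     # 2nd: entry lock (fd 9)
      n=$(<"$counter")
      echo $((n + 1)) > "$counter"
    ) 9>"$entry_lock"             # entry lock released when fd 9 closes
  ) 8>"$backend_lock"             # backend lock released when fd 8 closes
}

# A stream of MODs against the same "entry".
worker() {
  local i
  for ((i = 0; i < $1; i++)); do
    simulated_mod
  done
}

# Two concurrent MOD streams, like the two threads in the backtraces.
worker 100 &
worker 100 &
wait

total=$(<"$counter")
rm -rf "$lockdir"
echo "completed MODs: $total"
```

Running it prints "completed MODs: 200". Both workers always finish, because no worker ever holds the entry lock while waiting for the backend lock.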

The deadlock occurred during performance measurements.
To run these measurements without hitting the I/O bottleneck, I tuned the following settings:
        - nsslapd-threadnumber: 100
        - nsslapd-db-transaction-batch-val: 100
        - nsslapd-backend-opt-level: 7

These settings helped reproduce the hang but are not its cause.

Deadlock:

MOD ("cn=mr000006001,o=People,o=test_bis_create")
Thread 89 (Thread 0x7f84a97fa700 (LWP 13200)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f84d1118943 in PR_EnterMonitor (mon=0x7f84480064b0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592
#2  0x00007f84c8a2795a in cache_lock_entry () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f84c8a6bde0 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#4  0x00007f84d2d1c8e1 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f84d2d1dc1f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f84d31fc3c1 in connection_threadmain ()
#7  0x00007f84d111de3b in _pt_root (arg=0x7f84d46f7b30) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#8  0x00007f84d0abdee5 in start_thread (arg=0x7f84a97fa700) at pthread_create.c:309
#9  0x00007f84d07ecb8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

MOD ("cn=mr000006001,o=People,o=test_bis_create")
Thread 87 (Thread 0x7f84a87f8700 (LWP 13202)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f84d1118943 in PR_EnterMonitor (mon=0x7f84d3feb9d0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592
#2  0x00007f84c8a2c3b7 in dblayer_lock_backend () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f84c8a311be in dblayer_txn_begin () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#4  0x00007f84c8a6ca77 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#5  0x00007f84d2d1c8e1 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f84d2d1dc1f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f84d31fc3c1 in connection_threadmain ()
#8  0x00007f84d111de3b in _pt_root (arg=0x7f84d47321b0) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#9  0x00007f84d0abdee5 in start_thread (arg=0x7f84a87f8700) at pthread_create.c:309
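With many worker threads, the useful question in a full stack dump is which thread is blocked in which of the two lock functions. The filter below is a sketch that pulls this out of gdb "thread apply all bt" output. It assumes the format shown above ("Thread N (...)" headers followed by "#k ... in function ..." frames); summarize_lock_waiters is a hypothetical helper, and it reports only where a thread waits, not which lock it already holds.

```bash
#!/bin/bash
# Read a gdb "thread apply all bt" dump on stdin and report, for each thread,
# the first (innermost) frame that is one of the two lock functions named in
# this bug. Threads blocked anywhere else are not reported.
summarize_lock_waiters() {
  awk '
    /^Thread [0-9]+ / { tid = $2; reported = 0; next }
    reported          { next }
    /cache_lock_entry/ {
      print "Thread " tid ": waiting for the entry cache lock"; reported = 1
    }
    /dblayer_lock_backend/ {
      print "Thread " tid ": waiting for the backend lock"; reported = 1
    }
  '
}
```

For example, as root and assuming a single ns-slapd instance: gdb -batch -p "$(pidof ns-slapd)" -ex 'thread apply all bt' 2>/dev/null | summarize_lock_waiters. On the dump above it would report thread 89 waiting for the entry cache lock and thread 87 waiting for the backend lock.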


To reproduce:

On Fedora 20 with 389-DS 1.3.4:
32-core machine: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
32 GB of memory

Install 389-DS 1.3.4 (master branch, commit c3389a46c584fa39b2278a295f8b2b6dad726d31).
Create a suffix and use ldclt to send MOD updates to a small number of entries, so that several MODs apply to the same entry.

Comment 1 Noriko Hosoi 2015-10-27 16:08:04 UTC
*** Bug 1214459 has been marked as a duplicate of this bug. ***

Comment 2 Mike McCune 2016-03-28 23:12:48 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 4 Kamlesh 2016-08-31 12:24:27 UTC
Bug Verified.
[root@test ~]# rpm -qa | grep 389-ds*
389-ds-base-1.3.5.10-9.el7.x86_64
389-ds-base-libs-1.3.5.10-9.el7.x86_64

Steps performed:

Step 1)
Add 1000 users in the 

Step 2)
Set the following values:
nsslapd-threadnumber: 100
nsslapd-db-transaction-batch-val: 100
nsslapd-backend-opt-level: 7
nsslapd-cachesize: 10
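The step 2 settings can also be applied over LDAP. The sketch below only prints the LDIF, so it can be reviewed before it is piped to ldapmodify. The target entries are assumptions based on the usual 389-ds layout: nsslapd-threadnumber in cn=config, the two database settings in cn=config,cn=ldbm database,cn=plugins,cn=config, and nsslapd-cachesize on a backend assumed to be named userRoot. tuning_ldif is a hypothetical helper; check the DNs against your instance, and note that some of these settings may only take effect after a restart.

```bash
#!/bin/bash
# Print an LDIF change set for the step 2 tuning.
# Assumed locations (verify on your server):
#   nsslapd-threadnumber                -> cn=config
#   db transaction batch / opt level    -> ldbm database config entry
#   nsslapd-cachesize (entries)         -> backend instance, default "userRoot"
tuning_ldif() {
  local backend=${1:-userRoot}
  cat <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-threadnumber
nsslapd-threadnumber: 100

dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-transaction-batch-val
nsslapd-db-transaction-batch-val: 100
-
replace: nsslapd-backend-opt-level
nsslapd-backend-opt-level: 7

dn: cn=${backend},cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-cachesize
nsslapd-cachesize: 10
EOF
}
```

Usage, prompting for the Directory Manager password: tuning_ldif | ldapmodify -x -h localhost -p 389 -D "cn=Directory Manager" -W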

Step 3)
Run ldapsearch on the entry in a loop (until interrupted):
 while true; do ldapsearch -LLL -x -h localhost -p 389 -D "cn=directory manager" -w test1234 -b "ou=People,dc=example,dc=com" '(uid=33999)' dn sn mail givenName; done

 
Step 4)
Run ldapmodify on the same entry with the following two scripts:

#! /bin/bash
for ((  i=0 ; i<50000 ; i++ ))
do
ldapmodify -x -D "cn=Directory Manager" -p 389 -h localhost -w test1234 << EOF
dn: uid=33999,ou=People,dc=example,dc=com
changetype: modify
replace: sn
sn: change$i
-
replace: givenName
givenName: tll$i
EOF
done
----------

#! /bin/bash
for ((  i=0 ; i<50000 ; i++ ))
do
ldapmodify -x -D "cn=Directory Manager" -p 389 -h localhost -w test1234 << EOF
dn: uid=33999,ou=People,dc=example,dc=com
changetype: modify
replace: mail
mail: change$i
EOF
done
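The deadlock needs two MODs on the same entry to overlap, so the two scripts above have to run at the same time, and a regression shows up as a hang rather than as an error. The sketch below runs several commands in parallel under coreutils timeout, which exits with status 124 when the limit is reached, so a hang is reported instead of blocking forever. run_with_hang_check and run_in_parallel are hypothetical helpers, mod_sn.sh and mod_mail.sh are assumed names for the two scripts above, and the time limit is an arbitrary choice that should exceed the normal run time of 50000 iterations.

```bash
#!/bin/bash
# Run one command with a time limit. Exit status 124 from timeout means the
# command was still running when the limit expired, i.e. a possible hang.
run_with_hang_check() {
  local limit=$1; shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "HANG? '$*' still running after ${limit}s" >&2
  fi
  return "$status"
}

# Start each executable (given without arguments) in the background under
# the same limit, wait for all of them, and fail if any failed or timed out.
run_in_parallel() {
  local limit=$1; shift
  local pids=() cmd pid rc=0
  for cmd in "$@"; do
    run_with_hang_check "$limit" "$cmd" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

Usage, with a one-hour limit: run_in_parallel 3600 ./mod_sn.sh ./mod_mail.sh && echo "both MOD loops finished"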

------

Result:

Modifications successful.
Nothing failed.

Comment 6 errata-xmlrpc 2016-11-03 20:33:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2594.html