Bug 1273550 - Deadlock between two MODs on the same entry between entry cache and backend lock
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Noriko Hosoi
QA Contact: Viktor Ashirov
Duplicates: 1214459
Reported: 2015-10-20 12:49 EDT by Noriko Hosoi
Modified: 2016-11-03 16:33 EDT
Fixed In Version: 389-ds-base-1.3.5.2-1.el7
Doc Type: Bug Fix
Last Closed: 2016-11-03 16:33:29 EDT


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2016:2594
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: 389-ds-base security, bug fix, and enhancement update
Last Updated: 2016-11-03 08:11:08 EDT

Description Noriko Hosoi 2015-10-20 12:49:36 EDT
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47978

The deadlock occurs when two MOD operations target the same entry.
The first MOD locks the entry in the entry cache (find_entry_internal ?), then tries to acquire the backend lock (in txn_begin).
The second MOD has already acquired the backend lock (txn_begin) and hangs while trying to lock the entry in the entry cache (cache_lock_entry).
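
The lock-order inversion described above can be illustrated outside the server. The following is a minimal sketch, assuming util-linux flock(1) is available. The lock files "entry.lock" and "backend.lock" and the worker function are hypothetical stand-ins for the entry cache monitor and the backend lock, not 389-ds code. It uses non-blocking attempts so that it reports the conflict instead of hanging.

```shell
#!/bin/bash
# Illustration only: two processes take two locks in opposite order.
# worker DIR NAME FIRST SECOND PEER  (all names are hypothetical)
worker() {
    local dir=$1 name=$2 first=$3 second=$4 peer=$5
    (
        flock 8                                  # take the first lock and keep it
        touch "$dir/$name.holding"
        # Wait until the peer holds its own first lock too.
        until [ -e "$dir/$peer.holding" ]; do sleep 0.05; done
        # Non-blocking attempt on the second lock.
        if flock -n 9; then
            echo "$name: acquired $second"
        else
            echo "$name: would block on $second while holding $first"
        fi
        touch "$dir/$name.tried"
        # Hold the first lock until the peer has made its attempt as well.
        until [ -e "$dir/$peer.tried" ]; do sleep 0.05; done
    ) 8>"$dir/$first.lock" 9>"$dir/$second.lock"
}

demo_lock_inversion() {
    local dir
    dir=$(mktemp -d) || return 1
    worker "$dir" mod1 entry backend mod2 &      # entry first, then backend
    worker "$dir" mod2 backend entry mod1 &      # backend first, then entry
    wait
    rm -rf "$dir"
}

demo_lock_inversion | sort
```

Each worker should report that it would block on the lock the other one holds. With blocking acquisition, as in the server, neither would make progress. If both workers took the locks in the same order, the conflict would not arise.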

The deadlock occurred during performance measurements.
To run them without hitting the I/O bottleneck, I tuned the following settings:
        - nsslapd-threadnumber: 100
        - nsslapd-db-transaction-batch-val: 100
        - nsslapd-backend-opt-level: 7

This tuning helped reproduce the hang but is not its cause.
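
For reference, the tuning above could be expressed as LDIF along the following lines. This is a sketch under assumptions: the report does not say which configuration entries were modified, so the DNs below are the entries where these attributes are usually found in 389-ds and should be verified against your instance. The function name tuning_ldif is hypothetical.

```shell
#!/bin/bash
# Sketch only: print LDIF for the tuning listed in this report.
# ASSUMPTION: the DNs are not given in the report; verify them before use.
tuning_ldif() {
    cat <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-threadnumber
nsslapd-threadnumber: 100

dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-transaction-batch-val
nsslapd-db-transaction-batch-val: 100
-
replace: nsslapd-backend-opt-level
nsslapd-backend-opt-level: 7
EOF
}

tuning_ldif
```

The output can be piped into ldapmodify with the bind options for your instance. Some of these settings may only take effect after a server restart.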




Deadlock:

MOD ("cn=mr000006001,o=People,o=test_bis_create")
Thread 89 (Thread 0x7f84a97fa700 (LWP 13200)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f84d1118943 in PR_EnterMonitor (mon=0x7f84480064b0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592
#2  0x00007f84c8a2795a in cache_lock_entry () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f84c8a6bde0 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#4  0x00007f84d2d1c8e1 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f84d2d1dc1f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f84d31fc3c1 in connection_threadmain ()
#7  0x00007f84d111de3b in _pt_root (arg=0x7f84d46f7b30) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#8  0x00007f84d0abdee5 in start_thread (arg=0x7f84a97fa700) at pthread_create.c:309
#9  0x00007f84d07ecb8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

MOD ("cn=mr000006001,o=People,o=test_bis_create")
Thread 87 (Thread 0x7f84a87f8700 (LWP 13202)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f84d1118943 in PR_EnterMonitor (mon=0x7f84d3feb9d0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592
#2  0x00007f84c8a2c3b7 in dblayer_lock_backend () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f84c8a311be in dblayer_txn_begin () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#4  0x00007f84c8a6ca77 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#5  0x00007f84d2d1c8e1 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f84d2d1dc1f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f84d31fc3c1 in connection_threadmain ()
#8  0x00007f84d111de3b in _pt_root (arg=0x7f84d47321b0) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#9  0x00007f84d0abdee5 in start_thread (arg=0x7f84a87f8700) at pthread_create.c:309
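
When a dump contains many threads, it helps to list only those parked in a monitor and the function that requested it. The following is a small sketch, assuming the trace was saved as text in the gdb format shown above. The function name blocked_in_monitor is hypothetical.

```shell
#!/bin/bash
# blocked_in_monitor [FILE...]
# For each thread whose frame #1 is PR_EnterMonitor, print the thread number
# and the function in frame #2 (the caller that is waiting for the monitor).
blocked_in_monitor() {
    awk '
        /^Thread [0-9]+ /            { thread = $2; waiting = 0 }
        /^#1 .* in PR_EnterMonitor / { waiting = 1; next }
        /^#2 / && waiting            { print "Thread " thread " waits in " $4; waiting = 0 }
    ' "$@"
}
```

Applied to the two traces above, this should list thread 89 waiting in cache_lock_entry and thread 87 waiting in dblayer_lock_backend, which is the crossed wait described in this report. How the traces were captured is not stated here; one common way is `gdb -p <pid> -batch -ex 'thread apply all bt'`.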


To reproduce:

On F20 - 389-DS 1.3.4
32-core machine: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
32 GB memory

Install 389-DS 1.3.4 (master branch c3389a46c584fa39b2278a295f8b2b6dad726d31).
Create a suffix and trigger MOD updates on a small number of entries (using ldclt) so that several MODs apply to the same entry.
Comment 1 Noriko Hosoi 2015-10-27 12:08:04 EDT
*** Bug 1214459 has been marked as a duplicate of this bug. ***
Comment 2 Mike McCune 2016-03-28 19:12:48 EDT
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune@redhat.com with any questions.
Comment 4 Kamlesh 2016-08-31 08:24:27 EDT
Bug Verified.
[root@test ~]# rpm -qa | grep 389-ds*
389-ds-base-1.3.5.10-9.el7.x86_64
389-ds-base-libs-1.3.5.10-9.el7.x86_64

Steps performed:

Step 1)
Add 1000 users in the

Step 2)
Set the following values:
nsslapd-threadnumber: 100
nsslapd-db-transaction-batch-val: 100
nsslapd-backend-opt-level: 7
nsslapd-cachesize: 10

Step 3)
Run ldapsearch in a loop:
 # Runs until interrupted.
 while true; do ldapsearch -LLL -x -h localhost -p 389 -D "cn=directory manager" -w test1234 -b "ou=People,dc=example,dc=com" '(uid=33999)' dn sn mail givenName; done

Step 4)
Run ldapmodify on the entry, using the two scripts below:
#!/bin/bash
# Script 1: replace sn and givenName on the test entry 50000 times.
for (( i=0; i<50000; i++ ))
do
ldapmodify -D "cn=Directory Manager" -p 389 -h localhost -w test1234 << EOF
dn: uid=33999,ou=People,dc=example,dc=com
changetype: modify
replace: sn
sn: change$i
-
replace: givenName
givenName: tll$i
EOF
done
----------

#!/bin/bash
# Script 2: replace mail on the same entry 50000 times.
for (( i=0; i<50000; i++ ))
do
ldapmodify -D "cn=Directory Manager" -p 389 -h localhost -w test1234 << EOF
dn: uid=33999,ou=People,dc=example,dc=com
changetype: modify
replace: mail
mail: change$i@gmail.com
EOF
done

------

Result:

Modifications successful.
Nothing fails.
Comment 6 errata-xmlrpc 2016-11-03 16:33:29 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2594.html
