Bug 1501058 - [memberOf Plugin] bulk deleting users causes deadlock when there are multiple backends
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: mreynolds
QA Contact: Viktor Ashirov
Docs Contact: Marc Muehlfeld
URL:
Whiteboard:
Depends On:
Blocks: 1504536
Reported: 2017-10-12 06:13 UTC by Hiroko Miura
Modified: 2021-03-11 15:58 UTC

Fixed In Version: 389-ds-base-1.3.7.5-5.el7
Doc Type: Bug Fix
Doc Text:
An unnecessary global lock has been removed from Directory Server
Previously, when the memberOf plug-in was enabled and users and groups were stored in separate back ends, a deadlock could occur. An unnecessary global lock has been removed and, as a result, the deadlock no longer occurs in the mentioned scenario.
Clone Of:
Clones: 1504536
Environment:
Last Closed: 2018-04-10 14:21:13 UTC
Target Upstream Version:


Attachments
reproducer including sample LDIF and tet scripts (36.76 KB, application/zip)
2017-10-12 06:13 UTC, Hiroko Miura


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | 389ds 389-ds-base issues 1566 | 0 | None | None | None | 2020-09-13 21:29:22 UTC
Red Hat Product Errata | RHBA-2018:0811 | 0 | None | None | None | 2018-04-10 14:22:03 UTC

Description Hiroko Miura 2017-10-12 06:13:28 UTC
Created attachment 1337536
reproducer including sample LDIF and tet scripts

Description of problem:

In a replication environment with the following characteristics:

   - users and groups are stored in different backends
   - the memberOf plugin is enabled with memberofallbackends=on
   - MMR with fractional replication that excludes the memberOf attribute (memberofattr)

When many users who each belong to multiple groups are deleted in quick succession, a deadlock occurs between the deletion of a user and the removal of that user from group members (which is triggered by the memberOf plugin).

Here are the stacks of the threads involved in the deadlock.

Thread 18 (Thread 0x7f96e1136700 (LWP 11750)):
#0  0x00007f97547086d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f9754d5f463 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f974af80116 in dblayer_txn_begin () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f974afbb4e8 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#4  0x00007f97569d69cb in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f97569d7544 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f974a68b525 in memberof_del_dn_type_callback () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#7  0x00007f97569febad in send_ldap_search_entry_ext () from /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f97569ff3ac in send_ldap_search_entry () from /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f97569dc091 in iterate.isra.0.constprop.3 () from /usr/lib64/dirsrv/libslapd.so.0
#10 0x00007f97569dc1da in send_results_ext.constprop.2 () from /usr/lib64/dirsrv/libslapd.so.0
#11 0x00007f97569ddc11 in op_shared_search () from /usr/lib64/dirsrv/libslapd.so.0
#12 0x00007f97569edc2e in search_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
#13 0x00007f974a68a5fb in memberof_call_foreach_dn.isra.9 () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#14 0x00007f974a68b1b2 in memberof_del_dn_from_groups.isra.11 () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#15 0x00007f974a68e68d in memberof_postop_del () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#16 0x00007f97569e8c7b in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#17 0x00007f97569e8f13 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#18 0x00007f974afaceab in ldbm_back_delete () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#19 0x00007f975699bff0 in op_shared_delete () from /usr/lib64/dirsrv/libslapd.so.0
#20 0x00007f975699c372 in do_delete () from /usr/lib64/dirsrv/libslapd.so.0
#21 0x00007f97572d2972 in connection_threadmain ()
#22 0x00007f9754d6496b in _pt_root () from /lib64/libnspr4.so
#23 0x00007f9754704dc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f9753fe773d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f96da929700 (LWP 11763)):
#0  0x00007f97547086d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f9754d5f463 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f974a68d16c in memberof_lock () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#3  0x00007f974a68da49 in memberof_postop_modify () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#4  0x00007f97569e8c7b in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f97569e8f13 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f974afbb36c in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#7  0x00007f97569d69cb in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f97569d7dfb in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f97572d2955 in connection_threadmain ()
#10 0x00007f9754d6496b in _pt_root () from /lib64/libnspr4.so
#11 0x00007f9754704dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f9753fe773d in clone () from /lib64/libc.so.6
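
Read together, the two stacks above look like a lock-order inversion. The short Python sketch below models that reading as a wait-for graph and checks it for a cycle. What each thread waits for is taken from its top frames (dblayer_txn_begin and memberof_lock); what each thread already holds is an assumption inferred from the lower frames, not something the traces show directly. The label backend_txn_monitor and the helper find_deadlock are made up for this illustration.

```python
# Hypothetical model of the two stacks above. "waits" comes from the top
# frames; "holds" is an inference and is not visible in the traces.
threads = {
    "Thread 18 (delete)": {"holds": "memberof_lock", "waits": "backend_txn_monitor"},
    "Thread 5 (modify)": {"holds": "backend_txn_monitor", "waits": "memberof_lock"},
}


def find_deadlock(threads):
    """Return the threads forming a wait-for cycle, or [] if there is none."""
    owner = {info["holds"]: name for name, info in threads.items()}
    for start in threads:
        path = [start]
        current = start
        while True:
            blocker = owner.get(threads[current]["waits"])
            if blocker is None:
                break  # the wanted lock is free: no cycle through this thread
            if blocker == start:
                return path  # walked back to the starting thread: cycle found
            if blocker in path:
                break  # a cycle exists, but it does not include the start
            path.append(blocker)
            current = blocker
    return []


cycle = find_deadlock(threads)
print(" -> ".join(cycle + cycle[:1]) if cycle else "no deadlock")
```

Under these assumptions the cycle only exists because both locks are involved; if either thread did not need the plugin-level lock, the graph would have no cycle.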


Version-Release number of selected component (if applicable):
RHDS-10
389-ds-base-1.3.6.1-19
RHEL-7.4


How reproducible:
delete many users by ldapmodify 

Steps to Reproduce:
attached test data and scripts(deadlock-repeoducer.zip)

1. prepare 2 DS instances (say M1 and M2) with the following suffixes
=====================================================================

  dc=example,dc=com (parent suffix)
      +-- ou=People,dc=example,dc=com (Sub Suffix)
      +-- ou=Groups,dc=example,dc=com (Sub Suffix)


2. enable memberOf Plugin with memberofallbackends=on
=====================================================
e.g.
dn: cn=MemberOf Plugin,cn=plugins,cn=config
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject
cn: MemberOf Plugin
nsslapd-pluginPath: libmemberof-plugin
nsslapd-pluginInitfunc: memberof_postop_init
nsslapd-pluginType: betxnpostoperation
nsslapd-pluginEnabled: on                           <<
nsslapd-plugin-depends-on-type: database
memberofgroupattr: member
memberofattr: memberOf
nsslapd-pluginId: memberof
nsslapd-pluginVersion: 1.3.6.1
nsslapd-pluginVendor: 389 Project
nsslapd-pluginDescription: memberof plugin
memberofallbackends: on                             <<


3. increase DB locks (x10)
==========================
e.g.
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
  ...
nsslapd-db-locks: 100000
  ...

4. configure 2-way MMR with fractional replication excluding memberOf, per suffix
=================================================================================
  create fractional replication agreements per suffix on both masters (6 in total)

e.g.
---
dn: cn=p_to_m2,cn=replica,cn=ou\3DPeople\2Cdc\3Dexample\2Cdc\3Dcom,cn=mapping 
 tree,cn=config
objectClass: top
objectClass: nsDS5ReplicationAgreement
description: p_to_m2
cn: p_to_m2
nsDS5ReplicaRoot: ou=People,dc=example,dc=com
nsDS5ReplicaHost: rhel71ds.example.com
nsDS5ReplicaPort: 2222
nsDS5ReplicaBindDN: cn=Replication Manager,cn=replication,cn=config
nsDS5ReplicaTransportInfo: LDAP
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberOf   <<
 ...
---
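
The "6 in total" above presumably comes from 3 suffixes times 2 masters. The Python sketch below enumerates such agreements and builds the escaped DN for each. Only the name p_to_m2 and its attributes appear in this report; the other agreement names, the short tags "e" and "g", and the helper names are hypothetical, and the stubs omit host, port and bind settings.

```python
# Short tag -> replicated suffix. Only the "p" (People) tag is taken from the
# report; "e" and "g" are hypothetical names chosen to match that pattern.
SUFFIXES = {
    "e": "dc=example,dc=com",
    "p": "ou=People,dc=example,dc=com",
    "g": "ou=Groups,dc=example,dc=com",
}


def escape_suffix(suffix):
    """Escape '=' and ',' so the whole suffix can serve as one RDN value."""
    return suffix.replace("=", "\\3D").replace(",", "\\2C")


def agreement_stub(tag, suffix, consumer):
    """Build a partial agreement entry (connection settings left out)."""
    name = f"{tag}_to_{consumer}"
    return "\n".join([
        f"dn: cn={name},cn=replica,cn={escape_suffix(suffix)},cn=mapping tree,cn=config",
        f"cn: {name}",
        f"nsDS5ReplicaRoot: {suffix}",
        "nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberOf",
    ])


# Each master holds one agreement per suffix, pointing at the other master.
stubs = {
    supplier: [agreement_stub(tag, sfx, consumer) for tag, sfx in SUFFIXES.items()]
    for supplier, consumer in (("m1", "m2"), ("m2", "m1"))
}

print(sum(len(entries) for entries in stubs.values()), "agreements")
print(stubs["m1"][1])
```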

5. initialize DB and replication
================================
import example.ldif, people.ldif and groups.ldif into M1 and then initialize M2


6. run test scripts against M1
==============================
"deadlockTest.sh" is the main test script, which calls the other scripts.
Before running it, please adjust the following parameters at the beginning of the script
to match your M1 instance.

line
   1 LOOP=$1
   2 HOST="localhost"                <<<   
   3 PORT=1111                       <<<
   4 ROOTDN="cn=Directory Manager"   <<<
   5 PASSWORD="dirmanager"           <<<

You can run the test by specifying the number of test users, for example:

$ ./deadlockTest.sh 50

    Here is the test scenario
    -------------------------
    1. add 50 users
    2. add each of these 50 users to 10 groups
    3. delete the 50 users

   => The script may get stuck during user deletion (the deadlock occurred on M1).
      If the script completes without problems,
      please check whether all user deletions were replicated to M2
      (if not, the deadlock occurred on M2).

Since this is a timing issue, you may need to run the test scripts several times to reproduce it.
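
For readers without the attachment, the three phases of the scenario can be approximated with generated LDIF. This Python sketch is not the attached deadlockTest.sh. The entry names (testuserN, testgroupN), the object classes, and the choice of one modify record per group and user pair are all assumptions, and the groups are assumed to exist already.

```python
PEOPLE = "ou=People,dc=example,dc=com"
GROUPS = "ou=Groups,dc=example,dc=com"
N_GROUPS = 10  # from the scenario: each user joins 10 groups


def user_dn(i):
    return f"uid=testuser{i},{PEOPLE}"  # hypothetical naming scheme


def group_dn(g):
    return f"cn=testgroup{g},{GROUPS}"  # hypothetical; groups assumed to exist


def add_users(n):
    """Phase 1: one add record per user."""
    return [
        f"dn: {user_dn(i)}\nchangetype: add\n"
        "objectClass: top\nobjectClass: person\n"
        "objectClass: organizationalPerson\nobjectClass: inetOrgPerson\n"
        f"uid: testuser{i}\ncn: testuser{i}\nsn: testuser{i}\n"
        for i in range(1, n + 1)
    ]


def join_groups(n):
    """Phase 2: one modify record per (group, user) pair."""
    return [
        f"dn: {group_dn(g)}\nchangetype: modify\nadd: member\nmember: {user_dn(i)}\n"
        for g in range(1, N_GROUPS + 1)
        for i in range(1, n + 1)
    ]


def delete_users(n):
    """Phase 3: one delete record per user; the phase in which the hang was seen."""
    return [f"dn: {user_dn(i)}\nchangetype: delete\n" for i in range(1, n + 1)]


n = 50
for phase in (add_users, join_groups, delete_users):
    print(f"{phase.__name__}: {len(phase(n))} records")
print(delete_users(n)[0])
```

Joining the records of a phase with newlines yields LDIF with a blank line between records, which could then be fed to ldapmodify against M1.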

Actual results:
A deadlock occurs on either M1 or M2.


Expected results:
No deadlock occurs.


Additional info:

Comment 2 mreynolds 2017-10-12 13:28:13 UTC
If we implement this ticket below, it should resolve "this" deadlock.

https://pagure.io/389-ds-base/issue/48235


However, using cross-accessed backends like this is more likely to cause these kinds of deadlocks with plugins.  I'm just worried about other plugins like Referential Integrity.

Does the customer have a testing environment to try a potential hotfix?

Comment 3 Hiroko Miura 2017-10-12 13:37:37 UTC
I think the customer can test the test patch in their environment, but let me confirm.
I also have a reproduction environment with 389-ds-base-1.3.6.1-19 and can test it as well.

Comment 19 Viktor Ashirov 2018-02-20 13:55:30 UTC
Build tested:
389-ds-base-1.3.7.5-18.el7.x86_64

Using reproducer from the description, I can no longer reproduce the problem.

Marking as VERIFIED.

Comment 22 errata-xmlrpc 2018-04-10 14:21:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0811

