Red Hat Bugzilla – Bug 1117021
Server deadlock if online import started while server is under load
Last modified: 2015-03-05 04:35:40 EST
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47781

If a server in an MMR environment is under load (doing adds and deletes) and you try to initialize the database (ldif2db.pl), you can deadlock the server:

{{{
#0  0x000000378e40e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000378e4093be in _L_lock_995 () from /lib64/libpthread.so.0
#2  0x000000378e409326 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000003a8d023fe9 in PR_Lock () from /lib64/libnspr4.so
#4  0x00007f0d153113a8 in replica_get_generation (r=0x12c8790) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:957
#5  0x00007f0d1530c84d in copy_operation_parameters (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/plugins/replication/repl5_plugins.c:923
#6  0x00007f0d1530bab9 in multimaster_preop_delete (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/plugins/replication/repl5_plugins.c:391
#7  0x00007f0d18abddb9 in plugin_call_func (list=0xf66860, operation=423, pb=0x7f0cfc0192c0, call_one=0) at ../ds/ldap/servers/slapd/plugin.c:1453
#8  0x00007f0d18abdc6c in plugin_call_list (list=0xf57ac0, operation=423, pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/plugin.c:1415
#9  0x00007f0d18abc200 in plugin_call_plugins (pb=0x7f0cfc0192c0, whichfunction=423) at ../ds/ldap/servers/slapd/plugin.c:398
#10 0x00007f0d18a67584 in op_shared_delete (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/delete.c:355
#11 0x00007f0d18a670e6 in delete_internal_pb (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/delete.c:242
#12 0x00007f0d18a66f2d in slapi_delete_internal_pb (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/delete.c:185
#13 0x00007f0d15314b8e in _delete_tombstone (tombstone_dn=0x12acb60 "dc=example,dc=com", uniqueid=0x7f0d15355b10 "ffffffff-ffffffff-ffffffff-ffffffff", ext_op_flags=131072) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:2723
#14 0x00007f0d15313d65 in _replica_configure_ruv (r=0x12c8790, isLocked=1) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:2225

    replica_reload_ruv() (frame #15) takes the repl lock, as does frame #3

#15 0x00007f0d15311efe in replica_reload_ruv (r=0x12c8790) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:1318
#16 0x00007f0d153169ce in replica_enable_replication (r=0x12c8790) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:3612
#17 0x00007f0d1530d87e in multimaster_be_state_change (handle=0x7f0d1530d7cf, be_name=0x7f0cfc00b3b0 "userRoot", old_be_state=2, new_be_state=1) at ../ds/ldap/servers/plugins/replication/repl5_plugins.c:1487
#18 0x00007f0d18a9ffd8 in mtn_be_state_change (be_name=0x7f0cfc00b3b0 "userRoot", old_state=2, new_state=1) at ../ds/ldap/servers/slapd/mapping_tree.c:237
#19 0x00007f0d18aa65a7 in mtn_internal_be_set_state (be=0xfa2310, state=1) at ../ds/ldap/servers/slapd/mapping_tree.c:3584
#20 0x00007f0d18aa6628 in slapi_mtn_be_enable (be=0xfa2310) at ../ds/ldap/servers/slapd/mapping_tree.c:3634
#21 0x00007f0d155b4132 in import_all_done (job=0x7f0c9802a790, ret=0) at ../ds/ldap/servers/slapd/back-ldbm/import.c:1118
#22 0x00007f0d155b4ec4 in import_main_offline (arg=0x7f0c9802a790) at ../ds/ldap/servers/slapd/back-ldbm/import.c:1510
#23 0x00007f0d155b4f19 in import_main (arg=0x7f0c9802a790) at ../ds/ldap/servers/slapd/back-ldbm/import.c:1530
#24 0x0000003a8d029a73 in ?? () from /lib64/libnspr4.so
#25 0x000000378e407851 in start_thread () from /lib64/libpthread.so.0
#26 0x000000378e0e890d in clone () from /lib64/libc.so.6
}}}
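The annotation in the trace describes a self-deadlock: the import thread already holds the replica lock (taken in replica_reload_ruv(), hence isLocked=1 in frame #14) when the internal tombstone delete reaches replica_get_generation(), which asks for the same lock again in PR_Lock(). NSPR's PRLock is not re-entrant, so the thread waits on itself. The minimal C sketch below reproduces only that lock pattern with POSIX threads. The function names are hypothetical stand-ins, not 389 DS code, and an error-checking mutex is used so the relock returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-replica lock. The server uses an NSPR
 * PRLock (frame #3), which is not re-entrant. With PTHREAD_MUTEX_NORMAL a
 * relock by the owning thread blocks forever; PTHREAD_MUTEX_ERRORCHECK makes
 * the same relock fail with EDEADLK, so the bug is observable without a hang. */
static pthread_mutex_t repl_lock;

int init_repl_lock(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    rc = pthread_mutex_init(&repl_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical stand-in for replica_get_generation() (frame #4):
 * briefly takes the replica lock to read the generation. */
int get_generation(void)
{
    int rc = pthread_mutex_lock(&repl_lock);
    if (rc != 0)
        return rc;              /* EDEADLK: this thread already owns the lock */
    /* ... read the replica generation here ... */
    pthread_mutex_unlock(&repl_lock);
    return 0;
}

/* Hypothetical stand-in for replica_reload_ruv() (frame #15): takes the lock,
 * then (via the tombstone delete and the replication pre-op plugin, frames
 * #14 down to #5) ends up in get_generation() on the same thread. */
int reload_ruv(void)
{
    pthread_mutex_lock(&repl_lock);
    int rc = get_generation();  /* same thread, same non-recursive lock */
    pthread_mutex_unlock(&repl_lock);
    return rc;
}
```

After init_repl_lock(), calling reload_ruv() returns EDEADLK, while a standalone get_generation() call succeeds, which mirrors why the hang only shows up on the import/re-enable path.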
Verification Steps:

[1] Install a single instance of 389 using "dc=example,dc=com"

[2] Enable the changelog

Example:
ldapmodify -D "cn=directory manager" ...
dn: cn=changelog5,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-localhost/changelogdb

[3] Enable replication

Example:
ldapmodify -D "cn=directory manager" ...
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: add
objectClass: nsDS5Replica
objectClass: top
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 1
nsds5ReplicaPurgeDelay: 604800
cn: replica

[4] Create a replication agreement that points to a non-existent server on the same machine:

Example:
ldapmodify -D "cn=directory manager" ...
dn: cn=fake agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: add
objectClass: top
objectClass: nsDS5ReplicationAgreement
description: fake agreement
cn: fake agreement
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaHost: localhost.localdomain
nsDS5ReplicaPort: 5555
nsDS5ReplicaBindDN: uid=doesn'tmatter
nsDS5ReplicaTransportInfo: LDAP
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaCredentials: nothing

[5] Make some updates to the database

[6] Export the database (retaining the replication state information)

Example:
ldapmodify -D "cn=directory manager" ...
dn: cn=export1404764503038,cn=export,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: export1404764503038
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot
nsuseonefile: TRUE
nsexportreplica: TRUE

[7] Restart the server

[8] Import the LDIF file (/tmp/deadlock.ldif)

ldapmodify -D "cn=directory manager" ...
dn: cn=import1404764623289,cn=import,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: import1404764623289
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot

[9] Search for tombstone entries:

ldapsearch -D "cn=directory manager" -w Secret123 -b "dc=example,dc=com" -xLLL objectclass=nstombstone

This search should NOT hang, and it should return at least one entry (cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config).
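Steps [3] and [4] address the replica entry through an escaped copy of the suffix DN: inside the mapping tree RDN, '=' is written as \3D and ',' as \2C (RFC 4514 hex-pair escaping). The helper below is a hypothetical sketch of that transformation for a simple suffix like the one used here; it is not part of 389 DS or any LDAP library, and it deliberately handles only the two characters that occur in this DN.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a suffix DN such as "dc=example,dc=com" into the
 * escaped RDN value used for its mapping tree entry, "dc\3Dexample\2Cdc\3Dcom".
 * Only '=' and ',' are escaped (as upper-case hex pairs); a full RFC 4514
 * escaper would also handle '+', '"', '\\', '<', '>', ';', a leading '#' or
 * space, and a trailing space.
 * Returns the escaped length, or -1 if `out` is too small. */
int escape_suffix_rdn(const char *dn, char *out, size_t out_len)
{
    size_t n = 0;

    if (out_len == 0)
        return -1;
    for (const char *p = dn; *p != '\0'; p++) {
        if (*p == '=' || *p == ',') {
            if (n + 3 >= out_len)       /* need 3 chars plus the NUL */
                return -1;
            /* '=' (0x3D) -> "\3D", ',' (0x2C) -> "\2C" */
            snprintf(out + n, out_len - n, "\\%02X", (unsigned char)*p);
            n += 3;
        } else {
            if (n + 1 >= out_len)       /* need 1 char plus the NUL */
                return -1;
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Escaping "dc=example,dc=com" this way produces exactly the cn=dc\3Dexample\2Cdc\3Dcom value used in the steps. The server also accepts the older quoted form (cn="dc=example,dc=com"), as the verification transcript shows.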
Fixed upstream.
$ rpm -qa | grep 389
389-ds-base-libs-1.3.3.1-12.el7.x86_64
389-ds-base-debuginfo-1.3.3.1-12.el7.x86_64
389-ds-base-1.3.3.1-12.el7.x86_64

[1] Install a single instance of 389 using "dc=example,dc=com"

[2] Enable the changelog

$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=changelog5,cn=config
objectclass: top
objectclass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-rhel7/changelogdb
EOF
adding new entry "cn=changelog5,cn=config"

[3] Enable replication

$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: add
objectclass: top
objectclass: nsds5replica
objectclass: extensibleObject
cn: replica
nsds5replicaroot: dc=example,dc=com
nsds5replicaid: 7
nsds5replicatype: 3
nsds5flags: 1
nsds5ReplicaPurgeDelay: 604800
nsds5ReplicaBindDN: cn=SyncManager,cn=config
EOF
adding new entry "cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config"

[4] Create a replication agreement that points to a non-existent server on the same machine:

$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=fake agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: add
objectClass: top
objectClass: nsDS5ReplicationAgreement
description: fake agreement
cn: fake agreement
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaHost: localhost.localdomain
nsDS5ReplicaPort: 5555
nsDS5ReplicaBindDN: uid=doesn'tmatter
nsDS5ReplicaTransportInfo: LDAP
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaCredentials: nothing
EOF
adding new entry "cn=fake agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"

[5] Make some updates to the database

ldapmodify -D "cn=Directory Manager" -w Secret123 -a -f 10users.ldif
adding new entry "cn=user1,ou=People,dc=example,dc=com"
adding new entry "cn=user2,ou=People,dc=example,dc=com"
adding new entry "cn=user3,ou=People,dc=example,dc=com"
adding new entry "cn=user4,ou=People,dc=example,dc=com"
adding new entry "cn=user5,ou=People,dc=example,dc=com"
adding new entry "cn=user6,ou=People,dc=example,dc=com"
adding new entry "cn=user7,ou=People,dc=example,dc=com"
adding new entry "cn=user8,ou=People,dc=example,dc=com"
adding new entry "cn=user9,ou=People,dc=example,dc=com"
adding new entry "cn=user10,ou=People,dc=example,dc=com"

[6] Export the database (retaining the replication state information)

$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=export1404764503038,cn=export,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: export1404764503038
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot
nsuseonefile: TRUE
nsexportreplica: TRUE
EOF
adding new entry "cn=export1404764503038,cn=export,cn=tasks,cn=config"

[7] Restart the server

sudo systemctl restart dirsrv.target

[8] Import the LDIF file (/tmp/deadlock.ldif)

$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=import1404764623289,cn=import,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: import1404764623289
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot
EOF
adding new entry "cn=import1404764623289,cn=import,cn=tasks,cn=config"

[9] Search for tombstone entries:

$ ldapsearch -D "cn=Directory Manager" -w Secret123 -b "dc=example,dc=com" -LLL objectclass=nstombstone
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
objectClass: top
objectClass: nsds5replica
objectClass: extensibleObject
cn: replica
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaId: 7
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsds5ReplicaPurgeDelay: 604800
nsDS5ReplicaBindDN: cn=SyncManager,cn=config
nsState:: BwAAAAAAAAAOzsZUAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAA==
nsDS5ReplicaName: 49922186-a5b311e4-8db6c639-384f4808
nsds50ruv: {replicageneration} 54c6ce0b000000070000
nsds50ruv: {replica 7 ldap://rhel7.brq.redhat.com:389}
nsruvReplicaLastModified: {replica 7 ldap://rhel7.brq.redhat.com:389} 00000000
nsds5ReplicaChangeCount: 0
nsds5replicareapactive: 0

The search did not hang and returned the cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config entry. Marking as VERIFIED.
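The RUV in the returned entry can be cross-checked against the replica configuration. Assuming the usual 389 DS CSN string layout (20 hex digits: 8 for a timestamp in seconds, then 4 each for sequence number, replica ID and sub-sequence number), the hypothetical parser below splits the {replicageneration} value. It is an illustrative sketch, not a 389 DS API.

```c
#include <string.h>

/* Assumed CSN string layout: TTTTTTTT SSSS RRRR UUUU (all hex). */
typedef struct {
    unsigned long timestamp;   /* seconds since the Unix epoch */
    unsigned int seqnum;
    unsigned int rid;          /* replica ID */
    unsigned int subseqnum;
} csn_fields;

/* Parse exactly `len` hex digits starting at s; rejects any non-hex char. */
static int parse_hex(const char *s, size_t len, unsigned long *out)
{
    unsigned long v = 0;
    for (size_t i = 0; i < len; i++) {
        int c = (unsigned char)s[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        v = (v << 4) | (unsigned long)d;
    }
    *out = v;
    return 0;
}

/* Hypothetical helper: split a 20-character CSN string into its fields.
 * Returns 0 on success, -1 on a malformed string. */
int parse_csn(const char *csn, csn_fields *f)
{
    unsigned long t, seq, rid, sub;

    if (strlen(csn) != 20)
        return -1;
    if (parse_hex(csn, 8, &t) || parse_hex(csn + 8, 4, &seq) ||
        parse_hex(csn + 12, 4, &rid) || parse_hex(csn + 16, 4, &sub))
        return -1;
    f->timestamp = t;
    f->seqnum = (unsigned int)seq;
    f->rid = (unsigned int)rid;
    f->subseqnum = (unsigned int)sub;
    return 0;
}
```

Under that assumed layout, 54c6ce0b000000070000 decodes to replica ID 7, matching nsDS5ReplicaId: 7 in the same entry, with a timestamp of 0x54c6ce0b (1422315019, late January 2015).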
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0416.html