Bug 1117021 - Server deadlock if online import started while server is under load
Summary: Server deadlock if online import started while server is under load
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-07 20:05 UTC by Noriko Hosoi
Modified: 2015-03-05 09:35 UTC (History)

Fixed In Version: 389-ds-base-1.3.3.1-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 09:35:40 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0416 normal SHIPPED_LIVE Important: 389-ds-base security, bug fix, and enhancement update 2015-03-05 14:26:33 UTC

Description Noriko Hosoi 2014-07-07 20:05:31 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47781

If a server in an MMR (multi-master replication) environment is under load (doing adds and deletes) and you try to initialize the database (ldif2db.pl), the server can deadlock:


{{{
#0  0x000000378e40e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000378e4093be in _L_lock_995 () from /lib64/libpthread.so.0
#2  0x000000378e409326 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000003a8d023fe9 in PR_Lock () from /lib64/libnspr4.so
#4  0x00007f0d153113a8 in replica_get_generation (r=0x12c8790) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:957
#5  0x00007f0d1530c84d in copy_operation_parameters (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/plugins/replication/repl5_plugins.c:923
#6  0x00007f0d1530bab9 in multimaster_preop_delete (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/plugins/replication/repl5_plugins.c:391
#7  0x00007f0d18abddb9 in plugin_call_func (list=0xf66860, operation=423, pb=0x7f0cfc0192c0, call_one=0) at ../ds/ldap/servers/slapd/plugin.c:1453
#8  0x00007f0d18abdc6c in plugin_call_list (list=0xf57ac0, operation=423, pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/plugin.c:1415
#9  0x00007f0d18abc200 in plugin_call_plugins (pb=0x7f0cfc0192c0, whichfunction=423) at ../ds/ldap/servers/slapd/plugin.c:398
#10 0x00007f0d18a67584 in op_shared_delete (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/delete.c:355
#11 0x00007f0d18a670e6 in delete_internal_pb (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/delete.c:242
#12 0x00007f0d18a66f2d in slapi_delete_internal_pb (pb=0x7f0cfc0192c0) at ../ds/ldap/servers/slapd/delete.c:185
#13 0x00007f0d15314b8e in _delete_tombstone (tombstone_dn=0x12acb60 "dc=example,dc=com", uniqueid=0x7f0d15355b10 "ffffffff-ffffffff-ffffffff-ffffffff", ext_op_flags=131072)
    at ../ds/ldap/servers/plugins/replication/repl5_replica.c:2723
#14 0x00007f0d15313d65 in _replica_configure_ruv (r=0x12c8790, isLocked=1) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:2225

replica_reload_ruv() (frame #15) takes the repl lock --> replica_get_generation() (frame #4) tries to take it again via PR_Lock() (frame #3)

#15 0x00007f0d15311efe in replica_reload_ruv (r=0x12c8790) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:1318
#16 0x00007f0d153169ce in replica_enable_replication (r=0x12c8790) at ../ds/ldap/servers/plugins/replication/repl5_replica.c:3612
#17 0x00007f0d1530d87e in multimaster_be_state_change (handle=0x7f0d1530d7cf, be_name=0x7f0cfc00b3b0 "userRoot", old_be_state=2, new_be_state=1) at ../ds/ldap/servers/plugins/replication/repl5_plugins.c:1487
#18 0x00007f0d18a9ffd8 in mtn_be_state_change (be_name=0x7f0cfc00b3b0 "userRoot", old_state=2, new_state=1) at ../ds/ldap/servers/slapd/mapping_tree.c:237
#19 0x00007f0d18aa65a7 in mtn_internal_be_set_state (be=0xfa2310, state=1) at ../ds/ldap/servers/slapd/mapping_tree.c:3584
#20 0x00007f0d18aa6628 in slapi_mtn_be_enable (be=0xfa2310) at ../ds/ldap/servers/slapd/mapping_tree.c:3634
#21 0x00007f0d155b4132 in import_all_done (job=0x7f0c9802a790, ret=0) at ../ds/ldap/servers/slapd/back-ldbm/import.c:1118
#22 0x00007f0d155b4ec4 in import_main_offline (arg=0x7f0c9802a790) at ../ds/ldap/servers/slapd/back-ldbm/import.c:1510
#23 0x00007f0d155b4f19 in import_main (arg=0x7f0c9802a790) at ../ds/ldap/servers/slapd/back-ldbm/import.c:1530
#24 0x0000003a8d029a73 in ?? () from /lib64/libnspr4.so
#25 0x000000378e407851 in start_thread () from /lib64/libpthread.so.0
#26 0x000000378e0e890d in clone () from /lib64/libc.so.6
}}}
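
The trace above is long, and only a few frames matter for the lock problem. A minimal sketch of a filter that pulls those frames out of a saved backtrace follows. The function name lock_path_frames is hypothetical, and the list of function names is taken from this particular trace, so it would need adjusting for other hangs. The gdb invocation in the comment is one common way to capture such a trace and assumes the server process is named ns-slapd.

```shell
#!/bin/sh
# Hypothetical helper: read a gdb backtrace on stdin and print only the
# frames that take, or try to take, the replica lock in this report.
#
# A backtrace can usually be captured from a hung server with something like:
#   gdb -p "$(pidof ns-slapd)" -batch -ex 'thread apply all bt' > stack.txt
lock_path_frames() {
    # Match lines such as "#15 0x... in replica_reload_ruv (r=...)".
    grep -E '^#[0-9]+ .* in (PR_Lock|replica_get_generation|replica_reload_ruv) \('
}

# Example: lock_path_frames < stack.txt
```

If the same thread shows both replica_reload_ruv and a later PR_Lock on the same replica pointer, as it does here, the thread is most likely waiting on a lock it already holds.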

Comment 1 mreynolds 2014-07-07 20:29:45 UTC
Verification Steps:

[1]  Install a single instance of 389 using "dc=example,dc=com"
[2]  Enable the changelog

Example:

ldapmodify  -D "cn=directory manager" ...
dn: cn=changelog5,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-localhost/changelogdb

[3]  Enable replication

Example:

ldapmodify  -D "cn=directory manager" ...
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: add
objectClass: nsDS5Replica
objectClass: top
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 1
nsds5ReplicaPurgeDelay: 604800
cn: replica

[4]  Create a replication agreement that points to a non-existent server on the same machine:

Example:

ldapmodify  -D "cn=directory manager" ...
dn: cn=fake agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: add
objectClass: top
objectClass: nsDS5ReplicationAgreement
description: fake agreement
cn: fake agreement
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaHost: localhost.localdomain
nsDS5ReplicaPort: 5555
nsDS5ReplicaBindDN: uid=doesn'tmatter
nsDS5ReplicaTransportInfo: LDAP
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaCredentials: nothing

[5]  Make some updates to the database

[6]  Export the database (retaining the replication state information)

Example:

ldapmodify  -D "cn=directory manager" ...
dn: cn=export1404764503038,cn=export,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: export1404764503038
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot
nsuseonefile: TRUE
nsexportreplica: TRUE

[7]  Restart the server

[8]  Import the LDIF (/tmp/deadlock.ldif)

ldapmodify  -D "cn=directory manager" ...
dn: cn=import1404764623289,cn=import,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: import1404764623289
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot

[9]  Search for tombstone entries:

ldapsearch -D "cn=directory manager" -w Secret123 -b "dc=example,dc=com" -xLLL objectclass=nstombstone

This search should NOT hang, and it should return at least one entry (cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config).
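
Because a deadlocked server never answers, running the step [9] search by hand can block the terminal indefinitely. A minimal sketch of a wrapper that gives the search a deadline is shown here. The function name check_no_hang and the 30-second value are assumptions, and the sketch relies on timeout(1) from GNU coreutils, which exits with status 124 when it has to kill the command. It only covers the "does not hang" half of the check; the returned entries still need to be inspected separately.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a deadline and classify the result.
# Prints "ok" if the command exited 0, "hang" if timeout(1) had to kill it,
# and "error" for any other exit status.
check_no_hang() {
    deadline=$1
    shift
    timeout "$deadline" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo hang ;;
        *)   echo error ;;
    esac
}

# Example with the search from step [9] (placeholder credentials from this report):
# check_no_hang 30 ldapsearch -D "cn=directory manager" -w Secret123 \
#     -b "dc=example,dc=com" -xLLL objectclass=nstombstone
```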

Comment 2 mreynolds 2014-07-07 20:36:51 UTC
Fixed upstream.

Comment 4 Viktor Ashirov 2015-01-26 23:35:29 UTC
$ rpm -qa | grep 389
389-ds-base-libs-1.3.3.1-12.el7.x86_64
389-ds-base-debuginfo-1.3.3.1-12.el7.x86_64
389-ds-base-1.3.3.1-12.el7.x86_64

[1]  Install a single instance of 389 using "dc=example,dc=com"
[2]  Enable the changelog
$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=changelog5,cn=config
objectclass: top
objectclass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-rhel7/changelogdb
EOF
adding new entry "cn=changelog5,cn=config"

[3]  Enable replication
$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: add
objectclass: top
objectclass: nsds5replica
objectclass: extensibleObject
cn: replica
nsds5replicaroot: dc=example,dc=com
nsds5replicaid: 7
nsds5replicatype: 3
nsds5flags: 1
nsds5ReplicaPurgeDelay: 604800
nsds5ReplicaBindDN: cn=SyncManager,cn=config
EOF
adding new entry "cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config"

[4]  Create a replication agreement that points to a non-existent server on the same machine:
$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=fake agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: add
objectClass: top
objectClass: nsDS5ReplicationAgreement
description: fake agreement
cn: fake agreement
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaHost: localhost.localdomain
nsDS5ReplicaPort: 5555
nsDS5ReplicaBindDN: uid=doesn'tmatter
nsDS5ReplicaTransportInfo: LDAP
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaCredentials: nothing
EOF
adding new entry "cn=fake agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"

[5]  Make some updates to the database
$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a -f 10users.ldif
adding new entry "cn=user1,ou=People,dc=example,dc=com"

adding new entry "cn=user2,ou=People,dc=example,dc=com"

adding new entry "cn=user3,ou=People,dc=example,dc=com"

adding new entry "cn=user4,ou=People,dc=example,dc=com"

adding new entry "cn=user5,ou=People,dc=example,dc=com"

adding new entry "cn=user6,ou=People,dc=example,dc=com"

adding new entry "cn=user7,ou=People,dc=example,dc=com"

adding new entry "cn=user8,ou=People,dc=example,dc=com"

adding new entry "cn=user9,ou=People,dc=example,dc=com"

adding new entry "cn=user10,ou=People,dc=example,dc=com"

[6]  Export the database (retaining the replication state information)
$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=export1404764503038,cn=export,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: export1404764503038
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot
nsuseonefile: TRUE
nsexportreplica: TRUE
EOF
adding new entry "cn=export1404764503038,cn=export,cn=tasks,cn=config"

[7]  Restart the server
$ sudo systemctl restart dirsrv.target

[8]  Import the LDIF (/tmp/deadlock.ldif)
$ ldapmodify -D "cn=Directory Manager" -w Secret123 -a << EOF
dn: cn=import1404764623289,cn=import,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: import1404764623289
ttl: 4
nsfilename: /tmp/deadlock.ldif
nsinstance: userroot
EOF
adding new entry "cn=import1404764623289,cn=import,cn=tasks,cn=config"

[9]  Search for tombstone entries:
$ ldapsearch -D "cn=Directory Manager" -w Secret123 -b "dc=example,dc=com" -LLL objectclass=nstombstone
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
objectClass: top
objectClass: nsds5replica
objectClass: extensibleObject
cn: replica
nsDS5ReplicaRoot: dc=example,dc=com
nsDS5ReplicaId: 7
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsds5ReplicaPurgeDelay: 604800
nsDS5ReplicaBindDN: cn=SyncManager,cn=config
nsState:: BwAAAAAAAAAOzsZUAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAA==
nsDS5ReplicaName: 49922186-a5b311e4-8db6c639-384f4808
nsds50ruv: {replicageneration} 54c6ce0b000000070000
nsds50ruv: {replica 7 ldap://rhel7.brq.redhat.com:389}
nsruvReplicaLastModified: {replica 7 ldap://rhel7.brq.redhat.com:389} 00000000
nsds5ReplicaChangeCount: 0
nsds5replicareapactive: 0

The search didn't hang and returned the cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config entry.

Marking as VERIFIED
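
Step [5] above loads a file named 10users.ldif whose contents are not shown in this report. A minimal sketch of how a comparable file could be generated is given below. The function name gen_users_ldif is hypothetical, and the objectClass and sn values are assumptions; only the DN pattern (cn=userN,ou=People,dc=example,dc=com) comes from the output above.

```shell
#!/bin/sh
# Hypothetical helper: print LDIF for N test users under
# ou=People,dc=example,dc=com. The "person" object class requires both
# cn and sn, so each entry sets both. Attribute values are assumptions.
gen_users_ldif() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'dn: cn=user%d,ou=People,dc=example,dc=com\n' "$i"
        printf 'objectClass: top\n'
        printf 'objectClass: person\n'
        printf 'cn: user%d\n' "$i"
        printf 'sn: user%d\n' "$i"
        printf '\n'
        i=$((i + 1))
    done
}

# Example: gen_users_ldif 10 > 10users.ldif
```

The entries assume that ou=People,dc=example,dc=com already exists in the instance.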

Comment 6 errata-xmlrpc 2015-03-05 09:35:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0416.html

