Hide Forgot
+++ This bug was initially created as a clone of Bug #758830 +++ Created attachment 538758 [details] Carry the current transaction id in some new pblocks Description of problem: I'm in the process of converting the IPA password plugin to execute as a transaction plugin. This causes 389-ds to hang in the backend when the schema-compat plugin is invoked as part of the call process. Disabling schema compat resolves the issue. The problem exhibits itself when a plugin calls slapi_pblock_new() or slapi_pblock_init() when inside a transaction and doesn't get the parent TXN value. I've attached a patch that applies this narrowly. I haven't tested all possible paths but I'm able to do these which I think exercises it fairly well: 1. ipa user-add --first Tim --last User tuser1 --password 2. ldapsearch -Y GSSAPI -b cn=compat,dc=example,dc=com 3. ypcat -h `hostname` -d example.com passwd 4. ipa unit tests I didn't carry this across to all new pblock creation as I'm not that familiar with the code. Version-Release number of selected component (if applicable): slapi-nis-0.25-1.fc15 389-ds-base-1.2.10-0.5.a5.fc15 --- Additional comment from nalin on 2011-11-30 18:01:16 EST --- If the compat tree needs to support the "Read Your Writes" semantics outlined at http://www.directory.fedora.redhat.com/wiki/LDAP_Transactions , then it'll need to do more than simply carry the transaction information to the pblocks used for internal operations when it needs to update its cached data: that cache is going to have to become transaction-aware so that searching it works as expected. --- Additional comment from rcritten on 2011-12-01 11:09:05 EST --- Note that I can cause a similar deadlock with schema-compat disabled and memberof configured as betxnpostoperation, BZ https://bugzilla.redhat.com/show_bug.cgi?id=759183
Can you please provide steps to reproduce for QE without devel environment and unit test... Thanks
There is nothing to QE yet. This was discovered while converting the password plugin to use transactions. The code is not committed because it causes the server to hang in schema-compat (the root cause may lie in 389-ds and not the schema-compat plugin). If you want to see a similar hang then in dn: cn=MemberOf Plugin,cn=plugins,cn=config set nsslapd-plugintype to betxnpostoperation and restart 389-ds. Add a user and you'll see it hang.
Please add steps to verify/reproduce.
Is there anything to QE yet here? Has it been determined that this hang is the same as bug 759183? Will the same test work here?
I can confirm that this is a specific case of what appears to be described in that bug. Basically, if you are going to access the ldbm back-end (even for reading) while inside of a transaction, you need to perform those accesses while referring to the same transaction's ID, or you risk a deadlock. This mistake is easy to make if a plugin is unaware of concept of transactions, as the plugins in this package were. Configuring this plugin in combination with memberOf or mep set up as betxnpost plugins should exercise the involved code paths, provided one of the entries that is modified by the other plugins will be read by one of the plugins from this package.
Ok, so this should work as a test then? # cat /tmp/ldapmodify.test dn: cn=MemberOf Plugin,cn=plugins,cn=config changetype: modify replace: nsslapd-plugintype nsslapd-plugintype: betxnpostoperation # ldapmodify -D "$ROOTDN" -w "$ROOTDNPWD" -f /tmp/ldapmodify.test modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config" # ldapsearch -LLL -x -D "$ROOTDN" -w "$ROOTDNPWD" -b "cn=MemberOf Plugin,cn=plugins,cn=config"|grep betxnpostoperation nsslapd-pluginType: betxnpostoperation # ipa group-add-member --users=admin editors Group name: editors Description: Limited admins who can edit other users GID: 3002 Member users: admin ------------------------- Number of members added 1 -------------------------
You'd also need to ensure that the schema compat plugin was enabled for groups, that the group membership was set up in a way that the schema compat plugin would need to read another entry to build up the synthetic entry it will provide, and that the schema compat plugin is not aware of transactions. The first one, I can't tell from the output above, but enabling it with ipa-compat-manage should be enough. The second requires that the group membership is recorded by setting "member" to the user's entry's DN, which IPA should already be doing correctly (you can verify using "ipa group-show editors --raw" to examine the group entry). The third is easiest to test by comparing slapi-nis 0.31 or earlier with 0.32 and later.
Ok, the first two I get but, for the third, you're saying compare the src to see if it's transaction aware, right? Is there a way to tell from the binary rpm if it's a transaction aware version of the plugin? Or, is version > 0.31 accurate there as an indicator? # echo $ADMINPW|ipa-compat-manage enable Plugin already Enabled # echo $ADMINPW|ipa-nis-manage enable Enabling plugin Restarting IPA to initialize updates before performing deletes: [1/2]: stopping directory server [2/2]: starting directory server done configuring dirsrv. This setting will not take effect until you restart Directory Server. The rpcbind service may need to be started. # service rpcbind restart Stopping rpcbind: [ OK ] Starting rpcbind: [ OK ] # service dirsrv restart Shutting down dirsrv: PKI-IPA... [ OK ] TESTRELM-COM... [ OK ] Starting dirsrv: PKI-IPA... [ OK ] TESTRELM-COM... [ OK ] # ipactl status Directory Service: RUNNING KDC Service: RUNNING KPASSWD Service: RUNNING DNS Service: RUNNING HTTP Service: RUNNING CA Service: RUNNING # cat > /tmp/bz766320_hang.ldif <<-EOF > dn: cn=MemberOf Plugin,cn=plugins,cn=config > changetype: modify > replace: nsslapd-plugintype > nsslapd-plugintype: betxnpostoperation > EOF # rlRun "ldapmodify -D \"$ROOTDN\" -w \"$ROOTDNPWD\" -f /tmp/bz766320_hang.ldif" modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config" :: [ PASS ] :: Running 'ldapmodify -D "cn=Directory Manager" -w "Secret123" -f /tmp/bz766320_hang.ldif' # ldapsearch -LLL -x -D "$ROOTDN" -w "$ROOTDNPWD" -b "cn=MemberOf Plugin,cn=plugins,cn=config" dn: cn=MemberOf Plugin,cn=plugins,cn=config objectClass: top objectClass: nsSlapdPlugin objectClass: extensibleObject cn: MemberOf Plugin nsslapd-pluginPath: libmemberof-plugin nsslapd-pluginInitfunc: memberof_postop_init nsslapd-pluginType: betxnpostoperation nsslapd-pluginEnabled: on nsslapd-plugin-depends-on-type: database memberofgroupattr: member memberofgroupattr: memberUser memberofgroupattr: memberHost memberofattr: memberOf nsslapd-pluginId: memberof nsslapd-pluginVersion: 1.2.9.14 nsslapd-pluginVendor: 389 Project nsslapd-pluginDescription: memberof plugin # ipa group-add-member --users=admin editors Group name: editors Description: Limited admins who can edit other users GID: 1762200002 Member users: admin ------------------------- Number of members added 1 ------------------------- # ipa group-show editors --raw cn: editors description: Limited admins who can edit other users gidnumber: 1762200002 member: uid=admin,cn=users,cn=accounts,dc=testrelm,dc=com Looking at the source code with the example from the initial attachment, I can see some code that seems to be related to transactions for the 0.40-1 version for RHEL6: $ diff -up --recursive slapi-nis-0.26-1/slapi-nis-0.26/src/back-shr.c slapi-nis-0.40-1/slapi-nis-0.40/src/back-shr.c |grep -i txn +#ifdef USE_SLAPI_BE_TXNS + void *txn; + txn = NULL; +#ifdef SLAPI_TXN + slapi_pblock_get(cbdata->pb, SLAPI_TXN, &txn); +#ifdef SLAPI_TXN + if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) { +#ifdef USE_SLAPI_BE_TXNS + void *txn; + txn = NULL; +#ifdef SLAPI_TXN + slapi_pblock_get(pb, SLAPI_TXN, &txn); +#ifdef SLAPI_TXN + if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) { +#ifdef USE_SLAPI_BE_TXNS + void *txn; + txn = NULL; +#ifdef SLAPI_TXN + slapi_pblock_get(pb, SLAPI_TXN, &txn); +#ifdef SLAPI_TXN + if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) { +#ifdef USE_SLAPI_BE_TXNS + void *txn; + txn = NULL; +#ifdef SLAPI_TXN + slapi_pblock_get(pb, SLAPI_TXN, &txn); +#ifdef SLAPI_TXN + if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) { +#ifdef USE_SLAPI_BE_TXNS + void *txn; + txn = NULL; +#ifdef SLAPI_TXN + slapi_pblock_get(pb, SLAPI_TXN, &txn); +#ifdef SLAPI_TXN + if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) { +#ifdef USE_SLAPI_BE_TXNS +backend_shr_be_txn_post_add_cb(Slapi_PBlock *pb) +backend_shr_be_txn_post_modify_cb(Slapi_PBlock *pb) +backend_shr_be_txn_post_modrdn_cb(Slapi_PBlock *pb) +backend_shr_be_txn_post_delete_cb(Slapi_PBlock *pb) +backend_shr_be_txn_postop_init(Slapi_PBlock *pb, struct plugin_state *state) + if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_ADD_FN, + backend_shr_be_txn_post_add_cb) != 0) { + "error hooking up be-txn-post add callback\n"); + if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_MODIFY_FN, + backend_shr_be_txn_post_modify_cb) != 0) { + "error hooking up be-txn-post modify callback\n"); + if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_MODRDN_FN, + backend_shr_be_txn_post_modrdn_cb) != 0) { + "error hooking up be-txn-post modrdn callback\n"); + if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_DELETE_FN, + backend_shr_be_txn_post_delete_cb) != 0) { + "error hooking up be-txn-post delete callback\n");
(In reply to comment #9) > Ok, the first two I get but, for the third, you're saying compare the src to > see if it's transaction aware, right? Is there a way to tell from the binary > rpm if it's a transaction aware version of the plugin? Or, is version > 0.31 > accurate there as an indicator? The 0.31/0.32 changes included the addition of transaction awareness. If you hit a deadlock with 0.31 or ealier, and the backtrace for the deadlocked process (point pstack at the server process to check) includes one of the plugin's callback functions calling back into back-ldbm functions, then you've reproduced the bug.
Verified. Version :: slapi-nis-0.40-1.el6.x86_64 Manual Test Verification :: See output above in Comment #9. Automated Test Verification :: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [ LOG ] :: nisint_bz_766320: Hang possible in schema compat when calling in a transaction :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [21:55:22] :: Machine in recipe is IPAMASTER :: [21:55:23] :: Changing nsslapd plugin type setting to betxnpostoperation to try to reproduce hang modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config" :: [ PASS ] :: Running 'ldapmodify -D "cn=Directory Manager" -w "PASSWORD" -f /tmp/bz766320_hang.ldif' :: [21:55:23] :: If job/test hangs after group-add-member, then you have hit BZ 766320 Group name: editors Description: Limited admins who can edit other users GID: 834000002 Member users: admin ------------------------- Number of members added 1 ------------------------- :: [ PASS ] :: Running 'ipa group-add-member --users=admin editors' cn: editors description: Limited admins who can edit other users gidnumber: 834000002 member: uid=admin,cn=users,cn=accounts,dc=testrelm,dc=com :: [ PASS ] :: Running 'ipa group-show editors --raw' Directory Service: RUNNING KDC Service: RUNNING KPASSWD Service: RUNNING DNS Service: RUNNING MEMCACHE Service: RUNNING HTTP Service: RUNNING CA Service: RUNNING :: [ PASS ] :: Running 'ipactl status' :: [ PASS ] :: BZ 766320 not found. apparently the ipa command didnt hang :: [21:55:31] :: Returning plugintype setting back to original modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config" :: [ PASS ] :: Running 'ldapmodify -D "cn=Directory Manager" -w "PASSWORD" -f /tmp/bz766320_fix.ldif' Group name: editors Description: Limited admins who can edit other users GID: 834000002 --------------------------- Number of members removed 1 --------------------------- :: [ PASS ] :: Running 'ipa group-remove-member --users=admin editors' result_server not set, assuming developer mode. Setting 192.168.122.101 to state nisint_bz_766320 :: [ PASS ] :: Running 'rhts-sync-set -s 'nisint_bz_766320' -m 192.168.122.101'
This one's a relic of when I (mistakenly) thought we were getting transaction support in 389-ds-base for 6.x. The change that disabled the incorrect attempt at supporting transactions was tracked as bug #829502.