Bug 766320

Summary: Hang possible in schema compat when calling in a transaction
Product: Red Hat Enterprise Linux 6 Reporter: Dmitri Pal <dpal>
Component: slapi-nisAssignee: Nalin Dahyabhai <nalin>
Status: CLOSED WONTFIX QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.0CC: abokovoy, jgalipea, nalin, rcritten, spoore, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: slapi-nis-0.37-1.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 758830 Environment:
Last Closed: 2013-03-11 19:52:48 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 758830, 759183    
Bug Blocks:    

Description Dmitri Pal 2011-12-11 16:59:33 UTC
+++ This bug was initially created as a clone of Bug #758830 +++

Created attachment 538758 [details]
Carry the current transaction id in some new pblocks

Description of problem:

I'm in the process of converting the IPA password plugin to execute as a transaction plugin. This causes 389-ds to hang in the backend when the schema-compat plugin is invoked as part of the call process.

Disabling schema compat resolves the issue.

The problem exhibits itself when a plugin calls slapi_pblock_new() or slapi_pblock_init() when inside a transaction and doesn't get the parent TXN value. 

I've attached a patch that applies this narrowly. I haven't tested all possible paths but I'm able to do these which I think exercises it fairly well: 

1. ipa user-add --first Tim --last User tuser1 --password
2. ldapsearch -Y GSSAPI -b cn=compat,dc=example,dc=com
3. ypcat -h `hostname` -d example.com passwd
4. ipa unit tests

I didn't carry this across to all new pblock creation as I'm not that familiar with the code.

Version-Release number of selected component (if applicable):

slapi-nis-0.25-1.fc15
389-ds-base-1.2.10-0.5.a5.fc15

--- Additional comment from nalin on 2011-11-30 18:01:16 EST ---

If the compat tree needs to support the "Read Your Writes" semantics outlined at http://www.directory.fedora.redhat.com/wiki/LDAP_Transactions , then it'll need to do more than simply carry the transaction information to the pblocks used for internal operations when it needs to update its cached data: that cache is going to have to become transaction-aware so that searching it works as expected.

--- Additional comment from rcritten on 2011-12-01 11:09:05 EST ---

Note that I can cause a similar deadlock with schema-compat disabled and memberof configured as betxnpostoperation, BZ https://bugzilla.redhat.com/show_bug.cgi?id=759183

Comment 1 Jenny Severance 2011-12-12 15:42:27 UTC
Can you please provide steps to reproduce for QE without devel environment and unit test... Thanks

Comment 2 Rob Crittenden 2011-12-12 17:54:39 UTC
There is nothing to QE yet. This was discovered while converting the password plugin to use transactions. The code is not committed because it causes the server to hang in schema-compat (the root cause may lie in 389-ds and not the schema-compat plugin).

If you want to see a similar hang then in dn: cn=MemberOf Plugin,cn=plugins,cn=config set nsslapd-plugintype to betxnpostoperation and restart 389-ds. Add a user and you'll see it hang.

Comment 4 Jenny Severance 2012-05-03 18:32:23 UTC
Please add steps to verify/reproduce.

Comment 5 Scott Poore 2012-05-04 20:28:10 UTC
Is there anything to QE yet here?   

Has it been determined that this hang is the same as bug 759183?   Will the same test work here?

Comment 6 Nalin Dahyabhai 2012-05-04 21:27:24 UTC
I can confirm that this is a specific case of what appears to be described in that bug.  Basically, if you are going to access the ldbm back-end (even for reading) while inside of a transaction, you need to perform those accesses while referring to the same transaction's ID, or you risk a deadlock.  This mistake is easy to make if a plugin is unaware of concept of transactions, as the plugins in this package were.

Configuring this plugin in combination with memberOf or mep set up as betxnpost plugins should exercise the involved code paths, provided one of the entries that is modified by the other plugins will be read by one of the plugins from this package.

Comment 7 Scott Poore 2012-05-07 21:00:16 UTC
Ok, so this should work as a test then?

# cat /tmp/ldapmodify.test 
dn: cn=MemberOf Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-plugintype
nsslapd-plugintype: betxnpostoperation

# ldapmodify -D "$ROOTDN" -w "$ROOTDNPWD" -f /tmp/ldapmodify.test 
modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config"

# ldapsearch -LLL -x -D "$ROOTDN" -w "$ROOTDNPWD" -b "cn=MemberOf Plugin,cn=plugins,cn=config"|grep betxnpostoperation
nsslapd-pluginType: betxnpostoperation

# ipa group-add-member --users=admin editors
  Group name: editors
  Description: Limited admins who can edit other users
  GID: 3002
  Member users: admin
-------------------------
Number of members added 1
-------------------------

Comment 8 Nalin Dahyabhai 2012-05-07 21:59:07 UTC
You'd also need to ensure that the schema compat plugin was enabled for groups,  that the group membership was set up in a way that the schema compat plugin would need to read another entry to build up the synthetic entry it will provide, and that the schema compat plugin is not aware of transactions.

The first one, I can't tell from the output above, but enabling it with ipa-compat-manage should be enough.  The second requires that the group membership is recorded by setting "member" to the user's entry's DN, which IPA should already be doing correctly (you can verify using "ipa group-show editors --raw" to examine the group entry).  The third is easiest to test by comparing slapi-nis 0.31 or earlier with 0.32 and later.

Comment 9 Scott Poore 2012-05-08 00:31:40 UTC
Ok, the first two I get but, for the third, you're saying compare the src to see if it's transaction aware, right?  Is there a way to tell from the binary rpm if it's a transaction aware version of the plugin?  Or, is version > 0.31 accurate there as an indicator?

# echo $ADMINPW|ipa-compat-manage enable

Plugin already Enabled

# echo $ADMINPW|ipa-nis-manage enable

Enabling plugin
Restarting IPA to initialize updates before performing deletes:
  [1/2]: stopping directory server
  [2/2]: starting directory server
done configuring dirsrv.
This setting will not take effect until you restart Directory Server.
The rpcbind service may need to be started.

# service rpcbind restart
Stopping rpcbind:                                          [  OK  ]
Starting rpcbind:                                          [  OK  ]

# service dirsrv restart
Shutting down dirsrv: 
    PKI-IPA...                                             [  OK  ]
    TESTRELM-COM...                                        [  OK  ]
Starting dirsrv: 
    PKI-IPA...                                             [  OK  ]
    TESTRELM-COM...                                        [  OK  ]

# ipactl status
Directory Service: RUNNING
KDC Service: RUNNING
KPASSWD Service: RUNNING
DNS Service: RUNNING
HTTP Service: RUNNING
CA Service: RUNNING

# cat > /tmp/bz766320_hang.ldif <<-EOF
> dn: cn=MemberOf Plugin,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-plugintype
> nsslapd-plugintype: betxnpostoperation
> EOF

# rlRun "ldapmodify -D \"$ROOTDN\" -w \"$ROOTDNPWD\" -f /tmp/bz766320_hang.ldif"
modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config"

:: [   PASS   ] :: Running 'ldapmodify -D "cn=Directory Manager" -w "Secret123" -f /tmp/bz766320_hang.ldif'

# ldapsearch -LLL -x -D "$ROOTDN" -w "$ROOTDNPWD" -b "cn=MemberOf Plugin,cn=plugins,cn=config"
dn: cn=MemberOf Plugin,cn=plugins,cn=config
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject
cn: MemberOf Plugin
nsslapd-pluginPath: libmemberof-plugin
nsslapd-pluginInitfunc: memberof_postop_init
nsslapd-pluginType: betxnpostoperation
nsslapd-pluginEnabled: on
nsslapd-plugin-depends-on-type: database
memberofgroupattr: member
memberofgroupattr: memberUser
memberofgroupattr: memberHost
memberofattr: memberOf
nsslapd-pluginId: memberof
nsslapd-pluginVersion: 1.2.9.14
nsslapd-pluginVendor: 389 Project
nsslapd-pluginDescription: memberof plugin


# ipa group-add-member --users=admin editors
  Group name: editors
  Description: Limited admins who can edit other users
  GID: 1762200002
  Member users: admin
-------------------------
Number of members added 1
-------------------------

# ipa group-show editors --raw
  cn: editors
  description: Limited admins who can edit other users
  gidnumber: 1762200002
  member: uid=admin,cn=users,cn=accounts,dc=testrelm,dc=com


Looking at the source code with the example from the initial attachment, I can see some code that seems to be related to transactions for the 0.40-1 version for RHEL6:

$ diff -up --recursive slapi-nis-0.26-1/slapi-nis-0.26/src/back-shr.c slapi-nis-0.40-1/slapi-nis-0.40/src/back-shr.c |grep -i txn
+#ifdef USE_SLAPI_BE_TXNS
+		void *txn;
+		txn = NULL;
+#ifdef SLAPI_TXN
+		slapi_pblock_get(cbdata->pb, SLAPI_TXN, &txn);
+#ifdef SLAPI_TXN
+		if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) {
+#ifdef USE_SLAPI_BE_TXNS
+		void *txn;
+		txn = NULL;
+#ifdef SLAPI_TXN
+		slapi_pblock_get(pb, SLAPI_TXN, &txn);
+#ifdef SLAPI_TXN
+		if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) {
+#ifdef USE_SLAPI_BE_TXNS
+		void *txn;
+		txn = NULL;
+#ifdef SLAPI_TXN
+		slapi_pblock_get(pb, SLAPI_TXN, &txn);
+#ifdef SLAPI_TXN
+		if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) {
+#ifdef USE_SLAPI_BE_TXNS
+		void *txn;
+		txn = NULL;
+#ifdef SLAPI_TXN
+		slapi_pblock_get(pb, SLAPI_TXN, &txn);
+#ifdef SLAPI_TXN
+		if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) {
+#ifdef USE_SLAPI_BE_TXNS
+		void *txn;
+		txn = NULL;
+#ifdef SLAPI_TXN
+		slapi_pblock_get(pb, SLAPI_TXN, &txn);
+#ifdef SLAPI_TXN
+		if ((txn == NULL) && (strcmp(be_type, "ldbm database") == 0)) {
+#ifdef USE_SLAPI_BE_TXNS
+backend_shr_be_txn_post_add_cb(Slapi_PBlock *pb)
+backend_shr_be_txn_post_modify_cb(Slapi_PBlock *pb)
+backend_shr_be_txn_post_modrdn_cb(Slapi_PBlock *pb)
+backend_shr_be_txn_post_delete_cb(Slapi_PBlock *pb)
+backend_shr_be_txn_postop_init(Slapi_PBlock *pb, struct plugin_state *state)
+	if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_ADD_FN,
+			     backend_shr_be_txn_post_add_cb) != 0) {
+				"error hooking up be-txn-post add callback\n");
+	if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_MODIFY_FN,
+			     backend_shr_be_txn_post_modify_cb) != 0) {
+				"error hooking up be-txn-post modify callback\n");
+	if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_MODRDN_FN,
+			     backend_shr_be_txn_post_modrdn_cb) != 0) {
+				"error hooking up be-txn-post modrdn callback\n");
+	if (slapi_pblock_set(pb, SLAPI_PLUGIN_BE_TXN_POST_DELETE_FN,
+			     backend_shr_be_txn_post_delete_cb) != 0) {
+				"error hooking up be-txn-post delete callback\n");

Comment 10 Nalin Dahyabhai 2012-05-08 17:04:46 UTC
(In reply to comment #9)
> Ok, the first two I get but, for the third, you're saying compare the src to
> see if it's transaction aware, right?  Is there a way to tell from the binary
> rpm if it's a transaction aware version of the plugin?  Or, is version > 0.31
> accurate there as an indicator?

The 0.31/0.32 changes included the addition of transaction awareness.  If you hit a deadlock with 0.31 or ealier, and the backtrace for the deadlocked process (point pstack at the server process to check) includes one of the plugin's callback functions calling back into back-ldbm functions, then you've reproduced the bug.

Comment 11 Scott Poore 2012-05-18 03:01:00 UTC
Verified.

Version ::

slapi-nis-0.40-1.el6.x86_64

Manual Test Verification ::

See output above in Comment #9.

Automated Test Verification ::

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: nisint_bz_766320: Hang possible in schema compat when calling in a transaction
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [21:55:22] ::  Machine in recipe is IPAMASTER
:: [21:55:23] ::  Changing nsslapd plugin type setting to betxnpostoperation to try to reproduce hang
modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config"

:: [   PASS   ] :: Running 'ldapmodify -D "cn=Directory Manager" -w "PASSWORD" -f /tmp/bz766320_hang.ldif'
:: [21:55:23] ::  If job/test hangs after group-add-member, then you have hit BZ 766320
  Group name: editors
  Description: Limited admins who can edit other users
  GID: 834000002
  Member users: admin
-------------------------
Number of members added 1
-------------------------
:: [   PASS   ] :: Running 'ipa group-add-member --users=admin editors'
  cn: editors
  description: Limited admins who can edit other users
  gidnumber: 834000002
  member: uid=admin,cn=users,cn=accounts,dc=testrelm,dc=com
:: [   PASS   ] :: Running 'ipa group-show editors --raw'
Directory Service: RUNNING
KDC Service: RUNNING
KPASSWD Service: RUNNING
DNS Service: RUNNING
MEMCACHE Service: RUNNING
HTTP Service: RUNNING
CA Service: RUNNING
:: [   PASS   ] :: Running 'ipactl status'
:: [   PASS   ] :: BZ 766320 not found.  apparently the ipa command didnt hang
:: [21:55:31] ::  Returning plugintype setting back to original
modifying entry "cn=MemberOf Plugin,cn=plugins,cn=config"

:: [   PASS   ] :: Running 'ldapmodify -D "cn=Directory Manager" -w "PASSWORD" -f /tmp/bz766320_fix.ldif'
  Group name: editors
  Description: Limited admins who can edit other users
  GID: 834000002
---------------------------
Number of members removed 1
---------------------------
:: [   PASS   ] :: Running 'ipa group-remove-member --users=admin editors'
result_server not set, assuming developer mode.
Setting 192.168.122.101 to state nisint_bz_766320
:: [   PASS   ] :: Running 'rhts-sync-set -s 'nisint_bz_766320' -m 192.168.122.101'

Comment 15 Nalin Dahyabhai 2013-03-11 19:52:48 UTC
This one's a relic of when I (mistakenly) thought we were getting transaction support in 389-ds-base for 6.x.  The change that disabled the incorrect attempt at supporting transactions was tracked as bug #829502.