Description of problem:
When two volume set operations are run on two different volumes simultaneously in a loop, some volume set transactions fail with a remote lock failure.

Version-Release number of selected component (if applicable): Mainline

How reproducible: Always

Steps to Reproduce:
1. Set up a 2 node cluster
2. Create two volumes, say vol1 & vol2, and start them
3. Run the following script from any one of the nodes in the cluster:

for i in {1..10}
do
    gluster v set vol1 diagnostics.client-log-level DEBUG &
    gluster v set vol2 features.barrier on
done

Actual results:
Some of the transactions fail with "Locking failed in <Peer node>, Please check log file for details"

Expected results:
Local locking might fail, but remote locking should never fail here.

Additional info:
REVIEW: http://review.gluster.org/9269 (glusterd: Maintain per transaction xaction_peers list in syncop) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9269 (glusterd: Maintain per transaction xaction_peers list in syncop & mgmt_v3) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9269 (glusterd: Maintain per transaction xaction_peers list in syncop & mgmt_v3) posted (#3) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9269 (glusterd: Maintain per transaction xaction_peers list in syncop & mgmt_v3) posted (#4) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9269 (glusterd: Maintain per transaction xaction_peers list in syncop & mgmt_v3) posted (#5) for review on master by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/9269 committed in master by Kaushal M (kaushal)
------
commit da9deb54df91dedc51ebe165f3a0be646455cb5b
Author: Atin Mukherjee <amukherj>
Date:   Fri Dec 12 07:21:19 2014 +0530

    glusterd: Maintain per transaction xaction_peers list in syncop & mgmt_v3

    In the current implementation the xaction_peers list is maintained in a
    global variable (glustrd_priv_t) for syncop/mgmt_v3. This means
    consistency and atomicity of the peerinfo list across transactions is
    not guaranteed when multiple syncop/mgmt_v3 transactions are in progress.

    We ran into a problem in mgmt_v3-locks.t, which was failing spuriously:
    two volume set operations (on two different volumes) were going through
    simultaneously and both transactions were manipulating the same
    xaction_peers structure, which led to a corrupted list. Because of this,
    in some cases the unlock request to a peer was never triggered and we
    ended up with stale locks.

    The solution is to maintain a per transaction local xaction_peers list
    for every syncop.

    Please note I've identified this problem in the op-sm area as well and a
    separate patch will be attempted to fix it.

    Finally, thanks to Krishnan Parthasarathi and Kaushal M for your constant
    help in getting to the root cause.

    Change-Id: Ib1eaac9e5c8fc319f4e7f8d2ad965bc1357a7c63
    BUG: 1173414
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/9269
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
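To illustrate the approach described in the commit message above, here is a minimal, self-contained C sketch. This is not the actual glusterd code: peer_t, build_local_xaction_peers() and free_local_xaction_peers() are hypothetical stand-ins for glusterd's peerinfo structures and helpers. It only demonstrates the shift from a shared global xaction_peers list to a list that each transaction builds and tears down for itself, so concurrent volume set transactions never touch the same list.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a glusterd peer entry; the real code links
 * glusterd_peerinfo_t objects through embedded list heads instead. */
typedef struct peer {
    char         name[32];
    int          connected;
    struct peer *next;          /* link used by the per-transaction list */
} peer_t;

/* Global peer table (analogous to the peers known to glusterd). */
static peer_t global_peers[] = {
    { "peer-1", 1, NULL },
    { "peer-2", 1, NULL },
    { "peer-3", 0, NULL },
};

/* Build a transaction-local snapshot of the connected peers.  Each
 * transaction gets its own copies, so two concurrent volume set
 * operations can no longer corrupt a shared list. */
static peer_t *
build_local_xaction_peers(void)
{
    peer_t *head = NULL;
    size_t  i;

    for (i = 0; i < sizeof(global_peers) / sizeof(global_peers[0]); i++) {
        if (!global_peers[i].connected)
            continue;
        peer_t *copy = malloc(sizeof(*copy));
        if (!copy)
            continue;
        *copy = global_peers[i];
        copy->next = head;      /* prepend to this transaction's list */
        head = copy;
    }
    return head;
}

/* Release the transaction-local list once the lock/stage/commit/unlock
 * phases are done (compare the later list_for_each_entry_safe fix). */
static void
free_local_xaction_peers(peer_t *head)
{
    while (head) {
        peer_t *next = head->next;
        free(head);
        head = next;
    }
}

int
main(void)
{
    /* Each transaction builds and frees its own list. */
    peer_t *xaction_peers = build_local_xaction_peers();

    for (peer_t *p = xaction_peers; p; p = p->next)
        printf("sending lock request to %s\n", p->name);

    free_local_xaction_peers(xaction_peers);
    return 0;
}

Because each transaction owns its own copies, nothing needs to serialize access to the list across concurrent syncop transactions; the trade-off is the small cost of snapshotting the peer list once per transaction.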
REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check correction) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check correction) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check correction) posted (#3) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check correction) posted (#4) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check correction) posted (#5) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check correction) posted (#6) for review on master by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/9350 committed in master by Kaushal M (kaushal)
------
commit 6e2318f0821d7c58eddc837b2d218247243a5c8d
Author: Atin Mukherjee <amukherj>
Date:   Fri Dec 26 12:18:31 2014 +0530

    glusterd: cluster quorum count check correction

    Due to the recent change introduced by commit
    da9deb54df91dedc51ebe165f3a0be646455cb5b, the cluster quorum count
    calculation now depends on whether the peer list is the full peer list,
    the global transaction peer list, or the local transaction peer list.

    Change-Id: I9f63af9a0cb3cfd6369b050247d0ef3ac93d760f
    BUG: 1173414
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/9350
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Avra Sengupta <asengupt>
    Reviewed-by: Kaushal M <kaushal>
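The quorum side of this can be sketched as follows. This is a simplified illustration, not glusterd's actual quorum code: peer_t, peer_count(), active_peer_count() and has_cluster_quorum() are hypothetical names, and the real implementation also honours the configured server quorum ratio. The relevant point of the commit is that the count must be computed over whichever peer list the caller passes in (the full peer list, the global transaction peer list, or the local transaction peer list), rather than always over a global.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical peer entry; stands in for glusterd_peerinfo_t. */
typedef struct peer {
    const char  *name;
    bool         connected;
    struct peer *next;
} peer_t;

/* Count the entries in the peer list that was handed to us.  The caller
 * decides which list that is; nothing here reads a global. */
static int
peer_count(const peer_t *list)
{
    int n = 0;
    for (const peer_t *p = list; p; p = p->next)
        n++;
    return n;
}

static int
active_peer_count(const peer_t *list)
{
    int n = 0;
    for (const peer_t *p = list; p; p = p->next)
        if (p->connected)
            n++;
    return n;
}

/* Simplified quorum check: more than half of (peers + this node) must be
 * up.  Real glusterd also applies the configured quorum ratio. */
static bool
has_cluster_quorum(const peer_t *list)
{
    int total  = peer_count(list) + 1;        /* +1 for the local node   */
    int active = active_peer_count(list) + 1; /* local node is always up */

    return active > total / 2;
}

int
main(void)
{
    peer_t p2 = { "peer-2", false, NULL };
    peer_t p1 = { "peer-1", true,  &p2 };

    printf("quorum met: %s\n", has_cluster_quorum(&p1) ? "yes" : "no");
    return 0;
}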
REVIEW: http://review.gluster.org/9416 (glusterd: use list_for_each_entry_safe for cleanup) posted (#1) for review on master by Avra Sengupta (asengupt)
COMMIT: http://review.gluster.org/9416 committed in master by Krishnan Parthasarathi (kparthas)
------
commit 05d3dfb9623f0939fa807cce3b9336a09fadab2a
Author: Avra Sengupta <asengupt>
Date:   Thu Jan 8 08:35:33 2015 +0000

    glusterd: use list_for_each_entry_safe for cleanup

    Use list_for_each_entry_safe() instead of list_for_each_entry()
    for cleanup of the local xaction_peers list.

    Change-Id: I6d70c04dfb90cbbcd8d9fc4155b8e5e7d7612460
    BUG: 1173414
    Signed-off-by: Avra Sengupta <asengupt>
    Reviewed-on: http://review.gluster.org/9416
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
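Why the _safe variant is required for cleanup can be shown with a small self-contained sketch. The list_head macros below are simplified Linux-kernel-style stand-ins for glusterd's list.h and peer_t is a hypothetical entry type; what they illustrate is the point of the patch: the _safe iterator caches the next element before the loop body runs, so the current entry can be unlinked and freed without breaking the walk.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified intrusive circular list, standing in for glusterd's list.h. */
struct list_head { struct list_head *next, *prev; };

#define INIT_LIST_HEAD(h) do { (h)->next = (h); (h)->prev = (h); } while (0)

static void list_add_tail(struct list_head *new, struct list_head *head)
{
    new->prev = head->prev;
    new->next = head;
    head->prev->next = new;
    head->prev = new;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* UNSAFE for deletion: after free(pos) the increment step would read
 * pos->member.next, i.e. freed memory. */
#define list_for_each_entry(pos, head, member)                              \
    for (pos = list_entry((head)->next, __typeof__(*pos), member);          \
         &pos->member != (head);                                            \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

/* SAFE for deletion: the next element is cached in 'n' before the loop
 * body runs, so freeing 'pos' cannot break the iteration. */
#define list_for_each_entry_safe(pos, n, head, member)                      \
    for (pos = list_entry((head)->next, __typeof__(*pos), member),          \
         n = list_entry(pos->member.next, __typeof__(*pos), member);        \
         &pos->member != (head);                                            \
         pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

/* Hypothetical stand-in for a local xaction_peers entry. */
typedef struct peer {
    int              id;
    struct list_head list;
} peer_t;

int main(void)
{
    struct list_head xaction_peers;
    peer_t *p, *tmp;
    int i;

    INIT_LIST_HEAD(&xaction_peers);
    for (i = 0; i < 3; i++) {
        p = malloc(sizeof(*p));
        if (!p)
            break;
        p->id = i;
        list_add_tail(&p->list, &xaction_peers);
    }

    /* Cleanup must use the _safe variant because entries are freed
     * while walking the list. */
    list_for_each_entry_safe(p, tmp, &xaction_peers, list) {
        printf("freeing peer %d\n", p->id);
        list_del(&p->list);
        free(p);
    }
    return 0;
}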
REVIEW: http://review.gluster.org/9422 (glusterd: quorum calculation should happen on global peer_list) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9422 (glusterd: quorum calculation should happen on global peer_list) posted (#2) for review on master by Atin Mukherjee (amukherj)
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user