Bug 1176756 - glusterd: remote locking failure when multiple synctask transactions are run
Summary: glusterd: remote locking failure when multiple synctask transactions are run
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1173414 1182458
Blocks: glusterfs-3.6.3
 
Reported: 2014-12-23 04:23 UTC by Atin Mukherjee
Modified: 2016-01-08 09:17 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1173414
Environment:
Last Closed: 2016-01-08 09:17:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2014-12-23 04:23:26 UTC
+++ This bug was initially created as a clone of Bug #1173414 +++

Description of problem:

When two volume set operations are run simultaneously in a loop on two different volumes, some of the volume set transactions fail with a remote locking failure.

Version-Release number of selected component (if applicable):
Mainline

How reproducible:
Always

Steps to Reproduce:
1. Set up a 2-node cluster.
2. Create two volumes, say vol1 and vol2, and start them.
3. Run the following script from any one of the nodes in the cluster:
for i in {1..10} 
do
gluster v set vol1 diagnostics.client-log-level DEBUG &
gluster v set vol2 features.barrier on
done

Actual results:
Some of the transactions fail with the error "Locking failed in <Peer node>, Please check log file for details"

Expected results:
Local locking might fail, but remote locking should never fail here.

Additional info:

--- Additional comment from Anand Avati on 2014-12-12 00:50:13 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction xaction_peers list in syncop) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-12-16 07:05:30 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-12-17 01:52:55 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-12-22 02:00:50 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-12-22 03:39:26 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-12-22 23:14:19 EST ---

COMMIT: http://review.gluster.org/9269 committed in master by Kaushal M (kaushal) 
------
commit da9deb54df91dedc51ebe165f3a0be646455cb5b
Author: Atin Mukherjee <amukherj>
Date:   Fri Dec 12 07:21:19 2014 +0530

    glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3
    
    In the current implementation, the xaction_peers list is maintained in a
    global variable (glustrd_priv_t) for syncop/mgmt_v3. This means the
    consistency and atomicity of the peerinfo list across transactions is not
    guaranteed when multiple syncop/mgmt_v3 transactions are going through.
    
    We ran into a problem in mgmt_v3-locks.t, which was failing spuriously. The
    reason was that two volume set operations (on two different volumes) were
    going through simultaneously, and both transactions were manipulating the
    same xaction_peers structure, which led to a corrupted list. Because of
    this, in some cases the unlock request to a peer was never triggered and we
    ended up with stale locks.
    
    The solution is to maintain a per-transaction local xaction_peers list for
    every syncop.
    
    Please note that I've identified this problem in the op-sm area as well; a
    separate patch will be attempted to fix it.
    
    Finally, thanks to Krishnan Parthasarathi and Kaushal M for their constant
    help in getting to the root cause.
    
    Change-Id: Ib1eaac9e5c8fc319f4e7f8d2ad965bc1357a7c63
    BUG: 1173414
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/9269
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>

Comment 1 Anand Avati 2014-12-23 05:35:45 UTC
REVIEW: http://review.gluster.org/9328 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#1) for review on release-3.6 by Atin Mukherjee (amukherj)

Comment 2 Anand Avati 2014-12-26 08:05:09 UTC
REVIEW: http://review.gluster.org/9328 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#2) for review on release-3.6 by Atin Mukherjee (amukherj)

Comment 3 Anand Avati 2015-01-11 12:12:34 UTC
REVIEW: http://review.gluster.org/9328 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#3) for review on release-3.6 by Atin Mukherjee (amukherj)

Comment 4 Anand Avati 2015-01-11 12:51:08 UTC
REVIEW: http://review.gluster.org/9328 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#4) for review on release-3.6 by Atin Mukherjee (amukherj)

Comment 5 Anand Avati 2015-02-26 11:57:47 UTC
REVIEW: http://review.gluster.org/9328 (glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3) posted (#5) for review on release-3.6 by Atin Mukherjee (amukherj)

Comment 6 Anand Avati 2015-02-26 12:00:37 UTC
COMMIT: http://review.gluster.org/9328 committed in release-3.6 by Raghavendra Bhat (raghavendra) 
------
commit a1d9f01b28267fc333aebc49cb81ee69dc2c24f8
Author: Atin Mukherjee <amukherj>
Date:   Fri Dec 12 07:21:19 2014 +0530

    glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3
    
    In the current implementation, the xaction_peers list is maintained in a
    global variable (glustrd_priv_t) for syncop/mgmt_v3. This means the
    consistency and atomicity of the peerinfo list across transactions is not
    guaranteed when multiple syncop/mgmt_v3 transactions are going through.
    
    We ran into a problem in mgmt_v3-locks.t, which was failing spuriously. The
    reason was that two volume set operations (on two different volumes) were
    going through simultaneously, and both transactions were manipulating the
    same xaction_peers structure, which led to a corrupted list. Because of
    this, in some cases the unlock request to a peer was never triggered and we
    ended up with stale locks.
    
    The solution is to maintain a per-transaction local xaction_peers list for
    every syncop.
    
    Please note that I've identified this problem in the op-sm area as well; a
    separate patch will be attempted to fix it.
    
    Finally, thanks to Krishnan Parthasarathi and Kaushal M for their constant
    help in getting to the root cause.
    
    Backport URL : http://review.gluster.org/#/c/9269/
                   http://review.gluster.org/#/c/9422/
                   http://review.gluster.org/#/c/9350/
    
    Change-Id: Ib1eaac9e5c8fc319f4e7f8d2ad965bc1357a7c63
    BUG: 1176756
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/9269
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/9328
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Tested-by: Raghavendra Bhat <raghavendra>

