Bug 1249921

Summary: [upgrade] After upgrade from 3.5 to 3.6 onwards version, bumping up op-version failed
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: 3.7.3
CC: bugs, gluster-bugs, nlevinki, rhs-bugs, sasundar, vbellur
Keywords: Triaged
Hardware: x86_64
OS: Linux
Whiteboard: glusterd
Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Clone Of: 1248298
Last Closed: 2015-09-09 09:38:55 UTC
Type: Bug
Bug Depends On: 1248298, 1250836

Description Atin Mukherjee 2015-08-04 07:18:21 UTC
+++ This bug was initially created as a clone of Bug #1248298 +++

+++ This bug was initially created as a clone of Bug #1247947 +++

Description of problem:
------------------------
Upgraded 3.5 nodes to 3.6/3.7.
After the upgrade, bumping the op-version up to 30703 failed.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
mainline

How reproducible:
------------------
Always

Steps to Reproduce:
--------------------
1. Upgrade 3.5 nodes to 3.6/3.7.
2. After the upgrade, bump the op-version up to 30703 (gluster volume set all cluster.op-version 30703).

Actual results:
---------------
Bumping up op-version failed

Expected results:
-----------------
Bumping up op-version should succeed

Additional info:
----------------
[2015-07-29 11:50:31.860731]  : volume set all cluster.op-version 30703 : FAILED :

[root@ ~]# gluster volume get drvol op-version
Option Value
------ -----
cluster.op-version                      30703
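
For reference, the value 30703 shown above appears to pack a release number as major*10000 + minor*100 + patch, which would make it 3.7.3, the version this bug is filed against. The sketch below decodes a value under that assumption. The function name decode_op_version is hypothetical and not part of GlusterFS, and older releases may have used a different numbering.

```python
def decode_op_version(op_version):
    """Split an op-version into (major, minor, patch).

    Assumes the value is encoded as major*10000 + minor*100 + patch.
    This is an illustrative helper, not a GlusterFS API.
    """
    major, rest = divmod(op_version, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    # The op-version requested in this report.
    print(decode_op_version(30703))
```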

Following are the logs from 2 nodes.

NODE-1
----------
[2015-07-29 11:50:31.860355] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on dhcp37-126.lab.eng.blr.redhat.com. Please check log file for details.
[2015-07-29 11:50:31.860493] E [MSGID: 106152] [glusterd-syncop.c:1562:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2015-07-29 11:50:31.860587] E [MSGID: 106025] [glusterd-locks.c:641:glusterd_mgmt_v3_unlock] 0-management: name is null. [Invalid argument]
[2015-07-29 11:50:31.860666] E [MSGID: 106118] [glusterd-syncop.c:1588:gd_unlock_op_phase] 0-management: Unable to release lock for (null)
[2015-07-29 11:50:31.875251] I [run.c:190:runner_log] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fdcd220c5e0] (--> /usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7fdcd225ff95] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7fdcc6cac10c] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(+0xed422)[0x7fdcc6cac422] (--> /lib64/libpthread.so.0(+0x3429c07a51)[0x7fdcd12f3a51] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=all -o cluster.op-version=30703 --gd-workdir=/var/lib/glusterd
[2015-07-29 11:50:31.893561] I [run.c:190:runner_log] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fdcd220c5e0] (--> /usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7fdcd225ff95] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7fdcc6cac10c] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(+0xed422)[0x7fdcc6cac422] (--> /lib64/libpthread.so.0(+0x3429c07a51)[0x7fdcd12f3a51] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=all -o cluster.op-version=30703 --gd-workdir=/var/lib/glusterd


NODE-2
-----------

[2015-07-29 11:50:31.622533] E [MSGID: 106118] [glusterd-op-sm.c:3619:glusterd_op_ac_unlock] 0-management: Unable to release lock for all
[2015-07-29 11:50:31.622788] E [MSGID: 106376] [glusterd-op-sm.c:7286:glusterd_op_sm] 0-management: handler returned: -1

--- Additional comment from SATHEESARAN on 2015-07-29 07:32:24 EDT ---

The volume set command reports a failure, but the op-version is actually bumped up.
There are no functional problems.

--- Additional comment from Anand Avati on 2015-07-30 00:20:01 EDT ---

REVIEW: http://review.gluster.org/11798 (glusterd: fix op-version bump up flow) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2015-08-04 00:26:01 EDT ---

COMMIT: http://review.gluster.org/11798 committed in master by Kaushal M (kaushal) 
------
commit b467b97e4c4546b7f870a3ac624d56c62bfa5cf9
Author: Atin Mukherjee <amukherj>
Date:   Thu Jul 30 09:40:24 2015 +0530

    glusterd: fix op-version bump up flow
    
    If a cluster is upgraded from 3.5 to the latest version, gluster volume set all
    cluster.op-version <VERSION> returns an error message to the user saying that
    unlocking failed. This happens because the unlock phase tries to release a
    volume-wise lock although the lock was taken cluster-wide. The problem surfaces
    because the op-version is updated in the commit phase, so unlocking then goes
    through the v3 framework where it should have used the cluster unlock.
    
    The fix is to decide which lock/unlock path to follow before invoking the lock phase.
    
    Change-Id: Iefb271a058431fe336a493c24d240ed833f279c5
    BUG: 1248298
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11798
    Reviewed-by: Avra Sengupta <asengupt>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Anand Nekkunti <anekkunt>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
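
The failure mode described in the commit message can be illustrated with a small model. This is a simplified sketch and not the glusterd code: the class and function names are hypothetical, the threshold 30600 (the point at which the v3 locking framework is assumed to take over) is an assumption, and 30500 is only a placeholder for a pre-3.6 op-version. It shows why re-evaluating the lock type after the commit phase picks the wrong unlock path, and how deciding once before the lock phase avoids that.

```python
V3_LOCK_MIN_OP_VERSION = 30600  # assumed threshold for the v3 lock framework


class Cluster:
    """Toy stand-in for cluster state; not a glusterd structure."""

    def __init__(self, op_version):
        self.op_version = op_version
        self.held_lock = None


def lock_kind(op_version):
    """Pick the lock framework from the current op-version."""
    return "cluster" if op_version < V3_LOCK_MIN_OP_VERSION else "mgmt_v3"


def bump_buggy(cluster, new_version):
    """Re-evaluates the lock kind in the unlock phase (the reported bug)."""
    cluster.held_lock = lock_kind(cluster.op_version)  # lock phase
    cluster.op_version = new_version                   # commit phase
    unlock_with = lock_kind(cluster.op_version)        # unlock phase, re-decided
    return unlock_with == cluster.held_lock            # False means unlock fails


def bump_fixed(cluster, new_version):
    """Decides the lock kind once, before the lock phase, and reuses it."""
    kind = lock_kind(cluster.op_version)               # decided up front
    cluster.held_lock = kind                           # lock phase
    cluster.op_version = new_version                   # commit phase
    return kind == cluster.held_lock                   # unlock uses same kind


if __name__ == "__main__":
    print(bump_buggy(Cluster(30500), 30703))
    print(bump_fixed(Cluster(30500), 30703))
```

In both variants the op-version ends up at the new value, which matches the observation in this report that the set command reported a failure while the op-version was in fact bumped.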

Comment 1 Atin Mukherjee 2015-08-04 07:20:24 UTC
Patch link : http://review.gluster.org/#/c/11822

Comment 2 Anand Avati 2015-08-06 06:44:44 UTC
COMMIT: http://review.gluster.org/11822 committed in release-3.7 by Atin Mukherjee (amukherj) 
------
commit 10864cdcc039f4c1a85ecd8dbeb6fba0fc539d4e
Author: Atin Mukherjee <amukherj>
Date:   Thu Jul 30 09:40:24 2015 +0530

    glusterd: fix op-version bump up flow
    
    Backport of http://review.gluster.org/#/c/11798/
    
    If a cluster is upgraded from 3.5 to the latest version, gluster volume set all
    cluster.op-version <VERSION> returns an error message to the user saying that
    unlocking failed. This happens because the unlock phase tries to release a
    volume-wise lock although the lock was taken cluster-wide. The problem surfaces
    because the op-version is updated in the commit phase, so unlocking then goes
    through the v3 framework where it should have used the cluster unlock.
    
    The fix is to decide which lock/unlock path to follow before invoking the lock phase.
    
    Change-Id: Iefb271a058431fe336a493c24d240ed833f279c5
    BUG: 1249921
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11798
    Reviewed-by: Avra Sengupta <asengupt>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Anand Nekkunti <anekkunt>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/11822

Comment 3 Kaushal 2015-09-09 09:38:55 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user