Bug 1100567

Summary:	[barrier] reconfiguration of barrier time out does not work
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Atin Mukherjee <amukherj>
Component:	core	Assignee:	Atin Mukherjee <amukherj>
Status:	CLOSED ERRATA	QA Contact:	SATHEESARAN <sasundar>
Severity:	medium	Docs Contact:
Priority:	high
Version:	rhgs-3.0	CC:	gluster-bugs, kparthas, nsathyan, rhs-bugs, sasundar, sdharane, storage-qa-internal
Target Milestone:	---
Target Release:	RHGS 3.0.0
Hardware:	Unspecified
OS:	All
Whiteboard:
Fixed In Version:	glusterfs-3.6.0.3-1.el6rhs	Doc Type:	Bug Fix
Doc Text:	Reconfiguring the barrier timeout through the volume set command did not work. gluster volume set reported success, but the timeout stayed at its default value of 120 seconds. The timeout was never modified in reconfigure() because the first check, i.e. whether barrier is already enabled or disabled, always failed: the barrier option is not modified in such a request. Fix: introduced notify() in the barrier translator, which takes care of the RPC request to enable/disable barrier. reconfigure() now simply sets the barrier enable/disable and timeout options without any validation.	Story Points:	---
Clone Of:	1085671	Environment:
Last Closed:	2014-09-22 19:39:12 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1085671
Bug Blocks:

Description Atin Mukherjee 2014-05-23 06:15:54 UTC

+++ This bug was initially created as a clone of Bug #1085671 +++

Description of problem:

When setting the barrier timeout to x seconds, the timeout event still relies on the default timeout value.
Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set the barrier timeout to 18000 seconds. 
2. Take a statedump of the volume and verify the timeout
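Step 2 can be checked with a small script. The following is a minimal sketch, assuming the statedump stores options as `key=value` lines; the section name and the `barrier.enabled` / `barrier.timeout` keys in the sample are assumptions, and the excerpt is hand-written for illustration, not captured from a real system.

```python
def read_dump_value(dump_text, key):
    """Return the value of the first `key=value` line, or None if absent."""
    for line in dump_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name == key:
            return value
    return None


# Hypothetical excerpt written by hand; key names are assumptions.
sample = """\
[xlator.features.barrier.priv]
barrier.enabled=0
barrier.timeout=120
"""

expected = 18000  # value set in step 1
actual = int(read_dump_value(sample, "barrier.timeout"))
print("timeout reconfigured:", actual == expected)
```

With the sample above the check reports a mismatch, which is the symptom described in this bug: the dump still carries the default instead of the configured value.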


Actual results:
timeout still reflects the default value.

Expected results:
timeout should be reconfigured.

Additional info:

--- Additional comment from Anand Avati on 2014-04-09 02:32:24 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 03:11:40 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 04:47:58 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 05:17:34 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 05:19:08 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 00:58:55 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#6) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 04:59:59 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#7) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 05:08:50 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#8) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 06:00:11 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#9) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-11 02:19:21 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#10) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-15 02:33:03 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#11) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-15 02:55:27 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#12) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-22 16:03:18 EDT ---

COMMIT: http://review.gluster.org/7428 committed in master by Vijay Bellur (vbellur) 
------
commit b6cc23204f1941184cb08ec3d84beecd2d06fd91
Author: Atin Mukherjee <amukherj>
Date:   Wed Apr 9 11:53:33 2014 +0530

    glusterfs-server : barrier timeout tuning fix
    
    Problem : Reconfiguration of barrier timeout through gluster volume set shows a
    success but it never changes the default timeout value which is 120 seconds.
    After digging into the code deeper, it was found that timeout is never modified
    in reconfigure() as the first check i.e. whether barrier is already enabled or
    disabled always fails since barrier option is not modified in this request.
    
    Fix : Introduced notify() in barrier translator which will take care of the rpc
    request to enable/disable barrier. reconfigure() will simply set barrier
    enable/disable and timeout options blindly without any validation.
    
    Please note this patch only contains the changes in barrier translator however
    from complete code flow perspective the caller in the glusterfsd mgmt should
    call notify instead of reconfigure to fix this problem.
    
    Change-Id: I1371b294935f6054da7c1dc6a9a19f1d861e60fb
    BUG: 1085671
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/7428
    Reviewed-by: Varun Shastry <vshastry>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 1 Atin Mukherjee 2014-05-23 06:20:21 UTC

RCA
---
Reconfiguration of barrier timeout through gluster volume set shows a success, but it never changes the default timeout value, which is 120 seconds. After digging into the code deeper, it was found that timeout is never modified in reconfigure() as the first check, i.e. whether barrier is already enabled or disabled, always fails since the barrier option is not modified in this request.

Fix
---
Introduced notify() in barrier translator which will take care of the rpc request to enable/disable barrier. reconfigure() will simply set barrier enable/disable and timeout options blindly without any validation.

Please note this patch only contains the changes in barrier translator. However, from a complete code flow perspective, the caller in the glusterfsd mgmt should call notify instead of reconfigure to fix this problem.

Fix http://review.gluster.org/7428 is backported in downstream.
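
The RCA and fix above can be illustrated with a toy model. This is a simplified Python sketch, not the translator's actual C code: the class, method names, and option keys are hypothetical and only mirror the control flow described in the RCA.

```python
DEFAULT_TIMEOUT = 120  # seconds, the default named in the RCA


class BarrierModel:
    """Toy stand-in for the barrier translator's private state."""

    def __init__(self):
        self.enabled = False
        self.timeout = DEFAULT_TIMEOUT

    def reconfigure_buggy(self, options):
        # Pre-fix flow as described in the RCA: the enable/disable state
        # is checked first, and a request that does not change it bails
        # out before the timeout option is ever read.
        requested = options.get("barrier", self.enabled)
        if requested == self.enabled:
            return
        self.enabled = requested
        self.timeout = options.get("barrier-timeout", self.timeout)

    def reconfigure_fixed(self, options):
        # Post-fix flow: set both options without any validation.
        self.enabled = options.get("barrier", self.enabled)
        self.timeout = options.get("barrier-timeout", self.timeout)

    def notify(self, enable):
        # Post-fix flow: enable/disable requests are handled here
        # instead of in reconfigure().
        self.enabled = enable


buggy = BarrierModel()
buggy.reconfigure_buggy({"barrier-timeout": 600})
print("before fix:", buggy.timeout)

fixed = BarrierModel()
fixed.reconfigure_fixed({"barrier-timeout": 600})
print("after fix:", fixed.timeout)
```

In the model, a timeout-only request leaves the buggy variant at 120 seconds, while the fixed variant picks up 600.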

Comment 3 SATHEESARAN 2014-05-28 08:39:50 UTC

Verified with glusterfs-3.6.0.8-1.el6rhs

Steps followed:
1. Set the barrier-timeout to 600 seconds
(i.e.) gluster volume set <vol-name> barrier-timeout 600
2. Enable barrier on the volume
3. Take the statedump of the volume
(i.e.) gluster volume statedump <vol-name>
4. Remove a file from the mount and calculate the time taken
(i.e.) time rm -rf <file-on-mount>

Result :
1. Statedump had barrier-timeout value as 600

2. The unlink operation hung for ~10 min
[root@rhs-client10 test]# time rm -rf file5

real    9m52.024s
user    0m0.001s
sys     0m0.001s

Repeated the above test for various values of barrier-timeout and found that each was set correctly
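
As a sanity check on the timing, the `real` value reported by `time` can be converted to seconds and compared with the configured timeout. A minimal sketch; the helper name is hypothetical, and it only handles the `<min>m<sec>s` form shown in this comment.

```python
import re


def real_time_to_seconds(real):
    """Convert a `time` value such as '9m52.024s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError(f"unrecognised time format: {real!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


configured_timeout = 600  # seconds, from step 1
elapsed = real_time_to_seconds("9m52.024s")
gap = configured_timeout - elapsed
print(f"elapsed={elapsed:.3f}s gap={gap:.3f}s")
```

The hang lasted about 592 seconds, roughly 8 seconds short of the configured 600 and far from the 120-second default. The shortfall is plausibly the time that passed between enabling barrier in step 2 and issuing the rm in step 4, though that interval was not recorded here.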

Comment 5 errata-xmlrpc 2014-09-22 19:39:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html