+++ This bug was initially created as a clone of Bug #1085671 +++

Description of problem:
When setting the barrier timeout to x seconds, the timeout event still relies on the default timeout value.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Set the barrier timeout to 18000 seconds.
2. Take a statedump of the volume and verify the timeout.

Actual results:
The timeout still reflects the default value.

Expected results:
The timeout should be reconfigured.

Additional info:

--- Additional comment from Anand Avati on 2014-04-09 02:32:24 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 03:11:40 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 04:47:58 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 05:17:34 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-09 05:19:08 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 00:58:55 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#6) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 04:59:59 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#7) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 05:08:50 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#8) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-10 06:00:11 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#9) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-11 02:19:21 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#10) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-15 02:33:03 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#11) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-15 02:55:27 EDT ---

REVIEW: http://review.gluster.org/7428 (glusterfs-server : barrier timeout tuning fix) posted (#12) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Anand Avati on 2014-04-22 16:03:18 EDT ---

COMMIT: http://review.gluster.org/7428 committed in master by Vijay Bellur (vbellur)
------
commit b6cc23204f1941184cb08ec3d84beecd2d06fd91
Author: Atin Mukherjee <amukherj>
Date: Wed Apr 9 11:53:33 2014 +0530

    glusterfs-server : barrier timeout tuning fix

    Problem : Reconfiguration of barrier timeout through gluster volume set
    shows a success but it never changes the default timeout value which is
    120 seconds. After digging into the code deeper, it was found that timeout
    is never modified in reconfigure() as the first check i.e. whether barrier
    is already enabled or disabled always fails since barrier option is not
    modified in this request.

    Fix : Introduced notify() in barrier translator which will take care of the
    rpc request to enable/disable barrier. reconfigure() will simply set barrier
    enable/disable and timeout options blindly without any validation.

    Please note this patch only contains the changes in barrier translator
    however from complete code flow perspective the caller in the glusterfsd
    mgmt should call notify instead of reconfigure to fix this problem.

    Change-Id: I1371b294935f6054da7c1dc6a9a19f1d861e60fb
    BUG: 1085671
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/7428
    Reviewed-by: Varun Shastry <vshastry>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
RCA
---
Reconfiguration of the barrier timeout through gluster volume set reports success, but it never changes the timeout from its default value of 120 seconds. On closer inspection of the code, the timeout is never modified in reconfigure(): its first check, whether barrier is already enabled or disabled, always fails because the barrier option itself is not modified in a timeout-only request.

Fix
---
Introduced notify() in the barrier translator, which takes care of the rpc request to enable/disable barrier. reconfigure() now simply sets the barrier enable/disable and timeout options without any validation.

Please note that this patch only contains the changes in the barrier translator. From a complete code flow perspective, the caller in the glusterfsd mgmt should call notify instead of reconfigure to fix this problem.

The fix http://review.gluster.org/7428 has been backported downstream.
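The RCA above can be illustrated with a toy model. This is a sketch in Python, not GlusterFS code: `BarrierState`, `reconfigure_buggy`, `reconfigure_fixed` and `notify_barrier` are hypothetical names, and `barrier` / `barrier-timeout` are used only as dictionary keys. It assumes the pre-fix check simply skipped the update whenever the requested barrier state matched the current one.

```python
from dataclasses import dataclass

DEFAULT_TIMEOUT = 120  # seconds, the default value named in the report


@dataclass
class BarrierState:
    """Toy stand-in for the barrier translator's state (hypothetical)."""
    enabled: bool = False
    timeout: int = DEFAULT_TIMEOUT


def reconfigure_buggy(state, options):
    """Simplified model of the pre-fix behaviour described in the RCA."""
    requested = options.get("barrier", state.enabled)
    # A timeout-only request leaves the barrier state unchanged, so this
    # first check bails out before the timeout is ever looked at.
    if requested == state.enabled:
        return
    state.enabled = requested
    state.timeout = options.get("barrier-timeout", state.timeout)


def reconfigure_fixed(state, options):
    """Simplified model of the post-fix behaviour: set both options blindly."""
    state.enabled = options.get("barrier", state.enabled)
    state.timeout = options.get("barrier-timeout", state.timeout)


def notify_barrier(state, enable):
    """Hypothetical stand-in for the notify() path handling enable/disable."""
    state.enabled = enable


buggy = BarrierState()
reconfigure_buggy(buggy, {"barrier-timeout": 18000})

fixed = BarrierState()
reconfigure_fixed(fixed, {"barrier-timeout": 18000})

print(buggy.timeout, fixed.timeout)  # prints: 120 18000
```

In this model the timeout-only request is silently dropped by the first check, which matches the reported symptom of a successful volume set with an unchanged 120 second timeout.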
Verified with glusterfs-3.6.0.8-1.el6rhs

Followed these steps:
1. Set the barrier-timeout to 600 seconds, i.e. gluster volume set <vol-name> barrier-timeout 600
2. Enable barrier on the volume
3. Take a statedump of the volume, i.e. gluster volume statedump <vol-name>
4. Remove a file from the mount and measure the time taken, i.e. time rm -rf <file-on-mount>

Result:
1. The statedump showed a barrier-timeout value of 600
2. The unlink operation hung for ~10 min

[root@rhs-client10 test]# time rm -rf file5

real 9m52.024s
user 0m0.001s
sys 0m0.001s

Repeated the above test for various values of barrier-timeout and found that the value was set correctly each time.
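As a quick arithmetic check on the result above, the `real` value reported by `time` can be converted to seconds and compared with the configured timeout. This is a minimal sketch: `parse_time_real` and `within_timeout` are hypothetical helper names, and the 15 second tolerance is an arbitrary assumption, not a value taken from the test.

```python
import re


def parse_time_real(text):
    """Convert a shell `time` value such as '9m52.024s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def within_timeout(elapsed, timeout, tolerance=15.0):
    """True if the elapsed time is within `tolerance` seconds of the timeout.

    The tolerance is an assumed value for illustration only.
    """
    return abs(elapsed - timeout) <= tolerance


elapsed = parse_time_real("9m52.024s")
print(round(elapsed, 3))             # prints: 592.024
print(within_timeout(elapsed, 600))  # prints: True
```

The measured 592 seconds is about 8 seconds short of 600, presumably because the barrier was enabled shortly before the `rm` was issued, and it is nowhere near the 120 second default.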
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html