Description of problem:
If a volume's barrier is removed because the barrier timeout expires, the volume options still show the barrier as being enabled.

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Always

Steps to Reproduce:
1. Enable the barrier on a volume. (gluster volume barrier VOLUME enable)
2. Wait for the barrier timeout to expire. By default, this is two minutes. To see the number of seconds required, run "gluster volume get VOLUME features.barrier-timeout".

Actual results:
Running "gluster volume get VOLUME features.barrier" shows that the barrier is still enabled, but the barrier has been automatically disabled by Gluster.

Expected results:
The barrier should be shown as disabled.

I think the barrier option is turned off internally when barrier-timeout is reached, but this is not reflected anywhere else.

If you take a brick statedump (using gluster volume statedump <volname> all), you can see that the barrier is turned off after barrier-timeout seconds:

# gluster volume statedump <vol> all
# grep barrier.enabled /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>

Since the barrier state is not written back to the volume options, volume get also returns the old value, 'enable', even though the barrier is actually disabled.

Atin, do you think the state of the barrier should be obtained dynamically by 'gluster volume get'?

Jack,

You have 2 options to check whether the barrier is enabled.

1. Get the volume statedump and grep for 'barrier.enabled':

[root@ ~]# grep barrier.enabled /var/run/gluster/rhs-brick1-b1.28902.dump.1458098783
barrier.enabled=0

2. Check the brick logs ( /var/log/glusterfs/bricks/* ):

[root@ ~]# grep -i 'barrier timeout' /var/log/glusterfs/bricks/rhs-brick1-b1.log
[2016-03-16 03:23:51.353413] C [barrier.c:409:barrier_timeout] 0-test-barrier: Disabling barrier because of the barrier timeout.

I am also curious about the reason for using the barrier directly. What is the requirement behind it? For your information, the barrier is invoked internally by the snapshot feature while a snapshot is taken.

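The statedump check above can be wrapped in a small helper. This is only a minimal sketch, assuming the dump file contains a line of the form barrier.enabled=0 or barrier.enabled=1 as in the grep output above. The function name barrier_state_from_dump is hypothetical and not part of Gluster.

```shell
#!/bin/sh
# Hypothetical helper: report the barrier state recorded in one brick
# statedump file. Prints "enabled", "disabled", or "unknown".
# Assumption: the dump has a "barrier.enabled=<0|1>" line, as in the
# grep output quoted in this comment.
barrier_state_from_dump() {
    dump_file=$1
    # Extract the value after "barrier.enabled="; keep the last match.
    value=$(sed -n 's/^[[:space:]]*barrier\.enabled=//p' "$dump_file" 2>/dev/null | tail -n 1)
    case "$value" in
        1) echo enabled ;;
        0) echo disabled ;;
        *) echo unknown ;;
    esac
}
```

Take a fresh statedump first (gluster volume statedump <volname> all), then pass the path of the resulting dump file to the function.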
(In reply to SATHEESARAN from comment #1)
> I think the barrier option is turned off internally when barrier-timeout is
> met, but not reflected anywhere else.
>
> If you take a brick statedump ( by using gluster volume statedump <volname>
> all ), you can see that barrier is turned off after barrier-timeout seconds
> # gluster volume statedump <vol> all
> # grep barrier.enabled
> /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>
>
> Since the state of barrier is not restored in volume options, volume get is
> also returning the old value which is 'enable', but its actually disabled
>
> Atin, do you think the state of barrier should be dynamically obtained from
> 'gluster volume get' ?

That could be better, but it involves a lot of work. In the volume get workflow, by default we pick up the value from the volume's dictionary; if it's not there, we load the xlator dynamically and fetch it from there. In this case we'd have to handle barrier in a special way. My take is that since barrier as a feature is used internally by snapshot, we can live with it.

Jack, could you please share your use case for using the barrier feature directly? Based on that we can take a call. ~Atin

Satheesaran: Thank you for your speedy response and for the two workarounds.

Atin: I use the barrier feature to quiesce the Gluster volume before a backup. This allows my backups to be point-in-time consistent. I am not using LVM so I cannot use the snapshot feature.

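A rough sketch of that quiesce-then-backup use case, for illustration only. It assumes the backup must finish before the barrier timeout expires, because after that the barrier is dropped silently and "volume get" still reports it as enabled. The function name backup_with_barrier and its arguments are hypothetical; the only gluster commands used are the barrier enable/disable calls from this report.

```shell
#!/bin/sh
# Hypothetical wrapper: run a backup command while the barrier is enabled.
# Usage: backup_with_barrier VOLUME TIMEOUT_SECONDS COMMAND [ARGS...]
# TIMEOUT_SECONDS should match features.barrier-timeout for the volume.
backup_with_barrier() {
    volume=$1
    timeout=$2
    shift 2

    gluster volume barrier "$volume" enable || return 1
    start=$(date +%s)

    # Run the backup command and remember its exit status.
    backup_status=0
    "$@" || backup_status=$?

    elapsed=$(( $(date +%s) - start ))

    # Always try to remove the barrier. Assumption: this is harmless if
    # the timeout has already removed it, so the status is ignored.
    gluster volume barrier "$volume" disable || true

    if [ "$backup_status" -ne 0 ]; then
        echo "backup command failed with status $backup_status" >&2
        return "$backup_status"
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
        # The barrier may have expired mid-backup, so the result may not
        # be point-in-time consistent.
        echo "backup took ${elapsed}s, barrier timeout is ${timeout}s" >&2
        return 2
    fi
    return 0
}
```

A return value of 2 means the backup finished but possibly outlived the barrier, in which case the statedump or brick log checks from the earlier comment can confirm whether the timeout fired.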
I do not plan to fix it considering we have other workarounds.