Bug 1318068

Summary: features.barrier is not set to "disable" after barrier timeout expires
Product: [Community] GlusterFS Reporter: jack.wong
Component: barrier Assignee: bugs <bugs>
Status: CLOSED WONTFIX QA Contact:
Severity: low Docs Contact:
Priority: unspecified    
Version: 3.7.9 CC: amukherj, bugs, jack.wong, sasundar
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-20 13:40:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description jack.wong 2016-03-15 23:37:07 UTC
Description of problem:
If a volume's barrier is removed because the barrier timeout expires, the volume options still show the barrier as being enabled.

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Always

Steps to Reproduce:
1. Enable the barrier on a volume. (gluster volume barrier VOLUME enable)
2. Wait for the barrier timeout to expire. By default, this is two minutes. To see the number of seconds required, run "gluster volume get VOLUME features.barrier-timeout".

Actual results:
Running "gluster volume get VOLUME features.barrier" still reports the barrier as enabled, even though Gluster has already disabled it automatically.

Expected results:
The barrier should be shown as disabled.
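
The reproduction steps above can be sketched as a small script. This is only an illustration: option_value is a hypothetical helper (not part of Gluster), and it assumes that "gluster volume get" prints a two-column "Option / Value" table. VOLUME is a placeholder, and the 5-second margin is arbitrary.

```shell
# Hypothetical helper: read `gluster volume get` output on stdin and print
# the value column for the named option. Assumes a two-column table layout.
option_value() {
    awk -v opt="$1" '$1 == opt { print $2; exit }'
}

# Reproduction sketch (needs a running volume; VOLUME is a placeholder):
#   gluster volume barrier VOLUME enable
#   timeout=$(gluster volume get VOLUME features.barrier-timeout \
#             | option_value features.barrier-timeout)
#   sleep "$((timeout + 5))"
#   gluster volume get VOLUME features.barrier | option_value features.barrier
```

Per this report, the last command would still print "enable" after the timeout has expired.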

Comment 1 SATHEESARAN 2016-03-16 02:56:09 UTC
I think the barrier option is turned off internally when barrier-timeout is reached, but the change is not reflected anywhere else.

If you take a brick statedump (by using gluster volume statedump <volname> all), you can see that barrier is turned off after barrier-timeout seconds:
# gluster volume statedump <vol> all
# grep barrier.enabled /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>

Since the barrier state is not updated in the volume options, volume get still returns the old value, 'enable', even though the barrier is actually disabled.

Atin, do you think 'gluster volume get' should obtain the barrier state dynamically?

Comment 2 SATHEESARAN 2016-03-16 03:01:14 UTC
Jack,

You have two options for checking whether the barrier is enabled.

1. Get the volume statedump and grep for 'barrier.enabled'
[root@ ~]# grep barrier.enabled /var/run/gluster/rhs-brick1-b1.28902.dump.1458098783 
barrier.enabled=0

2. Check the brick logs (/var/log/glusterfs/bricks/*)
[root@ ~]# grep -i 'barrier timeout' /var/log/glusterfs/bricks/rhs-brick1-b1.log 
[2016-03-16 03:23:51.353413] C [barrier.c:409:barrier_timeout] 0-test-barrier: Disabling barrier because of the barrier timeout.
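
The statedump check in option 1 could be wrapped in a small function for scripting. This is a minimal sketch, not an official tool: barrier_state_from_dump is a hypothetical name, and it assumes the dump contains a line of the form barrier.enabled=0 or barrier.enabled=1, as in the output shown above. Only the first matching line is inspected.

```shell
# Hypothetical helper: print "enabled", "disabled", or "unknown" based on
# the first barrier.enabled=<0|1> line found in a brick statedump file.
barrier_state_from_dump() {
    dump_file=$1
    # Take the text after '=' on the first matching line; strip whitespace.
    value=$(grep -m1 'barrier\.enabled=' "$dump_file" 2>/dev/null \
            | cut -d= -f2 | tr -d '[:space:]') || value=
    case "$value" in
        1) echo enabled ;;
        0) echo disabled ;;
        *) echo unknown ;;
    esac
}

# Usage (path placeholders as in the comment above):
#   gluster volume statedump <vol> all
#   barrier_state_from_dump /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>
```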

I am also curious about the reason for using barrier directly.
What is the requirement behind it?

For your information, barrier is invoked internally by the snapshot feature while a snapshot is taken.

Comment 3 Atin Mukherjee 2016-03-16 04:26:05 UTC
(In reply to SATHEESARAN from comment #1)
> I think the barrier option is turned off internally when barrier-timeout is
> reached, but the change is not reflected anywhere else.
> 
> If you take a brick statedump (by using gluster volume statedump <volname>
> all), you can see that barrier is turned off after barrier-timeout seconds:
> # gluster volume statedump <vol> all
> # grep barrier.enabled
> /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>
> 
> Since the barrier state is not updated in the volume options, volume get
> still returns the old value, 'enable', even though the barrier is actually
> disabled.
> 
> Atin, do you think 'gluster volume get' should obtain the barrier state
> dynamically?

That would be better, but it involves a lot of work. In the volume get workflow, we pick up the value from the volume's dictionary by default; if it is not there, we load the xlator dynamically and fetch it from there. In this case we would have to handle barrier in a special way. My take is that, since barrier as a feature is used internally by snapshot, we can live with the current behavior.

Comment 4 Atin Mukherjee 2016-03-16 04:27:42 UTC
Jack,

Could you please share your use case for using the barrier feature directly? Based on that, we can make a decision.

~Atin

Comment 5 jack.wong 2016-03-16 17:54:50 UTC
Satheesaran: Thank you for your speedy response and for the two workarounds.

Atin: I use the barrier feature to quiesce the Gluster volume before a backup. This allows my backups to be point-in-time consistent. I am not using LVM so I cannot use the snapshot feature.
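
A backup wrapper along these lines might look like the following. This is only a sketch under assumptions: with_barrier is a hypothetical function name, the backup command is whatever the caller passes in, and the wrapped command has to finish within features.barrier-timeout seconds, because Gluster removes the barrier on its own once the timeout expires.

```shell
# Hypothetical wrapper: enable the barrier on a volume, run a command,
# then disable the barrier again. Returns the wrapped command's status.
with_barrier() {
    volume=$1
    shift
    # Give up without running the command if the barrier cannot be enabled.
    gluster volume barrier "$volume" enable || return 1
    "$@"
    status=$?
    # Always attempt to disable; the result of this call is ignored.
    gluster volume barrier "$volume" disable
    return "$status"
}

# Usage (my_backup_command is a placeholder for the real backup step):
#   with_barrier VOLUME my_backup_command
```

Because features.barrier may still read "enable" after the timeout, a script like this should not rely on that option to confirm that the barrier held for the whole backup.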

Comment 6 Atin Mukherjee 2016-03-20 13:40:48 UTC
I do not plan to fix this, considering that workarounds are available.