Bug 1318068 - features.barrier is not set to "disable" after barrier timeout expires
Summary: features.barrier is not set to "disable" after barrier timeout expires
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: barrier
Version: 3.7.9
Hardware: All
OS: All
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-15 23:37 UTC by jack.wong
Modified: 2016-03-20 13:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-20 13:40:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description jack.wong 2016-03-15 23:37:07 UTC
Description of problem:
If a volume's barrier is removed because the barrier timeout expires, the volume options still show the barrier as being enabled.

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Always

Steps to Reproduce:
1. Enable the barrier on a volume. (gluster volume barrier VOLUME enable)
2. Wait for the barrier timeout to expire. By default, this is two minutes. To see the number of seconds required, run "gluster volume get VOLUME features.barrier-timeout".
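
The steps above can be sketched as a small shell helper. This is a rough sketch rather than an official test. The function names option_value and reproduce_stale_barrier are hypothetical, the 5-second margin is arbitrary, and the parser assumes that "gluster volume get" prints the option name in the first column and its value in the second.

```shell
# Print the value column for one option, reading `gluster volume get`
# output on stdin. Assumes a two-column "Option  Value" layout.
option_value() {
    awk -v opt="$1" '$1 == opt { print $2; exit }'
}

# Hypothetical helper: enable the barrier, wait until the timeout has
# passed, then print what `volume get` reports for features.barrier.
reproduce_stale_barrier() {
    volume=$1
    gluster volume barrier "$volume" enable >/dev/null || return 1
    secs=$(gluster volume get "$volume" features.barrier-timeout |
        option_value features.barrier-timeout)
    # Wait slightly longer than the timeout (the margin is an arbitrary choice).
    sleep "$((secs + 5))"
    gluster volume get "$volume" features.barrier | option_value features.barrier
}
```

According to this report, running reproduce_stale_barrier VOLUME on an affected version should print enable even though the barrier has already been lifted.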

Actual results:
Running "gluster volume get VOLUME features.barrier" shows that the barrier is still enabled, but the barrier has been automatically disabled by Gluster.

Expected results:
The barrier should be shown as disabled.

Comment 1 SATHEESARAN 2016-03-16 02:56:09 UTC
I think the barrier option is turned off internally when barrier-timeout is reached, but that change is not reflected anywhere else.

If you take a brick statedump (using gluster volume statedump <volname> all), you can see that the barrier is turned off after barrier-timeout seconds:
# gluster volume statedump <volname> all
# grep barrier.enabled /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>

Since the barrier's state is not written back to the volume options, volume get still returns the old value, 'enable', even though the barrier is actually disabled.
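
The statedump check can be wrapped in a small function so that a script gets a clear answer. This is a minimal sketch: barrier_state_from_dump is a hypothetical name, and it assumes the dump file contains a line of the form barrier.enabled=0 or barrier.enabled=1, as the grep above suggests.

```shell
# Report the barrier translator's live state from a brick statedump file.
# $1 = path to the dump file written by `gluster volume statedump`.
barrier_state_from_dump() {
    # Take the first barrier.enabled line and keep only its value.
    case $(sed -n 's/^barrier\.enabled=//p' "$1" | head -n 1) in
        1) echo enabled ;;
        0) echo disabled ;;
        *) echo unknown ;;
    esac
}
```

Unlike "gluster volume get", this reads the state from the brick process itself, so it is not affected by the stale volume option.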

Atin, do you think 'gluster volume get' should obtain the state of the barrier dynamically?

Comment 2 SATHEESARAN 2016-03-16 03:01:14 UTC
Jack,

You have two options for checking whether the barrier is enabled.

1. Get the volume statedump and grep for 'barrier.enabled'
[root@ ~]# grep barrier.enabled /var/run/gluster/rhs-brick1-b1.28902.dump.1458098783 
barrier.enabled=0

2. Check the brick logs (/var/log/glusterfs/bricks/*)
[root@ ~]# grep -i 'barrier timeout' /var/log/glusterfs/bricks/rhs-brick1-b1.log 
[2016-03-16 03:23:51.353413] C [barrier.c:409:barrier_timeout] 0-test-barrier: Disabling barrier because of the barrier timeout.
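
The log check in option 2 can also be scripted. The sketch below assumes the message text and the leading bracketed timestamp shown in the log line above. last_barrier_timeout is a hypothetical name.

```shell
# Print the timestamp of the most recent barrier-timeout message in a brick log.
# $1 = path to a brick log under /var/log/glusterfs/bricks/.
# Prints nothing if the log has no such message.
last_barrier_timeout() {
    grep 'Disabling barrier because of the barrier timeout' "$1" |
        tail -n 1 |
        sed 's/^\[\([^]]*\)].*/\1/'
}
```

A timestamp later than the most recent barrier enable suggests that the barrier has since been lifted by the timeout.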

I am also curious about the reason for using barrier directly.
What is the requirement behind it?

For your information, barrier is invoked internally by the snapshot feature while a snapshot is being taken.

Comment 3 Atin Mukherjee 2016-03-16 04:26:05 UTC
(In reply to SATHEESARAN from comment #1)
> I think the barrier option is turned off internally when barrier-timeout is
> reached, but that change is not reflected anywhere else.
> 
> If you take a brick statedump (using gluster volume statedump <volname>
> all), you can see that the barrier is turned off after barrier-timeout seconds:
> # gluster volume statedump <volname> all
> # grep barrier.enabled /var/run/gluster/<brickpath>.<brick-pid>.dump.<timestamp>
> 
> Since the barrier's state is not written back to the volume options, volume
> get still returns the old value, 'enable', even though the barrier is
> actually disabled.
> 
> Atin, do you think 'gluster volume get' should obtain the state of the
> barrier dynamically?

That could be better, but it involves a lot of work. In the volume get workflow, by default we pick up the value from the volume's dictionary; if it's not there, we load the xlator dynamically and fetch the value from there. In this case we'd have to handle barrier in a special way. My take is that since barrier as a feature is used internally by snapshot, we can live with it.
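
The lookup order described above can be modeled roughly as follows. This is a conceptual sketch only, not glusterd code. lookup_option is a hypothetical name, and a plain file of key=value lines stands in for the volume's option dictionary.

```shell
# Conceptual model of the `volume get` lookup order (not glusterd source).
# $1 = file of key=value lines standing in for the volume's dictionary
# $2 = option name
# $3 = the default that the translator itself would report
lookup_option() {
    dict_file=$1
    opt=$2
    default=$3
    value=$(awk -F= -v k="$opt" '$1 == k { print $2; exit }' "$dict_file")
    if [ -n "$value" ]; then
        # A stored value wins, even if the brick's live state has since changed.
        printf '%s\n' "$value"
    else
        printf '%s\n' "$default"
    fi
}
```

In this model, a stored features.barrier value is returned as-is and the brick is never consulted, which is presumably why the timeout-driven change on the brick does not show up.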

Comment 4 Atin Mukherjee 2016-03-16 04:27:42 UTC
Jack,

Could you please share your use case for using the barrier feature directly? Based on that, we can take a call.

~Atin

Comment 5 jack.wong 2016-03-16 17:54:50 UTC
Satheesaran: Thank you for your speedy response and for the two workarounds.

Atin: I use the barrier feature to quiesce the Gluster volume before a backup. This allows my backups to be point-in-time consistent. I am not using LVM so I cannot use the snapshot feature.
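
A backup wrapped in a barrier, as described here, might look like the sketch below. backup_with_barrier is a hypothetical helper, and the backup command is whatever the caller passes in. Per this report, the barrier is lifted automatically once features.barrier-timeout expires, so the backup would need to finish within that window to stay point-in-time consistent.

```shell
# Hypothetical helper: run a backup command while the volume's barrier is enabled.
# $1 = volume name; remaining arguments = the backup command and its arguments.
backup_with_barrier() {
    volume=$1
    shift
    gluster volume barrier "$volume" enable >/dev/null || return 1
    status=0
    "$@" || status=$?
    # Always try to lift the barrier, even if the backup command failed.
    gluster volume barrier "$volume" disable >/dev/null || return 1
    return "$status"
}
```

A call such as backup_with_barrier myvol tar -cf /backup/myvol.tar /mnt/myvol shows the intended use. The volume name and paths are placeholders.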

Comment 6 Atin Mukherjee 2016-03-20 13:40:48 UTC
I do not plan to fix it considering we have other workarounds.

