Bug 1347329 - a two node glusterfs seems not possible anymore?!
Summary: a two node glusterfs seems not possible anymore?!
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1352277
 
Reported: 2016-06-16 14:04 UTC by Jules
Modified: 2017-04-11 11:34 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1352277
Environment:
Last Closed: 2017-04-11 11:34:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jules 2016-06-16 14:04:59 UTC
Description of problem:
A two-node GlusterFS setup no longer seems to be possible?!

Version-Release number of selected component (if applicable):
3.7.11, 3.8

How reproducible:

set: 
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: none
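For reference, a minimal sketch of how these options are applied per volume with gluster volume set, assuming the volume name nfs-storage used below:

  gluster volume set nfs-storage cluster.quorum-type fixed
  gluster volume set nfs-storage cluster.quorum-count 1
  gluster volume set nfs-storage cluster.server-quorum-type none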


Steps to Reproduce:
1. Shut down (kill) glusterd/glusterfs on the other node
2. On the remaining node, run: gluster volume start nfs-storage (see the command sketch after these steps)
3. Get an error like the one below.
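A minimal command sketch of these steps, assuming two nodes, systemd-managed services, glusterfsd brick processes, and the volume name nfs-storage:

  # on the node being taken down
  systemctl stop glusterd
  pkill glusterfsd

  # on the surviving node
  gluster volume start nfs-storage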

Actual results:

volume start: nfs-storage: failed: Quorum not met. Volume operation not allowed.

Expected results:

The volume starts and its bricks run on the single remaining node.

Additional info:

Latest Debian Jessie

Comment 1 Joe Julian 2016-06-16 19:37:19 UTC
With a two-server, replica 2, started volume (no other volumes):

Without any quorum settings being set, the brick will not start on boot if the other server is down. This is a departure from prior behavior and cannot be worked around.

Setting cluster.server-quorum-type to none does not allow the brick to be started by glusterd. It looks like it should, as long as glusterd_is_any_volume_in_server_quorum returns gf_false. In my test case there was only one volume, and as long as no volume is set to "server" that function should return gf_false.

If cluster.server-quorum-type is set to server and cluster.server-quorum-ratio is set to 0, the brick will start, but neither nfs nor glustershd starts.
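For anyone trying to reproduce that combination, a sketch of how those two options are typically set (cluster.server-quorum-ratio is a cluster-wide option applied to the special volume name "all"; the volume name nfs-storage is assumed from the report):

  gluster volume set nfs-storage cluster.server-quorum-type server
  gluster volume set all cluster.server-quorum-ratio 0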

Comment 2 Atin Mukherjee 2016-06-21 12:11:56 UTC
Is there any specific reason why we are considering quorum tunables with a two-node setup? Ideally, we should consider the quorum options only with at least a three-node setup. Also, as Joe pointed out, on a two-node setup, if one of the nodes goes for a reboot while the other is down, the daemons do not get spawned, as data consistency is not guaranteed here. The moment the peer update is received, the daemons are spawned. IMO this is expected behaviour as per the design. I am closing this bug; please feel free to reopen if you think otherwise.

Comment 3 Jules 2016-06-21 12:19:21 UTC
So what is the "none" switch for if it doesn't function?

Comment 4 Atin Mukherjee 2016-06-21 12:23:11 UTC
(In reply to Jules from comment #3)
> So what is the "none" switch for if it doesn't function?

That's the default value. If you don't set the server quorum type to server, it is basically considered to be off, and that's what it implies here.
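A quick way to confirm the effective value on 3.7+ releases (a sketch, assuming the volume name from the report):

  gluster volume get nfs-storage cluster.server-quorum-type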

Comment 5 Jules 2016-06-21 12:26:35 UTC
Have you tested the second part that JoeJulian mentioned?

Comment 6 Atin Mukherjee 2016-06-21 12:55:59 UTC
"If cluster.server-quorum-type is set to server and cluster-server-quorum-ratio is set to 0, the brick will start, but neither nfs nor glustershd start." - is this valid for a set up having more than 2 nodes?

Anyways, I will test this and get back.

Comment 7 Atin Mukherjee 2016-06-22 05:49:43 UTC
I tested the same with a three-node setup and the bricks don't come up in that case. As I mentioned earlier, having quorum tunables with a 2-node setup doesn't make sense to me, so I'd not consider it a bug.

Comment 8 Joe Julian 2016-06-22 07:02:31 UTC
It breaks prior production behavior with no workaround and should thus be considered a bug. 

If you want to protect users from themselves by default, I'm all behind this, but if a user knows the risks and wishes to override the safety defaults to retain prior behavior, this should be allowed.

Comment 9 Atin Mukherjee 2016-06-22 07:06:05 UTC
Well, you always have the option to use volume start force as a workaround in this case, don't you?
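For reference, a minimal sketch of that workaround, assuming the volume name from the report:

  gluster volume start nfs-storage force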

Comment 10 Jules 2016-07-02 09:15:56 UTC
(In reply to Atin Mukherjee from comment #9)
> Well, you always have the option to use volume start force as a workaround
> in this case, don't you?

Well, since this needs manual intervention, it is not a workaround I would recommend. How about a new config switch to get this working without using the force option, like it was in the past?
As Joe Julian mentioned, a user who knows the risks should be able to override the safety defaults.

Comment 11 Niels de Vos 2016-09-12 05:37:14 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 12 Jules 2016-11-12 00:03:25 UTC
Any news yet? When will the patch get merged from master to a release?

Comment 13 Atin Mukherjee 2016-11-12 08:09:53 UTC
Jules - this change will be part of 3.9.0, which will be released any time now. But we won't be porting this patch back to 3.8, as this is a user-experience change and such changes should go into a major release, not the bug-fix-only releases that 3.8 currently gets.

Comment 14 Jules 2016-11-12 08:36:11 UTC
I've read that in v3.9 etcd will be used. Does that even work with a two-node setup?

Comment 15 Atin Mukherjee 2016-11-12 08:41:10 UTC
No. etcd as a backend store for GlusterD is slated for GlusterD 2.0 (part of the Gluster 4.0 release).

Comment 16 Jules 2016-11-12 08:56:25 UTC
So will v4.0 be the end for two-node setups? :-(

Comment 17 Atin Mukherjee 2016-11-12 11:21:59 UTC
Not really; we will continue to support two-node deployments too.

Comment 18 Atin Mukherjee 2017-04-11 11:34:27 UTC
This BZ has been fixed as part of BZ 1352277. As mentioned in comment 13, we won't be porting this fix to 3.8; you should upgrade to 3.10 to see this behaviour change.

