Bug 1308402 - Starting a newly created volume brings its bricks online even when server quorum is not met
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Satish Mohan
Depends On: 1306667
Blocks: 1310630 1310631 1310632
Reported: 2016-02-15 00:17 EST by Gaurav Kumar Garg
Modified: 2016-06-16 09:57 EDT
CC: 6 users

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Clone Of: 1306667
Last Closed: 2016-06-16 09:57:29 EDT
Type: Bug


Attachments: None
Description Gaurav Kumar Garg 2016-02-15 00:17:32 EST
+++ This bug was initially created as a clone of Bug #1306667 +++

Description of problem:
=======================
Had a four-node cluster; created a distributed volume using one brick, enabled server quorum, and set the server-quorum ratio to 90. Then stopped glusterd on one of the nodes so that server quorum was no longer met, and started the volume: **the volume started and its bricks came online**. With one of four servers down, only 75% of the pool is active, which is below the 90% ratio, so quorum is not met.
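
For reference, the ratio arithmetic at play here, as a minimal illustrative C snippet (not glusterd code; glusterd's exact rounding rules may differ):

    #include <stdio.h>

    int main(void)
    {
        int total = 4, active = 3, ratio = 90; /* one of four glusterd's down */

        /* Quorum is met when active servers reach the configured percentage:
         * 3 * 100 = 300 < 90 * 4 = 360, so quorum is NOT met here. */
        if (active * 100 >= ratio * total)
            printf("server quorum met (%d/%d at ratio %d%%)\n", active, total, ratio);
        else
            printf("server quorum NOT met (%d/%d at ratio %d%%)\n", active, total, ratio);
        return 0;
    }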



Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-19.el7rhgs

How reproducible:
=================
Every time


Steps to Reproduce:
===================
1. Have a 4-node cluster (node-1..4)
2. Create a simple distributed volume using one brick
3. Enable server quorum (cluster.server-quorum-type: server)
4. Set the server-quorum ratio to 90
5. Stop glusterd on one of the nodes (e.g. node-4)
6. Try to start the volume; it starts and the bricks come online (see the console log from Byreddy below)

Actual results:
===============
Bricks come online even though server quorum is not met


Expected results:
=================
The volume start should fail and the bricks should remain offline when server quorum is not met


Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-02-11 09:41:38 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Byreddy on 2016-02-11 09:43:57 EST ---

Console log for reference
==========================

[root@dhcp42-67 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.43.107
Uuid: df468eed-713c-46b2-8136-81d9f7835c0a
State: Peer in Cluster (Connected)

Hostname: 10.70.42.185
Uuid: 6ec3558a-3b11-469d-b4d6-6f2e516a2706
State: Peer in Cluster (Connected)

Hostname: 10.70.42.62
Uuid: d520b270-dd3b-4cc7-a1e4-f7be5cf4677b
State: Peer in Cluster (Connected)
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume create Dis 10.70.42.67:/bricks/brick0/az0
volume create: Dis: success: please start the volume to access data
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume set Dis cluster.server-quorum-type server
volume set: success
[root@dhcp42-67 ~]# gluster volume info
 
Volume Name: Dis
Type: Distribute
Volume ID: 94443936-9265-4646-b666-64fafcb01e1d
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.42.67:/bricks/brick0/az0
Options Reconfigured:
cluster.server-quorum-type: server
performance.readdir-ahead: on
cluster.server-quorum-ratio: 90
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.43.107
Uuid: df468eed-713c-46b2-8136-81d9f7835c0a
State: Peer in Cluster (Connected)

Hostname: 10.70.42.185
Uuid: 6ec3558a-3b11-469d-b4d6-6f2e516a2706
State: Peer in Cluster (Connected)

Hostname: 10.70.42.62
Uuid: d520b270-dd3b-4cc7-a1e4-f7be5cf4677b
State: Peer in Cluster (Disconnected)
[root@dhcp42-67 ~]# gluster volume start Dis
volume start: Dis: success
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume status
Status of volume: Dis
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.67:/bricks/brick0/az0        49213     0          Y       14586
NFS Server on localhost                     2049      0          Y       14606
NFS Server on 10.70.43.107                  2049      0          Y       31819
NFS Server on 10.70.42.185                  2049      0          Y       14282
 
Task Status of Volume Dis
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume stop Dis
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: Dis: failed: Quorum not met. Volume operation not allowed.
[root@dhcp42-67 ~]#

--- Additional comment from Atin Mukherjee on 2016-02-11 10:59:00 EST ---

Gaurav,

Could you please check this issue?

~Atin

--- Additional comment from Atin Mukherjee on 2016-02-11 12:59:04 EST ---

This looks like a regression caused by http://review.gluster.org/12718

--- Additional comment from Byreddy on 2016-02-12 00:28:00 EST ---

Yes Atin, this is a regression.

I just verified the scenario using the 3.1.1 build, and there it works as expected. Below is the console log:

[root@dhcp42-67 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.43.107
Uuid: 74da4065-4c5a-4e8d-ba69-8d1bf3ae3e8b
State: Peer in Cluster (Connected)

Hostname: 10.70.42.185
Uuid: 7cb8e3da-e56b-488f-a84c-afa74b2ddda0
State: Peer in Cluster (Connected)

Hostname: 10.70.42.62
Uuid: 32b464a0-1c93-46a3-ad76-42a150266d42
State: Peer in Cluster (Connected)
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume set all cluster.server-quorum-ratio 90
volume set: success
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume create Dis 10.70.42.67:/bricks/brick0/aj0
volume create: Dis: success: please start the volume to access data
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume set Dis cluster.server-quorum-type server
volume set: success
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume info
 
Volume Name: Dis
Type: Distribute
Volume ID: bfbc78ce-4303-4743-839d-ffe5dfca4863
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.42.67:/bricks/brick0/aj0
Options Reconfigured:
cluster.server-quorum-type: server
performance.readdir-ahead: on
cluster.server-quorum-ratio: 90
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.43.107
Uuid: 74da4065-4c5a-4e8d-ba69-8d1bf3ae3e8b
State: Peer in Cluster (Connected)

Hostname: 10.70.42.185
Uuid: 7cb8e3da-e56b-488f-a84c-afa74b2ddda0
State: Peer in Cluster (Connected)

Hostname: 10.70.42.62
Uuid: 32b464a0-1c93-46a3-ad76-42a150266d42
State: Peer in Cluster (Disconnected)
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# gluster volume start Dis
volume start: Dis: failed: Quorum not met. Volume operation not allowed.
[root@dhcp42-67 ~]# 
[root@dhcp42-67 ~]# rpm -qa |grep gluster
glusterfs-client-xlators-3.7.1-16.el7rhgs.x86_64
glusterfs-rdma-3.7.1-16.el7rhgs.x86_64
glusterfs-libs-3.7.1-16.el7rhgs.x86_64
glusterfs-3.7.1-16.el7rhgs.x86_64
glusterfs-api-3.7.1-16.el7rhgs.x86_64
glusterfs-fuse-3.7.1-16.el7rhgs.x86_64
glusterfs-cli-3.7.1-16.el7rhgs.x86_64
nfs-ganesha-gluster-2.2.0-9.el7rhgs.x86_64
glusterfs-geo-replication-3.7.1-16.el7rhgs.x86_64
glusterfs-server-3.7.1-16.el7rhgs.x86_64
glusterfs-ganesha-3.7.1-16.el7rhgs.x86_64
[root@dhcp42-67 ~]#

--- Additional comment from Byreddy on 2016-02-12 00:29:11 EST ---

Marking this bug as a regression based on the above details.

--- Additional comment from RHEL Product and Program Management on 2016-02-12 00:32:29 EST ---

This bug report has Keywords: Regression or TestBlocker.

Since no regressions or test blockers are allowed between releases,
it is also being identified as a blocker for this release.

Please resolve ASAP.

--- Additional comment from Atin Mukherjee on 2016-02-12 02:14:44 EST ---

Although it is a regression, and starting a volume when quorum is not met can increase the probability of data split-brain, historically server-side quorum has never guaranteed that split-brain cannot happen. Moreover, enabling server-side quorum is nowhere documented as a recommendation, so I doubt customers use it in production. Considering all of these factors and the stage we are at in the release, my vote would be to mark it as a known issue.

--- Additional comment from Laura Bailey on 2016-02-14 21:55:02 EST ---
Comment 1 Vijay Bellur 2016-02-15 00:21:40 EST
REVIEW: http://review.gluster.org/13442 (glusterd: volume should not start when server quorum is not met) posted (#1) for review on master by Gaurav Kumar Garg (ggarg@redhat.com)
Comment 5 Vijay Bellur 2016-02-18 06:42:22 EST
REVIEW: http://review.gluster.org/13442 (glusterd: volume should not start when server quorum is not met) posted (#2) for review on master by Gaurav Kumar Garg (ggarg@redhat.com)
Comment 6 Vijay Bellur 2016-02-18 07:10:22 EST
REVIEW: http://review.gluster.org/13442 (glusterd: volume should not start when server quorum is not met) posted (#3) for review on master by Gaurav Kumar Garg (ggarg@redhat.com)
Comment 7 Vijay Bellur 2016-02-22 07:01:19 EST
COMMIT: http://review.gluster.org/13442 committed in master by Atin Mukherjee (amukherj@redhat.com) 
------
commit 62db11fa017004aa6cb1d91ec6b0117ac3e96a13
Author: Gaurav Kumar Garg <garg.gaurav52@gmail.com>
Date:   Mon Feb 15 10:48:18 2016 +0530

    glusterd: volume should not start when server quorum is not met
    
    Currently, when server quorum is not met, executing
     # gluster volume start [force] still starts the volume.
    
    With this patch, starting the volume is prevented when
    server-side quorum is not met.
    
    Change-Id: I39734b2dcf8e90c3c68bf2762d8350aecc82cc38
    BUG: 1308402
    Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
    Reviewed-on: http://review.gluster.org/13442
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
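
To illustrate the shape of the fix: a small self-contained sketch of a staging-time guard like the one the patch adds to the volume-start path. The names and structures below are hypothetical stand-ins, not the actual glusterd code.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for glusterd peer state. */
    struct cluster_state {
        int total_peers;   /* servers in the trusted pool, including this one */
        int active_peers;  /* servers currently connected */
        int quorum_ratio;  /* cluster.server-quorum-ratio, in percent */
    };

    static bool server_quorum_met(const struct cluster_state *cs)
    {
        /* Same ratio arithmetic as shown in the description above. */
        return cs->active_peers * 100 >= cs->quorum_ratio * cs->total_peers;
    }

    /* The fix rejects 'volume start' at staging time when server quorum
     * is not met, instead of letting the bricks come online. */
    static int stage_volume_start(const struct cluster_state *cs,
                                  const char *volname)
    {
        if (!server_quorum_met(cs)) {
            fprintf(stderr, "volume start: %s: failed: Quorum not met. "
                            "Volume operation not allowed.\n", volname);
            return -1;
        }
        printf("volume start: %s: success\n", volname);
        return 0;
    }

    int main(void)
    {
        /* 4-node cluster at ratio 90 with one node down: start must fail. */
        struct cluster_state cs = { .total_peers = 4, .active_peers = 3,
                                    .quorum_ratio = 90 };
        return stage_volume_start(&cs, "Dis") == 0 ? 0 : 1;
    }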
Comment 10 Niels de Vos 2016-06-16 09:57:29 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
