Bug 861560

Summary: Quorum issues in case of network issues
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sachidananda Urs <sac>
Component: glusterd
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WONTFIX
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 2.0
CC: cww, pkarampu, rcyriac, rhs-bugs, rwheeler, vbellur
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1066140

Description Sachidananda Urs 2012-09-29 07:25:59 UTC
Description of problem:

In a 2x2 setup with a quorum ratio of 100%, if a network disruption occurs between just two machines, quorum is lost only on those two machines, and each of them ends up killing the glusterfsd processes on its own node. The cluster, however, remains operational in a degraded state despite the quorum loss.

For example, consider the following setup:


[root@rhs-client19 ~]# gluster volume info
 
Volume Name: quo
Type: Distributed-Replicate
Volume ID: 96852dd0-e8f6-48f8-94e2-ef80e8c70778
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client19.lab.eng.blr.redhat.com:/home/A
Brick2: rhs-client20.lab.eng.blr.redhat.com:/home/B
Brick3: rhs-client21.lab.eng.blr.redhat.com:/home/C
Brick4: rhs-client23.lab.eng.blr.redhat.com:/home/D
Options Reconfigured:
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 100
global-option-version: 35
[root@rhs-client19 ~]# 
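For reference, a configuration like the one above can be put together roughly as follows. This is only a sketch, assuming the four peers are already probed into the trusted pool; cluster.server-quorum-ratio is a pool-wide option, hence the "volume set all", and exact option syntax may differ between releases.

# Rough reproduction sketch (brick paths/hostnames taken from the volume info above):
gluster volume create quo replica 2 \
    rhs-client19.lab.eng.blr.redhat.com:/home/A \
    rhs-client20.lab.eng.blr.redhat.com:/home/B \
    rhs-client21.lab.eng.blr.redhat.com:/home/C \
    rhs-client23.lab.eng.blr.redhat.com:/home/D
gluster volume start quo
gluster volume set quo cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 100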


[root@rhs-client19 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client20.lab.eng.blr.redhat.com
Uuid: b7f33530-25c1-406c-8c76-2c5feabaf7b0
State: Peer in Cluster (Disconnected)

Hostname: rhs-client21.lab.eng.blr.redhat.com
Uuid: 5b315725-90dd-41f9-abe8-827d27db8210
State: Peer in Cluster (Connected)

Hostname: rhs-client23.lab.eng.blr.redhat.com
Uuid: 230ae9f2-310e-49a6-b9f6-440bb5962da3
State: Peer in Cluster (Connected)
[root@rhs-client19 ~]# 

===
[root@rhs-client20 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client21.lab.eng.blr.redhat.com
Uuid: 5b315725-90dd-41f9-abe8-827d27db8210
State: Peer in Cluster (Connected)

Hostname: rhs-client23.lab.eng.blr.redhat.com
Uuid: 230ae9f2-310e-49a6-b9f6-440bb5962da3
State: Peer in Cluster (Connected)

Hostname: 10.70.36.43
Uuid: 772396e0-ccae-4b64-99f9-84f7e836d101
State: Peer in Cluster (Disconnected)
[root@rhs-client20 ~]# 

============
[root@rhs-client21 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client23.lab.eng.blr.redhat.com
Uuid: 230ae9f2-310e-49a6-b9f6-440bb5962da3
State: Peer in Cluster (Connected)

Hostname: 10.70.36.43
Uuid: 772396e0-ccae-4b64-99f9-84f7e836d101
State: Peer in Cluster (Connected)

Hostname: rhs-client20.lab.eng.blr.redhat.com
Uuid: b7f33530-25c1-406c-8c76-2c5feabaf7b0
State: Peer in Cluster (Connected)
[root@rhs-client21 ~]# 

===============

[root@rhs-client23 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client20.lab.eng.blr.redhat.com
Uuid: b7f33530-25c1-406c-8c76-2c5feabaf7b0
State: Peer in Cluster (Connected)

Hostname: rhs-client21.lab.eng.blr.redhat.com
Uuid: 5b315725-90dd-41f9-abe8-827d27db8210
State: Peer in Cluster (Connected)

Hostname: 10.70.36.43
Uuid: 772396e0-ccae-4b64-99f9-84f7e836d101
State: Peer in Cluster (Connected)
[root@rhs-client23 ~]# 

==================
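The partial partition shown above (19 and 20 see each other as Disconnected while 21 and 23 see everyone as Connected) can be approximated by blocking traffic only between those two hosts, for example with iptables; the exact method used when this report was filed may have differed.

# Sketch: on rhs-client19, drop all traffic to/from rhs-client20 only.
# (From the peer listings, 10.70.36.43 appears to be rhs-client19's address.)
iptables -A INPUT  -s rhs-client20.lab.eng.blr.redhat.com -j DROP
iptables -A OUTPUT -d rhs-client20.lab.eng.blr.redhat.com -j DROP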

Now, in the above scenario, machines 19 and 20 are disconnected from each other, and the quorum ratio is 100%, so the disconnect between 19 and 20 breaks the quorum. However, the brick processes are killed only on 19 and 20, since the other nodes do not see this disconnect.
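To illustrate why only 19 and 20 kill their bricks: each glusterd evaluates server quorum against its own view of peer connectivity. A rough sketch of that arithmetic follows; the variable names and the exact comparison are illustrative, not glusterd source.

check_quorum() {
    # connected = number of pool members this node currently sees as up
    #             (including itself); total and ratio match the setup above
    connected=$1
    total=4
    ratio=100
    if [ $((connected * 100)) -ge $((ratio * total)) ]; then
        echo "quorum met: bricks stay up"
    else
        echo "quorum lost: glusterd stops the local glusterfsd processes"
    fi
}

check_quorum 3   # view from rhs-client19 / rhs-client20 -> quorum lost
check_quorum 4   # view from rhs-client21 / rhs-client23 -> quorum met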

I am not sure if this is a bug or a limitation, but the mount point remains active, with the other replica pair serving the data.
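That the mount stays usable can be verified from any client; a minimal sketch follows (the mount point path and the choice of volfile server are assumptions):

mount -t glusterfs rhs-client21.lab.eng.blr.redhat.com:/quo /mnt/quo
df -h /mnt/quo     # the mount still responds
ls /mnt/quo        # files on the surviving replica pair remain accessible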

Comment 2 Pranith Kumar K 2012-09-29 10:08:29 UTC
Sac,
  I discussed this bug with Vijay. We are in agreement that this situation cannot be handled with the current implementation.
Let's keep the bug open for now; we are not going to fix it for 2.0.z, though.

Pranith.

Comment 5 Vivek Agarwal 2015-03-23 07:40:05 UTC
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1]; hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
