Description of problem:

In a 2x2 setup with a server quorum ratio of 100%, if a network disruption happens between just two machines, quorum is lost only between those two machines, and they end up killing glusterfsd on their respective nodes. The cluster, however, remains operational in a degraded state despite the quorum loss.

For example, consider the setup:

[root@rhs-client19 ~]# gluster volume info

Volume Name: quo
Type: Distributed-Replicate
Volume ID: 96852dd0-e8f6-48f8-94e2-ef80e8c70778
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client19.lab.eng.blr.redhat.com:/home/A
Brick2: rhs-client20.lab.eng.blr.redhat.com:/home/B
Brick3: rhs-client21.lab.eng.blr.redhat.com:/home/C
Brick4: rhs-client23.lab.eng.blr.redhat.com:/home/D
Options Reconfigured:
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 100
global-option-version: 35

[root@rhs-client19 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client20.lab.eng.blr.redhat.com
Uuid: b7f33530-25c1-406c-8c76-2c5feabaf7b0
State: Peer in Cluster (Disconnected)

Hostname: rhs-client21.lab.eng.blr.redhat.com
Uuid: 5b315725-90dd-41f9-abe8-827d27db8210
State: Peer in Cluster (Connected)

Hostname: rhs-client23.lab.eng.blr.redhat.com
Uuid: 230ae9f2-310e-49a6-b9f6-440bb5962da3
State: Peer in Cluster (Connected)

===

[root@rhs-client20 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client21.lab.eng.blr.redhat.com
Uuid: 5b315725-90dd-41f9-abe8-827d27db8210
State: Peer in Cluster (Connected)

Hostname: rhs-client23.lab.eng.blr.redhat.com
Uuid: 230ae9f2-310e-49a6-b9f6-440bb5962da3
State: Peer in Cluster (Connected)

Hostname: 10.70.36.43
Uuid: 772396e0-ccae-4b64-99f9-84f7e836d101
State: Peer in Cluster (Disconnected)

============

[root@rhs-client21 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client23.lab.eng.blr.redhat.com
Uuid: 230ae9f2-310e-49a6-b9f6-440bb5962da3
State: Peer in Cluster (Connected)

Hostname: 10.70.36.43
Uuid: 772396e0-ccae-4b64-99f9-84f7e836d101
State: Peer in Cluster (Connected)

Hostname: rhs-client20.lab.eng.blr.redhat.com
Uuid: b7f33530-25c1-406c-8c76-2c5feabaf7b0
State: Peer in Cluster (Connected)

===============

[root@rhs-client23 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client20.lab.eng.blr.redhat.com
Uuid: b7f33530-25c1-406c-8c76-2c5feabaf7b0
State: Peer in Cluster (Connected)

Hostname: rhs-client21.lab.eng.blr.redhat.com
Uuid: 5b315725-90dd-41f9-abe8-827d27db8210
State: Peer in Cluster (Connected)

Hostname: 10.70.36.43
Uuid: 772396e0-ccae-4b64-99f9-84f7e836d101
State: Peer in Cluster (Connected)

==================

In the above scenario, machines 19 and 20 are disconnected from each other, and the quorum ratio is 100%, so the disconnect between 19 and 20 breaks quorum. But the brick processes are killed only on 19 and 20, since the other nodes do not see this disconnect. I am not sure whether this is a bug or a limitation; either way, the mount point stays active, with the other replica pair still serving I/O.
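To reproduce the partition between exactly two nodes without pulling cables, firewall rules on one of them are enough. This is a minimal sketch, assuming iptables is in use and glusterd's default network timeouts; the hostnames match the setup above:

[root@rhs-client19 ~]# iptables -A INPUT  -s rhs-client20.lab.eng.blr.redhat.com -j DROP
[root@rhs-client19 ~]# iptables -A OUTPUT -d rhs-client20.lab.eng.blr.redhat.com -j DROP

Once the glusterd processes on 19 and 20 notice the disconnect, "gluster volume status quo" on each node should show the bricks down on 19 and 20 only, while 21 and 23 keep serving.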
Sac, I discussed this bug with Vijay. We are in agreement that this situation cannot be handled with the current implementation. Let's keep the bug open for now. We are not going to fix it for 2.0.z, though. Pranith.
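The reason the outcome looks inconsistent is that server quorum is evaluated locally: each glusterd checks the ratio against its own view of peer connectivity. The following is only an illustration of that per-node arithmetic (not actual glusterd code), reusing the commands from the report:

connected=$(gluster peer status | grep -c 'Peer in Cluster (Connected)')
total=$(gluster peer status | awk '/Number of Peers/ {print $4}')
# the local node itself counts toward quorum, hence the +1 on both sides
if [ $(( (connected + 1) * 100 / (total + 1) )) -lt 100 ]; then
    echo "local view: quorum lost, bricks on this node get killed"
fi

On 19 and 20 this comes out to (2+1)/4 = 75% < 100%, so each kills its bricks; on 21 and 23 it comes out to 4/4 = 100%, so they do nothing. No single node has a global view of the broken 19-20 link, which is why the current implementation cannot act on it cluster-wide.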
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report against the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html