Bug 1758438

Summary: [Brick Multiplexing] With server quorum enabled, inconsistent bricks are spawned when quorum is lost and regained
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterd
Assignee: Sanju <srakonde>
Status: CLOSED NEXTRELEASE
QA Contact: SATHEESARAN <sasundar>
Severity: high
Priority: unspecified
Version: rhgs-3.5
CC: nchilaka, rhs-bugs, srakonde, storage-qa-internal, vbellur
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2020-08-17 14:16:42 UTC
Type: Bug
Bug Blocks: 1609451    
Attachments:
  glusterd.log from node1
  glusterd.log from node2
  glusterd.log from node3
  Recording that describes the issue
  glusterd log file from node1
  glusterd log file from node2
  glusterd log file from node3

Description SATHEESARAN 2019-10-04 07:11:07 UTC
Description of problem:
-----------------------
In a Commvault Hyperscale-like setup, there are 3 volumes: engine (replica 3), commserve_vol (replica 3), and backupvol (disperse). The brick multiplexing feature is enabled on the disperse volume and, as a result, is enabled on all the volumes.

The RHHI-V-specific replica 3 volumes have both server-side quorum and client-side quorum enabled. When server quorum is not met, the bricks are killed, but when quorum is regained, multiple brick processes end up running on that host for the same brick.
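
For reference, these settings correspond roughly to the following gluster CLI options (a minimal sketch; the volume name "commserve_vol" is used only for illustration, and cluster.brick-multiplex is a cluster-wide option, which is why enabling it for one volume effectively enables it for all of them):

  # Brick multiplexing is set cluster-wide, so it affects every volume
  gluster volume set all cluster.brick-multiplex on

  # Server-side quorum: glusterd kills the local bricks when quorum is lost
  gluster volume set commserve_vol cluster.server-quorum-type server

  # Client-side quorum on the replica 3 volume
  gluster volume set commserve_vol cluster.quorum-type auto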

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.5.0 (glusterfs-6.0-13)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a replica 3 volume on a 3-node gluster cluster and start it
2. Enable brick multiplexing on that volume
3. Enable server quorum on that volume
4. Stop glusterd on node2 and node3
5. On node1, server quorum is lost and the bricks are killed
6. Restart glusterd on the other 2 nodes
7. Check the glusterfsd (brick) processes running on node1 (see the command sketch below)
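
At the command level, the reproduction looks roughly like this (a sketch only; the volume name "testvol", the hostnames, and the brick paths are illustrative assumptions):

  # Step 1: create and start a replica 3 volume across the 3-node cluster
  gluster volume create testvol replica 3 node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1
  gluster volume start testvol

  # Steps 2-3: enable brick multiplexing (cluster-wide) and server quorum
  gluster volume set all cluster.brick-multiplex on
  gluster volume set testvol cluster.server-quorum-type server

  # Step 4: stop glusterd on node2 and node3 (run on each of those nodes)
  systemctl stop glusterd

  # Step 6: restart glusterd on node2 and node3 (run on each of those nodes)
  systemctl start glusterd

  # Step 7: on node1, list the glusterfsd (brick) processes
  pgrep -af glusterfsd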

Actual results:
---------------
Many glusterfsd (brick) processes are running on that host for the same brick

Expected results:
-----------------
Only one glusterfsd (brick) process should be running for a given brick

Comment 3 SATHEESARAN 2019-10-04 07:35:59 UTC
Created attachment 1622520 [details]
glusterd.log from node1

Comment 4 SATHEESARAN 2019-10-04 07:36:14 UTC
Created attachment 1622521 [details]
glusterd.log from node2

Comment 5 SATHEESARAN 2019-10-04 07:36:29 UTC
Created attachment 1622522 [details]
glusterd.log from node3

Comment 6 SATHEESARAN 2019-10-04 07:38:54 UTC
At one point, I could observe more than 21 glusterfsd (brick) processes running for the same brick, each consuming a different port.
This is again a resource leak that wastes resources on that machine, but **no** functional impact was observed
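
One way to observe this (a sketch; the brick path and volume name are illustrative) is to count the brick processes and compare the ports they hold against what glusterd reports:

  # Count glusterfsd processes started for the same brick path
  pgrep -af glusterfsd | grep -c '/bricks/b1'

  # List the TCP ports each glusterfsd process is listening on
  ss -tlnp | grep glusterfsd

  # Compare with the single port glusterd expects the brick to use
  gluster volume status testvol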

Comment 13 SATHEESARAN 2019-10-16 06:00:20 UTC
Created attachment 1626284 [details]
Recording that describes the issue

Comment 14 SATHEESARAN 2019-10-16 06:02:54 UTC
Created attachment 1626285 [details]
glusterd log file from node1

Comment 15 SATHEESARAN 2019-10-16 06:03:22 UTC
Created attachment 1626286 [details]
glusterd log file from node2

Comment 16 SATHEESARAN 2019-10-16 06:03:49 UTC
Created attachment 1626287 [details]
glusterd log file from node3

Comment 28 Sanju 2020-07-08 07:09:21 UTC
*** Bug 1609450 has been marked as a duplicate of this bug. ***