Bug 1758438

Summary: [Brick Multiplexing] With server quorum enabled, inconsistent bricks are spawned when quorum is lost and regained
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterd
Assignee: Sanju <srakonde>
Status: CLOSED NEXTRELEASE
QA Contact: SATHEESARAN <sasundar>
Severity: high
Priority: unspecified
Version: rhgs-3.5
CC: nchilaka, rhs-bugs, srakonde, storage-qa-internal, vbellur
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2020-08-17 14:16:42 UTC
Type: Bug
Bug Blocks: 1609451    
Attachments:
  glusterd.log from node1
  glusterd.log from node2
  glusterd.log from node3
  Recording that describes the issue
  glusterd log file from node1
  glusterd log file from node2
  glusterd log file from node3

Description SATHEESARAN 2019-10-04 07:11:07 UTC
Description of problem:
-----------------------
In a Commvault Hyperscale-like setup, there are 3 volumes: engine (replica 3), commserve_vol (replica 3), and backupvol (disperse). The brick multiplexing feature is enabled on the disperse volume and, as a result, is enabled on all the volumes.

The RHHI-V-specific replica 3 volumes have both server-side quorum and client-side quorum enabled. When server quorum is not met, the bricks are killed, but when quorum is regained, multiple brick processes end up running on that host for the same brick.
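
For reference, these settings correspond roughly to the following gluster CLI options (a minimal sketch; the volume name "commserve_vol" is used only for illustration, and cluster.brick-multiplex is a cluster-wide option, which is why enabling it for one volume effectively enables it for all of them):

  # Brick multiplexing is set cluster-wide, so it affects every volume
  gluster volume set all cluster.brick-multiplex on

  # Server-side quorum: glusterd kills the local bricks when quorum is lost
  gluster volume set commserve_vol cluster.server-quorum-type server

  # Client-side quorum on the replica 3 volume
  gluster volume set commserve_vol cluster.quorum-type auto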

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.5.0 (glusterfs-6.0-13)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a replica 3 volume on a 3-node gluster cluster and start it
2. Enable brick multiplexing on that volume
3. Enable server quorum on that volume
4. Stop glusterd on node2 and node3
5. On node1, server quorum is lost and the bricks are killed
6. Restart glusterd on the other 2 nodes
7. Check the glusterfsd (brick) processes running on node1 (see the command sketch below)
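
At the command level, the reproduction looks roughly like this (a sketch only; the volume name "testvol", the hostnames, and the brick paths are illustrative assumptions):

  # Step 1: create and start a replica 3 volume across the 3-node cluster
  gluster volume create testvol replica 3 node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1
  gluster volume start testvol

  # Steps 2-3: enable brick multiplexing (cluster-wide) and server quorum
  gluster volume set all cluster.brick-multiplex on
  gluster volume set testvol cluster.server-quorum-type server

  # Step 4: stop glusterd on node2 and node3 (run on each of those nodes)
  systemctl stop glusterd

  # Step 6: restart glusterd on node2 and node3 (run on each of those nodes)
  systemctl start glusterd

  # Step 7: on node1, list the glusterfsd (brick) processes
  pgrep -af glusterfsd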

Actual results:
---------------
Many glusterfsd (brick) processes are running on that host for the same brick

Expected results:
-----------------
Only one glusterfsd (brick) process should be running for a given brick

Comment 3 SATHEESARAN 2019-10-04 07:35:59 UTC
Created attachment 1622520 [details]
glusterd.log from node1

Comment 4 SATHEESARAN 2019-10-04 07:36:14 UTC
Created attachment 1622521 [details]
glusterd.log from node2

Comment 5 SATHEESARAN 2019-10-04 07:36:29 UTC
Created attachment 1622522 [details]
glusterd.log from node3

Comment 6 SATHEESARAN 2019-10-04 07:38:54 UTC
At one point, I could observe more than 21 glusterfsd (brick) processes running for the same brick, each consuming a different port.
This is again a resource leak that wastes resources on that machine, but **no** functional impact was observed
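
One way to observe this (a sketch; the brick path and volume name are illustrative) is to count the brick processes and compare the ports they hold against what glusterd reports:

  # Count glusterfsd processes started for the same brick path
  pgrep -af glusterfsd | grep -c '/bricks/b1'

  # List the TCP ports each glusterfsd process is listening on
  ss -tlnp | grep glusterfsd

  # Compare with the single port glusterd expects the brick to use
  gluster volume status testvol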

Comment 13 SATHEESARAN 2019-10-16 06:00:20 UTC
Created attachment 1626284 [details]
Recording that describes the issue

Comment 14 SATHEESARAN 2019-10-16 06:02:54 UTC
Created attachment 1626285 [details]
glusterd log file from node1

Comment 15 SATHEESARAN 2019-10-16 06:03:22 UTC
Created attachment 1626286 [details]
glusterd log file from node2

Comment 16 SATHEESARAN 2019-10-16 06:03:49 UTC
Created attachment 1626287 [details]
glusterd log file from node3

Comment 28 Sanju 2020-07-08 07:09:21 UTC
*** Bug 1609450 has been marked as a duplicate of this bug. ***