Bug 1758438 - [Brick Multiplexing] With server quorum enabled, inconsistent bricks are spawned, when quorum is lost and regained
Summary: [Brick Multiplexing] With server quorum enabled, inconsistent bricks are spawned, when quorum is lost and regained
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Sanju
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 1609450
Depends On:
Blocks: 1609451
 
Reported: 2019-10-04 07:11 UTC by SATHEESARAN
Modified: 2020-08-17 14:16 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-17 14:16:42 UTC
Embargoed:


Attachments
glusterd.log from node1 (543 bytes, application/octet-stream), 2019-10-04 07:35 UTC, SATHEESARAN
glusterd.log from node2 (543 bytes, application/octet-stream), 2019-10-04 07:36 UTC, SATHEESARAN
glusterd.log from node3 (543 bytes, application/octet-stream), 2019-10-04 07:36 UTC, SATHEESARAN
Recording that describes the issue (4.38 MB, application/ogg), 2019-10-16 06:00 UTC, SATHEESARAN
glusterd log file from node1 (82.92 KB, text/plain), 2019-10-16 06:02 UTC, SATHEESARAN
glusterd log file from node2 (87.89 KB, text/plain), 2019-10-16 06:03 UTC, SATHEESARAN
glusterd log file from node3 (87.43 KB, text/plain), 2019-10-16 06:03 UTC, SATHEESARAN

Description SATHEESARAN 2019-10-04 07:11:07 UTC
Description of problem:
-----------------------
In a Commvault HyperScale-like setup, there are 3 volumes: engine (replica 3), commserve_vol (replica 3), and backupvol (disperse). The brick multiplexing feature is enabled on the disperse volume and, as a result, is enabled on all the volumes.

The RHHI-V-specific replica 3 volumes have server-side quorum and client-side quorum enabled. When server quorum is not met, the bricks are killed, but when quorum is regained, more brick processes than expected end up running on that host.
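
For reference, the relevant options can be checked as shown below (a minimal sketch: the volume name 'engine' is taken from the setup above, and the option names are the standard glusterd ones assumed to be in use here):

# Brick multiplexing is a cluster-wide option, so enabling it for the
# disperse volume also enables it for the replica 3 volumes
gluster volume get all cluster.brick-multiplex

# Server-side and client-side quorum settings on a replica 3 volume
gluster volume get engine cluster.server-quorum-type
gluster volume get engine cluster.quorum-type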

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.5.0 (glusterfs-6.0-13)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a replica 3 volume on a 3-node Gluster cluster and start it
2. Enable brick multiplexing on that volume
3. Enable server quorum on that volume
4. Stop glusterd on node2 and node3
5. On node1, server quorum is lost and the bricks are killed
6. Restart glusterd on the other 2 nodes
7. Check the glusterfsd (brick) processes running on node1 (a command sketch follows these steps)
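
The steps above map roughly to the following commands (a sketch only; the volume name 'testvol' and the brick paths are placeholders, not taken from this setup):

# On node1: create and start a replica 3 volume across the 3-node cluster
gluster volume create testvol replica 3 node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1
gluster volume start testvol

# Enable brick multiplexing (a cluster-wide option) and server quorum
gluster volume set all cluster.brick-multiplex on
gluster volume set testvol cluster.server-quorum-type server

# On node2 and node3: stop glusterd so that node1 loses server quorum
systemctl stop glusterd

# On node2 and node3: start glusterd again so that quorum is regained
systemctl start glusterd

# On node1: list the brick processes with their full command lines
pgrep -af glusterfsd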

Actual results:
---------------
Many brick (glusterfsd) processes are running on that host for the same brick

Expected results:
-----------------
There should be only one glusterfsd (brick) process running for a given brick
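
A quick check on node1 (a sketch; with brick multiplexing enabled, a single glusterfsd is expected to serve the bricks on the node):

pgrep -af glusterfsd    # list brick processes with their full command lines
pgrep -c glusterfsd     # count them; more than one for the same brick reproduces the bug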

Comment 3 SATHEESARAN 2019-10-04 07:35:59 UTC
Created attachment 1622520 [details]
glusterd.log from node1

Comment 4 SATHEESARAN 2019-10-04 07:36:14 UTC
Created attachment 1622521 [details]
glusterd.log from node2

Comment 5 SATHEESARAN 2019-10-04 07:36:29 UTC
Created attachment 1622522 [details]
glusterd.log from node3

Comment 6 SATHEESARAN 2019-10-04 07:38:54 UTC
At one point in time, I could observe more than 21 glusterfsd (brick) processes running for the same brick, each consuming a different port.
Again, this leaks and wastes resources on that machine, but **no** functional impact was observed.
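
For reference, the ports held by the duplicate processes can be listed as below (a sketch; 'testvol' is the placeholder volume name from the reproduction sketch above) and compared with the single port that glusterd reports for the brick:

# TCP listening ports held by glusterfsd processes on node1
ss -ltnp | grep glusterfsd

# Port reported for the brick (only one entry per brick is shown here)
gluster volume status testvol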

Comment 13 SATHEESARAN 2019-10-16 06:00:20 UTC
Created attachment 1626284 [details]
Recording that describes the issue

Comment 14 SATHEESARAN 2019-10-16 06:02:54 UTC
Created attachment 1626285 [details]
glusterd log file from node1

Comment 15 SATHEESARAN 2019-10-16 06:03:22 UTC
Created attachment 1626286 [details]
glusterd log file from node2

Comment 16 SATHEESARAN 2019-10-16 06:03:49 UTC
Created attachment 1626287 [details]
glusterd log file from node3

Comment 28 Sanju 2020-07-08 07:09:21 UTC
*** Bug 1609450 has been marked as a duplicate of this bug. ***

