Description of problem:
On a 3 node gluster cluster with cluster.brick-multiplex set to enabled, when we start volumes and reboot one node while volume start is happening, multiple brick processes were observed on the node that was rebooted.

Node 1:
# pidof glusterfsd
6106

Node 2:
# pidof glusterfsd
6106

Node 3 (rebooted node):
# pidof glusterfsd
6461 6353 6254 6189 6165 6138 6114 6086 6060 6034 6009 5951 5815

################################################################################
# ps -ef | grep glusterfsd
root 5815 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_10.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_10 -p /var/run/gluster/vols/volume_10/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_10.pid -S /var/run/gluster/393ad8cfc7b36661.socket --brick-name /bricks/brick1/volume_10 -l /var/log/glusterfs/bricks/bricks-brick1-volume_10.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49152 --xlator-option volume_10-server.listen-port=49152
root 5951 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_6.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_6 -p /var/run/gluster/vols/volume_6/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_6.pid -S /var/run/gluster/ec547bb79ce04cee.socket --brick-name /bricks/brick1/volume_6 -l /var/log/glusterfs/bricks/bricks-brick1-volume_6.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49153 --xlator-option volume_6-server.listen-port=49153
root 6009 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_12.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_12 -p /var/run/gluster/vols/volume_12/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_12.pid -S /var/run/gluster/5e3f1ba13aea6ef4.socket --brick-name /bricks/brick1/volume_12 -l /var/log/glusterfs/bricks/bricks-brick1-volume_12.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49154 --xlator-option volume_12-server.listen-port=49154
root 6034 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_13.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_13 -p /var/run/gluster/vols/volume_13/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_13.pid -S /var/run/gluster/3b4e349e1dd71410.socket --brick-name /bricks/brick1/volume_13 -l /var/log/glusterfs/bricks/bricks-brick1-volume_13.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49155 --xlator-option volume_13-server.listen-port=49155
root 6060 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_14.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_14 -p /var/run/gluster/vols/volume_14/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_14.pid -S /var/run/gluster/f03f82234f5e1bd2.socket --brick-name /bricks/brick1/volume_14 -l /var/log/glusterfs/bricks/bricks-brick1-volume_14.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49156 --xlator-option volume_14-server.listen-port=49156
root 6086 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_15.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_15 -p /var/run/gluster/vols/volume_15/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_15.pid -S /var/run/gluster/40a785a07a9eb911.socket --brick-name /bricks/brick1/volume_15 -l /var/log/glusterfs/bricks/bricks-brick1-volume_15.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49157 --xlator-option volume_15-server.listen-port=49157
root 6114 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_2.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_2 -p /var/run/gluster/vols/volume_2/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_2.pid -S /var/run/gluster/b7d0b9936575d900.socket --brick-name /bricks/brick1/volume_2 -l /var/log/glusterfs/bricks/bricks-brick1-volume_2.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49158 --xlator-option volume_2-server.listen-port=49158
root 6138 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_3.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_3 -p /var/run/gluster/vols/volume_3/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_3.pid -S /var/run/gluster/3292bfcc257a5866.socket --brick-name /bricks/brick1/volume_3 -l /var/log/glusterfs/bricks/bricks-brick1-volume_3.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49159 --xlator-option volume_3-server.listen-port=49159
root 6165 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_4.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_4 -p /var/run/gluster/vols/volume_4/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_4.pid -S /var/run/gluster/3da670a19f480606.socket --brick-name /bricks/brick1/volume_4 -l /var/log/glusterfs/bricks/bricks-brick1-volume_4.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49160 --xlator-option volume_4-server.listen-port=49160
root 6189 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_5.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_5 -p /var/run/gluster/vols/volume_5/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_5.pid -S /var/run/gluster/5c5df1315902ea82.socket --brick-name /bricks/brick1/volume_5 -l /var/log/glusterfs/bricks/bricks-brick1-volume_5.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49161 --xlator-option volume_5-server.listen-port=49161
root 6254 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_7.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_7 -p /var/run/gluster/vols/volume_7/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_7.pid -S /var/run/gluster/b10e8b7ce17d1caa.socket --brick-name /bricks/brick1/volume_7 -l /var/log/glusterfs/bricks/bricks-brick1-volume_7.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49162 --xlator-option volume_7-server.listen-port=49162
root 6353 1 0 15:15 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_8.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_8 -p /var/run/gluster/vols/volume_8/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_8.pid -S /var/run/gluster/481a8db656b4787d.socket --brick-name /bricks/brick1/volume_8 -l /var/log/glusterfs/bricks/bricks-brick1-volume_8.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49163 --xlator-option volume_8-server.listen-port=49163
root 6461 1 0 15:15 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_9.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_9 -p /var/run/gluster/vols/volume_9/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_9.pid -S /var/run/gluster/4ea7013f9d77e427.socket --brick-name /bricks/brick1/volume_9 -l /var/log/glusterfs/bricks/bricks-brick1-volume_9.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49164 --xlator-option volume_9-server.listen-port=49164
root 6931 5853 0 15:36 pts/0 00:00:00 grep --color=auto glusterfsd
################################################################################

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-45

How reproducible:
4/4

Steps to Reproduce:
1. Create a 3 node cluster.
2. Set cluster.brick-multiplex to enable.
3. Create 15 volumes of type replica 1x3.
4. Start all the volumes one by one.
5. While the volumes are starting, reboot one node.

Actual results:
Multiple brick processes are observed on the rebooted node.

Expected results:
A single brick process should be observed.
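The faulty state above can be detected mechanically: with cluster.brick-multiplex enabled, all bricks on a node should be attached to a single glusterfsd process, so more than one brick PID indicates this bug. The helper below is a hypothetical sketch (not part of gluster) that parses `ps -ef`-style output and counts distinct glusterfsd brick PIDs; the function names are illustrative only.

```python
def glusterfsd_pids(ps_lines):
    """Return the set of PIDs of glusterfsd brick processes found in
    `ps -ef`-style output lines; the trailing `grep` line is ignored."""
    pids = set()
    for line in ps_lines:
        fields = line.split()
        # A real brick process line contains the glusterfsd binary path
        # as a field plus a --brick-name option; the grep line has neither.
        if "/usr/sbin/glusterfsd" in fields and "--brick-name" in line:
            pids.add(int(fields[1]))  # ps -ef field order: UID PID PPID ...
    return pids

def multiplexing_violated(ps_lines):
    """With cluster.brick-multiplex enabled, more than one brick PID on a
    node indicates the multiple-brick-process state described above."""
    return len(glusterfsd_pids(ps_lines)) > 1
```

Feeding it the rebooted node's `ps -ef | grep glusterfsd` output would report all 13 PIDs and flag a violation, while the healthy nodes (one PID each) would pass.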
I've encountered this on gluster 5.5 while upgrading servers. Multiple glusterfsd processes per volume were spawned, disrupting healing and causing significant problems for the VMs using these volumes.
Realized I'm probably seeing something different, albeit maybe with the same root cause, so opened https://bugzilla.redhat.com/show_bug.cgi?id=1698131. I have multiplexing disabled on my systems.
Upstream patch : https://review.gluster.org/22635
(In reply to Atin Mukherjee from comment #10)
> Upstream patch : https://review.gluster.org/22635

The patch is in the abandoned state, so moving this bug back to the assigned state.
https://review.gluster.org/#/c/glusterfs/+/23724/ fixes the issue and the patch is already merged upstream.
Steps:
1. Create a 3 node cluster.
2. Set cluster.brick-multiplex to enable.
3. Create 15 volumes of type replica 1x3.
4. Start all the volumes one by one.
5. While the volumes are starting, reboot one node.
----------------------------------
brick-mux is enabled

[root@rhel7-node1 ~]# pidof glusterfsd
4101
[root@rhel7-node2 ~]# pidof glusterfsd
2191
[root@rhel7-node3 ~]# pidof glusterfsd
28544

[node1-rhel7 ~]# gluster v get all all
Option                                  Value
------                                  -----
cluster.server-quorum-ratio             51
cluster.enable-shared-storage           disable
cluster.op-version                      70000
cluster.max-op-version                  70000
cluster.brick-multiplex                 on
cluster.max-bricks-per-process          250
glusterd.vol_count_per_thread           100
cluster.daemon-log-level                INFO

[root@rhel7-node1 ~]# rpm -qa | grep -i glusterfs
glusterfs-client-xlators-6.0-45.el7rhgs.x86_64
glusterfs-libs-6.0-45.el7rhgs.x86_64
glusterfs-events-6.0-45.el7rhgs.x86_64
glusterfs-6.0-45.el7rhgs.x86_64
glusterfs-cli-6.0-45.el7rhgs.x86_64
glusterfs-rdma-6.0-45.el7rhgs.x86_64
glusterfs-server-6.0-45.el7rhgs.x86_64
glusterfs-fuse-6.0-45.el7rhgs.x86_64
glusterfs-api-6.0-45.el7rhgs.x86_64

============rhel8============
[root@rhel8-node1 ~]# pgrep glusterfsd
72721
[root@rhel8-node2 ~]# pgrep glusterfsd
1822
[root@rhel8-node3 ~]# pgrep glusterfsd
70504

[root@rhel8-node1 ~]# rpm -qa | grep -i glusterfs
glusterfs-6.0-45.el8rhgs.x86_64
glusterfs-fuse-6.0-45.el8rhgs.x86_64
glusterfs-api-6.0-45.el8rhgs.x86_64
glusterfs-selinux-1.0-1.el8rhgs.noarch
glusterfs-client-xlators-6.0-45.el8rhgs.x86_64
glusterfs-server-6.0-45.el8rhgs.x86_64
glusterfs-cli-6.0-45.el8rhgs.x86_64
glusterfs-libs-6.0-45.el8rhgs.x86_64

[root@rhel8-node1 ~]# gluster v get all all
Option                                  Value
------                                  -----
cluster.server-quorum-ratio             51%
cluster.enable-shared-storage           disable
cluster.op-version                      70000
cluster.max-op-version                  70000
cluster.brick-multiplex                 on
cluster.max-bricks-per-process          250
glusterd.vol_count_per_thread           100
cluster.daemon-log-level                DEBUG

As I see only one pid on each node, marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5603