Description of problem:
On a 3 node gluster cluster with cluster.brick-multiplex set to enabled, when we start volumes and reboot one node while volume start is happening, multiple brick processes were observed on the node that was rebooted.

Node 1:
# pidof glusterfsd
6106

Node 2:
# pidof glusterfsd
6106

Node 3 (rebooted node):
# pidof glusterfsd
6461 6353 6254 6189 6165 6138 6114 6086 6060 6034 6009 5951 5815

################################################################################
# ps -ef | grep glusterfsd
root 5815 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_10.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_10 -p /var/run/gluster/vols/volume_10/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_10.pid -S /var/run/gluster/393ad8cfc7b36661.socket --brick-name /bricks/brick1/volume_10 -l /var/log/glusterfs/bricks/bricks-brick1-volume_10.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49152 --xlator-option volume_10-server.listen-port=49152
root 5951 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_6.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_6 -p /var/run/gluster/vols/volume_6/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_6.pid -S /var/run/gluster/ec547bb79ce04cee.socket --brick-name /bricks/brick1/volume_6 -l /var/log/glusterfs/bricks/bricks-brick1-volume_6.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49153 --xlator-option volume_6-server.listen-port=49153
root 6009 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_12.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_12 -p /var/run/gluster/vols/volume_12/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_12.pid -S /var/run/gluster/5e3f1ba13aea6ef4.socket --brick-name /bricks/brick1/volume_12 -l /var/log/glusterfs/bricks/bricks-brick1-volume_12.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49154 --xlator-option volume_12-server.listen-port=49154
root 6034 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_13.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_13 -p /var/run/gluster/vols/volume_13/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_13.pid -S /var/run/gluster/3b4e349e1dd71410.socket --brick-name /bricks/brick1/volume_13 -l /var/log/glusterfs/bricks/bricks-brick1-volume_13.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49155 --xlator-option volume_13-server.listen-port=49155
root 6060 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_14.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_14 -p /var/run/gluster/vols/volume_14/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_14.pid -S /var/run/gluster/f03f82234f5e1bd2.socket --brick-name /bricks/brick1/volume_14 -l /var/log/glusterfs/bricks/bricks-brick1-volume_14.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49156 --xlator-option volume_14-server.listen-port=49156
root 6086 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_15.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_15 -p /var/run/gluster/vols/volume_15/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_15.pid -S /var/run/gluster/40a785a07a9eb911.socket --brick-name /bricks/brick1/volume_15 -l /var/log/glusterfs/bricks/bricks-brick1-volume_15.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49157 --xlator-option volume_15-server.listen-port=49157
root 6114 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_2.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_2 -p /var/run/gluster/vols/volume_2/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_2.pid -S /var/run/gluster/b7d0b9936575d900.socket --brick-name /bricks/brick1/volume_2 -l /var/log/glusterfs/bricks/bricks-brick1-volume_2.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49158 --xlator-option volume_2-server.listen-port=49158
root 6138 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_3.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_3 -p /var/run/gluster/vols/volume_3/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_3.pid -S /var/run/gluster/3292bfcc257a5866.socket --brick-name /bricks/brick1/volume_3 -l /var/log/glusterfs/bricks/bricks-brick1-volume_3.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49159 --xlator-option volume_3-server.listen-port=49159
root 6165 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_4.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_4 -p /var/run/gluster/vols/volume_4/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_4.pid -S /var/run/gluster/3da670a19f480606.socket --brick-name /bricks/brick1/volume_4 -l /var/log/glusterfs/bricks/bricks-brick1-volume_4.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49160 --xlator-option volume_4-server.listen-port=49160
root 6189 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_5.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_5 -p /var/run/gluster/vols/volume_5/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_5.pid -S /var/run/gluster/5c5df1315902ea82.socket --brick-name /bricks/brick1/volume_5 -l /var/log/glusterfs/bricks/bricks-brick1-volume_5.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49161 --xlator-option volume_5-server.listen-port=49161
root 6254 1 0 15:14 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_7.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_7 -p /var/run/gluster/vols/volume_7/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_7.pid -S /var/run/gluster/b10e8b7ce17d1caa.socket --brick-name /bricks/brick1/volume_7 -l /var/log/glusterfs/bricks/bricks-brick1-volume_7.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49162 --xlator-option volume_7-server.listen-port=49162
root 6353 1 0 15:15 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_8.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_8 -p /var/run/gluster/vols/volume_8/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_8.pid -S /var/run/gluster/481a8db656b4787d.socket --brick-name /bricks/brick1/volume_8 -l /var/log/glusterfs/bricks/bricks-brick1-volume_8.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49163 --xlator-option volume_8-server.listen-port=49163
root 6461 1 0 15:15 ? 00:00:00 /usr/sbin/glusterfsd -s dhcp35-132.lab.eng.blr.redhat.com --volfile-id volume_9.dhcp35-132.lab.eng.blr.redhat.com.bricks-brick1-volume_9 -p /var/run/gluster/vols/volume_9/dhcp35-132.lab.eng.blr.redhat.com-bricks-brick1-volume_9.pid -S /var/run/gluster/4ea7013f9d77e427.socket --brick-name /bricks/brick1/volume_9 -l /var/log/glusterfs/bricks/bricks-brick1-volume_9.log --xlator-option *-posix.glusterd-uuid=f5c3337f-790d-4426-a608-83168b0f904b --brick-port 49164 --xlator-option volume_9-server.listen-port=49164
root 6931 5853 0 15:36 pts/0 00:00:00 grep --color=auto glusterfsd
################################################################################

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-45

How reproducible:
4/4

Steps to Reproduce:
1. Create a 3 node cluster.
2. Set cluster.brick-multiplex to enable.
3. Create 15 volumes of type replica 1x3.
4. Start all the volumes one by one.
5. While the volumes are starting, reboot one node.

Actual results:
Multiple brick processes are observed on the rebooted node.

Expected results:
A single brick process should be observed.
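The faulty state above can be detected mechanically: with cluster.brick-multiplex enabled, all bricks on a node should be attached to a single glusterfsd process, so more than one brick PID indicates this bug. The helper below is a hypothetical sketch (not part of gluster) that parses `ps -ef`-style output and counts distinct glusterfsd brick PIDs; the function names are illustrative only.

```python
def glusterfsd_pids(ps_lines):
    """Return the set of PIDs of glusterfsd brick processes found in
    `ps -ef`-style output lines; the trailing `grep` line is ignored."""
    pids = set()
    for line in ps_lines:
        fields = line.split()
        # A real brick process line contains the glusterfsd binary path
        # as a field plus a --brick-name option; the grep line has neither.
        if "/usr/sbin/glusterfsd" in fields and "--brick-name" in line:
            pids.add(int(fields[1]))  # ps -ef field order: UID PID PPID ...
    return pids

def multiplexing_violated(ps_lines):
    """With cluster.brick-multiplex enabled, more than one brick PID on a
    node indicates the multiple-brick-process state described above."""
    return len(glusterfsd_pids(ps_lines)) > 1
```

Feeding it the rebooted node's `ps -ef | grep glusterfsd` output would report all 13 PIDs and flag a violation, while the healthy nodes (one PID each) would pass.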
I've encountered this on gluster 5.5 while upgrading servers. Multiple glusterfsd processes per volume were spawned, disrupting healing and causing significant problems for the VMs using these volumes.
Realized I'm probably seeing something different, albeit maybe with the same root cause, so opened https://bugzilla.redhat.com/show_bug.cgi?id=1698131. I have multiplexing disabled on my systems.
Upstream patch : https://review.gluster.org/22635
(In reply to Atin Mukherjee from comment #10)
> Upstream patch : https://review.gluster.org/22635

The patch is in the abandoned state, so moving this bug back to the assigned state.
https://review.gluster.org/#/c/glusterfs/+/23724/ fixes the issue and the patch is already merged upstream.
Steps:
1. Create a 3 node cluster.
2. Set cluster.brick-multiplex to enable.
3. Create 15 volumes of type replica 1x3.
4. Start all the volumes one by one.
5. While the volumes are starting, reboot one node.
----------------------------------
brick-mux is enabled

[root@rhel7-node1 ~]# pidof glusterfsd
4101
[root@rhel7-node2 ~]# pidof glusterfsd
2191
[root@rhel7-node3 ~]# pidof glusterfsd
28544

[node1-rhel7 ~]# gluster v get all all
Option                                  Value
------                                  -----
cluster.server-quorum-ratio             51
cluster.enable-shared-storage           disable
cluster.op-version                      70000
cluster.max-op-version                  70000
cluster.brick-multiplex                 on
cluster.max-bricks-per-process          250
glusterd.vol_count_per_thread           100
cluster.daemon-log-level                INFO

[root@rhel7-node1 ~]# rpm -qa | grep -i glusterfs
glusterfs-client-xlators-6.0-45.el7rhgs.x86_64
glusterfs-libs-6.0-45.el7rhgs.x86_64
glusterfs-events-6.0-45.el7rhgs.x86_64
glusterfs-6.0-45.el7rhgs.x86_64
glusterfs-cli-6.0-45.el7rhgs.x86_64
glusterfs-rdma-6.0-45.el7rhgs.x86_64
glusterfs-server-6.0-45.el7rhgs.x86_64
glusterfs-fuse-6.0-45.el7rhgs.x86_64
glusterfs-api-6.0-45.el7rhgs.x86_64

============rhel8============
[root@rhel8-node1 ~]# pgrep glusterfsd
72721
[root@rhel8-node2 ~]# pgrep glusterfsd
1822
[root@rhel8-node3 ~]# pgrep glusterfsd
70504

[root@rhel8-node1 ~]# rpm -qa | grep -i glusterfs
glusterfs-6.0-45.el8rhgs.x86_64
glusterfs-fuse-6.0-45.el8rhgs.x86_64
glusterfs-api-6.0-45.el8rhgs.x86_64
glusterfs-selinux-1.0-1.el8rhgs.noarch
glusterfs-client-xlators-6.0-45.el8rhgs.x86_64
glusterfs-server-6.0-45.el8rhgs.x86_64
glusterfs-cli-6.0-45.el8rhgs.x86_64
glusterfs-libs-6.0-45.el8rhgs.x86_64

[root@rhel8-node1 ~]# gluster v get all all
Option                                  Value
------                                  -----
cluster.server-quorum-ratio             51%
cluster.enable-shared-storage           disable
cluster.op-version                      70000
cluster.max-op-version                  70000
cluster.brick-multiplex                 on
cluster.max-bricks-per-process          250
glusterd.vol_count_per_thread           100
cluster.daemon-log-level                DEBUG

As I see only one pid on each node, marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5603