Description of problem:
========================
When you reboot a node with brick mux enabled and a multi-volume setup, many glusterfsd processes are spawned, and hence we lose the brick mux feature.

Version-Release number of selected component (if applicable):
========
3.8.4-25

How reproducible:
========
Always

Steps to Reproduce:
1. Have a 3-node setup with brick mux enabled and volumes, say v1..v10, with each volume being a 1x3 and one brick per node (all independent LVs). (A CLI sketch of this setup follows the output below.)
2. We can see that only one glusterfsd per node exists.
3. Now reboot node1.
4. On successful reboot, the status is as follows:

Last login: Mon May 15 15:56:55 2017 from dhcp35-77.lab.eng.blr.redhat.com
[root@dhcp35-45 ~]# ps -ef|grep glusterfsd
root 4693 1 42 15:56 ? 00:02:07 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 1.10.70.35.45.rhs-brick1-1 -p /var/lib/glusterd/vols/1/run/10.70.35.45-rhs-brick1-1.pid -S /var/run/gluster/a19832cf9844ad10112aba39eba569a6.socket --brick-name /rhs/brick1/1 -l /var/log/glusterfs/bricks/rhs-brick1-1.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49152 --xlator-option 1-server.listen-port=49152
root 4701 1 0 15:56 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 10.10.70.35.45.rhs-brick10-10 -p /var/lib/glusterd/vols/10/run/10.70.35.45-rhs-brick10-10.pid -S /var/run/gluster/fd40f022ab677d36e57793a60cc16166.socket --brick-name /rhs/brick10/10 -l /var/log/glusterfs/bricks/rhs-brick10-10.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49153 --xlator-option 10-server.listen-port=49153
root 4709 1 0 15:56 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 2.10.70.35.45.rhs-brick2-2 -p /var/lib/glusterd/vols/2/run/10.70.35.45-rhs-brick2-2.pid -S /var/run/gluster/898f4e556d871cfb1613d6ff121bd5e6.socket --brick-name /rhs/brick2/2 -l /var/log/glusterfs/bricks/rhs-brick2-2.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49154 --xlator-option 2-server.listen-port=49154
root 4719 1 0 15:56 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 3.10.70.35.45.rhs-brick3-3 -p /var/lib/glusterd/vols/3/run/10.70.35.45-rhs-brick3-3.pid -S /var/run/gluster/af3354d92921146c0e8d3bebdcbec907.socket --brick-name /rhs/brick3/3 -l /var/log/glusterfs/bricks/rhs-brick3-3.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49155 --xlator-option 3-server.listen-port=49155
root 4728 1 44 15:56 ? 00:02:13 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 4.10.70.35.45.rhs-brick4-4 -p /var/lib/glusterd/vols/4/run/10.70.35.45-rhs-brick4-4.pid -S /var/run/gluster/cafb15e7ed1d462ddf513e7cf80ca718.socket --brick-name /rhs/brick4/4 -l /var/log/glusterfs/bricks/rhs-brick4-4.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49156 --xlator-option 4-server.listen-port=49156
root 4734 1 0 15:56 ? 00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 5.10.70.35.45.rhs-brick5-5 -p /var/lib/glusterd/vols/5/run/10.70.35.45-rhs-brick5-5.pid -S /var/run/gluster/5a92ed518f554fe96a3c3f4a1ecf5cb3.socket --brick-name /rhs/brick5/5 -l /var/log/glusterfs/bricks/rhs-brick5-5.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49157 --xlator-option 5-server.listen-port=49157
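For reference, a minimal sketch of steps 1-2. The gluster options and commands used are the standard CLI ones, but the hostnames node1..node3, the volume names v1..v10 and the brick paths are placeholders for whatever the actual setup uses:

# Enable brick multiplexing cluster-wide
gluster volume set all cluster.brick-multiplex on

# Create and start ten 1x3 volumes, one brick per node (paths illustrative)
for i in $(seq 1 10); do
    gluster volume create v$i replica 3 \
        node1:/rhs/brick$i/$i node2:/rhs/brick$i/$i node3:/rhs/brick$i/$i
    gluster volume start v$i
done

# With multiplexing intact there should be exactly one glusterfsd per node
ps -C glusterfsd -o pid= | wc -l

After rebooting node1, re-running the last command on that node should still report a single process; the ps output above shows it does not.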
Upstream patch: https://review.gluster.org/17307
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/106803/
There is one more downstream patch: https://code.engineering.redhat.com/gerrit/#/c/107204/
Validation on 3.8.4-27:
1) Checked the same case with about 40 volumes, all 1x3; on reboot, the brick mux feature is retained (single brick PID) ===> PASS
2) Created a new 1x3 volume with USS enabled; it got a different brick PID as expected. On reboot, all the remaining bricks got one PID, and this volume got a different PID as expected ===> PASS
3) Created a new volume with the same config as the original 40 vols, and one more with USS enabled; they got their respective brick PIDs ===> PASS
4) Able to do IOs post reboot.
Hence marking as verified.
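A rough sketch of the PID checks behind these results, using standard gluster/procps commands (<volname> is a placeholder):

# Count distinct glusterfsd PIDs on this node; with brick mux intact,
# the plain 1x3 volumes should all be served by one process
ps -C glusterfsd -o pid= | sort -u | wc -l

# Per-brick PIDs as reported by glusterd; bricks of the ordinary volumes
# share one PID, while the USS-enabled volume shows its own
gluster volume status all | grep '^Brick'

# Enabling USS on a volume, as in check 2; in these checks such a
# volume gets its own brick PID rather than being multiplexed
gluster volume set <volname> features.uss enable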
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774