Bug 1450889

Summary: Brick Multiplexing: On reboot of a node Brick multiplexing feature lost on that node as multiple brick processes get spawned
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: core
Assignee: Samikshan Bairagya <sbairagy>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: brick-multiplexing
Fixed In Version: glusterfs-3.8.4-26
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1451248 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:43:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1451248, 1453086, 1453087
Bug Blocks: 1417151

Description Nag Pavan Chilakam 2017-05-15 10:32:20 UTC
Description of problem:
========================
When a node with brick multiplexing enabled and a multi-volume setup is rebooted, many glusterfsd processes are spawned on that node, and the brick multiplexing feature is therefore lost.
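
For context, a minimal sketch of how brick multiplexing is typically enabled and checked, assuming the standard upstream option name applies to this build (not taken from the original report):

# enable brick multiplexing as a cluster-wide (global) option
gluster volume set all cluster.brick-multiplex on

# with multiplexing active, all compatible bricks on a node should be
# served by a single glusterfsd process
ps -ef | grep '[g]lusterfsd'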


Version-Release number of selected component (if applicable):
========
3.8.4-25

How reproducible:
========
always

Steps to Reproduce:
1. Have a 3-node setup with brick multiplexing enabled and volumes, say v1..v10, each 1x3 with one brick per node (all on independent LVs).
2. Observe that only one glusterfsd process exists per node.
3. Reboot node1.
4. After the node comes back up, the status is as follows:



Last login: Mon May 15 15:56:55 2017 from dhcp35-77.lab.eng.blr.redhat.com
[root@dhcp35-45 ~]# ps -ef|grep glusterfsd
root      4693     1 42 15:56 ?        00:02:07 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 1.10.70.35.45.rhs-brick1-1 -p /var/lib/glusterd/vols/1/run/10.70.35.45-rhs-brick1-1.pid -S /var/run/gluster/a19832cf9844ad10112aba39eba569a6.socket --brick-name /rhs/brick1/1 -l /var/log/glusterfs/bricks/rhs-brick1-1.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49152 --xlator-option 1-server.listen-port=49152
root      4701     1  0 15:56 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 10.10.70.35.45.rhs-brick10-10 -p /var/lib/glusterd/vols/10/run/10.70.35.45-rhs-brick10-10.pid -S /var/run/gluster/fd40f022ab677d36e57793a60cc16166.socket --brick-name /rhs/brick10/10 -l /var/log/glusterfs/bricks/rhs-brick10-10.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49153 --xlator-option 10-server.listen-port=49153
root      4709     1  0 15:56 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 2.10.70.35.45.rhs-brick2-2 -p /var/lib/glusterd/vols/2/run/10.70.35.45-rhs-brick2-2.pid -S /var/run/gluster/898f4e556d871cfb1613d6ff121bd5e6.socket --brick-name /rhs/brick2/2 -l /var/log/glusterfs/bricks/rhs-brick2-2.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49154 --xlator-option 2-server.listen-port=49154
root      4719     1  0 15:56 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 3.10.70.35.45.rhs-brick3-3 -p /var/lib/glusterd/vols/3/run/10.70.35.45-rhs-brick3-3.pid -S /var/run/gluster/af3354d92921146c0e8d3bebdcbec907.socket --brick-name /rhs/brick3/3 -l /var/log/glusterfs/bricks/rhs-brick3-3.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49155 --xlator-option 3-server.listen-port=49155
root      4728     1 44 15:56 ?        00:02:13 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 4.10.70.35.45.rhs-brick4-4 -p /var/lib/glusterd/vols/4/run/10.70.35.45-rhs-brick4-4.pid -S /var/run/gluster/cafb15e7ed1d462ddf513e7cf80ca718.socket --brick-name /rhs/brick4/4 -l /var/log/glusterfs/bricks/rhs-brick4-4.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49156 --xlator-option 4-server.listen-port=49156
root      4734     1  0 15:56 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id 5.10.70.35.45.rhs-brick5-5 -p /var/lib/glusterd/vols/5/run/10.70.35.45-rhs-brick5-5.pid -S /var/run/gluster/5a92ed518f554fe96a3c3f4a1ecf5cb3.socket --brick-name /rhs/brick5/5 -l /var/log/glusterfs/bricks/rhs-brick5-5.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49157 --xlator-option 5-server.listen-port=49157
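
A minimal sketch of a quick check for the symptom above (assuming pgrep is available on the node): with brick multiplexing intact the count should be 1; here it matches the number of volumes instead.

# count distinct brick processes on the rebooted node; with brick
# multiplexing working this should print 1, not one per brick
pgrep -xc glusterfsd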

Comment 4 Atin Mukherjee 2017-05-16 11:12:51 UTC
Upstream patch: https://review.gluster.org/17307

Comment 5 Atin Mukherjee 2017-05-22 07:36:17 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/106803/

Comment 6 Atin Mukherjee 2017-05-24 12:18:58 UTC
There is one more downstream patch: https://code.engineering.redhat.com/gerrit/#/c/107204/

Comment 8 Nag Pavan Chilakam 2017-06-10 06:02:27 UTC
Validation on 3.8.4-27:
1) Checked the same case with about 40 volumes, all 1x3; on reboot, the brick multiplexing feature is retained (single brick PID) ===> PASS
2) Created a new 1x3 volume with USS enabled; it got a different brick PID, as expected. On reboot, all the remaining bricks shared one PID and this volume kept its own PID, as expected ===> PASS
3) Created a new volume with the same config as the original 40 volumes, and one more with USS enabled; they got the respective brick PIDs ===> PASS
4) Able to do I/Os post reboot.

Hence marking as passed.
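
A minimal sketch of the kind of per-volume check behind the results above (volume names are illustrative; brick PIDs appear in the Pid column of the volume status output):

# multiplexed volumes on a node should all report the same brick PID,
# while the USS-enabled volume is expected to report a different one
gluster volume status v1
gluster volume status v41    # hypothetical name of the USS-enabled volume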

Comment 10 errata-xmlrpc 2017-09-21 04:43:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774