Bug 1471794

Summary: Brick Multiplexing: Different brick processes pointing to same socket, process file and volfile-id
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED WONTFIX
QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent
Docs Contact:
Priority: medium
Version: rhgs-3.3
CC: amukherj, atumball, nchilaka, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: brick-multiplexing
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-13 12:47:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Nag Pavan Chilakam 2017-07-17 12:54:32 UTC
Description of problem:
===========
I am raising a separate bug to track the issue I originally mentioned in BZ#1444086 - "Brick Multiplexing: Different brick processes pointing to same socket, process file and volfile-id must not lead to IO loss when one of the volume is down".

I see that we can end up with different glusterfsd processes pointing to the same socket and volfiles.

The same problem is seen even on 3.8.4-33.

The reason I am raising a separate bug is the discussion in BZ#1444086 (comments 11-14).

Description of problem:
=========================
With brick multiplexing it is quite easy to end up with two brick processes (glusterfsd) pointing to the same socket and volfile-id.

While I still need to understand the implications (which I suspect can be severe), this problem is consistently reproducible.
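
A quick way to confirm that multiplexing is actually in effect on a node is to compare the Pid column of gluster volume status across volumes (rough check; the volume names below are just placeholders):

# bricks of compatible volumes should report the same Pid when multiplexed
gluster volume status v1 | grep ^Brick
gluster volume status v2 | grep ^Brick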

Version-Release number of selected component (if applicable):
=====================
3.8.4-22 to 3.8.4-33 (all builds)

How reproducible:
======
2/2

Steps to Reproduce:
1. Have a gluster setup (I have 6 nodes) with brick multiplexing enabled
2. Create a volume, say v1, which is 1x3 spanning n1, n2 and n3
Now the glusterfsd process will look something like this when checked on node n1:

root     20014     1  0 19:22 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.23 --volfile-id cross31.10.70.35.23.rhs-brick10-cross31 -p /var/lib/glusterd/vols/cross31/run/10.70.35.23-rhs-brick10-cross31.pid -S /var/lib/glusterd/vols/cross31/run/daemon-10.70.35.23.socket --brick-name /rhs/brick10/cross31 -l /var/log/glusterfs/bricks/rhs-brick10-cross31.log --xlator-option *-posix.glusterd-uuid=2b1a4ca7-5c9b-4169-add4-23530cea101a --brick-port 49153 --xlator-option cross31-server.listen-port=49153

3. Now create another 1x3 volume, say v2; its bricks will be attached to the same PIDs

4. Now enable USS on v1 (or any option that will cause v1 to get new brick PIDs on a restart)
5. Now stop and start v1 (a minimal CLI sketch of these steps follows below)
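
For reference, a minimal CLI sketch of the above steps (hostnames n1/n2/n3 and brick paths are placeholders, not my actual setup; the real ps output in this bug uses volume cross31):

# step 1: enable brick multiplexing cluster-wide
gluster volume set all cluster.brick-multiplex on

# step 2: first 1x3 volume
gluster volume create v1 replica 3 n1:/rhs/brick10/v1 n2:/rhs/brick10/v1 n3:/rhs/brick10/v1
gluster volume start v1

# step 3: second 1x3 volume; its bricks attach to the same glusterfsd PIDs
gluster volume create v2 replica 3 n1:/rhs/brick11/v2 n2:/rhs/brick11/v2 n3:/rhs/brick11/v2
gluster volume start v2

# step 4: enable USS on v1 (any option that forces new brick PIDs on restart will do)
gluster volume set v1 features.uss enable

# step 5: restart v1
gluster volume stop v1
gluster volume start v1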


Actual results:
==========
It can be seen that a new PID is spawned for the bricks (because a volume option changed, they avoid attaching to the first PID).

The problem is that the new PID is connected to the same socket as the first PID, and so are the volfile-id and log file, as shown below:



[root@dhcp35-23 3.8.4-22]# ps -ef|grep glusterfsd
===>old PID
root     20014     1  0 19:22 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.23 --volfile-id cross31.10.70.35.23.rhs-brick10-cross31 -p /var/lib/glusterd/vols/cross31/run/10.70.35.23-rhs-brick10-cross31.pid -S /var/lib/glusterd/vols/cross31/run/daemon-10.70.35.23.socket --brick-name /rhs/brick10/cross31 -l /var/log/glusterfs/bricks/rhs-brick10-cross31.log --xlator-option *-posix.glusterd-uuid=2b1a4ca7-5c9b-4169-add4-23530cea101a --brick-port 49153 --xlator-option cross31-server.listen-port=49153

==>new PID
root     20320     1  0 19:27 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.23 --volfile-id cross31.10.70.35.23.rhs-brick10-cross31 -p /var/lib/glusterd/vols/cross31/run/10.70.35.23-rhs-brick10-cross31.pid -S /var/lib/glusterd/vols/cross31/run/daemon-10.70.35.23.socket --brick-name /rhs/brick10/cross31 -l /var/log/glusterfs/bricks/rhs-brick10-cross31.log --xlator-option *-posix.glusterd-uuid=2b1a4ca7-5c9b-4169-add4-23530cea101a --brick-port 49152 --xlator-option cross31-server.listen-port=49152
root     20340     1  0 19:27 ?        00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id snapd/cross31 -p /var/lib/glusterd/vols/cross31/run/cross31-snapd.pid -l /var/log/glusterfs/snaps/cross31/snapd.log --brick-name snapd-cross31 -S /var/run/gluster/d451ea3d83a68af025cee105cafdd8a2.socket --brick-port 49154 --xlator-option cross31-server.listen-port=49154 --no-mem-accounting
root     20472 30155  0 19:38 pts/0    00:00:00 grep --color=auto glusterfs
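
A quick way to spot the collision in output like the above (rough sketch; plain grep, nothing gluster-specific):

# pull out only the socket and volfile-id arguments of each glusterfsd
# (the [g] trick keeps grep from matching its own process)
ps -ef | grep '[g]lusterfsd' | grep -o -- '-S [^ ]*'
ps -ef | grep '[g]lusterfsd' | grep -o -- '--volfile-id [^ ]*'
# both brick processes (20014 and 20320) report the same socket path and the same
# volfile-id; only --brick-port / listen-port differ (49153 vs 49152)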