Bug 1444861

Summary: Brick Multiplexing: bricks of volume going offline possibly because the brick PID is associated with another volume which was brought down

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: core
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: unspecified
Status: CLOSED ERRATA
Whiteboard: brick-multiplexing
Type: Bug
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Mohit Agrawal <moagrawa>
QA Contact: Nag Pavan Chilakam <nchilaka>
CC: amukherj, nchilaka, rhs-bugs, storage-qa-internal
Target Release: RHGS 3.3.0
Fixed In Version: glusterfs-3.8.4-25
Doc Type: If docs needed, set a value
Last Closed: 2017-09-21 04:39:40 UTC
Bug Depends On: 1450630
Bug Blocks: 1417151

Description Nag Pavan Chilakam 2017-04-24 12:47:01 UTC
Description of problem:
=======================
I am able to recreate a scenario where the bricks of one volume can go down if a second volume points to the same socket file and PID file and that second volume is brought down, even after the first volume has been assigned a different PID following a volume configuration change.
The steps to reproduce below will help explain the situation.

Version-Release number of selected component (if applicable):
====
3.8.4-23

How reproducible:
==============
2/2


Steps to Reproduce:
=====================
1. Have a 6-node setup with multiple bricks; DO NOT enable brick multiplexing yet.
2. Create a 2x2 volume, say v1, on n3..n6 and start it.
3. Create another volume with the same layout, say v3, and start it (note that I am creating v3 before v2).
4. Again create another volume, v2, with the same layout and start it.
5. Now stop v1 and delete it.
6. Enable brick multiplexing.
7. Create a volume v4 with the same configuration as the previous volumes and start it.
8. Enable USS on v2.
9. Stop v2 and then restart it.
10. Stop v4; it can be seen that even the v2 bricks go down (possibly because the PID file and socket file of v4 were pointing to the old v2 details; see the diagnostic sketch after these steps).
11. Check the status of all volumes; it can be seen that v2 is still down and hence the volume cannot be mounted.
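
For reference, a minimal diagnostic sketch for steps 10-11, run on one of the brick nodes. The pidfile path and the client mount point are assumptions (they can differ across builds and setups); the volume names match the ones above:

# PIDs reported for the v2 and v4 bricks; under multiplexing they can belong
# to the same glusterfsd process.
gluster volume status v2 | grep Brick
gluster volume status v4 | grep Brick

# Brick pidfiles kept per volume by glusterd (path is an assumption for this build).
cat /var/lib/glusterd/vols/v2/run/*.pid
cat /var/lib/glusterd/vols/v4/run/*.pid

# glusterfsd processes actually running on the node.
ps -ef | grep '[g]lusterfsd'

# From a client, mounting v2 fails while its bricks are down
# (server address and mount point are placeholders).
mount -t glusterfs 10.70.35.122:/v2 /mnt/v2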



################### pasting exact commands############
 1134  gluster peer status
 1135  history
 1136  '
 1137  cd ~
 1138  ls
 1139  gluster v create v1 rep 2 10.70.35.122:/rhs/brick1/v1 10.70.35.23:/rhs/brick1/v1 10.70.35.112:/rhs/brick1/v1 10.70.35.138:/rhs/brick1/v1
 1140  gluster v start v1
 1141  gluster v get all all
 1142  gluster v status v1
 1143  gluster v create v2 rep 2 10.70.35.122:/rhs/brick2/v2 10.70.35.23:/rhs/brick2/v2 10.70.35.112:/rhs/brick2/v2 10.70.35.138:/rhs/brick2/v2
 1144  gluster v create v3 rep 2 10.70.35.122:/rhs/brick3/v3 10.70.35.23:/rhs/brick3/v3 10.70.35.112:/rhs/brick3/v3 10.70.35.138:/rhs/brick3/v3
 1145  gluster v start v3
 1146  gluster  v start v2
 1147  gluster v status 
 1148  clear
 1149  gluster v status 
 1150   gluster v stop v1
 1151  gluster v dele v1
 1152  gluster v status
 1153  gluster v set all  cluster.brick-multiplex enable
 1154  gluster v create v4 rep 2 10.70.35.122:/rhs/brick4/v4 10.70.35.23:/rhs/brick4/v4 10.70.35.112:/rhs/brick4/v4 10.70.35.138:/rhs/brick4/v4
 1155  gluster v start v4
 1156  gluster v status
 1157  gluster v set v2 features.uss enable
 1158  gluster v stop v2
 1159  gluster v start v2
 1160  gluster v status v2
 1161  gluster v status v4
 1162  gluster v status v2
 1163  gluster v status v4
 1164  gluster v status v2
 1165  gluster v status v1
 1166  gluster v status v2
 1167  gluster v status v3
 1168  gluster v status v4
 1169  gluster v stop v4
 1170  gluster v status v2
 1171  history
 1172  gluster v status v4
 1173  gluster v status v2
 1174  gluster v start v4
 1175  gluster v status v4
 1176  gluster v status v2
 1177  history|grep gluster
 1178  history


[root@dhcp35-45 ~]# gluster v info
 
Volume Name: v2
Type: Distributed-Replicate
Volume ID: 02261f5c-b7df-4dbb-86ce-6419efd93152
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.122:/rhs/brick2/v2
Brick2: 10.70.35.23:/rhs/brick2/v2
Brick3: 10.70.35.112:/rhs/brick2/v2
Brick4: 10.70.35.138:/rhs/brick2/v2
Options Reconfigured:
features.uss: enable
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
 
Volume Name: v3
Type: Distributed-Replicate
Volume ID: 8fb3daca-03ff-4022-ba2f-b475231fdcce
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.122:/rhs/brick3/v3
Brick2: 10.70.35.23:/rhs/brick3/v3
Brick3: 10.70.35.112:/rhs/brick3/v3
Brick4: 10.70.35.138:/rhs/brick3/v3
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
 
Volume Name: v4
Type: Distributed-Replicate
Volume ID: c5477eda-eaea-474a-b1ee-a55dee58c461
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.122:/rhs/brick4/v4
Brick2: 10.70.35.23:/rhs/brick4/v4
Brick3: 10.70.35.112:/rhs/brick4/v4
Brick4: 10.70.35.138:/rhs/brick4/v4
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp35-45 ~]# gluster v status
Status of volume: v2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.122:/rhs/brick2/v2           N/A       N/A        N       N/A  
Brick 10.70.35.23:/rhs/brick2/v2            N/A       N/A        N       N/A  
Brick 10.70.35.112:/rhs/brick2/v2           N/A       N/A        N       N/A  
Brick 10.70.35.138:/rhs/brick2/v2           N/A       N/A        N       N/A  
Snapshot Daemon on localhost                49152     0          Y       23875
Self-heal Daemon on localhost               N/A       N/A        Y       24303
Snapshot Daemon on 10.70.35.130             49152     0          Y       12063
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       12312
Snapshot Daemon on 10.70.35.112             49155     0          Y       31066
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       31328
Snapshot Daemon on 10.70.35.23              49155     0          Y       31262
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       31523
Snapshot Daemon on 10.70.35.138             49155     0          Y       11405
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       11667
Snapshot Daemon on 10.70.35.122             49155     0          Y       13063
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       13324
 
Task Status of Volume v2
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: v3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.122:/rhs/brick3/v3           49153     0          Y       12743
Brick 10.70.35.23:/rhs/brick3/v3            49153     0          Y       30943
Brick 10.70.35.112:/rhs/brick3/v3           49153     0          Y       30745
Brick 10.70.35.138:/rhs/brick3/v3           49153     0          Y       11084
Self-heal Daemon on localhost               N/A       N/A        Y       24303
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       12312
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       31523
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       13324
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       31328
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       11667
 
Task Status of Volume v3
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: v4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.122:/rhs/brick4/v4           49153     0          Y       12743
Brick 10.70.35.23:/rhs/brick4/v4            49153     0          Y       30943
Brick 10.70.35.112:/rhs/brick4/v4           49153     0          Y       30745
Brick 10.70.35.138:/rhs/brick4/v4           49153     0          Y       11084
Self-heal Daemon on localhost               N/A       N/A        Y       24303
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       12312
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       31328
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       31523
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       13324
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       11667
 
Task Status of Volume v4
------------------------------------------------------------------------------
There are no active volume tasks

Comment 2 Nag Pavan Chilakam 2017-04-24 13:33:53 UTC
sosreports http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1444861/

Comment 5 Atin Mukherjee 2017-05-04 13:04:42 UTC
upstream patch : https://review.gluster.org/#/c/17101/

Comment 8 Nag Pavan Chilakam 2017-05-15 09:33:08 UTC
QA validation:
Moving to failed_qa
If I bring down a brick of one volume, all the bricks attached to the same glusterfsd process still get disconnected.

I did the below steps:
1) Have a cluster with brick multiplexing enabled.
2) Created 10 volumes of type 1x3.
3) Brought down b1 of vol7 (by unmounting its LV).
4) Mounted vol7, vol1 (the base volume), and vol3 (any other volume).
5) Did I/O to all of the above volumes.
==> You will see that all the bricks associated with the same glusterfsd process as b1 of vol7 do not receive any I/O, effectively losing brick availability.

You can also check the heal info for the volume; it will show files pending heal.

I also checked the backend bricks; a rough sketch of these checks follows below.
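
A rough sketch of the checks behind this comment; vol7, vol1, and vol3 are the names used above, and the LV device path is a placeholder for whatever backs b1 of vol7:

# Take b1 of vol7 down ungracefully by unmounting its backing LV
# (device path is a placeholder).
umount -l /dev/mapper/RHS_vg7-lv_brick7

# Compare the PIDs reported for the bricks of the affected volumes; bricks
# multiplexed into the same glusterfsd as b1 of vol7 stop receiving I/O.
gluster volume status vol7
gluster volume status vol1
gluster volume status vol3

# Pending heals accumulate against the affected bricks.
gluster volume heal vol7 info
gluster volume heal vol1 info
gluster volume heal vol3 info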


test version
====
3.8.4-25

Comment 9 Mohit Agrawal 2017-05-15 09:46:17 UTC
Nag,

 This is a known issue, and this scenario is currently not handled completely. The issue occurs only when a brick goes down in an ungraceful manner, whereas in this bugzilla the brick was earlier brought down gracefully (through the CLI).

 So please verify this bugzilla by following the same procedure you mentioned in comment 1.

 A fix to handle this specific kind of scenario is in progress in the patch below:
  https://review.gluster.org/17287

Regards
Mohit Agrawal
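
A short sketch of the distinction above, with placeholders; the graceful path goes through the CLI, while the ungraceful one bypasses glusterd entirely:

# Graceful: the brick goes down via glusterd, as in the original report.
gluster volume stop v4

# Ungraceful: the brick process dies without glusterd's involvement, as in the
# QA scenario from comment 8. The PID is a placeholder taken from the Pid
# column of "gluster volume status".
kill -9 <glusterfsd-pid>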

Comment 10 Atin Mukherjee 2017-05-15 12:39:16 UTC
I agree with Mohit. The steps that were followed to file this bug and the steps that were followed to verify it are different. Please follow the same steps and reconfirm.

Comment 11 Nag Pavan Chilakam 2017-06-13 09:36:27 UTC
I cannot verify this until BZ#1450630 is fixed

Comment 12 Mohit Agrawal 2017-06-13 09:54:27 UTC
The patch for BZ#1450630 has already been merged downstream through this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1450806


Regards
Mohit Agrawal

Comment 13 Atin Mukherjee 2017-06-13 09:59:17 UTC
Mohit - the current build doesn't have the fix, so Nag's comment is valid.
Nag - as this bug has been moved to MODIFIED state, expect this fix to land in the next build.

Comment 14 Nag Pavan Chilakam 2017-07-17 12:00:27 UTC
On_qa validation:
3.8.4-33 is the test build

Ran both of the cases mentioned in:
1) the description
2) comment#8

Not seeing the issue anymore, hence moving to Verified.

Comment 16 errata-xmlrpc 2017-09-21 04:39:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774