Bug 1436595 - Brick Multiplexing: A brick going down will result in all the bricks sharing the same PID going down
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Jeff Darcy
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-28 09:19 UTC by Nag Pavan Chilakam
Modified: 2018-06-20 18:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:28:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nag Pavan Chilakam 2017-03-28 09:19:38 UTC
Description of problem:
=======================
I unmounted one of the LVs hosting a gluster brick; this resulted in all bricks on that node going offline.
This is a very serious issue, given that a brick can go offline for various reasons (for example an XFS corruption or a disk failure), but such a failure should stay isolated to that brick instead of bringing down all the other bricks. Note that I am NOT killing the brick's PID.
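For context, a minimal way to observe the shared PID on a node (generic commands; nothing here is specific to this setup):

pgrep -af glusterfsd     # with brick multiplexing on, expect a single brick process per node
gluster volume status    # the Pid column shows that same PID for every local brick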
I had a 3-node setup, with each node having 4 thin LVs used to host gluster bricks;
say the LVs are mounted on /rhs/brick{1..4}.
Brick multiplexing is enabled.
I created 3 volumes as below (a sketch of the equivalent commands follows the layout):
v1 -> 1x2 -> n1:b1 n2:b1
v2 -> 2x2 -> n1:b2 n2:b2 n1:b3 n2:b3
v3 -> 1x3 -> n1:b4 n2:b4 n3:b4
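A rough sketch of the commands behind this setup (hostnames n1/n2/n3, volume names v1/v2/v3 and the brick sub-directories are placeholders; depending on layout warnings the create commands may need "force"):

gluster volume set all cluster.brick-multiplex on   # enable brick multiplexing cluster-wide
gluster volume create v1 replica 2 n1:/rhs/brick1/v1 n2:/rhs/brick1/v1
gluster volume create v2 replica 2 n1:/rhs/brick2/v2 n2:/rhs/brick2/v2 n1:/rhs/brick3/v2 n2:/rhs/brick3/v2
gluster volume create v3 replica 3 n1:/rhs/brick4/v3 n2:/rhs/brick4/v3 n3:/rhs/brick4/v3
gluster volume start v1 && gluster volume start v2 && gluster volume start v3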

Now I unmounted b4 using "umount -l", so as to bring only one brick of v3 offline.
This resulted in all the bricks on node1 going offline, as below:


[root@dhcp35-192 bricks]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick2/distrep      N/A       N/A        N       N/A  
Brick 10.70.35.214:/rhs/brick2/distrep      49154     0          Y       20321
Brick 10.70.35.192:/rhs/brick3/distrep      N/A       N/A        N       N/A  
Brick 10.70.35.215:/rhs/brick3/distrep      49154     0          Y       13393
Self-heal Daemon on localhost               N/A       N/A        Y       6007 
Self-heal Daemon on 10.70.35.214            N/A       N/A        Y       20583
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       13643
 
Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: spencer
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick4/spencer      N/A       N/A        N       N/A  
Brick 10.70.35.214:/rhs/brick4/spencer      49154     0          Y       20321
Brick 10.70.35.215:/rhs/brick4/spencer      49154     0          Y       13393
Self-heal Daemon on localhost               N/A       N/A        Y       6007 
Self-heal Daemon on 10.70.35.214            N/A       N/A        Y       20583
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       13643
 
Task Status of Volume spencer
------------------------------------------------------------------------------
There are no active volume tasks


Note that I did an umount of brick3 ("umount -l /rhs/brick3"), which was being used by the distrep volume for the second dht-subvol.

Version-Release number of selected component (if applicable):
[root@dhcp35-192 bricks]# rpm -qa|grep glust
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
glusterfs-debuginfo-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64


How reproducible:



Steps to Reproduce:
1. Have a setup of 2 or more nodes with multiple disks (or LVs) to use as bricks.
2. Create 2 or more volumes of any type such that each node hosts at least one brick of each volume. Make sure that no two bricks are hosted on the same path (i.e. the same LV or physical device).
3. Now bring down one brick by taking its disk down or unmounting its LV (see the sketch below).
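
A sketch of the trigger and the follow-up check, using the path and node IP from this report (the grep pattern is only illustrative):

umount -l /rhs/brick3                        # take away the backing LV of a single brick
gluster volume status | grep 10.70.35.192    # every brick on that node now reports Online "N" and Pid "N/A"
pgrep -af glusterfsd                         # check whether the shared brick process is still running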


Actual results:
=======
All the bricks on that node that are associated with the same PID go down.

Expected results:
===========
A single brick going down should not result in all the other bricks going down.

Comment 1 Shyamsundar 2018-06-20 18:28:42 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.

