Description of problem:
=======================
I unmounted one of the LVs hosting a gluster brick, and this resulted in all
bricks going offline on that node. This is a very serious issue: a brick can
go offline for different reasons, such as an XFS corruption or a disk
failure, but that failure should stay isolated instead of bringing down all
the other bricks. Note that I am NOT killing the brick's PID.

I had a 3-node setup, with each node having 4 thin LVs used to host gluster
bricks; say the LVs are mounted on /rhs/brick{1..4}.
Brick multiplexing is enabled.

I created 3 volumes as below:
v1 -> 1x2 -> n1:b1 n2:b1
v2 -> 2x2 -> n1:b2 n2:b2 n1:b3 n3:b3
v3 -> 1x3 -> n1:b4 n2:b4 n3:b4

Now I unmounted b4 with "umount -l" so as to bring only one brick of v3
offline. This resulted in all the bricks on node1 going offline, as below:

[root@dhcp35-192 bricks]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick2/distrep      N/A       N/A        N       N/A
Brick 10.70.35.214:/rhs/brick2/distrep      49154     0          Y       20321
Brick 10.70.35.192:/rhs/brick3/distrep      N/A       N/A        N       N/A
Brick 10.70.35.215:/rhs/brick3/distrep      49154     0          Y       13393
Self-heal Daemon on localhost               N/A       N/A        Y       6007
Self-heal Daemon on 10.70.35.214            N/A       N/A        Y       20583
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       13643

Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: spencer
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick4/spencer      N/A       N/A        N       N/A
Brick 10.70.35.214:/rhs/brick4/spencer      49154     0          Y       20321
Brick 10.70.35.215:/rhs/brick4/spencer      49154     0          Y       13393
Self-heal Daemon on localhost               N/A       N/A        Y       6007
Self-heal Daemon on 10.70.35.214            N/A       N/A        Y       20583
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       13643

Task Status of Volume spencer
------------------------------------------------------------------------------
There are no active volume tasks

Note that in the output above I had done a umount of brick3
("umount -l /rhs/brick3"), which was being used by the distrep volume for
its second dht-subvol.

Version-Release number of selected component (if applicable):
[root@dhcp35-192 bricks]# rpm -qa | grep glust
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
glusterfs-debuginfo-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64

How reproducible:

Steps to Reproduce:
1. Have a setup of 2 or more nodes with multiple disks (or, say, LVs) to use
   as bricks.
2. Create 2 or more volumes of any type such that each node hosts at least
   one brick of each volume. Make sure that no two bricks are hosted on the
   same path (i.e. the same LV or physical device).
3. Now bring down one brick, either by taking its disk down or by unmounting
   its LV. (A consolidated command sketch follows at the end of this report.)

Actual results:
=======
All bricks on that node go down, i.e. all bricks associated with the same
PID.

Expected results:
===========
Bringing one brick down should not result in all the other bricks going
down.
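Consolidated reproduction sketch (a minimal, unverified reconstruction from
the report): the node names n1/n2/n3 and the volume name "v1" are
placeholders, since the report only names the distrep and spencer volumes;
the brick paths follow the /rhs/brick{1..4} layout described above.

# Enable brick multiplexing cluster-wide (available from glusterfs 3.10).
gluster volume set all cluster.brick-multiplex on

# v1 (placeholder name): 1x2 replicate on brick1 of n1 and n2.
gluster volume create v1 replica 2 n1:/rhs/brick1/v1 n2:/rhs/brick1/v1

# distrep: 2x2 distributed-replicate on brick2 and brick3.
gluster volume create distrep replica 2 \
        n1:/rhs/brick2/distrep n2:/rhs/brick2/distrep \
        n1:/rhs/brick3/distrep n3:/rhs/brick3/distrep

# spencer: 1x3 replicate on brick4 of all three nodes.
gluster volume create spencer replica 3 \
        n1:/rhs/brick4/spencer n2:/rhs/brick4/spencer n3:/rhs/brick4/spencer

gluster volume start v1
gluster volume start distrep
gluster volume start spencer

# Trigger: lazily unmount one brick LV on n1 ...
umount -l /rhs/brick3

# ... then observe that every brick on n1 goes offline, not just that one.
gluster volume status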
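A hedged way to confirm that the failure really is PID-wide, i.e. that all
bricks on the node were multiplexed into a single glusterfsd process; the
grep pattern below assumes the node IPs from the output above:

# Before the umount: every brick on a node should report the same PID in
# "gluster volume status" (e.g. 20321 for all bricks on 10.70.35.214), and
# only one glusterfsd process should be running per node.
gluster volume status | grep 10.70.35.192
pgrep -a glusterfsd

# After "umount -l /rhs/brick3": the same commands should show every brick
# on 10.70.35.192 as Online=N with Pid=N/A, matching the output pasted above.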
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and that the Version field be marked appropriately.