Description of problem:
====================
With brick mux enabled, if we bring down a brick using umount of the LV, heal info shows the brick as online ("Status: Connected") instead of reporting a transport endpoint error.

In the case below the first brick is offline, but heal info shows:

[root@dhcp35-45 ~]# time gluster v heal test3_9 info
Brick 10.70.35.45:/rhs/brick9/test3_9
Status: Connected
Number of entries: 0

Brick 10.70.35.130:/rhs/brick9/test3_9
/
Status: Connected
Number of entries: 1

Brick 10.70.35.122:/rhs/brick9/test3_9
/
Status: Connected
Number of entries: 1

Note: the root cause could be the same as bug 1450806 - Brick Multiplexing: Brick process shows as online in vol status even when brick is offline.
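The mismatch above can be checked mechanically. Below is a minimal sketch (not part of the original report) that parses the `gluster v heal <vol> info` output format quoted above into per-brick statuses; against the captured output it reports the first brick as "Connected" even though that brick is actually down, which is the bug:

```python
def parse_heal_info(text):
    """Parse `gluster v heal <vol> info` output into {brick: status}."""
    statuses = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Status: ") and brick:
            statuses[brick] = line[len("Status: "):]
    return statuses

# Output captured in this report: brick 10.70.35.45 is actually offline,
# yet heal info still reports it as Connected.
sample = """\
Brick 10.70.35.45:/rhs/brick9/test3_9
Status: Connected
Number of entries: 0

Brick 10.70.35.130:/rhs/brick9/test3_9
/
Status: Connected
Number of entries: 1
"""
print(parse_heal_info(sample))
```

The expected behavior (verified later in this report) is "Transport endpoint is not connected" for the downed brick.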
upstream patch : https://review.gluster.org/17287
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/106263
on_qa validation on 3.8.4-27 (test version on el7.4 beta):

I am now seeing the "Transport endpoint is not connected" error when the brick is down. It is seen only on the volume whose brick was brought down, and on all associated volumes whose bricks share the same PID, but not on volumes that don't share the brick PID.

[root@dhcp35-45 ~]# gluster v heal test3_31
Launching heal operation to perform index self heal on volume test3_31 has been unsuccessful on bricks that are down. Please check if all brick processes are running.

[root@dhcp35-45 ~]# gluster v heal test3_31 info
Brick 10.70.35.45:/rhs/brick31/test3_31
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.35.130:/rhs/brick31/test3_31
Status: Connected
Number of entries: 0

Brick 10.70.35.122:/rhs/brick31/test3_31
Status: Connected
Number of entries: 0

[root@dhcp35-45 ~]# gluster v status
Status of volume: test3_31
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick31/test3_31     N/A       N/A        N       N/A
Brick 10.70.35.130:/rhs/brick31/test3_31    49152     0          Y       30495
Brick 10.70.35.122:/rhs/brick31/test3_31    49152     0          Y       14828
Self-heal Daemon on localhost               N/A       N/A        Y       27795
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       26963
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       807
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       17576

Task Status of Volume test3_31
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: test3_32
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick32/test3_32     N/A       N/A        N       N/A
Brick 10.70.35.130:/rhs/brick32/test3_32    49152     0          Y       30495
Brick 10.70.35.122:/rhs/brick32/test3_32    49152     0          Y       14828
Self-heal Daemon on localhost               N/A       N/A        Y       27795
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       26963
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       807
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       17576

Hence moving to VERIFIED.
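For completeness, the brick rows of the `gluster v status` output above can also be checked programmatically. This is an illustrative sketch (not part of the original report) that reads the Online column (Y/N) for each brick line, matching the column layout shown above; after the fix, the downed brick is reported offline in both heal info and vol status:

```python
def parse_vol_status(text):
    """Extract {brick: online?} from the brick rows of `gluster v status` output."""
    bricks = {}
    for line in text.splitlines():
        if line.startswith("Brick "):
            # Columns: Brick <host:path> <TCP port> <RDMA port> <Online Y/N> <Pid>
            parts = line.split()
            bricks[parts[1]] = parts[4] == "Y"
    return bricks

# Brick rows captured in this report for volume test3_31.
sample = """\
Brick 10.70.35.45:/rhs/brick31/test3_31     N/A       N/A        N       N/A
Brick 10.70.35.130:/rhs/brick31/test3_31    49152     0          Y       30495
Brick 10.70.35.122:/rhs/brick31/test3_31    49152     0          Y       14828
"""
print(parse_vol_status(sample))
```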
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774