Bug 1450813 - Brick Multiplexing: heal info shows brick as online even when it is brought down
Summary: Brick Multiplexing: heal info shows brick as online even when it is brought down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Atin Mukherjee
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: brick-multiplexing
Depends On: 1450630 1458570
Blocks: 1417151
 
Reported: 2017-05-15 08:45 UTC by Nag Pavan Chilakam
Modified: 2017-09-21 04:41 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.8.4-26
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:41:45 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2017:2774 (normal, SHIPPED_LIVE): glusterfs bug fix and enhancement update - 2017-09-21 08:16:29 UTC

Description Nag Pavan Chilakam 2017-05-15 08:45:33 UTC
Description of problem:
====================
With brick multiplexing enabled, if we bring down a brick by unmounting its LV, heal info still shows the brick as online (Status: Connected) instead of reporting a transport endpoint error.

In the example below, the first brick is offline, yet heal info reports it as follows:


[root@dhcp35-45 ~]# time gluster v heal test3_9 info
Brick 10.70.35.45:/rhs/brick9/test3_9
Status: Connected
Number of entries: 0

Brick 10.70.35.130:/rhs/brick9/test3_9
/ 
Status: Connected
Number of entries: 1

Brick 10.70.35.122:/rhs/brick9/test3_9
/ 
Status: Connected
Number of entries: 1





Note: the root cause could be the same as bug 1450806 - Brick Multiplexing: Brick process shows as online in vol status even when brick is offline.
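
For reference, a minimal reproduction sketch (hypothetical: it assumes a replica 3 volume whose bricks sit on separately mounted LVs under /rhs/brick9 and that brick multiplexing is already turned on; volume names, hosts and paths are illustrative, not taken verbatim from this setup):

# enable brick multiplexing cluster-wide
gluster volume set all cluster.brick-multiplex on
gluster volume create test3_9 replica 3 \
    10.70.35.45:/rhs/brick9/test3_9 \
    10.70.35.130:/rhs/brick9/test3_9 \
    10.70.35.122:/rhs/brick9/test3_9
gluster volume start test3_9
# on 10.70.35.45, pull the LV out from under the first brick
umount -l /rhs/brick9
# expected: heal info reports "Transport endpoint is not connected" for that brick;
# observed before the fix: it still reports "Status: Connected"
gluster volume heal test3_9 info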

Comment 3 Atin Mukherjee 2017-05-16 02:03:12 UTC
upstream patch : https://review.gluster.org/17287

Comment 4 Atin Mukherjee 2017-05-16 04:26:56 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/106263

Comment 9 Nag Pavan Chilakam 2017-06-07 14:29:36 UTC
On_QA validation: 3.8.4-27
I now see the transport endpoint error when the brick is down, and it is reported only on the volume whose brick was brought down (and on all associated volumes whose bricks share the same brick PID), but not on volumes that do not share that brick PID.


[root@dhcp35-45 ~]# gluster v heal test3_31
Launching heal operation to perform index self heal on volume test3_31 has been unsuccessful on bricks that are down. Please check if all brick processes are running.
[root@dhcp35-45 ~]# gluster v heal test3_31 info
Brick 10.70.35.45:/rhs/brick31/test3_31
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.35.130:/rhs/brick31/test3_31
Status: Connected
Number of entries: 0

Brick 10.70.35.122:/rhs/brick31/test3_31
Status: Connected
Number of entries: 0

[root@dhcp35-45 ~]# gluster v status
Status of volume: test3_31
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick31/test3_31     N/A       N/A        N       N/A  
Brick 10.70.35.130:/rhs/brick31/test3_31    49152     0          Y       30495
Brick 10.70.35.122:/rhs/brick31/test3_31    49152     0          Y       14828
Self-heal Daemon on localhost               N/A       N/A        Y       27795
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       26963
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       807  
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       17576
 
Task Status of Volume test3_31
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: test3_32
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick32/test3_32     N/A       N/A        N       N/A  
Brick 10.70.35.130:/rhs/brick32/test3_32    49152     0          Y       30495
Brick 10.70.35.122:/rhs/brick32/test3_32    49152     0          Y       14828
Self-heal Daemon on localhost               N/A       N/A        Y       27795
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       26963
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       807  
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       17576
 


Hence, moving to Verified.
3.8.4-27 is the test version on the EL 7.4 beta.
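
A rough verification sketch of the checks above (volume names are the ones from the output; whether a sibling volume actually shares the brick PID depends on how multiplexing grouped the bricks, so the expected outputs below are assumptions, not captured results):

# with brick mux, bricks of compatible volumes run in one process; the shared
# PID shows up in the Pid column of volume status
gluster volume status | grep '^Brick'
# the volume whose brick was brought down, and every volume whose brick shared
# that PID on the same node, should report the disconnect
gluster volume heal test3_31 info    # expect "Transport endpoint is not connected"
gluster volume heal test3_32 info    # same brick PID on this node -> same error expected
# volumes whose bricks run in a different process should still report Connected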

Comment 11 errata-xmlrpc 2017-09-21 04:41:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

