Description of problem:
When one of the storage nodes of the cluster is down, running the gstatus command does not show the name of the down node in the Status Messages section.

Version-Release number of selected component (if applicable):
[root@localhost ~]# gstatus --version
gstatus 0.64
[root@localhost ~]# rpm -qa | grep glusterfs
glusterfs-api-3.7.1-11.el7rhgs.x86_64
glusterfs-cli-3.7.1-11.el7rhgs.x86_64
glusterfs-libs-3.7.1-11.el7rhgs.x86_64
glusterfs-client-xlators-3.7.1-11.el7rhgs.x86_64
glusterfs-server-3.7.1-11.el7rhgs.x86_64
glusterfs-rdma-3.7.1-11.el7rhgs.x86_64
glusterfs-3.7.1-11.el7rhgs.x86_64
glusterfs-fuse-3.7.1-11.el7rhgs.x86_64
glusterfs-geo-replication-3.7.1-11.el7rhgs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume.
2. Mount the volume as a FUSE mount on a client.
3. Bring down one of the storage nodes and check gstatus, e.g. gstatus -a (a command-line sketch of these steps is given under Additional info below).

Actual results:
The status messages do not show the name of the storage node that is down.

[root@knightandday ~]# gstatus -a

     Product: RHGS Server v3.1   Capacity: 119.00 GiB(raw bricks)
      Status: UNHEALTHY(13)                198.00 MiB(raw used)
   Glusterfs: 3.7.1                         50.00 GiB(usable from volumes)
  OverCommit: Yes                Snapshots:   1

   Nodes       :  2/ 4      Volumes:  0 Up
   Self Heal   :  2/ 4               0 Up(Degraded)
   Bricks      :  6/12               1 Up(Partial)
   Connections :  0/ 0               0 Down

Volume Information
  testvol          UP(PARTIAL) - 6/12 bricks up - Distributed-Replicate
                   Capacity: (0% used) 99.00 MiB/50.00 GiB (used/total)
                   Snapshots: 1
                   Self Heal:  6/12
                   Tasks Active: None
                   Protocols: glusterfs:on  NFS:on  SMB:on
                   Gluster Connectivty: 0 hosts, 0 tcp connections

Status Messages
  - Cluster is UNHEALTHY
  - Volume 'testvol' is in a PARTIAL state, some data is inaccessible data, due to missing bricks
  - WARNING -> Write requests may fail against volume 'testvol'
  - Cluster node '' is down
  - Self heal daemon is down on
  - Cluster node '' is down
  - Self heal daemon is down on
  - Brick 10.70.47.3:/rhs/brick3/b12 in volume 'testvol' is down/unavailable
  - Brick 10.70.47.2:/rhs/brick3/b11 in volume 'testvol' is down/unavailable
  - Brick 10.70.47.3:/rhs/brick2/b8 in volume 'testvol' is down/unavailable
  - Brick 10.70.47.2:/rhs/brick2/b7 in volume 'testvol' is down/unavailable
  - Brick 10.70.47.2:/rhs/brick1/b3 in volume 'testvol' is down/unavailable
  - Brick 10.70.47.3:/rhs/brick1/b4 in volume 'testvol' is down/unavailable
  - INFO -> Not all bricks are online, so capacity provided is NOT accurate

Expected results:
The status messages should display the name of the storage node which is down.

Additional info:
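A concrete command-line sketch of the reproduction steps above. The hostnames, brick paths, and mount point are illustrative placeholders, not the ones from this cluster:

# 1. Create and start a 6x2 distributed-replicate volume (12 bricks,
#    replica 2, three bricks on each of four nodes)
[root@node1 ~]# gluster volume create testvol replica 2 \
    node{1..4}:/rhs/brick1/b node{1..4}:/rhs/brick2/b node{1..4}:/rhs/brick3/b
[root@node1 ~]# gluster volume start testvol

# 2. FUSE-mount the volume on a client
[root@client ~]# mount -t glusterfs node1:/testvol /mnt/testvol

# 3. Bring one storage node down (powering it off takes its bricks and
#    self-heal daemon down as well), then check gstatus from a surviving node
[root@node2 ~]# poweroff
[root@node1 ~]# gstatus -a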
After discussions with Anil, it was decided to remove the self-heal information from the status messages and to include the number of nodes that are down/up instead.

Sample output 1:

Status Messages
  - Cluster is UNHEALTHY
  - One of the nodes in the cluster is down
  - Brick 10.70.47.129:/gluster/brick1 in volume 'glustervol' is down/unavailable
  - INFO -> Not all bricks are online, so capacity provided is NOT accurate

Sample output 2:

Status Messages
  - Cluster is UNHEALTHY
  - Volume 'glustervol' is in a PARTIAL state, some data is inaccessible data, due to missing bricks
  - WARNING -> Write requests may fail against volume 'glustervol'
  - 2 nodes in the cluster are down
  - Brick 10.70.46.185:/gluster/brick1 in volume 'glustervol' is down/unavailable
  - Brick 10.70.47.129:/gluster/brick1 in volume 'glustervol' is down/unavailable
  - INFO -> Not all bricks are online, so capacity provided is NOT accurate
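The per-node state that these messages summarize can also be cross-checked directly with the gluster CLI; a quick sketch, run from any surviving node (commands only, output elided):

# Down peers are listed with a Disconnected state
[root@node1 ~]# gluster peer status

# 'gluster pool list' gives the same connected/disconnected view,
# one line per node, which is easy to count in scripts
[root@node1 ~]# gluster pool list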
[root@rhs-client46 yum.repos.d]# gstatus -a

     Product: RHGS Server v3.1   Capacity:   2.70 TiB(raw bricks)
      Status: UNHEALTHY(4)                  67.00 MiB(raw used)
   Glusterfs: 3.7.1                          2.70 TiB(usable from volumes)
  OverCommit: No                 Snapshots:   0

   Nodes       :  2/ 4      Volumes:  0 Up
   Self Heal   :  2/ 4               1 Up(Degraded)
   Bricks      :  2/ 4               0 Up(Partial)
   Connections :  4/16               0 Down

Volume Information
  vol0             UP(DEGRADED) - 2/4 bricks up - Distributed-Replicate
                   Capacity: (0% used) 67.00 MiB/2.70 TiB (used/total)
                   Snapshots: 0
                   Self Heal:  2/ 4
                   Tasks Active: None
                   Protocols: glusterfs:on  NFS:on  SMB:on
                   Gluster Connectivty: 4 hosts, 16 tcp connections

Status Messages
  - Cluster is UNHEALTHY
  - 2 nodes in the cluster are down
  - Brick 10.70.36.71:/rhs/brick1/b02 in volume 'vol0' is down/unavailable
  - Brick 10.70.36.46:/rhs/brick1/b03 in volume 'vol0' is down/unavailable
  - INFO -> Not all bricks are online, so capacity provided is NOT accurate

Bug verified on build glusterfs-3.7.1-14.el7rhgs.x86_64.

[root@rhs-client46 yum.repos.d]# gstatus --version
gstatus 0.65
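For completeness, the two down bricks reported above can be corroborated against the gluster CLI itself; the Online column of volume status should show N for both (command shown for illustration, output elided):

[root@rhs-client46 yum.repos.d]# gluster volume status vol0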
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html