Bug 1395539

Summary: ganesha-ha.sh --status should validate if the VIPs are assigned to right nodes
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Soumya Koduri <skoduri>
Component: common-ha Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED ERRATA QA Contact: Arthy Loganathan <aloganat>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: aloganat, amukherj, asrivast, jthottan, kkeithle, rhinduja, rhs-bugs, skoduri, storage-qa-internal, surs
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1394815
: 1395648 Environment:
Last Closed: 2017-03-23 06:18:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528, 1364410, 1393966, 1394815, 1395648, 1395649, 1395652    

Comment 2 Soumya Koduri 2016-11-16 06:29:54 UTC
This bug addresses a subset of what has been requested above.

Verifying that the VIPs are assigned to their respective nodes and are in the Started state confirms that the services (nfs-ganesha, pacemaker/corosync, etc.) are running and that the node is healthy. As part of this BZ, this validation will be added to the '--status' option so that gdeploy can use it for cluster health checks.
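
For illustration only, here is a minimal sketch of how a caller such as gdeploy might consume the '--status' summary. It assumes the output ends with a line of the form "Cluster HA Status: <STATE>", as in the outputs pasted in later comments. The function names ha_status_from_output and cluster_is_healthy are hypothetical and are not part of ganesha-ha.sh or gdeploy.

```shell
#!/bin/bash
# Hypothetical helper (not part of ganesha-ha.sh or gdeploy):
# read `ganesha-ha.sh --status` output on stdin and print the state word
# from the "Cluster HA Status: <STATE>" line, or nothing if it is absent.
ha_status_from_output() {
    sed -n 's/^Cluster HA Status:[[:space:]]*\([A-Za-z]*\).*/\1/p' | tail -n 1
}

# Hypothetical policy (an assumption): only HEALTHY passes the health check;
# FAILOVER, BAD, and a missing summary line all count as failures.
cluster_is_healthy() {
    local ha
    ha=$(ha_status_from_output)
    [ "$ha" = "HEALTHY" ]
}
```

A caller could then run something like `/usr/libexec/ganesha/ganesha-ha.sh --status /run/gluster/shared_storage/nfs-ganesha/ | cluster_is_healthy` and act on the exit code. Whether FAILOVER should be treated as a failure is a policy choice for the caller.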

Comment 6 Atin Mukherjee 2016-11-21 06:44:11 UTC
Upstream mainline patch http://review.gluster.org/15882 has been posted for review.

Comment 8 Arthy Loganathan 2016-11-25 07:49:14 UTC
Even if all the nodes are healthy, ganesha-ha.sh --status shows the Cluster HA Status as BAD.

http://pastebin.test.redhat.com/433564

Also, are we checking in the pcs status that all three processes (nfs-block, cluster-ip, nfs-unblock) of each node are in the Started state?

Comment 10 Arthy Loganathan 2016-11-30 15:03:55 UTC
If nodes are in the failover state, the status output shows the HA status as BAD instead of FAILOVER, and the failover node and VIP are not printed in the output.

[root@dhcp46-42 ~]# /usr/libexec/ganesha/ganesha-ha.sh --status
Online: [ dhcp46-101.lab.eng.blr.redhat.com dhcp46-42.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com ]

dhcp46-42.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-42.lab.eng.blr.redhat.com
dhcp46-101.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-101.lab.eng.blr.redhat.com
dhcp47-155.lab.eng.blr.redhat.com-cluster_ip-1 dhcp47-155.lab.eng.blr.redhat.com

Cluster HA Status: BAD

Updated the review with the same comments.

Comment 11 Atin Mukherjee 2016-12-04 05:23:39 UTC
upstream mainline : http://review.gluster.org/15882
release-3.9 : http://review.gluster.org/15991
release-3.8 : http://review.gluster.org/15992

downstream : https://code.engineering.redhat.com/gerrit/#/c/91878/

Comment 13 Arthy Loganathan 2016-12-12 11:34:29 UTC
Verified the fix in the following build:
glusterfs-ganesha-3.8.4-7.el7rhgs.x86_64
nfs-ganesha-2.4.1-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-2.el7rhgs.x86_64

ganesha-ha.sh --status output:
-------------------------------

[root@dhcp46-111 ~]# /usr/libexec/ganesha/ganesha-ha.sh --status /run/gluster/shared_storage/nfs-ganesha/
Online: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com ]

dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-111.lab.eng.blr.redhat.com
dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-115.lab.eng.blr.redhat.com
dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-115.lab.eng.blr.redhat.com
dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-124.lab.eng.blr.redhat.com

Cluster HA Status: FAILOVER


[root@dhcp46-111 ~]# /usr/libexec/ganesha/ganesha-ha.sh --status /run/gluster/shared_storage/nfs-ganesha/
Online: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com ]

dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1
dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1
dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1
dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1

Cluster HA Status: BAD

[root@dhcp46-115 ~]# /usr/libexec/ganesha/ganesha-ha.sh --status /run/gluster/shared_storage/nfs-ganesha/
Online: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]

dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-111.lab.eng.blr.redhat.com
dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-115.lab.eng.blr.redhat.com
dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-139.lab.eng.blr.redhat.com
dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 dhcp46-124.lab.eng.blr.redhat.com

Cluster HA Status: HEALTHY
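
The three outcomes above suggest a simple rule, sketched below for illustration. This is inferred only from the pasted outputs and is not the actual ganesha-ha.sh code: a VIP line with no hosting node gives BAD, a VIP hosted by a node other than its owner gives FAILOVER, and every VIP on its own node gives HEALTHY. The function name classify_vip_placement is hypothetical, and treating input with no VIP lines as BAD is an assumption.

```shell
#!/bin/bash
# Hypothetical sketch, not the ganesha-ha.sh implementation.
# Reads lines of the form "<node>-cluster_ip-1 [<hosting-node>]" on stdin
# and prints HEALTHY, FAILOVER, or BAD.
classify_vip_placement() {
    local resource host owner
    local seen=0 moved=0 unplaced=0
    while read -r resource host || [ -n "$resource" ]; do
        # Ignore everything except cluster_ip resource lines.
        case "$resource" in
            *-cluster_ip-1) ;;
            *) continue ;;
        esac
        seen=$((seen + 1))
        owner=${resource%-cluster_ip-1}      # node the VIP belongs to
        if [ -z "$host" ]; then
            unplaced=$((unplaced + 1))       # VIP is not hosted anywhere
        elif [ "$host" != "$owner" ]; then
            moved=$((moved + 1))             # VIP is hosted by another node
        fi
    done
    # Assumption: no VIP lines at all is reported as BAD.
    if [ "$seen" -eq 0 ] || [ "$unplaced" -gt 0 ]; then
        echo "BAD"
    elif [ "$moved" -gt 0 ]; then
        echo "FAILOVER"
    else
        echo "HEALTHY"
    fi
}
```

Piping the resource lines of a '--status' run into classify_vip_placement would give a state to compare against the "Cluster HA Status" line the script prints.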

Comment 15 errata-xmlrpc 2017-03-23 06:18:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html