Bug 1508523

Summary: cluster status is healthy if one of the volumes is down and all nodes are connected
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Kudlej <mkudlej>
Component: web-admin-tendrl-gluster-integration
Assignee: Shubhendu Tripathi <shtripat>
Status: CLOSED ERRATA
QA Contact: Martin Kudlej <mkudlej>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: nthomas, ppenicka, rcyriac, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tendrl-gluster-integration-1.5.4-2.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-18 04:39:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Kudlej 2017-11-01 15:39:50 UTC
Description of problem:
I've looked at the code that computes cluster status in 'tendrl/gluster_integration/sds_sync/cluster_status.py':

def sync_cluster_status(volumes):
    status = 'healthy'

    # Calculate status based on volumes status
    degraded_count = 0
    if len(volumes) > 0:
        volume_states = _derive_volume_states(volumes)
        for vol_id, state in volume_states.iteritems():
            if 'down' in state or 'partial' in state:
                status = 'unhealthy'
            if 'degraded' in state:
                degraded_count += 1

...
    # Change status based on node status
    cmd = cmd_utils.Command(
        'gluster pool list', True
    )
    out, err, rc = cmd.run()
    peer_count = 0
    if not err:
        out_lines = out.split('\n')
        connected = True
...
        if connected:
            status = 'healthy'
        else:
            status = 'unhealthy'

As you can see, the cluster status can still end up 'healthy' when some
volumes are down, as long as all nodes are connected, because the node
check unconditionally overwrites the status derived from the volumes.
I think that if any volume is down, the node check should be skipped,
or at least it should not reset the status back to 'healthy'.
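
A minimal sketch of what I mean is below. This is only an illustration
under assumptions of mine (the function name derive_cluster_status, the
volume_states dict and the peers_connected boolean built from
'gluster pool list' output are all hypothetical); it is not the actual
fix shipped in tendrl-gluster-integration-1.5.4-2.el7rhgs:

def derive_cluster_status(volume_states, peers_connected):
    # Hypothetical helper: volume_states maps volume id -> state string,
    # peers_connected is a bool derived from 'gluster pool list' output.
    # Neither name comes from the tendrl code quoted above.
    status = 'healthy'
    degraded_count = 0

    # Volume-based status comes first.
    for state in volume_states.values():
        if 'down' in state or 'partial' in state:
            status = 'unhealthy'
        if 'degraded' in state:
            degraded_count += 1

    # The peer check may only worsen the status; it never resets
    # 'unhealthy' back to 'healthy'.
    if not peers_connected:
        status = 'unhealthy'

    return status, degraded_count

# One volume down and all peers connected must still be unhealthy:
print(derive_cluster_status({'vol1': 'down', 'vol2': 'up'}, True))
# -> ('unhealthy', 0)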


Version-Release number of selected component (if applicable):
etcd-3.2.5-1.el7.x86_64
glusterfs-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-api-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-cli-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-client-xlators-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-client-xlators-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-events-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-fuse-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-fuse-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-geo-replication-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-libs-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-libs-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-server-4.0dev-0.218.git614904f.el7.centos.x86_64
python2-gluster-4.0dev-0.218.git614904f.el7.centos.x86_64
python-etcd-0.4.5-1.noarch
rubygem-etcd-0.3.0-1.el7.centos.noarch
tendrl-ansible-1.5.3-20171016T154931.c64462a.noarch
tendrl-api-1.5.3-20171013T082716.a2f3b3f.noarch
tendrl-api-httpd-1.5.3-20171013T082716.a2f3b3f.noarch
tendrl-commons-1.5.3-20171017T094335.9050aa7.noarch
tendrl-gluster-integration-1.5.3-20171013T082052.b8ddae5.noarch
tendrl-grafana-plugins-1.5.3-20171016T100950.e8eb6c8.noarch
tendrl-grafana-selinux-1.5.3-20171013T090621.ffb1b7f.noarch
tendrl-monitoring-integration-1.5.3-20171016T100950.e8eb6c8.noarch
tendrl-node-agent-1.5.3-20171016T094453.4aa81f7.noarch
tendrl-notifier-1.5.3-20171011T200310.3c01717.noarch
tendrl-selinux-1.5.3-20171013T090621.ffb1b7f.noarch
tendrl-ui-1.5.3-20171016T141509.544015a.noarch

Comment 1 Petr Penicka 2017-11-08 14:09:03 UTC
Triage Nov 8: Dev and QE agree to include this in the 3.3.1 release.

Comment 3 Martin Kudlej 2017-11-16 13:24:32 UTC
I've done a code inspection and it looks OK to me.

tendrl-gluster-integration-1.5.4-2.el7rhgs.noarch

--> VERIFIED

Comment 5 errata-xmlrpc 2017-12-18 04:39:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478