Bug 1508523

Summary: cluster status is healthy if one of the volumes is down and all nodes are connected
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Kudlej <mkudlej>
Component: web-admin-tendrl-gluster-integration
Assignee: Shubhendu Tripathi <shtripat>
Status: CLOSED ERRATA
QA Contact: Martin Kudlej <mkudlej>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: nthomas, ppenicka, rcyriac, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tendrl-gluster-integration-1.5.4-2.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-18 04:39:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Kudlej 2017-11-01 15:39:50 UTC
Description of problem:
I've looked at the code that computes cluster status in 'tendrl/gluster_integration/sds_sync/cluster_status.py':

def sync_cluster_status(volumes):
    status = 'healthy'

    # Calculate status based on volumes status
    degraded_count = 0
    if len(volumes) > 0:
        volume_states = _derive_volume_states(volumes)
        for vol_id, state in volume_states.iteritems():
            if 'down' in state or 'partial' in state:
                status = 'unhealthy'
            if 'degraded' in state:
                degraded_count += 1

...
    # Change status based on node status
    cmd = cmd_utils.Command(
        'gluster pool list', True
    )
    out, err, rc = cmd.run()
    peer_count = 0
    if not err:
        out_lines = out.split('\n')
        connected = True
...
        if connected:
            status = 'healthy'
        else:
            status = 'unhealthy'

As you can see, the cluster status can still end up 'healthy' when some
volumes are down, as long as all nodes are connected, because the node
check unconditionally overwrites the status derived from the volumes.
I think that if any volume is down, the node check should be skipped,
or at least it should not reset the status back to 'healthy'.
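
A minimal sketch of what I mean is below. This is only an illustration
under assumptions of mine (the function name derive_cluster_status, the
volume_states dict and the peers_connected boolean built from
'gluster pool list' output are all hypothetical); it is not the actual
fix shipped in tendrl-gluster-integration-1.5.4-2.el7rhgs:

def derive_cluster_status(volume_states, peers_connected):
    # Hypothetical helper: volume_states maps volume id -> state string,
    # peers_connected is a bool derived from 'gluster pool list' output.
    # Neither name comes from the tendrl code quoted above.
    status = 'healthy'
    degraded_count = 0

    # Volume-based status comes first.
    for state in volume_states.values():
        if 'down' in state or 'partial' in state:
            status = 'unhealthy'
        if 'degraded' in state:
            degraded_count += 1

    # The peer check may only worsen the status; it never resets
    # 'unhealthy' back to 'healthy'.
    if not peers_connected:
        status = 'unhealthy'

    return status, degraded_count

# One volume down and all peers connected must still be unhealthy:
print(derive_cluster_status({'vol1': 'down', 'vol2': 'up'}, True))
# -> ('unhealthy', 0)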


Version-Release number of selected component (if applicable):
etcd-3.2.5-1.el7.x86_64
glusterfs-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-api-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-cli-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-client-xlators-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-client-xlators-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-events-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-fuse-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-fuse-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-geo-replication-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-libs-4.0dev-0.213.git09f6ae2.el7.centos.x86_64
glusterfs-libs-4.0dev-0.218.git614904f.el7.centos.x86_64
glusterfs-server-4.0dev-0.218.git614904f.el7.centos.x86_64
python2-gluster-4.0dev-0.218.git614904f.el7.centos.x86_64
python-etcd-0.4.5-1.noarch
rubygem-etcd-0.3.0-1.el7.centos.noarch
tendrl-ansible-1.5.3-20171016T154931.c64462a.noarch
tendrl-api-1.5.3-20171013T082716.a2f3b3f.noarch
tendrl-api-httpd-1.5.3-20171013T082716.a2f3b3f.noarch
tendrl-commons-1.5.3-20171017T094335.9050aa7.noarch
tendrl-gluster-integration-1.5.3-20171013T082052.b8ddae5.noarch
tendrl-grafana-plugins-1.5.3-20171016T100950.e8eb6c8.noarch
tendrl-grafana-selinux-1.5.3-20171013T090621.ffb1b7f.noarch
tendrl-monitoring-integration-1.5.3-20171016T100950.e8eb6c8.noarch
tendrl-node-agent-1.5.3-20171016T094453.4aa81f7.noarch
tendrl-notifier-1.5.3-20171011T200310.3c01717.noarch
tendrl-selinux-1.5.3-20171013T090621.ffb1b7f.noarch
tendrl-ui-1.5.3-20171016T141509.544015a.noarch

Comment 1 Petr Penicka 2017-11-08 14:09:03 UTC
Triage Nov 8: Dev and QE agree to include this in the 3.3.1 release.

Comment 3 Martin Kudlej 2017-11-16 13:24:32 UTC
I've done a code inspection and it looks OK to me.

tendrl-gluster-integration-1.5.4-2.el7rhgs.noarch

--> VERIFIED

Comment 5 errata-xmlrpc 2017-12-18 04:39:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478