Bug 1663306 - Check GlusterFS for cluster health fails when gluster nodes are SchedulingDisabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jose A. Rivera
QA Contact: Ashmitha Ambastha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-03 18:11 UTC by Andrew Collins
Modified: 2020-06-17 20:21 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-17 20:21:25 UTC
Target Upstream Version:
Embargoed:
ancollin: needinfo-




Links:
Red Hat Product Errata RHBA-2020:2477, last updated 2020-06-17 20:21:43 UTC

Description Andrew Collins 2019-01-03 18:11:40 UTC
Description of problem:
When upgrading from OCP 3.9 to 3.10, "Check GlusterFS for Cluster Health" task fails consistently.  The gluster nodes are "Ready,SchedulingDisabled", and gluster volumes are all connected and healed.

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.10.83-1.git.0.12699eb.el7.noarch
rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch
ansible --version
ansible 2.4.6.0
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 (Red Hat 4.8.5-28)]

How reproducible:
100%

Steps to Reproduce:
1. Cordon gluster nodes to give them SchedulingDisabled status (oc adm cordon <gluster nodes>)
2. Attempt to run upgrade_control_plane.yml

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

2019-01-03 11:48:40,311 p=74676 u=root |  TASK [/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs : Check for GlusterFS cluster health] ***********************************
FAILED - RETRYING: Check for GlusterFS cluster health (113 retries left). Result was: {
    "attempts": 8,
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "cluster_name": "storage",
            "exclude_node": "locp002a.rnd.pncint.net",
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig",
            "oc_namespace": "glusterfs"
        }
    },
    "msg": "Unable to find suitable pod in get pods output: NAME                                          READY     STATUS    RESTARTS   AGE       IP              NODE\nglusterblock-storage-provisioner-dc-1-jljkg   1/1       Running   1          13h      xx.xx.xx.xx     locp005a.rnd.pncint.net\nglusterfs-storage-kbvdj                       1/1       Running   20         23d      xx.xx.xx.xx   locp013a.rnd.pncint.net\nglusterfs-storage-rvpbs                       1/1       Running   0          25m      xx.xx.xx.xx   locp011a.rnd.pncint.net\nglusterfs-storage-tlb8d                       1/1       Running   14         23d      xx.xx.xx.xx   locp012a.rnd.pncint.net\nheketi-storage-1-k24fz                        1/1       Running   1          23d      xx.xx.xx.xx     locp004a.rnd.pncint.net\n",
    "retries": 121,
    "state": "unknown"


Expected results:
Upgrade completes as expected.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Andrew Collins 2019-01-03 18:13:06 UTC
Was able to fix by changing lib_utils/library/glusterfs_check_containerized.py line 83 from:

fields[1] == "Ready"

to:

"Ready" in fields[1]

Comment 2 Scott Dodson 2019-01-03 18:16:32 UTC
Can you open a PR?

Comment 3 Andrew Collins 2019-01-09 00:04:04 UTC
Sure thing! https://github.com/openshift/openshift-ansible/pull/10970

Comment 4 Scott Dodson 2019-01-31 15:58:43 UTC
PR merged, in openshift-ansible-3.10.99-1 and later

Comment 10 errata-xmlrpc 2020-06-17 20:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477

