Bug 1663306 - Check GlusterFS for cluster health fails when gluster nodes are SchedulingDisabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jose A. Rivera
QA Contact: Ashmitha Ambastha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-03 18:11 UTC by Andrew Collins
Modified: 2020-06-17 20:21 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-17 20:21:25 UTC
Target Upstream Version:
Embargoed:
ancollin: needinfo-




Links:
Red Hat Product Errata RHBA-2020:2477, last updated 2020-06-17 20:21:43 UTC

Description Andrew Collins 2019-01-03 18:11:40 UTC
Description of problem:
When upgrading from OCP 3.9 to 3.10, "Check GlusterFS for Cluster Health" task fails consistently.  The gluster nodes are "Ready,SchedulingDisabled", and gluster volumes are all connected and healed.

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.10.83-1.git.0.12699eb.el7.noarch
rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch
ansible --version
ansible 2.4.6.0
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 (Red Hat 4.8.5-28)]

How reproducible:
100%

Steps to Reproduce:
1. Cordon gluster nodes to give them SchedulingDisabled status (oc adm cordon <gluster nodes>)
2. Attempt to run upgrade_control_plane.yml

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

2019-01-03 11:48:40,311 p=74676 u=root |  TASK [/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs : Check for GlusterFS cluster health] ***********************************
FAILED - RETRYING: Check for GlusterFS cluster health (113 retries left). Result was: {
    "attempts": 8,
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "cluster_name": "storage",
            "exclude_node": "locp002a.rnd.pncint.net",
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig",
            "oc_namespace": "glusterfs"
        }
    },
    "msg": "Unable to find suitable pod in get pods output: NAME                                          READY     STATUS    RESTARTS   AGE       IP              NODE\nglusterblock-storage-provisioner-dc-1-jljkg   1/1       Running   1          13h      xx.xx.xx.xx     locp005a.rnd.pncint.net\nglusterfs-storage-kbvdj                       1/1       Running   20         23d      xx.xx.xx.xx   locp013a.rnd.pncint.net\nglusterfs-storage-rvpbs                       1/1       Running   0          25m      xx.xx.xx.xx   locp011a.rnd.pncint.net\nglusterfs-storage-tlb8d                       1/1       Running   14         23d      xx.xx.xx.xx   locp012a.rnd.pncint.net\nheketi-storage-1-k24fz                        1/1       Running   1          23d      xx.xx.xx.xx     locp004a.rnd.pncint.net\n",
    "retries": 121,
    "state": "unknown"


Expected results:
Upgrade completes as expected.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Andrew Collins 2019-01-03 18:13:06 UTC
Was able to fix by changing lib_utils/library/glusterfs_check_containerized.py line 83 from:

fields[1] == "Ready"

to:

"Ready" in fields[1]

Comment 2 Scott Dodson 2019-01-03 18:16:32 UTC
Can you open a PR?

Comment 3 Andrew Collins 2019-01-09 00:04:04 UTC
Sure thing! https://github.com/openshift/openshift-ansible/pull/10970

Comment 4 Scott Dodson 2019-01-31 15:58:43 UTC
PR merged, in openshift-ansible-3.10.99-1 and later

Comment 10 errata-xmlrpc 2020-06-17 20:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477

