Description of problem:
When upgrading from OCP 3.9 to 3.10, the "Check for GlusterFS cluster health" task fails consistently. The gluster nodes are "Ready,SchedulingDisabled", and the gluster volumes are all connected and healed.

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.10.83-1.git.0.12699eb.el7.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 (Red Hat 4.8.5-28)]

How reproducible:
100%

Steps to Reproduce:
1. Cordon the gluster nodes to give them SchedulingDisabled status (oc adm cordon <gluster nodes>)
2. Attempt to run upgrade_control_plane.yml

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

2019-01-03 11:48:40,311 p=74676 u=root | TASK [/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs : Check for GlusterFS cluster health] ***********************************
FAILED - RETRYING: Check for GlusterFS cluster health (113 retries left).Result was:
    "attempts": 8,
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "cluster_name": "storage",
            "exclude_node": "locp002a.rnd.pncint.net",
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig",
            "oc_namespace": "glusterfs"
        }
    },
    "msg": "Unable to find suitable pod in get pods output: NAME READY STATUS RESTARTS AGE IP NODE\nglusterblock-storage-provisioner-dc-1-jljkg 1/1 Running 1 13h xx.xx.xx.xx locp005a.rnd.pncint.net\nglusterfs-storage-kbvdj 1/1 Running 20 23d xx.xx.xx.xx locp013a.rnd.pncint.net\nglusterfs-storage-rvpbs 1/1 Running 0 25m xx.xx.xx.xx locp011a.rnd.pncint.net\nglusterfs-storage-tlb8d 1/1 Running 14 23d xx.xx.xx.xx locp012a.rnd.pncint.net\nheketi-storage-1-k24fz 1/1 Running 1 23d xx.xx.xx.xx locp004a.rnd.pncint.net\n",
    "retries": 121,
    "state": "unknown"

Expected results:
Upgrade completes as expected.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
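I was able to fix this by changing line 83 of lib_utils/library/glusterfs_check_containerized.py
from:
    fields[1] == "Ready"
to:
    "Ready" in fields[1]
A cordoned node reports its status as "Ready,SchedulingDisabled", which never equals "Ready" exactly.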
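For anyone who wants to see the difference between the two comparisons in isolation, here is a minimal sketch. It is not the module's actual code: the sample node listing, the hostnames, and the helper names (select_nodes, exact_match, substring_match, condition_match) are all made up for illustration, and it only assumes the status is the second whitespace-separated field, as the fields[1] expression suggests.

```python
# Hypothetical node listing; hostnames and columns are illustrative only.
SAMPLE_NODES = """\
NAME                    STATUS                     AGE
master.example.com      Ready                      23d
gluster-a.example.com   Ready,SchedulingDisabled   23d
gluster-b.example.com   Ready,SchedulingDisabled   23d
gluster-c.example.com   NotReady                   23d
"""


def select_nodes(output, exclude_node, is_ready):
    """Return names of nodes whose status passes is_ready, skipping exclude_node."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        if fields[0] != exclude_node and is_ready(fields[1]):
            nodes.append(fields[0])
    return nodes


def exact_match(status):
    # Original comparison: only a bare "Ready" passes.
    return status == "Ready"


def substring_match(status):
    # Comparison from the fix: any status containing "Ready" passes.
    return "Ready" in status


def condition_match(status):
    # Stricter variant (an assumption, not part of the fix):
    # treat the status as a comma-separated list of conditions.
    return "Ready" in status.split(",")


if __name__ == "__main__":
    excluded = "master.example.com"
    print(select_nodes(SAMPLE_NODES, excluded, exact_match))
    print(select_nodes(SAMPLE_NODES, excluded, substring_match))
    print(select_nodes(SAMPLE_NODES, excluded, condition_match))
```

With every gluster node cordoned and the only plain "Ready" node excluded, the exact comparison leaves no candidate nodes, which would be consistent with the "Unable to find suitable pod" message above. One thing to be aware of: a plain substring test also accepts "NotReady", since that string contains "Ready". Whether that matters for this health check is not something I have verified; the comma-split variant is shown only as one possible way to avoid it.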
Can you open a PR?
Sure thing! https://github.com/openshift/openshift-ansible/pull/10970
PR merged; the fix is in openshift-ansible-3.10.99-1 and later.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2477