Bug 1663306
| Summary: | Check GlusterFS for cluster health fails when gluster nodes are SchedulingDisabled | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andrew Collins <ancollin> |
| Component: | Installer | Assignee: | Jose A. Rivera <jarrpa> |
| Installer sub component: | openshift-ansible | QA Contact: | Ashmitha Ambastha <asambast> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | ancollin, aos-bugs, asambast, jokerman, kramdoss, mmccomas, pprakash |
| Version: | 3.10.0 | Flags: | ancollin: needinfo- |
| Target Milestone: | --- | ||
| Target Release: | 3.11.z | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-06-17 20:21:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Was able to fix by changing `lib_utils/library/glusterfs_check_containerized.py` line 83 from `fields[1] == "Ready"` to `"Ready" in fields[1]`.

Can you open a PR?

PR merged, in openshift-ansible-3.10.99-1 and later.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477
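For reference, a minimal sketch of the kind of node-status check the one-line change affects, assuming the module splits each line of `oc get nodes` output into whitespace-separated fields; the helper below is illustrative and not the actual module code:

```python
def node_is_ready(node_line):
    """Return True when a line of `oc get nodes` output reports a Ready node."""
    fields = node_line.split()
    # Pre-fix check (rejects cordoned nodes whose STATUS column reads
    # "Ready,SchedulingDisabled"):
    #     return fields[1] == "Ready"
    # Post-fix check (accepts both "Ready" and "Ready,SchedulingDisabled"):
    return "Ready" in fields[1]
```

The substring test keeps cordoned-but-healthy nodes in the candidate list, so the module can still find a suitable gluster pod for the health check.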
Description of problem:
When upgrading from OCP 3.9 to 3.10, the "Check GlusterFS for Cluster Health" task fails consistently. The gluster nodes are "Ready,SchedulingDisabled", and the gluster volumes are all connected and healed.

Version-Release number of the following components:

rpm -q openshift-ansible
openshift-ansible-3.10.83-1.git.0.12699eb.el7.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 (Red Hat 4.8.5-28)]

How reproducible:
100%

Steps to Reproduce:
1. Cordon the gluster nodes to give them SchedulingDisabled status (oc adm cordon <gluster nodes>)
2. Attempt to run upgrade_control_plane.yml

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

2019-01-03 11:48:40,311 p=74676 u=root | TASK [/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs : Check for GlusterFS cluster health] ***********************************
FAILED - RETRYING: Check for GlusterFS cluster health (113 retries left).
Result was:
"attempts": 8,
"changed": false,
"failed": true,
"invocation": {
    "module_args": {
        "cluster_name": "storage",
        "exclude_node": "locp002a.rnd.pncint.net",
        "oc_bin": "oc",
        "oc_conf": "/etc/origin/master/admin.kubeconfig",
        "oc_namespace": "glusterfs"
    }
},
"msg": "Unable to find suitable pod in get pods output: NAME READY STATUS RESTARTS AGE IP NODE\nglusterblock-storage-provisioner-dc-1-jljkg 1/1 Running 1 13h xx.xx.xx.xx locp005a.rnd.pncint.net\nglusterfs-storage-kbvdj 1/1 Running 20 23d xx.xx.xx.xx locp013a.rnd.pncint.net\nglusterfs-storage-rvpbs 1/1 Running 0 25m xx.xx.xx.xx locp011a.rnd.pncint.net\nglusterfs-storage-tlb8d 1/1 Running 14 23d xx.xx.xx.xx locp012a.rnd.pncint.net\nheketi-storage-1-k24fz 1/1 Running 1 23d xx.xx.xx.xx locp004a.rnd.pncint.net\n",
"retries": 121,
"state": "unknown"

Expected results:
Upgrade completes as expected.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
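As additional context on the failure mode, the snippet below runs a hypothetical `oc get nodes` line for a cordoned gluster node through both forms of the check; the node name and column values are made up for the example and not taken from this report:

```python
# Hypothetical `oc get nodes` line for a cordoned gluster node.
line = "locp011a.rnd.pncint.net   Ready,SchedulingDisabled   compute   23d   v1.10.0"
fields = line.split()

print(fields[1] == "Ready")    # False: strict equality drops the cordoned node,
                               # so no suitable gluster pod is found and the
                               # health check task retries until it fails
print("Ready" in fields[1])    # True: the substring test keeps the node
```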