Description of problem:
While upgrading from OpenShift 3.7 to 3.9 (with CNS/OCS), infra node upgrade playbook fails due to glusterfs health check error.

Version-Release number of the following components:

rpm -q openshift-ansible
openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Consistently

Steps to Reproduce:
Run ansible playbook...
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.yml
according to documentation at...
https://docs.openshift.com/container-platform/3.9/upgrading/automated_upgrades.html#special-considerations-for-glusterfs

Actual results:
**Edited for readability**
----------------------------------------------------------------------------
TASK [openshift_storage_glusterfs : Check for cluster health of glusterfs] ****************************************************************************
FAILED - RETRYING: Check for cluster health of glusterfs (120 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (119 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (118 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (117 retries left).
...
FAILED - RETRYING: Check for cluster health of glusterfs (3 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (2 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (1 retries left).
fatal: [b*****map01.*****.com -> b*****map01.*****.com]: FAILED! => {"attempts": 120, "changed": false, "failed": true, "msg": "volume heketidbstorage is not ready", "state": "unknown"}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.retry

PLAY RECAP ***************************************************************************
b*****aap01.*****.com : ok=79 changed=7 unreachable=0 failed=0
b*****aap02.*****.com : ok=79 changed=7 unreachable=0 failed=0
b*****aap03.*****.com : ok=79 changed=7 unreachable=0 failed=0
b*****iap01.*****.com : ok=79 changed=7 unreachable=0 failed=0
b*****iap02.*****.com : ok=79 changed=7 unreachable=0 failed=0
b*****iap03.*****.com : ok=79 changed=7 unreachable=0 failed=0
b*****map01.*****.com : ok=59 changed=2 unreachable=0 failed=1
b*****map02.*****.com : ok=50 changed=2 unreachable=0 failed=0
b*****map03.*****.com : ok=50 changed=2 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0

Failure summary:

  1. Hosts:   b*****map01.*****.com
     Play:    Verify upgrade can proceed on first master
     Task:    Check for cluster health of glusterfs
     Message: volume heketidbstorage is not ready
----------------------------------------------------------------------------

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
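
When a recap covers many hosts, the failing ones can be picked out mechanically. The following is a minimal Python sketch and is not part of openshift-ansible: `failed_hosts` is a hypothetical helper name, and it assumes recap lines in the form `host : ok=N changed=N unreachable=N failed=N`, as in the PLAY RECAP above.

```python
import re

# Matches one PLAY RECAP host line, e.g.
# "b*****map01.*****.com : ok=59 changed=2 unreachable=0 failed=1"
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)\s+"
    r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)\s*$"
)


def failed_hosts(recap_text):
    """Return hosts whose recap line reports failed > 0 or unreachable > 0."""
    hosts = []
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        # Lines that are not host summaries (headers, blanks) are skipped.
        if match and (int(match.group("failed")) or int(match.group("unreachable"))):
            hosts.append(match.group("host"))
    return hosts
```

Passing the saved ansible-playbook output to `failed_hosts` would return only the hosts that need attention; for the recap in this report that is the first master.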
TASK [openshift_storage_glusterfs : Check for cluster health of glusterfs] **************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml:4
fatal: [b****.*****.com]: FAILED! => {
    "failed": true,
    "msg": "The task includes an option with an undefined variable. The error was: 'first_master_client_binary' is undefined\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml': line 4, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# lib_utils/library/glusterfs_check_containerized.py\n- name: Check for cluster health of glusterfs\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'first_master_client_binary' is undefined"
}

This is a dupe.

*** This bug has been marked as a duplicate of bug 1636018 ***