
Bug 1640369

Summary: Upgrade from 3.7 to 3.9 fails due to glusterfs health check error
Product: OpenShift Container Platform
Component: Cluster Version Operator
Version: 3.9.0
Target Release: 3.9.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Type: Bug
Severity: medium
Priority: unspecified
Reporter: Luke Stanton <lstanton>
Assignee: Jose A. Rivera <jarrpa>
QA Contact: Qin Ping <piqin>
CC: aos-bugs, jokerman, mmccomas
Last Closed: 2018-10-18 14:49:16 UTC

Description Luke Stanton 2018-10-17 22:15:16 UTC
Description of problem:

While upgrading from OpenShift 3.7 to 3.9 (with CNS/OCS), the infra node upgrade playbook fails due to a glusterfs health check error.

Version-Release number of the following components:

rpm -q openshift-ansible
openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Consistently

Steps to Reproduce:

Run ansible playbook...

/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.yml

according to documentation at...

https://docs.openshift.com/container-platform/3.9/upgrading/automated_upgrades.html#special-considerations-for-glusterfs
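
The upgrade playbook itself was invoked along these lines (the inventory path below is a placeholder, not the one actually used):

  ansible-playbook -i /path/to/inventory \
      /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.yml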

Actual results:

**Edited for readability**

----------------------------------------------------------------------------
TASK [openshift_storage_glusterfs : Check for cluster health of glusterfs] ****************************************************************************
FAILED - RETRYING: Check for cluster health of glusterfs (120 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (119 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (118 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (117 retries left).
...
FAILED - RETRYING: Check for cluster health of glusterfs (3 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (2 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (1 retries left).
fatal: [b*****map01.*****.com -> b*****map01.*****.com]: FAILED! => {"attempts": 120, "changed": false, "failed": true, "msg": "volume heketidbstorage is not ready", "state": "unknown"}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.retry

PLAY RECAP ***************************************************************************
b*****aap01.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****aap02.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****aap03.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****iap01.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****iap02.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****iap03.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****map01.*****.com  : ok=59   changed=2    unreachable=0    failed=1
b*****map02.*****.com  : ok=50   changed=2    unreachable=0    failed=0
b*****map03.*****.com  : ok=50   changed=2    unreachable=0    failed=0
localhost              : ok=12   changed=0    unreachable=0    failed=0


Failure summary:


  1. Hosts:    b*****map01.*****.com
     Play:     Verify upgrade can proceed on first master
     Task:     Check for cluster health of glusterfs
     Message:  volume heketidbstorage is not ready
----------------------------------------------------------------------------

Expected results:

The glusterfs health check passes and the upgrade_nodes.yml playbook completes without error.
Additional info:
Please attach logs from ansible-playbook with the -vvv flag
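
For anyone hitting the same health check failure, the volume state the task waits on can also be inspected manually from inside one of the glusterfs pods. A minimal sketch, assuming the default "glusterfs" namespace (the namespace and pod name are deployment-specific):

  # locate a running glusterfs pod (namespace is an assumption)
  oc get pods -n glusterfs -o wide

  # check the volume the health check complains about
  oc rsh -n glusterfs <glusterfs-pod> gluster volume status heketidbstorage
  oc rsh -n glusterfs <glusterfs-pod> gluster volume heal heketidbstorage info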

Comment 2 Scott Dodson 2018-10-18 14:49:16 UTC
TASK [openshift_storage_glusterfs : Check for cluster health of glusterfs] **************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml:4
fatal: [b****.*****.com]: FAILED! => {
    "failed": true, 
    "msg": "The task includes an option with an undefined variable. The error was: 'first_master_client_binary' is undefined\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml': line 4, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# lib_utils/library/glusterfs_check_containerized.py\n- name: Check for cluster health of glusterfs\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'first_master_client_binary' is undefined"
}
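
The quoted traceback shows the health check task consuming 'first_master_client_binary' before it is defined. As a quick illustrative check (not part of the original report), the installed openshift-ansible content can be grepped to see where the variable is referenced or set:

  grep -rn 'first_master_client_binary' \
      /usr/share/ansible/openshift-ansible/roles/ \
      /usr/share/ansible/openshift-ansible/playbooks/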

This is a dupe

*** This bug has been marked as a duplicate of bug 1636018 ***