Bug 1640369 - Upgrade from 3.7 to 3.9 fails due to glusterfs health check error
Summary: Upgrade from 3.7 to 3.9 fails due to glusterfs health check error
Keywords:
Status: CLOSED DUPLICATE of bug 1636018
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Jose A. Rivera
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-17 22:15 UTC by Luke Stanton
Modified: 2021-12-10 17:57 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-18 14:49:16 UTC
Target Upstream Version:
Embargoed:



Description Luke Stanton 2018-10-17 22:15:16 UTC
Description of problem:

While upgrading from OpenShift 3.7 to 3.9 (with CNS/OCS), the infra node upgrade playbook fails due to a glusterfs health check error.

Version-Release number of the following components:

rpm -q openshift-ansible
openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Consistently

Steps to Reproduce:

Run ansible playbook...

/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.yml

according to documentation at...

https://docs.openshift.com/container-platform/3.9/upgrading/automated_upgrades.html#special-considerations-for-glusterfs
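
For completeness, the playbook was invoked roughly as below (the inventory path is an assumption; substitute the actual inventory for this cluster, and -vvv is only needed for verbose logs):

# assumed invocation, inventory path is illustrative
ansible-playbook -i /etc/ansible/hosts -vvv \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.yml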

Actual results:

**Edited for readability**

----------------------------------------------------------------------------
TASK [openshift_storage_glusterfs : Check for cluster health of glusterfs] ****************************************************************************
FAILED - RETRYING: Check for cluster health of glusterfs (120 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (119 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (118 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (117 retries left).
...
FAILED - RETRYING: Check for cluster health of glusterfs (3 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (2 retries left).
FAILED - RETRYING: Check for cluster health of glusterfs (1 retries left).
fatal: [b*****map01.*****.com -> b*****map01.*****.com]: FAILED! => {"attempts": 120, "changed": false, "failed": true, "msg": "volume heketidbstorage is not ready", "state": "unknown"}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_nodes.retry

PLAY RECAP ***************************************************************************
b*****aap01.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****aap02.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****aap03.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****iap01.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****iap02.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****iap03.*****.com  : ok=79   changed=7    unreachable=0    failed=0
b*****map01.*****.com  : ok=59   changed=2    unreachable=0    failed=1
b*****map02.*****.com  : ok=50   changed=2    unreachable=0    failed=0
b*****map03.*****.com  : ok=50   changed=2    unreachable=0    failed=0
localhost              : ok=12   changed=0    unreachable=0    failed=0


Failure summary:


  1. Hosts:    b*****map01.*****.com
     Play:     Verify upgrade can proceed on first master
     Task:     Check for cluster health of glusterfs
     Message:  volume heketidbstorage is not ready
----------------------------------------------------------------------------

Expected results:
The glusterfs health check passes and the node upgrade playbook completes successfully.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
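
For additional context, the volume reported as "not ready" (heketidbstorage) can be inspected manually from one of the glusterfs pods. A rough sketch follows; the "glusterfs" namespace and pod name placeholder are assumptions, use the values from the actual deployment:

# list glusterfs pods and pick one to shell into (namespace is an assumption)
oc get pods -n glusterfs -o wide
oc rsh -n glusterfs <glusterfs-storage-pod>
# inside the pod, check brick status and pending heals for the failing volume
gluster volume status heketidbstorage
gluster volume heal heketidbstorage info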

Comment 2 Scott Dodson 2018-10-18 14:49:16 UTC
TASK [openshift_storage_glusterfs : Check for cluster health of glusterfs] **************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml:4
fatal: [b****.*****.com]: FAILED! => {
    "failed": true, 
    "msg": "The task includes an option with an undefined variable. The error was: 'first_master_client_binary' is undefined\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml': line 4, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# lib_utils/library/glusterfs_check_containerized.py\n- name: Check for cluster health of glusterfs\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'first_master_client_binary' is undefined"
}
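
To see where the undefined variable is referenced versus where it should be set, a quick grep over the installed role and playbooks (paths taken from the traceback above) can help:

# locate references to the undefined variable in the installed openshift-ansible tree
grep -rn 'first_master_client_binary' \
    /usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/ \
    /usr/share/ansible/openshift-ansible/playbooks/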

This is a dupe

*** This bug has been marked as a duplicate of bug 1636018 ***

