Bug 1636018

Summary: [3.9] Running upgrade_nodes.yml fails with 'first_master_client_binary' is undefined
Product: OpenShift Container Platform Reporter: daniel <dmoessne>
Component: Cluster Version Operator Assignee: Michael Gugino <mgugino>
Status: CLOSED UPSTREAM QA Contact: krishnaram Karthick <kramdoss>
Severity: high Docs Contact:
Priority: high    
Version: 3.9.0 CC: aabhishe, aos-bugs, dyan, jokerman, kramdoss, lstanton, mifiedle, mmccomas, mtaru, wmeng
Target Milestone: --- Flags: kramdoss: needinfo+
Target Release: 3.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-15 14:51:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1625568    
Bug Blocks:    

Description daniel 2018-10-04 09:17:51 UTC
Description of problem:
Failure summary:


  1. Hosts:    storage01.example.com
     Play:     Drain and upgrade nodes
     Task:     Check for cluster health of glusterfs
     Message:  The task includes an option with an undefined variable. The error was: 'first_master_client_binary' is undefined

               The error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml': line 4, column 3, but may
               be elsewhere in the file depending on the exact syntax problem.

               The offending line appears to be:

               # lib_utils/library/glusterfs_check_containerized.py
               - name: Check for cluster health of glusterfs
                 ^ here

               exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
               exception: 'first_master_client_binary' is undefined



Version-Release number of the following components:
openshift-ansible-roles-3.9.43-1.git.0.d0bc600.el7.noarch.rpm

--> /usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/check_cluster_health.yml


How reproducible:

Steps to Reproduce:
1. Update to the latest OCP 3.9 Ansible playbooks (3.9.43-1).
2. Run upgrade_nodes.yml; the node upgrade fails with the error shown above.


Actual results:
See the failure summary above.

Expected results: 
The upgrade should succeed without errors.

Additional info:
- This looks like the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1625568;
  in fact, the very same workaround helped here:
 
~~~
/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks] grep first_master_client_binary *
check_cluster_health.yml:#    oc_bin: "{{ first_master_client_binary }}"
check_cluster_health.yml:    oc_bin: "{{ hostvars[groups.oo_first_master.0]['first_master_client_binary'] }}"
check_cluster_health.yml:#    oc_bin: "{{ first_master_client_binary }}"
check_cluster_health.yml:    oc_bin: "{{ hostvars[groups.oo_first_master.0]['first_master_client_binary'] }}"
~~~
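
For illustration, the edited task would look roughly like the sketch below. Only the oc_bin line is taken from the grep output above. The module name glusterfs_check_containerized is an assumption based on the comment in the error message, and the task's other arguments are left out here and stay as shipped.

~~~
# Sketch only: excerpt of the task in check_cluster_health.yml after the workaround.
# Module name assumed from lib_utils/library/glusterfs_check_containerized.py;
# the remaining module arguments are unchanged and not shown.
- name: Check for cluster health of glusterfs
  glusterfs_check_containerized:
    # Original line, fails with "'first_master_client_binary' is undefined":
    # oc_bin: "{{ first_master_client_binary }}"
    # Workaround: read the variable from the first master's hostvars instead.
    oc_bin: "{{ hostvars[groups.oo_first_master.0]['first_master_client_binary'] }}"
~~~

The lookup goes through hostvars because the variable is apparently defined only for the first master, not for the node being upgraded (storage01.example.com in the failure above).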

Comment 1 Scott Dodson 2018-10-04 12:36:48 UTC
Needs https://github.com/openshift/openshift-ansible/pull/9933 backported

Comment 2 Michael Gugino 2018-10-17 15:09:33 UTC
3.9 backport created: https://github.com/openshift/openshift-ansible/pull/10429

Comment 3 Scott Dodson 2018-10-18 14:49:16 UTC
*** Bug 1640369 has been marked as a duplicate of this bug. ***