Bug 1921353 - OCP 3.11.374 Upgrade fails with Either OpenShift needs to be installed or openshift_release needs to be specified
Summary: OCP 3.11.374 Upgrade fails with Either OpenShift needs to be installed or openshift_release needs to be specified
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-27 22:31 UTC by Matthew Robson
Modified: 2021-03-03 12:29 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
cluster_facts.yml requires openshift_release to be set. If openshift_release is not set in the inventory, cluster_facts.yml needs to run after the upgrade playbook has defined openshift_release and the openshift_version role has run (a minimal ordering sketch follows the field list below).
Clone Of:
Environment:
Last Closed: 2021-03-03 12:27:45 UTC
Target Upstream Version:
Embargoed:
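
To make the Doc Text above concrete, here is a minimal illustrative sketch of the ordering described by the linked PR title ("Gather cluster_facts after version.yml during upgrade"). The file names and relative paths are assumptions for illustration, not the actual diff:

~~~~
# Illustrative ordering only (paths assumed, not taken from PR 12307):
# version.yml runs the openshift_version role and defines openshift_version /
# openshift_release, so the upgrade entry playbook imports it before
# cluster_facts.yml, whose "Set Default scheduler predicates and priorities"
# task needs a release value for the openshift_master_facts_default_predicates lookup.
- import_playbook: ../../init/basic_facts.yml     # basic fact gathering (assumed path)
- import_playbook: ../../init/version.yml         # defines openshift_version / openshift_release
- import_playbook: ../../init/cluster_facts.yml   # moved to run after version.yml by the fix
~~~~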


Attachments


Links
GitHub openshift/openshift-ansible pull 12307 (closed): Bug 1921353: [release-3.11] Gather cluster_facts after version.yml during upgrade (last updated 2021-02-15 18:42:04 UTC)
Red Hat Product Errata RHSA-2021:0637 (last updated 2021-03-03 12:29:08 UTC)

Description Matthew Robson 2021-01-27 22:31:26 UTC
Version:

openshift ansible 3.11.374

Platform:

Bare Metal 3.11 Cluster

What happened?

Control plane upgrade completes fine:
ansible-playbook -vvv -i inventory/cluster.inv /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.yml

Upgrading infra and app nodes fails 100% of the time:

ansible-playbook -vvv -i inventory/cluster.inv /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml -e openshift_upgrade_nodes_label="region=infra"


Failure summary:
  1. Hosts:    node-301.dmz, node-302.dmz, node-303.dmz
     Play:     Initialize cluster facts
     Task:     Set Default scheduler predicates and priorities
     Message:  An unhandled exception occurred while running the lookup plugin 'openshift_master_facts_default_predicates'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Either OpenShift needs to be installed or openshift_release needs to be specified

  2. Hosts:    node-311.dmz, node-312.dmz, node-313.dmz, node-321.dmz, node-322.dmz, node-331.dmz, node-332.dmz, node-333.dmz, node-334.dmz
     Play:     Set openshift_version for etcd, node, and master hosts
     Task:     set_fact
     Message:  The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'
               
               The error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may
               be elsewhere in the file depending on the exact syntax problem.
               
               The offending line appears to be:
               
                 tasks:
                 - set_fact:
                   ^ here

It looks like https://github.com/openshift/openshift-ansible/commit/a098a9d37471608cadd6a0ad3e6a45ba25c89e91 or https://github.com/openshift/openshift-ansible/commit/95c65b7c677e88a2c98ccf3dea4e379de6a6755f may be the cause.

After reverting to 3.11.317, upgrading the infra / app nodes works fine.

How to reproduce it (as minimally and precisely as possible)?

Happens when running a normal upgrade. Logs and hosts file will be attached.

Anything else we need to know?


Comment 3 Russell Teague 2021-02-04 18:23:09 UTC
Investigating potential fixes for this issue.

Comment 4 trumbaut 2021-02-10 12:14:10 UTC
We're hitting the same issue. I tried to work around it by manually applying the changes from https://github.com/openshift/openshift-ansible/pull/12307/commits/8694821bccc1fb58f82b154ba0a35ccda8ec22e1, but that did not work either (a rough sketch of what was tried follows the log below):

~~~~
[...]
PLAY [Filter list of nodes to be upgraded if necessary] ***********************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
ok: [master1.example.com]

TASK [Retrieve list of openshift nodes matching upgrade label] ****************************************************************************************************************************************************
[WARNING]: Module invocation had junk after the JSON data: Error in atexit._run_exitfuncs: Traceback (most recent call last):   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs     func(*targs,
**kargs)   File "/tmp/ansible_oc_obj_payload_GJT3UR/ansible_oc_obj_payload.zip/ansible/modules/oc_obj.py", line 1260, in cleanup AttributeError: 'NoneType' object has no attribute 'path' Error in sys.exitfunc:
Traceback (most recent call last):   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs     func(*targs, **kargs)   File
"/tmp/ansible_oc_obj_payload_GJT3UR/ansible_oc_obj_payload.zip/ansible/modules/oc_obj.py", line 1260, in cleanup AttributeError: 'NoneType' object has no attribute 'path'
ok: [master1.example.com]

TASK [Fail if no nodes match openshift_upgrade_nodes_label] *******************************************************************************************************************************************************
skipping: [master1.example.com]

TASK [Map labelled nodes to inventory hosts] **********************************************************************************************************************************************************************
fatal: [master1.example.com]: FAILED! => {"msg": "The conditional check 'hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list' failed. The error was: error while evaluating conditional (hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list): 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'l_kubelet_node_name'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml': line 25, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    # using their openshift.common.hostname fact.\n    - name: Map labelled nodes to inventory hosts\n      ^ here\n"}

NO MORE HOSTS LEFT ************************************************************************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
infra1.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
infra2.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
infra3.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
master1.example.com : ok=33   changed=2    unreachable=0    failed=1    skipped=25   rescued=0    ignored=0
master2.example.com : ok=28   changed=2    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0
master3.example.com : ok=28   changed=2    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0
worker1.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
worker2.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
worker3.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
worker4.example.com : ok=20   changed=2    unreachable=0    failed=0    skipped=21   rescued=0    ignored=0
~~~~
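
For reference, the manual application was along these lines; this is only a sketch under assumptions (GitHub's .patch URL for the commit and the default RPM install path), not an exact record of the steps taken:

~~~~
# Sketch only: fetch the single commit from PR 12307 as a patch and apply it to
# the installed playbooks. Assumes /usr/share/ansible/openshift-ansible and that
# the patch applies cleanly to the installed 3.11 tree, which may not hold.
curl -L -o /tmp/pr12307.patch https://github.com/openshift/openshift-ansible/commit/8694821bccc1fb58f82b154ba0a35ccda8ec22e1.patch
cd /usr/share/ansible/openshift-ansible
patch -p1 --dry-run < /tmp/pr12307.patch   # verify it applies before changing anything
patch -p1 < /tmp/pr12307.patch
~~~~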

Is there another workaround?

Comment 5 Russell Teague 2021-02-10 14:54:18 UTC
@thomas.rumbaut,
The error reported in comment #4 appears to be unrelated to the fix for this bug. Please open a new bug and attach the Ansible inventory and the complete Ansible logs with '-vvv' verbosity so we can troubleshoot further. If it turns out to be related to this fix, we can close the new bug as a duplicate of this one and make any further fixes here.

Comment 6 trumbaut 2021-02-10 15:06:31 UTC
Hi @Russell Teague,

We were hitting the same issue as described in comment #1. We upgraded from v3.11.286 to v3.11.380. Upgrading the control plane went fine, but upgrading the infra nodes afterwards failed:

~~~~
[...]
PLAY [Determine openshift_version to configure on first master] ***************************************************************************************************************************************************

PLAY [Set openshift_version for etcd, node, and master hosts] *****************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
ok: [worker2.example.com]
ok: [infra2.example.com]
ok: [infra3.example.com]
ok: [infra1.example.com]
ok: [worker1.example.com]
ok: [worker4.example.com]
ok: [worker3.example.com]

TASK [set_fact] ***************************************************************************************************************************************************************************************************
fatal: [infra1.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}
fatal: [infra2.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}
fatal: [infra3.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}
fatal: [worker1.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}
fatal: [worker2.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}
fatal: [worker3.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}
fatal: [worker4.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}

PLAY [OpenShift Health Checks] ************************************************************************************************************************************************************************************

PLAY [Verify upgrade can proceed on first master] *****************************************************************************************************************************************************************

PLAY [Verify master processes] ************************************************************************************************************************************************************************************

PLAY [Verify masters are already upgraded] ************************************************************************************************************************************************************************

PLAY [Validate configuration for rolling restart] *****************************************************************************************************************************************************************

PLAY [Create temp file on localhost] ******************************************************************************************************************************************************************************

TASK [command] ****************************************************************************************************************************************************************************************************
skipping: [localhost]

PLAY [Check if temp file exists on any masters] *******************************************************************************************************************************************************************

PLAY [Cleanup temp file on localhost] *****************************************************************************************************************************************************************************

TASK [file] *******************************************************************************************************************************************************************************************************
skipping: [localhost]

PLAY [Warn if restarting the system where ansible is running] *****************************************************************************************************************************************************

PLAY [Verify upgrade targets] *************************************************************************************************************************************************************************************

PLAY [Verify docker upgrade targets] ******************************************************************************************************************************************************************************

PLAY [Verify Requirements] ****************************************************************************************************************************************************************************************

PLAY [Verify Node Prerequisites] **********************************************************************************************************************************************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_to_upgrade

PLAY [Prepull images and rpms before doing rolling restart] *******************************************************************************************************************************************************
skipping: no hosts matched

PLAY [Drain and upgrade nodes] ************************************************************************************************************************************************************************************
skipping: no hosts matched

PLAY [Re-enable excluders] ****************************************************************************************************************************************************************************************
skipping: no hosts matched

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0    skipped=7    rescued=0    ignored=0
infra1.example.com : ok=59   changed=11   unreachable=0    failed=1    skipped=38   rescued=0    ignored=0
infra2.example.com : ok=59   changed=11   unreachable=0    failed=1    skipped=36   rescued=0    ignored=0
infra3.example.com : ok=59   changed=11   unreachable=0    failed=1    skipped=36   rescued=0    ignored=0
master1.example.com : ok=29   changed=2    unreachable=0    failed=1    skipped=19   rescued=0    ignored=0
master2.example.com : ok=26   changed=2    unreachable=0    failed=1    skipped=19   rescued=0    ignored=0
master3.example.com : ok=26   changed=2    unreachable=0    failed=1    skipped=19   rescued=0    ignored=0
worker1.example.com : ok=59   changed=9    unreachable=0    failed=1    skipped=36   rescued=0    ignored=0
worker2.example.com : ok=59   changed=9    unreachable=0    failed=1    skipped=36   rescued=0    ignored=0
worker3.example.com : ok=59   changed=11   unreachable=0    failed=1    skipped=36   rescued=0    ignored=0
worker4.example.com : ok=59   changed=11   unreachable=0    failed=1    skipped=36   rescued=0    ignored=0



Failure summary:


  1. Hosts:    master1.example.com, master2.example.com, master3.example.com
     Play:     Initialize cluster facts
     Task:     Set Default scheduler predicates and priorities
     Message:  An unhandled exception occurred while running the lookup plugin 'openshift_master_facts_default_predicates'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Either OpenShift needs to be installed or openshift_release needs to be specified

  2. Hosts:    infra1.example.com, infra2.example.com, infra3.example.com, worker1.example.com, worker2.example.com, worker3.example.com, worker4.example.com
     Play:     Set openshift_version for etcd, node, and master hosts
     Task:     set_fact
     Message:  The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'

               The error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may
               be elsewhere in the file depending on the exact syntax problem.

               The offending line appears to be:

                 tasks:
                 - set_fact:
                   ^ here
~~~~

Since this Bugzilla references https://github.com/openshift/openshift-ansible/pull/12307, we tried to apply the commits from https://github.com/openshift/openshift-ansible/pull/12307/commits manually. However, this resulted in the Ansible error mentioned in comment #4.

After adding "openshift_version=3.11.380" to our inventory, we were able to proceed. More details can be found in support case 02866405.
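
For anyone else needing an interim workaround, the inventory change amounts to something like the following; the [OSEv3:vars] group is the usual place for cluster-wide variables, the exact version string should match the target z-stream, and (per the error message) setting openshift_release explicitly may serve the same purpose:

~~~~
# Workaround sketch: pin the version in the Ansible inventory so the node upgrade
# playbook does not depend on detecting openshift_version from the first master.
[OSEv3:vars]
openshift_version=3.11.380
# Per the error message, openshift_release can also be set explicitly, e.g.:
# openshift_release=v3.11
~~~~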

Comment 8 Gaoyun Pei 2021-02-17 12:32:00 UTC
This issue could be reproduced with openshift-ansible-3.11.374-1.git.0.92f5956.el7.noarch.rpm. With no openshift_release set in the Ansible inventory file, running the playbook
openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml fails:

02-17 17:18:51  PLAY [Set openshift_version for etcd, node, and master hosts] ******************
02-17 17:18:51  
02-17 17:18:51  TASK [Gathering Facts] *********************************************************
02-17 17:18:52  ok: [ci-vm-10-0-151-145.hosted.upshift.rdu2.redhat.com]
02-17 17:18:53  ok: [ci-vm-10-0-151-131.hosted.upshift.rdu2.redhat.com]
02-17 17:18:53  
02-17 17:18:53  TASK [set_fact] ****************************************************************
02-17 17:18:53  fatal: [ci-vm-10-0-151-131.hosted.upshift.rdu2.redhat.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ hostvars[groups.oo_first_master.0].openshift_version }}: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'openshift_version'\n\nThe error appears to be in '/home/slave1/workspace/Run-Ansible-Playbooks-Nextge/private-openshift-ansible/playbooks/init/version.yml': line 20, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - set_fact:\n    ^ here\n"}


With openshift-ansible-3.11.387-1.git.0.78acf7c.el7.noarch.rpm, the issue no longer occurs.

02-17 19:30:34  PLAY [Set openshift_version for etcd, node, and master hosts] ******************
02-17 19:30:34  
02-17 19:30:34  TASK [Gathering Facts] *********************************************************
02-17 19:30:35  ok: [ci-vm-10-0-151-145.hosted.upshift.rdu2.redhat.com]
02-17 19:30:36  ok: [ci-vm-10-0-151-131.hosted.upshift.rdu2.redhat.com]
02-17 19:30:36  
02-17 19:30:36  TASK [set_fact] ****************************************************************
02-17 19:30:36  ok: [ci-vm-10-0-151-131.hosted.upshift.rdu2.redhat.com] => {"ansible_facts": {"openshift_image_tag": "v3.11", "openshift_pkg_version": "-3.11*", "openshift_version": "3.11"}, "changed": false}
02-17 19:30:36  ok: [ci-vm-10-0-151-145.hosted.upshift.rdu2.redhat.com] => {"ansible_facts": {"openshift_image_tag": "v3.11", "openshift_pkg_version": "-3.11*", "openshift_version": "3.11"}, "changed": false}

Comment 11 errata-xmlrpc 2021-03-03 12:27:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 3.11.394 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0637

