Bug 1933090 - [3.11] Upgrade fails when specifying openshift_upgrade_nodes_label
Summary: [3.11] Upgrade fails when specifying openshift_upgrade_nodes_label
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Duplicates: 1935456 1935796
Depends On:
Blocks: 1917013
Reported: 2021-02-25 16:22 UTC by Russell Teague
Modified: 2021-10-28 04:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Node-based facts such as l_kubelet_node_name were being set late in the upgrade cycle.
Consequence: The fact was undefined when referenced.
Fix: Moved node-based fact initialization earlier, to where other node facts are set.
Result: Facts are set prior to being referenced.
Clone Of:
Environment:
Last Closed: 2021-03-25 09:50:07 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12310 0 None open Bug 1933090: Move node fact initialization to basic_facts.yml 2021-02-25 16:34:01 UTC
Red Hat Knowledge Base (Solution) 5874501 0 None None None 2021-03-10 22:09:31 UTC
Red Hat Product Errata RHSA-2021:0833 0 None None None 2021-03-25 09:50:22 UTC

Description Russell Teague 2021-02-25 16:22:52 UTC
When running an upgrade with openshift_upgrade_nodes_label specified, the upgrade fails at TASK [Map labelled nodes to inventory hosts].

Version: openshift-ansible-3.11.394-1


TASK [Map labelled nodes to inventory hosts] ******************************************************************************************************************
task path: /home/rteague/git/oa-testing/aws-3a/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml:25
Thursday 25 February 2021  09:50:17 -0500 (0:00:00.025)       0:00:17.059 ***** 
fatal: [ec2-13-58-87-255.us-east-2.compute.amazonaws.com]: FAILED! => 
  msg: |-
    The conditional check 'hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list' failed. The error was: error while evaluating conditional (hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list): 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'l_kubelet_node_name'
  
    The error appears to be in '/home/rteague/git/oa-testing/aws-3a/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml': line 25, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.
  
    The offending line appears to be:
  
        # using their openshift.common.hostname fact.
        - name: Map labelled nodes to inventory hosts
          ^ here

This regression was introduced in https://github.com/openshift/openshift-ansible/commit/8694821bccc1fb58f82b154ba0a35ccda8ec22e1
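
For readers unfamiliar with this failure mode, the following is a minimal Python model of the conditional in the error above. It is an illustration only, not code from openshift-ansible: the function name, the example host names, and the dictionary standing in for hostvars are all hypothetical. It shows that looking up l_kubelet_node_name on a host whose facts have not been initialised fails before any comparison takes place, and that the same lookup succeeds once the fact is set first.

```python
# Illustrative model only; not code from openshift-ansible.
# All names and data below are hypothetical.


def map_labelled_nodes(inventory_hosts, hostvars, labelled_node_names):
    """Return the inventory hosts whose l_kubelet_node_name matches a labelled node."""
    selected = []
    for host in inventory_hosts:
        # Models `hostvars[item].l_kubelet_node_name | lower in [...]`.
        # A missing fact raises KeyError here, much as the undefined
        # attribute makes the Ansible conditional fail.
        node_name = hostvars[host]["l_kubelet_node_name"].lower()
        if node_name in labelled_node_names:
            selected.append(host)
    return selected


# Hypothetical data: two inventory hosts, one node carrying the label.
hosts = ["master.example.com", "infra.example.com"]
labelled = ["infra.example.com"]

# Facts not yet initialised: the lookup fails before any comparison happens.
facts_missing = {h: {} for h in hosts}
try:
    map_labelled_nodes(hosts, facts_missing, labelled)
    failed = False
except KeyError:
    failed = True

# Facts initialised before the mapping step: the lookup succeeds.
facts_set = {h: {"l_kubelet_node_name": h} for h in hosts}
selected = map_labelled_nodes(hosts, facts_set, labelled)
```

In this model the first call fails for the first host in the loop, regardless of whether that host carries the label, which is consistent with the task failing outright rather than skipping unmatched hosts.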

Comment 1 Russell Teague 2021-03-05 13:31:35 UTC
*** Bug 1935456 has been marked as a duplicate of this bug. ***

Comment 2 Russell Teague 2021-03-05 21:21:58 UTC
*** Bug 1935796 has been marked as a duplicate of this bug. ***

Comment 4 Russell Teague 2021-03-10 15:13:28 UTC
The proposed fix [1] for this bug has merged and will be tested by QE.  If desired, this patch could be tested in a development environment as a potential workaround until the patch is shipped in the next release.

[1] https://github.com/openshift/openshift-ansible/pull/12310

Comment 8 Florian 2021-03-12 15:41:32 UTC
May I recommend adding a short note about this bug to the OCP 3.11 release notes at https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html#ocp-3-11-394?

We learned of this the hard way during the update of our second test cluster (first one only had master nodes).

Comment 10 Gaoyun Pei 2021-03-13 14:05:26 UTC
Reproduced this issue with openshift-ansible-3.11.394-6.git.0.47ec25d.el7.noarch.rpm.

With openshift_upgrade_nodes_label="infra=true" set in the inventory file, running the playbook
playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml
fails as below:

TASK [Map labelled nodes to inventory hosts] ***********************************
21:34:32 
 fatal: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com]: FAILED! => {"msg": "The conditional check 'hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list' failed. The error was: error while evaluating conditional (hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list): 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'l_kubelet_node_name'\n\nThe error appears to be in '/home/slave1/workspace/Run-Ansible-Playbooks-Nextge/private-openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml': line 25, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    # using their openshift.common.hostname fact.\n    - name: Map labelled nodes to inventory hosts\n      ^ here\n"}
21:34:32 
 

Verified on openshift-ansible-3.11.400-1.git.0.3f4fe20.el7.noarch.rpm.

 TASK [Map labelled nodes to inventory hosts] ***********************************
21:25:11 
 skipping: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com] => (item=ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com)  => {"ansible_loop_var": "item", "changed": false, "item": "ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com", "skip_reason": "Conditional result was False"}
21:25:11 
 ok: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com] => (item=ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com) => {"add_host": {"groups": ["temp_nodes_to_upgrade"], "host_name": "ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com", "host_vars": {}}, "ansible_loop_var": "item", "changed": false, "item": "ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com"}
21:25:11 
 skipping: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com] => (item=ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com)  => {"ansible_loop_var": "item", "changed": false, "item": "ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com", "skip_reason": "Conditional result was False"}

Comment 17 errata-xmlrpc 2021-03-25 09:50:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 3.11.404 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0833

