Bug 1933090

Summary: [3.11] Upgrade fails when specifying openshift_upgrade_nodes_label
Product: OpenShift Container Platform
Reporter: Russell Teague <rteague>
Component: Installer
Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible
QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-install, apjagtap, awyatt, cdouglas, dkaylor, emahoney, florian.herzog.fh, jnordell, mnunes, mstaeble, rabdulra
Version: 3.11.0
Keywords: Regression
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: Node-based facts such as l_kubelet_node_name were set late in the upgrade cycle.
Consequence: The fact was undefined when it was referenced.
Fix: Node-based fact initialization was moved earlier, to the point where the other node facts are set.
Result: The facts are set before they are referenced.
Last Closed: 2021-03-25 09:50:07 UTC
Type: Bug
Bug Blocks: 1917013    
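
The Cause and Fix in the Doc Text come down to task ordering: a fact that is read before the step that sets it is undefined. The sketch below models this in plain Python, assuming per-host facts can be represented as a dict of dicts. `initialize_node_facts`, `read_node_name`, and `run_steps` are hypothetical names, not openshift-ansible code, and the node name is a placeholder value.

```python
from typing import Callable, Dict, List

HostVars = Dict[str, Dict[str, str]]
Step = Callable[[HostVars], object]


def initialize_node_facts(hostvars: HostVars) -> None:
    """Hypothetical stand-in for the task that sets node-based facts."""
    # Placeholder value; the real playbooks derive this fact differently.
    hostvars["node1"]["l_kubelet_node_name"] = "node1.example.com"


def read_node_name(hostvars: HostVars) -> str:
    """Hypothetical stand-in for a later task that references the fact."""
    # A plain dict lookup raises KeyError if the fact is not set yet,
    # much as Ansible reports "has no attribute 'l_kubelet_node_name'".
    return hostvars["node1"]["l_kubelet_node_name"]


def run_steps(steps: List[Step]) -> List[object]:
    """Run the steps in order against a fresh, empty set of host facts."""
    hostvars: HostVars = {"node1": {}}
    return [step(hostvars) for step in steps]


# Order before the fix: the fact is referenced before it is set.
try:
    run_steps([read_node_name, initialize_node_facts])
except KeyError as exc:
    print("undefined fact:", exc)

# Order after the fix: initialization runs first, so the lookup succeeds.
print(run_steps([initialize_node_facts, read_node_name]))
```

Reordering the same two steps is the whole difference, which mirrors the fix of moving fact initialization earlier rather than changing how the fact is referenced.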

Description Russell Teague 2021-02-25 16:22:52 UTC
When running an upgrade with openshift_upgrade_nodes_label specified, the upgrade fails at TASK [Map labelled nodes to inventory hosts].

Version: openshift-ansible-3.11.394-1


TASK [Map labelled nodes to inventory hosts] ******************************************************************************************************************
task path: /home/rteague/git/oa-testing/aws-3a/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml:25
Thursday 25 February 2021  09:50:17 -0500 (0:00:00.025)       0:00:17.059 ***** 
fatal: [ec2-13-58-87-255.us-east-2.compute.amazonaws.com]: FAILED! => 
  msg: |-
    The conditional check 'hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list' failed. The error was: error while evaluating conditional (hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list): 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'l_kubelet_node_name'
  
    The error appears to be in '/home/rteague/git/oa-testing/aws-3a/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml': line 25, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.
  
    The offending line appears to be:
  
        # using their openshift.common.hostname fact.
        - name: Map labelled nodes to inventory hosts
          ^ here

This regression was introduced in https://github.com/openshift/openshift-ansible/commit/8694821bccc1fb58f82b154ba0a35ccda8ec22e1
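
The failing conditional keeps an inventory host only if its lower-cased `l_kubelet_node_name` appears among the `metadata.name` values of the labelled nodes returned by the node query. A rough Python equivalent, for readers less familiar with Jinja2 filters, is sketched below. `map_labelled_nodes` is a hypothetical helper, and the input shapes are assumed from the expression in the error message rather than taken from openshift-ansible.

```python
from typing import Any, Dict, List


def map_labelled_nodes(
    inventory_hosts: List[str],
    hostvars: Dict[str, Dict[str, Any]],
    api_items: List[Dict[str, Any]],
) -> List[str]:
    """Return the inventory hosts whose kubelet node name is a labelled node.

    Approximates the Jinja2 conditional
      hostvars[item].l_kubelet_node_name | lower
        in items | map(attribute='metadata.name') | list
    evaluated once per inventory host.
    """
    # Equivalent of: items | map(attribute='metadata.name') | list
    labelled_names = [item["metadata"]["name"] for item in api_items]
    selected = []
    for host in inventory_hosts:
        # Raises KeyError when the fact was never set, which is the
        # Python analogue of the failure reported in this bug.
        node_name = hostvars[host]["l_kubelet_node_name"]
        if node_name.lower() in labelled_names:
            selected.append(host)  # the playbook's add_host would run here
    return selected


api_items = [{"metadata": {"name": "node-b.example.com"}}]
hostvars = {
    "node-a": {"l_kubelet_node_name": "node-a.example.com"},
    "node-b": {"l_kubelet_node_name": "Node-B.example.com"},
}
print(map_labelled_nodes(["node-a", "node-b"], hostvars, api_items))
```

Hosts whose node name is not in the labelled list are simply left out, which corresponds to the `skipping: ... Conditional result was False` lines once the fact is defined.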

Comment 1 Russell Teague 2021-03-05 13:31:35 UTC
*** Bug 1935456 has been marked as a duplicate of this bug. ***

Comment 2 Russell Teague 2021-03-05 21:21:58 UTC
*** Bug 1935796 has been marked as a duplicate of this bug. ***

Comment 4 Russell Teague 2021-03-10 15:13:28 UTC
The proposed fix [1] for this bug has merged and will be tested by QE. If desired, the patch can be tested in a development environment as a workaround until it ships in the next release.

[1] https://github.com/openshift/openshift-ansible/pull/12310

Comment 8 Florian 2021-03-12 15:41:32 UTC
May I recommend adding a short note to the OCP 3.11 release notes at https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html#ocp-3-11-394 that points to this bug?

We learned of this the hard way while updating our second test cluster (the first one had only master nodes).

Comment 10 Gaoyun Pei 2021-03-13 14:05:26 UTC
Reproduced this issue with openshift-ansible-3.11.394-6.git.0.47ec25d.el7.noarch.rpm.

With openshift_upgrade_nodes_label="infra=true" set in the inventory file, running the playbook
playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml
fails as follows:

TASK [Map labelled nodes to inventory hosts] ***********************************
21:34:32 
 fatal: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com]: FAILED! => {"msg": "The conditional check 'hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list' failed. The error was: error while evaluating conditional (hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list): 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'l_kubelet_node_name'\n\nThe error appears to be in '/home/slave1/workspace/Run-Ansible-Playbooks-Nextge/private-openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml': line 25, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    # using their openshift.common.hostname fact.\n    - name: Map labelled nodes to inventory hosts\n      ^ here\n"}
21:34:32 
 

Verified on openshift-ansible-3.11.400-1.git.0.3f4fe20.el7.noarch.rpm.

 TASK [Map labelled nodes to inventory hosts] ***********************************
21:25:11 
 skipping: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com] => (item=ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com)  => {"ansible_loop_var": "item", "changed": false, "item": "ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com", "skip_reason": "Conditional result was False"}
21:25:11 
 ok: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com] => (item=ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com) => {"add_host": {"groups": ["temp_nodes_to_upgrade"], "host_name": "ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com", "host_vars": {}}, "ansible_loop_var": "item", "changed": false, "item": "ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com"}
21:25:11 
 skipping: [ci-vm-10-0-150-191.hosted.upshift.rdu2.redhat.com] => (item=ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com)  => {"ansible_loop_var": "item", "changed": false, "item": "ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com", "skip_reason": "Conditional result was False"}
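
Comment 10 sets `openshift_upgrade_nodes_label="infra=true"`. Assuming this value is used as a Kubernetes equality-based label selector when the node list is queried, the sketch below shows how such a selector narrows a list of node objects. It runs locally on plain dicts and does not contact a cluster; `parse_selector` and `select_nodes` are hypothetical helpers, and set-based selectors and `!=` are deliberately not handled.

```python
from typing import Any, Dict, List


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse an equality-based selector such as "infra=true" or "a=b,c==d"."""
    requirements: Dict[str, str] = {}
    for term in selector.split(","):
        term = term.strip()
        if "!=" in term:
            raise ValueError(f"inequality not supported in this sketch: {term!r}")
        # Accept both "key=value" and "key==value".
        op = "==" if "==" in term else "="
        key, sep, value = term.partition(op)
        if not sep or not key.strip():
            raise ValueError(f"malformed selector term: {term!r}")
        requirements[key.strip()] = value.strip()
    return requirements


def select_nodes(nodes: List[Dict[str, Any]], selector: str) -> List[str]:
    """Return names of nodes whose labels satisfy every requirement (logical AND)."""
    requirements = parse_selector(selector)
    selected = []
    for node in nodes:
        labels = node["metadata"].get("labels", {})
        if all(labels.get(k) == v for k, v in requirements.items()):
            selected.append(node["metadata"]["name"])
    return selected


nodes = [
    {"metadata": {"name": "infra-1", "labels": {"infra": "true"}}},
    {"metadata": {"name": "compute-1", "labels": {"compute": "true"}}},
]
print(select_nodes(nodes, "infra=true"))
```

Only nodes carrying every listed label survive the filter; their names are what the mapping task then compares against each inventory host's `l_kubelet_node_name`.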

Comment 17 errata-xmlrpc 2021-03-25 09:50:07 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 3.11.404 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0833