Bug 1935456

Summary: OCP 3.11.394 upgrade fails for non-control-plane nodes
Product: OpenShift Container Platform
Reporter: Chuck Douglas <cdouglas>
Component: Installer
Assignee: aos-install
Installer sub component: openshift-ansible
QA Contact: Gaoyun Pei <gpei>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
Version: 3.11.0
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-05 13:31:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chuck Douglas 2021-03-04 22:47:33 UTC

Version:

openshift-ansible-playbooks-3.11.394-6.git.0.47ec25d.el7.noarch

Platform:

Bare metal

Please specify:
* UPI (semi-manual installation on customized infrastructure)

What happened?
Performing an upgrade from OCP 3.11.286 to OCP 3.11.394 works for the control
plane using the following playbook:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.yml \
  -i hosts.lab1 \
  -e openshift_certificate_expiry_warning_days="5" \
  -vvv

Performing an upgrade from OCP 3.11.286 to OCP 3.11.394 for infra or compute nodes fails if the following is used:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
  -i hosts.lab1 \
  -e openshift_upgrade_nodes_serial="1" \
  -e openshift_upgrade_nodes_label="region=infra" \
  -e openshift_certificate_expiry_warning_days="5" \
  -vvv

Control plane updated successfully.  The infra nodes failed with the following:

Failure summary:


  1. Hosts:    njrarltapp0017a.linux.us.ams1907.com
     Play:     Filter list of nodes to be upgraded if necessary
     Task:     Map labelled nodes to inventory hosts
     Message:  The conditional check 'hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list' failed. The error was: error while evaluating conditional (hostvars[item].l_kubelet_node_name | lower in nodes_to_upgrade.module_results.results[0]['items'] | map(attribute='metadata.name') | list): 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'l_kubelet_node_name'

               The error appears to be in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml': line 25, column 7, but may
               be elsewhere in the file depending on the exact syntax problem.

               The offending line appears to be:

                   # using their openshift.common.hostname fact.
                   - name: Map labelled nodes to inventory hosts
                     ^ here


What did you expect to happen?

Successful upgrade of infra and/or compute nodes

How to reproduce it (as minimally and precisely as possible)?

Reproduced using the commands above.

Anything else we need to know?

This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1921353; however, the workarounds noted in that bug and in its comments do not work here.

l_kubelet_node_name is set only in /usr/share/ansible/openshift-ansible/playbooks/init/cluster_facts.yml, and that playbook is not called when upgrading non-control-plane nodes.
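
To make the failure mode concrete, here is a minimal Python sketch of what the "Map labelled nodes to inventory hosts" conditional appears to do. It is an illustration only, not openshift-ansible code: the map_labelled_nodes helper, the dictionary layout, and the example.com host names are all hypothetical, and a plain KeyError stands in for Ansible's "has no attribute 'l_kubelet_node_name'" error.

```python
# Hypothetical model of the failing conditional; not openshift-ansible code.


def map_labelled_nodes(hostvars, labelled_node_names):
    """Return inventory hosts whose kubelet node name is in the labelled set.

    Mirrors the shape of:
      hostvars[item].l_kubelet_node_name | lower in <labelled node names>
    """
    selected = []
    for host, facts in hostvars.items():
        # Raises KeyError when the fact was never set, which plays the
        # role of the "has no attribute" error in the bug report.
        node_name = facts["l_kubelet_node_name"].lower()
        if node_name in labelled_node_names:
            selected.append(host)
    return selected


# Node names returned by the label query (made-up examples).
labelled = ["infra-1.example.com"]

# Facts as they might look after the fact-gathering play has run.
with_facts = {
    "infra-1.example.com": {"l_kubelet_node_name": "Infra-1.example.com"},
    "compute-1.example.com": {"l_kubelet_node_name": "compute-1.example.com"},
}

# Facts as they might look when the fact-gathering play was skipped.
without_facts = {"infra-1.example.com": {}, "compute-1.example.com": {}}

print(map_labelled_nodes(with_facts, labelled))

try:
    map_labelled_nodes(without_facts, labelled)
except KeyError as err:
    print("missing fact:", err)
```

With the fact present, only the labelled infra host is selected. With the fact absent, the lookup fails on the first host before any filtering happens, which matches the single-host failure summary above.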

We were able to work around this by importing cluster_facts.yml at the top of the playbook, using the following patch:

--- initialize_nodes_to_upgrade.yml.orig        2021-03-04 16:59:49.684963443 -0500
+++ initialize_nodes_to_upgrade.yml     2021-03-04 17:09:40.231942468 -0500
@@ -1,4 +1,7 @@
 ---
+- name: Gather some preliminary facts
+  import_playbook: ../../../init/cluster_facts.yml
+
 - name: Filter list of nodes to be upgraded if necessary
   hosts: oo_first_master



This is applied to /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/initialize_nodes_to_upgrade.yml
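
Because a package update could overwrite the edited file, it may be worth confirming that the import is still in place before re-running the node upgrade. The following is a rough Python sketch of such a check. The has_cluster_facts_import helper is hypothetical, and the two sample strings are inlined stand-ins for the playbook contents rather than reads of the real file.

```python
# Hypothetical helper; the function name and samples are illustrative only.

IMPORT_LINE = "import_playbook: ../../../init/cluster_facts.yml"


def has_cluster_facts_import(playbook_text):
    """Return True if a non-comment line is exactly the cluster_facts import."""
    for line in playbook_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Ignore commented-out copies of the import.
            continue
        if stripped == IMPORT_LINE:
            return True
    return False


# Sample contents with the workaround applied.
patched = """---
- name: Gather some preliminary facts
  import_playbook: ../../../init/cluster_facts.yml

- name: Filter list of nodes to be upgraded if necessary
  hosts: oo_first_master
"""

# Sample contents as shipped, without the workaround.
unpatched = """---
- name: Filter list of nodes to be upgraded if necessary
  hosts: oo_first_master
"""

print(has_cluster_facts_import(patched), has_cluster_facts_import(unpatched))
```

To use it against a real installation, read the playbook file into a string and pass it to the helper. The check is a plain text match, so it does not validate the YAML or confirm that the imported playbook exists.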

Comment 1 Russell Teague 2021-03-05 13:31:35 UTC

*** This bug has been marked as a duplicate of bug 1933090 ***