Bug 1917013

Summary: Yum excluders are not re-enabled on nodes after an upgrade performed in separate phases when the "openshift_upgrade_nodes_label" parameter is used to select which nodes are upgraded at a time
Product: OpenShift Container Platform
Reporter: Joel Rosental R. <jrosenta>
Component: Installer
Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible
QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA
Severity: low
Priority: low
CC: mstaeble, vmedina
Version: 3.11.0
Keywords: Reopened
Target Release: 3.11.z
Hardware: Unspecified
OS: All
Doc Type: Bug Fix
Doc Text:
Cause: The node upgrade playbooks targeted all nodes instead of only the nodes filtered by openshift_upgrade_nodes_label.
Consequence: Nodes that were not intended to be upgraded had their excluders disabled but never re-enabled.
Fix: The initialization of the variable used to filter the nodes to upgrade was moved earlier in the play, and the pre/config steps were scoped to the filtered list of nodes.
Result: Only the nodes intended for upgrade have their yum excluders disabled.
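The cause and fix described in the Doc Text can be illustrated with a small model. This is a hypothetical Python sketch, not openshift-ansible code: the node names, the `upgrade-group` label, and the functions `select_nodes` and `run_upgrade` are invented for illustration, and the label is assumed to be a single `key=value` pair. It tracks which nodes are left with the excluder disabled when the pre/config phase targets all nodes versus only the filtered ones.

```python
# Hypothetical model of the excluder scoping bug; not openshift-ansible code.


def select_nodes(nodes, label):
    """Return node names whose labels match the assumed key=value selector."""
    key, value = label.split("=", 1)
    return [name for name, labels in nodes.items() if labels.get(key) == value]


def run_upgrade(nodes, label, fixed):
    """Return the set of nodes left with the excluder disabled (unexcluded)."""
    upgrade_targets = select_nodes(nodes, label)
    # Before the fix, the pre/config phase ran against every node;
    # after the fix, it is scoped to the filtered upgrade targets.
    pre_config_targets = upgrade_targets if fixed else list(nodes)

    excluder_enabled = {name: True for name in nodes}
    for name in pre_config_targets:
        excluder_enabled[name] = False  # models "disable openshift excluder"
    for name in upgrade_targets:
        excluder_enabled[name] = True  # models "Enable openshift excluder" at the end

    return {name for name, enabled in excluder_enabled.items() if not enabled}


# Hypothetical two-node cluster; only node-a matches the label.
nodes = {
    "node-a": {"upgrade-group": "1"},
    "node-b": {"upgrade-group": "2"},
}

print(sorted(run_upgrade(nodes, "upgrade-group=1", fixed=False)))  # ['node-b']
print(sorted(run_upgrade(nodes, "upgrade-group=1", fixed=True)))  # []
```

In the unfixed case the node outside the label selection is disabled in the pre/config phase but never reached by the final enable step, which is the state reported in this bug.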
Last Closed: 2021-03-25 09:50:07 UTC
Type: Bug
Bug Depends On: 1933090

Description Joel Rosental R. 2021-01-16 14:38:56 UTC
Version:

$ rpm -q openshift-ansible
openshift-ansible-3.11.318-2.git.1.da17c54.el7.noarch

$ rpm -q ansible
ansible-2.9.16-1.el7ae.noarch


What happened?

While upgrading an OCP 3.11 cluster in separate phases (control plane, then nodes), if the node upgrade playbook is run with the "openshift_upgrade_nodes_label" parameter so that only nodes with a certain label are upgraded, the excluders are disabled on all nodes at the beginning of the playbook run but are not re-enabled afterwards.

Only the node matching the "openshift_upgrade_nodes_label" parameter has its excluders re-enabled after the playbook finishes.

What did you expect to happen?

Excluders should be re-enabled on all nodes after the playbook finishes.

How to reproduce it (as minimally and precisely as possible)?

Upgrade OCP in separate phases (control plane and nodes), and in the node phase run the playbook with the "openshift_upgrade_nodes_label" parameter set to match one of the nodes.

Comment 3 Matthew Staebler 2021-02-15 19:25:49 UTC
As there is a workaround for this and it is not blocking installs or upgrades, we will not be pursuing a fix.

Comment 4 Victor Medina 2021-02-25 11:54:28 UTC
@mstaeble Hello, could you point me to the workaround link? Thanks

Comment 5 Russell Teague 2021-02-25 14:31:41 UTC
The excluders are present to prevent OpenShift packages from being upgraded during an OS upgrade. The documented OS upgrade steps [1] include ensuring that the excluders are enabled before upgrading the OS.

From Step 2:
# atomic-openshift-docker-excluder exclude
# atomic-openshift-excluder exclude


[1] https://docs.openshift.com/container-platform/3.11/upgrading/os_upgrades.html

Comment 6 Victor Medina 2021-02-25 16:34:33 UTC
Thanks

Comment 7 Russell Teague 2021-02-25 16:45:59 UTC
When investigating this issue, I found that using openshift_upgrade_nodes_label was broken in the most recent code.  I've opened a bug for that issue, https://bugzilla.redhat.com/show_bug.cgi?id=1933090.  Since I was already fixing that issue, I also worked up a fix for this bug and will submit a patch shortly.

Comment 9 Gaoyun Pei 2021-03-13 15:01:04 UTC
I could reproduce this issue with openshift-ansible-3.11.318-1.git.0.bccee5b.el7.noarch.rpm.

The "disable openshift excluder" step was executed on all nodes, but at the end of playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml the openshift excluder was re-enabled only on the node matching "openshift_upgrade_nodes_label".

22:30:10  TASK [openshift_excluder : disable openshift excluder] *************************
22:30:11  changed: [ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "unexclude"], "delta": "0:00:00.054883", "end": "2021-03-13 09:30:10.031549", "rc": 0, "start": "2021-03-13 09:30:09.976666", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
22:30:11  changed: [ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "unexclude"], "delta": "0:00:00.069715", "end": "2021-03-13 09:30:10.230393", "rc": 0, "start": "2021-03-13 09:30:10.160678", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

...

22:36:07  TASK [openshift_excluder : Enable openshift excluder] **************************
22:36:08  changed: [ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "exclude"], "delta": "0:00:00.075231", "end": "2021-03-13 09:36:07.115162", "rc": 0, "start": "2021-03-13 09:36:07.039931", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
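One way to spot this mismatch in a log like the one above is to track, per host, the last excluder action recorded. The following is a hypothetical Python sketch: `leftover_unexcluded` is an invented helper, and it assumes result lines of the `changed: [host] => {JSON}` form shown above. The JSON in the sample input is abbreviated from the log lines in this comment.

```python
import json
import re

# Matches Ansible result lines such as:
#   changed: [host] => {"changed": true, "cmd": [...], ...}
RESULT_RE = re.compile(r"(?:changed|ok): \[(?P<host>[^\]]+)\] => (?P<result>\{.*\})")


def leftover_unexcluded(log_lines):
    """Return hosts where the excluder ran 'unexclude' but never 'exclude' afterwards."""
    unexcluded = set()
    for line in log_lines:
        match = RESULT_RE.search(line)
        if not match:
            continue
        cmd = json.loads(match.group("result")).get("cmd", [])
        # Only consider results of the atomic-openshift-excluder command.
        if not isinstance(cmd, list) or not cmd or not cmd[0].endswith("atomic-openshift-excluder"):
            continue
        host, action = match.group("host"), cmd[-1]
        if action == "unexclude":
            unexcluded.add(host)
        elif action == "exclude":
            unexcluded.discard(host)
    return unexcluded


# Abbreviated sample based on the reproduction log above.
sample = [
    '22:30:11  changed: [ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "unexclude"], "rc": 0}',
    '22:30:11  changed: [ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "unexclude"], "rc": 0}',
    '22:36:08  changed: [ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "exclude"], "rc": 0}',
]

print(sorted(leftover_unexcluded(sample)))
# ['ci-vm-10-0-148-147.hosted.upshift.rdu2.redhat.com']
```

Keying on the last action per host, rather than counting tasks, keeps the check correct if a task is retried or a host appears in several plays.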



Verified with openshift-ansible-3.11.400-1.git.0.3f4fe20.el7.noarch.rpm: the openshift excluder is now disabled only on the node matching "openshift_upgrade_nodes_label".

TASK [openshift_excluder : disable openshift excluder] *************************
22:44:02  changed: [ci-vm-10-0-150-233.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["/sbin/atomic-openshift-excluder", "unexclude"], "delta": "0:00:00.068328", "end": "2021-03-13 09:44:00.608174", "rc": 0, "start": "2021-03-13 09:44:00.539846", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Comment 12 errata-xmlrpc 2021-03-25 09:50:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 3.11.404 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0833