Bug 2035005

Summary: MCD is not always removing in progress taint after a successful update
Product: OpenShift Container Platform
Reporter: Simone Tiraboschi <stirabos>
Component: Machine Config Operator
Assignee: Yu Qi Zhang <jerzhang>
Machine Config Operator sub component: Machine Config Operator
QA Contact: Rio Liu <rioliu>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aos-bugs, dbasunag, dsanzmor, jerzhang, lmohanty, mkrejci, nmalik, rsevilla, rzaleski, sdodson, skumari, wking
Version: 4.10
Keywords: Upgrades
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clones: 2076308 2102069
Last Closed: 2022-08-10 10:41:05 UTC
Type: Bug
Bug Blocks: 2076308

Description Simone Tiraboschi 2021-12-22 17:22:52 UTC
Description of problem:

Version-Release number of MCO (Machine Config Operator) (if applicable):

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure): Y

How reproducible: ? (not on all the nodes)

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job: https://main-jenkins-csb-cnvqe.apps.ocp-c1.prod.psi.redhat.com/view/Upgrade-Pipelines/job/Upgrade-CNV-4.10-Scheduled/17/

2. Profile: ?

Steps to Reproduce:
1. During our tests we apply an ImageContentSourcePolicy (ICSP), which causes a new config to be rendered for our nodes.
2. The MachineConfigPool (MCP) starts applying it.

Actual results:
At the end of the process, a few nodes (not all of them) still contain:

        "taints": [
            {
                "effect": "PreferNoSchedule",
                "key": "UpdateInProgress"
            }
        ]



Expected results:
After a successful update, the UpdateInProgress taint is removed from every node.

Additional info:

On the node, the current and desired config annotations match:
            "machineconfiguration.openshift.io/currentConfig": "rendered-worker-f283e2dd057330b8c4d288348ae5b5cb",
            "machineconfiguration.openshift.io/desiredConfig": "rendered-worker-f283e2dd057330b8c4d288348ae5b5cb",

In the MCD logs we see that the update completed:
I1222 06:34:39.073411    3396 update.go:1956] Update completed for config rendered-worker-f283e2dd057330b8c4d288348ae5b5cb and node has been successfully uncordoned
I1222 06:34:39.120947    3396 daemon.go:1283] In desired config rendered-worker-f283e2dd057330b8c4d288348ae5b5cb

Despite this, the node is still tainted.
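
For anyone trying to spot affected nodes, the symptom above (currentConfig equal to desiredConfig, but the UpdateInProgress taint still present) can be checked mechanically. The following is only a minimal sketch, not MCO code: it assumes node objects in the standard Kubernetes JSON layout (metadata.annotations, spec.taints), such as the parsed output of `oc get nodes -o json`, and the function names has_stale_update_taint and stale_nodes are hypothetical helpers introduced here for illustration.

```python
# Annotation keys and taint key as they appear in this bug report.
CURRENT_CONFIG = "machineconfiguration.openshift.io/currentConfig"
DESIRED_CONFIG = "machineconfiguration.openshift.io/desiredConfig"
TAINT_KEY = "UpdateInProgress"


def has_stale_update_taint(node):
    """Return True if the node is already on its desired config
    but still carries the UpdateInProgress taint.

    `node` is a dict in the standard Kubernetes Node JSON layout (assumption).
    """
    annotations = node.get("metadata", {}).get("annotations", {})
    current = annotations.get(CURRENT_CONFIG)
    desired = annotations.get(DESIRED_CONFIG)
    # spec.taints is absent (or null) on untainted nodes.
    taints = node.get("spec", {}).get("taints") or []
    tainted = any(t.get("key") == TAINT_KEY for t in taints)
    return tainted and current is not None and current == desired


def stale_nodes(node_list):
    """Return the names of all nodes in a NodeList dict that show the symptom."""
    return [
        n["metadata"]["name"]
        for n in node_list.get("items", [])
        if has_stale_update_taint(n)
    ]
```

Passing the dict obtained from `json.load` on the output of `oc get nodes -o json` to stale_nodes should list the nodes that finished the update but kept the taint. A node that is still mid-update (currentConfig different from desiredConfig) is deliberately not reported, since the taint is expected there.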

Comment 2 Sinny Kumari 2021-12-22 17:27:39 UTC
Ravi, can you please look at this bug? It seems like a regression from PR https://github.com/openshift/machine-config-operator/pull/2686

Comment 3 Simone Tiraboschi 2021-12-22 17:28:58 UTC
In the machine-config-controller logs we see:

I1222 06:42:29.011398       1 status.go:90] Pool worker: All nodes are updated with rendered-worker-f283e2dd057330b8c4d288348ae5b5cb

Comment 4 Simone Tiraboschi 2021-12-22 17:29:50 UTC
*** Bug 2034901 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2022-08-10 10:41:05 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 18 Red Hat Bugzilla 2023-09-18 04:29:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days