Bug 2024108

Summary: Occasionally a node remains in SchedulingDisabled state even after the update has completed successfully
Product: OpenShift Container Platform
Reporter: Sinny Kumari <skumari>
Component: Machine Config Operator
Assignee: Sinny Kumari <skumari>
Machine Config Operator sub component: Machine Config Operator
QA Contact: Rio Liu <rioliu>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, jerzhang, mkrejci
Version: 4.10
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:28:46 UTC
Type: Bug
Bug Blocks: 2026275    

Description Sinny Kumari 2021-11-17 10:33:40 UTC
In a few recent bugs, we have seen a node remain in SchedulingDisabled even after the MCO updated it successfully and it was uncordoned successfully.

It is hard to debug the root cause with the limited logging available. It would be nice to have better logging around cordon/uncordon and to make that code path more robust.

Bugs where we have seen this issue:
- https://bugzilla.redhat.com/show_bug.cgi?id=2022387
- https://bugzilla.redhat.com/show_bug.cgi?id=2015589

Comment 1 Yu Qi Zhang 2021-11-17 21:47:18 UTC
Actually I think https://bugzilla.redhat.com/show_bug.cgi?id=2021151 is due to racing with SRIOV reboots. Maybe we should link https://bugzilla.redhat.com/show_bug.cgi?id=2015589 instead (thinking of closing it as a duplicate)

Comment 2 Sinny Kumari 2021-11-18 11:22:43 UTC
(In reply to Yu Qi Zhang from comment #1)
> Actually I think https://bugzilla.redhat.com/show_bug.cgi?id=2021151 is due
> to racing with SRIOV reboots. Maybe we should link
> https://bugzilla.redhat.com/show_bug.cgi?id=2015589 instead (thinking of
> closing it as a duplicate)

Thanks Jerry, updated.

Comment 8 errata-xmlrpc 2022-03-10 16:28:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056