Bug 2024108 - Occasionally node remains in SchedulingDisabled state even after update has been completed successfully
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Sinny Kumari
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks: 2026275
Reported: 2021-11-17 10:33 UTC by Sinny Kumari
Modified: 2022-03-10 16:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:28:46 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2829 | 0 | None | open | Bug 2024108: daemon: make cordon/uncordon more robust | 2021-11-17 10:34:56 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:29:08 UTC

Description Sinny Kumari 2021-11-17 10:33:40 UTC
In a few recent bugs we have seen a node remain in SchedulingDisabled even after the MCO updated the node successfully and reported that the node had been uncordoned successfully.

It is hard to debug the root cause with the limited logging we have today. It would therefore be good to add better logging around cordon/uncordon and to make those operations more robust.

Bugs where we have seen this issue:
- https://bugzilla.redhat.com/show_bug.cgi?id=2022387
- https://bugzilla.redhat.com/show_bug.cgi?id=2015589

Comment 1 Yu Qi Zhang 2021-11-17 21:47:18 UTC
Actually I think https://bugzilla.redhat.com/show_bug.cgi?id=2021151 is due to racing with SRIOV reboots. Maybe we should link https://bugzilla.redhat.com/show_bug.cgi?id=2015589 instead (thinking of closing it as a duplicate)

Comment 2 Sinny Kumari 2021-11-18 11:22:43 UTC
(In reply to Yu Qi Zhang from comment #1)
> Actually I think https://bugzilla.redhat.com/show_bug.cgi?id=2021151 is due
> to racing with SRIOV reboots. Maybe we should link
> https://bugzilla.redhat.com/show_bug.cgi?id=2015589 instead (thinking of
> closing it as a duplicate)

Thanks Jerry, updated.

Comment 8 errata-xmlrpc 2022-03-10 16:28:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

