In a few recent bugs, we have seen a node remain in SchedulingDisabled even after the MCO updated the node successfully and the node was uncordoned successfully. It is hard to debug the root cause with the limited logging available. As a result, it would be nice to have better logging around cordon/uncordon and to make that path more robust.

Bugs where we have seen this issue:
- https://bugzilla.redhat.com/show_bug.cgi?id=2022387
- https://bugzilla.redhat.com/show_bug.cgi?id=2015589
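To illustrate the kind of logging and robustness meant here, below is a rough sketch only, not the MCO's actual code. It logs every cordon/uncordon attempt, re-reads the node to verify that the change took effect, and retries on failure. Everything in it is an assumption made for illustration: `FakeNodeClient` is a hypothetical in-memory stand-in for the Kubernetes API, and `set_schedulable`, its parameters, and the log messages are invented names.

```python
import logging
import time

log = logging.getLogger("cordon-sketch")


class FakeNodeClient:
    """Hypothetical in-memory stand-in for the Kubernetes API (illustration only)."""

    def __init__(self, failures_before_success=0):
        self.unschedulable = {}
        self._failures_left = failures_before_success

    def set_unschedulable(self, node, value):
        # Simulate transient API errors for the first few calls.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise RuntimeError("transient API error (simulated)")
        self.unschedulable[node] = value

    def get_unschedulable(self, node):
        return self.unschedulable.get(node, False)


def set_schedulable(client, node, schedulable, attempts=5, delay=0.0):
    """Cordon (schedulable=False) or uncordon (schedulable=True) a node.

    Logs each attempt, verifies the result by reading the node back, and
    retries on failure. Returns the number of attempts used, or raises
    RuntimeError once all attempts are exhausted.
    """
    action = "uncordon" if schedulable else "cordon"
    want = not schedulable  # desired value of the unschedulable flag
    last_err = None
    for attempt in range(1, attempts + 1):
        log.info("%s node %s: attempt %d/%d", action, node, attempt, attempts)
        try:
            client.set_unschedulable(node, want)
            got = client.get_unschedulable(node)
            if got == want:
                log.info("%s node %s: verified unschedulable=%s", action, node, got)
                return attempt
            last_err = RuntimeError(
                f"verification failed: unschedulable={got}, want {want}"
            )
        except RuntimeError as err:
            last_err = err
        log.warning("%s node %s: attempt %d failed: %s", action, node, attempt, last_err)
        time.sleep(delay)
    raise RuntimeError(
        f"failed to {action} node {node} after {attempts} attempts"
    ) from last_err
```

The read-back after each write is the relevant part for this bug: if a node is still SchedulingDisabled after an uncordon that appeared to succeed, the logs would show at which attempt the observed state diverged from the requested one.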
Actually, I think https://bugzilla.redhat.com/show_bug.cgi?id=2021151 is due to racing with SRIOV reboots. Maybe we should link https://bugzilla.redhat.com/show_bug.cgi?id=2015589 instead (thinking of closing it as a duplicate).
(In reply to Yu Qi Zhang from comment #1)
> Actually I think https://bugzilla.redhat.com/show_bug.cgi?id=2021151 is due
> to racing with SRIOV reboots. Maybe we should link
> https://bugzilla.redhat.com/show_bug.cgi?id=2015589 instead (thinking of
> closing it as a duplicate)

Thanks Jerry, updated.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056