Description of problem:

When we apply an MC that does not trigger a drain execution, the nodes are annotated as drained anyway.

Version-Release number of MCO (Machine Config Operator) (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-06-201913   True        False         62m     Cluster version is 4.11.0-0.nightly-2022-06-06-201913

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)? (Y/N/Not sure): Yes

How reproducible:

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:
2. Profile:

Steps to Reproduce:

1. Get the drain annotation currently set on a worker node:

$ oc get node -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.annotations.machineconfiguration\.openshift\.io/desiredDrain}'
uncordon-rendered-worker-cd52afd4bd39d302834b215fcc978be8

It points at the latest rendered machine config that triggered a drain:

$ oc get mc | grep worker | grep render
rendered-worker-cd52afd4bd39d302834b215fcc978be8   f5950ed0b5e5468fd172b37cef4a8f34995a3b3f   3.2.0   84m

2. Create a new MachineConfig resource that should not trigger a drain execution. For example, create an ICSP:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: ubi8repo
spec:
  repositoryDigestMirrors:
  - mirrors:
    - example.io/example/ubi-minimal
    source: registry.access.redhat.com/ubi8/ubi-minimal
  - mirrors:
    - example.com/example/ubi-minimal
    source: registry.access.redhat.com/ubi8/ubi-minimal

3. A new rendered machine config is generated:

$ oc get mc | grep worker | grep render
rendered-worker-cd52afd4bd39d302834b215fcc978be8   f5950ed0b5e5468fd172b37cef4a8f34995a3b3f   3.2.0   86m
rendered-worker-dc596e0454284758c260b1de37796675   f5950ed0b5e5468fd172b37cef4a8f34995a3b3f   3.2.0   1s

This new rendered config (rendered-worker-dc596e0454284758c260b1de37796675) does NOT trigger a drain execution. We can see these messages in the daemon pod logs:

I0610 09:53:29.397019 2072 drain.go:237] /etc/containers/registries.conf: changes made are safe to skip drain
I0610 09:53:29.397030 2072 update.go:544] Changes do not require drain, skipping.

Actual results:

Even though the new rendered machine config did not trigger a drain execution, the nodes are annotated as drained for this machine config:

$ oc get node -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.annotations.machineconfiguration\.openshift\.io/desiredDrain}'
uncordon-rendered-worker-dc596e0454284758c260b1de37796675

Expected results:

The node annotations should reflect the drain that was actually executed.

Additional info:
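One way to cross-check what actually happened is to compare the drain action requested by the daemon with the one the controller last acted on. This is only a sketch and assumes this release also exposes a machineconfiguration.openshift.io/lastAppliedDrain node annotation alongside desiredDrain (annotation names may differ by release):

$ NODE=$(oc get node -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
# what the daemon asked the controller to do (drain or uncordon) for a rendered config
$ oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredDrain}{"\n"}'
# what the controller last completed; if the two match, the request was acted on (possibly as a no-op)
$ oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/lastAppliedDrain}{"\n"}'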
Marking low since this is technically unchanged behaviour. The annotation indicates that an uncordon was requested after the update completed. This always happens regardless of whether a drain occurred: the daemon requests an uncordon and the controller performs it (a no-op in this case, but done just to be safe). I'm fairly sure we did this because, in the early no-reboot update implementation, we cordoned the nodes for every update before doing the calculations, which is no longer the case. We can definitely skip this step. Longer term, the MCO probably shouldn't uncordon without checking who applied the cordon, which we don't track today; no metadata for this is recorded anywhere.
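To make the longer-term concern concrete: because nothing records who set a cordon, the post-update uncordon can clear a cordon an admin applied for unrelated reasons. A hypothetical sequence (node name is illustrative, not from this cluster):

$ oc adm cordon worker-node-1
# admin's cordon is in place
$ oc get node worker-node-1 -o jsonpath='{.spec.unschedulable}{"\n"}'
true
# ... an unrelated MachineConfig update later completes on this node ...
# the MCD's unconditional post-update uncordon request would clear the admin's cordon here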
After discussion, we will be closing this as NOTABUG. The behaviour is unchanged and the annotation update is just cosmetic. We will reconsider proper cordon behaviour at a later time.