Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2095707

Summary: drain annotations are updated when no drain is executed
Product: OpenShift Container Platform
Reporter: Sergio <sregidor>
Component: Machine Config Operator
Assignee: Yu Qi Zhang <jerzhang>
Sub component: Machine Config Operator
QA Contact: Sergio <sregidor>
Status: CLOSED NOTABUG
Docs Contact:
Severity: low
Priority: low
CC: mkrejci, rioliu
Version: 4.11
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-10-24 16:28:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sergio 2022-06-10 10:12:33 UTC
Description of problem:
When we apply an MC (MachineConfig) that does not trigger a drain, the nodes are annotated as drained anyway.


Version-Release number of MCO (Machine Config Operator) (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-06-201913   True        False         62m     Cluster version is 4.11.0-0.nightly-2022-06-06-201913

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure): Yes

How reproducible:

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:

2. Profile:

Steps to Reproduce:
1. Get current annotated drains

$ oc get node -l node-role.kubernetes.io/worker  -o jsonpath='{.items[0].metadata.annotations.machineconfiguration\.openshift\.io/desiredDrain}'
uncordon-rendered-worker-cd52afd4bd39d302834b215fcc978be8

It should match the latest rendered machine config that triggered a drain:
$ oc get mc| grep worker | grep render
rendered-worker-cd52afd4bd39d302834b215fcc978be8   f5950ed0b5e5468fd172b37cef4a8f34995a3b3f   3.2.0             84m
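For reference, the annotation value appears to combine the requested action (the MCO's drain controller uses the prefixes `uncordon-` and `drain-`) with the rendered config it targets. A minimal local sketch of splitting a sample value (the parsing helper itself is hypothetical, not part of the MCO):

```shell
# Sample desiredDrain annotation value, copied from the output above.
ann="uncordon-rendered-worker-cd52afd4bd39d302834b215fcc978be8"

# Split "<action>-<rendered-config>" using shell parameter expansion.
action="${ann%%-rendered-*}"          # the requested action: "uncordon" (or "drain")
config="rendered-${ann#*-rendered-}"  # the rendered MC the action targets
echo "$action $config"
```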


2. Create a change that generates a new rendered MachineConfig but should not trigger a drain execution. We can, for example, create an ICSP:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: ubi8repo
spec:
  repositoryDigestMirrors:
  - mirrors:
    - example.io/example/ubi-minimal
    source: registry.access.redhat.com/ubi8/ubi-minimal
  - mirrors:
    - example.com/example/ubi-minimal
    source: registry.access.redhat.com/ubi8/ubi-minimal



3. A new rendered machine config is generated
$ oc get mc| grep worker | grep render
rendered-worker-cd52afd4bd39d302834b215fcc978be8   f5950ed0b5e5468fd172b37cef4a8f34995a3b3f   3.2.0             86m
rendered-worker-dc596e0454284758c260b1de37796675   f5950ed0b5e5468fd172b37cef4a8f34995a3b3f   3.2.0             1s

this new rendered config (rendered-worker-dc596e0454284758c260b1de37796675) does NOT trigger a drain execution

We can see these log messages in the machine-config-daemon pod logs:
I0610 09:53:29.397019    2072 drain.go:237] /etc/containers/registries.conf: changes made are safe to skip drain
I0610 09:53:29.397030    2072 update.go:544] Changes do not require drain, skipping.
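The skip decision in the log above can be modeled roughly as follows. This is a hedged, much-simplified sketch assuming a whitelist of paths whose changes are considered safe to apply without a drain; the real daemon inspects the actual content diff of each changed file, not just its path:

```shell
# Hypothetical simplification of the MCD's post-config-change calculation:
# a change set that only touches "safe" paths (e.g. /etc/containers/registries.conf,
# as in the log above) skips the drain; anything else requires one.
safe_paths="/etc/containers/registries.conf"

needs_drain() {
  for f in "$@"; do
    case " $safe_paths " in
      *" $f "*) ;;                 # path is on the safe list, keep checking
      *) echo "drain"; return ;;   # any other change forces a drain
    esac
  done
  echo "skip"
}

needs_drain /etc/containers/registries.conf   # prints "skip"
needs_drain /etc/systemd/system/foo.service   # prints "drain"
```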


Actual results:
Even though the new rendered machine config did not trigger a drain execution, the nodes are annotated as drained for it:

$ oc get node -l node-role.kubernetes.io/worker  -o jsonpath='{.items[0].metadata.annotations.machineconfiguration\.openshift\.io/desiredDrain}'
uncordon-rendered-worker-dc596e0454284758c260b1de37796675
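A hedged sketch of the reconciliation being observed here: the controller compares `desiredDrain` with its companion annotation `lastAppliedDrain` and considers the drain request complete when the two match, regardless of whether a real drain ran. The values below are copied from the output above; on a live cluster they would come from `oc get node ... -o jsonpath=...` queries like the ones shown earlier:

```shell
# Sample annotation values for one worker node (copied from above).
desired="uncordon-rendered-worker-dc596e0454284758c260b1de37796675"
last_applied="uncordon-rendered-worker-dc596e0454284758c260b1de37796675"

# When they match, the drain request is treated as fulfilled,
# even if the "drain" was only a no-op uncordon.
if [ "$desired" = "$last_applied" ]; then
  state="reconciled"
else
  state="pending"
fi
echo "$state"
```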


Expected results:
The node annotations should reflect whether a drain was actually executed.


Additional info:

Comment 1 Yu Qi Zhang 2022-06-20 22:28:00 UTC
Marking low since this is technically unchanged behaviour.

The annotation indicates that an uncordon happened after the update completed. This always happens regardless of whether a drain occurred: we request an uncordon and the controller performs it (a no-op in this case, but done just to be safe).

I'm pretty sure we did this because at the very beginning of the no-reboot update implementation, for all updates, we'd cordon the nodes before doing the calculations (this is no longer the case).

We can definitely skip this step. Longer term, the MCO probably shouldn't uncordon without checking who applied the cordon (something we don't track today; no metadata for it is recorded anywhere).

Comment 2 Yu Qi Zhang 2022-10-24 16:28:45 UTC
After discussion, we will be closing this as NOTABUG. The behaviour is unchanged and the annotation update is just cosmetic. We will reconsider proper cordon behaviour at a later time.