Description of problem:

In our functional tests we use the logs in the machine-config-controller pod to check certain events. Currently the drain controller logs print the message:

I0523 15:40:42.153153 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain

but at that point the controller actually starts a cordon operation, not a drain operation. This makes it difficult for us to use the controller logs.

Version-Release number of MCO (Machine Config Operator) (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-20-213928   True        False         6h23m   Cluster version is 4.11.0-0.nightly-2022-05-20-213928

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)? (Y/N/Not sure):

How reproducible: Always

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:
2. Profile:

Steps to Reproduce:
1. Create a MachineConfig that triggers a cordon+drain process on the nodes:

cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-drain-maxunavail
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - quiet
  kernelType: realtime
EOF

Actual results:

The machine-config-controller pod shows these logs for every node:

I0523 15:40:42.153153 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain   <<<<<<<<<--- HERE IT SAYS THE DRAIN BEGINS, BUT IT'S NOT TRUE
I0523 15:40:42.153250 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordoning   <<<<<<<<<<<-- HERE WE CORDON. WE REALLY START CORDONING.
I0523 15:40:42.153276 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I0523 15:40:42.179749 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordon succeeded (currently schedulable: false)
I0523 15:40:42.196015 1 node_controller.go:446] Pool worker[zone=us-east-2a]: node ip-10-0-137-171.us-east-2.compute.internal: changed taints
E0523 15:40:42.829374 1 drain_controller.go:106] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-tdqtp, openshift-cluster-node-tuning-operator/tuned-rkxz4, openshift-dns/dns-default-spqzk, openshift-dns/node-resolver-d8q99, openshift-image-registry/node-ca-jxs4m, openshift-ingress-canary/ingress-canary-pqdnd, openshift-machine-config-operator/machine-config-daemon-snbvd, openshift-monitoring/node-exporter-r2brt, openshift-multus/multus-additional-cni-plugins-kmbns, openshift-multus/multus-hccmq, openshift-multus/network-metrics-daemon-tm5k7, openshift-network-diagnostics/network-check-target-rfm9r, openshift-ovn-kubernetes/ovnkube-node-fkhnj   <<<<<<<<<---- HERE WE START ACTUALLY DRAINING
I0523 15:40:42.830596 1 drain_controller.go:106] evicting pod openshift-operator-lifecycle-manager/collect-profiles-27555330-8cbwp

The logs say "initiating drain", but the controller triggers a cordon operation instead of draining.

Expected results:

The "initiating drain" log message should only be printed when the drain of the nodes is actually executed.

Additional info:
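As context for why this ordering matters for log-based checks, here is a minimal, self-contained sketch (the file name mcc.log and the trimmed three-line sample are illustrative, taken from the log output above): grepping the controller log for "initiating drain" matches before any "evicting pod" line, so a test that keys on that message concludes the drain has started even though only the cordon has begun.

```shell
# Write a trimmed sample of the controller log lines shown above
# (illustrative file name; in a real test the lines would come from `oc logs`).
cat << 'EOF' > mcc.log
I0523 15:40:42.153153 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain
I0523 15:40:42.153250 1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordoning
I0523 15:40:42.830596 1 drain_controller.go:106] evicting pod openshift-operator-lifecycle-manager/collect-profiles-27555330-8cbwp
EOF

# A check that treats "initiating drain" as the start of the drain fires on the
# first line, before the first actual eviction appears on the last line.
grep -n 'initiating drain\|evicting pod' mcc.log
```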
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days