Bug 2090358

Summary: Initiating drain log message is displayed before the drain actually starts
Product: OpenShift Container Platform Reporter: Sergio <sregidor>
Component: Machine Config Operator    Assignee: Yu Qi Zhang <jerzhang>
Machine Config Operator sub component: Machine Config Operator QA Contact: Sergio <sregidor>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: unspecified CC: mkrejci, rioliu
Version: 4.11   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:14:11 UTC Type: Bug

Description Sergio 2022-05-25 15:25:22 UTC
Description of problem:
In our functional tests we use the logs of the machine-config-controller pod to check certain events.

Currently the machine-config controller logs print the message:

"I0523 15:40:42.153153       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain"

However, at that point the controller actually starts a cordon operation, not a drain operation.

This makes it difficult for us to rely on the controller logs.
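
For reference, the log lines are read from output like the following (a sketch; the deployment and container names are assumed from a default MCO installation):

$ oc -n openshift-machine-config-operator logs deployment/machine-config-controller \
    -c machine-config-controller | grep drain_controller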


Version-Release number of MCO (Machine Config Operator) (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-20-213928   True        False         6h23m   Cluster version is 4.11.0-0.nightly-2022-05-20-213928

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure):

How reproducible:
Always

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:

2. Profile:

Steps to Reproduce:
1. Create a MachineConfig that triggers a cordon+drain process on the nodes:
cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-drain-maxunavail
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - quiet
  kernelType: realtime
EOF
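
2. While the worker pool applies the change, watch the pool status and follow the controller logs for the cordon/drain messages (a sketch; the deployment and container names are assumed from a default installation):

$ oc get mcp worker -w
$ oc -n openshift-machine-config-operator logs -f deployment/machine-config-controller \
    -c machine-config-controller | grep -E 'initiating drain|cordon'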



Actual results:
The machine-config controller pod will show the following logs for every node:

I0523 15:40:42.153153       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain   <<<<<<<<<--- HERE IT SAYS THE DRAIN BEGINS BUT IT'S NOT TRUE
I0523 15:40:42.153250       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordoning  <<<<<<<<<<<-- HERE WE CORDON. WE REALLY START CORDONING.
I0523 15:40:42.153276       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I0523 15:40:42.179749       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordon succeeded (currently schedulable: false)
I0523 15:40:42.196015       1 node_controller.go:446] Pool worker[zone=us-east-2a]: node ip-10-0-137-171.us-east-2.compute.internal: changed taints
E0523 15:40:42.829374       1 drain_controller.go:106] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-tdqtp, openshift-cluster-node-tuning-operator/tuned-rkxz4, openshift-dns/dns-default-spqzk, openshift-dns/node-resolver-d8q99, openshift-image-registry/node-ca-jxs4m, openshift-ingress-canary/ingress-canary-pqdnd, openshift-machine-config-operator/machine-config-daemon-snbvd, openshift-monitoring/node-exporter-r2brt, openshift-multus/multus-additional-cni-plugins-kmbns, openshift-multus/multus-hccmq, openshift-multus/network-metrics-daemon-tm5k7, openshift-network-diagnostics/network-check-target-rfm9r, openshift-ovn-kubernetes/ovnkube-node-fkhnj   <<<<<<<<<---- HERE WE START ACTUALLY DRAINING
I0523 15:40:42.830596       1 drain_controller.go:106] evicting pod openshift-operator-lifecycle-manager/collect-profiles-27555330-8cbwp


The logs say "initiating drain", but the controller triggers a cordon operation instead of draining.

Expected results:
"initiating drain" log should be printed when the drain in the nodes is actually executed.


Additional info:

Comment 6 errata-xmlrpc 2022-08-10 11:14:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 7 Red Hat Bugzilla 2023-09-15 01:55:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days