Bug 2090358 - Initiating drain log message is displayed before the drain actually starts
Summary: Initiating drain log message is displayed before the drain actually starts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Yu Qi Zhang
QA Contact: Sergio
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-25 15:25 UTC by Sergio
Modified: 2023-09-15 01:55 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:14:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 3168 0 None open Bug 2090358: Move drain log message to when drain starts 2022-05-30 23:03:43 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:14:22 UTC

Description Sergio 2022-05-25 15:25:22 UTC
Description of problem:
In our functional tests we use the logs in the machine-config-controller pod to check certain events.

Currently the machine-config controller logs print this message:

"I0523 15:40:42.153153       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain"

At that point, however, the controller actually starts a cordon operation, not a drain.

This makes it difficult for us to use the controller logs in our functional tests.
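
For reference, a minimal sketch of the kind of log check the tests rely on (illustrative only; it assumes the default openshift-machine-config-operator namespace and the machine-config-controller deployment/container names, and is not an exact excerpt of our test code):

# Follow the controller logs and filter the drain-related events the tests key off
oc -n openshift-machine-config-operator logs -f deployment/machine-config-controller \
  -c machine-config-controller | grep -E "initiating drain|cordoning|evicting pod"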


Version-Release number of MCO (Machine Config Operator) (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-20-213928   True        False         6h23m   Cluster version is 4.11.0-0.nightly-2022-05-20-213928

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure):

How reproducible:
Always

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:

2. Profile:

Steps to Reproduce:
1. Create a MachineConfig that triggers a cordon+drain process on the nodes (a sketch for watching the resulting rollout follows the manifest below):
cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-drain-maxunavail
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - quiet
  kernelType: realtime
EOF
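
As a side note (not part of the original reproduction steps), the resulting rollout and the per-node cordon/drain messages can be watched with something like the following, assuming the default namespace and deployment/container names:

# Watch the worker pool pick up the new rendered config
oc get mcp worker -w

# In another terminal, follow the controller logs for the per-node messages
oc -n openshift-machine-config-operator logs -f deployment/machine-config-controller \
  -c machine-config-controller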



Actual results:
The machine-config-controller pod will show these logs for every node:

I0523 15:40:42.153153       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain   <<<<<<<<<--- HERE IT SAYS THE DRAIN BEGINS BUT IT'S NOT TRUE
I0523 15:40:42.153250       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordoning  <<<<<<<<<<<-- HERE WE CORDON. WE REALLY START CORDONING.
I0523 15:40:42.153276       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I0523 15:40:42.179749       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordon succeeded (currently schedulable: false)
I0523 15:40:42.196015       1 node_controller.go:446] Pool worker[zone=us-east-2a]: node ip-10-0-137-171.us-east-2.compute.internal: changed taints
E0523 15:40:42.829374       1 drain_controller.go:106] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-tdqtp, openshift-cluster-node-tuning-operator/tuned-rkxz4, openshift-dns/dns-default-spqzk, openshift-dns/node-resolver-d8q99, openshift-image-registry/node-ca-jxs4m, openshift-ingress-canary/ingress-canary-pqdnd, openshift-machine-config-operator/machine-config-daemon-snbvd, openshift-monitoring/node-exporter-r2brt, openshift-multus/multus-additional-cni-plugins-kmbns, openshift-multus/multus-hccmq, openshift-multus/network-metrics-daemon-tm5k7, openshift-network-diagnostics/network-check-target-rfm9r, openshift-ovn-kubernetes/ovnkube-node-fkhnj   <<<<<<<<<---- HERE WE START ACTUALLY DRAINING
I0523 15:40:42.830596       1 drain_controller.go:106] evicting pod openshift-operator-lifecycle-manager/collect-profiles-27555330-8cbwp


The logs say "initiating drain", but the controller triggers a cordon operation instead of draining.

Expected results:
The "initiating drain" log message should be printed when the drain of the node is actually executed.


Additional info:
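As an illustrative (hypothetical) check of that ordering, one could filter a single node's controller log lines and confirm that "initiating drain" no longer appears before the cordon messages:

# Hypothetical ordering check for one node (node name taken from the logs above)
NODE=ip-10-0-137-171.us-east-2.compute.internal
oc -n openshift-machine-config-operator logs deployment/machine-config-controller \
  -c machine-config-controller | grep "node ${NODE}:" \
  | grep -E "cordoning|cordon succeeded|initiating drain"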

Comment 6 errata-xmlrpc 2022-08-10 11:14:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 7 Red Hat Bugzilla 2023-09-15 01:55:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.

