Bug 2090358 - Initiating drain log message is displayed before the drain actually starts
Summary: Initiating drain log message is displayed before the drain actually starts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Yu Qi Zhang
QA Contact: Sergio
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-25 15:25 UTC by Sergio
Modified: 2023-09-15 01:55 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:14:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 3168 0 None open Bug 2090358: Move drain log message to when drain starts 2022-05-30 23:03:43 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:14:22 UTC

Description Sergio 2022-05-25 15:25:22 UTC
Description of problem:
In our functional tests we use the logs in the machine-config-controller pod to check certain events.

Currently the machine-config controller logs print this message:

"I0523 15:40:42.153153       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain"

At that point, however, the controller actually starts a cordon operation, not a drain.

This makes it difficult for us to use the controller logs in our functional tests.
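
For reference, a minimal sketch of the kind of log check the tests rely on (illustrative only; it assumes the default openshift-machine-config-operator namespace and the machine-config-controller deployment/container names, and is not an exact excerpt of our test code):

# Follow the controller logs and filter the drain-related events the tests key off
oc -n openshift-machine-config-operator logs -f deployment/machine-config-controller \
  -c machine-config-controller | grep -E "initiating drain|cordoning|evicting pod"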


Version-Release number of MCO (Machine Config Operator) (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-20-213928   True        False         6h23m   Cluster version is 4.11.0-0.nightly-2022-05-20-213928

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure):

How reproducible:
Always

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:

2. Profile:

Steps to Reproduce:
1. Create a MachineConfig that triggers a cordon+drain process on the nodes (a sketch for watching the resulting rollout follows the manifest below):
cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-drain-maxunavail
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - quiet
  kernelType: realtime
EOF
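
As a side note (not part of the original reproduction steps), the resulting rollout and the per-node cordon/drain messages can be watched with something like the following, assuming the default namespace and deployment/container names:

# Watch the worker pool pick up the new rendered config
oc get mcp worker -w

# In another terminal, follow the controller logs for the per-node messages
oc -n openshift-machine-config-operator logs -f deployment/machine-config-controller \
  -c machine-config-controller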



Actual results:
The machine-config-controller pod will show these logs for every node:

I0523 15:40:42.153153       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating drain   <<<<<<<<<--- HERE IT SAYS THE DRAIN BEGINS BUT IT'S NOT TRUE
I0523 15:40:42.153250       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordoning  <<<<<<<<<<<-- HERE WE CORDON. WE REALLY START CORDONING.
I0523 15:40:42.153276       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: initiating cordon (currently schedulable: true)
I0523 15:40:42.179749       1 drain_controller.go:141] node ip-10-0-137-171.us-east-2.compute.internal: cordon succeeded (currently schedulable: false)
I0523 15:40:42.196015       1 node_controller.go:446] Pool worker[zone=us-east-2a]: node ip-10-0-137-171.us-east-2.compute.internal: changed taints
E0523 15:40:42.829374       1 drain_controller.go:106] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-tdqtp, openshift-cluster-node-tuning-operator/tuned-rkxz4, openshift-dns/dns-default-spqzk, openshift-dns/node-resolver-d8q99, openshift-image-registry/node-ca-jxs4m, openshift-ingress-canary/ingress-canary-pqdnd, openshift-machine-config-operator/machine-config-daemon-snbvd, openshift-monitoring/node-exporter-r2brt, openshift-multus/multus-additional-cni-plugins-kmbns, openshift-multus/multus-hccmq, openshift-multus/network-metrics-daemon-tm5k7, openshift-network-diagnostics/network-check-target-rfm9r, openshift-ovn-kubernetes/ovnkube-node-fkhnj   <<<<<<<<<---- HERE WE START ACTUALLY DRAINING
I0523 15:40:42.830596       1 drain_controller.go:106] evicting pod openshift-operator-lifecycle-manager/collect-profiles-27555330-8cbwp


The logs say "initiating drain", but the controller triggers a cordon operation instead of draining.

Expected results:
The "initiating drain" log message should be printed when the drain of the node is actually executed.


Additional info:
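As an illustrative (hypothetical) check of that ordering, one could filter a single node's controller log lines and confirm that "initiating drain" no longer appears before the cordon messages:

# Hypothetical ordering check for one node (node name taken from the logs above)
NODE=ip-10-0-137-171.us-east-2.compute.internal
oc -n openshift-machine-config-operator logs deployment/machine-config-controller \
  -c machine-config-controller | grep "node ${NODE}:" \
  | grep -E "cordoning|cordon succeeded|initiating drain"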

Comment 6 errata-xmlrpc 2022-08-10 11:14:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 7 Red Hat Bugzilla 2023-09-15 01:55:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.

