Bug 1878880

Summary: Misleading log message when deleting Node
Product: OpenShift Container Platform
Component: Cloud Compute
Sub Component: Other Providers
Version: 4.6
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: low
Status: CLOSED ERRATA
Reporter: Zane Bitter <zbitter>
Assignee: Zane Bitter <zbitter>
QA Contact: sunzhaohua <zhsun>
CC: mimccune
Type: Bug
Last Closed: 2020-10-27 16:40:46 UTC

Description Zane Bitter 2020-09-14 18:31:18 UTC
When the Machine controller deletes a Node, the log message is misleading because the format parameters are passed in the wrong order. Log messages should take the form:

  <Machine name>: ...

but when a Node is deleted you instead see one of the form:

  <Node name>: deleting node <Machine name> for machine

This is independent of the platform in use.
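
For illustration, here is a minimal Go sketch of the problem, not the actual controller code: the real controller logs via klog (approximated here with fmt), and the variable names are illustrative stand-ins.

  package main

  import "fmt"

  func main() {
          machineName := "zhsungcp918-cd2rn-worker-c-pnmw7"
          nodeName := "zhsungcp918-cd2rn-worker-c-pnmw7.c.openshift-qe.internal"

          // Buggy ordering: the Node name lands in the message prefix,
          // which by convention should be the Machine name.
          fmt.Printf("%v: deleting node %q for machine\n", nodeName, machineName)

          // Fixed ordering: Machine name as the prefix, Node name quoted.
          fmt.Printf("%v: deleting node %q for machine\n", machineName, nodeName)
  }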

Comment 2 sunzhaohua 2020-09-18 09:31:18 UTC
Verification failed; the message is still of the form <Node name>: deleting node <Machine name> for machine
clusterversion: 4.6.0-0.nightly-2020-09-18-002612

$ oc get node
NAME                                                       STATUS   ROLES    AGE    VERSION
zhsungcp918-cd2rn-master-0.c.openshift-qe.internal         Ready    master   137m   v1.19.0+b4ffb45
zhsungcp918-cd2rn-master-1.c.openshift-qe.internal         Ready    master   137m   v1.19.0+b4ffb45
zhsungcp918-cd2rn-master-2.c.openshift-qe.internal         Ready    master   138m   v1.19.0+b4ffb45
zhsungcp918-cd2rn-worker-a-zkgff.c.openshift-qe.internal   Ready    worker   128m   v1.19.0+b4ffb45
zhsungcp918-cd2rn-worker-b-r64rd.c.openshift-qe.internal   Ready    worker   128m   v1.19.0+b4ffb45
zhsungcp918-cd2rn-worker-c-pnmw7.c.openshift-qe.internal   Ready    worker   128m   v1.19.0+b4ffb45

$ oc get machine
NAME                               PHASE     TYPE            REGION        ZONE            AGE
zhsungcp918-cd2rn-master-0         Running   n1-standard-4   us-central1   us-central1-a   142m
zhsungcp918-cd2rn-master-1         Running   n1-standard-4   us-central1   us-central1-b   142m
zhsungcp918-cd2rn-master-2         Running   n1-standard-4   us-central1   us-central1-c   142m
zhsungcp918-cd2rn-worker-a-zkgff   Running   n1-standard-4   us-central1   us-central1-a   131m
zhsungcp918-cd2rn-worker-b-r64rd   Running   n1-standard-4   us-central1   us-central1-b   131m
zhsungcp918-cd2rn-worker-c-pnmw7   Running   n1-standard-4   us-central1   us-central1-c   131m

$ oc delete machine zhsungcp918-cd2rn-worker-c-pnmw7
machine.machine.openshift.io "zhsungcp918-cd2rn-worker-c-pnmw7" deleted

I0918 09:04:00.936404       1 controller.go:247] zhsungcp918-cd2rn-worker-c-pnmw7.c.openshift-qe.internal: deleting node "zhsungcp918-cd2rn-worker-c-pnmw7" for machine

Comment 3 Zane Bitter 2020-09-18 14:34:42 UTC
The Machine controller is vendored in each cluster-api-provider's tree, so this is currently fixed only for vSphere (whose provider lives in-tree in the machine-api-operator). Other providers will pick up the fix as they update their vendored packages.

Comment 4 Michael McCune 2020-09-22 16:43:45 UTC
I am working on proposing pull requests for the other providers now. I should have these posted today.

Comment 5 Joel Speed 2020-10-02 10:05:16 UTC
All PRs are now merged. The GCP PR was replaced by github.com/openshift/cluster-api-provider-gcp/pull/122, but that one is linked to a separate BZ.

Comment 8 sunzhaohua 2020-10-09 07:50:41 UTC
Verified.
clusterversion: 4.6.0-0.nightly-2020-10-08-210814
aws:
I1009 05:26:34.314348       1 controller.go:248] zhsun109aws-tsn5h-worker-us-east-2c-frj7w: deleting node "ip-10-0-206-48.us-east-2.compute.internal" for machine
gcp:
$ oc logs machine-api-controllers-65fbf64c45-nk55m -c machine-controller | grep deleting
I1009 06:36:04.994387       1 controller.go:248] zhsun109gcp-m2fw5-worker-c-dwvsb: deleting node "zhsun109gcp-m2fw5-worker-c-dwvsb.c.openshift-qe.internal" for machine
azure:
I1009 06:43:20.246529       1 controller.go:248] zhsun109az-hnqvv-worker-northcentralus-rwrv8: deleting node "zhsun109az-hnqvv-worker-northcentralus-rwrv8" for machine
vsphere:
I1009 07:49:44.218593       1 controller.go:248] zhsun109vsp-5vlnn-worker-l5rlm: deleting node "zhsun109vsp-5vlnn-worker-l5rlm" for machine

Comment 10 errata-xmlrpc 2020-10-27 16:40:46 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196