Description of problem:
When a machine watched by the MHC is considered to need remediation, deletion requests and events are recorded unconditionally in a loop until the machine is gone, resulting in event spam. This should happen only once.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Event spam. E.g.
Jun 04 22:49:48.053 I ns/openshift-machine-api machine/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-7rvmd reason/MachineDeleted Machine openshift-machine-api/e2e-health-check-worker/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-7rvmd/ip-10-0-171-88.ec2.internal has been remediated by requesting to delete Machine object
Jun 04 22:49:48.087 I ns/openshift-machine-api machine/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-7rvmd reason/MachineDeleted Machine openshift-machine-api/e2e-health-check-worker/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-7rvmd/ip-10-0-171-88.ec2.internal has been remediated by requesting to delete Machine object (2 times)
Jun 04 22:49:48.123 I ns/openshift-machine-api machine/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-7rvmd reason/MachineDeleted Machine openshift-machine-api/e2e-health-check-worker/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-7rvmd/ip-10-0-171-88.ec2.internal has been remediated by requesting to delete Machine object (3 times)
Jun 04 22:49:48.124 I ns/openshift-machine-api machine/ci-op-dgz95p2d-067ff-pxnzv-worker-us-east-1b-qrmgl reason/DetectedUnhealthy Machine openshift-machine-api/e2e-health-

Expected results:
Record the deletion event only once.

Additional info:
Planning to address this during the upcoming sprint.
I've raised a PR that checks whether the Machine has already been deleted before attempting to delete it. If it has, both the delete call and the delete event are skipped.
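The check described above can be sketched roughly as follows. This is not the code from the PR; it is a minimal, self-contained illustration that assumes the "already deleted" state is visible as a non-nil deletion timestamp on the object, which is how Kubernetes marks objects that are being deleted. The `Machine`, `Deleter`, `EventRecorder` and `fakeCluster` types and the `remediate` function are hypothetical stand-ins, not the machine-api types.

```go
package main

import (
	"fmt"
	"time"
)

// Machine is a hypothetical, pared-down stand-in for the real Machine object.
type Machine struct {
	Name              string
	DeletionTimestamp *time.Time // non-nil once a delete has been requested
}

// Deleter and EventRecorder are hypothetical interfaces for this sketch.
type Deleter interface {
	Delete(m *Machine) error
}

type EventRecorder interface {
	Event(m *Machine, reason, message string)
}

// remediate sends the delete call and records the event only when the
// machine is not already being deleted. It reports whether a delete was sent.
func remediate(m *Machine, d Deleter, r EventRecorder) (bool, error) {
	if m.DeletionTimestamp != nil {
		// Deletion already requested: skip both the call and the event.
		return false, nil
	}
	if err := d.Delete(m); err != nil {
		return false, err
	}
	r.Event(m, "MachineDeleted", "has been remediated by requesting to delete Machine object")
	return true, nil
}

// fakeCluster is an in-memory implementation of both interfaces that
// counts delete calls and keeps the recorded events.
type fakeCluster struct {
	deletes int
	events  []string
}

func (f *fakeCluster) Delete(m *Machine) error {
	now := time.Now()
	m.DeletionTimestamp = &now // mimic the API server marking the object
	f.deletes++
	return nil
}

func (f *fakeCluster) Event(m *Machine, reason, message string) {
	f.events = append(f.events, m.Name+" "+reason+": "+message)
}

func main() {
	c := &fakeCluster{}
	m := &Machine{Name: "worker-a"}

	// Simulate three reconcile passes over the same unhealthy machine.
	for i := 0; i < 3; i++ {
		if _, err := remediate(m, c, c); err != nil {
			fmt.Println("remediation failed:", err)
			return
		}
	}
	fmt.Printf("deletes=%d events=%d\n", c.deletes, len(c.events))
}
```

With the guard in place, repeated reconcile passes over the same machine produce one delete call and one event; without it, each pass would add another.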
[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-06-30-020342   True        False         106m    Cluster version is 4.6.0-0.nightly-2020-06-30-020342

Steps:
1. Create an MHC using the manifest below:

apiVersion: "machine.openshift.io/v1beta1"
kind: "MachineHealthCheck"
metadata:
  name: mhc1
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-0702-jwtwh
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: miyadav-0702-jwtwh-worker-us-east-2a
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 300s
  maxUnhealthy: 3

Actual and expected: MHC created successfully.

2. Delete a machine of the machineset being monitored.
Actual and expected: machine deleted successfully.

3. Monitor logs:
oc logs machine-api-controllers-6f467bfc-ndzqj -c machine-controller | grep delete
.
.
I0702 03:53:55.422842 1 controller.go:217] miyadav-0702-jwtwh-worker-us-east-2a-7zq5s: reconciling machine triggers delete
I0702 03:53:55.999127 1 controller.go:217] miyadav-0702-jwtwh-worker-us-east-2a-7zq5s: reconciling machine triggers delete
I0702 03:54:25.987396 1 controller.go:217] miyadav-0702-jwtwh-worker-us-east-2a-7zq5s: reconciling machine triggers delete
I0702 03:54:30.823811 1 controller.go:217] miyadav-0702-jwtwh-worker-us-east-2a-7zq5s: reconciling machine triggers delete
I0702 03:54:56.371037 1 controller.go:217] miyadav-0702-jwtwh-worker-us-east-2a-7zq5s: reconciling machine triggers delete
I0702 03:54:56.377749 1 controller.go:346] Could not find node from noderef, it may have already been deleted: ip-10-0-148-159.us-east-2.compute.internal
W0702 03:54:56.543267 1 reconciler.go:108] miyadav-0702-jwtwh-worker-us-east-2a-7zq5s: no instances found to delete for machine
As expected.

4. oc get events:
[miyadav@miyadav ~]$ oc get events | grep requesting
111m Normal MachineDeleted machine/miyadav-0702-jwtwh-worker-us-east-2a-nxbd9 Machine openshift-machine-api/mhc1/miyadav-0702-jwtwh-worker-us-east-2a-nxbd9/ip-10-0-131-176.us-east-2.compute.internal has been remediated by requesting to delete Machine object
As expected.

Additional info:
Logs from the earlier version (4.4), for comparison:
.
.
I0702 04:47:22.324720 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:47:42.583364 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:04.348089 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:16.090063 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:22.684951 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:51.147782 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
W0702 04:48:51.477298 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:48:52.596945 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
W0702 04:48:52.786056 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:48:53.882622 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
W0702 04:48:54.078475 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:48:55.173135 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
W0702 04:48:55.348079 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:48:56.672863 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:56.678646 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:48:56.776967 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:48:57.877306 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:57.882742 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:48:57.998559 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:48:59.094517 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:48:59.102316 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:48:59.189758 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:49:00.290766 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:49:00.299654 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:49:00.389612 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:49:01.495477 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:49:01.501752 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:49:01.610120 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:49:02.984863 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:49:02.990204 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:49:03.104123 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:49:05.820811 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:49:05.827009 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:49:05.931050 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
I0702 04:49:11.164889 1 controller.go:203] miyadav-07-ldn98-worker-us-east-2a-gffw5: reconciling machine triggers delete
I0702 04:49:11.173109 1 controller.go:332] Could not find node from noderef, it may have already been deleted: ip-10-0-151-143.us-east-2.compute.internal
W0702 04:49:11.278721 1 actuator.go:363] miyadav-07-ldn98-worker-us-east-2a-gffw5: no instances found to delete for machine
.
.
.

Moved to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196