+++ This bug was initially created as a clone of Bug #1800314 +++
Deleting the Node object in Kubernetes signals to the cluster that no pod on that node is running and that it is safe to release any storage or process locks that ensure two processes can't be running on different nodes with the same name or volumes. The machine controller was deleting the node before the machine was fully terminated, which means that a stateful set controller would be able to launch two pods with the same name running on the cluster at the same time, which violates our cluster safety guarantees.
Fix is to wait for machine is confirmed shut down by cloud provider before deleting the Node object.
Tested on Azure, aws, gcp the timestamp shows node is deleted after the instance is deleted.
I0309 09:23:58.764338 1 actuator.go:135] Deleting machine zhsunazure-vn5cs-worker-centralus3-t6sqz
I0309 09:23:58.764943 1 virtualmachines.go:225] deleting vm zhsunazure-vn5cs-worker-centralus3-t6sqz
I0309 09:23:58.998472 1 virtualmachines.go:242] successfully deleted vm zhsunazure-vn5cs-worker-centralus3-t6sqz
I0309 09:23:58.998498 1 disks.go:49] deleting disk zhsunazure-vn5cs-worker-centralus3-t6sqz_OSDisk
I0309 09:23:59.029511 1 disks.go:65] successfully deleted disk zhsunazure-vn5cs-worker-centralus3-t6sqz_OSDisk
I0309 09:23:59.029540 1 networkinterfaces.go:215] deleting nic zhsunazure-vn5cs-worker-centralus3-t6sqz-nic
I0309 09:24:09.226004 1 networkinterfaces.go:234] successfully deleted nic zhsunazure-vn5cs-worker-centralus3-t6sqz-nic
I0309 09:24:09.226070 1 machine_scope.go:180] zhsunazure-vn5cs-worker-centralus3-t6sqz: status unchanged
I0309 09:24:09.239968 1 controller.go:254] Deleting node "zhsunazure-vn5cs-worker-centralus3-t6sqz" for machine "zhsunazure-vn5cs-worker-centralus3-t6sqz"
I0309 09:24:09.240456 1 reflector.go:120] Starting reflector *v1.Node (10h0m0s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204
I0309 09:24:09.240491 1 reflector.go:158] Listing and watching *v1.Node from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204
I0309 09:24:09.398866 1 controller.go:268] Machine "zhsunazure-vn5cs-worker-centralus3-t6sqz" deletion successful
I0309 09:40:36.447595 1 actuator.go:338] zhsunaws-9mm2f-worker-us-east-2b-jc9d2: deleting machine
I0309 09:40:36.526295 1 actuator.go:551] zhsunaws-9mm2f-worker-us-east-2b-jc9d2: Found instance by id: i-0d202b31e33653a48
I0309 09:40:36.526321 1 actuator.go:380] zhsunaws-9mm2f-worker-us-east-2b-jc9d2: found 1 existing instances for machine
I0309 09:40:36.526332 1 utils.go:196] Cleaning up extraneous instance for machine: i-0d202b31e33653a48, state: shutting-down, launchTime: 2020-03-09 07:50:51 +0000 UTC
I0309 09:40:36.526349 1 utils.go:200] Terminating i-0d202b31e33653a48 instance
I0309 09:40:36.636576 1 controller.go:254] Deleting node "ip-10-0-151-64.us-east-2.compute.internal" for machine "zhsunaws-9mm2f-worker-us-east-2b-jc9d2"
I0309 09:40:36.636617 1 controller.go:397] Node "ip-10-0-151-64.us-east-2.compute.internal" not found
I0309 09:40:36.651471 1 controller.go:268] Machine "zhsunaws-9mm2f-worker-us-east-2b-jc9d2" deletion successful
I0309 15:08:46.989893 1 info.go:20] drained node "zhsun8-2j9sv-w-b-r2lj5.c.openshift-qe.internal"
I0309 15:08:46.989906 1 controller.go:342] drain successful for machine "zhsun8-2j9sv-w-b-r2lj5"
I0309 15:08:46.989939 1 actuator.go:118] zhsun8-2j9sv-w-b-r2lj5: Deleting machine
I0309 15:08:47.223762 1 reconciler.go:302] zhsun8-2j9sv-w-b-r2lj5: Machine does not exist
I0309 15:08:47.223788 1 reconciler.go:319] zhsun8-2j9sv-w-b-r2lj5: Machine not found during delete, skipping
I0309 15:08:47.223829 1 controller.go:228] Deleting node "zhsun8-2j9sv-w-b-r2lj5.c.openshift-qe.internal" for machine "zhsun8-2j9sv-w-b-r2lj5"
I0309 15:08:47.224343 1 reflector.go:120] Starting reflector *v1.Node (10h0m0s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204
I0309 15:08:47.224370 1 reflector.go:158] Listing and watching *v1.Node from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204
I0309 15:08:47.359139 1 controller.go:242] Machine "zhsun8-2j9sv-w-b-r2lj5" deletion successful
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.