+++ This bug was initially created as a clone of Bug #1800314 +++ Deleting the Node object in Kubernetes signals to the cluster that no pod on that node is running and that it is safe to release any storage or process locks that ensure two processes can't be running on different nodes with the same name or volumes. The machine controller was deleting the node before the machine was fully terminated, which means that a stateful set controller would be able to launch two pods with the same name running on the cluster at the same time, which violates our cluster safety guarantees. Fix is to wait for machine is confirmed shut down by cloud provider before deleting the Node object.
Verified Tested on Azure, aws, gcp the timestamp shows node is deleted after the instance is deleted. clusterversion: 4.3.5 Azure I0309 09:23:58.764338 1 actuator.go:135] Deleting machine zhsunazure-vn5cs-worker-centralus3-t6sqz I0309 09:23:58.764943 1 virtualmachines.go:225] deleting vm zhsunazure-vn5cs-worker-centralus3-t6sqz I0309 09:23:58.998472 1 virtualmachines.go:242] successfully deleted vm zhsunazure-vn5cs-worker-centralus3-t6sqz I0309 09:23:58.998498 1 disks.go:49] deleting disk zhsunazure-vn5cs-worker-centralus3-t6sqz_OSDisk I0309 09:23:59.029511 1 disks.go:65] successfully deleted disk zhsunazure-vn5cs-worker-centralus3-t6sqz_OSDisk I0309 09:23:59.029540 1 networkinterfaces.go:215] deleting nic zhsunazure-vn5cs-worker-centralus3-t6sqz-nic I0309 09:24:09.226004 1 networkinterfaces.go:234] successfully deleted nic zhsunazure-vn5cs-worker-centralus3-t6sqz-nic I0309 09:24:09.226070 1 machine_scope.go:180] zhsunazure-vn5cs-worker-centralus3-t6sqz: status unchanged I0309 09:24:09.239968 1 controller.go:254] Deleting node "zhsunazure-vn5cs-worker-centralus3-t6sqz" for machine "zhsunazure-vn5cs-worker-centralus3-t6sqz" I0309 09:24:09.240456 1 reflector.go:120] Starting reflector *v1.Node (10h0m0s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204 I0309 09:24:09.240491 1 reflector.go:158] Listing and watching *v1.Node from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204 I0309 09:24:09.398866 1 controller.go:268] Machine "zhsunazure-vn5cs-worker-centralus3-t6sqz" deletion successful AWS I0309 09:40:36.447595 1 actuator.go:338] zhsunaws-9mm2f-worker-us-east-2b-jc9d2: deleting machine I0309 09:40:36.526295 1 actuator.go:551] zhsunaws-9mm2f-worker-us-east-2b-jc9d2: Found instance by id: i-0d202b31e33653a48 I0309 09:40:36.526321 1 actuator.go:380] zhsunaws-9mm2f-worker-us-east-2b-jc9d2: found 1 existing instances for machine I0309 09:40:36.526332 1 utils.go:196] Cleaning up extraneous instance for machine: i-0d202b31e33653a48, state: shutting-down, launchTime: 2020-03-09 07:50:51 +0000 UTC I0309 09:40:36.526349 1 utils.go:200] Terminating i-0d202b31e33653a48 instance I0309 09:40:36.636576 1 controller.go:254] Deleting node "ip-10-0-151-64.us-east-2.compute.internal" for machine "zhsunaws-9mm2f-worker-us-east-2b-jc9d2" I0309 09:40:36.636617 1 controller.go:397] Node "ip-10-0-151-64.us-east-2.compute.internal" not found I0309 09:40:36.651471 1 controller.go:268] Machine "zhsunaws-9mm2f-worker-us-east-2b-jc9d2" deletion successful gcp I0309 15:08:46.989893 1 info.go:20] drained node "zhsun8-2j9sv-w-b-r2lj5.c.openshift-qe.internal" I0309 15:08:46.989906 1 controller.go:342] drain successful for machine "zhsun8-2j9sv-w-b-r2lj5" I0309 15:08:46.989939 1 actuator.go:118] zhsun8-2j9sv-w-b-r2lj5: Deleting machine I0309 15:08:47.223762 1 reconciler.go:302] zhsun8-2j9sv-w-b-r2lj5: Machine does not exist I0309 15:08:47.223788 1 reconciler.go:319] zhsun8-2j9sv-w-b-r2lj5: Machine not found during delete, skipping I0309 15:08:47.223829 1 controller.go:228] Deleting node "zhsun8-2j9sv-w-b-r2lj5.c.openshift-qe.internal" for machine "zhsun8-2j9sv-w-b-r2lj5" I0309 15:08:47.224343 1 reflector.go:120] Starting reflector *v1.Node (10h0m0s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204 I0309 15:08:47.224370 1 reflector.go:158] Listing and watching *v1.Node from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204 I0309 15:08:47.359139 1 controller.go:242] Machine "zhsun8-2j9sv-w-b-r2lj5" deletion successful
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0858