Bug 1800316

Summary:	Machine controller must wait for node to be fully terminated before deleting Kube Node object
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Cloud Compute	Assignee:	Alexander Demicev <ademicev>
Cloud Compute sub component:	Other Providers	QA Contact:	Jianwei Hou <jhou>
Status:	CLOSED WONTFIX	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	agarcial, jhou, vlaad
Version:	4.2.z
Target Milestone:	---
Target Release:	4.2.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1800315
Clones:	1800317		Environment:
Last Closed:	2020-07-27 13:45:15 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1800314, 1800315
Bug Blocks:	1800317

Description Clayton Coleman 2020-02-06 20:55:25 UTC

+++ This bug was initially created as a clone of Bug #1800315 +++

+++ This bug was initially created as a clone of Bug #1800314 +++

Deleting the Node object in Kubernetes signals to the cluster that no pod on that node is running, and that it is safe to release any storage or process locks that prevent two processes with the same name or volumes from running on different nodes.  The machine controller was deleting the Node object before the machine was fully terminated, so a stateful set controller could launch two pods with the same name on the cluster at the same time, which violates our cluster safety guarantees.

The fix is to wait until the cloud provider confirms that the machine is shut down before deleting the Node object.
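
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the machine controller's actual code: the `CloudProvider` and `NodeDeleter` interfaces, the `InstanceState` values, the `DeleteNodeAfterTermination` function, and the in-memory fakes are all hypothetical names introduced here. The only point it tries to show is that the Node object is deleted solely after the provider reports the instance as terminated, and is left in place otherwise.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// InstanceState is a hypothetical, simplified view of what a cloud provider reports.
type InstanceState string

const (
	StateRunning      InstanceState = "running"
	StateShuttingDown InstanceState = "shutting-down"
	StateTerminated   InstanceState = "terminated"
)

// CloudProvider is a hypothetical interface standing in for a provider's instance API.
type CloudProvider interface {
	InstanceState(machineName string) (InstanceState, error)
}

// NodeDeleter is a hypothetical interface standing in for the Kubernetes Node API.
type NodeDeleter interface {
	DeleteNode(nodeName string) error
}

// ErrNotTerminated means the provider never confirmed termination, so the Node was kept.
var ErrNotTerminated = errors.New("instance not confirmed terminated; Node object kept")

// DeleteNodeAfterTermination polls the provider up to `attempts` times and deletes
// the Node only once the instance is reported as terminated.
func DeleteNodeAfterTermination(cloud CloudProvider, nodes NodeDeleter,
	machineName, nodeName string, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		state, err := cloud.InstanceState(machineName)
		if err != nil {
			// Unknown state is treated as "possibly still running": do not delete.
			return fmt.Errorf("querying state of %q: %w", machineName, err)
		}
		if state == StateTerminated {
			return nodes.DeleteNode(nodeName)
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return ErrNotTerminated
}

// fakeCloud replays a fixed sequence of states, repeating the last one.
type fakeCloud struct {
	states []InstanceState
	calls  int
}

func (f *fakeCloud) InstanceState(string) (InstanceState, error) {
	i := f.calls
	if i >= len(f.states) {
		i = len(f.states) - 1
	}
	f.calls++
	return f.states[i], nil
}

// fakeNodes records which Node names were deleted.
type fakeNodes struct{ deleted []string }

func (f *fakeNodes) DeleteNode(name string) error {
	f.deleted = append(f.deleted, name)
	return nil
}

func main() {
	cloud := &fakeCloud{states: []InstanceState{StateRunning, StateShuttingDown, StateTerminated}}
	nodes := &fakeNodes{}
	err := DeleteNodeAfterTermination(cloud, nodes, "machine-a", "node-a", 5, time.Millisecond)
	fmt.Println("error:", err, "deleted nodes:", nodes.deleted)
}
```

A real controller would more likely requeue the machine for a later reconcile than block in a polling loop; the loop is used here only to keep the sketch self-contained.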