Bug 1800316

Summary: Machine controller must wait for node to be fully terminated before deleting Kube Node object
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: Cloud Compute Assignee: Alexander Demicev <ademicev>
Cloud Compute sub component: Other Providers QA Contact: Jianwei Hou <jhou>
Status: CLOSED WONTFIX Docs Contact:
Severity: high    
Priority: unspecified CC: agarcial, jhou, vlaad
Version: 4.2.z   
Target Milestone: ---   
Target Release: 4.2.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1800315
: 1800317 Environment:
Last Closed: 2020-07-27 13:45:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1800314, 1800315    
Bug Blocks: 1800317    

Description Clayton Coleman 2020-02-06 20:55:25 UTC
+++ This bug was initially created as a clone of Bug #1800315 +++

+++ This bug was initially created as a clone of Bug #1800314 +++

Deleting the Node object in Kubernetes signals to the cluster that no pod on that node is running and that it is safe to release any storage or process locks that prevent two processes with the same name or volumes from running on different nodes.  The machine controller was deleting the Node object before the machine was fully terminated, so a stateful set controller could launch a second pod with the same name while the first was still running on the cluster, which violates our cluster safety guarantees.

The fix is to wait until the cloud provider confirms the machine is shut down before deleting the Node object.