Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1800317

Summary:	Machine controller must wait for node to be fully terminated before deleting Kube Node object
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Cloud Compute	Assignee:	Alexander Demicev <ademicev>
Cloud Compute sub component:	Other Providers	QA Contact:	Jianwei Hou <jhou>
Status:	CLOSED WONTFIX	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	agarcial, jhou, mgugino, wking
Version:	4.1.z
Target Milestone:	---
Target Release:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1800316	Environment:
Last Closed:	2020-05-19 03:13:15 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1800314, 1800315, 1800316
Bug Blocks:

Description Clayton Coleman 2020-02-06 20:56:49 UTC

+++ This bug was initially created as a clone of Bug #1800316 +++

+++ This bug was initially created as a clone of Bug #1800315 +++

+++ This bug was initially created as a clone of Bug #1800314 +++

Deleting the Node object in Kubernetes signals to the cluster that no pod on that node is running, and that it is safe to release any storage or process locks that ensure two processes cannot run on different nodes with the same name or the same volumes. The machine controller was deleting the Node object before the machine was fully terminated. As a result, a StatefulSet controller could launch a second pod with the same name while the original pod was still running on the old machine, which violates our cluster safety guarantees.

The fix is to wait until the cloud provider confirms the machine is shut down before deleting the Node object.

Backport to 4.1 is justified because this can cause data loss or break stateful applications.

Comment 1 Michael Gugino 2020-05-19 03:59:33 UTC

This is fixed in 4.3 and 4.4.