Bug 1800317 - Machine controller must wait for node to be fully terminated before deleting Kube Node object
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.z
Assignee: Alexander Demicev
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On: 1800314 1800315 1800316
Blocks:
Reported: 2020-02-06 20:56 UTC by Clayton Coleman
Modified: 2020-05-19 03:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1800316
Environment:
Last Closed: 2020-05-19 03:13:15 UTC
Target Upstream Version:



Description Clayton Coleman 2020-02-06 20:56:49 UTC
+++ This bug was initially created as a clone of Bug #1800316 +++

+++ This bug was initially created as a clone of Bug #1800315 +++

+++ This bug was initially created as a clone of Bug #1800314 +++

Deleting the Node object in Kubernetes signals to the cluster that no pod on that node is still running, and that it is safe to release any storage or process locks that prevent two processes with the same name or volumes from running on different nodes at once.  The machine controller was deleting the Node object before the machine was fully terminated, so a StatefulSet controller could launch a second pod with the same name while the first was still running on the cluster, which violates our cluster safety guarantees.

The fix is to wait until the cloud provider confirms that the machine is shut down before deleting the Node object.

Backport to 4.1 is justified because this can cause data loss or break stateful applications.

Comment 1 Michael Gugino 2020-05-19 03:59:33 UTC
This is fixed in 4.3 and 4.4.

