Bug 1883993
Summary: OpenShift 4.5.8: PV disk (VMDK) deleted after deleting machine
Product: OpenShift Container Platform
Reporter: hgomes
Component: Cloud Compute
Assignee: dmoiseev
Cloud Compute sub component: Other Providers
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: high
CC: ableisch, amurdaca, ansverma, aos-bugs, dmoiseev, ebrizuel, gbravi, gfontana, hekumar, jcallen, jsafrane, mkrejci, rkant, sreber, suchaudh
Version: 4.5
Keywords: Reopened
Target Milestone: ---
Flags: jcallen: needinfo-, jcallen: needinfo-
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Previously, during the machine deletion process, VMDKs that were created for persistent volumes (PVs) and attached to the node could be deleted along with the machine if the kubelet was unreachable, leading to unrecoverable data loss. Now the vSphere cloud provider checks for such disks and detaches them from the VM if the kubelet is not reachable, so they can be reattached to a different node without losing the data on them.
Story Points: ---
Clone Of:
: 1884643
Environment:
Last Closed: 2021-07-27 22:33:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1884643, 1947372
Description
hgomes
2020-09-30 17:18:49 UTC
Kumar, I was wondering if the following use case could be implemented in the machine-api-operator in order to avoid accidental PV removal during the machine deletion process:

- Before removing a VM on vSphere, check all VMDKs attached to the VM.
- If any VMDK comes from a PV, the machine-api will try to detach that volume from the VM (before VM deletion).
- If the VMDKs are detached successfully, the machine-api-operator continues to remove the VM.
- If any VMDK cannot be detached for some reason, the machine-api fails to remove the machine with an intuitive message such as "Failed to delete the VM due to a failure to detach one or more VMDKs from Persistent Volumes: VMDK: <VMDK_NAME> - Error: <MSG FROM VMWARE>".

Do you think this is feasible? I would like to propose this use case for implementation. Do you know how to make a proposal for an enhancement in the machine-api-operator?

Thanks a lot! Appreciate that!

Regards,
Giovanni Fontana

Did some testing today with a 4.6 nightly (but these parts have not changed for quite some time):

1. When a pod that uses a volume attached to a shut-down node is deleted, Kubernetes waits forever for the kubelet to confirm the volume has been unmounted (to prevent data corruption).
2. When such a pod is deleted with force, "oc delete pod --force", the volume is detached after ~6 minutes and a new pod can start. So please delete pods with force and be patient (an API-level sketch of this force delete follows below).
3. "oc adm drain <node>" does not drain the nodes with force. It waits forever for such a pod to get deleted.

So, from the storage point of view (VMware volume plugin / attach-detach controller), the system works as designed and prevents data corruption. On the overall OCP level, something (MCO?) could force-delete pods on nodes that are confirmed to be shut down in the cloud - the MCO has / could have such knowledge, while Kubernetes/kube-controller-manager does not.
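For illustration only, here is a minimal client-go sketch of roughly what a force delete such as "oc delete pod <name> --grace-period=0 --force" amounts to at the API level: a delete request with a zero grace period. The kubeconfig path, namespace, and pod name below are placeholders; this is not code from the machine-api-operator or the MCO.

// force_delete_pod.go - illustrative sketch only; all names are placeholders.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a local kubeconfig (placeholder path).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// A zero grace period tells the API server not to wait for the kubelet
	// to confirm termination; only then can the attach/detach controller
	// eventually force-detach the volume (after its ~6 minute timeout).
	grace := int64(0)
	err = clientset.CoreV1().Pods("my-namespace").Delete(
		context.TODO(),
		"my-pod",
		metav1.DeleteOptions{GracePeriodSeconds: &grace},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("pod force-deleted; the volume should detach after the force-detach timeout")
}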
I've been thinking about this and there is not much we can do on the storage side. The Kubernetes attach/detach controller does not know what the status of the VM in vSphere is, and it protects data from corruption, i.e. it never force-detaches volumes from nodes unless either the Node object is deleted or the Pod is force-deleted (Terminating is not enough, as the kubelet must confirm the volumes are unmounted and it is not running at that time).

There is already a bug to track documentation changes ("force delete pods before deleting VMs", #1884643).

Leaving it to the MCO team to judge whether they can initiate force-deleting pods from a node before removing it from vSphere, or force-detach volumes from it. Still, users can go to the vSphere console directly and delete VMs manually, potentially losing their data. And there is nothing we can do about that.

(In reply to Jan Safranek from comment #13)
> Leaving it to the MCO team to judge whether they can initiate force-deleting
> pods from a node before removing it from vSphere, or force-detach volumes
> from it. Still, users can go to the vSphere console directly and delete VMs
> manually, potentially losing their data. And there is nothing we can do
> about that.

The MCO wouldn't force-delete a pod, as we also want to avoid any data corruption, so until kube tells us it's gone, we still loop the drain. I understand the corruption shouldn't be a problem, but the MCO won't grow the knowledge of force-deleting a pod (in the short term), and it really sounds like both the MCO and kube are doing the right thing, but vSphere isn't.

Hello,

Did you get a chance to check the above comment made by Sudarshan?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
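For illustration only, here is a rough govmomi-based sketch of the kind of detach-before-delete flow proposed in the first comment and described in the Doc Text above. This is not the actual cloud-provider or machine-api code: the vCenter URL, VM name, and the "kubevols" path filter used to recognize PV-backed disks are assumptions made for the example.

// detach_pv_disks_then_delete.go - illustrative sketch only, not the shipped fix.
package main

import (
	"context"
	"fmt"
	"net/url"
	"strings"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/vim25/types"
)

func main() {
	ctx := context.TODO()

	// Placeholder vCenter endpoint and credentials.
	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk")
	client, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	finder := find.NewFinder(client.Client, true)
	dc, err := finder.DefaultDatacenter(ctx)
	if err != nil {
		panic(err)
	}
	finder.SetDatacenter(dc)

	vm, err := finder.VirtualMachine(ctx, "ocp-worker-0") // placeholder VM name
	if err != nil {
		panic(err)
	}

	devices, err := vm.Device(ctx)
	if err != nil {
		panic(err)
	}

	// Collect disks that look like PV-backed volumes. Filtering on a
	// "kubevols" directory is an assumption about where dynamically
	// provisioned VMDKs live on the datastore.
	var pvDisks []types.BaseVirtualDevice
	for _, dev := range devices.SelectByType((*types.VirtualDisk)(nil)) {
		disk := dev.(*types.VirtualDisk)
		if backing, ok := disk.Backing.(*types.VirtualDiskFlatVer2BackingInfo); ok {
			if strings.Contains(backing.FileName, "kubevols") {
				pvDisks = append(pvDisks, disk)
			}
		}
	}

	// Detach with keepFiles=true so the VMDKs survive VM deletion and can be
	// re-attached to another node. A detach failure aborts the deletion.
	if len(pvDisks) > 0 {
		if err := vm.RemoveDevice(ctx, true, pvDisks...); err != nil {
			panic(fmt.Errorf("refusing to delete VM, detach failed: %w", err))
		}
	}

	// Only destroy the VM once the PV-backed disks are detached.
	task, err := vm.Destroy(ctx)
	if err != nil {
		panic(err)
	}
	if err := task.Wait(ctx); err != nil {
		panic(err)
	}
}

The key design point, as in the original proposal, is ordering: the VM is destroyed only after the PV-backed disks have been detached, so a detach failure stops the machine deletion instead of silently destroying the volumes.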