1947372 – Openshift 4.5.8 Deleting pv disk vmdk after delete machine

Summary: Openshift 4.5.8 Deleting pv disk vmdk after delete machine

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	4.7.z
Assignee:	dmoiseev
QA Contact:	Milind Yadav
Docs Contact:
URL:
Whiteboard:
Depends On:	1883993
Blocks:	1884643 1947813

Reported:	2021-04-08 10:24 UTC by OpenShift BugZilla Robot
Modified:	2024-10-01 17:52 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, during machine deletion, VMDKs created for PVs and attached to the node could be deleted along with the machine if the kubelet was unreachable, which led to unrecoverable data loss. Now the vSphere cloud provider checks for these disks and detaches them from the VM if the kubelet is not reachable, so they can be reattached to a different node without losing the data on them.
Clone Of:
Environment:
Last Closed:	2021-05-19 15:15:46 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-api-operator pull 841	0	None	open	[release-4.7] Bug 1947372: vSphere, detach virtual disks before virtual machine destroy if node not available	2021-04-08 10:25:25 UTC
Red Hat Product Errata	RHBA-2021:1550	0	None	None	None	2021-05-19 15:16:14 UTC

Comment 4 Milind Yadav 2021-05-12 05:58:41 UTC

Validated on - 
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-05-12-004740   True        False         46m     Cluster version is 4.7.0-0.nightly-2021-05-12-004740

Steps :
1. Create a pvc 

Expected Results :
[miyadav@miyadav vsphere]$ oc create -f pvc.yaml 
persistentvolumeclaim/pvc created
[miyadav@miyadav vsphere]$ oc get pvc 
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc    Bound    pvc-d03398c5-0ebe-427f-97fd-574b352969a8   1Gi        RWO            thin           17s
[miyadav@miyadav vsphere]$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
pvc-d03398c5-0ebe-427f-97fd-574b352969a8   1Gi        RWO            Delete           Bound    openshift-machine-api/pvc   thin                    20s

2. Create a pod with a volume mount
Expected Results :
[miyadav@miyadav vsphere]$ oc get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-7db79d7945-jptqq   2/2     Running   0          54m
cluster-baremetal-operator-5bcf7cd8fb-6r87l    1/1     Running   0          54m
machine-api-controllers-6bbc87b698-82mxf       7/7     Running   0          51m
machine-api-operator-57dd9d9b96-k2cw4          2/2     Running   0          54m
[miyadav@miyadav vsphere]$ oc  create -f podvm.yaml 
deployment.apps/dep1 created
[miyadav@miyadav vsphere]$ oc get pods 
NAME                                           READY   STATUS              RESTARTS   AGE
cluster-autoscaler-operator-7db79d7945-jptqq   2/2     Running             0          55m
cluster-baremetal-operator-5bcf7cd8fb-6r87l    1/1     Running             0          55m
dep1-64495756b4-tvql7                          0/1     ContainerCreating   0          6s
machine-api-controllers-6bbc87b698-82mxf       7/7     Running             0          51m
machine-api-operator-57dd9d9b96-k2cw4          2/2     Running             0          55m

3. Once the pod is Running, stop the kubelet on the node running the pod (oc debug node/<nodename>, then systemctl stop kubelet). The node running the pod can be found with oc get pod -o wide.
Expected Result:
Kubelet stopped successfully 

4. Delete the machine backing the node
Expected Result :
Machine deleted successfully 

5. Monitor machine controller logs
Expected Result :
Logs : 
.
.
.
I0512 05:25:58.080388       1 reconciler.go:258] miyadav-12-jvhvq-worker-7vktt: node not ready, kubelet unreachable for some reason. Detaching disks before vm destroy..
.
.
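
The log line above reflects an ordering decision made during machine deletion: when the node is not ready, disks are detached before the VM is destroyed. A minimal Go sketch of that ordering follows. The vm type, the deleteMachine function, and the disk names are hypothetical and are not taken from machine-api-operator. It also assumes that for a ready node the volumes are released by the normal drain path, so no explicit detach is recorded.

```go
package main

import "fmt"

// vm is a hypothetical stand-in for a vSphere virtual machine.
type vm struct {
	Name    string
	PVDisks []string // VMDKs that back persistent volumes
}

// deleteMachine returns the ordered actions taken for a machine deletion.
// Assumption: a node that is not ready means the kubelet is unreachable,
// so the PV disks are detached explicitly before the VM is destroyed.
func deleteMachine(v *vm, nodeReady bool) []string {
	var actions []string
	if !nodeReady {
		for _, d := range v.PVDisks {
			actions = append(actions, "detach "+d)
		}
		v.PVDisks = nil // nothing PV-related is left attached
	}
	// Destroy always comes last, after any detach.
	actions = append(actions, "destroy "+v.Name)
	return actions
}

func main() {
	v := &vm{Name: "worker-a", PVDisks: []string{"pv-1.vmdk"}}
	for _, a := range deleteMachine(v, false) {
		fmt.Println(a)
	}
}
```

Detaching first matters because destroying a VM removes the disks still attached to it, which is the data loss this bug describes.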


6. Pod moved to another node and running successfully
Expected and actual result :
[miyadav@miyadav vsphere]$ oc get pods -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP            NODE                            NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-7db79d7945-jptqq   2/2     Running   0          63m     10.128.0.16   miyadav-12-jvhvq-master-2       <none>           <none>
cluster-baremetal-operator-5bcf7cd8fb-6r87l    1/1     Running   0          63m     10.129.0.3    miyadav-12-jvhvq-master-0       <none>           <none>
dep1-64495756b4-6rdft                          1/1     Running   0          4m22s   10.131.0.39   miyadav-12-jvhvq-worker-zc55p   <none>           <none>
machine-api-controllers-6bbc87b698-82mxf       7/7     Running   0          60m     10.129.0.12   miyadav-12-jvhvq-master-0       <none>           <none>
machine-api-operator-57dd9d9b96-k2cw4          2/2     Running   0          63m     10.128.0.8    miyadav-12-jvhvq-master-2       <none>           <none>


Additional Info:
Moved to VERIFIED

Comment 6 errata-xmlrpc 2021-05-19 15:15:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550
