Bug 1947372

Summary: OpenShift 4.5.8: PV disk (VMDK) deleted after deleting a machine
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Cloud Compute
Assignee: dmoiseev
Cloud Compute sub component: Other Providers
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: high
CC: ableisch, amurdaca, aos-bugs, dmoiseev, ebrizuel, gbravi, gfontana, hekumar, jcallen, jsafrane, mkrejci, rkant, sreber, suchaudh
Version: 4.5
Keywords: Reopened
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, during the machine deletion process, VMDKs created for PVs and attached to the node could be deleted along with the machine if the kubelet was unreachable, leading to unrecoverable data loss. Now, the vSphere cloud provider checks for such disks and detaches them from the VM when the kubelet is not reachable, which allows them to be reattached to a different node without losing the data on them.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-19 15:15:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1883993    
Bug Blocks: 1884643, 1947813    

Comment 4 Milind Yadav 2021-05-12 05:58:41 UTC
Validated on:
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-05-12-004740   True        False         46m     Cluster version is 4.7.0-0.nightly-2021-05-12-004740

Steps:
1. Create a PVC (a sketch of pvc.yaml follows the output below).

Expected Results:
[miyadav@miyadav vsphere]$ oc create -f pvc.yaml 
persistentvolumeclaim/pvc created
[miyadav@miyadav vsphere]$ oc get pvc 
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc    Bound    pvc-d03398c5-0ebe-427f-97fd-574b352969a8   1Gi        RWO            thin           17s
[miyadav@miyadav vsphere]$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
pvc-d03398c5-0ebe-427f-97fd-574b352969a8   1Gi        RWO            Delete           Bound    openshift-machine-api/pvc   thin                    20s
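
The pvc.yaml used above is not attached to the bug; a minimal manifest consistent with the output (name pvc, 1Gi, RWO, storage class thin, namespace openshift-machine-api inferred from the PV's CLAIM column) would look roughly like:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
  namespace: openshift-machine-api
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: thin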

2. Create a pod with the volume mounted (a sketch of podvm.yaml follows the output below).
Expected Results:
[miyadav@miyadav vsphere]$ oc get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-7db79d7945-jptqq   2/2     Running   0          54m
cluster-baremetal-operator-5bcf7cd8fb-6r87l    1/1     Running   0          54m
machine-api-controllers-6bbc87b698-82mxf       7/7     Running   0          51m
machine-api-operator-57dd9d9b96-k2cw4          2/2     Running   0          54m
[miyadav@miyadav vsphere]$ oc  create -f podvm.yaml 
deployment.apps/dep1 created
[miyadav@miyadav vsphere]$ oc get pods 
NAME                                           READY   STATUS              RESTARTS   AGE
cluster-autoscaler-operator-7db79d7945-jptqq   2/2     Running             0          55m
cluster-baremetal-operator-5bcf7cd8fb-6r87l    1/1     Running             0          55m
dep1-64495756b4-tvql7                          0/1     ContainerCreating   0          6s
machine-api-controllers-6bbc87b698-82mxf       7/7     Running             0          51m
machine-api-operator-57dd9d9b96-k2cw4          2/2     Running             0          55m
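
podvm.yaml is likewise not attached; a minimal sketch matching the deployment name dep1 and mounting the PVC above (the container image, command, and mount path are assumptions, any long-running image works):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep1
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dep1
  template:
    metadata:
      labels:
        app: dep1
    spec:
      containers:
      - name: app
        # image is an assumption; anything that stays running is fine
        image: registry.access.redhat.com/ubi8/ubi
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pvc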

3. Once the pod is Running, stop the kubelet on the node running the pod: find the node with "oc get pods -o wide", then "oc debug node/<nodename>" and "systemctl stop kubelet" (a command sketch follows below).
Expected Result:
Kubelet stopped successfully
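
As a sketch, the debug session looks like this (the node name is a placeholder; on RHCOS nodes you typically need chroot /host before systemctl is usable in the debug shell):

oc get pods -o wide                  # note the NODE column for the dep1 pod
oc debug node/<nodename>
chroot /host                         # typically required on RHCOS
systemctl stop kubelet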

4. Delete the machine backing the node (see the sketch below).
Expected Result:
Machine deleted successfully
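
For example (the machine name is a placeholder):

oc get machines -n openshift-machine-api
oc delete machine <machine-name> -n openshift-machine-api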

5. Monitor the machine controller logs (a command to follow them appears after the excerpt below).
Expected Result:
Logs:
.
.
.
I0512 05:25:58.080388       1 reconciler.go:258] miyadav-12-jvhvq-worker-7vktt: node not ready, kubelet unreachable for some reason. Detaching disks before vm destroy..
.
.
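
The excerpt above comes from the machine controller; it can be followed with something like the command below (pod name taken from the output in step 2; the container name machine-controller is an assumption):

oc logs -f -n openshift-machine-api machine-api-controllers-6bbc87b698-82mxf -c machine-controller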


6. Verify the pod moved to another node and is running successfully.
Expected and actual result:
[miyadav@miyadav vsphere]$ oc get pods -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP            NODE                            NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-7db79d7945-jptqq   2/2     Running   0          63m     10.128.0.16   miyadav-12-jvhvq-master-2       <none>           <none>
cluster-baremetal-operator-5bcf7cd8fb-6r87l    1/1     Running   0          63m     10.129.0.3    miyadav-12-jvhvq-master-0       <none>           <none>
dep1-64495756b4-6rdft                          1/1     Running   0          4m22s   10.131.0.39   miyadav-12-jvhvq-worker-zc55p   <none>           <none>
machine-api-controllers-6bbc87b698-82mxf       7/7     Running   0          60m     10.129.0.12   miyadav-12-jvhvq-master-0       <none>           <none>
machine-api-operator-57dd9d9b96-k2cw4          2/2     Running   0          63m     10.128.0.8    miyadav-12-jvhvq-master-2       <none>           <none>


Additional Info:
Moved to VERIFIED

Comment 6 errata-xmlrpc 2021-05-19 15:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550