Bug 1993120

Summary: [release-4.6] Volumes are accidentally deleted along with the machine [vsphere]
Product: OpenShift Container Platform
Component: Cloud Compute
Sub component: Other Providers
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Assignee: Mike Fedosin <mfedosin>
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.9
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-11-03 21:01:40 UTC
Bug Depends On: 1993118

Description OpenShift BugZilla Robot 2021-08-12 12:23:23 UTC
+++ This bug was initially created as a clone of Bug #1990432 +++

Description of problem:

Pod deletion and volume detach happen asynchronously, so a pod can be deleted before its volume has been detached from the node.

When a machine is deleted, this causes a problem for vsphere-volume: if the node is deleted before the volume detach succeeds, the underlying volume is deleted together with the Machine.

Expected results: 

After a machine is deleted, its volumes should remain untouched.

Related to https://bugzilla.redhat.com/show_bug.cgi?id=1883993

Upstream issue: https://github.com/kubernetes-sigs/cluster-api/issues/4707
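
The upstream discussion points toward the obvious mitigation: before the backing VM is destroyed, wait until the kubelet reports no volumes attached to the node. A minimal sketch of that check, assuming a plain client-go client; the function name, poll interval, and timeout here are illustrative assumptions, not the actual machine-api code:

// Sketch only: wait for the node to report zero attached volumes before the
// VM is destroyed, so vSphere cannot remove an in-use .vmdk together with
// the machine. Names (waitForVolumeDetach, the 10s poll interval) are
// illustrative assumptions, not the shipped machine-api implementation.
package volumedetach

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForVolumeDetach polls the Node until Status.VolumesAttached is empty
// (or the timeout expires); only then is it safe to delete the backing VM.
func waitForVolumeDetach(ctx context.Context, c kubernetes.Interface, nodeName string, timeout time.Duration) error {
	return wait.PollImmediate(10*time.Second, timeout, func() (bool, error) {
		node, err := c.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		if n := len(node.Status.VolumesAttached); n > 0 {
			fmt.Printf("node %s still has %d attached volume(s)\n", nodeName, n)
			return false, nil
		}
		return true, nil
	})
}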

Comment 5 Milind Yadav 2021-10-21 11:47:51 UTC
[miyadav@miyadav vsphere]$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.miyadav2110.qe.devcluster.openshift.com:6443".

[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-10-20-163353   True        False         8m14s   Cluster version is 4.6.0-0.nightly-2021-10-20-163353

[miyadav@miyadav vsphere]$ oc create -f pvc.yaml 
persistentvolumeclaim/pvc4 created
[miyadav@miyadav vsphere]$ oc create -f deploymentyaml.yaml 
deployment.apps/dep1 created

[miyadav@miyadav vsphere]$ oc get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-d9546b555-fbml2   2/2     Running   1          45m   10.129.0.5    miyadav2110-bqj7k-master-0       <none>           <none>
dep1-64495756b4-lx9vv                         1/1     Running   0          8s    10.128.2.25   miyadav2110-bqj7k-worker-2mng7   <none>           <none>
machine-api-controllers-78ffbf8794-6pqsp      7/7     Running   4          42m   10.130.0.6    miyadav2110-bqj7k-master-2       <none>           <none>
machine-api-operator-794b4c65c5-jpbqn         2/2     Running   1          45m   10.129.0.2    miyadav2110-bqj7k-master-0       <none>           <none>

[miyadav@miyadav vsphere]$ oc delete machine miyadav2110-bqj7k-worker-2mng7
machine.machine.openshift.io "miyadav2110-bqj7k-worker-2mng7" deleted


[miyadav@miyadav vsphere]$ oc get machines
NAME                             PHASE      TYPE   REGION   ZONE   AGE
miyadav2110-bqj7k-master-0       Running                           49m
miyadav2110-bqj7k-master-1       Running                           49m
miyadav2110-bqj7k-master-2       Running                           49m
miyadav2110-bqj7k-worker-2mng7   Deleting                          45m
miyadav2110-bqj7k-worker-4s9hl                                     25s
miyadav2110-bqj7k-worker-qd2z5   Running                           45m


[miyadav@miyadav vsphere]$ govc datastore.ls -l 'xxxxxxxxxxxxxxxx' | grep 'miyadav2110'
12.0MB    Thu Oct 21 11:36:27 2021  miyadav2110-bqj7k-dynamic-pvc-678a815c-6771-416e-86d1-37d1610db186.vmdk

[miyadav@miyadav vsphere]$ oc get machines
NAME                             PHASE         TYPE   REGION   ZONE   AGE
miyadav2110-bqj7k-master-0       Running                              52m
miyadav2110-bqj7k-master-1       Running                              52m
miyadav2110-bqj7k-master-2       Running                              52m
miyadav2110-bqj7k-worker-4s9hl   Provisioned                          3m4s
miyadav2110-bqj7k-worker-qd2z5   Running                              48m

[miyadav@miyadav vsphere]$ oc get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE     IP            NODE                             NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-d9546b555-fbml2   2/2     Running   1          55m     10.129.0.5    miyadav2110-bqj7k-master-0       <none>           <none>
dep1-64495756b4-mll4f                         1/1     Running   0          6m15s   10.131.0.20   miyadav2110-bqj7k-worker-qd2z5   <none>           <none>
machine-api-controllers-78ffbf8794-6pqsp      7/7     Running   4          51m     10.130.0.6    miyadav2110-bqj7k-master-2       <none>           <none>
machine-api-operator-794b4c65c5-jpbqn         2/2     Running   1          55m     10.129.0.2    miyadav2110-bqj7k-master-0       <none>           <none>
[miyadav@miyadav vsphere]$ oc get machines
NAME                             PHASE     TYPE   REGION   ZONE   AGE
miyadav2110-bqj7k-master-0       Running                          55m
miyadav2110-bqj7k-master-1       Running                          55m
miyadav2110-bqj7k-master-2       Running                          55m
miyadav2110-bqj7k-worker-4s9hl   Running                          6m36s
miyadav2110-bqj7k-worker-qd2z5   Running                          52m
[miyadav@miyadav vsphere]$ 


Moved to PASSED
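
For reference, part of this manual check can be expressed programmatically. A minimal client-go sketch, assuming the claim is the "pvc4" PVC created above in the current project; the disk-level proof remains the govc datastore.ls output shown earlier:

// Sketch only: after the Machine is deleted and the pod is rescheduled, the
// PVC should still be Bound and its PV object should still exist. Namespace
// and claim name are assumptions taken from the transcript above; the
// authoritative check of the backing .vmdk is still `govc datastore.ls`.
package verify

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func pvcSurvivedMachineDeletion(ctx context.Context, c kubernetes.Interface, namespace, claim string) error {
	pvc, err := c.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, claim, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if pvc.Status.Phase != corev1.ClaimBound {
		return fmt.Errorf("PVC %s/%s is %s, want Bound", namespace, claim, pvc.Status.Phase)
	}
	// The bound PV (and, behind it, the vSphere .vmdk) must still be present.
	if _, err := c.CoreV1().PersistentVolumes().Get(ctx, pvc.Spec.VolumeName, metav1.GetOptions{}); err != nil {
		return fmt.Errorf("backing PV %q is gone: %w", pvc.Spec.VolumeName, err)
	}
	return nil
}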

Comment 8 errata-xmlrpc 2021-11-03 21:01:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.49 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4009