Bug 2104479

Summary: [4.12] Cloned VM's snapshot restore fails if the source VM disk is deleted
Product: Container Native Virtualization (CNV) Reporter: nijin ashok <nashok>
Component: Storage Assignee: skagan
Status: CLOSED ERRATA QA Contact: dalia <dafrank>
Severity: high Docs Contact:
Priority: high    
Version: 4.10.2 CC: alitke, apinnick, dafrank, mrashish, sjhala, yadu
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: CNV v4.12.0-360 Doc Type: Known Issue
Doc Text:
The restore process of a cloned virtual machine (VM) snapshot fails if the source VM is deleted. The DataVolumeTemplate value in the VirtualMachineSnapshotContent CR still refers to the persistent volume claim (PVC) of the source VM, which triggers the creation of a clone data volume from a nonexistent source. That data volume then stays stuck waiting for the source PVC to be ready. Instead, the data volume should be created and completed right away, because the target PVC for the clone is already populated (from the volume snapshot). As a workaround, edit the DataVolumeTemplate value of the VirtualMachineSnapshotContent CR to refer to an existing PVC. The restore then skips the wait for the nonexistent PVC, detects that the target PVC is already populated, and the data volume completes successfully.
Story Points: ---
Clone Of:
: 2109406 2109407 Environment:
Last Closed: 2023-01-24 13:37:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2109406, 2109407    

Description nijin ashok 2022-07-06 11:39:54 UTC
Description of problem:

[1] Created a new VM, which has the PVC below:


rhel8-tory-koala                                                       Bound    pvc-6bafb6a3-e418-4860-91b2-f80af872a11f   30Gi          RWX            ocs-external-storagecluster-ceph-rbd   24s

[2] Created a clone from this VM:

rhel8-tory-koala-clone-rhel8-tory-koala-1m97p                          Bound    pvc-90e296b3-20ba-40d4-b0b7-5d9769ae2657   30Gi          RWX            ocs-external-storagecluster-ceph-rbd   4s

[3] Created a snapshot on the cloned VM:


NAME                                                                      READYTOUSE   SOURCEPVC                                       SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                                     SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vmsnapshot-70be6e17-b80d-46fb-a544-c52f6ee76a61-volume-rhel8-tory-koala   true         rhel8-tory-koala-clone-rhel8-tory-koala-1m97p                           30Gi          ocs-external-storagecluster-rbdplugin-snapclass   snapcontent-4ce6eab1-62b0-42f1-bd72-f4a0a2d63dae   19s            20s


[4] Deleted the VM and disk in [1].

[5] Tried to restore the snapshot. Restoration failed with the error below:

~~~
Error creating DataVolume restore-558e00db-857b-415d-8f37-bd7369194419-rhel8-tory-koala: admission webhook "datavolume-validate.cdi.kubevirt.io" denied the request: Source PVC default/rhel8-tory-koala not found
~~~

The restore looks for the PVC from [1] instead of the PVC of the cloned VM.

Version-Release number of selected component (if applicable):

kubevirt-hyperconverged-operator.v4.10.2   

How reproducible:
100 %

Steps to Reproduce:

Please refer to steps [1] through [5] above.

Actual results:

Cloned VM's snapshot restore fails if the source VM disk is deleted.

Expected results:

Snapshot restore should work.

Additional info:

Comment 1 skagan 2022-07-14 14:17:39 UTC
Hi @nashok, I would really appreciate the YAMLs of the original VM and of the cloned VM, along with an explanation of how the VM clone was done in this case. Thanks.

Comment 2 nijin ashok 2022-07-18 12:58:59 UTC
Attaching the yamls of VMs.

The VM clone was done from the OpenShift console using the "clone" option.

It looks like the issue occurs because the VirtualMachineSnapshotContent of the cloned VM refers to the source VM's PVC instead of the cloned PVC.

~~~
yq -y '.spec.source.virtualMachine.spec.dataVolumeTemplates' /tmp/vmsnapshot-content-53237ca8-7ca8-4894-ab85-0ba132a968e0.yaml
- metadata:
    creationTimestamp: null
    name: rhel8-resident-heron-clone-rhel8-resident-heron-2vald
  spec:
    source:
      pvc:
        name: rhel8-resident-heron <<<<
        namespace: nijin-cnv
    storage:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 30Gi
      storageClassName: ocs-external-storagecluster-ceph-rbd
      volumeMode: Block
~~~
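
To make the stale reference easier to spot, here is a minimal Python sketch that scans a VirtualMachineSnapshotContent (already loaded as a dict) for dataVolumeTemplates whose source PVC no longer exists. The function name `stale_clone_sources`, the `(namespace, name)` set of remaining PVCs, and the trimmed sample data are assumptions for illustration only; this is not part of KubeVirt or CDI, and it does not query the cluster.

```python
from typing import Any, Dict, List, Set, Tuple


def stale_clone_sources(
    content: Dict[str, Any], existing_pvcs: Set[Tuple[str, str]]
) -> List[str]:
    """Return dataVolumeTemplate names whose source PVC is not in existing_pvcs."""
    vm_spec = (
        content.get("spec", {})
        .get("source", {})
        .get("virtualMachine", {})
        .get("spec", {})
    )
    stale = []
    for template in vm_spec.get("dataVolumeTemplates") or []:
        pvc = template.get("spec", {}).get("source", {}).get("pvc")
        if pvc is None:
            continue  # not a PVC clone source, nothing to check
        if (pvc.get("namespace"), pvc.get("name")) not in existing_pvcs:
            stale.append(template["metadata"]["name"])
    return stale


# Trimmed copy of the structure shown in the yq output (storage section omitted).
snapshot_content = {
    "spec": {"source": {"virtualMachine": {"spec": {"dataVolumeTemplates": [
        {
            "metadata": {
                "name": "rhel8-resident-heron-clone-rhel8-resident-heron-2vald"
            },
            "spec": {"source": {"pvc": {
                "name": "rhel8-resident-heron",
                "namespace": "nijin-cnv",
            }}},
        }
    ]}}}}
}

# Assumption: the source VM disk is gone and only the cloned PVC remains.
remaining = {
    ("nijin-cnv", "rhel8-resident-heron-clone-rhel8-resident-heron-2vald")
}
print(stale_clone_sources(snapshot_content, remaining))
```

In practice the set of remaining PVCs would come from the cluster (for example, the PVC listing for the namespace); here it is hard-coded to keep the sketch self-contained.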

The restore works if I manually edit this and change it to the cloned PVC.
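
The manual edit described above can be sketched in Python as a pure transformation on the VirtualMachineSnapshotContent loaded as a dict. The helper name `retarget_clone_source` and the sample data are hypothetical. The sketch also assumes that the cloned PVC carries the same name as the dataVolumeTemplate, as the PVC listings in this report suggest. Writing the result back to the cluster is deliberately left out.

```python
import copy
from typing import Any, Dict


def retarget_clone_source(
    content: Dict[str, Any], missing_pvc: str, replacement_pvc: str
) -> Dict[str, Any]:
    """Return a copy of content where templates cloning from missing_pvc
    point at replacement_pvc instead. The input dict is left untouched."""
    patched = copy.deepcopy(content)
    vm_spec = (
        patched.get("spec", {})
        .get("source", {})
        .get("virtualMachine", {})
        .get("spec", {})
    )
    for template in vm_spec.get("dataVolumeTemplates") or []:
        pvc = template.get("spec", {}).get("source", {}).get("pvc")
        if pvc is not None and pvc.get("name") == missing_pvc:
            pvc["name"] = replacement_pvc
    return patched


# Trimmed sample mirroring the dataVolumeTemplates entry from this report.
original = {
    "spec": {"source": {"virtualMachine": {"spec": {"dataVolumeTemplates": [
        {
            "metadata": {
                "name": "rhel8-resident-heron-clone-rhel8-resident-heron-2vald"
            },
            "spec": {"source": {"pvc": {
                "name": "rhel8-resident-heron",
                "namespace": "nijin-cnv",
            }}},
        }
    ]}}}}
}

# Assumption: the cloned PVC has the same name as the dataVolumeTemplate.
fixed = retarget_clone_source(
    original,
    missing_pvc="rhel8-resident-heron",
    replacement_pvc="rhel8-resident-heron-clone-rhel8-resident-heron-2vald",
)
template = fixed["spec"]["source"]["virtualMachine"]["spec"]["dataVolumeTemplates"][0]
print(template["spec"]["source"]["pvc"]["name"])
```

Returning a patched copy rather than mutating the input keeps the original content available for comparison before anything is applied.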

Comment 4 Shikha Jhala 2022-07-27 20:10:48 UTC
Added known issue to 4.11 release notes. @dafrank Please review: https://github.com/openshift/openshift-docs/pull/48328. Thank you.

Comment 6 dalia 2022-11-03 11:18:38 UTC
Verification is blocked by this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2139738

Comment 7 dalia 2022-11-24 13:01:45 UTC
Verified on CNV 4.12.

Comment 11 errata-xmlrpc 2023-01-24 13:37:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408