Bug 2149654 - [4.12] VMSnapshot and WaitForFirstConsumer storage: VMRestore is not Complete
Summary: [4.12] VMSnapshot and WaitForFirstConsumer storage: VMRestore is not Complete
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.12.3
Assignee: skagan
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks: 2172612
 
Reported: 2022-11-30 14:14 UTC by Jenia Peimer
Modified: 2023-05-23 22:31 UTC
CC List: 4 users

Fixed In Version: CNV-v4.12.3-26
Doc Type: Known Issue
Doc Text:
* When restoring a VM snapshot for storage whose binding mode is `WaitForFirstConsumer`, the restored PVCs remain in the `Pending` state and the restore operation does not progress.
** As a workaround, start the restored VM, stop it, and then start it again. The VM will be scheduled, the PVCs will be in the `Bound` state, and the restore operation will complete.
Clone Of:
: 2172612 (view as bug list)
Environment:
Last Closed: 2023-05-23 22:31:22 UTC
Target Upstream Version:
Embargoed:




Links:
  Github kubevirt/kubevirt pull 9416 (Merged): [release-0.58] Fix vmrestore with WFFC snapshotable storage class (last updated 2023-03-20 13:05:46 UTC)
  Red Hat Issue Tracker CNV-23028 (last updated 2022-11-30 14:35:59 UTC)
  Red Hat Product Errata RHEA-2023:3283 (last updated 2023-05-23 22:31:39 UTC)

Description Jenia Peimer 2022-11-30 14:14:39 UTC
Description of problem:
VMRestore doesn't reach the Complete state:
the restore DV stays in WaitForFirstConsumer,
the restore PVC is Pending,
the restore VM is Stopped and not Ready (see the check commands below).
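
For quick triage, a minimal sketch of how these states can be checked (the resource names restore-my-vm and vm-restored are the ones used later in this report; adjust as needed):

   $ oc get vmrestore restore-my-vm   # COMPLETE stays false
   $ oc get dv                        # restore DV phase: WaitForFirstConsumer
   $ oc get pvc                       # restore PVC: Pending
   $ oc get vm vm-restored            # restored VM: Stopped / not Ready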

Version-Release number of selected component (if applicable):
4.12

How reproducible:
Always, on an SNO cluster with snapshot-capable storage that uses the WaitForFirstConsumer volumeBindingMode (TopoLVM storage in our case: odf-lvm-vg1)

Steps to Reproduce:
1. Create a VM - VM is Running
2. Create a VMSnapshot - VMSnapshot is ReadyToUse
3. Create a VMRestore (a command sketch follows these steps)
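
A hedged command sketch for these steps, assuming the manifests listed under Additional info below (vm.yaml, snap.yaml, vmrestore.yaml, as used in this report):

   $ oc apply -f vm.yaml              # step 1: source VM, should reach Running
   $ oc apply -f snap.yaml            # step 2: VMSnapshot
   $ oc get vmsnapshot my-vmsnapshot  # wait until it reports ReadyToUse
   $ oc apply -f vmrestore.yaml       # step 3: VMRestore targeting vm-restored
   $ oc get vmrestore restore-my-vm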

Actual results:
VMRestore is not Complete

   $ oc get vmrestore
   NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
   restore-my-vm   VirtualMachine   vm-restored   false  

Expected results:
VMRestore is Complete (PVC Bound, DV Succeeded and garbage collected)

Workaround and ONE MORE ISSUE (a virtctl sketch follows step 6):
1. Start the restored VM
2. See the VM is Ready and Running, DV succeeded, PVC Bound
3. See the VMRestore is still not Complete:

   $ oc get vmrestore
   NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
   restore-my-vm   VirtualMachine   vm-restored   false  

   $ oc describe vmrestore restore-my-vm | grep Events -A 10
   Events:
     Type     Reason                      Age                    From                Message
  ----     ------                      ----                   ----                -------
   Warning  VirtualMachineRestoreError  4m4s (x23 over 4m21s)  restore-controller  VirtualMachineRestore encountered error invalid RunStrategy "Always"

4. See the restored VM runStrategy:
   $ oc get vm vm-restored -oyaml | grep running
      running: true


***
PLEASE NOTE that on a multi-node cluster with OCS storage (Immediate volumeBindingMode), the restored VM gets "running: false" even though the source VM had it set to "true"; in that case we do not get the above error and the VMRestore becomes Complete:
   $ oc get vm vm-restored-ocs -oyaml | grep running
      running: false
***


5. Stop the restored VM
6. See the VMRestore is Complete:
   $ oc get vmrestore
   NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
   restore-my-vm   VirtualMachine   vm-restored   true       1s            
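
A sketch of the workaround above using virtctl (patching spec.running would work as well; this is illustrative, not an official procedure):

   $ virtctl start vm-restored        # workaround step 1: start the restored VM
   $ oc get vm vm-restored            # step 2: VM becomes Ready, DV succeeds, PVC binds
   $ virtctl stop vm-restored         # step 5: stop the restored VM
   $ oc get vmrestore restore-my-vm   # step 6: COMPLETE should now be true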


Additional info:

VM yaml: 

$ cat vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-cirros-source
  labels:
    kubevirt.io/vm: vm-cirros-source
spec:
  dataVolumeTemplates:
  - metadata:
      name: cirros-dv-source
    spec:
      storage:
        resources:
          requests:
            storage: 1Gi
        storageClassName: odf-lvm-vg1
      source:
        http:
          url: <cirros-0.4.0-x86_64-disk.qcow2>
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros-source
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumev
        machine:
          type: ""
        resources:
          requests:
            memory: 100M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv-source
        name: datavolumev


VMSnapshot yaml:

$ cat snap.yaml 
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: my-vmsnapshot 
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-cirros-source


VMRestore yaml:

$ cat vmrestore.yaml 
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-my-vm
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-restored
  virtualMachineSnapshotName: my-vmsnapshot

Comment 2 Jenia Peimer 2023-02-19 13:25:23 UTC
Just to keep the info in this BZ: this bug was discussed at the KubeVirt SIG-Storage meeting, and the current approach to fixing it is to mark the VMRestore Complete when the DV is WFFC and the PVC is Pending.
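
Assuming that approach, the post-fix behavior could be spot-checked with something like the following (status.complete is the field behind the COMPLETE column; sketch only):

   $ oc get vmrestore restore-my-vm -o jsonpath='{.status.complete}'   # expect: true
   $ oc get dv    # restore DV may still report WaitForFirstConsumer
   $ oc get pvc   # restore PVC may still be Pending until the VM is started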

Comment 4 Jenia Peimer 2023-04-19 13:58:33 UTC
Verified on an SNO cluster with TopoLVM with WFFC

Comment 10 errata-xmlrpc 2023-05-23 22:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.12.3 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:3283

