Bug 2172612 - [4.13] VMSnapshot and WaitForFirstConsumer storage: VMRestore is not Complete
Summary: [4.13] VMSnapshot and WaitForFirstConsumer storage: VMRestore is not Complete
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.13.0
Assignee: skagan
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On: 2149654
Blocks:
 
Reported: 2023-02-22 17:41 UTC by Jenia Peimer
Modified: 2023-05-18 02:58 UTC
CC List: 6 users

Fixed In Version: CNV-v4.13.0.rhel9-1808
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2149654
Environment:
Last Closed: 2023-05-18 02:57:49 UTC
Target Upstream Version:
Embargoed:




Links
  Github kubevirt/kubevirt pull 9376 (Merged): Fix vmrestore with WFFC snapshotable storage class - 2023-03-20 13:06:42 UTC
  Github kubevirt/kubevirt pull 9408#event-8741990316 (Merged): [release-0.59] Fix vmrestore with WFFC snapshotable storage class - 2023-03-14 11:56:41 UTC
  Red Hat Issue Tracker CNV-26063 - 2023-02-22 18:01:35 UTC
  Red Hat Product Errata RHSA-2023:3205 - 2023-05-18 02:58:01 UTC

Description Jenia Peimer 2023-02-22 17:41:55 UTC
+++ This bug was initially created as a clone of Bug #2149654 +++

Description of problem:
VMRestore doesn't reach the Complete state:
the restore DV stays in WaitForFirstConsumer,
the restore PVC is Pending,
and the restored VM is Stopped and not Ready.

Version-Release number of selected component (if applicable):
4.12

How reproducible:
Always, on an SNO cluster with snapshot-capable storage that uses the WaitForFirstConsumer volumeBindingMode (TopoLVM storage in our case - lvms-vg1)
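
A quick way to confirm the storage setup matches this scenario (a minimal sketch; the storage class name is the one mentioned above, substitute your own):

   $ oc get sc lvms-vg1 -o jsonpath='{.volumeBindingMode}{"\n"}'   # expected: WaitForFirstConsumer
   $ oc get volumesnapshotclass                                    # a snapshot class for the same provisioner should exist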

Steps to Reproduce:
1. Create a VM - VM is Running
2. Create a VMSnapshot - VMSnapshot is ReadyToUse
3. Create a VMRestore
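
The manifests listed under Additional info below can be applied in order; a minimal sketch using the file names shown there:

   $ oc apply -f vm.yaml          # source VM; wait until it is Running
   $ oc apply -f snap.yaml        # VMSnapshot; wait for readyToUse: true
   $ oc apply -f vmrestore.yaml   # VMRestore referencing the snapshot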

Actual results:
VMRestore is not Complete

   $ oc get vmrestore
   NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
   restore-my-vm   VirtualMachine   vm-restored   false  

Expected results:
VMRestore is Complete (PVC Bound, DV Succeeded and garbage collected)

Workaround and ONE MORE ISSUE:
1. Start the restored VM
2. See the VM is Ready and Running, DV succeeded, PVC Bound
3. See the VMRestore is still not Complete:

   $ oc get vmrestore
   NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
   restore-my-vm   VirtualMachine   vm-restored   false  

   $ oc describe vmrestore restore-my-vm | grep Events -A 10
   Events:
     Type     Reason                      Age                    From                Message
     ----     ------                      ----                   ---- 
   Warning  VirtualMachineRestoreError  4m4s (x23 over 4m21s)  restore-controller  VirtualMachineRestore encountered error invalid RunStrategy "Always"

4. See the restored VM runStrategy:
   $ oc get vm vm-restored -oyaml | grep running
      running: true


***
PLEASE NOTE that a VM restored on OCS with Immediate volumeBindingMode on a multi-node cluster gets "running: false", even though the source VM had it set to "true"; there we do not hit the above error and the VMRestore becomes Complete:
   $ oc get vm vm-restored-ocs -oyaml | grep running
      running: false
***


5. Stop the restored VM
6. See the VMRestore is Complete:
   $ oc get vmrestore
   NAME            TARGETKIND       TARGETNAME    COMPLETE   RESTORETIME   ERROR
   restore-my-vm   VirtualMachine   vm-restored   true       1s            
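
For reference, the start/stop in the workaround above can be done with virtctl (assuming the virtctl client is available; patching the VM spec works as well):

   $ virtctl start vm-restored   # step 1: start the restored VM
   $ virtctl stop vm-restored    # step 5: stop it again once the DV/PVC are bound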


Additional info:

VM yaml: 

$ cat vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-cirros-source
  labels:
    kubevirt.io/vm: vm-cirros-source
spec:
  dataVolumeTemplates:
  - metadata:
      name: cirros-dv-source
    spec:
      storage:
        resources:
          requests:
            storage: 1Gi
        storageClassName: odf-lvm-vg1
      source:
        http:
          url: <cirros-0.4.0-x86_64-disk.qcow2>
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros-source
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumev
        machine:
          type: ""
        resources:
          requests:
            memory: 100M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv-source
        name: datavolumev


VMSnapshot yaml:

$ cat snap.yaml 
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: my-vmsnapshot 
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-cirros-source


VMRestore yaml:

$ cat vmrestore.yaml 
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-my-vm
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-restored
  virtualMachineSnapshotName: my-vmsnapshot
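
Once these objects are created, their status can be followed using the names above; a minimal sketch:

   $ oc get virtualmachinesnapshot my-vmsnapshot -o yaml | grep readyToUse
   $ oc get vmrestore restore-my-vm -w    # watch the COMPLETE column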

--- Additional comment from Jenia Peimer on 2023-02-19 13:25:23 UTC ---

Just to keep the info in this BZ: this bug was discussed at the KubeVirt SIG-Storage meeting, and the current approach to fix it is to mark the VMRestore Complete when the DV is WFFC and the PVC is Pending.
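
With that approach, the VMRestore should report Complete even while the restored VM stays stopped; a quick sketch to check this on a fixed build (resource names from this report):

   $ oc get vmrestore restore-my-vm   # COMPLETE should be true
   $ oc get dv,pvc                    # the restore DV may stay WaitForFirstConsumer and the PVC Pending until the VM starts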

Comment 1 Jenia Peimer 2023-03-21 15:14:41 UTC
Verified on SNO cluster with TopoLVM with WFFC

Comment 4 errata-xmlrpc 2023-05-18 02:57:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205

