Bug 2098225

Summary: [4.11] VM Snapshot Restore hangs indefinitely when backed by a snapshotclass
Product: Container Native Virtualization (CNV) Reporter: Adam Litke <alitke>
Component: Storage Assignee: Adam Litke <alitke>
Status: CLOSED ERRATA QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high Docs Contact:
Priority: high    
Version: 4.11.0 CC: akalenyu, alitke, cnv-qe-bugs, cwilkers, kgoldbla, mhenriks, ngavrilo, pelauter, yadu
Target Milestone: --- Keywords: Reopened
Target Release: 4.11.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.11.0-600 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2070366 Environment:
Last Closed: 2022-12-01 21:10:21 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2070366    
Bug Blocks:    

Description Adam Litke 2022-06-17 15:57:50 UTC
+++ This bug was initially created as a clone of Bug #2070366 +++

Description of problem:

New "restore" PVC appears to be waiting on a CDI upload server pod to finish, but work is never sent to that pod.

Version-Release number of selected component (if applicable):
4.10.0

How reproducible:
Always

Steps to Reproduce:
1. Take a snapshot of a (in my case Running) VM
2. Shut down VM
3. Use UI to Restore to snapshot
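
For anyone reproducing this without the UI, the steps above can be sketched as KubeVirt snapshot and restore objects. This is a minimal sketch, not the reporter's exact procedure (the reporter used the UI). The object names `my-vm`, `my-vm-snap`, and `my-vm-restore` are hypothetical, and the `snapshot.kubevirt.io/v1alpha1` API version is an assumption for this release line; check the version served by your cluster.

```python
import json

# Assumption: API group/version of the KubeVirt snapshot API in this release line.
SNAPSHOT_API = "snapshot.kubevirt.io/v1alpha1"


def vm_ref(vm_name):
    """Typed reference to a VirtualMachine (snapshot source / restore target)."""
    return {"apiGroup": "kubevirt.io", "kind": "VirtualMachine", "name": vm_name}


def snapshot_manifest(vm_name, snapshot_name):
    """Step 1: snapshot the VM."""
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VirtualMachineSnapshot",
        "metadata": {"name": snapshot_name},
        "spec": {"source": vm_ref(vm_name)},
    }


def restore_manifest(vm_name, snapshot_name, restore_name):
    """Step 3: restore the (stopped) VM from the snapshot."""
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VirtualMachineRestore",
        "metadata": {"name": restore_name},
        "spec": {
            "target": vm_ref(vm_name),
            "virtualMachineSnapshotName": snapshot_name,
        },
    }


if __name__ == "__main__":
    # Hypothetical names; the manifests are printed as JSON, which `oc apply -f -` accepts.
    print(json.dumps(snapshot_manifest("my-vm", "my-vm-snap"), indent=2))
    print(json.dumps(restore_manifest("my-vm", "my-vm-snap", "my-vm-restore"), indent=2))
```

Step 2 (shutting the VM down, for example with `virtctl stop my-vm`) happens between applying the two manifests.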

Actual results:
VM stays in Pending state indefinitely; no log activity in the CDI upload server pod

Expected results:
VM is recreated with PVC referencing VolumeSnapshot, and back-end snapshot class handles restore; VM starts quickly.
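
One way to check the expected result on a restored PVC is to look at its `spec.dataSource`. The helper below is a rough sketch under assumptions: the function name is hypothetical, the PVC is assumed to be already loaded as a dict (for example from `oc get pvc <name> -o json`), and the shape of the faulty PVC is not shown in this report, so only the expected shape is tested.

```python
# Assumption: API group of the CSI VolumeSnapshot objects a restore PVC points at.
VOLUME_SNAPSHOT_GROUP = "snapshot.storage.k8s.io"


def restores_from_volume_snapshot(pvc):
    """Return True if the PVC's dataSource references a VolumeSnapshot."""
    data_source = (pvc.get("spec") or {}).get("dataSource") or {}
    return (
        data_source.get("kind") == "VolumeSnapshot"
        and data_source.get("apiGroup") == VOLUME_SNAPSHOT_GROUP
    )


if __name__ == "__main__":
    # Hypothetical PVC, trimmed to the fields the check reads.
    example_pvc = {
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "restore-example"},
        "spec": {
            "dataSource": {
                "apiGroup": VOLUME_SNAPSHOT_GROUP,
                "kind": "VolumeSnapshot",
                "name": "vmsnapshot-example",
            }
        },
    }
    print(restores_from_volume_snapshot(example_pvc))
```

A PVC that passes this check is left to the storage back end to populate from the snapshot, which is the behaviour the expected result describes.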

Additional info:

--- Additional comment from Michael Henriksen on 2022-03-31 12:18:02 UTC ---

vm/datavolume/pvc yamls pre and post restore would be very helpful

--- Additional comment from Chandler Wilkerson on 2022-03-31 13:51:44 UTC ---

Pre:

VM: http://pastebin.test.redhat.com/1041370
DV: http://pastebin.test.redhat.com/1041371
PVC: http://pastebin.test.redhat.com/1041369

--- Additional comment from Chandler Wilkerson on 2022-03-31 13:57:32 UTC ---

Post restore:

VM: http://pastebin.test.redhat.com/1041376
DV: http://pastebin.test.redhat.com/1041373
PVC: http://pastebin.test.redhat.com/1041374

--- Additional comment from Michael Henriksen on 2022-04-01 00:07:14 UTC ---

This issue should only affect DataVolumes created via network clone operations.

This is a regression introduced in this PR:  https://github.com/kubevirt/containerized-data-importer/pull/1922

--- Additional comment from Michael Henriksen on 2022-04-01 00:53:57 UTC ---

Somewhat related to this PR in progress:  https://github.com/kubevirt/containerized-data-importer/pull/2205

--- Additional comment from Bartosz Rybacki on 2022-05-30 09:25:04 UTC ---

Michael, Adam, I think we should update this bug. 

There are two fixes [1] [2] that were merged to CDI 1.49, which landed in OpenShift 4.11. Do we want to backport them to 1.43 so that the bug is also fixed in 4.10?


[1] https://github.com/kubevirt/containerized-data-importer/pull/2205
[2] https://github.com/kubevirt/containerized-data-importer/pull/2227

Comment 1 Adam Litke 2022-06-17 17:07:46 UTC
Not needed because the fixes were taken into main prior to forking release-v1.49.

Comment 2 Adam Litke 2022-07-12 12:24:12 UTC
Reopening to consume the additional fix identified in Bug 2070366 - https://github.com/kubevirt/kubevirt/pull/8078

Comment 3 Adam Litke 2022-07-13 16:05:21 UTC
A workaround for this issue has been documented here: https://bugzilla.redhat.com/show_bug.cgi?id=2070366#c14

Consequently, I am removing blocker+ and pushing this out to 4.11.1

Comment 4 Adam Litke 2022-08-01 18:49:50 UTC
Assigning back to you, Bartosz.  This will need a backport.

Comment 5 Bartosz Rybacki 2022-08-03 13:04:07 UTC
I need to backport to 4.11 - for kubevirt it means upstream 0.53 - in progress

Comment 6 Bartosz Rybacki 2022-08-11 15:59:21 UTC
kubevirt backport merged, waiting for a downstream (D/S) version

https://github.com/kubevirt/kubevirt/commit/dee5e32564ab06cefaa8ca082c5792d8ce7e7bc1

Comment 8 Kevin Alon Goldblatt 2022-10-25 09:01:01 UTC
Verified with the following code:
-----------------------------------------------
Client Version: 4.11.0-202209201358.p0.g262ac9c.assembly.stream-262ac9c
Kustomize Version: v4.5.4
Server Version: 4.11.10
Kubernetes Version: v1.24.6+5157800

oc get csv -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.11.1   OpenShift Virtualization   4.11.1    kubevirt-hyperconverged-operator.v4.11.0   Succeeded


Verified with the following scenario:
-----------------------------------------------
1. Take a snapshot of a (in my case Running) VM
2. Shut down VM
3. Use UI to Restore to snapshot

Actual results:
VM is recreated with PVC referencing VolumeSnapshot, and back-end snapshot class handles restore; VM starts quickly.

Moving to VERIFIED!

Comment 17 errata-xmlrpc 2022-12-01 21:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.11.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8750