Bug 2098225 - [4.11] VM Snapshot Restore hangs indefinitely when backed by a snapshotclass
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.1
Assignee: Adam Litke
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 2070366
Blocks:
Reported: 2022-06-17 15:57 UTC by Adam Litke
Modified: 2022-12-01 21:10 UTC (History)

Fixed In Version: v4.11.0-600
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2070366
Environment:
Last Closed: 2022-12-01 21:10:21 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 8078 | 0 | None | Merged | Must clear "cloneRequest" annotation on restore of PVC | 2022-08-01 18:00:08 UTC
Github | kubevirt kubevirt pull 8241 | 0 | None | Merged | [release-v0.53] Must clear "cloneRequest" annotation on restore of PVC | 2022-08-11 15:59:12 UTC
Red Hat Issue Tracker | CNV-19200 | 0 | None | None | None | 2022-10-31 23:37:51 UTC

Description Adam Litke 2022-06-17 15:57:50 UTC
+++ This bug was initially created as a clone of Bug #2070366 +++

Description of problem:

New "restore" PVC appears to be waiting on a CDI upload server pod to finish, but work is never sent to that pod.

Version-Release number of selected component (if applicable):
4.10.0

How reproducible:
Always

Steps to Reproduce:
1. Take a snapshot of a (in my case Running) VM
2. Shut down VM
3. Use UI to Restore to snapshot

Actual results:
The VM remains in the Pending state indefinitely, with no log activity in the CDI upload server pod.

Expected results:
VM is recreated with PVC referencing VolumeSnapshot, and back-end snapshot class handles restore; VM starts quickly.
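
For illustration, the expected "restore" PVC can be sketched as a manifest whose spec.dataSource points at a VolumeSnapshot, so that the CSI snapshot back end populates the volume rather than a CDI pod. This is a minimal sketch and not taken from the affected cluster: the PVC name, snapshot name, storage class, access mode, and size below are hypothetical placeholders.

```python
import json


def restore_pvc_manifest(name, snapshot_name, storage_class, size):
    """Build a PVC manifest that is provisioned from an existing VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            # dataSource asks the CSI driver to populate the volume from the snapshot.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }


# All values below are hypothetical placeholders.
manifest = restore_pvc_manifest(
    name="restore-vm-disk",
    snapshot_name="vmsnapshot-vm-disk",
    storage_class="example-csi",
    size="10Gi",
)
print(json.dumps(manifest, indent=2))
```

A PVC of this shape carries no clone or upload annotations, so nothing should prompt CDI to start a pod for it.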

Additional info:

--- Additional comment from Michael Henriksen on 2022-03-31 12:18:02 UTC ---

vm/datavolume/pvc yamls pre and post restore would be very helpful

--- Additional comment from Chandler Wilkerson on 2022-03-31 13:51:44 UTC ---

Pre:

VM: http://pastebin.test.redhat.com/1041370
DV: http://pastebin.test.redhat.com/1041371
PVC: http://pastebin.test.redhat.com/1041369

--- Additional comment from Chandler Wilkerson on 2022-03-31 13:57:32 UTC ---

Post restore:

VM: http://pastebin.test.redhat.com/1041376
DV: http://pastebin.test.redhat.com/1041373
PVC: http://pastebin.test.redhat.com/1041374

--- Additional comment from Michael Henriksen on 2022-04-01 00:07:14 UTC ---

This issue should only affect DataVolumes created via network clone operations.

This is a regression introduced in this PR: https://github.com/kubevirt/containerized-data-importer/pull/1922

--- Additional comment from Michael Henriksen on 2022-04-01 00:53:57 UTC ---

Somewhat related to this PR in progress:  https://github.com/kubevirt/containerized-data-importer/pull/2205

--- Additional comment from Bartosz Rybacki on 2022-05-30 09:25:04 UTC ---

Michael, Adam, I think we should update this bug. 

There are two fixes [1] [2] that were merged into CDI 1.49, which landed in OpenShift 4.11. Do we want to backport them to 1.43 so that the bug is also fixed in 4.10?


[1] https://github.com/kubevirt/containerized-data-importer/pull/2205
[2] https://github.com/kubevirt/containerized-data-importer/pull/2227

Comment 1 Adam Litke 2022-06-17 17:07:46 UTC
Not needed because the fixes were taken into main prior to forking release-v1.49.

Comment 2 Adam Litke 2022-07-12 12:24:12 UTC
Reopening to consume the additional fix identified in Bug 2070366 - https://github.com/kubevirt/kubevirt/pull/8078
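
The idea behind that fix, going by the PR title (Must clear "cloneRequest" annotation on restore of PVC), can be sketched roughly as follows. This is only an illustration on a plain dictionary, not the kubevirt code: the annotation key "k8s.io/CloneRequest" is an assumption about the exact key name, and the helper strip_clone_request and the sample values are hypothetical.

```python
import copy

# Assumed key; this bug only refers to it as the "cloneRequest" annotation.
CLONE_REQUEST_ANNOTATION = "k8s.io/CloneRequest"


def strip_clone_request(pvc: dict) -> dict:
    """Return a copy of a PVC manifest without the clone-request annotation."""
    restored = copy.deepcopy(pvc)
    annotations = restored.get("metadata", {}).get("annotations")
    if annotations:
        # Without this, the restored PVC still looks like a pending network clone.
        annotations.pop(CLONE_REQUEST_ANNOTATION, None)
    return restored


# Hypothetical source PVC that was originally created by a network clone.
source_pvc = {
    "metadata": {
        "name": "vm-disk",
        "annotations": {
            CLONE_REQUEST_ANNOTATION: "default/golden-image",
            "example.com/keep": "yes",
        },
    },
}
restore_pvc = strip_clone_request(source_pvc)
```

The source manifest is left untouched and unrelated annotations are kept; only the clone request is dropped from the copy used for the restore.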

Comment 3 Adam Litke 2022-07-13 16:05:21 UTC
A workaround for this issue has been documented here: https://bugzilla.redhat.com/show_bug.cgi?id=2070366#c14

Consequently, I am removing blocker+ and pushing this out to 4.11.1

Comment 4 Adam Litke 2022-08-01 18:49:50 UTC
Assigning back to you Bartosz.  This will need a backport.

Comment 5 Bartosz Rybacki 2022-08-03 13:04:07 UTC
I need to backport this to 4.11, which for kubevirt means upstream 0.53. In progress.

Comment 6 Bartosz Rybacki 2022-08-11 15:59:21 UTC
kubevirt backport merged, waiting for a downstream (D/S) version

https://github.com/kubevirt/kubevirt/commit/dee5e32564ab06cefaa8ca082c5792d8ce7e7bc1

Comment 8 Kevin Alon Goldblatt 2022-10-25 09:01:01 UTC
Verified with the following code:
-----------------------------------------------
Client Version: 4.11.0-202209201358.p0.g262ac9c.assembly.stream-262ac9c
Kustomize Version: v4.5.4
Server Version: 4.11.10
Kubernetes Version: v1.24.6+5157800

oc get csv -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.11.1   OpenShift Virtualization   4.11.1    kubevirt-hyperconverged-operator.v4.11.0   Succeeded


Verified with the following scenario:
-----------------------------------------------
1. Take a snapshot of a (in my case Running) VM
2. Shut down VM
3. Use UI to Restore to snapshot

Actual results:
VM is recreated with PVC referencing VolumeSnapshot, and back-end snapshot class handles restore; VM starts quickly.

Moving to VERIFIED!

Comment 17 errata-xmlrpc 2022-12-01 21:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.11.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8750

