Bug 1769595 - virtual-machinecontroller retries DataVolume creation that can't succeed without cleanup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.3.0
Assignee: Adam Litke
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-06 22:41 UTC by Stephen Gordon
Modified: 2020-05-04 19:10 UTC (History)

Fixed In Version: virt-cdi-operator-container-v2.3.0-32 hco-bundle-registry-container-v2.2.0-353
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 19:10:37 UTC
Target Upstream Version:
Embargoed:


Attachments
import error - PVC already exist (85.00 KB, image/png)
2019-11-10 12:41 UTC, Israel Pinto
importer_log.log (6.56 KB, text/plain)
2019-11-10 12:50 UTC, Israel Pinto


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt containerized-data-importer pull 1099 0 None closed Remove code that potentially let DV get into failed state with Restar… 2020-03-10 12:22:32 UTC
Red Hat Product Errata RHEA-2020:2011 0 None None None 2020-05-04 19:10:48 UTC

Description Stephen Gordon 2019-11-06 22:41:10 UTC
Description of problem:

Used the create VM wizard in the UI to create a VM from the URL of the Fedora 31 qcow2 cloud image.

The event log shows that the DataVolume fedora-31-rootdisk failed to import the disk image. A short while later the operation is retried, but it fails because the previous attempt had already created the PVC.

It appears there is no way to unstick this unless the user manually cleans up the underlying objects created by the failed run?

Version-Release number of selected component (if applicable):

2.1

Comment 1 Israel Pinto 2019-11-10 12:41:09 UTC
Created attachment 1634566 [details]
import error - PVC already exist

Comment 2 Israel Pinto 2019-11-10 12:49:20 UTC
@Stu,
The failed retries look like a CDI / Storage issue: the PVC is not being cleaned up.
Can you take a look?

Comment 3 Israel Pinto 2019-11-10 12:50:17 UTC
Created attachment 1634567 [details]
importer_log.log

Comment 4 sgott 2019-11-11 14:15:00 UTC
Moving this bug to the Storage component; please let us know if you think this is in error.

Comment 5 Alexander Wels 2019-11-19 19:12:10 UTC
I am fairly certain this is a case of https://github.com/kubevirt/containerized-data-importer/issues/642. Because of the failure, the DV is marked Failed (it shouldn't be), which causes kubevirt to try again. I suspect kubevirt deletes the DV and tries to create another one. However, CDI is actually still trying to import the image, so the importer pod is still running, and Kubernetes cannot delete the PVC until the pod has finished. The kubevirt retry therefore fails on the existing PVC from the previous attempt, which CDI is still using for the import. I think CDI needs to do two things:

1. While it is still retrying, do NOT mark the DV as failed, so that kubevirt will not attempt to create a new one.
2. When the DV is deleted, make sure any pods associated with that DV that are still running are deleted as well, either manually or by setting ownerRefs on the pods (ownerRefs preferred).
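
As a rough illustration of the two points above (not CDI's actual code), the sketch below models them in plain Python. The function names are hypothetical, the phase strings follow the DataVolume phases as I understand them, and the apiVersion and the importer pod naming in the example are assumptions. The ownerReferences fields are the standard Kubernetes ones; with such a reference in place, Kubernetes garbage collection removes the pod when its owning DV is deleted.

```python
import json

# Illustrative sketch only; this is not CDI's actual code.
# All function names here are hypothetical.


def datavolume_phase(pod_phase, will_retry):
    """Derive a DataVolume phase from the importer pod's state."""
    if pod_phase == "Succeeded":
        return "Succeeded"
    if pod_phase == "Failed" and not will_retry:
        return "Failed"
    # Point 1: while the import is still being retried, report it as in
    # progress so that kubevirt does not delete and recreate the DV.
    return "ImportInProgress"


def owner_reference(datavolume):
    """Build an ownerReferences entry that points at the DataVolume."""
    meta = datavolume["metadata"]
    return {
        "apiVersion": datavolume["apiVersion"],
        "kind": datavolume["kind"],
        "name": meta["name"],
        "uid": meta["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }


def importer_pod_metadata(datavolume):
    """Metadata for an importer pod owned by the DataVolume (point 2)."""
    meta = datavolume["metadata"]
    return {
        "name": "importer-" + meta["name"],  # pod naming is illustrative
        "namespace": meta["namespace"],
        "ownerReferences": [owner_reference(datavolume)],
    }


if __name__ == "__main__":
    dv = {
        "apiVersion": "cdi.kubevirt.io/v1alpha1",  # assumed API version
        "kind": "DataVolume",
        "metadata": {
            "name": "fedora-31-rootdisk",
            "namespace": "default",
            "uid": "00000000-0000-0000-0000-000000000000",  # placeholder
        },
    }
    print(datavolume_phase("Failed", will_retry=True))
    print(json.dumps(importer_pod_metadata(dv), indent=2))
```

The ownerRefs route is attractive because the cleanup is then done by Kubernetes itself rather than by extra deletion logic in the controller.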

Comment 6 Adam Litke 2020-02-19 14:42:51 UTC
This is fixed in CDI-1.13.0, so I will update this bug accordingly.

Comment 7 Kevin Alon Goldblatt 2020-03-04 15:28:58 UTC
Verified with the following code:
----------------------------------------------------
oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-0.nightly-2020-03-02-011520
Kubernetes Version: v1.17.1

oc get csv --all-namespaces
NAMESPACE                              NAME                                      DISPLAY                           VERSION   REPLACES   PHASE
openshift-cnv                          kubevirt-hyperconverged-operator.v2.3.0   Container-native virtualization   2.3.0                Succeeded
openshift-operator-lifecycle-manager   packageserver                             Package Server                    0.14.1               Succeeded



Created with the following scenario:
---------------------------------------------------
Used the create VM wizard in the UI to create a VM from the URL of the Fedora 31 qcow2 cloud image.

The DV was created.
The import succeeded.
The VM was created successfully.


Moving to VERIFIED

Comment 10 errata-xmlrpc 2020-05-04 19:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

