Bug 1900634 - [CNV-2.4.3] CrashLoopBackOff when Creating multiple vms from the same image (using alternate namespace)
Keywords:
Status: CLOSED DUPLICATE of bug 1907624
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.4.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.4.z
Assignee: Maya Rashish
QA Contact: Alex Kalenyuk
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-23 13:02 UTC by Benjamin Schmaus
Modified: 2021-05-19 18:51 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-11 17:25:37 UTC
Target Upstream Version:
Embargoed:



Description Benjamin Schmaus 2020-11-23 13:02:39 UTC
Description of problem:
CrashLoopBackOff when creating multiple VMs from the same image (using an alternate namespace)

- all VMs come up
- PVCs created with an error have a finalizer configured, cdi.kubevirt.io/cloneSource, which prevents cleanup
- some cdi-clone-source pods complete
- some pods fail with this error:

VOLUME_MODE=filesystem
MOUNT_POINT=/var/run/cdi/clone/source
/var/run/cdi/clone/source /
UPLOAD_BYTES=14857818112
./
./disk.img
I1110 04:06:02.944960      12 clone-source.go:111] content_type is "filesystem-clone"
I1110 04:06:02.945007      12 clone-source.go:112] upload_bytes is 14857818112
I1110 04:06:02.945016      12 clone-source.go:122] Starting cloner target
I1110 04:06:03.221711      12 clone-source.go:134] Set header to filesystem-clone
F1110 04:06:03.230194      12 clone-source.go:139] Error Post https://cdi-upload-disk-0-arc2.abaker9.svc/v1alpha1/upload: dial tcp: lookup cdi-upload-disk-0-arc2.abaker9.svc on 172.30.0.10:53: no such host POSTing to https://cdi-upload-disk-0-arc2.abaker9.svc/v1alpha1/upload
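
For triage, the fatal line above can be broken down into the upload service, its namespace, and the DNS resolver that returned "no such host". The following is only a minimal sketch: the regular expression is derived from the single log line quoted in this report, and the function name `parse_lookup_failure` is hypothetical, not part of CDI.

```python
import re

# Pattern derived only from the one fatal line quoted in this report;
# other CDI versions may word the error differently (assumption).
LOOKUP_FAILURE = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>\S+): no such host"
)


def parse_lookup_failure(line):
    """Return (service, namespace, resolver) for a failed DNS lookup line, else None."""
    match = LOOKUP_FAILURE.search(line)
    if match is None:
        return None
    parts = match.group("host").split(".")
    # Expect a host of the form <service>.<namespace>.svc, as in the log above.
    if len(parts) < 3 or parts[2] != "svc":
        return None
    return parts[0], parts[1], match.group("resolver")
```

The returned service and namespace could then be checked by hand, for example with `oc get svc <service> -n <namespace>`, to see whether the upload service exists at the time the source pod tries to POST.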

Version-Release number of selected component (if applicable):
CNV 2.4.3
OCP 4.5.17

How reproducible:


Steps to Reproduce:
1. Create many VMs from the same image

Actual results:
Pods go into CrashLoopBackOff when creating images

Expected results:
No error messages

Additional info:

Comment 6 Maya Rashish 2020-12-01 14:41:29 UTC
What storage class is used for this? (please provide the output of `oc get storageclass`)

For background: I can reproduce this scenario on one storage class.
It allows two pods on the same node to both claim a ReadWriteOnce PVC, but the first pod then starts receiving errors, and CDI does not recover from this.
This kind of failure might be best fixed in the storage class.
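
To check whether a cluster is in the situation described above, one option is to look for PVCs that are mounted by more than one pod on the same node. This is a rough sketch, not CDI code: it assumes the input is the parsed JSON from `oc get pods -A -o json`, and the function name `pvcs_shared_on_node` is made up for illustration.

```python
from collections import defaultdict


def pvcs_shared_on_node(pod_list):
    """Map (namespace, claim, node) to pod names for claims used by 2+ pods on one node."""
    users = defaultdict(list)
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        node = spec.get("nodeName")
        if node is None:
            continue  # pod has not been scheduled to a node yet
        for volume in spec.get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                key = (meta.get("namespace"), claim["claimName"], node)
                users[key].append(meta.get("name"))
    # Keep only claims that more than one pod on the same node refers to.
    return {key: names for key, names in users.items() if len(names) > 1}
```

The result only shows which pods reference the same claim on a node; it does not say whether the storage class handles that case correctly.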

Comment 7 Maya Rashish 2020-12-01 14:43:57 UTC
Worth noting that we don't expect this failure to happen with OCS as the storage class.

Comment 8 Michael Henriksen 2020-12-01 16:20:22 UTC
Dup of https://bugzilla.redhat.com/show_bug.cgi?id=1893363

Comment 9 Maya Rashish 2020-12-01 16:56:27 UTC
Qualifying the comment above: we don't expect it to happen with the latest OCS; apparently it did in older versions :-)

Comment 10 Michael Henriksen 2020-12-01 17:20:24 UTC
I believe I misled Maya in a previous discussion.  After doing more testing, I see that it is still possible with latest OCS.

Keep an eye on the other BZ for updates.

Comment 11 Natalie Gavrielov 2020-12-02 13:29:28 UTC

*** This bug has been marked as a duplicate of bug 1893363 ***

Comment 12 Maya Rashish 2020-12-21 19:17:56 UTC
Re-opening as MODIFIED because this bug has a different target release than bz#1893363 (which is targeting 2.6.0).
Targeting the upcoming 2.5.3 release first.

Sorry for the noise.

Comment 14 Adam Litke 2021-01-11 17:25:37 UTC
We were already tracking the backport of this via 1907624.  Closing again.

*** This bug has been marked as a duplicate of bug 1907624 ***

