1900634 – [CNV-2.4.3] CrashLoopBackOff when Creating multiple vms from the same image (using alternate namespace)

Keywords:
Status:	CLOSED DUPLICATE of bug 1907624
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	2.4.3
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	2.4.z
Assignee:	Maya Rashish
QA Contact:	Alex Kalenyuk
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-11-23 13:02 UTC by Benjamin Schmaus
Modified:	2021-05-19 18:51 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-01-11 17:25:37 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Benjamin Schmaus 2020-11-23 13:02:39 UTC

Description of problem:
CrashLoopBackOff when creating multiple VMs from the same image (using an alternate namespace)

- All VMs come up
- PVCs created with an error have a finalizer configured (cdi.kubevirt.io/cloneSource), which prevents cleanup
- Some pods complete cdi-clone-source
- Some pods fail with this error:

VOLUME_MODE=filesystem
MOUNT_POINT=/var/run/cdi/clone/source
/var/run/cdi/clone/source /
UPLOAD_BYTES=14857818112
./
./disk.img
I1110 04:06:02.944960      12 clone-source.go:111] content_type is "filesystem-clone"
I1110 04:06:02.945007      12 clone-source.go:112] upload_bytes is 14857818112
I1110 04:06:02.945016      12 clone-source.go:122] Starting cloner target
I1110 04:06:03.221711      12 clone-source.go:134] Set header to filesystem-clone
F1110 04:06:03.230194      12 clone-source.go:139] Error Post https://cdi-upload-disk-0-arc2.abaker9.svc/v1alpha1/upload: dial tcp: lookup cdi-upload-disk-0-arc2.abaker9.svc on 172.30.0.10:53: no such host POSTing to https://cdi-upload-disk-0-arc2.abaker9.svc/v1alpha1/upload
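
The fatal line above is a DNS failure: the clone source pod cannot resolve the upload service name. A minimal sketch of pulling the relevant parts out of such a line follows, assuming the usual Kubernetes `<service>.<namespace>.svc` naming; the function name `parse_dns_failure` and the regular expression are illustrative only and are not part of CDI.

```python
import re

# Matches the Go resolver message "lookup <host> on <resolver>: no such host".
DNS_FAILURE = re.compile(r"lookup (?P<host>\S+) on (?P<resolver>\S+): no such host")


def parse_dns_failure(line):
    """Return (service, namespace, resolver) for a failed .svc lookup, else None."""
    match = DNS_FAILURE.search(line)
    if match is None:
        return None
    parts = match.group("host").split(".")
    # Assumption: cluster-internal names look like <service>.<namespace>.svc[...]
    if len(parts) < 3 or parts[2] != "svc":
        return None
    return parts[0], parts[1], match.group("resolver")


line = (
    "F1110 04:06:03.230194      12 clone-source.go:139] Error Post "
    "https://cdi-upload-disk-0-arc2.abaker9.svc/v1alpha1/upload: dial tcp: "
    "lookup cdi-upload-disk-0-arc2.abaker9.svc on 172.30.0.10:53: no such host"
)
print(parse_dns_failure(line))
```

Read this way, the log suggests that the service `cdi-upload-disk-0-arc2` in namespace `abaker9` was not resolvable when the source pod tried to POST to it.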

Version-Release number of selected component (if applicable):
CNV 2.4.3
OCP 4.5.17

How reproducible:


Steps to Reproduce:
1. Create many VMs from the same image

Actual results:
Pods go into CrashLoopBackOff when creating images

Expected results:
No error messages

Additional info:

Comment 6 Maya Rashish 2020-12-01 14:41:29 UTC

What storage class is used for this? (please provide the output of `oc get storageclass`)

For background: I can reproduce this scenario on one storage class.
It allows two pods on the same node to both claim a ReadWriteOnce PVC, but the first pod starts receiving errors, and CDI does not recover from this.
This kind of failure might be best fixed in the storage class.
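
To illustrate the access mode behaviour described above (ReadWriteOnce in Kubernetes terms): the restriction is per node, not per pod, so two pods scheduled on the same node can both mount the volume. The toy model below is only a sketch of that rule; the `Volume` class and the pod and node names are hypothetical and do not reflect how any storage class or CDI is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Volume:
    """Toy model of a ReadWriteOnce volume: attachment is tracked per node."""

    attached_node: Optional[str] = None
    pods: List[str] = field(default_factory=list)

    def mount(self, pod: str, node: str) -> bool:
        # ReadWriteOnce limits the volume to one node, not to one pod.
        if self.attached_node is not None and self.attached_node != node:
            return False
        self.attached_node = node
        self.pods.append(pod)
        return True


vol = Volume()
print(vol.mount("clone-source-a", "node-1"))  # first pod attaches the volume
print(vol.mount("clone-source-b", "node-1"))  # same node, so this is also allowed
print(vol.mount("clone-source-c", "node-2"))  # different node, so this is refused
```

Whether the first pod keeps working once a second pod mounts the same volume depends on the storage backend, which matches the failure described in this comment.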

Comment 7 Maya Rashish 2020-12-01 14:43:57 UTC

Worth noting that we don't expect this failure to happen with OCS as the storage class.

Comment 8 Michael Henriksen 2020-12-01 16:20:22 UTC

Dup of https://bugzilla.redhat.com/show_bug.cgi?id=1893363

Comment 9 Maya Rashish 2020-12-01 16:56:27 UTC

Qualifying the comment above: we don't expect it to happen with the latest OCS; apparently it did in older versions :-)

Comment 10 Michael Henriksen 2020-12-01 17:20:24 UTC

I believe I misled Maya in a previous discussion. After doing more testing, I see that it is still possible with the latest OCS.

Keep an eye on the other BZ for updates.

Comment 11 Natalie Gavrielov 2020-12-02 13:29:28 UTC


*** This bug has been marked as a duplicate of bug 1893363 ***

Comment 12 Maya Rashish 2020-12-21 19:17:56 UTC

Re-opening as MODIFIED because this bug has a different target release than bz#1893363 (which is targeting 2.6.0).
Targeting the upcoming 2.5.3 release first.

Sorry for the noise.

Comment 14 Adam Litke 2021-01-11 17:25:37 UTC

We were already tracking the backport of this via bug 1907624. Closing again.

*** This bug has been marked as a duplicate of bug 1907624 ***
