1867122 – If importing VM disks from URL takes more than 10 minutes, VMI get destroyed and recreated generating noise in user facing events

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Virtualization
Sub Component:
Version:	2.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	future
Assignee:	sgott
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-07 11:57 UTC by Simone Tiraboschi
Modified:	2021-02-17 13:13 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-17 13:12:38 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
VMI killed and recreated (178.12 KB, image/png) 2020-08-07 11:58 UTC, Simone Tiraboschi

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1862701	0	medium	CLOSED	OpenShift Virtualization defaults to LiveMigration eviction strategy even when not available (VM creation fails)	2021-07-27 14:21:52 UTC

Description Simone Tiraboschi 2020-08-07 11:57:47 UTC

Description of problem:
I was trying to reproduce https://bugzilla.redhat.com/1862701 exactly as shown in the attached screenshots.

In my case, for some unknown reason (a wrong or overloaded mirror?), downloading the disk image from https://dl.fedoraproject.org/pub/fedora/linux/releases/32/Cloud/x86_64/images/Fedora-Cloud-Base-32-1.6.x86_64.qcow2 took more than 10 minutes:

 NAME                                            PHASE              PROGRESS   RESTARTS   AGE
 datavolume.cdi.kubevirt.io/fedora-vm-rootdisk   ImportInProgress   86.25%     0          10m


So, after 10 minutes, a readiness probe error was triggered on the virt-launcher pod, which caused its VMI object to be destroyed and recreated.

In the end the VM started successfully as expected, but there is a lot of noise in the user-facing events.
See the attached screenshot.

I also got VMI-related error events such as:
(combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-08-07T10:23:03.895236Z qemu-kvm: -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img\",\"node-name\":\"libvirt-2-storage\",\"auto-read-only\":true,\"discard\":\"unmap\"}: Could not open '/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img': Permission denied')"


Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Try to start a VM from the UI wizard, specifying a URL source
2. Ensure that CDI takes more than 10 minutes to download the source disk
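
For reference, the URL import in step 1 could also be described outside the wizard as a CDI DataVolume. The following is only a rough sketch: the helper name datavolume_from_url, the 10Gi size, and the ReadWriteOnce access mode are illustrative assumptions, and the apiVersion shown may differ per release (older CDI releases used v1alpha1). The wizard may generate something different, and the manifest does not by itself make the download slow (step 2).

```python
import json

# URL taken from the description of this bug.
FEDORA_URL = (
    "https://dl.fedoraproject.org/pub/fedora/linux/releases/32/Cloud/"
    "x86_64/images/Fedora-Cloud-Base-32-1.6.x86_64.qcow2"
)


def datavolume_from_url(name, url, size="10Gi"):
    """Build a DataVolume manifest that imports a disk image over HTTP(S).

    Hypothetical helper for illustration; size and access mode are assumptions.
    """
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",  # assumption: v1alpha1 on older CDI
        "kind": "DataVolume",
        "metadata": {"name": name},
        "spec": {
            "source": {"http": {"url": url}},
            "pvc": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": size}},
            },
        },
    }


manifest = datavolume_from_url("fedora-vm-rootdisk", FEDORA_URL)
# JSON is accepted wherever a YAML manifest is, so this can be saved to a file.
print(json.dumps(manifest, indent=2))
```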

Actual results:
- many VMI related error events
- VMI got deleted and recreated after 10 minutes

Expected results:
No false negative events if the download is progressing

Additional info:

Comment 1 Simone Tiraboschi 2020-08-07 11:58:36 UTC

Created attachment 1710792
VMI killed and recreated

Comment 2 Nelly Credi 2020-09-30 12:47:50 UTC

@Israel, @Stu, should this bug be on Virt?
(removing useless events from the log vs making them disappear in the UI)

Comment 4 sgott 2020-10-21 12:45:02 UTC

Moving this BZ to Virtualization for proper tracking. This isn't a UX BZ.

Comment 7 Roman Mohr 2020-12-02 13:32:35 UTC

If this issue can be reproduced, here is what should happen, as a hint on where to look for a fix:

 1) We check if all datavolumes are imported
 2) If they are not imported, we don't create a pod
 3) Once all DVs are done with the import, we create the pod
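
The three steps above can be sketched roughly as follows. This is a simplified illustration, not the actual KubeVirt controller code: the DataVolume class, reconcile, and the create_pod callback are hypothetical names, and the sketch assumes that a finished import is reported with the phase "Succeeded".

```python
from dataclasses import dataclass

# Assumption: a DataVolume whose import has finished reports this phase.
IMPORT_DONE = "Succeeded"


@dataclass
class DataVolume:
    name: str
    phase: str  # e.g. "ImportInProgress" or "Succeeded"


def all_imported(data_volumes):
    """Step 1: True only when every DataVolume has finished importing."""
    return all(dv.phase == IMPORT_DONE for dv in data_volumes)


def reconcile(data_volumes, pod_exists, create_pod):
    """Steps 2 and 3: create the launcher pod only after all imports finish.

    Returns True if the pod exists after this pass, False if we are still waiting.
    """
    if pod_exists:
        return True
    if not all_imported(data_volumes):
        return False  # step 2: import still running, do not create a pod
    create_pod()  # step 3: every DV is done, the pod can be created
    return True


# Usage: while the import is in progress, no pod is created.
created = []
dvs = [DataVolume("fedora-vm-rootdisk", "ImportInProgress")]
first_pass = reconcile(dvs, False, lambda: created.append("virt-launcher"))

# Once the import is reported as done, the next pass creates the pod.
dvs[0].phase = IMPORT_DONE
second_pass = reconcile(dvs, False, lambda: created.append("virt-launcher"))
```

With this ordering, no pod exists while the download is running, so a slow import cannot by itself trigger a readiness probe failure on the virt-launcher pod.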

Comment 8 Kedar Bidarkar 2021-01-27 13:31:40 UTC

Try to reproduce this with CNV-2.6.0

Comment 9 Kedar Bidarkar 2021-01-29 22:38:17 UTC

Summary: Was unable to reproduce this issue on CNV-2.6.0 (cnv/virt-operator/v2.6.0-106)

1) Created a VM (on a cluster in the US) from the UI wizard, using a link hosted in EMEA.
2) The download took more than 10 minutes, and no false negative events were observed.

Attaching a screenshot shortly.

Comment 12 Kedar Bidarkar 2021-02-17 13:13:34 UTC

Cannot reproduce in the current release CNV-2.6.0
