Bug 1867122 - If importing VM disks from a URL takes more than 10 minutes, the VMI gets destroyed and recreated, generating noise in user-facing events
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: future
Assignee: sgott
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-07 11:57 UTC by Simone Tiraboschi
Modified: 2021-02-17 13:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-17 13:12:38 UTC
Target Upstream Version:
Embargoed:


Attachments
VMI killed and recreated (178.12 KB, image/png)
2020-08-07 11:58 UTC, Simone Tiraboschi


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1862701 0 medium CLOSED OpenShift Virtualization defaults to LiveMigration eviction strategy even when not available (VM creation fails) 2021-07-27 14:21:52 UTC

Description Simone Tiraboschi 2020-08-07 11:57:47 UTC
Description of problem:
I was trying to reproduce https://bugzilla.redhat.com/1862701, as shown in the attached screenshot.

In my case, for some reason (perhaps a wrong or overloaded mirror), downloading the disk image from https://dl.fedoraproject.org/pub/fedora/linux/releases/32/Cloud/x86_64/images/Fedora-Cloud-Base-32-1.6.x86_64.qcow2 took more than 10 minutes:

 NAME                                            PHASE              PROGRESS   RESTARTS   AGE
 datavolume.cdi.kubevirt.io/fedora-vm-rootdisk   ImportInProgress   86.25%     0          10m


So, after 10 minutes, a readiness probe error was triggered on the virt-launcher pod, which caused its VMI object to be destroyed and recreated.

In the end, the VM started successfully as expected, but there is a lot of noise in user-facing events.
See the attached screenshot.

I also got VMI-related error events such as:
(combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-08-07T10:23:03.895236Z qemu-kvm: -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img\",\"node-name\":\"libvirt-2-storage\",\"auto-read-only\":true,\"discard\":\"unmap\"}: Could not open '/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img': Permission denied')"


Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Try to start a VM from the UI wizard, specifying a URL source
2. Ensure that CDI takes more than 10 minutes to download the source disk

Actual results:
- Many VMI-related error events
- The VMI was deleted and recreated after 10 minutes

Expected results:
No false negative events if the download is progressing

Additional info:

Comment 1 Simone Tiraboschi 2020-08-07 11:58:36 UTC
Created attachment 1710792
VMI killed and recreated

Comment 2 Nelly Credi 2020-09-30 12:47:50 UTC
@Israel, @Stu, should this bug be on Virt?
(removing useless events from the log vs making them disappear in the UI)

Comment 4 sgott 2020-10-21 12:45:02 UTC
Moving this BZ to virtualization for proper tracking. This isn't a UX BZ.

Comment 7 Roman Mohr 2020-12-02 13:32:35 UTC
If this issue can be reproduced, here is what should happen, as a hint for where to look for a fix:

 1) We check whether all datavolumes are imported
 2) If they are not imported, we don't create a pod
 3) Once all DVs are done with the import, we create the pod
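
The three steps above can be sketched as a simple gate. This is only an illustrative sketch, not the actual KubeVirt controller code: the DataVolume class and the should_create_pod helper are hypothetical names, and the "Succeeded" phase string is assumed to be what a finished import reports (this report itself only shows "ImportInProgress").

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed phase string for a finished import; the bug report itself only
# shows "ImportInProgress" for a running one.
SUCCEEDED = "Succeeded"


@dataclass
class DataVolume:
    """Hypothetical stand-in for a DataVolume object (name and phase only)."""
    name: str
    phase: str


def should_create_pod(data_volumes: Iterable[DataVolume]) -> bool:
    """Return True only when every DataVolume has finished importing.

    Steps 1-3 from the comment: check all DVs, hold back the pod while
    any import is still running, create it once all are done.
    """
    return all(dv.phase == SUCCEEDED for dv in data_volumes)


# Example: the state captured in the description, still importing at 10m.
importing = [DataVolume("fedora-vm-rootdisk", "ImportInProgress")]
done = [DataVolume("fedora-vm-rootdisk", SUCCEEDED)]

print(should_create_pod(importing))  # pod creation is held back
print(should_create_pod(done))       # pod can now be created
```

With a gate like this, no virt-launcher pod exists while the import is running, so a slow download should not be able to trip a pod readiness probe.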

Comment 8 Kedar Bidarkar 2021-01-27 13:31:40 UTC
Try to reproduce this with CNV-2.6.0

Comment 9 Kedar Bidarkar 2021-01-29 22:38:17 UTC
Summary: Was unable to reproduce this issue on CNV-2.6.0 (cnv/virt-operator/v2.6.0-106)

1) Created a VM (on a cluster in the US) from the UI wizard, using a link from EMEA
2) The download took more than 10 minutes, and no false negative events were observed.

Attaching a screenshot shortly.

Comment 12 Kedar Bidarkar 2021-02-17 13:13:34 UTC
Cannot reproduce in the current release CNV-2.6.0

