Bug 1867122

Summary: If importing VM disks from a URL takes more than 10 minutes, the VMI gets destroyed and recreated, generating noise in user-facing events
Product: Container Native Virtualization (CNV)
Component: Virtualization
Version: 2.4.0
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: future
Hardware: Unspecified
OS: Unspecified
Reporter: Simone Tiraboschi <stirabos>
Assignee: sgott
QA Contact: Israel Pinto <ipinto>
CC: cnv-qe-bugs, fdeutsch, ipinto, kbidarka, mrashish, ncredi, rmohr, sgott
Last Closed: 2021-02-17 13:12:38 UTC
Type: Bug
Attachments:
- VMI killed and recreated (flags: none)

Description Simone Tiraboschi 2020-08-07 11:57:47 UTC
Description of problem:
I was trying to reproduce https://bugzilla.redhat.com/1862701 exactly as shown in the attached screenshots.

In my case, for some unknown reason (possibly a wrong or overloaded mirror), downloading the disk image from https://dl.fedoraproject.org/pub/fedora/linux/releases/32/Cloud/x86_64/images/Fedora-Cloud-Base-32-1.6.x86_64.qcow2 took more than 10 minutes:

 NAME                                            PHASE              PROGRESS   RESTARTS   AGE
 datavolume.cdi.kubevirt.io/fedora-vm-rootdisk   ImportInProgress   86.25%     0          10m


After 10 minutes, a readiness probe failure was triggered on the virt-launcher pod, and this caused its VMI object to be destroyed and recreated.

In the end, the VM started successfully as expected, but the user-facing events are full of noise.
See the attached screenshot.

I also got VMI-related error events such as:
(combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-08-07T10:23:03.895236Z qemu-kvm: -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img\",\"node-name\":\"libvirt-2-storage\",\"auto-read-only\":true,\"discard\":\"unmap\"}: Could not open '/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img': Permission denied')"


Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Try to start a VM from the UI wizard, specifying a URL source.
2. Ensure that CDI takes more than 10 minutes to download the source disk.

Actual results:
- Many VMI-related error events
- The VMI got deleted and recreated after 10 minutes

Expected results:
No false-negative events while the download is still progressing

Additional info:

Comment 1 Simone Tiraboschi 2020-08-07 11:58:36 UTC
Created attachment 1710792 [details]
VMI killed and recreated

Comment 2 Nelly Credi 2020-09-30 12:47:50 UTC
@Israel, @Stu, should this bug be on Virt?
(removing useless events from the log vs making them disappear in the UI)

Comment 4 sgott 2020-10-21 12:45:02 UTC
Moving this BZ to Virtualization for proper tracking. This isn't a UX BZ.

Comment 7 Roman Mohr 2020-12-02 13:32:35 UTC
If this issue can be reproduced, here is what should happen, as a hint for where to look for a fix:

 1) We check if all datavolumes are imported
 2) if they are not imported, we don't create a pod
 3) once all DVs are done with the import, we create the pod
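The three steps above amount to a gate on pod creation. The sketch below illustrates that gate under stated assumptions: the `DataVolume` type and the `shouldCreateLauncherPod` function are hypothetical, simplified stand-ins and are not the actual KubeVirt controller or CDI API code. Only the phase strings `ImportInProgress` (seen in the output in the description) and `Succeeded` are taken from CDI.

```go
package main

import "fmt"

// DataVolumePhase is a hypothetical, simplified stand-in for the phase
// strings CDI reports on a DataVolume. It is not the real CDI API type.
type DataVolumePhase string

const (
	PhaseImportInProgress DataVolumePhase = "ImportInProgress"
	PhaseSucceeded        DataVolumePhase = "Succeeded"
)

// DataVolume is a hypothetical, minimal view of a DataVolume: only the
// fields needed to decide whether its import is finished.
type DataVolume struct {
	Name  string
	Phase DataVolumePhase
}

// shouldCreateLauncherPod sketches the gate described above:
//  1. check whether all DataVolumes are imported;
//  2. if any is not, do not create the virt-launcher pod yet;
//  3. once every import is done, the pod may be created.
func shouldCreateLauncherPod(dvs []DataVolume) bool {
	for _, dv := range dvs {
		if dv.Phase != PhaseSucceeded {
			// Step 2: an import is still running, so hold off on the pod.
			return false
		}
	}
	// Step 3: every DataVolume has finished importing.
	return true
}

func main() {
	dvs := []DataVolume{{Name: "fedora-vm-rootdisk", Phase: PhaseImportInProgress}}
	fmt.Println(shouldCreateLauncherPod(dvs)) // false: import still running

	dvs[0].Phase = PhaseSucceeded
	fmt.Println(shouldCreateLauncherPod(dvs)) // true: safe to create the pod
}
```

With this gate in place, a slow download only delays pod creation; no virt-launcher pod exists yet whose readiness probe could fail while the import is still in progress.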

Comment 8 Kedar Bidarkar 2021-01-27 13:31:40 UTC
Try to reproduce this with CNV-2.6.0

Comment 9 Kedar Bidarkar 2021-01-29 22:38:17 UTC
Summary: Was unable to reproduce this issue on CNV-2.6.0 (cnv/virt-operator/v2.6.0-106)

1) Created a VM (on a cluster in the US) from the UI wizard, using a download link hosted in EMEA.
2) The download took more than 10 minutes, and no false-negative events were observed anymore.

Attaching a screenshot shortly.

Comment 12 Kedar Bidarkar 2021-02-17 13:13:34 UTC
Cannot reproduce in the current release CNV-2.6.0