Description of problem:
Immediately after creating a PVC with the CDI annotation, and before CDI had completed, I attached that PVC to a VM object and started the VM (testing a negative flow). The CDI importer pod failed with `unable to write to file`, but more surprisingly the VMI shows in the `Running` state.

The VMI events do show the error below:
```
Type     Reason              Age                From                                                      Message
----     ------              ----               ----                                                      -------
Normal   SuccessfulCreate    25m                virtualmachine-controller                                 Created virtual machine pod virt-launcher-fedora-vmmwj
Normal   SuccessfulHandOver  23m                virtualmachine-controller                                 Pod owner ship transferred to the node virt-launcher-fedora-vmmwj
Normal   Created             23m (x2 over 23m)  virt-handler, cnv-executor-vatsal-test-node1.example.com  VirtualMachineInstance defined.
Normal   Started             23m                virt-handler, cnv-executor-vatsal-test-node1.example.com  VirtualMachineInstance started.
Warning  SyncFailed          5m (x16 over 16m)  virt-handler, cnv-executor-vatsal-test-node1.example.com  server error. command Launcher.Sync failed: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')
Warning  SyncFailed          1m (x3 over 16m)   virt-handler, cnv-executor-vatsal-test-node1.example.com  server error. command Launcher.Sync failed: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')
```

However, no error appears in the Web Console: it shows the VMI as running, everything looks fine, and it even shows metrics for the VMI.

Version-Release number of selected component (if applicable):
openshift v3.11.16
CNV: 1.2 from Stage-CDN

How reproducible:

Steps to Reproduce:
1. Create a CDI importer PVC
2. Attach that PVC to a VM before it completes
3. Start the VM

Actual results:
The VMI shows in the Running state

Expected results:

Additional info:
Adam, remind me, this is one of those cases we discussed but couldn't handle in 1.2 because DataVolume support is partial?
As noted, this is an unusual flow: the VM was started before CDI finished cloning. That by itself is expected to lead to an error when booting the VM. Two things follow from this:
1. KubeVirt should fail more gracefully on such an error.
2. The VMI should not be in a Running state (or, if it is, it should carry an error condition).

Using a DataVolume would have avoided the problem of launching the VMI before the cloning completed. All in all, this is something to fix, but it is not blocking 1.2.
Does this work for you, Nelly?
I wouldn't block either, but should we highlight it in the documentation somehow?
+1 to document this in the 1.2 release notes. Fabian, can you work with Pan on that?
For 1.2 I can put a warning in the Known Issues section of the KBase article, stating something like this: "After creating a PVC with CDI annotation, if you attach the PVC to a VM object and start the VM before allowing CDI to complete, the VM may erroneously be listed as `Running` with no errors shown in the web console. This issue is being tracked in bug #1640505." Anything I should add or change in the above description?
Pan, no need to change anything. Sounds good.
Vatsal, can you reproduce this error? 'internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required' While I can reproduce that the VM doesn't boot successfully (I can't access it via console), I don't see any error in the virt-handler and virt-launcher logs; the VMI syncs successfully. Without an error, it's impossible to provide a better VMI status :/
(In reply to Fabian Deutsch from comment #7)
> Pan, no need to change anything. Sounds good.

Thanks! I made the addition: https://access.redhat.com/articles/3500741#Reference

IIUC this is only relevant to 1.2. Let me know if it needs to be added to 1.3.
Pan, we also need it for 1.3.
Thanks for letting me know. I added it to the 1.3 release notes: https://github.com/openshift/openshift-docs/pull/12518/commits/f62b95f2e091551cfec5a995af623b30ac2e04db Let me know if anything else is needed. Thanks!
Thanks!
I checked the 1.3 release notes at https://cnv_setup--ocpdocs.netlify.com/openshift-enterprise/latest/cnv_release_notes/cnv_release_notes.html and they already include this fix, so I am moving the bug to verified.
Restating: this is not a bug. When KubeVirt consumes a PVC, we expect the PVC to be ready to use. DataVolumes are the mechanism that orchestrates the population of a PV.
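
For anyone attaching a plain PVC instead of a DataVolume, the readiness expectation above can be sketched as a pre-flight check on the PVC object. This is only an illustration, not KubeVirt or CDI code: the helper name `pvc_ready_for_vm` is hypothetical, and it assumes that CDI marks import targets with the `cdi.kubevirt.io/storage.import.endpoint` annotation and records the importer pod's phase in `cdi.kubevirt.io/storage.pod.phase`. Please verify both keys against the CDI version in use.

```python
# Assumed CDI annotation keys; verify them against the CDI version in use.
IMPORT_ENDPOINT = "cdi.kubevirt.io/storage.import.endpoint"
POD_PHASE = "cdi.kubevirt.io/storage.pod.phase"


def pvc_ready_for_vm(pvc: dict) -> bool:
    """Return True if the PVC looks safe to attach to a VM (hypothetical helper).

    `pvc` is the PVC object as a dict, e.g. parsed from `oc get pvc <name> -o json`.
    """
    annotations = (pvc.get("metadata") or {}).get("annotations") or {}
    bound = (pvc.get("status") or {}).get("phase") == "Bound"
    if IMPORT_ENDPOINT not in annotations:
        # Not a CDI import target: only require that the claim is bound.
        return bound
    # CDI import target: also require the importer pod to have finished.
    return bound and annotations.get(POD_PHASE) == "Succeeded"
```

In the flow from the description, the importer pod had not succeeded when the VM was started, so a check like this would return False and the VM start could be postponed.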