Bug 1640505 - VM shows running even when CDI had failed
Summary: VM shows running even when CDI had failed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 1.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 1.3
Assignee: Marc Sluiter
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-18 09:05 UTC by Vatsal Parekh
Modified: 2019-01-08 14:27 UTC
CC List: 9 users

Fixed In Version: 1.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-08 14:27:08 UTC
Target Upstream Version:
Embargoed:



Description Vatsal Parekh 2018-10-18 09:05:23 UTC
Description of problem:
Immediately after creating a PVC with a CDI annotation, and before CDI had completed the import, I attached that PVC to a VM object and started the VM (testing a negative flow).
The CDI importer pod failed with `unable to write to file`, but, more surprisingly, the VMI shows the `Running` state.
The VMI events do show the following error:
```
  Type     Reason              Age                From                                                      Message
  ----     ------              ----               ----                                                      -------
  Normal   SuccessfulCreate    25m                virtualmachine-controller                                 Created virtual machine pod virt-launcher-fedora-vmmwj
  Normal   SuccessfulHandOver  23m                virtualmachine-controller                                 Pod owner ship transferred to the node virt-launcher-fedora-vmmwj
  Normal   Created             23m (x2 over 23m)  virt-handler, cnv-executor-vatsal-test-node1.example.com  VirtualMachineInstance defined.
  Normal   Started             23m                virt-handler, cnv-executor-vatsal-test-node1.example.com  VirtualMachineInstance started.
  Warning  SyncFailed          5m (x16 over 16m)  virt-handler, cnv-executor-vatsal-test-node1.example.com  server error. command Launcher.Sync failed: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')
  Warning  SyncFailed          1m (x3 over 16m)   virt-handler, cnv-executor-vatsal-test-node1.example.com  server error. command Launcher.Sync failed: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')
```

However, the web console shows no error: the VMI is listed as running, everything looks fine, and it even shows metrics for the VMI.
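The gap described above is that a `Warning` event was recorded while the only visible status was `Running`. A minimal Python sketch of surfacing such warnings next to the phase, assuming the events are available as the plain-text table printed above (columns separated by whitespace, with `Type` and `Reason` never containing spaces). `warning_reasons` and `console_status` are hypothetical helpers for illustration; this is not how the web console or KubeVirt actually derives VMI status.

```python
import re
from typing import List

# Assumption: each event row starts with the Type column (Normal/Warning),
# followed by a Reason column that contains no spaces.
ROW = re.compile(r"^\s*(Normal|Warning)\s+(\S+)\s+")

# Abbreviated copy of the VMI events from this report.
SAMPLE_EVENTS = """\
  Type     Reason              Age                From                       Message
  ----     ------              ----               ----                       -------
  Normal   Started             23m                virt-handler, node1        VirtualMachineInstance started.
  Warning  SyncFailed          5m (x16 over 16m)  virt-handler, node1        server error. command Launcher.Sync failed
"""


def warning_reasons(events_text: str) -> List[str]:
    """Return the Reason column of every Warning row in a describe-style table."""
    reasons = []
    for line in events_text.splitlines():
        m = ROW.match(line)  # header and "----" rows do not match
        if m and m.group(1) == "Warning":
            reasons.append(m.group(2))
    return reasons


def console_status(phase: str, events_text: str) -> str:
    """Hypothetical status string: the phase, plus a hint when warnings exist."""
    warnings = warning_reasons(events_text)
    if warnings:
        return f"{phase} (warnings: {', '.join(sorted(set(warnings)))})"
    return phase


if __name__ == "__main__":
    print(console_status("Running", SAMPLE_EVENTS))  # Running (warnings: SyncFailed)
```

A real client would read events through the Kubernetes API rather than parsing text; the sketch only shows that the warning information needed for an "error condition" was present.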

Version-Release number of selected component (if applicable):
openshift v3.11.16
CNV: 1.2 from Stage-CDN

How reproducible:


Steps to Reproduce:
1. Create a CDI importer PVC
2. Attach that PVC to a VM before the import completes
3. Start the VM

Actual results:
The VMI is shown in the `Running` state.

Expected results:

Additional info:

Comment 1 Federico Simoncelli 2018-10-18 10:08:42 UTC
Adam, remind me: is this one of those cases we discussed but couldn't handle in 1.2 because DataVolume support is partial?

Comment 2 Fabian Deutsch 2018-10-18 10:14:20 UTC
As noted, this is an unusual flow: the VM was started before CDI finished cloning. By itself, this is expected to lead to an error when booting the VM.

There are now a few things:

1. KubeVirt should fail more gracefully with such an error
2. The VMI should not be in a Running state (or perhaps it may, but with an error condition)

All in all, a DataVolume (DV) would have avoided the problem of launching the VMI before the cloning completed.

Overall, this is something to fix, but it is not blocking for 1.2.

Comment 3 Fabian Deutsch 2018-10-18 12:04:54 UTC
Does this work for you, Nelly?

Comment 4 Nelly Credi 2018-10-18 15:05:26 UTC
I wouldn't block either, but should we highlight it in the documentation somehow?

Comment 5 Federico Simoncelli 2018-10-18 15:07:52 UTC
+1 to document this in 1.2 release notes.
Fabian, can you work with Pan on that?

Comment 6 Pan Ousley 2018-10-22 19:32:25 UTC
For 1.2 I can put a warning in the Known Issues section of the KBase article, stating something like this:

"After creating a PVC with CDI annotation, if you attach the PVC to a VM object and start the VM before allowing CDI to complete, the VM may erroneously be listed as `Running` with no errors shown in the web console. This issue is being tracked in bug #1640505."


Anything I should add or change in the above description?

Comment 7 Fabian Deutsch 2018-11-05 09:10:28 UTC
Pan, no need to change anything. Sounds good.

Comment 8 Marc Sluiter 2018-11-05 12:44:26 UTC
Vatsal, can you reproduce this error?
'internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required'

While I can reproduce that the VM doesn't boot successfully (I can't access it via the console), I don't see any error in the virt-handler and virt-launcher logs; the VMI syncs successfully. Without an error, it's impossible to provide a better VMI status :/

Comment 9 Pan Ousley 2018-11-05 15:57:11 UTC
(In reply to Fabian Deutsch from comment #7)
> Pan, no need to change anything. Sounds good.

Thanks! I made the addition: https://access.redhat.com/articles/3500741#Reference

IIUC this is only relevant to 1.2. Let me know if it needs to be added to 1.3.

Comment 12 Fabian Deutsch 2018-11-06 13:55:42 UTC
Pan, we also need it for 1.3.

Comment 13 Pan Ousley 2018-11-06 21:42:35 UTC
Thanks for letting me know. I added it to the 1.3 release notes: https://github.com/openshift/openshift-docs/pull/12518/commits/f62b95f2e091551cfec5a995af623b30ac2e04db

Let me know if anything else is needed. Thanks!

Comment 14 Fabian Deutsch 2018-11-07 11:26:00 UTC
Thanks!

Comment 16 zhe peng 2018-11-12 07:06:51 UTC
I checked the 1.3 release notes:
https://cnv_setup--ocpdocs.netlify.com/openshift-enterprise/latest/cnv_release_notes/cnv_release_notes.html

They already include the fix, so I'm moving this to VERIFIED.

Comment 17 Fabian Deutsch 2018-11-13 14:46:24 UTC
Restating: this is not a bug.
When KubeVirt consumes a PVC, we expect it to be ready to use.

DataVolumes are the mechanism that orchestrates the population of a PV.
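The expectation stated here is an ordering rule: populate the PV first, then start the VM. DataVolumes handle that ordering; with a bare CDI-annotated PVC, as in the reproduction steps, the caller has to enforce it. A minimal Python sketch of such a gate, assuming the PVC is available as a plain dict and that CDI reports the importer pod phase in the `cdi.kubevirt.io/storage.pod.phase` annotation (an assumption to check against the CDI version in use). `pvc_ready_for_vm` and `start_vm_if_ready` are hypothetical helpers, not KubeVirt or CDI APIs.

```python
from typing import Callable, Mapping

# Assumption: CDI records the importer pod phase on the PVC under this key.
POD_PHASE_ANNOTATION = "cdi.kubevirt.io/storage.pod.phase"


def pvc_ready_for_vm(pvc: Mapping) -> bool:
    """True only when the (assumed) import-phase annotation reports success."""
    metadata = pvc.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    return annotations.get(POD_PHASE_ANNOTATION) == "Succeeded"


def start_vm_if_ready(pvc: Mapping, start: Callable[[], None]) -> bool:
    """Call `start()` only if the PVC is populated; return whether it was called."""
    if not pvc_ready_for_vm(pvc):
        return False  # import still running or failed: do not start the VM
    start()
    return True


if __name__ == "__main__":
    importing = {"metadata": {"annotations": {POD_PHASE_ANNOTATION: "Running"}}}
    done = {"metadata": {"annotations": {POD_PHASE_ANNOTATION: "Succeeded"}}}
    print(pvc_ready_for_vm(importing), pvc_ready_for_vm(done))  # False True
```

With a DataVolume-backed VM, this kind of manual check should not be needed, since the DataVolume orchestrates the population of the PV before the VM consumes it.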

