Description of problem:
A customer tried to stop hundreds of VMs which were in the Scheduling and Pending states using virtctl. This did not work, because virtctl said that they are not running. This is true, but even if they are not running, virtctl should be capable of setting the state to stopped.

Version-Release number of selected component (if applicable):
2.6.3

How reproducible:
Always

Steps to Reproduce:
1. Start a VM
2. Wait for it to be in the Pending or Scheduling state
3. Stop the VM with virtctl stop …

Actual results:
virtctl stop … fails with "VM is not running"

Expected results:
virtctl stop … succeeds

Additional info:
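A minimal reproduction sketch, assuming a VM named testvm that is still being scheduled (listing abbreviated and illustrative):

$ virtctl start testvm
$ oc get vmi
NAME     AGE   PHASE        IP    NODENAME
testvm   5s    Scheduling
$ virtctl stop testvm
# fails with "VM is not running"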
If I recall correctly, the design choice driving this behavior is that deleting a VMI that is Pending or Scheduling has potential race conditions. If a pod is created at the same time as the VMI that owns it is deleted, we can end up with an orphaned pod. That said, this behavior should be revisited to see if it can be refined.
Stu, we fixed this issue in: https://bugzilla.redhat.com/show_bug.cgi?id=1893790
Stu or @lpivarc, can you confirm that bug #1893790 - which was about halting a VM in the Pending state - will allow a user to halt a VM in whatever state it is in (except when it is already declared to be halted)? If so, we should be good to close this as a dupe.
I don't believe these issues are the same. https://bugzilla.redhat.com/show_bug.cgi?id=1893790 covers a specific case of the Pending state caused by DataVolumes, while the issue in this BZ is more general. That said, I believe this BZ might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1900631, where we introduced the "--force" flag to virtctl for stopping VMs. Ashley, does either of the PRs that fixed https://bugzilla.redhat.com/show_bug.cgi?id=1900631 allow us to halt a pending VMI?
I agree with Stu. The error comes from virt-api, where we accept or deny the request to stop. Neither bug 1900631 nor the current PR addressing it solves this issue.
No, neither of my PRs changes any of the logic around when stopping a VMI is allowed. They use the same logic as a plain stop, and if `--force --grace-period=0` is specified they update the VMI's `terminationGracePeriodSeconds` to 0.
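For example, assuming a VM named testvm (name illustrative):

$ virtctl stop testvm --force --grace-period 0
VM testvm was scheduled to stop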
Thanks. Then, IIUC, it is currently not allowed to set a VM to not running if it is in a Failed, Scheduling, or Pending state. In the future this should be permitted.
I am unable to reproduce this bug. The only way the response "VM is not running" would be returned for the `Scheduling` and `Pending` statuses is if there is an error with the API returning VMI objects [1]. That response can be updated to something clearer.

The code that checks the status of the VMI and returns "VM is not running" [2] only does so if the status is `Failed`; for both `Scheduling` and `Pending` an attempt will be made to stop the VMI. This was also the case when I tested: I was able to successfully stop a VMI in `Scheduling` and in `Pending` (see the sketch after the links below).

Do we want to be able to stop a `Failed` VMI? It is mentioned in the title but not in the description. If so, in [2] I would need to change the `vmi.IsFinal()` check to only check for a Succeeded VMI. I can also update the response returned when there is an issue getting the VMI from the API.

[1] https://github.com/kubevirt/kubevirt/blob/v0.36.3/pkg/virt-api/rest/subresource.go#L656-L665
[2] https://github.com/kubevirt/kubevirt/blob/v0.36.3/pkg/virt-api/rest/subresource.go#L666
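For illustration, a sketch of the successful test, assuming a VM named testvm still in `Scheduling` (listing abbreviated and illustrative):

$ oc get vmi
NAME     AGE   PHASE        IP    NODENAME
testvm   5s    Scheduling
$ virtctl stop testvm
VM testvm was scheduled to stop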
Good to know that Scheduling and Pending VMs can already be stopped.

WRT Failed - there is value in keeping a VMI if it has failed, because it might contain information about why it failed. At the same time it would be good to still allow virtctl to shut it down. Could "virtctl stop failed_vm" lead to a message telling the user that the VMI is in a failed state and that "--force" must be used in order to stop it (sketched at the end of this comment)?

> I can also update the response returned when there is an issue getting the VMI from the API.

This sounds reasonable as well.
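A sketch of the interaction I have in mind for the Failed case; the error wording is only a suggestion, not current virtctl output:

$ virtctl stop failed_vm
# proposed: error stating the VMI is in a Failed state and that --force is required
$ virtctl stop failed_vm --force --grace-period 0
VM failed_vm was scheduled to stop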
There could be many reasons why a VM could have got into the Failed state. Though in this case I was able to successfully stop the VM:

[kbidarka@localhost fedora]$ oc get vmi
NAME          AGE    PHASE    IP   NODENAME
vm12-fedora   3m6s   Failed
[kbidarka@localhost fedora]$ virtctl stop vm12-fedora --force --grace-period 0
VM vm12-fedora was scheduled to stop
[kbidarka@localhost fedora]$ oc get vmi
No resources found in default namespace.

Was also able to verify that Pending and Scheduling VMIs can also be stopped.

VERIFIED with virt-operator-container-v4.8.2-2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.2 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3598