Description of problem:
A customer tried to stop hundreds of VMs which were in the Scheduling and Pending states using virtctl. This did not work, because virtctl said that they are not running. This is true, but even if they are not running, virtctl should be capable of setting the state to stopped.

Version-Release number of selected component (if applicable):
2.6.3

How reproducible:
Always

Steps to Reproduce:
1. Start a VM
2. Wait for it to be in the Pending or Scheduling state
3. Stop the VM with virtctl stop …

Actual results:
virtctl stop … fails with "VM is not running"

Expected results:
virtctl stop … succeeds

Additional info:
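A minimal reproduction sketch, assuming a VM named testvm that is still being scheduled (listing abbreviated and illustrative):

$ virtctl start testvm
$ oc get vmi
NAME     AGE   PHASE        IP    NODENAME
testvm   5s    Scheduling
$ virtctl stop testvm
# fails with "VM is not running"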
If I recall correctly, the design choice driving this behavior is that deleting a VMI that is Pending or Scheduling has potential race conditions. If a pod is created at the same time as the VMI that owns it is deleted, we can end up with an orphaned pod. That said, this behavior should be revisited to see if it can be refined.
Stu, we fixed this issue in: https://bugzilla.redhat.com/show_bug.cgi?id=1893790
Stu or @lpivarc, can you confirm that bug #1893790 - which was about halting a VM in the Pending state - will allow a user to halt a VM in whatever state it is in (except when it is already declared to be halted)? If so, we should be good to close this as a dupe.
I don't believe these issues are the same. https://bugzilla.redhat.com/show_bug.cgi?id=1893790 covers a specific case of the Pending state caused by DataVolumes, while the issue in this BZ is more general. That said, I believe this BZ might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1900631, where we introduced the "--force" flag to virtctl for stopping VMs. Ashley, does either of the PRs that fixed https://bugzilla.redhat.com/show_bug.cgi?id=1900631 allow us to halt a pending VMI?
I agree with Stu. The error comes from virt-api, where we accept or deny the request to stop. Neither bug 1900631 nor the current PR addressing it solves this issue.
No, neither of my PRs changes any of the logic around when stopping a VMI is allowed. They use the same logic as a plain stop, and if `--force --grace-period=0` is specified they update the VMI's `terminationGracePeriodSeconds` to 0.
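For example, assuming a VM named testvm (name illustrative):

$ virtctl stop testvm --force --grace-period 0
VM testvm was scheduled to stop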
Thanks. Then, IIUC, it is currently not allowed to set a VM to not running if it is in a Failed, Scheduling, or Pending state. In the future this should be permitted.
I am unable to reproduce this bug. The only way the response "VM is not running" would be returned for the `Scheduling` and `Pending` statuses is if there is an error with the API returning VMI objects [1]. That response can be updated to something clearer.

The code that checks the status of the VMI and returns "VM is not running" [2] only does so if the status is `Failed`; for both `Scheduling` and `Pending` an attempt will be made to stop the VMI. This was also the case when I tested: I was able to successfully stop a VMI in `Scheduling` and in `Pending` (see the sketch after the links below).

Do we want to be able to stop a `Failed` VMI? It is mentioned in the title but not in the description. If so, in [2] I would need to change the `vmi.IsFinal()` check to only check for a Succeeded VMI. I can also update the response returned when there is an issue getting the VMI from the API.

[1] https://github.com/kubevirt/kubevirt/blob/v0.36.3/pkg/virt-api/rest/subresource.go#L656-L665
[2] https://github.com/kubevirt/kubevirt/blob/v0.36.3/pkg/virt-api/rest/subresource.go#L666
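For illustration, a sketch of the successful test, assuming a VM named testvm still in `Scheduling` (listing abbreviated and illustrative):

$ oc get vmi
NAME     AGE   PHASE        IP    NODENAME
testvm   5s    Scheduling
$ virtctl stop testvm
VM testvm was scheduled to stop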
Good to know that Scheduling and Pending VMs can already be stopped.

WRT Failed - there is value in keeping a VMI if it has failed, because it might contain information about why it failed. At the same time it would be good to still allow virtctl to shut it down. Could "virtctl stop failed_vm" lead to a message telling the user that the VMI is in a failed state and that "--force" must be used in order to stop it (sketched at the end of this comment)?

> I can also update the response returned when there is an issue getting the VMI from the API.

This sounds reasonable as well.
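A sketch of the interaction I have in mind for the Failed case; the error wording is only a suggestion, not current virtctl output:

$ virtctl stop failed_vm
# proposed: error stating the VMI is in a Failed state and that --force is required
$ virtctl stop failed_vm --force --grace-period 0
VM failed_vm was scheduled to stop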
There could be many reasons why a VM could have got into the Failed state. Though in this case I was able to successfully stop the VM:

[kbidarka@localhost fedora]$ oc get vmi
NAME          AGE    PHASE    IP   NODENAME
vm12-fedora   3m6s   Failed
[kbidarka@localhost fedora]$ virtctl stop vm12-fedora --force --grace-period 0
VM vm12-fedora was scheduled to stop
[kbidarka@localhost fedora]$ oc get vmi
No resources found in default namespace.

Was also able to verify that Pending and Scheduling VMIs can also be stopped.

VERIFIED with virt-operator-container-v4.8.2-2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.2 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3598