Description of problem:
If a Windows VM crashes or becomes unresponsive before the host agent responds, there is no apparent way to stop the VM. virtctl responds with "Halted does not support manual restart requests".

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Blue screen a Windows VM
2. Attempt to stop the VM
3. The VM stays up (it can still be viewed in the console); neither the UI nor virtctl will properly halt the machine.

Actual results:
There is no manageable way for end-users to restart Windows VMs that have crashed, because the host agent never reports back to the platform.

Expected results:
A manual/force power-off of the VM without deleting it.

Additional info:
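One concrete way to perform step 1 (the same method later used during verification, per a comment below) is to kill a critical system process from an elevated command prompt inside the guest; Windows blue screens immediately:

C:\> TASKKILL /IM svchost.exe /F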
There exist flags for virtctl (--grace-period 0 --force) that should halt the machine. Did you try those?
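For reference, the suggested invocation would look like the following (VM name win10 taken from the reply below; as that reply shows, this client version only accepts these flags on restart, not on stop):

$ virtctl restart win10 --force --grace-period 0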
Created attachment 1851600 [details] launcher pod log
Created attachment 1851601 [details] UI details
Yes. It seems the --force and --grace-period flags are only valid with restart. Either way, the results are the same:

[pmo@pmo-rhel ~]$ virtctl version
Client Version: version.Info{GitVersion:"v0.30.7", GitCommit:"af8ac92fbb1fc4c1c4fda6a2d6ddb04eaded797e", GitTreeState:"clean", BuildDate:"2021-06-07T10:07:04Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

[pmo@pmo-rhel ~]$ virtctl restart win10 --force --grace-period=0
Error restarting VirtualMachine, Operation cannot be fulfilled on virtualmachine.kubevirt.io "win10": Halted does not support manual restart requests

[pmo@pmo-rhel ~]$ virtctl stop win10 --grace-period=0 --force
unknown flag: --grace-period

[pmo@pmo-rhel ~]$ virtctl stop win10
Error stopping VirtualMachine Operation cannot be fulfilled on virtualmachine.kubevirt.io "win10": Halted does not support manual stop requests
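For context, the "Halted" in those error messages refers to the VM's run strategy, which the stop/restart request handler checks. A quick way to confirm what the controller sees (field path per the kubevirt.io/v1 VirtualMachine spec; shown here as a sketch, not from the original session):

$ oc get vm win10 -o jsonpath='{.spec.runStrategy}'   # expected to print Halted in this state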
Raising the severity of this because it is hard to recover from once it has been triggered. Recovery is possible, but it requires deleting the pod.

The real bug here is that KubeVirt should honor a second halt request if the user issues a newer, shorter timeout.
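A sketch of the pod-deletion workaround mentioned above (the label selector is the one KubeVirt applies to launcher pods; the pod name suffix is illustrative, since it is generated per VMI):

$ oc get pods -l kubevirt.io=virt-launcher   # find the launcher pod backing the stuck VMI
$ oc delete pod virt-launcher-win10-xxxxx    # deleting it tears down the domain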
(In reply to sgott from comment #5)
> The real bug here is that KubeVirt should honor a second halt request if the
> user issues a newer shorter timeout.

One interesting thing: if the VM is stuck on boot (i.e. paused in SeaBIOS), the second halt request returns the same error in the CLI, but the VM is actually shut down immediately.

This is on 4.9.21 with 4.9.2, with a Windows VM. Unfortunately, deleting the virt-launcher pod does not work: the pod is gone, but the VMI is still there.

# oc get vmi
NAME                    AGE   PHASE     IP            NODENAME                          READY
win2k16-happy-pelican   11m   Running   10.129.2.37   worker-1.lab-cluster.toca.local   False

# oc get pods | grep virt-launcher
#

That VMI stays there, never cleaning up. Force deleting it does not work either; it hangs forever without doing anything.

# oc delete vmi win2k16-happy-pelican --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
virtualmachineinstance.kubevirt.io "win2k16-happy-pelican" force deleted
^C

The only thing I can find that really works and makes the cleanup happen is to finish the job that was initially started: kill the qemu process on the node.
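A sketch of that last-resort qemu kill, assuming node access via oc debug (node name taken from the transcript above; the pgrep pattern and <pid> placeholder are assumptions, not from the original session):

# oc debug node/worker-1.lab-cluster.toca.local
# chroot /host
# pgrep -af qemu-kvm   # locate the qemu process backing the stuck VMI
# kill -9 <pid>        # <pid> is the process id found above; this lets the VMI finalize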
Added Release note > known issue:

> You cannot attempt to stop a VM multiple times because KubeVirt prevents multiple
> stop attempts. If a VM crashes during shutdown, then you cannot issue a new stop
> attempt and you cannot easily remove the VM from the UI. (BZ#2040766)

https://github.com/openshift/openshift-docs/pull/42530
https://deploy-preview-42530--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

Future link: after OpenShift Virtualization 4.10 releases, you can find the release notes here:
https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html
or on the portal:
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10
Verified with build:

Server Version: 4.11.0-fc.3

$ virtctl version
Client Version: version.Info{GitVersion:"v0.53.2-16-gd3854bb91", GitCommit:"d3854bb91a447946d3ef626f243e001c4766d5a4", GitTreeState:"clean", BuildDate:"2022-06-19T10:27:57Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v0.53.2-37-gd8a6ac7e7", GitCommit:"d8a6ac7e78042ed77d99601fce197cae58d16f5a", GitTreeState:"clean", BuildDate:"2022-06-26T10:19:51Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}

Steps:
1. Create a Windows VM.
2. Start the VM; within the VM, run cmd "TASKKILL /IM svchost.exe /F" to trigger a Windows BSoD.
3. Use virtctl to stop or restart the VM.

stop-1:
$ virtctl stop vm-win10 --grace-period=0 --force
VM vm-win10 was scheduled to stop
$ oc get vm
NAME       AGE   STATUS    READY
vm-win10   31m   Stopped   False

stop-2:
$ virtctl stop vm-win10
VM vm-win10 was scheduled to stop
$ oc get vm
NAME       AGE   STATUS    READY
vm-win10   33m   Stopped   False

restart:
$ virtctl restart vm-win10 --force --grace-period=0
VM vm-win10 was scheduled to restart
$ oc get vm
NAME       AGE   STATUS    READY
vm-win10   27m   Running   True

Also tested the VM with the RunStrategy settings "Manual" and "Halted"; both worked as expected.

Moving to VERIFIED.
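A sketch of how those RunStrategy settings can be applied for the last test (illustrative commands, not from the original run; spec.running and spec.runStrategy are mutually exclusive in the kubevirt.io/v1 API, so running is cleared in the same JSON merge patch):

$ oc patch vm vm-win10 --type merge -p '{"spec":{"running":null,"runStrategy":"Manual"}}'   # Manual: only explicit start/stop requests control the VM
$ oc patch vm vm-win10 --type merge -p '{"spec":{"running":null,"runStrategy":"Halted"}}'   # Halted: the controller keeps the VM off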
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6526