Bug 2040766
| Summary: | A crashed Windows VM cannot be restarted with virtctl or the UI | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | pmoses |
| Component: | Virtualization | Assignee: | Prita Narayan <prnaraya> |
| Status: | CLOSED ERRATA | QA Contact: | zhe peng <zpeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.8.8 | CC: | acardace, cnv-qe-bugs, ctomasko, fdeutsch, gveitmic, kbidarka, sgott, ycui |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | All | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | hco-bundle-registry-container-v4.11.0-491 | Doc Type: | Known Issue |
| Doc Text: | KubeVirt prevents a VM stop request from being processed multiple times. As a consequence, if a VM hangs during shutdown, it is not possible to issue a new request for immediate shutdown, for example, by using the `--force --grace-period 0` flags. A VM stuck in the terminating state cannot be easily stopped from the UI. However, it is possible to directly delete the virt-launcher pod. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-14 19:28:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (pmoses, 2022-01-14 16:29:54 UTC)

Created attachment 1851600 [details]: launcher pod log

Created attachment 1851601 [details]: UI details

Comment: There exist flags for virtctl (`--grace-period 0 --force`) that should halt the machine. Did you try that?
Yes. It seems the `--force` and `--grace-period` flags are only valid with `restart`. Either way, the results are the same:

```
[pmo@pmo-rhel ~]$ virtctl version
Client Version: version.Info{GitVersion:"v0.30.7", GitCommit:"af8ac92fbb1fc4c1c4fda6a2d6ddb04eaded797e", GitTreeState:"clean", BuildDate:"2021-06-07T10:07:04Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
[pmo@pmo-rhel ~]$ virtctl restart win10 --force --grace-period=0
Error restarting VirtualMachine, Operation cannot be fulfilled on virtualmachine.kubevirt.io "win10": Halted does not support manual restart requests
[pmo@pmo-rhel ~]$ virtctl stop win10 --grace-period=0 --force
unknown flag: --grace-period
[pmo@pmo-rhel ~]$ virtctl stop win10
Error stopping VirtualMachine Operation cannot be fulfilled on virtualmachine.kubevirt.io "win10": Halted does not support manual stop requests
```
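As discussed later in this bug, the usual workaround for a VM stuck in this state is to delete the virt-launcher pod directly. A minimal sketch, assuming a VM named `win10`; the generated pod name shown here is illustrative, and the `kubevirt.io/domain` label is the conventional way to find the launcher pod for a given VM:

```shell
# Find the virt-launcher pod backing the VM. Pod names are generated,
# so select by the kubevirt.io/domain label instead of guessing the name.
oc get pods -l kubevirt.io/domain=win10

# Force-delete the launcher pod. This usually tears down the guest,
# though as reported later in this bug, the VMI object can be left behind.
oc delete pod virt-launcher-win10-abcde --grace-period=0 --force
```

Note that this bypasses KubeVirt's normal shutdown flow, so it should be treated as a last resort rather than a supported stop path.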
Raising the severity of this because it's hard to avoid once it has been triggered. It can be done, but that requires deleting the pod. The real bug here is that KubeVirt should honor a second halt request if the user issues a newer, shorter timeout.

(In reply to sgott from comment #5)
> The real bug here is that KubeVirt should honor a second halt request if the user issues a newer shorter timeout.

One interesting thing: if the VM is stuck on boot (i.e. paused in SeaBIOS), the second halt request returns the same error in the CLI, but the VM is actually shut down immediately.

This is on 4.9.21 with 4.9.2, Windows VM. Unfortunately, deleting the virt-launcher pod does not work here: the pod is gone, but the VMI is still there.

```
# oc get vmi
NAME                    AGE   PHASE     IP            NODENAME                          READY
win2k16-happy-pelican   11m   Running   10.129.2.37   worker-1.lab-cluster.toca.local   False
# oc get pods | grep virt-launcher
#
```

That VMI stays there and is not cleaned up. Force-deleting it does not work either; it hangs forever without doing anything.

```
# oc delete vmi win2k16-happy-pelican --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
virtualmachineinstance.kubevirt.io "win2k16-happy-pelican" force deleted
^C
```

The only thing I can find that really works and makes the cleanup happen is to finish the job that was initially started: kill the qemu process on the node.

Added release note > known issue:

You cannot attempt to stop a VM multiple times because KubeVirt prevents multiple stop attempts. If a VM crashes during shutdown, then you cannot issue a new stop attempt, and you cannot easily remove the VM from the UI. (BZ#2040766)

https://github.com/openshift/openshift-docs/pull/42530
https://deploy-preview-42530--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

Future link: after OpenShift Virtualization 4.10 releases, you can find the release notes at https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html or on the portal at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10

Verified with build:
```
Server Version: 4.11.0-fc.3

$ virtctl version
Client Version: version.Info{GitVersion:"v0.53.2-16-gd3854bb91", GitCommit:"d3854bb91a447946d3ef626f243e001c4766d5a4", GitTreeState:"clean", BuildDate:"2022-06-19T10:27:57Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v0.53.2-37-gd8a6ac7e7", GitCommit:"d8a6ac7e78042ed77d99601fce197cae58d16f5a", GitTreeState:"clean", BuildDate:"2022-06-26T10:19:51Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
```
Steps:
1. Create a Windows VM.
2. Start the VM; within the VM, run `TASKKILL /IM svchost.exe /F` to trigger a Windows BSoD.
3. Use virtctl to stop or restart the VM.
stop-1:

```
$ virtctl stop vm-win10 --grace-period=0 --force
VM vm-win10 was scheduled to stop
$ oc get vm
NAME       AGE   STATUS    READY
vm-win10   31m   Stopped   False
```

stop-2:

```
$ virtctl stop vm-win10
VM vm-win10 was scheduled to stop
$ oc get vm
NAME       AGE   STATUS    READY
vm-win10   33m   Stopped   False
```

restart:

```
$ virtctl restart vm-win10 --force --grace-period=0
VM vm-win10 was scheduled to restart
$ oc get vm
NAME       AGE   STATUS    READY
vm-win10   27m   Running   True
```
Also tested a VM with the RunStrategy setting: "Manual" and "Halted" both worked as expected.

Moving to verified.
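The RunStrategy cases mentioned above are set via `spec.runStrategy` on the VirtualMachine object. A minimal fragment as a sketch, assuming a VM named `vm-win10` (the template body is elided); note that `runStrategy` and the legacy `spec.running` field are mutually exclusive:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-win10
spec:
  # Valid values include Always, RerunOnFailure, Manual, and Halted.
  # Manual: the VM starts and stops only on explicit user requests.
  # Halted: the VM is kept off; per the errors earlier in this bug,
  # it rejects manual stop/restart requests.
  runStrategy: Manual
  template:
    # VM template (domain, devices, volumes) omitted for brevity
```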
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6526