Bug 2079853
| Summary: | hard resetting the ocp worker hosting a vmi hangs the vmi : stuck in suspend | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | pkomarov |
| Component: | Virtualization | Assignee: | sgott |
| Status: | CLOSED NOTABUG | QA Contact: | Kedar Bidarkar <kbidarka> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.8.4 | CC: | acardace |
| Target Milestone: | --- | Flags: | kbidarka: needinfo? (pkomarov) |
| Target Release: | 4.14.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-07-13 13:55:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

]$ oc get vmi -o wide
NAME         AGE   PHASE     IP            NODENAME             READY   LIVE-MIGRATABLE   PAUSED
vm2-rhel85   14s   Running   11.xx.yy.zz   node-13.redhat.com   True    True

The paused status of a VM can be obtained with the command above: the PAUSED column shows True if the VMI is in the paused state.

@pkomarov, how long did you wait for the pod state to change from "Running" after you pulled the plug? We feel this could be an issue with OpenShift.

Deferring to 4.13 due to capacity.

@pkomarov, a few questions about this bug:
1) Could you please reply to the query in comment 4?
2) Did you try a hard shutdown using the ILO/management console link? What are the symptoms, and do we still see this issue when shutting the node down that way?

Moving this to CNV 4.14 due to the questions remaining in comment 7, and due to severity and capacity.

To get the expected result, you probably need to install using IPI and enable machine health checks, which provide the HA functionality.

Discussed this with the Virt devs and we decided to close the bug for the above reason.

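
The PAUSED check described in the comments above can also be scripted instead of read off the wide listing. The following is a minimal sketch, assuming the PAUSED column is backed by a `Paused` entry in the VMI's status conditions and that the entry is absent when the VMI is not paused. The function name `vmi_is_paused` and the `default` namespace fallback are hypothetical and not part of this report.

```shell
#!/bin/sh
# Hypothetical helper: report whether a VMI carries a Paused condition.
# Usage: vmi_is_paused <vmi-name> [namespace]
vmi_is_paused() {
    name=$1
    ns=${2:-default}   # assumption: fall back to the "default" namespace

    # Query only the status of the condition whose type is "Paused".
    # An empty result is treated as "condition not present".
    status=$(oc get vmi "$name" -n "$ns" \
        -o jsonpath='{.status.conditions[?(@.type=="Paused")].status}')

    if [ "$status" = "True" ]; then
        echo "paused"
    else
        echo "not paused"
    fi
}

# Example call (needs a logged-in oc session):
#   vmi_is_paused vm2-rhel85
```

This condition reflects what KubeVirt itself reports for the VMI, so it may not match the domain state that `virsh list` shows inside the virt-launcher pod.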
Description of problem:
A hard reset (echo b>/proc/sysrq-trigger) of an OpenShift node hosting the VMI causes the VMI to become suspended and unreachable.

How to reproduce:
#sshing to an ocp node hosting the vmi:
[ocp@titan88 ~]$ ssh core.111.10
(ocp node) echo 'b'>/proc/sysrq-trigger

The VMI controller-0 hangs:
[ocp@titan88 ~]$ virtctl console controller-0
Successfully connected to controller-0 console. The escape sequence is ^]
#no response...

[ocp@titan88 ~]$ oc get pods -o wide
virt-launcher-controller-0-pr4nz   1/1   Running   0   47m   10.129.0.92   ostest-master-1   <none>   <none>

[ocp@titan88 ~]$ oc exec -it virt-launcher-controller-0-pr4nz bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@controller-0 /]# virsh list
 Id   Name                     State
---------------------------------------
 1    openstack_controller-0   paused
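
The final check in the reproduction steps (looking for a `paused` domain in the `virsh list` output) could be automated with a small filter. This is a sketch only: the function name `paused_domains` is hypothetical, and it assumes the two-line header and the `Id Name State` column layout shown in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: read `virsh list` output on stdin and print the
# names of all domains whose state is "paused".
paused_domains() {
    # Skip the two header lines; the state is the last field and the
    # domain name is the second field.
    awk 'NR > 2 && $NF == "paused" { print $2 }'
}

# Example call, using the non-deprecated `--` form of oc exec:
#   oc exec virt-launcher-controller-0-pr4nz -- virsh list --all | paused_domains
```

Run against the output in this report, the filter would print `openstack_controller-0`.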