Bug 2079853 - hard resetting the ocp worker hosting a vmi hangs the vmi : stuck in suspend [NEEDINFO]
Summary: hard resetting the ocp worker hosting a vmi hangs the vmi : stuck in suspend
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.8.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.14.0
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-28 11:38 UTC by pkomarov
Modified: 2023-07-13 13:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-13 13:55:12 UTC
Target Upstream Version:
Embargoed:
kbidarka: needinfo? (pkomarov)



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-17879 0 None None None 2022-12-15 08:39:15 UTC

Description pkomarov 2022-04-28 11:38:36 UTC
Description of problem:
A hard reset (echo b > /proc/sysrq-trigger) of an OpenShift node hosting the VMI causes the VMI to become suspended and unreachable.

How to reproduce:

# SSH to an OCP node hosting the VMI:

[ocp@titan88 ~]$ ssh core.111.10  (ocp node)

echo 'b'>/proc/sysrq-trigger

 

The VMI controller-0 hangs:

[ocp@titan88 ~]$ virtctl console controller-0
Successfully connected to controller-0 console. The escape sequence is ^]
                     #no response...

[ocp@titan88 ~]$  oc get pods  -o wide

virt-launcher-controller-0-pr4nz                                  1/1     Running     0          47m    10.129.0.92      ostest-master-1   <none>           <none>

[ocp@titan88 ~]$ oc exec -it virt-launcher-controller-0-pr4nz bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@controller-0 /]# virsh list
 Id   Name                     State
---------------------------------------
 1    openstack_controller-0   paused

Comment 3 Kedar Bidarkar 2022-05-05 11:24:38 UTC
]$ oc get vmi -o wide 
NAME         AGE   PHASE     IP            NODENAME             READY   LIVE-MIGRATABLE   PAUSED
vm2-rhel85   14s   Running   11.xx.yy.zz   node-13.redhat.com   True    True

The VM's paused status can be obtained with the above command.
If the VMI is in the paused state, the PAUSED column shows True.
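
As a minimal sketch of the same check for scripting, the value behind the PAUSED column can be read from the VMI's status conditions. This assumes oc is logged in to the cluster, the VMI lives in the current project, and the paused state is exposed as a condition of type Paused; the helper name vmi_is_paused is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: succeed (exit 0) when the given VMI reports Paused=True.
# Assumes `oc` is logged in and the VMI is in the current project.

vmi_is_paused() {
  local vmi="$1"   # VMI name, e.g. controller-0
  local status
  # Read the status of the condition whose type is "Paused";
  # the result is empty when the condition is absent.
  status="$(oc get vmi "$vmi" -o jsonpath='{.status.conditions[?(@.type=="Paused")].status}')"
  [ "$status" = "True" ]
}

# Usage: vmi_is_paused controller-0 && echo "controller-0 is paused"
```

The function only inspects what the API reports; it does not look at the libvirt domain state shown by virsh list inside the virt-launcher pod.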

Comment 4 Kedar Bidarkar 2022-09-14 12:37:09 UTC
@pkomarov, how long did you wait for the pod state to change from "Running" after you pulled the plug?

And we feel this could be an issue with OpenShift.

Comment 5 Antonio Cardace 2022-09-28 12:07:02 UTC
Deferring to 4.13 due to capacity.

Comment 7 Kedar Bidarkar 2023-01-25 13:18:04 UTC
@pkomarov, a few questions about this bug:
1) Could you please reply to the query in comment 4?
2) Did you try a hard shutdown using the iLO/management console? What are the symptoms, and do we still see this issue when the node is shut down that way?

Comment 8 Kedar Bidarkar 2023-01-25 13:19:45 UTC
Moving this to CNV 4.14 due to the questions remaining in comment 7 and due to severity and capacity.

Comment 9 Kedar Bidarkar 2023-07-13 13:55:12 UTC
To get the expected result, you probably need to install using IPI and enable machine health checks, which provide the HA functionality.

Discussed this with Virt Devs and we decided to close the bug due to the above reason.
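
For reference, a machine health check of the kind mentioned above might look roughly like the following sketch. It assumes an IPI-installed cluster with the Machine API available. The resource name, the worker label selector, the timeouts, and the maxUnhealthy value are illustrative assumptions, not values taken from this bug, and the wrapper function apply_worker_mhc is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: create a MachineHealthCheck for worker machines.
# Assumes `oc` is logged in with cluster-admin rights on an IPI cluster.

apply_worker_mhc() {
  # Feed the manifest to `oc apply` on stdin.
  oc apply -f - <<'EOF'
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: worker-healthcheck            # hypothetical name
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s                     # illustrative timeout
  - type: Ready
    status: "Unknown"
    timeout: 300s                     # illustrative timeout
  maxUnhealthy: "40%"                 # illustrative threshold
EOF
}

# Usage: apply_worker_mhc
```

With such a check in place, a worker whose Ready condition stays False or Unknown past the timeout is remediated by the Machine API. Whether that produces the expected result for the scenario in this bug was not verified here.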

