Bug 2079853

Summary: Hard resetting the OCP worker hosting a VMI hangs the VMI: stuck in suspend
Product: Container Native Virtualization (CNV) Reporter: pkomarov
Component: Virtualization Assignee: sgott
Status: CLOSED NOTABUG QA Contact: Kedar Bidarkar <kbidarka>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.8.4 CC: acardace
Target Milestone: --- Flags: kbidarka: needinfo? (pkomarov)
Target Release: 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-07-13 13:55:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description pkomarov 2022-04-28 11:38:36 UTC
Description of problem:
A hard reset (echo b>/proc/sysrq-trigger) of an OpenShift node hosting the VMI causes the VMI to become suspended and unreachable.

How to reproduce:

# SSH to an OCP node hosting the VMI:

[ocp@titan88 ~]$ ssh core.111.10  (ocp node)

echo 'b'>/proc/sysrq-trigger

 

VMI controller-0 hangs:

[ocp@titan88 ~]$ virtctl console controller-0
Successfully connected to controller-0 console. The escape sequence is ^]
                     #no response...

[ocp@titan88 ~]$  oc get pods  -o wide

virt-launcher-controller-0-pr4nz                                  1/1     Running     0          47m    10.129.0.92      ostest-master-1   <none>           <none>

[ocp@titan88 ~]$ oc exec -it virt-launcher-controller-0-pr4nz bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@controller-0 /]# virsh list
 Id   Name                     State
---------------------------------------
 1    openstack_controller-0   paused
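The paused state shown by virsh above can also be detected from a script. The following is a minimal sketch, assuming the three-column layout of the `virsh list` output captured in this report; the function name `paused_domains` and the `SAMPLE` text are illustrative and not part of any existing tool.

```python
from typing import List


def paused_domains(virsh_output: str) -> List[str]:
    """Return the names of domains whose State column reads 'paused'."""
    paused = []
    for line in virsh_output.splitlines():
        fields = line.split()
        # Data rows look like "<id> <name> <state...>"; the header row and
        # the dashed separator do not start with a numeric id or "-".
        if len(fields) >= 3 and (fields[0].isdigit() or fields[0] == "-"):
            state = " ".join(fields[2:])
            if state == "paused":
                paused.append(fields[1])
    return paused


# Sample text copied from the virsh output in this report.
SAMPLE = """\
 Id   Name                     State
---------------------------------------
 1    openstack_controller-0   paused
"""

if __name__ == "__main__":
    print(paused_domains(SAMPLE))
```

In practice the input would come from running `virsh list` inside the virt-launcher pod, as done above.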

Comment 3 Kedar Bidarkar 2022-05-05 11:24:38 UTC
]$ oc get vmi -o wide 
NAME         AGE   PHASE     IP            NODENAME             READY   LIVE-MIGRATABLE   PAUSED
vm2-rhel85   14s   Running   11.xx.yy.zz   node-13.redhat.com   True    True

The VM paused status can be obtained using the above command.
It shows PAUSED: True if the VMI is in the paused state.
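The same check can be scripted against the JSON form of the VMI. This is a rough sketch, assuming the object was fetched with something like `oc get vmi <name> -o json` and that the pause is reported as a status condition of type `Paused`; the helper name `vmi_is_paused` and the sample object are made up for illustration.

```python
import json


def vmi_is_paused(vmi: dict) -> bool:
    """Return True if the VMI carries a 'Paused' condition with status 'True'."""
    # Assumption: the pause is surfaced under status.conditions.
    conditions = (vmi.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "Paused" and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical, heavily trimmed VMI object used only to exercise the helper.
SAMPLE_VMI = json.loads("""
{
  "kind": "VirtualMachineInstance",
  "metadata": {"name": "vm2-rhel85"},
  "status": {
    "phase": "Running",
    "conditions": [
      {"type": "Ready", "status": "True"},
      {"type": "Paused", "status": "True"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(vmi_is_paused(SAMPLE_VMI))
```

Note that the phase can still read Running while the condition is set, which matches the symptom in the description.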

Comment 4 Kedar Bidarkar 2022-09-14 12:37:09 UTC
@pkomarov, how long did you wait for the pod state to change from "Running" after you pulled the plug?

We also feel this could be an issue with OpenShift.

Comment 5 Antonio Cardace 2022-09-28 12:07:02 UTC
Deferring to 4.13 due to capacity.

Comment 7 Kedar Bidarkar 2023-01-25 13:18:04 UTC
@pkomarov, a few questions about this bug:
1) Could you please reply to the query in comment 4?
2) Did you try a hard shutdown using the ILO/Mgmt console link? What are the symptoms, and do we still see this issue when shutting the node down using the ILO/Mgmt link?

Comment 8 Kedar Bidarkar 2023-01-25 13:19:45 UTC
Moving this to CNV 4.14 due to the questions remaining in comment 7 and due to severity and capacity.

Comment 9 Kedar Bidarkar 2023-07-13 13:55:12 UTC
To get the expected result, you probably need to install using IPI and enable Machine Health Checks, which provide the HA functionality.

We discussed this with the Virt devs and decided to close the bug for the above reason.
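For reference, a Machine Health Check of the kind mentioned in comment 9 might look roughly like the manifest built below. This is only a sketch under assumptions: the name `worker-mhc`, the worker role label selector, the 300s timeout, and the 40% limit are illustrative values, not settings taken from this bug, and would need to be adapted to the cluster.

```python
import json


def machine_health_check(name="worker-mhc", timeout="300s", max_unhealthy="40%"):
    """Build an illustrative MachineHealthCheck manifest as a plain dict."""
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {"name": name, "namespace": "openshift-machine-api"},
        "spec": {
            # Assumed selector: remediate machines carrying the worker role label.
            "selector": {
                "matchLabels": {
                    "machine.openshift.io/cluster-api-machine-role": "worker"
                }
            },
            # A node whose Ready condition is False or Unknown for longer
            # than the timeout is treated as unhealthy.
            "unhealthyConditions": [
                {"type": "Ready", "status": "False", "timeout": timeout},
                {"type": "Ready", "status": "Unknown", "timeout": timeout},
            ],
            # Stop remediating if too many machines are unhealthy at once.
            "maxUnhealthy": max_unhealthy,
        },
    }


if __name__ == "__main__":
    # JSON output can be saved to a file and applied with `oc apply -f`.
    print(json.dumps(machine_health_check(), indent=2))
```

With such a check in place, a hard-reset worker that stays NotReady past the timeout would be remediated by the Machine API, which is the HA behavior the closing comment refers to.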