Bug 1926746 - VM connect to SSH and consoles is not responsive after VM is up for 25 days
Summary: VM connect to SSH and consoles is not responsive after VM is up for 25 days
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Antonio Cardace
QA Contact: Ying Cui
URL:
Whiteboard:
Depends On: 1901335
Blocks:
Reported: 2021-02-09 12:01 UTC by Israel Pinto
Modified: 2023-09-15 01:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 14:23:09 UTC
Target Upstream Version:
Embargoed:


Attachments
virt-launcher and virt-handler logs (141.43 KB, application/zip)
2021-02-09 12:09 UTC, Israel Pinto


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 4840 | 0 | None | closed | Generate K8s events on IO errors | 2021-04-20 14:31:33 UTC
Red Hat Product Errata | RHSA-2021:2920 | 0 | None | None | None | 2021-07-27 14:25:37 UTC

Description Israel Pinto 2021-02-09 12:01:25 UTC
Description of problem:
A VM had been running for 25 days in a longevity test. When checking connectivity to the VM via SSH and the serial and VNC consoles,
the VM was unresponsive.

A soft reboot with virsh reset did not help; the VM is stuck:
"cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)"
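
The error above names the job that was holding the domain's state change lock when virsh reset was attempted. As a minimal sketch for triaging collected logs, the snippet below pulls the holder out of such a message. The function name lock_holder and the regular expression are illustrative assumptions based only on the message quoted in this report, not a libvirt API.

```python
import re

# Pattern modelled on the single message quoted in this bug report (assumption:
# other occurrences follow the same "held by <kind>=<owner>" shape).
LOCK_RE = re.compile(
    r"cannot acquire state change lock \(held by (?P<kind>\w+)=(?P<owner>\w+)\)"
)


def lock_holder(message):
    """Return (kind, owner) of the lock holder, or None if the line does not match."""
    match = LOCK_RE.search(message)
    if match is None:
        return None
    return match.group("kind"), match.group("owner")


if __name__ == "__main__":
    line = ("cannot acquire state change lock "
            "(held by monitor=remoteDispatchConnectGetAllDomainStats)")
    print(lock_holder(line))
```

Here the holder is a monitor job started by a domain statistics call, which suggests the stats request itself was blocked rather than the reset.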


Version-Release number of selected component (if applicable):
CNV 2.6

How reproducible:


Steps to Reproduce:
1. Create a VM with a disk on OCS and run it for more than 25 days
2. Check connectivity to the VM

Actual results:
Can't connect via SSH or console

Adding logs


Additional info:

Comment 1 Israel Pinto 2021-02-09 12:09:48 UTC
Created attachment 1755903 [details]
virt-launcher and virt-handler logs

Comment 2 Roman Mohr 2021-02-09 12:10:06 UTC
More notes on this one:

Before the reboot we could kind of connect to the VM. VNC and console connections were established and the inputs reached the VM. On the console nothing was returned, and on VNC it got stuck after typing in the user name and hitting enter. In combination with the restart issue, it looks like the storage may have become unavailable. This may resolve into a more discoverable scenario once we have switched to setting a "stop" error policy for when the disks become unresponsive.
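
For readers checking whether a given domain already carries the "stop" policy mentioned above: libvirt expresses it as the error_policy attribute on a disk's driver element. The sketch below lists that attribute per disk. The sample XML is made up for illustration and is not taken from the VM in this report; disk_error_policies is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative domain XML (assumption: not from the affected VM).
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' error_policy='stop'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_error_policies(domain_xml):
    """Map each disk target device to its error_policy (None if unset)."""
    root = ET.fromstring(domain_xml)
    policies = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        policies[dev] = driver.get("error_policy") if driver is not None else None
    return policies


if __name__ == "__main__":
    print(disk_error_policies(SAMPLE_DOMAIN_XML))
```

In practice the XML would come from something like virsh dumpxml run inside the virt-launcher pod. With "stop" set, an I/O error pauses the guest instead of leaving it hanging, which is easier to detect.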

Comment 3 Fabian Deutsch 2021-03-23 16:15:14 UTC
I'm a little worried here that we can not connect to SSH

@ipinto via which network/NIC did you try to connect to the VM?
Was the VM migrated?

Comment 4 Roman Mohr 2021-03-23 19:09:23 UTC
(In reply to Fabian Deutsch from comment #3)
> I'm a little worried here that we can not connect to SSH
> 
> @ipinto via which network/NIC did you try to connect to the VM?
> Was the VM migrated?

The VM was reachable from the serial console, for instance. You could even see the username prompt. Entering the username and password also worked, but as soon as it tried to authenticate, it got stuck. This is typical behaviour when the disk is blocked. I could not find any other hint that something was wrong, but also no clear proof that it was the disk before the VM got rebooted.
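
One way to gather the missing proof on a future occurrence would be a small probe that checks whether a synced write completes within a deadline. This is only a sketch under assumptions: the helper name disk_responds, the probed directory and the timeout are all hypothetical, and it would have to run somewhere that still accepts commands (for example a session opened before the hang).

```python
import os
import tempfile
import threading


def disk_responds(directory, timeout_s=5.0):
    """Return True if a small write plus fsync in `directory` finishes in time."""
    done = threading.Event()

    def probe():
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # blocks here if the backing storage is stuck
        finally:
            os.close(fd)
            os.unlink(path)
        done.set()

    # Daemon thread: a write stuck in the kernel must not keep the probe alive.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout_s)


if __name__ == "__main__":
    print(disk_responds(tempfile.gettempdir()))
```

A False result would point at storage rather than at the network or the login stack, matching the symptom of a prompt that accepts input but never finishes authenticating.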

Comment 5 Antonio Cardace 2021-03-24 09:52:08 UTC
Hi Israel, can you give me access to the cluster this VM is running on? Thanks.

Comment 6 sgott 2021-04-13 14:40:53 UTC
Israel, there's still an open needinfo on this. Does Antonio have what he needs?

Comment 11 Ying Cui 2021-07-13 10:51:26 UTC
VERIFIED this bug on CNV 4.8.0-372 and OCP 4.8.0-fc.7
We created a few VMs (CentOS 7, RHEL 8, Fedora 33, Windows2k16, Windows10, Windows19) running on the longevity env. and performed live migration/snapshot actions on the VMs. The env. has been running for 30+ days.

The issue in bug description can NOT be reproduced on this env.

Comment 14 errata-xmlrpc 2021-07-27 14:23:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920

Comment 15 Red Hat Bugzilla 2023-09-15 01:00:50 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

