Description of problem:
A VM has been running for 25 days in a longevity test. Checking connectivity to the VM via SSH and via the serial and VNC consoles shows that the VM is unresponsive. A soft reboot with virsh reset did not help; the VM is stuck with: "cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)"

Version-Release number of selected component (if applicable):
CNV 2.6

How reproducible:

Steps to Reproduce:
1. Create a VM with a disk on OCS and run it for more than 25 days.
2. Check connectivity to the VM.

Actual results:
Cannot connect via SSH or the console.

Adding logs.

Additional info:
Created attachment 1755903: virt-launcher and virt-handler logs
More notes on this one: before the reboot we could partially connect to the VM. VNC and console connections were established and the input reached the VM. The serial console returned no output, and on VNC the session got stuck after typing in the user name and pressing Enter. Combined with the restart issue, this suggests the storage may have become unavailable. This scenario should become easier to diagnose once we have switched to a "stop" error policy for disks that become unresponsive.
I'm a little worried that we cannot connect via SSH. @ipinto, via which network/NIC did you try to connect to the VM? Was the VM migrated?
(In reply to Fabian Deutsch from comment #3)
> I'm a little worried here that we can not connect to SSH
>
> @ipinto via which network/NIC did you try to connect to the VM?
> Was the VM migrated?

The VM was reachable from the serial console, for instance; you could even see the username prompt. Entering the username and password also worked, but as soon as it tried to authenticate, it got stuck. This is typical behaviour when the disk is blocked. I could not find any other indication that something was wrong, but neither could I find clear proof that the disk was the cause before the VM got rebooted.
Hi Israel, can you give me access to the cluster this VM is running on? Thanks.
Israel, there's still an open needinfo on this. Does Antonio have what he needs?
VERIFIED this bug on CNV 4.8.0-372 and OCP 4.8.0-fc.7. We created a few VMs (CentOS 7, RHEL 8, Fedora 33, Windows2k16, Windows10, Windows19) running in the longevity environment and performed live migration and snapshot operations on them. The environment has been running for 30+ days. The issue in the bug description could NOT be reproduced in this environment.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days