+++ This bug was initially created as a clone of Bug #1635581 +++

Description of problem:
After a storage issue caused VMs to enter the paused state, and the storage domain was re-activated, resuming these VMs fails in vdsm.

Version-Release number of selected component (if applicable):

[root@host02 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

[root@host02 ~]# rpm -qa |grep libvirt
libvirt-daemon-driver-secret-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-qemu-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-scsi-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-nodedev-4.5.0-10.el7_6.3.x86_64
libvirt-python-4.5.0-1.el7.x86_64
libvirt-daemon-driver-network-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-rbd-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-mpath-4.5.0-10.el7_6.3.x86_64
libvirt-lock-sanlock-4.5.0-10.el7_6.3.x86_64
libvirt-bash-completion-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-nwfilter-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-config-nwfilter-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-config-network-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-core-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-iscsi-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
libvirt-client-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-lxc-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-logical-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-kvm-4.5.0-10.el7_6.3.x86_64
libvirt-libs-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-interface-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-disk-4.5.0-10.el7_6.3.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64

[root@host02 ~]# rpm -qa |grep ovirt
ovirt-imageio-common-1.4.5-0.el7.x86_64
ovirt-hosted-engine-setup-2.2.32-1.el7.noarch
cockpit-ovirt-dashboard-0.11.37-1.el7.noarch
ovirt-setup-lib-1.1.5-1.el7.noarch
ovirt-release41-4.1.8-1.el7.centos.noarch
ovirt-imageio-daemon-1.4.5-0.el7.noarch
ovirt-provider-ovn-driver-1.2.16-1.el7.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
python-ovirt-engine-sdk4-4.2.9-2.el7.x86_64
ovirt-host-deploy-1.7.4-1.el7.noarch
ovirt-host-dependencies-4.2.3-1.el7.x86_64
ovirt-host-4.2.3-1.el7.x86_64
ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch
ovirt-release42-4.2.7.1-1.el7.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7.noarch
ovirt-vmconsole-1.0.5-4.el7.centos.noarch
cockpit-machines-ovirt-176-4.el7.centos.noarch

[root@host02 ~]# rpm -qa |grep qemu
libvirt-daemon-driver-qemu-4.5.0-10.el7_6.3.x86_64
qemu-kvm-common-ev-2.12.0-18.el7_6.1.1.x86_64
qemu-kvm-tools-ev-2.12.0-18.el7_6.1.1.x86_64
qemu-img-ev-2.12.0-18.el7_6.1.1.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-kvm-ev-2.12.0-18.el7_6.1.1.x86_64
vdsm-hook-qemucmdline-4.20.43-1.el7.noarch

[root@host02 ~]# rpm -qa |grep sanlock
sanlock-lib-3.6.0-1.el7.x86_64
sanlock-python-3.6.0-1.el7.x86_64
libvirt-lock-sanlock-4.5.0-10.el7_6.3.x86_64
sanlock-3.6.0-1.el7.x86_64

[root@host02 ~]# rpm -qa |grep selinux
selinux-policy-targeted-3.13.1-229.el7_6.6.noarch
libselinux-python-2.5-14.1.el7.x86_64
libselinux-2.5-14.1.el7.i686
selinux-policy-3.13.1-229.el7_6.6.noarch
libselinux-utils-2.5-14.1.el7.x86_64
libselinux-2.5-14.1.el7.x86_64

[root@host02 ~]# rpm -qa |grep vdsm
vdsm-jsonrpc-4.20.43-1.el7.noarch
vdsm-hook-vhostmd-4.20.43-1.el7.noarch
vdsm-yajsonrpc-4.20.43-1.el7.noarch
vdsm-api-4.20.43-1.el7.noarch
vdsm-client-4.20.43-1.el7.noarch
vdsm-common-4.20.43-1.el7.noarch
vdsm-python-4.20.43-1.el7.noarch
vdsm-4.20.43-1.el7.x86_64
vdsm-hook-fcoe-4.20.43-1.el7.noarch
vdsm-hook-macspoof-4.20.43-1.el7.noarch
vdsm-network-4.20.43-1.el7.x86_64
vdsm-hook-vmfex-dev-4.20.43-1.el7.noarch
vdsm-hook-ethtool-options-4.20.43-1.el7.noarch
vdsm-hook-openstacknet-4.20.43-1.el7.noarch
vdsm-http-4.20.43-1.el7.noarch
vdsm-hook-qemucmdline-4.20.43-1.el7.noarch

[root@host02 ~]# uname -a
Linux host02 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
In this specific situation, no VM can be resumed.

Steps to Reproduce:
1. While VMs with disks residing on storage domain A are running, disconnect domain A from the hosts.
2. Wait for the VMs to enter the paused state, then re-connect the hosts to the storage.

Engine events:
VM 'ae51780f-c1bf-48c3-b672-7018612c00f6' moved from 'Up' --> 'Paused' (vmId='ae51780f-c1bf-48c3-b672-7018612c00f6')
abnormal vm stop device ua-e3a12af7-bc3b-4c8a-9d74-424dba4933d7 error eio (vmId='ae51780f-c1bf-48c3-b672-7018612c00f6')
CPU stopped: onIOError

Actual results:
The VMs should be resumed automatically, but that does not succeed:

VM 'ae51780f-c1bf-48c3-b672-7018612c00f6'(test261218_1) moved from 'Paused' --> 'Up'
EVENT_ID: VM_RECOVERED_FROM_PAUSE_ERROR(196), VM test261218_1 has recovered from paused back to up.

The VM then enters NotResponding:

VM 'ae51780f-c1bf-48c3-b672-7018612c00f6'(test261218_1) moved from 'Up' --> 'NotResponding'

In vdsm.log:

(vmId='ae51780f-c1bf-48c3-b672-7018612c00f6') monitor became unresponsive (command timeout, age=988.350000001)
could not run <function <lambda> at 0x7f69900d1a28> on ['ae51780f-c1bf-48c3-b672-7018612c00f6']
Worker blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'params': {u'vmID': u'ae51780f-c1bf-48c3-b672-7018612c00f6'}, 'jsonrpc': '2.0', 'method': u'VM.cont', 'id': u'4f33fa5a-7ea4-4f6a-88ac-e1811432ed14'} at 0x7f6970027950> timeout=60, duration=900 at 0x7f6970027b50> task#=4 at 0x7f695876c9d0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in _dynamicMethod
  result = fn(*methodArgs)
File: "<string>", line 2, in cont
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
  ret = func(*args, **kwargs)
File: "<string>", line 2, in cont
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 167, in cont
  return self.vm.cont()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1530, in cont
  self._underlyingCont()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4272, in _underlyingCont
  self._dom.resume()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
  ret = attr(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
  ret = f(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
  return func(inst, *args, **kwargs)
File: "/usr/lib64/python2.7/site-packages/libvirt.py", line 1976, in resume
  ret = libvirtmod.virDomainResume(self._o)
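For reference only (not taken from the report or from vdsm code), a minimal sketch of what the blocked VM.cont request reduces to at the libvirt-python level: the traceback ends in virDomainResume, which does not return while the QEMU monitor is still stuck on the failed storage. The connection URI and the domain name 'test261218_1' are assumptions based on the events above.

import libvirt

# Minimal sketch, assuming the host's system libvirtd and the domain name
# seen in the engine events (test261218_1).
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test261218_1')

state, reason = dom.state()
if state == libvirt.VIR_DOMAIN_PAUSED:
    # This is effectively where the vdsm traceback ends (libvirt.py, resume()).
    # If the QEMU monitor is wedged on I/O, this call blocks, and the jsonrpc
    # worker that issued VM.cont is eventually reported as "Worker blocked".
    dom.resume()

conn.close()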
Why was this cloned?
Because the original report was closed, and the problem still exists in our installation.
Ok -- I closed the original because it was opened internally with no known external occurrences, there wasn't sufficient information to reproduce it or start on a patch, and our QE had not seen it in months. I'll re-open that one and close this one. Can you add any logs you have, or steps to reproduce, to the original bug?
*** This bug has been marked as a duplicate of bug 1635581 ***