Description of problem:

We see head-of-line blocking when migrating all instances from compute-2 to compute-3 with `max_concurrent_live_migrations` left at its default value of `1`. Because only one live migration may run at a time, the migration of instance-000005b1 (UUID 3e891943-d71e-4244-bf51-b5d6b4d79df4) has been running for 3 days and is blocking every migration queued behind it:

~~~
| 301 | de342ef8-a5e4-4ff7-9674-c5929d1e15a7 | overcloud-compute-2.localdomain | overcloud-compute-3.localdomain | overcloud-compute-2.localdomain | overcloud-compute-3.localdomain | - | running | 3e891943-d71e-4244-bf51-b5d6b4d79df4 | 71 | 71 | 2019-12-13T14:30:28.000000 | 2019-12-13T14:31:31.000000 | live-migration |
~~~

However, we cannot abort this migration:

~~~
(overcloud) [stack@director-beta ~]$ nova live-migration-abort 3e891943-d71e-4244-bf51-b5d6b4d79df4 301
(overcloud) [stack@director-beta ~]$
(overcloud) [stack@director-beta ~]$ nova server-migration-show 3e891943-d71e-4244-bf51-b5d6b4d79df4 301
+------------------------+--------------------------------------+
| Property               | Value                                |
+------------------------+--------------------------------------+
| created_at             | 2019-12-13T14:30:28.000000           |
| dest_compute           | overcloud-compute-3.localdomain      |
| dest_host              | -                                    |
| dest_node              | overcloud-compute-3.localdomain      |
| disk_processed_bytes   | 0                                    |
| disk_remaining_bytes   | 0                                    |
| disk_total_bytes       | 0                                    |
| id                     | 301                                  |
| memory_processed_bytes | 0                                    |
| memory_remaining_bytes | 0                                    |
| memory_total_bytes     | 0                                    |
| server_uuid            | 3e891943-d71e-4244-bf51-b5d6b4d79df4 |
| source_compute         | overcloud-compute-2.localdomain      |
| source_node            | overcloud-compute-2.localdomain     |
| status                 | running                              |
| updated_at             | 2019-12-13T14:31:31.000000           |
| uuid                   | de342ef8-a5e4-4ff7-9674-c5929d1e15a7 |
+------------------------+--------------------------------------+
~~~
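For reference, the serialization described above comes from the `max_concurrent_live_migrations` option in the `[DEFAULT]` section of nova.conf on the compute node (default `1`, i.e. one outbound live migration at a time per compute). A minimal sketch of how to confirm the effective value; the container name `nova_compute`, the use of `docker exec`, and the in-container path are assumptions based on a containerized OSP 13 node and may need adjusting:

~~~
# nova.conf on the compute node (DEFAULT section); 1 is the default:
#   [DEFAULT]
#   max_concurrent_live_migrations = 1
#
# Check the effective value inside the nova_compute container (assumed
# container name and config path for a containerized OSP 13 deployment):
$ sudo docker exec nova_compute grep -E '^\s*max_concurrent_live_migrations' /etc/nova/nova.conf

# Unset or set to 1 means outbound live migrations from this host are
# processed strictly one at a time, so a stuck migration blocks the queue.
~~~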
On the source compute node, nova-compute.log shows the abort request failing because libvirt cannot acquire the domain's state change lock, which is held by a `remoteDispatchDomainGetJobStats` monitor call:

~~~
[root@overcloud-compute-2 desccb]# tail -f /var/log/containers/nova/nova-compute.log
2019-12-16 13:40:46.174 8 WARNING nova.compute.resource_tracker [req-3ccf9ab8-fb45-4f7a-a8c4-de55e2e85aa4 - - - - -] [instance: e810ff94-6d83-4ab6-8871-245ea7202f1a] Instance not resizing, skipping migration.
2019-12-16 13:40:46.174 8 WARNING nova.compute.resource_tracker [req-3ccf9ab8-fb45-4f7a-a8c4-de55e2e85aa4 - - - - -] [instance: 3e891943-d71e-4244-bf51-b5d6b4d79df4] Instance not resizing, skipping migration.
2019-12-16 13:40:47.005 8 INFO nova.compute.resource_tracker [req-3ccf9ab8-fb45-4f7a-a8c4-de55e2e85aa4 - - - - -] Final resource view: name=overcloud-compute-2.localdomain phys_ram=195229MB used_ram=95232MB phys_disk=96571GB used_disk=60GB total_vcpus=112 used_vcpus=65 pci_stats=[]
2019-12-16 13:41:07.198 8 ERROR nova.virt.libvirt.driver [req-8dfd81b8-c788-426d-9d1f-38bfcd807010 5172322575324a0a824341de449766ab a2e7b2ae4cef4c7e8d6397502132d575 - default default] [instance: 3e891943-d71e-4244-bf51-b5d6b4d79df4] Failed to cancel migration Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats): libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server [req-8dfd81b8-c788-426d-9d1f-38bfcd807010 5172322575324a0a824341de449766ab a2e7b2ae4cef4c7e8d6397502132d575 - default default] Exception during message handling: libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1021, in decorated_function
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 215, in decorated_function
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 203, in decorated_function
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6378, in live_migration_abort
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     self.driver.live_migration_abort(instance)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6962, in live_migration_abort
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     dom.abortJob()
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     rv = execute(f, *args, **kwargs)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     six.reraise(c, e, tb)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     rv = meth(*args, **kwargs)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 544, in abortJob
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server     if ret == -1: raise libvirtError ('virDomainAbortJob() failed', dom=self)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats)
2019-12-16 13:41:07.238 8 ERROR oslo_messaging.rpc.server
~~~

virsh cannot list any job details for the qemu-kvm process either:

~~~
[root@overcloud-compute-2 qemu]# virsh domjobinfo instance-000005b1
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats)
~~~
Version-Release number of selected component (if applicable):

~~~
[akaris@supportshell overcloud-compute-2]$ cat etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
[akaris@supportshell overcloud-compute-2]$ cat etc/rhosp-release
Red Hat OpenStack Platform release 13.0.9 (Queens)
[akaris@supportshell overcloud-compute-2]$ egrep 'libvirt|qemu|kernel' installed-rpms
erlang-kernel-18.3.4.11-2.el7ost.x86_64                        Tue Aug 27 16:23:00 2019
ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch                 Tue Aug 27 16:23:30 2019
kernel-3.10.0-1062.4.1.el7.x86_64                              Wed Oct 30 11:29:50 2019
kernel-3.10.0-1062.4.3.el7.x86_64                              Wed Nov 13 18:58:11 2019
kernel-3.10.0-1062.el7.x86_64                                  Tue Aug 27 15:46:11 2019
kernel-tools-3.10.0-1062.4.3.el7.x86_64                        Wed Nov 13 18:58:03 2019
kernel-tools-libs-3.10.0-1062.4.3.el7.x86_64                   Wed Nov 13 18:58:03 2019
libvirt-4.5.0-23.el7_7.1.x86_64                                Wed Oct 30 11:29:55 2019
libvirt-bash-completion-4.5.0-23.el7_7.1.x86_64                Wed Oct 30 11:29:47 2019
libvirt-client-4.5.0-23.el7_7.1.x86_64                         Wed Oct 30 11:29:47 2019
libvirt-daemon-4.5.0-23.el7_7.1.x86_64                         Wed Oct 30 11:29:41 2019
libvirt-daemon-config-network-4.5.0-23.el7_7.1.x86_64          Wed Oct 30 11:29:42 2019
libvirt-daemon-config-nwfilter-4.5.0-23.el7_7.1.x86_64         Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-interface-4.5.0-23.el7_7.1.x86_64        Wed Oct 30 11:29:41 2019
libvirt-daemon-driver-lxc-4.5.0-23.el7_7.1.x86_64              Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-network-4.5.0-23.el7_7.1.x86_64          Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-nodedev-4.5.0-23.el7_7.1.x86_64          Wed Oct 30 11:29:41 2019
libvirt-daemon-driver-nwfilter-4.5.0-23.el7_7.1.x86_64         Wed Oct 30 11:29:41 2019
libvirt-daemon-driver-qemu-4.5.0-23.el7_7.1.x86_64             Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-secret-4.5.0-23.el7_7.1.x86_64           Wed Oct 30 11:29:41 2019
libvirt-daemon-driver-storage-4.5.0-23.el7_7.1.x86_64          Wed Oct 30 11:29:46 2019
libvirt-daemon-driver-storage-core-4.5.0-23.el7_7.1.x86_64     Wed Oct 30 11:29:41 2019
libvirt-daemon-driver-storage-disk-4.5.0-23.el7_7.1.x86_64     Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.1.x86_64  Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-storage-iscsi-4.5.0-23.el7_7.1.x86_64    Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-storage-logical-4.5.0-23.el7_7.1.x86_64  Wed Oct 30 11:29:44 2019
libvirt-daemon-driver-storage-mpath-4.5.0-23.el7_7.1.x86_64    Wed Oct 30 11:29:42 2019
libvirt-daemon-driver-storage-rbd-4.5.0-23.el7_7.1.x86_64      Wed Oct 30 11:29:46 2019
libvirt-daemon-driver-storage-scsi-4.5.0-23.el7_7.1.x86_64     Wed Oct 30 11:29:42 2019
libvirt-daemon-kvm-4.5.0-23.el7_7.1.x86_64                     Wed Oct 30 11:29:46 2019
libvirt-libs-4.5.0-23.el7_7.1.x86_64                           Wed Oct 30 11:29:39 2019
libvirt-python-4.5.0-1.el7.x86_64                              Tue Aug 27 16:24:03 2019
qemu-guest-agent-2.12.0-3.el7.x86_64                           Tue Aug 27 15:46:18 2019
qemu-img-rhev-2.12.0-33.el7_7.4.x86_64                         Wed Nov 13 18:57:40 2019
qemu-kvm-common-rhev-2.12.0-33.el7_7.4.x86_64                  Wed Nov 13 18:57:53 2019
qemu-kvm-rhev-2.12.0-33.el7_7.4.x86_64                         Wed Nov 13 18:58:05 2019
[akaris@supportshell overcloud-compute-2]$ cat uname
Linux overcloud-compute-2 3.10.0-1062.4.3.el7.x86_64 #1 SMP Tue Nov 12 10:42:40 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
[akaris@supportshell overcloud-compute-2]$ grep libvirt ./sos_commands/docker/docker_ps
3e1229985236  10.16.20.2:8787/rhosp13/openstack-nova-libvirt:13.0-119  "dumb-init --singl..."  4 weeks ago  Up 58 minutes  nova_libvirt
4e5a23db26cb  10.16.20.2:8787/rhosp13/openstack-nova-libvirt:13.0-119  "dumb-init --singl..."  4 weeks ago  Up 9 days      nova_virtlogd
~~~

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
The live migration stays in the "running" state indefinitely and cannot be aborted; `nova live-migration-abort` fails on the source compute with "cannot acquire state change lock", so all queued migrations behind it remain blocked.

Expected results:
The live migration either completes or can be aborted, allowing the remaining queued migrations to proceed.

Additional info:
It turns out that the same volume was attached twice to instance 3e891943-d71e-4244-bf51-b5d6b4d79df4:

~~~
(overcloud) [stack@director-beta ~]$ openstack volume show 12128eb5-9af9-47e2-b10c-1eca0cec5d2c
| attachments | [{u'server_id': u'3e891943-d71e-4244-bf51-b5d6b4d79df4', u'attachment_id': u'99f3e933-a373-4d94-8bc0-32e0895a5df9', u'attached_at': u'2019-12-09T:59:44.000000', u'host_name': u'overcloud-compute-2.localdomain', u'volume_id': u'12128eb5-9af9-47e2-b10c-1eca0cec5d2c', u'device': u'/dev/vdb', u'id': u'12128eb5-9af9-47e2-b10c-1eca0cec5d2c'}, {u'server_id': u'3e891943-d71e-4244-bf51-b5d6b4d79df4', u'attachment_id': u'd0ed9602-d965-4c0e-9e1f-ea758da064e1', u'attached_at': u'2019-12-13T14:31:26.000000', u'host_name': u'overcloud-compute-3.localdomain', u'volume_id': u'12128eb5-9af9-47e2-b10c-1eca0cec5d2c', u'device': u'/dev/vdb', u'id': u'12128eb5-9af9-47e2-b10c-1eca0cec5d2c'}] |
~~~

After detaching this volume, live migration worked. The stale attachment was cleared with:

~~~
cinder reset-state --attach-status detached 12128eb5-9af9-47e2-b10c-1eca0cec5d2c
~~~
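For future reference, a quick way to spot this condition is to list the instance's attached volumes and then check each volume's attachment list for the same server appearing twice. The sketch below uses standard python-openstackclient commands with the UUIDs from this case; column names such as `volumes_attached` may differ slightly between client versions, so treat them as assumptions:

~~~
# Volumes attached to the affected instance:
$ openstack server show 3e891943-d71e-4244-bf51-b5d6b4d79df4 -c volumes_attached

# Attachment list of the suspect volume; two entries with the same server_id
# (as in the output above) indicate the duplicate attachment. Only then apply
# the reset-state workaround, since it bypasses Cinder's normal state checks:
$ openstack volume show 12128eb5-9af9-47e2-b10c-1eca0cec5d2c -c attachments
~~~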