Description of problem:
The following tempest tests fail on a RHOS 16.1 environment with the iso9660 config drive enabled in nova.conf:

tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]
tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]

How reproducible:
100%

Steps to Reproduce:
1. Deploy RHOS 16.1 with at least 2 computes and the iso9660 config drive enabled.
2. Spin up an instance and live block migrate it, or run the following tempest tests:
tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]
tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]

Actual results:
2020-08-19 14:06:59.835 8 DEBUG nova.virt.libvirt.guest [-] Failed to get job stats: Unable to read from monitor: Connection reset by peer get_job_info /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:713
2020-08-19 14:06:59.836 8 WARNING nova.virt.libvirt.driver [-] [instance: 27940779-d831-4abf-9dac-e82418c88d93] Error monitoring migration: Unable to read from monitor: Connection reset by peer: libvirt.libvirtError: Unable to read from monitor: Connection reset by peer
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93] Traceback (most recent call last):
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9150, in _live_migration
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     finish_event, disk_paths)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 8940, in _live_migration_monitor
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     info = guest.get_job_info()
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 697, in get_job_info
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     stats = self._domain.jobStats()
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     rv = execute(f, *args, **kwargs)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     six.reraise(c, e, tb)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     raise value
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     rv = meth(*args, **kwargs)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1594, in jobStats
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93] libvirt.libvirtError: Unable to read from monitor: Connection reset by peer

Expected results:
Migration completes successfully.

Additional info:
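For reference, a minimal sketch of the nova.conf settings behind "iso9660 config drive enabled". The option names are standard nova options, but the exact values used in this deployment are an assumption:

    [DEFAULT]
    # Assumed settings: always attach a config drive and build it as an
    # iso9660 image (iso9660 is nova's default config_drive_format).
    force_config_drive = True
    config_drive_format = iso9660

The manual reproduction in step 2 would then be roughly "openstack server pause <server>" followed by "openstack server migrate --live <dest-host> --block-migration <server>", though the exact client flags vary by client version.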
The 'Failed to load virtio_pci' and virtio_rng errors look like fallout rather than the cause. As far as I can tell, modern_queue_state:desc would only fail to load if the connection stopped, so it's more likely that the chain of events starts with:

qemu-kvm: block.c:5686: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

on the source: the migration fails there, and what you're seeing on the destination is just the result of the source failing.
The question then is where the bdrv_inactivate_recurse problem comes from. We do have bz 1713009, which is similar: that bz migrates a VM that has never run (started with -S), migrates it, and then migrates it a second time.

I notice the test name here is 'test_live_block_migration_paused' - what exactly does this test do?
test_live_block_migration_paused boots a VM, waits for it to become ACTIVE, then pauses the VM and live migrates it while it's paused. The block migration implies that it is using local storage rather than shared storage, so the qcow2 image needs to be transferred to the destination, as opposed to a topology where the VM disk lives on Ceph or an iSCSI volume. In this particular case the VM also has a CD-ROM disk attached containing metadata and scripts that are run by cloud-init on first boot to set up things like SSH keys - hence [iso9660 config drive] in the name, but that is not really related.
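To make that flow concrete, a rough openstacksdk-based sketch of what the test does. This is illustrative only - tempest drives the API through its own REST clients, and the cloud, server, image, flavor and network names here are placeholders:

    import openstack

    # Assumed clouds.yaml entry for the deployment under test.
    conn = openstack.connect(cloud='overcloud')

    # Boot a VM; tempest uses its configured image/flavor and the
    # iso9660 config drive is attached at boot.
    server = conn.compute.create_server(
        name='lm-paused-test',
        image_id=conn.image.find_image('cirros').id,
        flavor_id=conn.compute.find_flavor('m1.tiny').id,
        networks=[{'uuid': conn.network.find_network('private').id}],
        config_drive=True,
    )

    # Wait until nova reports the instance as ACTIVE.
    server = conn.compute.wait_for_server(server, status='ACTIVE')

    # Pause the instance, then live migrate it while still paused.
    conn.compute.pause_server(server)

    # block_migration=True because the qcow2 disk is on local storage
    # and must be copied to the destination (no shared storage / Ceph).
    conn.compute.live_migrate_server(server, host=None, block_migration=True)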
(In reply to Dr. David Alan Gilbert from comment #4)
> The question then is where the bdrv_inactivate_recurse problem comes from.
> We do have bz 1713009, which is similar: that bz migrates a VM that has
> never run (started with -S), migrates it, and then migrates it a second
> time.
>
> I notice the test name here is 'test_live_block_migration_paused' - what
> exactly does this test do?

Yeah, this is the same as bz 1713009. The test launches an instance, waits until it is reported as ACTIVE (running), pauses it, live migrates the paused instance (to compute0 in c#2), and then live migrates the still-paused instance again (to compute1 in c#2):

https://opendev.org/openstack/tempest/src/branch/master/tempest/api/compute/admin/test_live_migration.py#L139-L141
https://opendev.org/openstack/tempest/src/branch/master/tempest/api/compute/admin/test_live_migration.py#L93-L123

Any objections to marking this as a duplicate?
*** This bug has been marked as a duplicate of bug 1713009 ***