Bug 1870383 - [iso9660 config drive] Live block migration fails with "libvirt.libvirtError: Unable to read from monitor: Connection reset by peer"
Summary: [iso9660 config drive] Live block migration fails with "libvirt.libvirtError:...
Keywords:
Status: CLOSED DUPLICATE of bug 1713009
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Lee Yarwood
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-19 22:54 UTC by Archit Modi
Modified: 2023-03-21 19:34 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-21 16:26:32 UTC
Target Upstream Version:
Embargoed:



Description Archit Modi 2020-08-19 22:54:27 UTC
Description of problem: The following tempest tests fail on a RHOS 16.1 environment with the iso9660 config drive enabled in nova.conf

tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]

tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]

How reproducible: 100%

Steps to Reproduce:
1. Deploy RHOS 16.1 with at least 2 computes and iso9660 config drive enabled
2. Spin up an instance and live block migrate it, or run the following tempest tests (a minimal reproduction sketch follows the list):

tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]

tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_block_migration_paused[id-1e107f21-61b2-4988-8f22-b196e938ab88]


Actual results:
2020-08-19 14:06:59.835 8 DEBUG nova.virt.libvirt.guest [-] Failed to get job stats: Unable to read from monitor: Connection reset by peer get_job_info /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:713
2020-08-19 14:06:59.836 8 WARNING nova.virt.libvirt.driver [-] [instance: 27940779-d831-4abf-9dac-e82418c88d93] Error monitoring migration: Unable to read from monitor: Connection reset by peer: libvirt.libvirtError: Unable to read from monitor: Connection reset by peer
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93] Traceback (most recent call last):
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9150, in _live_migration
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     finish_event, disk_paths)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 8940, in _live_migration_monitor
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     info = guest.get_job_info()
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 697, in get_job_info
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     stats = self._domain.jobStats()
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     rv = execute(f, *args, **kwargs)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     six.reraise(c, e, tb)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     raise value
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     rv = meth(*args, **kwargs)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1594, in jobStats
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93]     if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self)
2020-08-19 14:06:59.836 8 ERROR nova.virt.libvirt.driver [instance: 27940779-d831-4abf-9dac-e82418c88d93] libvirt.libvirtError: Unable to read from monitor: Connection reset by peer

Expected results:
Migration completed successfully

Additional info:

Comment 3 Dr. David Alan Gilbert 2020-08-21 11:52:43 UTC
The 'Failed to load virtio_pci' ... and virtio_rng errors look like fallout rather than the cause.
As far as I can tell, the modern_queue_state:desc would only fail to load if the connection stopped, so it's more likely that the chain of events starts with the:

qemu-kvm: block.c:5686: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

on the source, so the migration fails, and then what you're seeing on the destination
is just the result of the source failing.

Comment 4 Dr. David Alan Gilbert 2020-08-21 11:56:01 UTC
Then the question is where the bdrv_inactivate_recurse problem comes from; we do have bz 1713009, which is similar: that bz migrates a VM that has never run, started with -S, migrated and then migrated a second time.

I notice the name here is 'test_live_block_migration_paused' - what exactly is this test doing?

Comment 5 smooney 2020-08-21 12:17:41 UTC
test_live_block_migration_paused boots a VM, waits for it to become ACTIVE, then pauses the VM and live migrates it while it is paused.

The block migration implies that it is also using local storage rather than shared storage, so the qcow2 image needs to be transferred to the destination, as opposed to a topology where we use Ceph or an iSCSI volume for the VM disk. In this particular case the VM also has a CD-ROM disk attached which holds some metadata and scripts that are run by cloud-init on first boot to set up things like SSH keys, hence the [iso9660 config drive] in the name, but that is not really related.

Comment 6 Lee Yarwood 2020-08-21 12:19:34 UTC
(In reply to Dr. David Alan Gilbert from comment #4)
> then the question is where does the bdrv_inactivate_recurse problem come
> from;
> we do have bz 1713009 - which is similar, that bz is migrating a VM that's
> never
> run, started with -S, migrated and then migrated a 2nd time.
> 
> I notice the name here is 'test_live_block_migration_paused' - what exactly
> is this test doing?

Yeah, this is the same as bz 1713009: the test launches an instance, waits until it is reported as ACTIVE (running), pauses the instance, live migrates the paused instance (to compute0 in c#2) and then live migrates the still-paused instance again (to compute1 in c#2), as sketched below:

https://opendev.org/openstack/tempest/src/branch/master/tempest/api/compute/admin/test_live_migration.py#L139-L141
https://opendev.org/openstack/tempest/src/branch/master/tempest/api/compute/admin/test_live_migration.py#L93-L123
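
A rough outline of that sequence using openstacksdk (not the actual tempest code; conn and server are from the sketch under "Steps to Reproduce", and the polling helper is a hypothetical stand-in for tempest's waiters):

import time

def wait_for_task_state_cleared(conn, server_id, timeout=300):
    # Poll until nova clears the task_state, i.e. the migration has finished
    # one way or the other; tempest's own waiters are more thorough.
    deadline = time.time() + timeout
    while time.time() < deadline:
        server = conn.compute.get_server(server_id)
        if server.task_state is None:
            return server
        time.sleep(5)
    raise TimeoutError('live migration did not finish in time')

conn.compute.pause_server(server)
server = conn.compute.wait_for_server(server, status='PAUSED')

# First block live migration of the paused instance.
conn.compute.live_migrate_server(server, host=None, block_migration=True)
server = wait_for_task_state_cleared(conn, server.id)

# Second block live migration while still paused - this is where the source
# QEMU hits the bdrv_inactivate_recurse() assertion per bz 1713009.
conn.compute.live_migrate_server(server, host=None, block_migration=True)
server = wait_for_task_state_cleared(conn, server.id)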

Any objections to marking this as a duplicate?

Comment 7 Lee Yarwood 2020-08-21 16:26:32 UTC

*** This bug has been marked as a duplicate of bug 1713009 ***

