Bug 2369243 - virt-manager requires a reconnect to see newly-migrated VMs
Summary: virt-manager requires a reconnect to see newly-migrated VMs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 42
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Michal Privoznik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2025-05-29 19:18 UTC by Jason Tibbitts
Modified: 2025-06-24 01:44 UTC
CC List: 13 users

Fixed In Version: libvirt-11.0.0-3.fc42
Clone Of:
Environment:
Last Closed: 2025-06-24 01:44:16 UTC
Type: ---
Embargoed:


Attachments
Log file showing a migration (45.93 KB, text/plain)
2025-06-03 19:19 UTC, Jason Tibbitts

Description Jason Tibbitts 2025-05-29 19:18:38 UTC
I have virt-manager connecting to eight hosts.  I noticed that when I migrate a VM between two of those hosts, virt-manager does not show the VM under the receiving host until I disconnect from that host and reconnect.

All hosts just run QEMU/KVM and I do the migration between hosts from the command line using something like:

virsh migrate --live --persistent --copy-storage-all --compressed --verbose --desturi qemu+ssh://root@vs01/system vm4

In case it matters, all of the VMs use LVM volumes as backing storage and I preallocate the LVs on the destination host manually.
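
For illustration, the manual preallocation could also be done through libvirt's storage APIs — a minimal sketch, assuming the destination VG is already defined as a libvirt "logical" pool (the pool name "vg0" and the 20G size are made up; only the vm4 volume name and the destination URI come from the example above):

import libvirt

# Connect to the destination host from the migration example.
conn = libvirt.open("qemu+ssh://root@vs01/system")

# Look up the (hypothetical) logical pool backing the VG and create
# an LV sized to match the source volume.
pool = conn.storagePoolLookupByName("vg0")
pool.createXML("""
<volume>
  <name>vm4</name>
  <capacity unit='G'>20</capacity>
</volume>
""", 0)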

virt-manager always used to immediately show the VM on the receiving host in state "Paused", which would then switch to "Running" as soon as the migration finished.  I don't know exactly when this stopped working, but I am pretty sure it was working before 5.0.0 was pushed.  I'm currently running virt-manager-5.0.0-1.fc41.noarch on a fully-updated F41 machine.

Reproducible: Always

Comment 1 Cole Robinson 2025-06-03 13:33:30 UTC
Did libvirt get a new version on either machine?

Logs may help: run `virt-manager --debug`, connect to both connections, run the migration with virsh, close virt-manager, attach log

Comment 2 Jason Tibbitts 2025-06-03 19:18:52 UTC
Thank you for the response.  Unfortunately I can't pinpoint exactly when this started happening: I don't migrate machines around often, and the machine where I generally run virt-manager was upgraded from F39 (with 4.1.0) straight to F41 (with 5.0.0), so I never saw the behavior of anything that might have come in between.

Running with --debug while attempting a migration does show some things, including a couple of caught backtraces.  I will attach a log wherein I connect to hosts vs00 and vs01 and then migrate a VM "jlt2" by running "virsh migrate --live --persistent --copy-storage-all --compressed --verbose --desturi qemu+ssh://root@vs00/system jlt2" on vs01.  After the migration is complete, I disconnect from vs00 and then reconnect.  I'll paste a highlight in case that is useful.

[Tue, 03 Jun 2025 14:07:05 virt-manager 206867] DEBUG (connection:715) node device lifecycle event: nodedev=net_vnet0_fe_54_00_4f_07_46 state=VIR_NODE_DEVICE_EVENT_CREATED reason=0
[Tue, 03 Jun 2025 14:07:06 virt-manager 206867] DEBUG (connection:648) domain agent lifecycle event: domain=jlt2 state=VIR_CONNECT_DOMAIN_EVENT_AGENT_LIFECYCLE_STATE_DISCONNECTED reason=1
[Tue, 03 Jun 2025 14:07:07 virt-manager 206867] DEBUG (connection:633) domain lifecycle event: domain=jlt2 state=VIR_DOMAIN_EVENT_STARTED reason=VIR_DOMAIN_EVENT_STARTED_MIGRATED
[Tue, 03 Jun 2025 14:07:36 virt-manager 206867] DEBUG (libvirtobject:187) Error initializing libvirt state for <vmmDomain name=jlt2 id=0x7fed7c5dcac0>
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 185, in init_libvirt_state
    self._init_libvirt_state()
    ~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 386, in _init_libvirt_state
    self._refresh_status(newstatus=info[0])
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 234, in _refresh_status
    self.ensure_latest_xml(nosignal=True)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 277, in ensure_latest_xml
    self.__force_refresh_xml(nosignal=nosignal)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 292, in __force_refresh_xml
    active_xml = self._XMLDesc(self._active_xml_flags)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1118, in _XMLDesc
    return self._backend.XMLDesc(flags)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/lib64/python3.13/site-packages/libvirt.py", line 615, in XMLDesc
    raise libvirtError('virDomainGetXMLDesc() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)
[Tue, 03 Jun 2025 14:07:36 virt-manager 206867] DEBUG (connection:1067) Blacklisting domain=jlt2
[Tue, 03 Jun 2025 14:07:36 virt-manager 206867] DEBUG (connection:1069) Object added in denylist, count=1
[Tue, 03 Jun 2025 14:07:37 virt-manager 206867] DEBUG (libvirtobject:187) Error initializing libvirt state for <vmmDomain name=jlt2 id=0x7fed7c5deb40>
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 185, in init_libvirt_state
    self._init_libvirt_state()
    ~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 386, in _init_libvirt_state
    self._refresh_status(newstatus=info[0])
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 234, in _refresh_status
    self.ensure_latest_xml(nosignal=True)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 277, in ensure_latest_xml
    self.__force_refresh_xml(nosignal=nosignal)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 292, in __force_refresh_xml
    active_xml = self._XMLDesc(self._active_xml_flags)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1118, in _XMLDesc
    return self._backend.XMLDesc(flags)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/lib64/python3.13/site-packages/libvirt.py", line 615, in XMLDesc
    raise libvirtError('virDomainGetXMLDesc() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)
[Tue, 03 Jun 2025 14:07:37 virt-manager 206867] DEBUG (connection:1067) Blacklisting domain=jlt2
[Tue, 03 Jun 2025 14:07:37 virt-manager 206867] DEBUG (connection:1069) Object added in denylist, count=2

Comment 3 Jason Tibbitts 2025-06-03 19:19:39 UTC
Created attachment 2092902 [details]
Log file showing a migration

Comment 4 Jason Tibbitts 2025-06-03 19:28:00 UTC
Also, I see I didn't really answer your question.  For the test in question, the machine receiving the migrated VM (vs00) is basically a freshly set up F42 machine with libvirt 11.0.0-2.fc42.  The machine sending the VM (vs01) is also an F42 machine with the same version.  However, I've seen this behavior when migrating from a variety of hosts running versions back to (ack) F38.  Those hosts have all been updated now so I can't check the versions directly, and somehow dnf history has become somewhat useless after the upgrade to dnf5.  I think I can dig more exact versions out of the dnf logs if that information would be useful.

Comment 5 Cole Robinson 2025-06-05 17:43:36 UTC
OK, the log shows what's happening. When the VM shows up on the new connection, virt-manager attempts an initial `XMLDesc` call (basically virsh dumpxml), but this fails because some locks are still held by the migrate process. virt-manager then stops trying to process the VM, because a simple call like dumpxml failing usually means something is catastrophically wrong, and retrying would just spam the logs with repeated attempts to re-init the VM.
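
For reference, what virt-manager does at that point boils down to a plain libvirt-python XMLDesc call; a minimal sketch (the URI and domain name are taken from the log above, everything else is illustrative):

import libvirt

# Connect to the destination host and look up the incoming domain.
conn = libvirt.open("qemu+ssh://root@vs00/system")
dom = conn.lookupByName("jlt2")

# During an incoming migration this can raise libvirtError
# ("Timed out during operation: cannot acquire state change lock")
# because the migrate job still holds the domain job lock.
print(dom.XMLDesc(0))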

In this case the VM almost certainly isn't broken; dumpxml is just temporarily blocked. I think this is due to libvirt commit 6cc93bf28842526be2fd596a607ebca796b7fb2e, new in libvirt 11.0.0, which makes dumpxml explicitly grab a read lock on the qemu monitor. I think that's going to make these types of failures more common.

I think the fix is that virt-manager shouldn't denylist the VM if init fails with VIR_ERR_OPERATION_TIMEOUT, and should instead try again after a short sleep or something similar.
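
Roughly this kind of thing — a minimal sketch of the idea, not virt-manager's actual code (the helper name, attempt count, and delay are made up):

import time
import libvirt

def xmldesc_with_retry(dom, flags=0, attempts=5, delay=2.0):
    # Retry XMLDesc when libvirt reports a transient job-lock timeout;
    # any other error is treated as a real failure and re-raised.
    last_err = None
    for _ in range(attempts):
        try:
            return dom.XMLDesc(flags)
        except libvirt.libvirtError as e:
            if e.get_error_code() != libvirt.VIR_ERR_OPERATION_TIMEOUT:
                raise
            last_err = e
            time.sleep(delay)
    raise last_err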

Comment 6 Daniel Berrangé 2025-06-05 17:48:47 UTC
> In this case the VM almost certainly isn't broken; dumpxml is just temporarily blocked. I think this is due to libvirt commit 6cc93bf28842526be2fd596a607ebca796b7fb2e, new in libvirt 11.0.0, which makes dumpxml explicitly grab a read lock on the qemu monitor. I think that's going to make these types of failures more common.

Hmm, I think we might need to reconsider that commit. IMHO it is pretty unpleasant for a mere XML query to fail. Returning stale data is preferable to hard errors. All data is inherently potentially stale, because it can change on the VM before the XML is even transmitted back to the client. Stale data should self-correct the next time the VM is queried.

Comment 7 Cole Robinson 2025-06-11 17:10:18 UTC
(In reply to Daniel Berrangé from comment #6)
> > In this case the VM almost certainly isn't broken; dumpxml is just temporarily blocked. I think this is due to libvirt commit 6cc93bf28842526be2fd596a607ebca796b7fb2e, new in libvirt 11.0.0, which makes dumpxml explicitly grab a read lock on the qemu monitor. I think that's going to make these types of failures more common.
> 
> Hmm, I think we might need to reconsider that commit. IMHO it is pretty
> unpleasant for a mere XML query to fail. Returning stale data is preferable
> to hard errors. All data is inherently potentially stale, because it can
> change on the VM before the XML is even transmitted back to the client.
> Stale data should self-correct the next time the VM is queried.

Michal, that's your commit. Thoughts?

commit 6cc93bf28842526be2fd596a607ebca796b7fb2e
Author: Michal Prívozník <mprivozn>
Date:   Wed Dec 11 13:26:45 2024 +0100

    qemu: Grab a QUERY job when formatting domain XML


More context above. From virt-manager's perspective, XMLDesc failing basically always meant the VM or libvirt was catastrophically broken, but with this change it seems apps need to expect that dumpxml may fail while other APIs are running. That might be painful to handle.

Comment 8 Michal Privoznik 2025-06-16 08:45:57 UTC
Yeah, this is a bug in my code. I've posted a patch to the list:

https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/H2F33M63EQOD742GHZZAT7PRJ2GXKCEH/

Comment 9 Michal Privoznik 2025-06-18 09:30:22 UTC
Merged upstream as:

441c23a7e6 qemu: Be more forgiving when acquiring QUERY job when formatting domain XML

v11.4.0-84-g441c23a7e6

Cole, will you handle the Fedora backport please, or do you want me to?

Comment 10 Cole Robinson 2025-06-20 14:38:05 UTC
I'll do a build now!

Comment 11 Jason Tibbitts 2025-06-20 16:05:02 UTC
I can confirm that this appears to fix the issue.  When migrating a VM to a host running the 11.0.0-3.fc42 packages, the VM now appears in virt-manager as "Paused" until the migration completes.  This matches the behavior I'm used to from previous Fedora releases.

Comment 12 Fedora Update System 2025-06-20 17:04:54 UTC
FEDORA-2025-b69d97eee8 (libvirt-11.0.0-3.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-b69d97eee8

Comment 13 Fedora Update System 2025-06-21 03:00:18 UTC
FEDORA-2025-b69d97eee8 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-b69d97eee8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-b69d97eee8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Jason Tibbitts 2025-06-23 22:18:27 UTC
My thanks to everyone for fixing this.

Comment 15 Fedora Update System 2025-06-24 01:44:16 UTC
FEDORA-2025-b69d97eee8 (libvirt-11.0.0-3.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make note of it in this bug report.

