Bug 2213398 - virt-manager and virsh are unable to connect to qemu:///system
Summary: virt-manager and virsh are unable to connect to qemu:///system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-08 04:27 UTC by Dominic Tynes
Modified: 2023-07-19 03:14 UTC
CC List: 33 users

Fixed In Version: systemd-253.7-1.fc38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-19 03:14:13 UTC
Type: ---
Embargoed:



Description Dominic Tynes 2023-06-08 04:27:55 UTC
Something times out at some point, leaving virt-manager and virsh unable to connect to qemu:///system; both just hang while trying to connect.

No obvious errors in journalctl.

Restarting libvirtd.service resolves the problem temporarily (the service was already enabled). The issue recurs within a fairly short period, and restarting libvirtd.service resolves it again.
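
For reference, a minimal sketch of that temporary workaround and the journal check (unit name taken from this report; the one-hour window is my own assumption):

    # restart the system libvirt daemon to recover the connection
    sudo systemctl restart libvirtd.service
    # look for related errors around the time of the hang
    sudo journalctl -u libvirtd.service --since "1 hour ago"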

Potentially the same as this issue reported for Arch:
https://github.com/systemd/systemd/issues/27953

Reproducible: Always

Steps to Reproduce:
1. Restart the system.
2. Wait a few minutes.
3. Attempt to run virt-manager or virsh to connect to qemu:///system.
4. Either one hangs while trying to connect.
5. Kill virt-manager (or virsh).
6. systemctl restart libvirtd.service
7. virt-manager and virsh can now connect.
8. Wait some period of time (a few minutes rather than hours, though I'm not sure exactly how long); the problem recurs.
Actual Results:  
virt-manager and virsh are unable to connect to qemu:///system

Expected Results:  
virt-manager and virsh should connect to qemu:///system

This only started occurring with recent updates to F38, within the last week at most.
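
For reference, a minimal shell sketch of steps 3-7 above (connection URI from this report; the 30-second timeout is an assumption):

    # treat a hung connection attempt as a failure after the timeout,
    # then restart the service as in step 6
    timeout 30 virsh -c qemu:///system list --all || sudo systemctl restart libvirtd.service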

Comment 1 Tom London 2023-06-08 17:41:36 UTC
Got this running "virt-manager --debug":

[Thu, 08 Jun 2023 10:40:05 virt-manager 15313] DEBUG (engine:299) Error polling connection qemu:///system
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/engine.py", line 294, in _handle_tick_queue
    conn.tick_from_engine(**kwargs)
  File "/usr/share/virt-manager/virtManager/connection.py", line 1317, in tick_from_engine
    self._tick(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/connection.py", line 1200, in _tick
    self._hostinfo = self._backend.getInfo()
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/libvirt.py", line 4660, in getInfo
    raise libvirtError('virNodeGetInfo() failed')
libvirt.libvirtError: Cannot write data: Broken pipe
[Thu, 08 Jun 2023 10:40:05 virt-manager 15313] DEBUG (connection:838) conn.close() uri=qemu:///system
[Thu, 08 Jun 2023 10:40:05 virt-manager 15313] DEBUG (connection:852) Failed to deregister events in conn cleanup
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/connection.py", line 844, in close
    self._backend.domainEventDeregisterAny(eid)
  File "/usr/lib64/python3.11/site-packages/libvirt.py", line 6020, in domainEventDeregisterAny
    raise libvirtError('virConnectDomainEventDeregisterAny() failed')
libvirt.libvirtError: internal error: client socket is closed
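
For what it's worth, the call that fails above is virNodeGetInfo(); a minimal sketch to poll the same API from the shell and catch the failure (the 30-second interval is an assumption):

    # "virsh nodeinfo" goes through virNodeGetInfo(), the call failing in the traceback
    while virsh -c qemu:///system nodeinfo; do
        sleep 30
    done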

Comment 2 Tom London 2023-06-08 17:46:35 UTC
This appears to happen about the same time:

[11907.381504] ------------[ cut here ]------------
[11907.381509] WARNING: CPU: 2 PID: 12991 at arch/x86/kvm/mmu/mmu.c:7015 kvm_nx_huge_page_recovery_worker+0x3b1/0x3f0 [kvm]
[11907.381686] Modules linked in: vhost_net vhost vhost_iotlb tap tun uinput rfcomm snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security bridge iptable_nat nf_nat stp nf_conntrack llc nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables qrtr bnep snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_bus snd_soc_avs snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi iwlmvm
[11907.381768]  snd_soc_core snd_hda_codec_hdmi snd_compress ac97_bus mac80211 snd_pcm_dmaengine snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel libarc4 sunrpc snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi snd_hda_codec intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_core cfg80211 snd_hwdep snd_seq kvm snd_seq_device mei_hdcp btusb ee1004 mei_pxp iTCO_wdt snd_pcm intel_pmc_bxt iTCO_vendor_support btrtl btbcm btintel btmtk snd_timer irqbypass bluetooth rapl intel_cstate intel_uncore snd mei_me i2c_i801 i2c_smbus soundcore rfkill idma64 mei intel_pch_thermal intel_xhci_usb_role_switch ir_rc6_decoder rc_rc6_mce ite_cir acpi_pad loop zram dm_crypt i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni sdhci_pci polyval_generic cqhci sdhci i2c_algo_bit e1000e ghash_clmulni_intel drm_buddy drm_display_helper mmc_core sha512_ssse3 cec ttm video wmi pinctrl_sunrisepoint scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath fuse
[11907.381868] CPU: 2 PID: 12991 Comm: kvm-nx-lpage-re Tainted: G          I        6.3.5-200.fc38.x86_64 #1
[11907.381873] Hardware name:  /NUC6i5SYB, BIOS SYSKLi35.86A.0073.2020.0909.1625 09/09/2020
[11907.381875] RIP: 0010:kvm_nx_huge_page_recovery_worker+0x3b1/0x3f0 [kvm]
[11907.381983] Code: 48 8b 44 24 30 4c 39 e0 0f 85 fb fd ff ff 48 89 df e8 03 7d fa ff e9 03 fe ff ff 49 bc ff ff ff ff ff ff ff 7f e9 db fc ff ff <0f> 0b e9 04 ff ff ff 48 8b 44 24 40 65 48 2b 04 25 28 00 00 00 75
[11907.381986] RSP: 0018:ffffb397814e7e68 EFLAGS: 00010246
[11907.381990] RAX: 0000000000000000 RBX: ffff9259028e0000 RCX: ffff9257cc64f600
[11907.381993] RDX: 00000000000c0000 RSI: 000000000007fe00 RDI: ffffffffffffff68
[11907.381995] RBP: ffffb397814e9000 R08: ffffb397814e9488 R09: 0000000000000000
[11907.381998] R10: ffff92596f304730 R11: 0000000000000181 R12: ffffb397814e7e98
[11907.382000] R13: 0000000000000000 R14: ffff92596f3047c0 R15: 0000000000000009
[11907.382003] FS:  0000000000000000(0000) GS:ffff925b2ed00000(0000) knlGS:0000000000000000
[11907.382006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11907.382009] CR2: 00007fc0e4002000 CR3: 00000002bdf16004 CR4: 00000000003726e0
[11907.382012] Call Trace:
[11907.382015]  <TASK>
[11907.382017]  ? kvm_nx_huge_page_recovery_worker+0x3b1/0x3f0 [kvm]
[11907.382122]  ? __warn+0x81/0x130
[11907.382131]  ? kvm_nx_huge_page_recovery_worker+0x3b1/0x3f0 [kvm]
[11907.382236]  ? report_bug+0x171/0x1a0
[11907.382245]  ? handle_bug+0x3c/0x80
[11907.382250]  ? exc_invalid_op+0x17/0x70
[11907.382254]  ? asm_exc_invalid_op+0x1a/0x20
[11907.382263]  ? kvm_nx_huge_page_recovery_worker+0x3b1/0x3f0 [kvm]
[11907.382366]  ? kvm_nx_huge_page_recovery_worker+0x305/0x3f0 [kvm]
[11907.382483]  ? __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
[11907.382632]  kvm_vm_worker_thread+0xfc/0x1a0 [kvm]
[11907.382755]  ? __pfx_kvm_vm_worker_thread+0x10/0x10 [kvm]
[11907.382869]  kthread+0xdb/0x110
[11907.382876]  ? __pfx_kthread+0x10/0x10
[11907.382883]  ret_from_fork+0x29/0x50
[11907.382897]  </TASK>
[11907.382899] ---[ end trace 0000000000000000 ]---
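
A quick check (not from this report) for whether the same warning recurs after later kernel updates:

    sudo dmesg | grep -i -A 2 kvm_nx_huge_page_recovery_worker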

Comment 3 Tom London 2023-06-08 18:28:37 UTC
Related?

https://www.spinics.net/lists/kvm/msg316211.html

Comment 4 Michael Riss 2023-06-08 21:26:19 UTC
One problem I see is that you are restarting libvirtd.service, while Fedora 38 has moved over to the modular daemon scheme (https://fedoraproject.org/wiki/Changes/LibvirtModularDaemons). It is now a set of mostly inactive services that are woken up when clients connect to their Unix sockets, which makes systemd start the corresponding service. (Nice idea, I like it.)

Did you upgrade your Fedora system, or did you install it fresh?
If it's an upgraded installation, you might look into how to make the switch to the modular daemons; maybe that helps (see the sketch below).
If it's a fresh installation, especially in combination with the kernel stack trace - uff, no idea.
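
If it helps, a hedged sketch of that switch, with unit names as documented for the libvirt modular daemons (verify against the change page above before running):

    # stop and disable the monolithic daemon and its activation sockets
    sudo systemctl disable --now libvirtd.service libvirtd{,-ro,-admin}.socket
    # enable the per-driver daemons and their activation sockets
    for drv in qemu network nodedev nwfilter secret storage interface; do
        sudo systemctl enable --now virt${drv}d.service virt${drv}d{,-ro,-admin}.socket
    done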

That said, I came across this bug because I found an issue myself (but in user space) and was checking whether it had already been reported. So there are still issues, but I don't see your specific case on my end.

Comment 5 Cristian Ciupitu 2023-06-08 22:23:46 UTC
I seem to have the same issue with:
systemd-253.5-1.fc38.x86_64
libvirt-daemon-driver-qemu-9.2.0-1.fc38.x86_64

I have upgraded from Fedora 37 to 38, but I had no issues until this week.

Comment 6 Michael Riss 2023-06-08 22:26:17 UTC
My bug report is https://bugzilla.redhat.com/show_bug.cgi?id=2213660, in case it is related. The symptoms are the same and the problems emerged at the same time, but the kernel stack trace is different, so I don't know whether these are related.

Comment 7 redhat 2023-06-13 04:21:36 UTC
This also affects me, especially after the laptop has been suspended and resumed. virt-manager refuses to connect, and virt-viewer connections to running (GUI) VMs start locking up after about 10 seconds.

Fedora 38

systemd.x86_64 253.5-1.fc38 updates
libvirt-daemon-driver-qemu.x86_64 9.0.0-3.fc38 (different version than @Cristian Ciupitu - system is up to date)
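
A small sketch (unit names assume the modular scheme from comment 4) to check the daemon/socket state right after a resume:

    systemctl list-units 'libvirtd*' 'virt*d.service' 'virt*d.socket'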

Comment 8 Cristian Ciupitu 2023-06-13 07:34:10 UTC
I forgot to mention that I'm using the Virtualization Preview Repository [1].

[1]: https://fedoraproject.org/wiki/Virtualization_Preview_Repository

Comment 9 Tom London 2023-06-14 15:52:20 UTC
Since recent updates (kernel-6.3.7-200.fc38.x86_64, etc.), the kernel stack trace hasn't occurred.

Issues connecting to qemu:///system remain, however, so the stack trace is likely unrelated...

Comment 10 Denis Fateyev 2023-06-14 16:31:49 UTC
I can confirm that there is no longer a stack trace with kernel-6.3.7-200.fc38.x86_64, but the qemu:///system connection issue persists.
I worked around it by downgrading systemd to 253.2-1.fc38; after that the connection is stable.
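
For reference, a sketch of that downgrade/pin workaround (package version from this comment; the versionlock plugin is an extra assumption and only needed if you want to pin):

    # downgrade, assuming the 253.2-1.fc38 build is still available in the enabled repos
    sudo dnf downgrade systemd-253.2-1.fc38
    # optional: pin the installed version so later upgrades don't pull systemd back up
    sudo dnf install python3-dnf-plugin-versionlock
    sudo dnf versionlock add systemd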

Comment 11 Michael Riss 2023-06-14 18:54:29 UTC
Downgrading is one option, but version pinning can be a bit cumbersome when installing further updates.
The workaround at https://bugzilla.redhat.com/show_bug.cgi?id=2213660#c4 is an alternative that works fine for me.

Comment 12 Bram Mertens 2023-06-19 14:50:03 UTC
I'm experiencing the same issue.
I'm on Fedora 38 CSB, fresh install.

Creating the following file appears to resolve the issue:
# cat /etc/sysconfig/virtnetworkd
  VIRTNETWORKD_ARGS=
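
For anyone else hitting this, a one-line equivalent of that workaround:

    echo 'VIRTNETWORKD_ARGS=' | sudo tee /etc/sysconfig/virtnetworkd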

Comment 13 Zbigniew Jędrzejewski-Szmek 2023-07-11 22:51:32 UTC
https://github.com/systemd/systemd/pull/28000

Comment 14 Fedora Update System 2023-07-17 16:58:11 UTC
FEDORA-2023-b07a6a9665 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-b07a6a9665

Comment 15 Fedora Update System 2023-07-18 01:26:21 UTC
FEDORA-2023-b07a6a9665 has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-b07a6a9665`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-b07a6a9665

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
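
After updating, a quick check that the fixed build landed (version taken from the "Fixed In Version" field above):

    rpm -q systemd    # expect systemd-253.7-1.fc38 for this fix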

Comment 16 Fedora Update System 2023-07-19 03:14:13 UTC
FEDORA-2023-b07a6a9665 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make note of it in this bug report.

