Bug 2177547 - libvirt-daemon dies after being idle
Summary: libvirt-daemon dies after being idle
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 37
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-12 20:57 UTC by Todd
Modified: 2024-01-12 23:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-01-12 23:01:59 UTC
Type: Bug
Embargoed:



Description Todd 2023-03-12 20:57:07 UTC
Fedora 37
libvirt-daemon-8.6.0-3.fc37.x86_64

Since the last round of updates, libvirt-daemon will die on me.

The symptom is that all of the virtual machines (VMs) are missing from virt-manager. Or, if you leave virt-manager open and click on one of them, you get a prompt suggesting that you may have misspelled the VM name.  Scary until you realize what is going on.

The solution is to restart libvirtd:
   # systemctl restart libvirtd

Then restart virt-manager.  Then it will work for several hours.  Open VMs continue to work regardless of libvirtd's condition, but you can't stop and restart them.

Here is my status of a crashed libvirtd:
# systemctl status libvirtd

○ libvirtd.service - Virtualization daemon
     Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; preset: disabled)
     Active: inactive (dead) since Sat 2023-03-11 11:14:09 PST; 14h ago
   Duration: 2min 50ms
TriggeredBy: ○ libvirtd.socket
             ○ libvirtd-tls.socket
             ○ libvirtd-tcp.socket
             ○ libvirtd-ro.socket
             ○ libvirtd-admin.socket
       Docs: man:libvirtd(8)
             https://libvirt.org
    Process: 36575 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 36575 (code=exited, status=0/SUCCESS)
      Tasks: 2 (limit: 32768)
     Memory: 21.8M
        CPU: 528ms
     CGroup: /system.slice/libvirtd.service
             ├─36701 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
             └─36702 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

Mar 11 11:12:21 rn6.acme.local dnsmasq[36701]: using nameserver 127.0.0.1#53
Mar 11 11:12:21 rn6.acme.local dnsmasq[36701]: reading /etc/resolv.conf
Mar 11 11:12:21 rn6.acme.local dnsmasq[36701]: using nameserver 127.0.0.1#53
Mar 11 11:12:21 rn6.acme.local dnsmasq[36701]: reading /etc/resolv.conf
Mar 11 11:12:21 rn6.acme.local dnsmasq[36701]: using nameserver 127.0.0.1#53
Mar 11 11:14:09 rn6.acme.local systemd[1]: libvirtd.service: Deactivated successfully.
Mar 11 11:14:09 rn6.acme.local systemd[1]: libvirtd.service: Unit process 36701 (dnsmasq) remains running after unit stopped.
Mar 11 11:14:09 rn6.acme.local systemd[1]: libvirtd.service: Unit process 36702 (dnsmasq) remains running after unit stopped.
Mar 11 11:14:19 rn6.acme.local dnsmasq[36701]: reading /etc/resolv.conf
Mar 11 11:14:19 rn6.acme.local dnsmasq[36701]: using nameserver 127.0.0.1#53
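
A note on reading the status above: `status=0/SUCCESS` together with "Deactivated successfully" looks more like a clean shutdown than a crash (a crash would normally show up as `code=killed` or `code=dumped`). A minimal sketch of that distinction follows. `classify_exit` is a hypothetical helper written for this report, not part of systemd or libvirt, and it only inspects the text of the "Main PID:" line.

```shell
#!/bin/sh
# Classify how a unit's main process ended, based on the "Main PID:" line
# printed by `systemctl status`. Hypothetical helper for illustration only.
classify_exit() {
    case "$1" in
        *"code=exited, status=0/SUCCESS"*) echo "clean-exit" ;;
        *"code=exited"*)                   echo "error-exit" ;;
        *"code=killed"*|*"code=dumped"*)   echo "crashed" ;;
        *)                                 echo "unknown" ;;
    esac
}

# The line from the status output in this report:
classify_exit "Main PID: 36575 (code=exited, status=0/SUCCESS)"   # prints: clean-exit
```

On a live system the line could be fed in from `systemctl status libvirtd`; the canned string is used here so the sketch runs anywhere.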

Comment 1 Todd 2023-04-07 21:44:39 UTC
Hmmm.  Posts other than the original post here seem to disappear.  Is that on purpose?

Just posted this upstream:  https://gitlab.com/libvirt/libvirt/-/issues/461
Please do not remove this from the bug report.

Comment 2 Todd 2023-04-22 06:32:57 UTC
Hi All,

The guys upstream figured this out.  And it is not their problem either.  It is specifically a problem with the Fedora RPM of qemu-kvm.

To quote upstream:

>  libvirtd not running at start of the host is not a problem though, actually it's even intended by how it's started if it gets activated by the appropriate socket via systemd.

>  On modern distros, which includes the Fedora you are running, libvirtd is no longer the default setup, but rather the separate modular daemons such as virtqemud. This means that you either kept upgrading your Fedora [I did since Fedora 20] from before modular daemons became the default (Fedora 35 IIRC) or you tried to change back to the monolithic daemon yourself.

> More on the modular daemons, the reason behind them and also a guide how to switch to modular daemons (which can be used, in reverse, as a guide to switch back to libvirtd): https://www.libvirt.org/daemons.html#switching-to-modular-daemons


So what needs fixing in the qemu-kvm RPM is as follows:

1)  A dependency on virtproxyd needs to be added to the specfile for both new installs and upgrades

2)  The post-install script needs to disable libvirtd and enable everything that is needed for virtqemud.  This needs to trigger for both new installs and upgrades.  Here are the virtqemud changeover instructions:

      https://www.libvirt.org/daemons.html#switching-to-modular-daemons
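
For illustration, here is roughly what such a changeover could look like as a script. This is a sketch loosely following the guide linked above, not what the Fedora spec file actually does. It is a dry run by default and only prints the systemctl commands. The `APPLY` switch, the `run` helper and the exact daemon list are assumptions of this sketch; the linked page is the authority on the real steps.

```shell
#!/bin/sh
# Sketch of a libvirtd -> modular daemon changeover.
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 (as root) to actually run them.

run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

switch_to_modular() {
    # Stop and disable the monolithic daemon and its sockets.
    run systemctl stop libvirtd.service
    for s in "" -ro -admin -tcp -tls; do
        run systemctl stop "libvirtd${s}.socket"
        run systemctl disable "libvirtd${s}.socket"
    done
    run systemctl disable libvirtd.service

    # Enable the per-driver daemons plus virtproxyd, and start their sockets
    # so that each daemon is activated on first use.
    for d in virtqemud virtinterfaced virtnetworkd virtnodedevd \
             virtnwfilterd virtsecretd virtstoraged virtproxyd; do
        run systemctl unmask "${d}.service"
        run systemctl enable "${d}.service"
        for s in "" -ro -admin; do
            run systemctl unmask "${d}${s}.socket"
            run systemctl enable "${d}${s}.socket"
            run systemctl start "${d}${s}.socket"
        done
    done
}

switch_to_modular
```

Running it as-is shows the full command list for review; nothing is changed unless APPLY=1 is set.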

Many thanks,
-T

Comment 3 Todd 2023-04-25 00:05:21 UTC
Are you guys trying to kill me?   The update also knocked out my br0 bridge taking down all my qemu-kvm virtual machines.  Please find a way to stop doing this.

Here are my notes on restoring br0:

How to add or restore a bridge (br0) when knocked out by an upgrade:

Reference:
   https://www.thegeeksearch.com/how-to-configure-network-bridge-in-centos-rhel-7-using-nmcli-command/

# ip link show
3: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:62:10:06 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6

# nmcli con add type bridge con-name br0 ifname br0
Connection 'br0' (1ec4bd21-d652-40cf-8de9-392e35ce7445) successfully added.

# nmcli con add type bridge-slave con-name br0-port1 ifname enp0s31f6 master br0
Connection 'br0-port1' (e5c19095-d8a8-4f76-a8bd-fa8b11d19218) successfully added.

Note: enp0s31f6 is the "altname" of eno1 shown in the interface listing above

# brctl show
bridge name	bridge id		STP enabled	interfaces
br0		8000.fe627ffe1fdf	yes		
virbr0		8000.52540085fab2	yes	

# systemctl restart systemd-networkd.service

Comment 4 Todd 2023-04-25 03:03:15 UTC
My updated, better notes on the broken bridge:

How to add or restore a bridge (br0) when knocked out by an upgrade:

Reference:
   https://www.thegeeksearch.com/how-to-configure-network-bridge-in-centos-rhel-7-using-nmcli-command/
   https://www.thegeekdiary.com/how-to-create-a-bridge-interface-using-nmcli-in-centos-rhel-7-and-8/

Note: qemu-kvm virtual machines
    After restoring, if br0 will not pass traffic in your virtual machines (VMs),
    go into virt-manager and set the network interface to something else, then set
    it back to br0


Note: to remove a bridge:
   # nmcli con show

   # ip link set br0 down
   # brctl delbr br0
   # nmcli connection delete br0
   # nmcli connection delete br0-port1
   # nmcli connection delete bridge-br0

   Test whether you got them all with:
   # nmcli con show
   # brctl show
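
The check in the last step can be scripted. A minimal sketch, assuming the three profile names used in these notes (br0, br0-port1, bridge-br0): `leftover_bridge_profiles` is a hypothetical helper that reads connection names on stdin, one per line, such as the output of `nmcli -t -f NAME con show`, and prints any that should have been deleted.

```shell
#!/bin/sh
# Print any of the bridge-related profile names that are still present.
# Input: one connection name per line on stdin.
leftover_bridge_profiles() {
    while IFS= read -r name; do
        case "$name" in
            br0|br0-port1|bridge-br0) echo "$name" ;;
        esac
    done
}

# Example with canned input instead of a live nmcli call:
printf '%s\n' 'Wired connection 1' 'virbr0' 'bridge-br0' | leftover_bridge_profiles
# prints: bridge-br0
```

Empty output means none of the three profiles is left.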




1. Get a list of the system’s active network connections:

# nmcli conn show --active
NAME                UUID                                  TYPE      DEVICE 
Wired connection 1  3bbdbba2-be96-3a0f-b0d5-b8a34912e7b7  ethernet  eno1   
virbr0              5ed314a7-f7d5-4719-9f8a-e144718b6288  bridge    virbr

Note: plug something into eno1, such as the router, to
activate it and do a
    # nmcli device up eno1



2. Next, create a network bridge by typing:

# nmcli conn add type bridge con-name br0 ifname br0
NAME                UUID                                  TYPE      DEVICE 
Wired connection 1  3bbdbba2-be96-3a0f-b0d5-b8a34912e7b7  ethernet  eno1   
virbr0              5ed314a7-f7d5-4719-9f8a-e144718b6288  bridge    virbr


3. Next, set a static IPv4 address for the bridge network:

Do not set a gateway

# nmcli conn mod br0 ipv4.address '192.168.255.10/24'
########## nmcli conn mod br0 ipv4.gateway '192.168.250.1'
# nmcli conn mod br0 ipv4.method manual


4. Now, add the ethernet interface, eno1, to the bridge, br0, connection:

# nmcli conn add type ethernet slave-type bridge con-name bridge-br0 ifname eno1 master br0
Connection 'bridge-br0' (cb23d052-f04d-4e0b-ae21-18cbc67f1bd6) successfully added.


5. Activate the bridge connection:

# nmcli conn up br0
Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/25)


6. Deactivate the ethernet interface, eno1:

# nmcli conn down eno1
Error: 'eno1' is not an active connection.
Error: no active connection provided.


7. Get a list of the active network connections:

# nmcli conn show --active


8. Display the current bridge port configuration and flags:

# bridge link show
<nothing>

9. Display the new network bridge interface:
# nmcli device up br0
Device 'br0' successfully activated with '8d395a48-3de6-40e8-ad3a-c57aa5f35315'.

# nmcli conn up bridge-br0
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/28)


# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether 52:12:34:56:78:5d brd ff:ff:ff:ff:ff:ff
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:12:34:56:78:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.xxx.yy/27 brd 192.168.122.31 scope global noprefixroute br0
       valid_lft forever preferred_lft forever


10. More info:
# nmcli con show br0
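
The bridge creation steps above (2 to 5, plus the port activation from step 9) can be collected into one script. This is only a sketch under the values used in these notes. `BRIDGE`, `PORT_IF`, `ADDR`, `APPLY` and the `run` helper are names made up for it, and it is a dry run by default that prints the nmcli commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch of the bridge setup: prints the nmcli commands unless
# APPLY=1 is set. Defaults mirror the values used in the notes.
BRIDGE="${BRIDGE:-br0}"
PORT_IF="${PORT_IF:-eno1}"
ADDR="${ADDR:-192.168.255.10/24}"

run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

create_bridge() {
    # Step 2: create the bridge profile.
    run nmcli con add type bridge con-name "$BRIDGE" ifname "$BRIDGE"
    # Step 3: static IPv4 address, no gateway; address and method are set
    # in a single call.
    run nmcli con mod "$BRIDGE" ipv4.addresses "$ADDR" ipv4.method manual
    # Step 4: attach the ethernet interface as a bridge port.
    run nmcli con add type ethernet slave-type bridge \
        con-name "bridge-$BRIDGE" ifname "$PORT_IF" master "$BRIDGE"
    # Steps 5 and 9: bring up the bridge, then its port.
    run nmcli con up "$BRIDGE"
    run nmcli con up "bridge-$BRIDGE"
}

create_bridge
```

Run it as-is to preview the commands, or with APPLY=1 as root to execute them.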

Comment 5 Laine Stump 2023-04-25 14:14:59 UTC
libvirt doesn't do anything to/with host bridges created outside of libvirt (as yours was) other than attach guest tap devices to them. If something during an upgrade "knocked out" your bridge (not sure of the definition of "knocked out", but I guess the configuration was changed?) then you'll need to look towards NetworkManager (a first guess) to find the cause of the trouble, since libvirt wouldn't be modifying the host bridge configuration.

(Also, in general please file a new BZ for a different issue rather than tacking onto the end of an existing but non-related BZ. It makes it easier to redirect just that issue to somewhere else, and also avoids muddying any discussion of the other problem.)

Comment 6 Aoife Moloney 2023-11-23 01:26:25 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 7 Aoife Moloney 2024-01-12 23:01:59 UTC
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 8 Todd 2024-01-12 23:12:15 UTC
The issue was corrected in both FC38 and FC39.  Thank you!

