Bug 2218987 - virtnetworkd.service is not triggered by socket after first deactivation
Summary: virtnetworkd.service is not triggered by socket after first deactivation
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 38
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-30 19:45 UTC by sid
Modified: 2023-07-22 20:56 UTC (History)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description sid 2023-06-30 19:45:01 UTC
virtnetworkd.service is triggered by the following sockets:

virtnetworkd.socket
virtnetworkd-admin.socket
virtnetworkd-ro.socket

When a libvirt client (for example, virsh or GNOME Boxes) tries to connect to qemu:///system, the libvirt QEMU daemon sends the appropriate messages to the virtnetworkd sockets above, which triggers virtnetworkd.service to start.

A working virtnetworkd.service is shown below:

# systemctl status virtnetworkd.service
● virtnetworkd.service - Virtualization network daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnetworkd.service; disabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Fri 2023-06-30 19:07:32 UTC; 2min 8s ago
TriggeredBy: ● virtnetworkd.socket
             ● virtnetworkd-admin.socket
             ● virtnetworkd-ro.socket
       Docs: man:virtnetworkd(8)
             https://libvirt.org
   Main PID: 5986 (virtnetworkd)
      Tasks: 21 (limit: 14182)
     Memory: 4.0M
        CPU: 343ms
     CGroup: /system.slice/virtnetworkd.service
             ├─5986 /usr/sbin/virtnetworkd --timeout 120
             ├─6057 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
             └─6058 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

In the above case, the following command succeeds as shown below:

$ virsh -c qemu:///system net-list --all 
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes

As the "--timeout 120" argument above shows, the virtnetworkd daemon has an idle timeout of 2 minutes. If the daemon is idle for more than 2 minutes, the main daemon process (virtnetworkd, PID 5986) exits. This is to conserve system resources.

An exited virtnetworkd daemon is shown below:

# systemctl status virtnetworkd.service
● virtnetworkd.service - Virtualization network daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnetworkd.service; disabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Fri 2023-06-30 19:07:32 UTC; 2min 11s ago
TriggeredBy: ● virtnetworkd.socket
             ● virtnetworkd-admin.socket
             ● virtnetworkd-ro.socket
       Docs: man:virtnetworkd(8)
             https://libvirt.org
    Process: 5986 ExecStart=/usr/sbin/virtnetworkd $VIRTNETWORKD_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 5986 (code=exited, status=0/SUCCESS)
      Tasks: 2 (limit: 14182)
     Memory: 740.0K
        CPU: 349ms
     CGroup: /system.slice/virtnetworkd.service
             ├─6057 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
             └─6058 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

[A] - A client that wants to connect to virtnetworkd.service should be able to communicate with one of the sockets above, either directly or via the libvirt QEMU daemon, and thereby trigger virtnetworkd.service to start again.

However, [A] doesn't work as expected.

In the above state, the following command hangs and has to be terminated as shown below:

$ time virsh -c qemu:///system net-list --all 
^C

real	0m18.874s
user	0m0.021s
sys	0m0.022s
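
A bounded version of the command above avoids the manual Ctrl+C. This is only a sketch: it assumes the coreutils "timeout" utility is installed, and the helper name "probe" and the 20-second deadline are illustrative choices, not part of libvirt or of this report.

```shell
# probe DEADLINE COMMAND [ARGS...]
# Runs COMMAND with a time limit and prints one of:
#   responded    - the command exited with status 0 within the deadline
#   hung         - "timeout" killed the command (exit status 124)
#   failed(N)    - the command exited on its own with non-zero status N
probe() {
    deadline=$1
    shift
    rc=0
    timeout "$deadline" "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo responded ;;
        124) echo hung ;;
        *)   echo "failed($rc)" ;;
    esac
}

# Example on an affected system (not executed here):
#   probe 20 virsh -c qemu:///system net-list --all
```

If the service is in the state described in this report, the example call would be expected to print "hung" instead of blocking the terminal.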

Explanation:
------------

As observed above, virtnetworkd.service is still reported as "active (running)" even after the main daemon process has exited, so [A] does not actually work. Clients wait indefinitely for the virtnetworkd daemon to start, which is why the "virsh" command above has to be terminated with Ctrl+C.

My assumption is that systemd reports virtnetworkd.service as "active (running)" because the 2 dnsmasq processes, which belong to the same cgroup, are still running. As a result, the trigger sockets treat virtnetworkd.service as still active and servicing clients.

Note: These "dnsmasq" processes will never be stopped when the service is stopped ( so running VMs will not be interrupted ).
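
The suspected state (unit reported as active although the main process is gone) could be checked from saved "systemctl show" output with a small helper like the one below. It is a sketch under one unverified assumption: that systemd reports "MainPID=0" once the main process has exited. The function name "unit_state_check" is made up for illustration.

```shell
# Reads "Key=Value" lines (the format printed by "systemctl show") on stdin.
# Prints "stale" when ActiveState is "active" but MainPID is 0,
# otherwise prints "ok".
# ASSUMPTION: MainPID=0 means the main daemon process has exited.
unit_state_check() {
    active=""
    mainpid=""
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState) active=$value ;;
            MainPID)     mainpid=$value ;;
        esac
    done
    if [ "$active" = "active" ] && [ "$mainpid" = "0" ]; then
        echo stale
    else
        echo ok
    fi
}

# Example on a live system (not executed here):
#   systemctl show -p ActiveState -p MainPID virtnetworkd.service | unit_state_check
```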

Workaround:
-----------

# systemctl stop virtnetworkd.service

This stops virtnetworkd.service, so the trigger sockets can start the service correctly again on the next client connection.

Example shown below:

[root@fedora libvirt] # systemctl stop virtnetworkd.service
Warning: Stopping virtnetworkd.service, but it can still be activated by:
  virtnetworkd.socket
  virtnetworkd-admin.socket
  virtnetworkd-ro.socket

[root@fedora libvirt] # systemctl status virtnetworkd.service
○ virtnetworkd.service - Virtualization network daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnetworkd.service; disabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: inactive (dead) since Fri 2023-06-30 19:37:50 UTC; 1s ago
   Duration: 9min 15.479s
TriggeredBy: ● virtnetworkd.socket
             ● virtnetworkd-admin.socket
             ● virtnetworkd-ro.socket
       Docs: man:virtnetworkd(8)
             https://libvirt.org
    Process: 8045 ExecStart=/usr/sbin/virtnetworkd $VIRTNETWORKD_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 8045 (code=exited, status=0/SUCCESS)
      Tasks: 2 (limit: 14182)
     Memory: 748.0K
        CPU: 296ms
     CGroup: /system.slice/virtnetworkd.service
             ├─6057 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
             └─6058 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

Jun 30 19:28:35 fedora dnsmasq-dhcp[6057]: read /var/lib/libvirt/dnsmasq/default.hostsfile

Now the "virsh" command works again:

$ virsh -c qemu:///system net-list --all 
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes
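
The workaround could be applied only when the service looks stuck, for example as sketched below. The helper name "reset_if_stale" is hypothetical, and the check relies on the unverified assumption that systemd reports a MainPID of 0 for a unit whose main process has exited. It must be run as root to have any effect.

```shell
# reset_if_stale UNIT
# Stops UNIT only if systemd considers it active while no main process
# is recorded for it, so that socket activation can start it again.
# ASSUMPTION: "systemctl show -p MainPID --value" prints 0 in that state.
reset_if_stale() {
    unit=$1
    if systemctl is-active --quiet "$unit" &&
       [ "$(systemctl show -p MainPID --value "$unit")" = "0" ]; then
        systemctl stop "$unit"
    fi
}

# Example (as root, not executed here):
#   reset_if_stale virtnetworkd.service
```

Because the stop is guarded, calling the helper while the daemon is running normally leaves the service untouched.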


Reproducible: Always

Steps to Reproduce:
1. Run the following command as a normal user (requires password):

$ virsh -c qemu:///system net-list --all 

2. If the above command succeeds (i.e. prints output within a couple of seconds), exit all libvirt clients such as GNOME Boxes. Do not run any virsh commands.

3. Wait for 2 minutes.

4. Re-run the command from step 1.

5. The command hangs.

Actual Results:
Command hangs indefinitely and needs to be terminated with Ctrl+C.

$ virsh -c qemu:///system net-list --all 
^C
$


Expected Results:  
Command should print some output like:

$ virsh -c qemu:///system net-list --all 
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes

I think this is a regression due to:

https://fedoraproject.org/wiki/Changes/LibvirtModularDaemons

Comment 1 sid 2023-07-22 20:56:31 UTC
I guess this is a systemd bug - https://bugzilla.redhat.com/show_bug.cgi?id=2213660

