Description of problem: Run this command as *non-root*: guestfish -a /dev/null --network -v -x run If it hangs for about 60 seconds near here: + iface=eth0 + touch /etc/fstab + dhclient --version + dhclient eth0 /usr/sbin/dhclient-script: line 798: cd: /etc/sysconfig/network-scripts: No such file or directory grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory then you have observed the bug. In all the glorious complexity which is libvirt networking I'm not really sure what component is to blame, but somehow libvirt isn't setting up the network correctly. Sometimes reinstalling libvirt or rebooting fixes it for a while. Version-Release number of selected component (if applicable): libvirt-client-3.8.0-1.fc28.x86_64 libvirt-daemon-3.8.0-1.fc28.x86_64 libvirt-daemon-config-network-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-interface-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-libxl-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-lxc-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-network-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-nodedev-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-nwfilter-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-qemu-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-secret-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-core-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-disk-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-gluster-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-iscsi-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-logical-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-mpath-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-rbd-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-scsi-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-sheepdog-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-storage-zfs-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-uml-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-vbox-3.8.0-1.fc28.x86_64 libvirt-daemon-driver-xen-3.8.0-1.fc28.x86_64 libvirt-daemon-kvm-3.8.0-1.fc28.x86_64 libvirt-daemon-qemu-3.8.0-1.fc28.x86_64 libvirt-devel-3.8.0-1.fc28.x86_64 libvirt-glib-1.0.0-2.fc26.x86_64 libvirt-libs-3.8.0-1.fc28.x86_64 libvirt-python-3.2.0-1.fc26.x86_64 libvirt-python3-3.2.0-1.fc26.x86_64 How reproducible: Unknown, but quite a lot on Rawhide. Steps to Reproduce: 1. See description above. Additional info: guestfish full output, generated libvirt XML and generated qemu command line will be attached.
Created attachment 1340100 [details] guestfish.log Notice in the output that eth0 is not initialized in the appliance: + ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group defaul t qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 brd 127.255.255.255 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP gr oup default qlen 1000 link/ether 52:54:00:bb:63:a3 brd ff:ff:ff:ff:ff:ff inet6 fe80::5054:ff:febb:63a3/64 scope link valid_lft forever preferred_lft forever + ip r
Created attachment 1340101 [details] qemu command line
Created attachment 1340103 [details] libvirt XML generated by guestfish
So the host side is connected to the default virtual network. Being unconfigured in the guest presumably means that DHCP failed for some reason. Does 'journalctl -u libvirtd.service' report anything from dnsmasq about DHCP requests/replies ? And are there any logs available from the appliance about its DHCP status ?
Note that we're running everything as non-root, so it should be using the session libvirtd. I ran the guestfish command again and there are some logs generated: Oct 18 11:22:50 trick.home.annexia.org dnsmasq-dhcp[1580]: DHCPDISCOVER(virbr0) 52:54:00:07:8b:4e Oct 18 11:22:50 trick.home.annexia.org dnsmasq-dhcp[1580]: DHCPOFFER(virbr0) 192.168.122.40 52:54:00:07:8b:4e The full log from an earlier run of the appliance is available in comment 1. As part of that it runs commands such as ‘ip a’ and others which show that the network was not brought up properly.
Setting SELinux to Permissive does not fix it, so I guess it's not that.
Note the guest XML has this: <interface type="bridge"> <source bridge="virbr0"/> <model type="virtio"/> </interface> which means session libvirtd will be using QEMU's setuid network helper to create a TAP device attached to the virtual network created by system libvirtd.
Restarting (root) libvirtd and killing (therefore restarting) session libvirtd does not help either.
(In reply to Daniel Berrange from comment #7) > Note the guest XML has this: > > <interface type="bridge"> > <source bridge="virbr0"/> > <model type="virtio"/> > </interface> > > which means session libvirtd will be using QEMU's setuid network helper to > create a TAP device attached to the virtual network created by system > libvirtd. Right, that's intentional. Is there any further debugging we can do of that mechanism? # cat /etc/qemu/bridge.conf allow virbr0
Assuming 'brctl' on the host shows the tap device associated with the virbr0, it should be working - there's no debugging in the setuid helper itself afaik.
Was the output in Comment 11 taken while the guest was *not* running? If so, do it while the guest is running. If the guest was running, then apparently the qemu bridge helper is failing in some way - I see no tap device in the output of "ip a" or attached to virbr0. (the device would be named, e.g. "tap0" rather than "vnet0" since qemu-bridge-helper picks the name). You *might* be able to learn something useful by running qemu-bridge-helper manually...
Correct, output from comment 11 was with the guest not running. With it running the output is below. [BTW I made comment 11 private because I accidentally posted my IPv6 public address. I have excised it from the following] # brctl show ; ip a ; ip r ; virsh net-list --all bridge name bridge id STP enabled interfaces br0 8000.408d5caa0fdb no enp4s0 virbr0 8000.525400cbf757 yes tap0 virbr0-nic 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 40:8d:5c:aa:0f:db brd ff:ff:ff:ff:ff:ff inet 192.168.0.138/24 brd 192.168.0.255 scope global dynamic br0 valid_lft 2442sec preferred_lft 2442sec inet6 ***/64 scope global mngtmpaddr dynamic valid_lft 86348sec preferred_lft 14348sec inet6 fe80::90c1:79ff:feb6:5241/64 scope link valid_lft forever preferred_lft forever 5: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:cb:f7:57 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever 6: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:cb:f7:57 brd ff:ff:ff:ff:ff:ff 7: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000 link/ether 40:8d:5c:aa:0f:db brd ff:ff:ff:ff:ff:ff inet6 fe80::428d:5cff:feaa:fdb/64 scope link valid_lft forever preferred_lft forever 8: enp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 40:8d:5c:aa:0f:dc brd ff:ff:ff:ff:ff:ff 175: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:2a:ab:17:b9:2d brd ff:ff:ff:ff:ff:ff inet6 fe80::fc2a:abff:fe17:b92d/64 scope link valid_lft forever preferred_lft forever default via 192.168.0.1 dev br0 10.0.0.0/8 via 192.168.0.254 dev br0 proto zebra metric 20 169.254.0.0/16 dev br0 scope link metric 1004 192.168.0.0/24 dev br0 proto kernel scope link src 192.168.0.138 192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 Name State Autostart Persistent ---------------------------------------------------------- default active yes yes
Okay, that looks better - the tap device exists and is attached to the bridge at least. I'd still say it must be related to use of the qemu-bridge-helper though. Does the same thing happen if you install the latest upstream libvirt on something older than rawhide? What if you build and install the latest upstream qemu/qemu-bridge-helper on that same system?
Used tcpdump on the tap0 interface and I can see the guest making multiple DHCP requests like the one below, but there are no replies. 08:35:22.072345 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:67:a4:5d (oui Unknown), length 300
I recompiled qemu & qemu-bridge-helper from source (latest git) but it has the same problem, so I guess it's not qemu or any aspect of how qemu is installed. In the process of doing this I found out the command line that libvirt uses when calling qemu-bridge-helper: /usr/libexec/qemu-bridge-helper --use-vnet --br=virbr0 --fd=27 which all seems fine.
Other things to try: * run tcpdump on virbr0 to see if the packets are being received on virbr0 after coming in on tap0 * run a loop of something like: iptables -S -v -Z | grep -e "-c" | grep -v -e "-c 0 0" to look at what iptables rules are being hit while the dhcp requests are being sent. * run "brctl showstp virbr0" to see if the tap0 port is in state "forwarding", or if it's still learning. * Try changing the config of the network to disable stp (rather than just setting the forward delay to 0, which actually sets it to some value a bit higher than 0 due to limits set in the kernel). * set a static IP address in the guest and see if it's able to communicate.
On virbr0 during dhclient hanging: 21:03:43.948674 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 21:03:43.950437 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:03:43.952135 ARP, Request who-has 192.168.122.133 tell trick.home.annexia.org, length 28 21:03:44.004555 IP6 :: > ff02::1:ff13:29e: ICMP6, neighbor solicitation, who has fe80::5054:ff:fe13:29e, length 32 21:03:44.274523 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 21:03:45.005523 ARP, Request who-has 192.168.122.133 tell trick.home.annexia.org, length 28 21:03:45.043216 IP6 fe80::5054:ff:fe13:29e > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 21:03:45.043270 IP6 fe80::5054:ff:fe13:29e > ff02::2: ICMP6, router solicitation, length 16 21:03:45.875166 IP6 fe80::5054:ff:fe13:29e > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 21:03:46.029511 ARP, Request who-has 192.168.122.133 tell trick.home.annexia.org, length 28 21:03:46.206703 IP trick.home.annexia.org > 192.168.122.133: ICMP echo request, id 2389, seq 0, length 28 21:03:46.207575 IP trick.home.annexia.org.bootps > 192.168.122.133.bootpc: BOOTP/DHCP, Reply, length 300 21:03:46.208765 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:03:48.946704 IP6 fe80::5054:ff:fe13:29e > ff02::2: ICMP6, router solicitation, length 16 21:03:51.105656 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:03:51.533508 ARP, Request who-has 192.168.122.133 tell trick.home.annexia.org, length 28 21:03:52.557512 ARP, Request who-has 192.168.122.133 tell trick.home.annexia.org, length 28 21:03:53.581505 ARP, Request who-has 192.168.122.133 tell trick.home.annexia.org, length 28 21:03:56.882484 IP6 fe80::5054:ff:fe13:29e > ff02::2: ICMP6, router solicitation, length 16 21:03:57.486386 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:04:04.843578 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:04:13.266398 IP6 fe80::5054:ff:fe13:29e > ff02::2: ICMP6, router solicitation, length 16 21:04:14.737897 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:04:25.242672 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:04:43.482099 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:13:02:9e (oui Unknown), length 300 21:04:45.522374 IP6 fe80::5054:ff:fe13:29e > ff02::2: ICMP6, router solicitation, length 16 The logs from dnsmasq corresponding to that interaction are: Oct 19 21:03:46 trick.home.annexia.org dnsmasq-dhcp[1580]: DHCPDISCOVER(virbr0) 52:54:00:13:02:9e Oct 19 21:03:46 trick.home.annexia.org dnsmasq-dhcp[1580]: DHCPOFFER(virbr0) 192.168.122.133 52:54:00:13:02:9e It looks as if the DHCP replies (and only the replies, very selectively) get deleted. The host certainly acts as if .122.133 has been assigned, as in it starts to ping it.
There's a lot of noise in the iptables output, but as I understand it the relevant rules that are hit during DHCP negotiation are: -A INPUT -i virbr0 -p udp -m udp --dport 67 -c 1 328 -j ACCEPT -A OUTPUT -o virbr0 -p udp -m udp --dport 68 -c 1 328 -j ACCEPT (Other rules are hit too, but those other rules seem to be hit much of the time) The full list of rules is huge ... # iptables -S -P INPUT ACCEPT -P FORWARD ACCEPT -P OUTPUT ACCEPT -N FORWARD_IN_ZONES -N FORWARD_IN_ZONES_SOURCE -N FORWARD_OUT_ZONES -N FORWARD_OUT_ZONES_SOURCE -N FORWARD_direct -N FWDI_FedoraServer -N FWDI_FedoraServer_allow -N FWDI_FedoraServer_deny -N FWDI_FedoraServer_log -N FWDO_FedoraServer -N FWDO_FedoraServer_allow -N FWDO_FedoraServer_deny -N FWDO_FedoraServer_log -N INPUT_ZONES -N INPUT_ZONES_SOURCE -N INPUT_direct -N IN_FedoraServer -N IN_FedoraServer_allow -N IN_FedoraServer_deny -N IN_FedoraServer_log -N OUTPUT_direct -A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT -A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT -A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT -A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A INPUT -i lo -j ACCEPT -A INPUT -j INPUT_direct -A INPUT -j INPUT_ZONES_SOURCE -A INPUT -j INPUT_ZONES -A INPUT -m conntrack --ctstate INVALID -j DROP -A INPUT -j REJECT --reject-with icmp-host-prohibited -A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT -A FORWARD -i virbr0 -o virbr0 -j ACCEPT -A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable -A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A FORWARD -i lo -j ACCEPT -A FORWARD -j FORWARD_direct -A FORWARD -j FORWARD_IN_ZONES_SOURCE -A FORWARD -j FORWARD_IN_ZONES -A FORWARD -j FORWARD_OUT_ZONES_SOURCE -A FORWARD -j FORWARD_OUT_ZONES -A FORWARD -m conntrack --ctstate INVALID -j DROP -A FORWARD -j REJECT --reject-with icmp-host-prohibited -A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT -A OUTPUT -j OUTPUT_direct -A FORWARD_IN_ZONES -i br0 -j FWDI_FedoraServer -A FORWARD_IN_ZONES -j FWDI_FedoraServer -A FORWARD_OUT_ZONES -o br0 -j FWDO_FedoraServer -A FORWARD_OUT_ZONES -j FWDO_FedoraServer -A FWDI_FedoraServer -j FWDI_FedoraServer_log -A FWDI_FedoraServer -j FWDI_FedoraServer_deny -A FWDI_FedoraServer -j FWDI_FedoraServer_allow -A FWDI_FedoraServer -p icmp -j ACCEPT -A FWDO_FedoraServer -j FWDO_FedoraServer_log -A FWDO_FedoraServer -j FWDO_FedoraServer_deny -A FWDO_FedoraServer -j FWDO_FedoraServer_allow -A INPUT_ZONES -i br0 -j IN_FedoraServer -A INPUT_ZONES -j IN_FedoraServer -A IN_FedoraServer -j IN_FedoraServer_log -A IN_FedoraServer -j IN_FedoraServer_deny -A IN_FedoraServer -j IN_FedoraServer_allow -A IN_FedoraServer -p icmp -j ACCEPT -A IN_FedoraServer_allow -p tcp -m tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT -A IN_FedoraServer_allow -p udp -m udp --dport 520 -m conntrack --ctstate NEW -j ACCEPT
tap0 is state "forwarding" the whole time. virbr0 bridge id 8000.525400cbf757 designated root 8000.525400cbf757 root port 0 path cost 0 max age 20.00 bridge max age 20.00 hello time 2.00 bridge hello time 2.00 forward delay 2.00 bridge forward delay 2.00 ageing time 300.00 hello timer 0.31 tcn timer 0.00 topology change timer 0.00 gc timer 23.81 flags tap0 (2) port id 8002 state forwarding designated root 8000.525400cbf757 path cost 100 designated bridge 8000.525400cbf757 message age timer 0.00 designated port 8002 forward delay timer 0.00 designated cost 0 hold timer 0.00 flags virbr0-nic (1) port id 8001 state disabled designated root 8000.525400cbf757 path cost 100 designated bridge 8000.525400cbf757 message age timer 0.00 designated port 8001 forward delay timer 0.00 designated cost 0 hold timer 0.00 flags
(In reply to Laine Stump from comment #17) > * Try changing the config of the network to disable stp (rather than just > setting the forward delay to 0, which actually sets it to some value a bit > higher than 0 due to limits set in the kernel). Assuming that means: # echo 0 > /sys/class/net/virbr0/bridge/stp_state then no, it does not fix the problem. > * set a static IP address in the guest and see if it's able to communicate. Yes the network works if I set a static IP address (only able to test simple HTTP connections).
About the tcpdump output in comment 18 - I don't see the DHCP Offer ever going on the line. Also, There are several ARP requests for 192.168.122.133 but no replies, and then it begins sending ICMP packets anyway. Something is messed up there. The ARP requests *could* be dhcp checking to see if the address it's about to give out is already allocated to somebody (or maybe not). This smells like a *very* old (RHEL6 host with RHEL5 guest) bug where the checksums on DHCP packets were incorrect because DHCP was bypassing part of the network stack, and the virtio drivers (guest or host I forget which) were also bypassing some stuff and the result was that the packets were never checksummed. Based on that experience, I would suggest setting the guest to use a different ethernet (maybe e1000) and see if it begins to work. Beyond that, you *might* get some more useful information by adding -vv to tcpdump to see exactly what's being sent in the DHCP offer - is the guest suggesting 192.168.122.133? It's been too long since I messed around with DHCP, but I seem to remember that changes behavior (like maybe the response isn't a broadcast packet or something? I forget now).
Here's a DHCP *request* (not offer) on virbr0: 08:41:44.344170 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 52:54:00:c6:6a:3d (oui Unknown), length 300, xid 0x965e8e3a, secs 59, Flags [none] (0x0000) Client-Ethernet-Address 52:54:00:c6:6a:3d (oui Unknown) Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Discover Parameter-Request Option 55, length 13: Subnet-Mask, BR, Time-Zone, Classless-Static-Route Domain-Name, Domain-Name-Server, Hostname, YD YS, NTP, MTU, Option 119 Default-Gateway Client-ID Option 61, length 23: hardware-type 255, 00:c6:6a:3d:00:04:10:97:dd:7e:c5:83:48:ae:94:04:52:9f:4d:85:57:f3 As you say, no DHCP offers have been seen on any interface so I can't really say more about what's in them. I wonder if dnsmasq is unable to call send(2) ...
Did you notice how although the guest sends out several DHCP requests, dnsmasq was only printing a single pair of log messages for DHCPDISCOVER/ DHCPOFFER? Well it turns out that dnsmasq only responds to the first request. It sends out the reply: 08:52:08 sendmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("192.168.122.98")}, msg_namelen=16, msg_iov=[{iov_base="\2\1\6\0\33\305&\0\0\0\0\0\0\0\0\0\300\250zb\300\250z\1\0\0\0\0RT\0\307\354L\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0c\202Sc5\1\0026\4\300\250z\0013\4\0\0\16\20:\4\0\0\7\10;\4\0\0\fN\1\4\377\377\377\0\34\4\300\250z\377\3\4\300\250z\1\6\4\300\250z\1\377\0\0\0\0\0\0\0\0", iov_len=300}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("virbr0"), ipi_spec_dst=inet_addr("0.0.0.0"), ipi_addr=inet_addr("255.255.255.255")}}], msg_controllen=28, msg_flags=0}, 0) = 300 and then it waits the rest of the time in poll. It's even polling fd=3 which is supposed to be listening on virbr0 port 68, but nothing else is received: 08:52:08 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLIN}]) 08:52:53 fcntl(4, F_GETFL) = 0x2 (flags O_RDWR) ^^ NB about 45 seconds later
Upgrading dnsmasq (to dnsmasq-2.78-1.fc28.x86_64) and restarting the libvirt default network did not help. I verified that dnsmasq was restarted. This is looking like a kernel bug.
Created attachment 1341159 [details] strace of dnsmasq during hang Full strace of dnsmasq during the whole process. Note the times in the left hand column, and lines matching \b68\b which contain the received and sent packets.
Upgraded 4.14.0-0.rc3.git3.1.fc28.x86_64 -> 4.14.0-0.rc5.git2.1.fc28 It is now working, although since I had to reboot to upgrade the kernel, and since rebooting has sometimes been observed to "fix" this before, I'm not going to draw any firm conclusions until the machine has been running for a while.
Hey Rich - have you ever again seen this behavior? If not, I think 2 years is long enough to safely close this BZ :-)
While I do have problems with libvirt networking on new installed (usually because the right libvirt-<whatever>-network package isn't installed) I have not seen this particular problem for a while so I will close this.