In the last two months we have started to see a fair number of virtinterfaced crashes [1] in cockpit-machines tests on Fedora 38 and 39 (we only just started testing on 40, but they likely happen there as well). As cockpit-machines doesn't talk to virtinterfaced directly, the crash doesn't actually break the UI, but the tests also watch out for unexpected journal messages such as crashes.

One of the two common libvirt crashes happens in e.g. [2][3][4][5]. All of these have "journal" links to the full systemd journals. The stack trace of the relevant thread looks like this, with udevInterfaceLookupByMACString() being the key part:

Stack trace of thread 2381:
#0  0x00007f7b8a78e834 __pthread_kill_implementation (libc.so.6 + 0x90834)
#1  0x00007f7b8a73c8ee raise (libc.so.6 + 0x3e8ee)
#2  0x00007f7b8a7248ff abort (libc.so.6 + 0x268ff)
#3  0x00007f7b8a7257d0 __libc_message.cold (libc.so.6 + 0x277d0)
#4  0x00007f7b8a7987a5 malloc_printerr (libc.so.6 + 0x9a7a5)
#5  0x00007f7b8a79bdac _int_malloc (libc.so.6 + 0x9ddac)
#6  0x00007f7b8a79cd24 __libc_malloc (libc.so.6 + 0x9ed24)
#7  0x00007f7b890c51fb strv_new_ap (libudev.so.1 + 0x151fb)
#8  0x00007f7b890d0001 strv_new_internal (libudev.so.1 + 0x20001)
#9  0x00007f7b890bac50 update_match_strv (libudev.so.1 + 0xac50)
#10 0x00007f7b890bddf0 udev_enumerate_add_match_sysattr (libudev.so.1 + 0xddf0)
#11 0x00007f7b890eb634 udevInterfaceLookupByMACString (libvirt_driver_interface.so + 0x3634)
#12 0x00007f7b8af5ce17 virInterfaceLookupByMACString (libvirt.so.0 + 0x35ce17)
#13 0x000056105558eaad remoteDispatchInterfaceLookupByMACStringHelper.lto_priv.0 (virtinterfaced + 0x57aad)
#14 0x00007f7b8ae280c5 virNetServerProgramDispatch (libvirt.so.0 + 0x2280c5)
#15 0x00007f7b8ae2eb23 virNetServerProcessMsg (libvirt.so.0 + 0x22eb23)
#16 0x00007f7b8ae2ec36 virNetServerHandleJob (libvirt.so.0 + 0x22ec36)
#17 0x00007f7b8ad66c45 virThreadPoolWorker (libvirt.so.0 + 0x166c45)
#18 0x00007f7b8ad65e00 virThreadHelper (libvirt.so.0 + 0x165e00)
#19 0x00007f7b8a78c897 start_thread (libc.so.6 + 0x8e897)
#20 0x00007f7b8a81380c __clone3 (libc.so.6 + 0x11580c)

This seems to be some kind of memory corruption. I've seen three different journal messages for that:

virtinterfaced[1858]: Assertion '*q > 0' failed at src/libudev/libudev.c:91, function udev_ref(). Aborting.
virtinterfaced[2379]: malloc(): unaligned fastbin chunk detected
abrt-notification[25240]: Process 24697 (virtinterfaced) crashed in __libc_calloc()

[1] https://github.com/cockpit-project/cockpit-machines/issues/1391#issuecomment-1963676507
[2] https://cockpit-logs.us-east-1.linodeobjects.com/pull-1432-20240208-104519-fcd215b4-fedora-38/log.html#57
[3] https://cockpit-logs.us-east-1.linodeobjects.com/pull-5903-20240212-230638-e54961ef-fedora-39-cockpit-project-cockpit-machines/log.html#9
[4] https://cockpit-logs.us-east-1.linodeobjects.com/pull-1399-20240124-130142-e14a91d3-fedora-38/log.html#80
[5] https://cockpit-logs.us-east-1.linodeobjects.com/pull-1429-20240208-095258-546ddf20-fedora-39-firefox/log.html#61

Reproducible: Couldn't Reproduce

Steps to Reproduce:
Unfortunately there is no direct reproducer. It happens sporadically and across all tests (i.e. not just one or two specific ones), which is consistent with the tests not asserting anything about virtinterfaced directly.

libvirt-9.7.0-2.fc39.src.rpm
> I've seen three different journal messages for that:

I meant that it crashes with *one* of the three messages, not all three of them at the same time.
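To tell which of the three messages a given journal contains, something like the following could be run over a downloaded journal. This is only a rough sketch: the function name and the short labels are made up for illustration, and the patterns are simply the three messages quoted in this report.

```shell
# Hypothetical helper: read journal text on stdin and print a label for each
# of the three crash messages quoted in this report that it contains.
# Returns 0 if at least one message was found, 1 otherwise.
classify_virtinterfaced_crash() {
    # Only look at lines that mention virtinterfaced at all.
    journal=$(grep -F "virtinterfaced")
    status=1
    if printf '%s\n' "$journal" | grep -qF "Assertion '*q > 0' failed at src/libudev/libudev.c"; then
        echo "udev_ref-assertion"
        status=0
    fi
    if printf '%s\n' "$journal" | grep -qF "malloc(): unaligned fastbin chunk detected"; then
        echo "malloc-fastbin"
        status=0
    fi
    if printf '%s\n' "$journal" | grep -qF "(virtinterfaced) crashed in __libc_calloc"; then
        echo "calloc-crash"
        status=0
    fi
    return $status
}
```

On a test machine this could be fed from the live journal, e.g. `journalctl -b --no-pager | classify_virtinterfaced_crash`.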
Strange. I don't understand what would be calling into virtinterfaced in the first place. virt-install/virt-xml haven't used the interface APIs for a long time, and the libvirt qemu driver doesn't call into it in any way AFAICT. Upstream, the driver has seen practically nothing but cleanups/refactorings for years; it's pretty much a historical relic.

It would be interesting to try your CI with the daemon removed or the socket disabled and see if anything breaks. Maybe I missed something, or some other virt tool is waking it up.
Just occurred to me to check libvirt-dbus, I'm guessing that's the virtinterfaced client
I tested that in [1] with

systemctl disable --now virtinterfaced.service virtinterfaced-admin.socket virtinterfaced-ro.socket virtinterfaced.socket

and it does fail [2] a lot. There are some UI-ish bits where it shows a warning like

warning: getAllInterfaces action failed: Failed to connect socket to '/var/run/libvirt/virtinterfaced-sock': No such file or directory

(that happens through libvirt-dbus org.libvirt.Connect.ListInterfaces()), but it even fails in some CLI parts which call `virsh iface-list`, which probably amounts to the same thing:

error: Failed to list interfaces
error: Failed to connect socket to '/var/run/libvirt/virtinterfaced-sock': No such file or directory

So it seems it is needed after all.

[1] https://github.com/cockpit-project/cockpit-machines/pull/1475
[2] https://cockpit-logs.us-east-1.linodeobjects.com/pull-1475-20240301-143427-cc99a46e-fedora-39/log.html#65-2
Adding the 'Security' keyword because this is a daemon crash that a read-only connection could cause.
Personally I don't know any tricks here. I've never had success catching heisenbugs unless I could reproduce them locally under gdb.

Any idea when these issues first cropped up? Are these using stock libvirt distro packages, or updated bits from virt-preview? FWIW, libvirt in F38 has only been updated twice in the past year (May 2023 and Jan 2024).

Back on the topic of cockpit and the interface driver. Do you know where the `virsh iface-list` call is coming from? IMO any practical use of the interface API udev driver can be replaced with the nodedev APIs, which are much more widely used and still actively developed. Basically, `virsh iface-list` is similar to `virsh nodedev-list --cap net`. The XML output is different, but it provides the main important info like type, name, and MAC address.
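As a rough illustration of the nodedev route, the interface name and MAC address can be pulled out of the XML that `virsh nodedev-dumpxml` prints for a net-capable device. This is a sketch only: the helper name is made up, it does not report the interface type, and the sed-based parsing assumes the `<interface>` and `<address>` elements each sit on their own line, in that order, and occur once per device.

```shell
# Hypothetical helper: read nodedev XML for one net device on stdin and print
# "<interface> <mac>" on a single line.
# Assumption: <interface> and <address> each appear once, one per line.
nodedev_net_summary() {
    sed -n \
        -e 's:.*<interface>\(.*\)</interface>.*:\1:p' \
        -e 's:.*<address>\(.*\)</address>.*:\1:p' | paste -sd ' ' -
}

# Intended usage on a host with libvirt installed (not run here):
#   for dev in $(virsh nodedev-list --cap net); do
#       virsh nodedev-dumpxml "$dev" | nodedev_net_summary
#   done
```

A real replacement in a test suite would probably want a proper XML parser rather than sed, but this shows that the same name/MAC information is reachable without going through virtinterfaced.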
Cannot reproduce with virsh commands.

Pkgs:
libvirt-9.7.0-2.fc39.x86_64
qemu-kvm-8.1.3-4.fc39.x86_64

Steps:
1. Run a virsh command that uses virInterfaceLookupByMACString:
# virsh iface-name 34:48:ed:f8:d9:20
eno1

Check the libvirtd (virtqemud) log:
2024-03-04 15:13:34.490+0000: 26505: debug : virInterfaceLookupByMACString:340 : conn=0x7f4f8c003280, macstr=34:48:ed:f8:d9:20

Loop more times:
# for i in $(seq 1 1000); do virsh iface-name 34:48:ed:f8:d9:20; done
eno1
...

Still no coredumps:
# coredumpctl list
No coredumps found.

Try an invalid MAC or the lo MAC; still no coredumps:
# for i in $(seq 1 1000); do virsh iface-name 12345-invalid-mac; done
error: failed to get interface '12345-invalid-mac'
...
# for i in $(seq 1 1000); do virsh iface-name 00:00:00:00:00:00; done
lo
...
# coredumpctl list
No coredumps found.
> Any idea when these issues first cropped up?

Sorry, no. They happen very rarely, so they may have existed for a long time already (which would fit your "hasn't been updated often in F38").

> Are these using stock libvirt distro packages, or updated bits from virt-preview?

Stock distro packages.

> Back on the topic of cockpit and the interface driver. Do you know where the `virsh iface-list` call is coming from?

That literal call comes from just one place in our tests (testNetworksCreate), which didn't fail in any of the examples above. I'm happy to change it if it's deprecated, of course. The rest happens through libvirt-dbus.

So yeah, I suppose this bz isn't of much value, so if it doesn't ring a bell on your side ("I've seen this traceback $here"), then feel free to close.
Last night our test run [1] found a variation:

#0 0x00007fa866c84202 udevConnectListAllInterfaces (libvirt_driver_interface.so + 0x3202)
#1 0x00007fa868b5c578 virConnectListAllInterfaces (libvirt.so.0 + 0x35c578)
#2 0x0000558bd34751ad remoteDispatchConnectListAllInterfacesHelper (virtinterfaced + 0x331ad)
#3 0x00007fa868a280c5 virNetServerProgramDispatch (libvirt.so.0 + 0x2280c5)
#4 0x00007fa868a2eb23 virNetServerProcessMsg (libvirt.so.0 + 0x22eb23)
#5 0x00007fa868a2ec36 virNetServerHandleJob (libvirt.so.0 + 0x22ec36)
#6 0x00007fa868966c45 virThreadPoolWorker (libvirt.so.0 + 0x166c45)
#7 0x00007fa868965e00 virThreadHelper (libvirt.so.0 + 0x165e00)
#8 0x00007fa8682d8897 start_thread (libc.so.6 + 0x8e897)
#9 0x00007fa86835f80c __clone3 (libc.so.6 + 0x11580c)

But presumably it has the same root cause, so I'm not filing yet another bz.

[1] https://cockpit-logs.us-east-1.linodeobjects.com/pull-0-669ecd3a-20240312-014207-fedora-39-updates-testing/log.html#25
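For grouping traces like the two in this report, one option is to key each trace on its innermost libvirt frame. The snippet below is a sketch under assumptions: the function name is made up, and it expects bare frame lines in the "#N address symbol (module + offset)" layout shown here, without a journal timestamp/host prefix in front.

```shell
# Hypothetical helper: read a stack trace on stdin and print the symbol of the
# first (innermost) frame whose module name starts with "libvirt" or "virt",
# skipping libc/libudev frames above it.
first_libvirt_frame() {
    awk '$1 ~ /^#[0-9]+$/ && $4 ~ /^\((libvirt|virt)/ { print $3; exit }'
}
```

Applied to the frames quoted in this report, it should yield udevInterfaceLookupByMACString for the original trace and udevConnectListAllInterfaces for the variation, which makes it easy to see that both end up in libvirt_driver_interface.so.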
FEDORA-2024-1a59230214 (libvirt-9.0.0-5.fc38) has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2024-1a59230214
FEDORA-2024-d96cdeb8ec (libvirt-9.7.0-3.fc39) has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-d96cdeb8ec
I don't think the crash shown in virInterfaceLookupByMACString is real. I think it is a consequence of memory corruption that occurred before this method was invoked.

I've issued an update for the known crash problem when listing interfaces. If we're lucky, that'll fix the virInterfaceLookupByMACString traces. If not, then we'll need to do more investigation.

Let us know if you continue to see crashes after updating your CI to the versions listed in the two updates above.
FEDORA-2024-1a59230214 has been pushed to the Fedora 38 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-1a59230214` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-1a59230214 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-d96cdeb8ec has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-d96cdeb8ec` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-d96cdeb8ec See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
Thanks Daniel! Will do. We track the occurrence of this crash automatically, so we won't forget. However, it'll take a few weeks to be sure, as this crash is rare.
FEDORA-2024-d96cdeb8ec (libvirt-9.7.0-3.fc39) has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2024-1a59230214 (libvirt-9.0.0-5.fc38) has been pushed to the Fedora 38 stable repository. If problem still persists, please make note of it in this bug report.