Bug 1983278
Summary: | ospfd crashes in route_node_delete with assertion fail | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ian Donaldson <iand> |
Component: | frr | Assignee: | Michal Ruprich <mruprich> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 34 | CC: | mruprich |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | frr-7.5.1-3.fc34 frr-7.5.1-3.fc33 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-29 01:06:57 UTC | Type: | Bug |
Description
Ian Donaldson
2021-07-17 08:22:10 UTC
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet AA.BB.CC.DD/32 brd 46.31.247.85 scope global lo:1
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:90:47:4a brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.134.136/25 brd 192.168.134.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe90:474a/64 scope link
       valid_lft forever preferred_lft forever
3: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: ipsec0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 192.168.134.136 peer 60.242.130.220
    inet 192.168.129.118/30 scope global ipsec0
       valid_lft forever preferred_lft forever
    inet6 fe80::5efe:c0a8:8688/64 scope link
       valid_lft forever preferred_lft forever

If I comment out this line:

    network 192.168.129.116/30 area 0

which relates to an IPsec tunnel, ospfd doesn't crash, but no tunnel-related routes are learned.

Just noticed I had blocked strongswan from updating since fc31; now updated to strongswan-5.9.3-1.fc34.x86_64 and strongswan itself is still working OK; just the ospfd issues remain.

I manually rebuilt the package from frr-7.5.1-2.fc34.src.rpm and installed just the resulting ospfd; it crashes in the same way.

Recompiled just ospfd without -O2 in order to debug it, and it works fine! Doh.

Hi Ian,

thanks for the report. There is definitely something wrong in route_node_delete. I was able to reproduce a similar coredump with just this config:

!
router ospf
 ospf router-id 10.10.10.10
 network 10.16.40.0/21 area 0
!

Then it will crash if you simply try to query the OSPF interfaces like this:

# sh ip ospf interface
vtysh: error reading from ospfd: Success (0)Warning: closing connection to ospfd because of an I/O error!
#

The coredump is a little different but still in route_node_delete:

@ coredumpctl debug
           PID: 1747 (ospfd)
           UID: 990 (frr)
           GID: 985 (frr)
        Signal: 6 (ABRT)
     Timestamp: Mon 2021-07-19 04:49:53 EDT (36s ago)
  Command Line: /usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1
    Executable: /usr/libexec/frr/ospfd
 Control Group: /system.slice/frr.service
          Unit: frr.service
         Slice: system.slice
       Boot ID: 891d0c7947384279b4300c6e466c617e
    Machine ID: 95ac659c50334abd8fabbf02b850fdfb
      Hostname: rdma-dev-19.lab.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ospfd.990.891d0c7947384279b4300c6e466c617e.1747.1626684593000000.zst (present)
     Disk Size: 486.7K
       Message: Process 1747 (ospfd) of user 990 dumped core.

                Stack trace of thread 1747:
                #0  0x00007f3e943ae2a2 raise (libc.so.6 + 0x3d2a2)
                #1  0x00007f3e943978a4 abort (libc.so.6 + 0x268a4)
                #2  0x00007f3e94397789 __assert_fail_base.cold (libc.so.6 + 0x26789)
                #3  0x00007f3e943a6a16 __assert_fail (libc.so.6 + 0x35a16)
                #4  0x00007f3e94913d9a route_node_delete (libfrr.so.0 + 0x8dd9a)
                #5  0x00007f3e94913e09 route_next (libfrr.so.0 + 0x8de09)
                #6  0x00005594b193cd54 show_ip_ospf_interface_sub (ospfd + 0x4dd54)
                #7  0x00005594b193d80b show_ip_ospf_interface_common (ospfd + 0x4e80b)
                #8  0x00005594b193db9b show_ip_ospf_interface.lto_priv.0 (ospfd + 0x4eb9b)
                #9  0x00007f3e949372ec cmd_execute_command_real.constprop.0 (libfrr.so.0 + 0xb12ec)
                #10 0x00007f3e948c24a1 cmd_execute_command (libfrr.so.0 + 0x3c4a1)
                #11 0x00007f3e948c2600 cmd_execute (libfrr.so.0 + 0x3c600)
                #12 0x00007f3e949215d5 vty_command (libfrr.so.0 + 0x9b5d5)
                #13 0x00007f3e94921781 vty_execute (libfrr.so.0 + 0x9b781)
                #14 0x00007f3e94923a80 vtysh_read (libfrr.so.0 + 0x9da80)
                #15 0x00007f3e94919f57 thread_call (libfrr.so.0 + 0x93f57)
                #16 0x00007f3e948e7fc8 frr_run (libfrr.so.0 + 0x61fc8)
                #17 0x00005594b1907b72 main (ospfd + 0x18b72)
                #18 0x00007f3e94398b75 __libc_start_main (libc.so.6 + 0x27b75)
                #19 0x00005594b1907f8e _start (ospfd + 0x18f8e)

Seems like this could be it: https://github.com/FRRouting/frr/issues/8595

Let me try the patch.

If you are referring to the commenting out of this:

    assert(node->info == NULL);

I suspect that will stop the crash, but I also suspect it is just hiding another underlying bug that will bite later.

table.[ch] hasn't changed between 7.4 and 7.5.1, but a lot of other code has.

BTW, the gear I'm interacting with on the other end of the IPsec tunnel is a Cisco 1941/k9, IOS 15.7(3)M2.

Ian, can you try this test package to see if it actually helps?
https://koji.fedoraproject.org/koji/taskinfo?taskID=72177563

Thanks,
Michal

Not sure what you want me to run here; I tried the frr-7.5.1-5.fc35.x86_64 package in there but it wouldn't install due to lots of dependency issues referencing fc35 packages. Using --nodeps overcame that, but it won't start...

Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/libexec/frr/watchfrr)
Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libjson-c.so.5: no version information available (required by /usr/lib64/frr/libfrr.so.0)
Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/lib64/frr/libfrr.so.0)
Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119081]: Failed to start watchfrr!

so I've undone all that.

Sorry, I ran the scratch build of the test package for rawhide instead of f34, hence the weird dependencies. This should work with F34:
https://koji.fedoraproject.org/koji/taskinfo?taskID=72185690

Installed frr-7.5.1-5.fc34.x86_64 and it seems to run fine; behaving as normal!

Restarted strongswan as a further test of route recovery and it recovered fine.

Many thanks.

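For readers following the discussion of `assert(node->info == NULL);` earlier in this thread, here is a minimal sketch of the kind of invariant such an assertion protects: a reference-counted node is only destroyed once its last lock is released, and by then its payload is expected to be detached. All names here (`demo_node`, `demo_lock`, `demo_unlock`, `demo_freed`) are hypothetical stand-ins for illustration and are not FRR's actual table code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a routing-table node (not FRR's struct). */
struct demo_node {
    unsigned int lock; /* reference count held by owners and iterators */
    void *info;        /* payload attached by the owner, NULL when detached */
};

/* Counts destroyed nodes, purely so the behaviour can be observed. */
int demo_freed = 0;

struct demo_node *demo_node_new(void)
{
    struct demo_node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    return n;
}

/* Take a reference. This has a side effect: it writes to n->lock. */
struct demo_node *demo_lock(struct demo_node *n)
{
    n->lock++;
    return n;
}

/* Destroy a node. Only valid with no references and no payload left. */
static void demo_delete(struct demo_node *n)
{
    assert(n->lock == 0);
    assert(n->info == NULL); /* the kind of check that aborted in this bug */
    free(n);
    demo_freed++;
}

/* Drop a reference; the last one out destroys the node. */
void demo_unlock(struct demo_node *n)
{
    assert(n->lock > 0);
    if (--n->lock == 0)
        demo_delete(n);
}
```

If a lock that should have been taken is skipped, the count reaches zero while the payload is still attached and the second assertion fires. Removing the assertion would silence the abort but leave the premature free in place, which is the concern raised in the comment above.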
Looked further into the patches related to pure functions with -O2 optimization and now understand more how this could have occurred. Good spot.

Thanks for the help, I am going to push a fixed version today.

Regards,
Michal

FEDORA-2021-aa7a6a45ae has been submitted as an update to Fedora 33.
https://bodhi.fedoraproject.org/updates/FEDORA-2021-aa7a6a45ae

FEDORA-2021-aa7a6a45ae has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-aa7a6a45ae`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-aa7a6a45ae
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-d8feeaf3bf has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-d8feeaf3bf`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-d8feeaf3bf
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-d8feeaf3bf (frr-7.5.1-3.fc34.x86_64) seems fine.

FEDORA-2021-d8feeaf3bf has been pushed to the Fedora 34 stable repository. If the problem still persists, please make a note of it in this bug report.

FEDORA-2021-aa7a6a45ae has been pushed to the Fedora 33 stable repository. If the problem still persists, please make a note of it in this bug report.
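On the remark in this thread about pure functions and -O2: the sketch below illustrates, under stated assumptions, why a function that takes a reference must not carry the GCC/Clang `pure` attribute. A `pure` function promises it has no observable side effects, so an optimizing compiler may drop a call whose result is unused or merge repeated calls with the same arguments. The names (`counted`, `counted_get`, `counted_peek`, `take_twice`) are hypothetical and FRR's real declarations are not reproduced here.

```c
#include <assert.h>

/* Hypothetical reference-counted object, for illustration only. */
struct counted {
    unsigned int lock; /* reference count */
    int value;
};

/*
 * Returns the value AND takes a reference. Because it writes to c->lock,
 * it is deliberately NOT declared pure. If it were, the optimizer would
 * be entitled to remove or merge calls, and the count would come out
 * lower than the code appears to request.
 */
int counted_get(struct counted *c)
{
    c->lock++;
    return c->value;
}

/*
 * Only reads memory and has no side effects, so marking it pure is a
 * legitimate optimization hint.
 */
__attribute__((pure)) int counted_peek(const struct counted *c)
{
    return c->value;
}

/* Takes two references and reports the resulting count. */
unsigned int take_twice(struct counted *c)
{
    (void)counted_get(c);
    (void)counted_get(c);
    return c->lock;
}
```

This would also be consistent with the earlier observation that a build without -O2 did not crash: at low optimization levels the compiler does not act on the attribute, so every call still runs.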