Description of problem: ospfd won't start; dumps core Version-Release number of selected component (if applicable): frr-7.5.1-2.fc34.x86_64 How reproducible: 100% Steps to Reproduce: 1. systemctl restart frr 2. observe ospfd not running 3. coredumpctl list - shows core dump Actual results: ospfd not running Expected results: ospfd running and performing route updates Additional info: Using ospfd with local ethernet and ipsec tunnels from strongswan-5.8.2-3.fc31.x86_64m ospfd didn't crash in frr-7.4-1.fc31.x86_64 and frr-7.4-1.fc32.x86_64 but did in frr-7.5.1-2.fc33.x86_64, frr-7.5.1-2.fc34.x86_64 # coredumpctl debug PID: 6388 (ospfd) UID: 887 (frr) GID: 664 (frr) Signal: 6 (ABRT) Timestamp: Sat 2021-07-17 08:10:37 GMT (4min 42s ago) Command Line: /usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1 Executable: /usr/libexec/frr/ospfd Control Group: /system.slice/frr.service Unit: frr.service Slice: system.slice Boot ID: 813a7673f57a426bac5fa21b2abb8b57 Machine ID: fe66443e06134cf8bec78f57fc67cf7a Hostname: HOSTNAME Storage: /var/lib/systemd/coredump/core.ospfd.887.813a7673f57a426bac5fa21b2abb8b57.6388.1626509437000000.zst (present) Disk Size: 451.2K Message: Process 6388 (ospfd) of user 887 dumped core. Stack trace of thread 6388: #0 0x00007f5f5db8f2a2 __GI_raise (libc.so.6 + 0x3d2a2) #1 0x00007f5f5db788a4 __GI_abort (libc.so.6 + 0x268a4) #2 0x00007f5f5db78789 __assert_fail_base (libc.so.6 + 0x26789) #3 0x00007f5f5db87a16 __GI___assert_fail (libc.so.6 + 0x35a16) #4 0x00007f5f5e0f2d9a route_node_delete (libfrr.so.0 + 0x8dd9a) #5 0x00007f5f5e0f2e09 route_next (libfrr.so.0 + 0x8de09) #6 0x000055ed57777273 ospf_nbr_lookup_ptop (ospfd + 0x1e273) #7 0x000055ed577780ed ospf_router_lsa_new.lto_priv.0 (ospfd + 0x1f0ed) #8 0x000055ed57785d97 ospf_lsa_refresh (ospfd + 0x2cd97) #9 0x000055ed5777bb9f ospf_ism_event (ospfd + 0x22b9f) #10 0x00007f5f5e0f8f57 thread_call (libfrr.so.0 + 0x93f57) #11 0x00007f5f5e0c6fc8 frr_run (libfrr.so.0 + 0x61fc8) #12 0x000055ed57771b72 main (ospfd + 0x18b72) #13 0x00007f5f5db79b75 __libc_start_main (libc.so.6 + 0x27b75) #14 0x000055ed57771f8e _start (ospfd + 0x18f8e) GNU gdb (GDB) Fedora 10.2-3.fc34 Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/libexec/frr/ospfd... Reading symbols from /usr/lib/debug/usr/libexec/frr/ospfd-7.5.1-2.fc34.x86_64.debug... [New LWP 6388] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49 49 return ret; (gdb) where #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49 #1 0x00007f5f5db788a4 in __GI_abort () at abort.c:79 #2 0x00007f5f5db78789 in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at assert.c:92 #3 0x00007f5f5db87a16 in __GI___assert_fail (assertion=assertion@entry=0x7f5f5e13681a "node->info == NULL", file=file@entry=0x7f5f5e136759 "lib/table.c", line=line@entry=344, function=function@entry=0x7f5f5e136240 <__PRETTY_FUNCTION__.6.lto_priv.2> "route_node_delete") at assert.c:101 #4 0x00007f5f5e0f2d9a in route_node_delete (node=<optimized out>) at lib/table.c:344 #5 0x00007f5f5e0f2e09 in route_unlock_node (node=<optimized out>) at lib/table.h:262 #6 route_next (node=<optimized out>) at lib/table.c:433 #7 route_next (node=<optimized out>) at lib/table.c:402 #8 0x000055ed57777273 in ospf_nbr_count (state=<optimized out>, oi=<optimized out>) at ospfd/ospf_neighbor.c:291 #9 ospf_nbr_lookup_ptop (oi=0x55ed5969dd10) at ospfd/ospf_lsa.c:402 #10 0x000055ed577780ed in lsa_link_ptop_set (oi=0x55ed5969dd10, s=0x7fffb852e9d0) at ospfd/ospf_lsa.c:482 #11 router_lsa_link_set (area=0x55ed5969b710, area=0x55ed5969b710, s=0x7fffb852e9d0) at ospfd/ospf_lsa.c:656 #12 ospf_router_lsa_body_set (area=0x55ed5969b710, s=0x7fffb852e9d0) at ospfd/ospf_lsa.c:701 #13 ospf_router_lsa_new (area=0x55ed5969b710) at ospfd/ospf_lsa.c:789 #14 0x000055ed57785d97 in ospf_router_lsa_refresh (lsa=0x55ed5969af50) at ospfd/ospf_lsa.c:864 #15 ospf_lsa_refresh (ospf=<optimized out>, lsa=0x55ed5969af50) at ospfd/ospf_lsa.c:3455 #16 0x000055ed5777bb9f in ospf_router_lsa_update_area (area=<optimized out>) at ospfd/ospf_lsa.c:893 #17 ism_change_state (state=4, oi=0x55ed5969dd10) at ospfd/ospf_ism.c:556 #18 ospf_ism_event (thread=<optimized out>) at ospfd/ospf_ism.c:600 #19 0x00007f5f5e0f8f57 in thread_call (thread=0x7fffb852ecb0) at lib/thread.c:1628 #20 0x00007f5f5e0c6fc8 in frr_run (master=0x55ed59486df0) at lib/libfrr.c:1099 #21 0x000055ed57771b72 in main (argc=6, argv=<optimized out>) at ospfd/ospf_main.c:238 frr.conf: --- hostname HOSTNAME password HIDDEN log file /var/log/frr/ospfd.log interface ens32 #ip address 192.168.134.136/25 <-- wasn't invalid in 7.4-1 ip ospf authentication message-digest ip ospf message-digest-key 1 md5 HIDDEN interface ipsec0 ip ospf authentication message-digest ip ospf message-digest-key 1 md5 HIDDEN router ospf ospf router-id 192.168.134.136 area 0 authentication message-digest redistribute static metric-type 1 network 192.168.134.0/25 area 0 network 192.168.134.128/25 area 0 network 192.168.129.116/30 area 0 access-list localhost permit 127.0.0.1/32 access-list localhost deny any line vty ---
# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet AA.BB.CC.DD/32 brd 46.31.247.85 scope global lo:1 valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 00:50:56:90:47:4a brd ff:ff:ff:ff:ff:ff altname enp2s0 inet 192.168.134.136/25 brd 192.168.134.255 scope global ens32 valid_lft forever preferred_lft forever inet6 fe80::250:56ff:fe90:474a/64 scope link valid_lft forever preferred_lft forever 3: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000 link/ipip 0.0.0.0 brd 0.0.0.0 4: ipsec0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ipip 192.168.134.136 peer 60.242.130.220 inet 192.168.129.118/30 scope global ipsec0 valid_lft forever preferred_lft forever inet6 fe80::5efe:c0a8:8688/64 scope link valid_lft forever preferred_lft forever
If I comment out this line: network 192.168.129.116/30 area 0 which relates to an ipsec tunnnel, ospfd doesn't crash, but no tunnel related routes are learned.
Just noticed I had blocked stronswan from updating since fc31; now updated to strongswan-5.9.3-1.fc34.x86_64 and strongswan itself is still working ok; just ospfd issues remain.
I manually rebuilt the package from frr-7.5.1-2.fc34.src.rpm, and just installed the resultant ospfd; it crashes in the same way. Recompiled just ospfd without -O2 in order to debug it, and it works fine! Doh.
Hi Ian, thanks for the report, there is definitely something wrong in the route_node_delete, I was able to reproduce a similar coredump just with this config: ! router ospf ospf router-id 10.10.10.10 network 10.16.40.0/21 area 0 ! Than it will crash if you simply try to query the ospf interfaces like this: # sh ip ospf interface vtysh: error reading from ospfd: Success (0)Warning: closing connection to ospfd because of an I/O error! # Coredump is a little bit different but still in the route_node_delete: @ coredumpctl debug PID: 1747 (ospfd) UID: 990 (frr) GID: 985 (frr) Signal: 6 (ABRT) Timestamp: Mon 2021-07-19 04:49:53 EDT (36s ago) Command Line: /usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1 Executable: /usr/libexec/frr/ospfd Control Group: /system.slice/frr.service Unit: frr.service Slice: system.slice Boot ID: 891d0c7947384279b4300c6e466c617e Machine ID: 95ac659c50334abd8fabbf02b850fdfb Hostname: rdma-dev-19.lab.bos.redhat.com Storage: /var/lib/systemd/coredump/core.ospfd.990.891d0c7947384279b4300c6e466c617e.1747.1626684593000000.zst (present) Disk Size: 486.7K Message: Process 1747 (ospfd) of user 990 dumped core. Stack trace of thread 1747: #0 0x00007f3e943ae2a2 raise (libc.so.6 + 0x3d2a2) #1 0x00007f3e943978a4 abort (libc.so.6 + 0x268a4) #2 0x00007f3e94397789 __assert_fail_base.cold (libc.so.6 + 0x26789) #3 0x00007f3e943a6a16 __assert_fail (libc.so.6 + 0x35a16) #4 0x00007f3e94913d9a route_node_delete (libfrr.so.0 + 0x8dd9a) #5 0x00007f3e94913e09 route_next (libfrr.so.0 + 0x8de09) #6 0x00005594b193cd54 show_ip_ospf_interface_sub (ospfd + 0x4dd54) #7 0x00005594b193d80b show_ip_ospf_interface_common (ospfd + 0x4e80b) #8 0x00005594b193db9b show_ip_ospf_interface.lto_priv.0 (ospfd + 0x4eb9b) #9 0x00007f3e949372ec cmd_execute_command_real.constprop.0 (libfrr.so.0 + 0xb12ec) #10 0x00007f3e948c24a1 cmd_execute_command (libfrr.so.0 + 0x3c4a1) #11 0x00007f3e948c2600 cmd_execute (libfrr.so.0 + 0x3c600) #12 0x00007f3e949215d5 vty_command (libfrr.so.0 + 0x9b5d5) #13 0x00007f3e94921781 vty_execute (libfrr.so.0 + 0x9b781) #14 0x00007f3e94923a80 vtysh_read (libfrr.so.0 + 0x9da80) #15 0x00007f3e94919f57 thread_call (libfrr.so.0 + 0x93f57) #16 0x00007f3e948e7fc8 frr_run (libfrr.so.0 + 0x61fc8) #17 0x00005594b1907b72 main (ospfd + 0x18b72) #18 0x00007f3e94398b75 __libc_start_main (libc.so.6 + 0x27b75) #19 0x00005594b1907f8e _start (ospfd + 0x18f8e)
Seems like this could be it: https://github.com/FRRouting/frr/issues/8595 Let me try the patch.
If you are referring to the commenting out of this: assert(node->info == NULL); I suspect that will stop the crash, I suspect that's just hiding another underlying bug that will bite later. table.[ch] hasn't changed between 7.4 and 7.5.1 but a lot of other code has.
BTW The gear I'm interacting with on the other end of the IPSec tunnel is a Cisco 1941/k9, IOS 15.7(3)M2
Ian, can you try this test package if it actually helps? https://koji.fedoraproject.org/koji/taskinfo?taskID=72177563 Thanks, Michal
Not sure what you want me to run here; I tried the frr-7.5.1-5.fc35.x86_64 package in there but it wouldn't install due to lots of dependency issues referencing fc35 packages. Using --nodeps overcame that but it won't start... ul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/libexec/frr/watchfrr) Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libjson-c.so.5: no version information available (required by /usr/lib64/frr/libfrr.so.0) Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/lib64/frr/libfrr.so.0) Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119081]: Failed to start watchfrr! so I've undone all that.
Sorry, I ran the scratch build of the test package for rawhide instead of f34, hence the weird dependencies. This should work with F34: https://koji.fedoraproject.org/koji/taskinfo?taskID=72185690
Installed frr-7.5.1-5.fc34.x86_64 and it seems to run fine; behaving as normal! Restarted strongswan as further test of route recovery and it recovered fine. Many thanks.
Looked further into the patches related to pure functions with -O2 optimization and now understand more how this could have occurred. Good spot.
Thanks for the help, I am going to push a fixed version today. Regards, Michal
FEDORA-2021-aa7a6a45ae has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-aa7a6a45ae
FEDORA-2021-aa7a6a45ae has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-aa7a6a45ae` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-aa7a6a45ae See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-d8feeaf3bf has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-d8feeaf3bf` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-d8feeaf3bf See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-d8feeaf3bf (frr-7.5.1-3.fc34.x86_64) seems fine
FEDORA-2021-d8feeaf3bf has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2021-aa7a6a45ae has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.