Bug 1983278 - ospfd crashes in route_node_delete with assertion fail
Summary: ospfd crashes in route_node_delete with assertion fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: frr
Version: 34
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Michal Ruprich
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-07-17 08:22 UTC by Ian Donaldson
Modified: 2021-07-29 02:00 UTC (History)
1 user (show)

Fixed In Version: frr-7.5.1-3.fc34 frr-7.5.1-3.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-29 01:06:57 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Ian Donaldson 2021-07-17 08:22:10 UTC
Description of problem:

ospfd won't start; dumps core

Version-Release number of selected component (if applicable):

frr-7.5.1-2.fc34.x86_64

How reproducible:

100%

Steps to Reproduce:
1. systemctl restart frr
2. observe ospfd not running
3. coredumpctl list    - shows core dump

Actual results:

ospfd not running

Expected results:

ospfd running and performing route updates


Additional info:

Using ospfd with local ethernet and ipsec tunnels 
from strongswan-5.8.2-3.fc31.x86_64m 

ospfd didn't crash in frr-7.4-1.fc31.x86_64 and frr-7.4-1.fc32.x86_64
but did in frr-7.5.1-2.fc33.x86_64, frr-7.5.1-2.fc34.x86_64


# coredumpctl debug                                                                                                        
           PID: 6388 (ospfd)                                                                                               
           UID: 887 (frr)                                                                                                  
           GID: 664 (frr)                                                                                                  
        Signal: 6 (ABRT)                                                                                                   
     Timestamp: Sat 2021-07-17 08:10:37 GMT (4min 42s ago)                                                                 
  Command Line: /usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1                                                      
    Executable: /usr/libexec/frr/ospfd                                                                                     
 Control Group: /system.slice/frr.service                                                                                  
          Unit: frr.service                                                                                                
         Slice: system.slice                                                                                               
       Boot ID: 813a7673f57a426bac5fa21b2abb8b57                                                                           
    Machine ID: fe66443e06134cf8bec78f57fc67cf7a                                                                           
      Hostname: HOSTNAME                                                                                                   
       Storage: /var/lib/systemd/coredump/core.ospfd.887.813a7673f57a426bac5fa21b2abb8b57.6388.1626509437000000.zst (present)                                                                                                                         
     Disk Size: 451.2K                                                                                                     
       Message: Process 6388 (ospfd) of user 887 dumped core.                                                              
                                                                                                                           
                Stack trace of thread 6388:                                                                                
                #0  0x00007f5f5db8f2a2 __GI_raise (libc.so.6 + 0x3d2a2)                                                    
                #1  0x00007f5f5db788a4 __GI_abort (libc.so.6 + 0x268a4)                                                    
                #2  0x00007f5f5db78789 __assert_fail_base (libc.so.6 + 0x26789)                                            
                #3  0x00007f5f5db87a16 __GI___assert_fail (libc.so.6 + 0x35a16)                                            
                #4  0x00007f5f5e0f2d9a route_node_delete (libfrr.so.0 + 0x8dd9a)                                           
                #5  0x00007f5f5e0f2e09 route_next (libfrr.so.0 + 0x8de09)                                                  
                #6  0x000055ed57777273 ospf_nbr_lookup_ptop (ospfd + 0x1e273)                                              
                #7  0x000055ed577780ed ospf_router_lsa_new.lto_priv.0 (ospfd + 0x1f0ed)                                    
                #8  0x000055ed57785d97 ospf_lsa_refresh (ospfd + 0x2cd97)                                                  
                #9  0x000055ed5777bb9f ospf_ism_event (ospfd + 0x22b9f)                                                    
                #10 0x00007f5f5e0f8f57 thread_call (libfrr.so.0 + 0x93f57)                                                 
                #11 0x00007f5f5e0c6fc8 frr_run (libfrr.so.0 + 0x61fc8)                                                     
                #12 0x000055ed57771b72 main (ospfd + 0x18b72)                                                              
                #13 0x00007f5f5db79b75 __libc_start_main (libc.so.6 + 0x27b75)                                             
                #14 0x000055ed57771f8e _start (ospfd + 0x18f8e)                                                            

GNU gdb (GDB) Fedora 10.2-3.fc34                                                                                           
Copyright (C) 2021 Free Software Foundation, Inc.                                                                          
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>                                              
This is free software: you are free to change and redistribute it.                                                         
There is NO WARRANTY, to the extent permitted by law.                                                                      
Type "show copying" and "show warranty" for details.                                                                       
This GDB was configured as "x86_64-redhat-linux-gnu".                                                                      
Type "show configuration" for configuration details.                                                                       
For bug reporting instructions, please see:                                                                                
<https://www.gnu.org/software/gdb/bugs/>.                                                                                  
Find the GDB manual and other documentation resources online at:                                                           
    <http://www.gnu.org/software/gdb/documentation/>.                                                                      

For help, type "help".                                                                                                     
Type "apropos word" to search for commands related to "word"...                                                            
Reading symbols from /usr/libexec/frr/ospfd...                                                                             
Reading symbols from /usr/lib/debug/usr/libexec/frr/ospfd-7.5.1-2.fc34.x86_64.debug...                                     
[New LWP 6388]                                                                                                             
[Thread debugging using libthread_db enabled]                                                                              
Using host libthread_db library "/lib64/libthread_db.so.1".                                                                
Core was generated by `/usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1'.                                             
Program terminated with signal SIGABRT, Aborted.                                                                           
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49                                                  
49        return ret;                                                                                                      
(gdb) where                                                                                                                
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49                                                  
#1  0x00007f5f5db788a4 in __GI_abort () at abort.c:79
#2  0x00007f5f5db78789 in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, 
    line=<optimized out>, function=<optimized out>) at assert.c:92
#3  0x00007f5f5db87a16 in __GI___assert_fail (assertion=assertion@entry=0x7f5f5e13681a "node->info == NULL", 
    file=file@entry=0x7f5f5e136759 "lib/table.c", line=line@entry=344, 
    function=function@entry=0x7f5f5e136240 <__PRETTY_FUNCTION__.6.lto_priv.2> "route_node_delete") at assert.c:101
#4  0x00007f5f5e0f2d9a in route_node_delete (node=<optimized out>) at lib/table.c:344
#5  0x00007f5f5e0f2e09 in route_unlock_node (node=<optimized out>) at lib/table.h:262
#6  route_next (node=<optimized out>) at lib/table.c:433
#7  route_next (node=<optimized out>) at lib/table.c:402
#8  0x000055ed57777273 in ospf_nbr_count (state=<optimized out>, oi=<optimized out>) at ospfd/ospf_neighbor.c:291
#9  ospf_nbr_lookup_ptop (oi=0x55ed5969dd10) at ospfd/ospf_lsa.c:402
#10 0x000055ed577780ed in lsa_link_ptop_set (oi=0x55ed5969dd10, s=0x7fffb852e9d0) at ospfd/ospf_lsa.c:482
#11 router_lsa_link_set (area=0x55ed5969b710, area=0x55ed5969b710, s=0x7fffb852e9d0) at ospfd/ospf_lsa.c:656
#12 ospf_router_lsa_body_set (area=0x55ed5969b710, s=0x7fffb852e9d0) at ospfd/ospf_lsa.c:701
#13 ospf_router_lsa_new (area=0x55ed5969b710) at ospfd/ospf_lsa.c:789
#14 0x000055ed57785d97 in ospf_router_lsa_refresh (lsa=0x55ed5969af50) at ospfd/ospf_lsa.c:864
#15 ospf_lsa_refresh (ospf=<optimized out>, lsa=0x55ed5969af50) at ospfd/ospf_lsa.c:3455
#16 0x000055ed5777bb9f in ospf_router_lsa_update_area (area=<optimized out>) at ospfd/ospf_lsa.c:893
#17 ism_change_state (state=4, oi=0x55ed5969dd10) at ospfd/ospf_ism.c:556
#18 ospf_ism_event (thread=<optimized out>) at ospfd/ospf_ism.c:600
#19 0x00007f5f5e0f8f57 in thread_call (thread=0x7fffb852ecb0) at lib/thread.c:1628
#20 0x00007f5f5e0c6fc8 in frr_run (master=0x55ed59486df0) at lib/libfrr.c:1099
#21 0x000055ed57771b72 in main (argc=6, argv=<optimized out>) at ospfd/ospf_main.c:238



frr.conf:
---
hostname HOSTNAME
password HIDDEN

log file /var/log/frr/ospfd.log

interface ens32
  #ip address 192.168.134.136/25                <-- wasn't invalid in 7.4-1
  ip ospf authentication message-digest
  ip ospf message-digest-key 1 md5 HIDDEN

interface ipsec0
  ip ospf authentication message-digest
  ip ospf message-digest-key 1 md5 HIDDEN

router ospf
 ospf router-id 192.168.134.136
 area 0 authentication message-digest
 redistribute static metric-type 1
 network 192.168.134.0/25 area 0
 network 192.168.134.128/25 area 0
 network 192.168.129.116/30 area 0

access-list localhost permit 127.0.0.1/32
access-list localhost deny any
line vty
---

Comment 1 Ian Donaldson 2021-07-17 08:36:35 UTC
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet AA.BB.CC.DD/32 brd 46.31.247.85 scope global lo:1
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:90:47:4a brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.134.136/25 brd 192.168.134.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe90:474a/64 scope link 
       valid_lft forever preferred_lft forever
3: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: ipsec0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 192.168.134.136 peer 60.242.130.220
    inet 192.168.129.118/30 scope global ipsec0
       valid_lft forever preferred_lft forever
    inet6 fe80::5efe:c0a8:8688/64 scope link 
       valid_lft forever preferred_lft forever

Comment 2 Ian Donaldson 2021-07-17 08:39:55 UTC
If I comment out this line:

network 192.168.129.116/30 area 0

which relates to an ipsec tunnnel, ospfd doesn't crash, but no tunnel related routes are learned.

Comment 3 Ian Donaldson 2021-07-19 04:15:26 UTC
Just noticed I had blocked stronswan from updating since fc31; now updated to strongswan-5.9.3-1.fc34.x86_64
and strongswan itself is still working ok; just ospfd issues remain.

Comment 4 Ian Donaldson 2021-07-19 08:06:11 UTC
I manually rebuilt the package from frr-7.5.1-2.fc34.src.rpm, and just installed the resultant ospfd; it crashes
in the same way.

Recompiled just ospfd without -O2 in order to debug it, and it works fine!   Doh.

Comment 5 Michal Ruprich 2021-07-19 08:50:46 UTC
Hi Ian,

thanks for the report, there is definitely something wrong in the route_node_delete, I was able to reproduce a similar coredump just with this config:
!
router ospf
 ospf router-id 10.10.10.10
 network 10.16.40.0/21 area 0
!

Than it will crash if you simply try to query the ospf interfaces like this:
# sh ip ospf interface
vtysh: error reading from ospfd: Success (0)Warning: closing connection to ospfd because of an I/O error!
#

Coredump is a little bit different but still in the route_node_delete:
@ coredumpctl debug
          PID: 1747 (ospfd)
           UID: 990 (frr)
           GID: 985 (frr)
        Signal: 6 (ABRT)
     Timestamp: Mon 2021-07-19 04:49:53 EDT (36s ago)
  Command Line: /usr/libexec/frr/ospfd -d -F traditional -A 127.0.0.1
    Executable: /usr/libexec/frr/ospfd
 Control Group: /system.slice/frr.service
          Unit: frr.service
         Slice: system.slice
       Boot ID: 891d0c7947384279b4300c6e466c617e
    Machine ID: 95ac659c50334abd8fabbf02b850fdfb
      Hostname: rdma-dev-19.lab.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ospfd.990.891d0c7947384279b4300c6e466c617e.1747.1626684593000000.zst (present)
     Disk Size: 486.7K
       Message: Process 1747 (ospfd) of user 990 dumped core.
                
                Stack trace of thread 1747:
                #0  0x00007f3e943ae2a2 raise (libc.so.6 + 0x3d2a2)
                #1  0x00007f3e943978a4 abort (libc.so.6 + 0x268a4)
                #2  0x00007f3e94397789 __assert_fail_base.cold (libc.so.6 + 0x26789)
                #3  0x00007f3e943a6a16 __assert_fail (libc.so.6 + 0x35a16)
                #4  0x00007f3e94913d9a route_node_delete (libfrr.so.0 + 0x8dd9a)
                #5  0x00007f3e94913e09 route_next (libfrr.so.0 + 0x8de09)
                #6  0x00005594b193cd54 show_ip_ospf_interface_sub (ospfd + 0x4dd54)
                #7  0x00005594b193d80b show_ip_ospf_interface_common (ospfd + 0x4e80b)
                #8  0x00005594b193db9b show_ip_ospf_interface.lto_priv.0 (ospfd + 0x4eb9b)
                #9  0x00007f3e949372ec cmd_execute_command_real.constprop.0 (libfrr.so.0 + 0xb12ec)
                #10 0x00007f3e948c24a1 cmd_execute_command (libfrr.so.0 + 0x3c4a1)
                #11 0x00007f3e948c2600 cmd_execute (libfrr.so.0 + 0x3c600)
                #12 0x00007f3e949215d5 vty_command (libfrr.so.0 + 0x9b5d5)
                #13 0x00007f3e94921781 vty_execute (libfrr.so.0 + 0x9b781)
                #14 0x00007f3e94923a80 vtysh_read (libfrr.so.0 + 0x9da80)
                #15 0x00007f3e94919f57 thread_call (libfrr.so.0 + 0x93f57)
                #16 0x00007f3e948e7fc8 frr_run (libfrr.so.0 + 0x61fc8)
                #17 0x00005594b1907b72 main (ospfd + 0x18b72)
                #18 0x00007f3e94398b75 __libc_start_main (libc.so.6 + 0x27b75)
                #19 0x00005594b1907f8e _start (ospfd + 0x18f8e)

Comment 6 Michal Ruprich 2021-07-19 08:55:37 UTC
Seems like this could be it:
https://github.com/FRRouting/frr/issues/8595

Let me try the patch.

Comment 7 Ian Donaldson 2021-07-19 09:11:50 UTC
If you are referring to the commenting out of this: assert(node->info == NULL);

I suspect that will stop the crash, I suspect that's just hiding another underlying bug that
will bite later.

table.[ch] hasn't changed between 7.4 and 7.5.1 but a lot of other code has.

Comment 8 Ian Donaldson 2021-07-19 09:14:24 UTC
BTW The gear I'm interacting with on the other end of the IPSec tunnel is a Cisco 1941/k9, IOS 15.7(3)M2

Comment 9 Michal Ruprich 2021-07-19 10:58:12 UTC
Ian,

can you try this test package if it actually helps?

https://koji.fedoraproject.org/koji/taskinfo?taskID=72177563

Thanks,
Michal

Comment 10 Ian Donaldson 2021-07-19 12:16:31 UTC
Not sure what you want me to run here; I tried the frr-7.5.1-5.fc35.x86_64 package in there
but it wouldn't install due to lots of dependency issues referencing fc35 packages.  Using --nodeps
overcame that but it won't start...


ul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/libexec/frr/watchfrr)
Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libjson-c.so.5: no version information available (required by /usr/lib64/frr/libfrr.so.0)
Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119085]: /usr/libexec/frr/watchfrr: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/lib64/frr/libfrr.so.0)
Jul 19 12:12:12 vm2.jer.ekorp.com frrinit.sh[119081]: Failed to start watchfrr!

so I've undone all that.

Comment 11 Michal Ruprich 2021-07-19 12:46:14 UTC
Sorry, I ran the scratch build of the test package for rawhide instead of f34, hence the weird dependencies. This should work with F34:

https://koji.fedoraproject.org/koji/taskinfo?taskID=72185690

Comment 12 Ian Donaldson 2021-07-19 13:05:07 UTC
Installed frr-7.5.1-5.fc34.x86_64 and it seems to run fine; behaving as normal!

Restarted strongswan as further test of route recovery and it recovered fine.

Many thanks.

Comment 13 Ian Donaldson 2021-07-19 13:35:33 UTC
Looked further into the patches related to pure functions with -O2 optimization 
and now understand more how this could have occurred.  Good spot.

Comment 14 Michal Ruprich 2021-07-20 06:43:48 UTC
Thanks for the help, I am going to push a fixed version today.

Regards,
Michal

Comment 15 Fedora Update System 2021-07-20 11:39:59 UTC
FEDORA-2021-aa7a6a45ae has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-aa7a6a45ae

Comment 16 Fedora Update System 2021-07-21 01:15:32 UTC
FEDORA-2021-aa7a6a45ae has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-aa7a6a45ae`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-aa7a6a45ae

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2021-07-21 01:54:31 UTC
FEDORA-2021-d8feeaf3bf has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-d8feeaf3bf`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-d8feeaf3bf

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Ian Donaldson 2021-07-25 23:48:55 UTC
FEDORA-2021-d8feeaf3bf (frr-7.5.1-3.fc34.x86_64) seems fine

Comment 19 Fedora Update System 2021-07-29 01:06:57 UTC
FEDORA-2021-d8feeaf3bf has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 20 Fedora Update System 2021-07-29 02:00:27 UTC
FEDORA-2021-aa7a6a45ae has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.