Bug 2168855 - BFD not working through VRF
Summary: BFD not working through VRF
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2023-06-27
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: frr
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Michal Ruprich
QA Contact: František Hrdina
URL:
Whiteboard:
Depends On:
Blocks: 2212921
 
Reported: 2023-02-10 09:29 UTC by Federico Paolinelli
Modified: 2023-11-07 09:48 UTC (History)
CC: 5 users

Fixed In Version: frr-8.3.1-7.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2212921 (view as bug list)
Environment:
Last Closed: 2023-11-07 08:32:59 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-148292 0 None None None 2023-02-10 09:31:57 UTC
Red Hat Product Errata RHSA-2023:6434 0 None None None 2023-11-07 08:33:12 UTC

Description Federico Paolinelli 2023-02-10 09:29:50 UTC
Description of problem:

###############   NOTE #################
This does not happen if I replace the image version with upstream 8.4.1.
With 8.3.x I still hit the bug.

Ideally, this should target OCP 4.13+, which is going to be rebased onto RHEL 9.

Also, I can provide a working environment where this is reproducible.
########################################


While testing MetalLB exposing services via a VRF, the BFD session does not get established.

The setup is pretty simple. I have two similar nodes (VMs) running MetalLB, and an external host where
I run FRR inside a host-networked container.

The nodes are 192.168.130.2 and 192.168.130.3; I will focus on 192.168.130.2.

The interface with 192.168.130.2 is ens9. It's added to a VRF named ens9vrf.


Looking at the tcpdump against ens9, I _think_ the behavior is the one we'd get if
bfdd weren't receiving the BFD messages: it was never able to learn the remote ID, and it complains
about Control Detection Time Expired.

Here is all the information I collected:


Configuration on the MetalLB side:

debug zebra nht
debug zebra nexthop
debug bgp keepalives
debug bgp neighbor-events
debug bgp nht
debug bgp updates in
debug bgp updates out
debug bgp zebra
debug bfd peer
debug bfd zebra
debug bfd network
!
router bgp 100 vrf ens9vrf
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor 192.168.130.1 remote-as 200
 neighbor 192.168.130.1 bfd
 neighbor 192.168.130.1 bfd profile testprofile
 neighbor 192.168.130.1 timers 30 90
 !
 address-family ipv4 unicast
  network 192.169.10.0/32
  neighbor 192.168.130.1 activate
  neighbor 192.168.130.1 route-map 192.168.130.1-ens9vrf-in in
  neighbor 192.168.130.1 route-map 192.168.130.1-ens9vrf-out out
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor 192.168.130.1 activate
  neighbor 192.168.130.1 route-map 192.168.130.1-ens9vrf-in in
  neighbor 192.168.130.1 route-map 192.168.130.1-ens9vrf-out out
 exit-address-family
exit
!
ip prefix-list 192.168.130.1-ens9vrf-pl-ipv4 seq 5 permit 192.169.10.0/32
ip prefix-list 192.168.130.1-ens9vrf-pl-ipv4 seq 10 deny any
!
ipv6 prefix-list 192.168.130.1-ens9vrf-pl-ipv4 seq 5 deny any
!
route-map 192.168.130.1-ens9vrf-in deny 20
exit
!
route-map 192.168.130.1-ens9vrf-out permit 1
 match ip address prefix-list 192.168.130.1-ens9vrf-pl-ipv4
exit
!
route-map 192.168.130.1-ens9vrf-out permit 2
 match ipv6 address prefix-list 192.168.130.1-ens9vrf-pl-ipv4
exit
!
ip nht resolve-via-default
!
ipv6 nht resolve-via-default
!
bfd
 profile testprofile
  transmit-interval 100
  receive-interval 1000
 exit
 !
exit
!
end


show bfd peers on the MetalLB node:

fede-virt-worker-0.karmalabs.com# show bfd peers
BFD Peers:
        peer 192.168.130.1 local-address 192.168.130.2 vrf ens9vrf interface ens9
                ID: 3427070281
                Remote ID: 0
                Active mode
                Status: down
                Downtime: 11 minute(s), 12 second(s)
                Diagnostics: ok
                Remote diagnostics: ok
                Peer Type: dynamic
                Local timers:
                        Detect-multiplier: 3
                        Receive interval: 300ms
                        Transmission interval: 300ms
                        Echo receive interval: 50ms
                        Echo transmission interval: disabled
                Remote timers:
                        Detect-multiplier: 3
                        Receive interval: 1000ms
                        Transmission interval: 1000ms
                        Echo receive interval: disabled



tcpdump on the MetalLB node, on the interface added to the VRF:

tcpdump -i ens9 -nn -vvv port 3784

10:26:58.182557 IP (tos 0xc0, ttl 255, id 44884, offset 0, flags [DF], proto UDP (17), length 52)
    192.168.130.1.49153 > 192.168.130.2.3784: [bad udp cksum 0x8586 -> 0x5f22!] BFDv1, length: 24
        Control, State Init, Flags: [Control Plane Independent], Diagnostic: Control Detection Time Expired (0x01)
        Detection Timer Multiplier: 3 (3000 ms Detection time), BFD Length: 24
        My Discriminator: 0xb04eff7c, Your Discriminator: 0x64f0cb21
          Desired min Tx Interval:    1000 ms
          Required min Rx Interval:   1000 ms
          Required min Echo Interval:   50 ms
10:26:58.697185 IP (tos 0xc0, ttl 255, id 10017, offset 0, flags [DF], proto UDP (17), length 52)
    192.168.130.2.49153 > 192.168.130.1.3784: [bad udp cksum 0x8586 -> 0x1036!] BFDv1, length: 24
        Control, State Down, Flags: [none], Diagnostic: No Diagnostic (0x00)
        Detection Timer Multiplier: 3 (3000 ms Detection time), BFD Length: 24
        My Discriminator: 0x64f0cb21, Your Discriminator: 0x00000000
          Desired min Tx Interval:    1000 ms
          Required min Rx Interval:   1000 ms
          Required min Echo Interval:   50 ms
10:26:58.942669 IP (tos 0xc0, ttl 255, id 44915, offset 0, flags [DF], proto UDP (17), length 52)
    192.168.130.1.49153 > 192.168.130.2.3784: [bad udp cksum 0x8586 -> 0x5f22!] BFDv1, length: 24
        Control, State Init, Flags: [Control Plane Independent], Diagnostic: Control Detection Time Expired (0x01)
        Detection Timer Multiplier: 3 (3000 ms Detection time), BFD Length: 24
        My Discriminator: 0xb04eff7c, Your Discriminator: 0x64f0cb21
          Desired min Tx Interval:    1000 ms
          Required min Rx Interval:   1000 ms
          Required min Echo Interval:   50 ms
10:26:59.527299 IP (tos 0xc0, ttl 255, id 10346, offset 0, flags [DF], proto UDP (17), length 52)
    192.168.130.2.49153 > 192.168.130.1.3784: [bad udp cksum 0x8586 -> 0x1036!] BFDv1, length: 24
        Control, State Down, Flags: [none], Diagnostic: No Diagnostic (0x00)
        Detection Timer Multiplier: 3 (3000 ms Detection time), BFD Length: 24
        My Discriminator: 0x64f0cb21, Your Discriminator: 0x00000000
          Desired min Tx Interval:    1000 ms
          Required min Rx Interval:   1000 ms
          Required min Echo Interval:   50 ms
10:26:59.872801 IP (tos 0xc0, ttl 255, id 45424, offset 0, flags [DF], proto UDP (17), length 52)
    192.168.130.1.49153 > 192.168.130.2.3784: [bad udp cksum 0x8586 -> 0x5f22!] BFDv1, length: 24
        Control, State Init, Flags: [Control Plane Independent], Diagnostic: Control Detection Time Expired (0x01)
        Detection Timer Multiplier: 3 (3000 ms Detection time), BFD Length: 24
        My Discriminator: 0xb04eff7c, Your Discriminator: 0x64f0cb21
          Desired min Tx Interval:    1000 ms
          Required min Rx Interval:   1000 ms
          Required min Echo Interval:   50 ms
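

A side note for debugging VRF setups (an assumption about technique, not something done in this report): Linux also allows capturing on the VRF device itself, which can help distinguish "packets reach the physical interface" from "packets make it through l3mdev processing into the VRF":

```shell
# Capture on the VRF device in addition to the enslaved interface.
# A capture on ens9 only proves packets arrive at the NIC; a capture on
# ens9vrf shows what actually flows through the VRF routing domain.
tcpdump -i ens9vrf -nn -vvv port 3784
```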



ip link on the MetalLB node:


1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:08:86:22 brd ff:ff:ff:ff:ff:ff
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:73:1a:70 brd ff:ff:ff:ff:ff:ff
4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ens9vrf state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:61:03:48 brd ff:ff:ff:ff:ff:ff
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 0a:36:d5:d1:e8:8e brd ff:ff:ff:ff:ff:ff
6: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether a2:8e:66:88:ee:ee brd ff:ff:ff:ff:ff:ff
7: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ca:22:f0:1d:9c:ed brd ff:ff:ff:ff:ff:ff
8: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether aa:5e:c9:af:61:5d brd ff:ff:ff:ff:ff:ff
10: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:08:86:22 brd ff:ff:ff:ff:ff:ff
11: 30f0e1334be200b@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
Error: Peer netns reference is invalid.
    link/ether 2a:3a:3b:26:ce:ff brd ff:ff:ff:ff:ff:ff link-netns 7f8c2e5b-4792-430e-8e94-5c56cef6caa3
12: bc89835ec5c56c4@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether a6:b2:d7:d9:30:47 brd ff:ff:ff:ff:ff:ff link-netns 74acbafa-20e1-4775-acf4-8e8abeebc294
13: e656939ed0b1695@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether c6:9d:85:79:ba:4c brd ff:ff:ff:ff:ff:ff link-netns 5325d199-21e9-4f36-89c7-5046c18e1ad9
15: 564a5becbc51d5f@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 9e:5f:68:5d:10:fb brd ff:ff:ff:ff:ff:ff link-netns 65509519-e5fd-482a-9504-b5197ac3c6c5
16: b7e6baef4ddc02f@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 6e:81:8d:13:cc:d7 brd ff:ff:ff:ff:ff:ff link-netns 1cda2002-57b8-437d-826b-4f84b2c14866
17: 049768f1f789858@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 6a:6a:34:6d:75:3e brd ff:ff:ff:ff:ff:ff link-netns f28a0cbd-d250-4eff-8608-d035cd63132d
18: 7f61c5fa242c940@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether d6:4d:e6:2a:ca:d0 brd ff:ff:ff:ff:ff:ff link-netns 884d290a-cb0c-4dd7-a541-cbfa32572b98
2079: 21243fe97721291@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 2e:17:89:5d:4a:33 brd ff:ff:ff:ff:ff:ff link-netns 0b31722e-d0a6-44cf-8a31-2cb059d8e82c
1398: ens9vrf: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 86:d7:6b:a0:c0:12 brd ff:ff:ff:ff:ff:ff
1495: ens9veth-vrf@ens9veth-def: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ens9vrf state UP mode DEFAULT group default qlen 1000
    link/ether b2:60:78:f8:35:9b brd ff:ff:ff:ff:ff:ff
1496: ens9veth-def@ens9veth-vrf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a2:66:df:3b:c7:ae brd ff:ff:ff:ff:ff:ff


tcp_l3mdev_accept is enabled

[root@fede-virt-worker-0 /]# sysctl net.ipv4.tcp_l3mdev_accept
net.ipv4.tcp_l3mdev_accept = 1
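

Worth noting (an observation from my side, not verified on this setup): single-hop BFD control packets are UDP on port 3784, so the UDP counterpart of this sysctl may also be relevant when sockets need to span VRFs:

```shell
# BFD control traffic is UDP/3784, so check the UDP l3mdev knob alongside
# the TCP one; both default to 0 on RHEL.
sysctl net.ipv4.tcp_l3mdev_accept
sysctl net.ipv4.udp_l3mdev_accept
# Enable if needed (persist via a drop-in under /etc/sysctl.d/):
sysctl -w net.ipv4.udp_l3mdev_accept=1
```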


From the FRR logs on the MetalLB side:

2023/02/09 10:26:20 WATCHFRR: [ZCJ3S-SPH5S] bfdd state -> down : initial connection attempt failed
2023/02/09 10:26:21 BFD: [ZKB8W-3S2Q4][EC 100663330] unneeded 'destroy' callback for '/frr-bfdd:bfdd/bfd/profile/minimum-ttl'
2023/02/09 10:26:21 BFD: [ZKB8W-3S2Q4][EC 100663330] unneeded 'destroy' callback for '/frr-bfdd:bfdd/bfd/sessions/multi-hop/minimum-ttl'
2023/02/09 10:26:21 WATCHFRR: [QDG3Y-BY5TN] bfdd state -> up : connect succeeded
2023/02/09 10:26:21 BFD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/09 10:26:30.756 ZEBRA: [G37DH-KRAE9] bfd_dst_register msg from client bgp: length=41
2023/02/09 10:26:30.757 BFD: [MSVDW-Y8Z5Q] ptm-add-dest: register peer [mhop:no peer:192.168.130.1 local:0.0.0.0 vrf:ens9vrf cbit:0x00 minimum-ttl:255 profile:testprofile]
2023/02/09 10:26:30.757 BFD: [PSB4R-8T1TJ] session-new: mhop:no peer:192.168.130.1 local:0.0.0.0 vrf:ens9vrf
2023/02/09 10:26:30.757 ZEBRA: [V0KXZ-QFE4D] bfd_dst_update msg from client bfd: length=25
2023/02/09 10:26:30.771 BFD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/09 10:26:30.965 ZEBRA: [G37DH-KRAE9] bfd_dst_register msg from client bgp: length=41
2023/02/09 10:26:30.965 BFD: [MSVDW-Y8Z5Q] ptm-add-dest: register peer [mhop:no peer:192.168.130.1 local:0.0.0.0 vrf:ens9vrf cbit:0x00 minimum-ttl:255 profile:testprofile]
2023/02/09 10:26:30.965 ZEBRA: [V0KXZ-QFE4D] bfd_dst_update msg from client bfd: length=25
2023/02/09 10:26:30.975 BFD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/09 10:26:34.979 ZEBRA: [T4P6D-CDT49] bfd_dst_deregister msg from client bgp: length=41
2023/02/09 10:26:34.980 ZEBRA: [G37DH-KRAE9] bfd_dst_register msg from client bgp: length=45
2023/02/09 10:26:34.980 BFD: [MSVDW-Y8Z5Q] ptm-del-dest: deregister peer [mhop:no peer:192.168.130.1 local:0.0.0.0 vrf:ens9vrf cbit:0x00 minimum-ttl:255 profile:testprofile]
2023/02/09 10:26:34.980 BFD: [NYF5K-SE3NS] ptm-del-session: [mhop:no peer:192.168.130.1 local:0.0.0.0 vrf:ens9vrf] refcount=0
2023/02/09 10:26:34.980 BFD: [NW21R-MRYNT] session-delete: mhop:no peer:192.168.130.1 local:0.0.0.0 vrf:ens9vrf
2023/02/09 10:26:34.980 BFD: [MSVDW-Y8Z5Q] ptm-add-dest: register peer [mhop:no peer:192.168.130.1 local:192.168.130.2 vrf:ens9vrf cbit:0x00 minimum-ttl:255 profile:testprofile]
2023/02/09 10:26:34.980 BFD: [PSB4R-8T1TJ] session-new: mhop:no peer:192.168.130.1 local:192.168.130.2 vrf:ens9vrf ifname:ens9
2023/02/09 10:26:34.981 ZEBRA: [V0KXZ-QFE4D] bfd_dst_update msg from client bfd: length=25
2023/02/09 10:26:36.043 BFD: [GCWEX-N0BBE] zclient: add interface ens9 (VRF ens9vrf(1398))
2023/02/09 10:28:00.823 BFD: [GCWEX-N0BBE] zclient: add interface 66c1b0e1f5b8044 (VRF default(0))
2023/02/09 10:28:01.925 BFD: [SSYGJ-9ZAE0] zclient: add local address fe80::c870:38ff:fecb:e770/64 (VRF 0)
2023/02/09 10:28:03.435 BFD: [SSYGJ-9ZAE0] zclient: delete local address fe80::c870:38ff:fecb:e770/64 (VRF 0)
2023/02/09 10:28:03.507 BFD: [J7QH3-773JH] zclient: delete interface 66c1b0e1f5b8044 (VRF default(0))

---------------------------------------------------------------------

show bfd peers on the external host


# show bfd peers
BFD Peers:
        peer 192.168.130.2 local-address 192.168.130.1 vrf default interface virbr2
                ID: 532757194
                Remote ID: 0
                Active mode
                Status: down
                Downtime: 1 second(s)
                Diagnostics: control detection time expired
                Remote diagnostics: ok
                Peer Type: dynamic
                Local timers:
                        Detect-multiplier: 3
                        Receive interval: 300ms
                        Transmission interval: 300ms
                        Echo receive interval: 50ms
                        Echo transmission interval: disabled
                Remote timers:
                        Detect-multiplier: 3
                        Receive interval: 1000ms
                        Transmission interval: 1000ms
                        Echo receive interval: 50ms

        peer 192.168.130.3 local-address 192.168.130.1 vrf default interface virbr2
                ID: 106570411
                Remote ID: 2238636529
                Active mode
                Status: init
                Diagnostics: control detection time expired
                Remote diagnostics: ok
                Peer Type: dynamic
                Local timers:
                        Detect-multiplier: 3
                        Receive interval: 300ms
                        Transmission interval: 300ms
                        Echo receive interval: 50ms
                        Echo transmission interval: disabled
                Remote timers:
                        Detect-multiplier: 3
                        Receive interval: 1000ms
                        Transmission interval: 1000ms
                        Echo receive interval: 50ms


Configuration on the peer:

show running-config
Building configuration...

Current configuration:
!
frr version 8.3.1_git
frr defaults traditional
hostname cnfdt03.lab.eng.tlv2.redhat.com
log file /tmp/frr.log
log timestamp precision 3
no ipv6 forwarding
!
debug zebra nht
debug bgp neighbor-events
debug bgp nht
debug bgp updates in
debug bgp updates out
debug bfd peer
!
password zebra
!
router bgp 200
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor 192.168.130.2 remote-as 100
 neighbor 192.168.130.2 bfd
 neighbor 192.168.130.3 remote-as 100
 neighbor 192.168.130.3 bfd
 !
 address-family ipv4 unicast
  neighbor 192.168.130.2 activate
  neighbor 192.168.130.2 next-hop-self
  neighbor 192.168.130.3 activate
  neighbor 192.168.130.3 next-hop-self
 exit-address-family
exit
!
route-map RMAP permit 10
 set ipv6 next-hop prefer-global
exit
!
ip nht resolve-via-default
!
ipv6 nht resolve-via-default
!
end








logs on the external host:

2023/02/09 11:15:33.747 BFD: [PSB4R-8T1TJ] session-new: mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2                                                                                                                              
2023/02/09 11:15:37.135 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:37.556 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:40.706 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:41.107 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:43.966 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:44.848 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:47.426 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:48.449 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:51.077 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:51.890 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:54.548 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:55.561 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:58.028 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.3 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired                                                                                       
2023/02/09 11:15:59.051 BFD: [SEY1D-NT8EQ] state-change: [mhop:no peer:192.168.130.2 local:192.168.130.1 vrf:default ifname:virbr2] init -> down reason:control-expired

Comment 1 Michal Ruprich 2023-02-16 15:32:57 UTC
Hi Federico,

I would like to ask for your help with the setup here. I am struggling a bit with the part where I need to set up a VRF on one of the machines. Do you have a foolproof way to do it? Using VRFs is not my cup of tea, but this is how far I got:

1. create the VRF:
# ip link add vrf-test type vrf table 10
# ip link set vrf-test up

2. enslave one interface:
# ip link set dev eth0 master vrf-test

3. Add some default routes, probably to table 10:
# ip route add <gateway> dev eth0
# ip route add default via <gateway>

4. Run FRR via ip vrf exec?
# ip vrf exec vrf-test /bin/bash
# /usr/libexec/frrinit.sh start ?
Or maybe some magic with the frr.service file to run this via ip vrf exec?
This is the part I am not sure about: how to run FRR under the created VRF. How do I know that it is actually running under that VRF? Will I be able to see it in systemctl status?

If you can help me to get to the point of having FRR running under the new VRF, that would be great.

Thanks,
Michal

Comment 2 Federico Paolinelli 2023-02-16 16:41:34 UTC
You just run FRR normally, but specify the VRF name as part of the router statement, as described here: https://docs.frrouting.org/en/latest/bgp.html#clicmd-router-bgp-ASN-vrf-VRFNAME


To your VRF setup steps, I also add

ip -4 rule add pref 32765 table local
ip -4 rule del pref 0

which is required to change the priority of the local routing rule.
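
Putting the steps from Comments 1 and 2 together, a minimal reproduction sketch (interface name eth0, table 10, and the gateway are placeholders to be adapted):

```shell
# Create the VRF bound to routing table 10 and bring it up
ip link add vrf-test type vrf table 10
ip link set vrf-test up

# Enslave the test interface to the VRF
ip link set dev eth0 master vrf-test

# Routes for the VRF live in its table
ip route add 192.168.130.0/24 dev eth0 table 10
ip route add default via 192.168.130.1 table 10

# Lower the priority of the "local" rule so VRF l3mdev lookups win
ip -4 rule add pref 32765 table local
ip -4 rule del pref 0
```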

Comment 3 Federico Paolinelli 2023-02-16 16:42:27 UTC
Let me know if it helps; if not, we can have a look together.

Comment 4 Michal Ruprich 2023-02-20 15:21:09 UTC
Thanks, this helped; it is now clearly visible that with a VRF, BFD does not establish. Interestingly though, I just tested this with 8.4.2 and BFD won't start at all: it just dumps core on startup and dies. I need to investigate this and try version 8.4.1. I will keep you posted.

Comment 5 Federico Paolinelli 2023-02-20 15:52:41 UTC
Please note that in my case, 8.4.2 works as expected.
I can provide access to the cluster.

Comment 6 Michal Ruprich 2023-03-01 14:18:27 UTC
Just a note from our discussion and some testing with Federico: it seems that this particular commit is the one that fixes this issue:

https://github.com/FRRouting/frr/commit/edc3f63167fd95e4e70287743c9b252415c9336e

So far, it seems that the issue is gone with this commit.

Comment 18 errata-xmlrpc 2023-11-07 08:32:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: frr security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6434

