Bug 1948422
| Summary: | BGP incorrectly withdraws routes on graceful restart capable routers | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Carlos Goncalves <cgoncalves> | |
| Component: | frr | Assignee: | Michal Ruprich <mruprich> | |
| Status: | CLOSED ERRATA | QA Contact: | František Hrdina <fhrdina> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 8.3 | CC: | fhrdina, michele, mruprich, rkhan | |
| Target Milestone: | beta | Keywords: | AutoVerified, Patch, Reopened, Reproducer, Triaged | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | frr-7.5.1-5.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2127494 (view as bug list) | Environment: | ||
| Last Closed: | 2023-05-16 08:30:22 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2127494 | |||
Is there an update on this bug? Has it been triaged by the RHEL team? Please let me know if there is any additional information I could provide, including setting up a lab for testing and development. I would like to highlight that this bug causes route flapping, interrupting data plane forwarding in network routers.

Sorry Carlos, this seems to be solved in the current upstream version. I reproduced it in RHEL8 but not in Fedora. I am looking for the fix; the upstream issue is not very specific on the details. I'll keep you posted. Michal

Thanks, Michal. Were you able to find the mentioned upstream fix? I have been following upstream commits and issues fairly closely and have not flagged any that might address this issue. I installed FRR from source (master, 9d78be6) on Fedora and was still able to reproduce the same issue with the same reproducer steps.

Hi Carlos, TBH I did not find any particular commit that would fix this, but for some reason with 7.5.1 I don't see the error. Never mind, I will query the upstream again for a possible solution.

Any update? I have reproduced this issue with 7.5.1 as well, so all versions at least from 7.0 (up to master) seem to be impacted. Have you queried the upstream project as suggested in comment #4? Is there a place (e.g. GitHub, email list) where one could follow the discussion? Thank you.

Hi Carlos, sorry, missed the needinfo on this one. Thanks for filing the bug upstream. Seems like no solution so far.

This issue was reported to have been fixed upstream and my GitHub issue was closed. Please see and consider porting the patch back to all supported RHEL 8.x versions. https://github.com/FRRouting/frr/commit/aa24a36a2d1814c8a1844465b8ff73e54cb85b45

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: frr security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:2801
BGP advertised route prefixes are sometimes (~20% of the time) removed on peer BGP routers when the local BGP router is restarted, even when BGP graceful restart is enabled. This is known as route flapping, and it causes all participating routers to recalculate the topology.

Version-Release number of selected component (if applicable):
- frr-7.0-10.el8.x86_64
- frr-7.5-4.el8.x86_64

How reproducible: ~20%

Steps to Reproduce:

1. Create two routers, router-1 (10.20.30.43) and router-2 (10.20.30.44), with the following FRR configuration:

    frr version 7.5
    frr defaults traditional
    hostname router-2.localdomain
    log file /var/log/frr/frr.log
    no ip forwarding
    no ipv6 forwarding
    service integrated-vtysh-config
    !
    debug bgp keepalives
    debug bgp neighbor-events
    debug bgp updates in
    debug bgp updates out
    debug bgp zebra
    !
    router bgp 64999
     bgp log-neighbor-changes
     no bgp suppress-duplicates
     bgp graceful-shutdown
     bgp graceful-restart
     bgp graceful-restart preserve-fw-state
     neighbor 10.20.30.43 remote-as 64999
     !
     address-family ipv4 unicast
      redistribute connected
     exit-address-family
    !
    line vty

2. On router-1, add a dummy route:

    $ sudo ip a a 10.20.50.98/32 dev lo

3. Verify that router-2 received the route prefix and installed it in the kernel routing table:

    $ sudo ip r | grep 10.20.50.98
    10.20.50.98 nhid 16 via 10.20.30.43 dev eth1 proto bgp metric 20

4. Restart FRR on router-1:

    $ sudo systemctl restart frr

5. Check /var/log/frr/frr.log on router-2 and note that route 10.20.50.98/32 was deleted as soon as FRR@router-1 was stopped ("Tx route delete VRF 0 10.20.50.98/32"):

    BGP: 10.20.30.41 [Event] BGP connection closed fd 23
    BGP: %NOTIFICATION: received from neighbor 10.20.30.41 6/3 (Cease/Peer Unconfigured) 0 bytes
    BGP: 10.20.30.41 [FSM] Receive_NOTIFICATION_message (Established->Clearing), fd 23
    BGP: %ADJCHANGE: neighbor 10.20.30.41(router-1.localdomain) in vrf default Down BGP Notification received
    BGP: 10.20.30.41 graceful restart stalepath timer stopped
    BGP: bgp_fsm_change_status : vrf default(0), Status: Clearing established_peers 0
    BGP: RID change : vrf VRF default(0), RTR ID 192.168.121.66
    BGP: 10.20.30.41 went from Established to Clearing
    BGP: 10.20.30.41 [FSM] Clearing_Completed (Clearing->Idle), fd -1
    BGP: bgp_fsm_change_status : vrf default(0), Status: Idle established_peers 0
    BGP: 10.20.30.41 went from Clearing to Idle
    BGP: Tx route delete VRF 0 10.20.50.99/32
    BGP: [Event] BGP connection from host 10.20.30.41 fd 23
    BGP: bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0
    BGP: 10.20.30.41 went from Idle to Active
    BGP: 10.20.30.41 [FSM] TCP_connection_open (Active->OpenSent), fd 23
    BGP: 10.20.30.41 passive open
    BGP: 10.20.30.41 Sending hostname cap with hn = router-2.localdomain, dn = (null)
    BGP: 10.20.30.41 sending OPEN, version 4, my as 64999, holdtime 180, id 192.168.121.66
    BGP: bgp_fsm_change_status : vrf default(0), Status: OpenSent established_peers 0
    BGP: 10.20.30.41 went from Active to OpenSent
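Since the issue only shows up in roughly one restart out of five, the log check in step 5 is easier to repeat with a small script. The following is a minimal sketch, not part of the original report: the function name `find_deletes_after_peer_down` is hypothetical, and the two regular expressions assume the exact message wording seen in the log excerpt above, which other FRR versions may phrase differently. It is only a heuristic, because a route delete after the peer goes down can be legitimate once the graceful restart timers expire.

```python
import re

# Assumption: patterns are derived only from the log lines quoted in this
# report ("<peer> went from X to Y" and "Tx route delete VRF <id> <prefix>").
PEER_STATE = re.compile(r"BGP: (\S+) went from (\w+) to (\w+)")
ROUTE_DELETE = re.compile(r"BGP: Tx route delete VRF (\d+) (\S+)")


def find_deletes_after_peer_down(lines, peer):
    """Return prefixes deleted between `peer` leaving Established and
    reaching Established again."""
    peer_down = False
    deleted = []
    for line in lines:
        state = PEER_STATE.search(line)
        if state and state.group(1) == peer:
            old, new = state.group(2), state.group(3)
            if old == "Established":
                peer_down = True    # session just dropped
            elif new == "Established":
                peer_down = False   # session is back up
            continue
        delete = ROUTE_DELETE.search(line)
        if delete and peer_down:
            deleted.append(delete.group(2))
    return deleted


if __name__ == "__main__":
    # A few lines copied from the excerpt in step 5.
    sample = [
        "BGP: 10.20.30.41 went from Established to Clearing",
        "BGP: 10.20.30.41 went from Clearing to Idle",
        "BGP: Tx route delete VRF 0 10.20.50.99/32",
        "BGP: 10.20.30.41 went from Idle to Active",
    ]
    print(find_deletes_after_peer_down(sample, "10.20.30.41"))
```

To use it against a real log, pass the opened log file as `lines` (a file object iterates line by line) together with the address of the restarted peer; a non-empty result after a restart indicates that the reproducer hit the bug on that iteration.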