Bug 1948422 - BGP incorrectly withdraws routes on graceful restart capable routers
Summary: BGP incorrectly withdraws routes on graceful restart capable routers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: frr
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Michal Ruprich
QA Contact: František Hrdina
URL:
Whiteboard:
Depends On:
Blocks: 2127494
 
Reported: 2021-04-12 07:10 UTC by Carlos Goncalves
Modified: 2023-05-16 09:34 UTC
CC List: 4 users

Fixed In Version: frr-7.5.1-5.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2127494
Environment:
Last Closed: 2023-05-16 08:30:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github FRRouting frr issues 12030 0 None closed BGP sends hard reset to the neighbor, even if graceful restart capability is enabled 2022-11-10 14:34:36 UTC
Github FRRouting frr issues 8425 0 None closed BGP incorrectly withdraws routes on graceful restart capable routers 2022-04-22 06:25:59 UTC
Github FRRouting frr pull 10838 0 None Merged bgpd: Add BGP configuration start/end markers 2022-04-22 12:29:06 UTC
Red Hat Product Errata RHSA-2023:2801 0 None None None 2023-05-16 08:30:29 UTC

Description Carlos Goncalves 2021-04-12 07:10:55 UTC
Route prefixes advertised over BGP are sometimes (~20% of the time) removed on peer BGP routers when the local BGP router is restarted, even when BGP graceful restart is enabled. This is known as route flapping, and it forces all participating routers to recalculate the topology.

Version-Release number of selected component (if applicable):
- frr-7.0-10.el8.x86_64
- frr-7.5-4.el8.x86_64

How reproducible:
~20%
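
For anyone trying to confirm or rule out the issue, the ~20% rate implies that a single clean restart proves little. The sketch below estimates how many restart attempts are needed before a failure would be expected to show up at least once. It assumes each restart is an independent trial with the same failure probability, which the report does not establish; the function name `attempts_needed` is hypothetical.

```python
import math


def attempts_needed(p_single: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p_single)**n >= confidence.

    Assumption: every restart is an independent trial that hits the bug
    with probability p_single.
    """
    if not (0.0 < p_single < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_single and confidence must be strictly between 0 and 1")
    # Solve (1 - p_single)**n <= 1 - confidence for n, then round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single))


if __name__ == "__main__":
    # With the ~20% rate from this report and a 95% target.
    print(attempts_needed(0.20, 0.95))
```

Under these assumptions, roughly 14 restarts without a route withdrawal would be needed before concluding with 95% confidence that a given build is unaffected.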

Steps to Reproduce:
1. Create two routers: router-1 (10.20.30.43) and router-2 (10.20.30.44) with the following FRR configuration:

	frr version 7.5
	frr defaults traditional
	hostname router-2.localdomain
	log file /var/log/frr/frr.log
	no ip forwarding
	no ipv6 forwarding
	service integrated-vtysh-config
	!
	debug bgp keepalives
	debug bgp neighbor-events
	debug bgp updates in
	debug bgp updates out
	debug bgp zebra
	!
	router bgp 64999
	 bgp log-neighbor-changes
	 no bgp suppress-duplicates
	 bgp graceful-shutdown
	 bgp graceful-restart
	 bgp graceful-restart preserve-fw-state
	 neighbor 10.20.30.43 remote-as 64999
	 !
	 address-family ipv4 unicast
	  redistribute connected
	 exit-address-family
	!
	line vty

2. On router-1, add dummy route:
	$ sudo ip a a 10.20.50.98/32 dev lo

3. Verify router-2 received route prefix and installed it in the kernel routing table:
	$ sudo ip r | grep 10.20.50.98
	10.20.50.98 nhid 16 via 10.20.30.43 dev eth1 proto bgp metric 20

4. Restart FRR on router-1:
	$ sudo systemctl restart frr

5. Check /var/log/frr/frr.log on router-2 and note that route 10.20.50.98/32 was deleted as soon as FRR on router-1 was stopped ("Tx route delete VRF 0 10.20.50.98/32"):

	BGP: 10.20.30.41 [Event] BGP connection closed fd 23
	BGP: %NOTIFICATION: received from neighbor 10.20.30.41 6/3 (Cease/Peer Unconfigured) 0 bytes
	BGP: 10.20.30.41 [FSM] Receive_NOTIFICATION_message (Established->Clearing), fd 23
	BGP: %ADJCHANGE: neighbor 10.20.30.41(router-1.localdomain) in vrf default Down BGP Notification received
	BGP: 10.20.30.41 graceful restart stalepath timer stopped
	BGP: bgp_fsm_change_status : vrf default(0), Status: Clearing established_peers 0
	BGP: RID change : vrf VRF default(0), RTR ID 192.168.121.66
	BGP: 10.20.30.41 went from Established to Clearing
	BGP: 10.20.30.41 [FSM] Clearing_Completed (Clearing->Idle), fd -1
	BGP: bgp_fsm_change_status : vrf default(0), Status: Idle established_peers 0
	BGP: 10.20.30.41 went from Clearing to Idle
	BGP: Tx route delete VRF 0 10.20.50.99/32
	BGP: [Event] BGP connection from host 10.20.30.41 fd 23
	BGP: bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0
	BGP: 10.20.30.41 went from Idle to Active
	BGP: 10.20.30.41 [FSM] TCP_connection_open (Active->OpenSent), fd 23
	BGP: 10.20.30.41 passive open
	BGP: 10.20.30.41 Sending hostname cap with hn = router-2.localdomain, dn = (null)
	BGP: 10.20.30.41 sending OPEN, version 4, my as 64999, holdtime 180, id 192.168.121.66
	BGP: bgp_fsm_change_status : vrf default(0), Status: OpenSent established_peers 0
	BGP: 10.20.30.41 went from Active to OpenSent
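
The check in step 5 can be automated when repeating the restart many times. The following is a minimal sketch that scans log lines for a route delete occurring after the neighbor went down and before it reconnected. The message patterns are copied from the excerpt above and may be worded differently in other FRR versions; the function name `premature_withdrawals` is hypothetical, and the sketch assumes a single BGP peer.

```python
import re
from typing import Iterable, List

# Patterns taken from the log excerpt in this report (assumption: other FRR
# versions may word these messages differently).
PEER_DOWN = re.compile(r"%ADJCHANGE: neighbor \S+ in vrf \S+ Down")
PEER_RECONNECT = re.compile(r"\[Event\] BGP connection from host \S+")
ROUTE_DELETE = re.compile(r"Tx route delete VRF \d+ (\S+)")


def premature_withdrawals(lines: Iterable[str]) -> List[str]:
    """Return prefixes deleted while the (single) peer was down.

    With graceful restart working, routes learned from the restarting peer
    should be kept as stale, so any prefix returned here is suspicious.
    """
    withdrawn: List[str] = []
    peer_is_down = False
    for line in lines:
        if PEER_DOWN.search(line):
            peer_is_down = True
        elif PEER_RECONNECT.search(line):
            peer_is_down = False
        else:
            match = ROUTE_DELETE.search(line)
            if peer_is_down and match:
                withdrawn.append(match.group(1))
    return withdrawn


# A few lines from the excerpt above, used as sample input.
SAMPLE = [
    "BGP: %ADJCHANGE: neighbor 10.20.30.41(router-1.localdomain) in vrf default Down BGP Notification received",
    "BGP: 10.20.30.41 went from Clearing to Idle",
    "BGP: Tx route delete VRF 0 10.20.50.99/32",
    "BGP: [Event] BGP connection from host 10.20.30.41 fd 23",
]

if __name__ == "__main__":
    print(premature_withdrawals(SAMPLE))
```

In practice the function could be fed the lines of the FRR log file on router-2 after each restart of router-1; an empty result would suggest that the routes were retained across that restart.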

Comment 1 Carlos Goncalves 2021-05-12 07:04:43 UTC
Is there an update on this bug? Has it been triaged by the RHEL team?
Please let me know if there is any additional information I could provide, including setting up a lab for testing and development.

I would like to highlight that this bug causes route flapping, interrupting data plane forwarding in network routers.

Comment 2 Michal Ruprich 2021-05-18 09:48:03 UTC
Sorry Carlos,

this seems to be solved in the current upstream version. I reproduced it on RHEL 8 but not on Fedora. I am looking for the fix; the upstream issue is not very specific about the details.

I'll keep you posted.

Michal

Comment 3 Carlos Goncalves 2021-05-19 06:52:56 UTC
Thanks, Michal. Were you able to find the mentioned upstream fix?
I have been following upstream commits and issues fairly closely, and have not spotted any that might address this issue.
I installed FRR from source (master, 9d78be6) on Fedora and was still able to reproduce the same issue with the same reproducer steps.

Comment 4 Michal Ruprich 2021-06-10 08:54:20 UTC
Hi Carlos,

TBH I did not find any particular commit that would fix this, but for some reason I don't see the error with 7.5.1. Never mind, I will query upstream again for a possible solution.

Comment 5 Carlos Goncalves 2021-07-13 08:55:05 UTC
Any update? I have reproduced this issue with 7.5.1 as well, so all versions at least from 7.0 (up to master) seem to be impacted.
Have you queried the upstream project as suggested in comment #4? Is there a place (e.g. Github, email list) where one could follow the discussion?
Thank you.

Comment 6 Michal Ruprich 2021-08-19 05:30:19 UTC
Hi Carlos,

sorry, missed the needinfo on this one. Thanks for filing the bug upstream. Seems like no solution so far.

Comment 11 Carlos Goncalves 2022-04-22 12:29:07 UTC
This issue was reported to have been fixed upstream and my Github issue was closed.
Please see and consider porting the patch back to all supported RHEL 8.x versions.

https://github.com/FRRouting/frr/commit/aa24a36a2d1814c8a1844465b8ff73e54cb85b45

Comment 20 RHEL Program Management 2022-11-01 07:28:56 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 30 errata-xmlrpc 2023-05-16 08:30:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: frr security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2801

