Bug 2232986

Summary: eBGP multihop peer flapping due to delta miscalculation of new configuration
Product: Red Hat Enterprise Linux 8
Reporter: Carlos Goncalves <cgoncalves>
Component: frr
Assignee: Michal Ruprich <mruprich>
Status: CLOSED MIGRATED
QA Contact: František Hrdina <fhrdina>
Severity: high
Priority: unspecified
Version: 8.6
CC: bzvonar, fhrdina, jorton
Target Milestone: rc
Keywords: MigratedToJIRA
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2023-09-05 15:36:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Carlos Goncalves 2023-08-20 20:28:22 UTC
Description of problem:
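The FRR reloader causes eBGP session flapping even when the configuration of an eBGP multihop peer with the default TTL (255) is unchanged.

When an eBGP multihop peer is created without an explicit TTL, FRR defaults the TTL to 255 and shows it in the output of the vtysh 'show running-config' command, which the frr-reload script uses to calculate the delta against the new configuration. Because the configuration file omits the TTL while the running configuration includes it, the two lines are treated as different.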
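
The mismatch can be illustrated with a minimal sketch. It assumes the delta is a plain line-by-line string comparison between the running configuration and the new file; the function name `naive_delta` and this simplified comparison are hypothetical and are not taken from frr-reload.py.

```python
def naive_delta(running_lines, new_lines):
    """Return (to_delete, to_add) using exact string comparison of lines.

    Hypothetical simplification: the real reloader works on config
    contexts, not on flat lists of lines.
    """
    running = [line.strip() for line in running_lines if line.strip()]
    new = [line.strip() for line in new_lines if line.strip()]
    to_delete = [line for line in running if line not in new]
    to_add = [line for line in new if line not in running]
    return to_delete, to_add


# Running configuration as printed by vtysh: the default TTL is explicit.
running_config = [
    "router bgp 4250001000",
    " neighbor 192.168.122.207 remote-as 65512",
    " neighbor 192.168.122.207 ebgp-multihop 255",
]

# Configuration file as written by the user: the TTL is omitted.
new_config = [
    "router bgp 4250001000",
    "  neighbor 192.168.122.207 remote-as 65512",
    "  neighbor 192.168.122.207 ebgp-multihop",
]

to_delete, to_add = naive_delta(running_config, new_config)
print("delete:", to_delete)
print("add:", to_add)
```

Although both inputs describe the same peer, the exact comparison reports one line to delete and one line to add, which matches the shape of the delta shown in the reproduction steps.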

Version-Release number of selected component (if applicable):
7.5.1

How reproducible:
100%

Steps to Reproduce:
1. Create /etc/frr/frr.conf
    frr version 7.5
    frr defaults traditional
    hostname centos8.localdomain
    no ip forwarding
    no ipv6 forwarding
    service integrated-vtysh-config
    line vty
    router bgp 4250001000
      neighbor 192.168.122.207 remote-as 65512
      neighbor 192.168.122.207 ebgp-multihop

2. Start FRR
    # systemctl start frr

3. Show running configuration. Note that FRR explicitly sets and shows the default TTL (255)

    # vtysh -c 'show running-config'
    Building configuration...

    Current configuration:
    !
    frr version 7.5
    frr defaults traditional
    hostname centos8.localdomain
    no ip forwarding
    no ipv6 forwarding
    service integrated-vtysh-config
    !
    router bgp 4250001000
     neighbor 192.168.122.207 remote-as 65512
     neighbor 192.168.122.207 ebgp-multihop 255
    !
    line vty
    !
    end

4. Copy initial frr.conf to frr.conf.new (no changes)
    # cp /etc/frr/frr.conf /root/frr.conf.new

5. Run frr-reload.py:

    # /usr/lib/frr/frr-reload.py --test  /root/frr.conf.new 
    2023-08-20 20:15:48,050  INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='', debug=False, filename='/root/frr.conf.new', input=None, log_level='info', overwrite=False, pathspace=None, reload=False, rundir='/var/run/frr', stdout=False, test=True, vty_socket=None)"
    2023-08-20 20:15:48,050  INFO: Loading Config object from file /root/frr.conf.new
    2023-08-20 20:15:48,124  INFO: Loading Config object from vtysh show running

    Lines To Delete
    ===============
    router bgp 4250001000
     no neighbor 192.168.122.207 ebgp-multihop 255

    Lines To Add
    ============
    router bgp 4250001000
     neighbor 192.168.122.207 ebgp-multihop
 
As seen above, the FRR reloader wants to remove the peer's ebgp-multihop setting and re-add it, even though the configuration is unchanged. This causes the reported BGP peer flapping.

I also reproduced this issue on the latest stable upstream FRR frr-9.0-01.el9.x86_64.
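
One way to avoid the spurious delta is to normalize both sides before comparing them, so that an omitted TTL and the explicit default compare as equal. The following is a minimal sketch of that idea only; it is not the upstream fix, and the names `normalize_line`, `normalized_delta` and `DEFAULT_MULTIHOP_TTL` are hypothetical.

```python
import re

# Default eBGP multihop TTL as reported in this bug.
DEFAULT_MULTIHOP_TTL = 255

# Matches "neighbor <peer> ebgp-multihop" with no hop count after it.
_MULTIHOP_NO_TTL = re.compile(r"^\s*(neighbor\s+\S+\s+ebgp-multihop)\s*$")


def normalize_line(line):
    """Strip indentation and make the default multihop TTL explicit."""
    match = _MULTIHOP_NO_TTL.match(line)
    if match:
        return f"{match.group(1)} {DEFAULT_MULTIHOP_TTL}"
    return line.strip()


def normalized_delta(running_lines, new_lines):
    """Return (to_delete, to_add) after normalizing both configurations."""
    running = [normalize_line(line) for line in running_lines if line.strip()]
    new = [normalize_line(line) for line in new_lines if line.strip()]
    to_delete = [line for line in running if line not in new]
    to_add = [line for line in new if line not in running]
    return to_delete, to_add


running_config = [
    "router bgp 4250001000",
    " neighbor 192.168.122.207 remote-as 65512",
    " neighbor 192.168.122.207 ebgp-multihop 255",
]
new_config = [
    "router bgp 4250001000",
    "  neighbor 192.168.122.207 remote-as 65512",
    "  neighbor 192.168.122.207 ebgp-multihop",
]

print(normalized_delta(running_config, new_config))
```

With this normalization the unchanged configuration produces an empty delta, while a peer configured with a non-default hop count is left untouched and would still be reported as a change.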

Comment 1 Carlos Goncalves 2023-08-20 20:30:45 UTC
This issue was originally reported by an OpenShift MetalLB customer in https://issues.redhat.com/browse/OCPBUGS-17704

Comment 2 Carlos Goncalves 2023-08-20 20:39:01 UTC
Reported issue in upstream FRR: https://github.com/FRRouting/frr/issues/14242

Comment 3 Carlos Goncalves 2023-08-21 11:27:49 UTC
An upstream PR has been posted: https://github.com/FRRouting/frr/issues/14242

I built RHEL 9.2 and 8.6 test RPMs and can confirm the issue is fixed.

    /usr/libexec/frr/frr-reload.py --test  /root/frr.conf.new 
    2023-08-21 11:15:22,012  INFO: Called via "Namespace(input=None, reload=False, test=True, debug=False, log_level='info', stdout=False, pathspace=None, filename='/root/frr.conf.new', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
    2023-08-21 11:15:22,012  INFO: Loading Config object from file /root/frr.conf.new
    2023-08-21 11:15:22,047  INFO: Loading Config object from vtysh show running
    2023-08-21 11:15:22,086  INFO: "frr version 8.3.1" cannot be removed
    2023-08-21 11:15:22,086  INFO: "frr defaults traditional" cannot be removed

    Lines To Add
    ============
    service integrated-vtysh-config
    line vty


Test builds
  - RHEL 9.2: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54758348
  - RHEL 8.6: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54758798
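
When verifying a test build, the `--test` output can also be checked mechanically. The sketch below parses text in the layout shown in this report ("Lines To Delete" and "Lines To Add" headers, each underlined with `=`) and reports whether any pending change touches `ebgp-multihop`. The function names are hypothetical, and the parser assumes that log lines appear only before the first section header, as they do in the output above.

```python
def parse_reload_test_output(text):
    """Split frr-reload.py --test output into its delete/add sections."""
    sections = {"Lines To Delete": [], "Lines To Add": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in sections:
            current = line          # a section header starts a new section
        elif not line or set(line) == {"="}:
            continue                # skip blank lines and "====" underlines
        elif current is not None:
            sections[current].append(line)
    return sections


def has_multihop_change(sections):
    """Return True if any pending line mentions ebgp-multihop."""
    pending = sections["Lines To Delete"] + sections["Lines To Add"]
    return any("ebgp-multihop" in line for line in pending)


# Output of the fixed build, copied from this comment (log lines shortened).
fixed_output = """
2023-08-21 11:15:22,086  INFO: "frr defaults traditional" cannot be removed

Lines To Add
============
service integrated-vtysh-config
line vty
"""

# Output of the affected build, copied from the reproduction steps.
broken_output = """
Lines To Delete
===============
router bgp 4250001000
 no neighbor 192.168.122.207 ebgp-multihop 255

Lines To Add
============
router bgp 4250001000
 neighbor 192.168.122.207 ebgp-multihop
"""

print(has_multihop_change(parse_reload_test_output(fixed_output)))
print(has_multihop_change(parse_reload_test_output(broken_output)))
```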

Comment 4 Carlos Goncalves 2023-08-23 08:14:41 UTC
Requesting backport of the fix to 8.4.z, 8.6.z and 9.2.z.

Business justification:
OpenShift 4.13-4.16 (RHEL 9.2 based) include two Extended Update Support (EUS) releases.
OpenShift 4.12 (RHEL 8.6 based) customers have reported this issue in https://issues.redhat.com/browse/OCPBUGS-17704
OpenShift 4.10 (RHEL 8.4 based) is also impacted by this issue. RHEL 8.4 and 8.6 share the same 7.5 FRR version.

Comment 7 RHEL Program Management 2023-09-05 15:33:32 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 8 RHEL Program Management 2023-09-05 15:36:46 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Comment 9 Joe Orton 2023-09-06 07:39:25 UTC
Now tracking at: https://issues.redhat.com/browse/RHEL-2263