Bug 2050824

Summary: Deleting a BFDProfile object does not delete the corresponding running-conf from the MetalLB speakers
Product: OpenShift Container Platform
Reporter: Jose Castillo Lema <jlema>
Component: Networking
Assignee: Sabina Aledort <saledort>
Networking sub component: Metal LB
QA Contact: Arti Sood <asood>
Status: CLOSED DEFERRED
Docs Contact:
Severity: high
Priority: high
CC: cgoncalves, dblack, eoneill, fpaoline, jlema, kquinn, saledort, tradej
Version: 4.10
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: BFD cannot be disabled by removing a custom profile. The FRR reloader unsets the custom profile and falls back to the default built-in profile.
Consequence: Once BFD has been enabled, it cannot be disabled.
Workaround (if any): Delete the BGPPeers configuration and recreate it without a BFD profile.
Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-09 01:12:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2054160
Bug Blocks:

Description Jose Castillo Lema 2022-02-04 17:36:25 UTC
Description of problem:
After a BFDProfile object is deleted, the corresponding BFD profile is not removed from the running-config of the MetalLB speakers.

Version-Release number of selected component (if applicable):
OCP Version: 4.10.0-fc.2
Kubernetes Version: v1.23.0+60f5a1c
MetalLB Version: 4.10.0-202201210948

How reproducible:
100%

Steps to Reproduce:
1. Create a bfdprofile object
2. Check that the corresponding bfd profile configuration is loaded into the MetalLB speakers
3. Delete the bfdprofile object:
  $ oc delete bfdprofile bfdprofilefull
  bfdprofile.metallb.io "bfdprofilefull" deleted
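
For reference, a BFDProfile that would yield the running-config shown under "Actual results" might look like the manifest printed by the helper below. This is only a sketch: the function name `bfdprofile_manifest` is hypothetical, the spec field names (`receiveInterval`, `transmitInterval`, `detectMultiplier`, `echoMode`, `passiveMode`, `minimumTtl`) are assumed from the upstream MetalLB `metallb.io/v1beta1` CRD, and the values are mirrored from the speaker output rather than taken from the reporter's actual manifest.

```shell
# Hypothetical helper: print a BFDProfile manifest for the given name.
# Field names assume the upstream MetalLB v1beta1 CRD; values mirror the
# running-config in this report, not the reporter's real manifest.
bfdprofile_manifest() {
  name="$1"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: ${name}
  namespace: metallb-system
spec:
  receiveInterval: 35
  transmitInterval: 35
  detectMultiplier: 37
  echoMode: true
  passiveMode: true
  minimumTtl: 10
EOF
}

# Possible usage against a cluster (not executed here):
#   bfdprofile_manifest bfdprofilefull | oc apply -f -
#   oc delete bfdprofile bfdprofilefull
```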

Actual results:
The corresponding bfd profile is not deleted from the running-config of the speakers:
  $ oc -n metallb-system rsh speaker-9pcl2
  sh-4.4# vtysh
  sh-4.4# sh running-config
  ...
  bfd
   profile bfdprofilefull
    detect-multiplier 37
    transmit-interval 35
    receive-interval 35
    passive-mode
    echo-mode
    minimum-ttl 10
   !
  !
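
The manual check above could be scripted roughly as follows. This is a sketch under assumptions: `has_bfd_profile` is a hypothetical helper, it expects the raw FRR running-config (unindented, as vtysh prints it) on stdin, and it only looks for a `profile NAME` line inside the top-level `bfd` block.

```shell
# Hypothetical helper: read an FRR running-config on stdin and exit 0 if a
# "profile NAME" entry exists inside the top-level "bfd" block, else exit 1.
has_bfd_profile() {
  awk -v name="$1" '
    # A bare "bfd" line at column 0 opens the BFD block.
    $0 == "bfd" { in_bfd = 1; next }
    # Any other unindented line (such as the closing "!") ends the block.
    in_bfd && /^[^ ]/ { in_bfd = 0 }
    # Inside the block, look for the requested profile name.
    in_bfd && $1 == "profile" && $2 == name { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Possible usage (not executed here; the pod name is the one from this
# report, and a "-c <container>" option may be needed on some deployments):
#   oc -n metallb-system exec speaker-9pcl2 -- vtysh -c 'show running-config' \
#     | has_bfd_profile bfdprofilefull && echo "stale BFD profile still present"
```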

Expected results:
The corresponding bfd profile is deleted from the running-config of the speakers.

Additional info:

Comment 1 Jose Castillo Lema 2022-02-04 17:38:02 UTC
BTW, when the BFDProfile was deleted, no BGPPeers were using it:

$ oc get bgppeer -o yaml
apiVersion: v1
items:
- apiVersion: metallb.io/v1beta1
  kind: BGPPeer
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"metallb.io/v1beta1","kind":"BGPPeer","metadata":{"annotations":{},"name":"peer-65000-ipv6","namespace":"metallb-system"},"spec":{"myASN":65001,"peerASN":65000,"peerAddress":"fd01:1101::1"}}
    creationTimestamp: "2022-02-04T17:09:29Z"
    generation: 1
    name: peer-65000-ipv6
    namespace: metallb-system
    resourceVersion: "22954264"
    uid: 6fee6700-7209-4860-aa7a-71a7e4d67143
  spec:
    myASN: 65001
    peerASN: 65000
    peerAddress: fd01:1101::1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 2 Carlos Goncalves 2022-02-07 16:49:09 UTC
I can confirm this bug. My test environment is upstream main MetalLB.

1. create BGP peer and BFD profile in MetalLB's ConfigMap
2. verify BGP peer is using expected BFD Profile
3. delete BFD profile and unreference it from the BGP peer
4. verify that BFD profile is still defined in FRR and that the BGP peer still has BFD on (although not set to the deleted BFD profile)

If only the BFD profile was still configured in FRR, this bug would have a severity. However, the BGP peer still has BFD enabled (default BFD profile) so this can cause data-plane outages.


kind-control-plane# show running-config 
[...]
router bgp 64512
 bgp router-id 1.2.3.4
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor 10.0.0.1 remote-as 64512
 neighbor 10.0.0.1 bfd              <----------------------

[...]

bfd                    <----------------------
 profile bfdprofilename
  detect-multiplier 200
  transmit-interval 270
  receive-interval 280
  echo-interval 62
  minimum-ttl 254
 !
!
end
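
To spot the more serious half of the problem (neighbors that still have BFD enabled), the running-config can be filtered for `neighbor ... bfd` lines. This is a rough sketch: `neighbors_with_bfd` is a hypothetical helper, and it assumes the FRR forms `neighbor ADDR bfd` and `neighbor ADDR bfd profile NAME`.

```shell
# Hypothetical helper: read an FRR running-config on stdin and print each
# BGP neighbor that has BFD enabled, followed by the profile it uses or
# "(default)" when no profile is named on the line.
neighbors_with_bfd() {
  awk '
    $1 == "neighbor" && $3 == "bfd" {
      profile = ($4 == "profile" && $5 != "") ? $5 : "(default)"
      print $2, profile
    }
  '
}

# Possible usage (not executed here):
#   vtysh -c 'show running-config' | neighbors_with_bfd
```

Any line of output after the profile has been unreferenced indicates that BFD is still active on that peer.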

Comment 3 Carlos Goncalves 2022-02-07 16:52:32 UTC
(In reply to Carlos Goncalves from comment #2)
> If only the BFD profile was still configured in FRR, this bug would have a
> severity. However, the BGP peer still has BFD enabled (default BFD profile)
> so this can cause data-plane outages.

Small correction:
If only the BFD profile was still configured in FRR, this bug would have a *low* severity.

Comment 14 Shiftzilla 2023-03-09 01:12:26 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9107