Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2231273

Summary: [RFE] Request to support OVN L3 HA when external network failure.
Product: Red Hat OpenStack
Reporter: youngcheol <yocha>
Component: python-networking-ovn
Assignee: Miro Tomaska <mtomaska>
Status: CLOSED WONTFIX
QA Contact: Eran Kuris <ekuris>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.2 (Train)
CC: apevec, fyanac, gurpsing, lhh, majopela, mlavalle, mtomaska, scohen
Target Milestone: ---
Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2024-12-10 16:27:11 UTC
Type: Bug

Description youngcheol 2023-08-11 08:06:21 UTC
Description of problem:

OVN Layer 3 high availability cannot detect external network failures.
There is a request to have it detect such failures and fail over automatically.


https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/networking_guide/assembly_work-with-ovn_rhosp-network#l3-ha-ovn_work-ovn
Note: 
  External network failures are not detected as would happen with an ML2-OVS configuration.
  
L3 HA for OVN supports the following failure modes:
  - The gateway node becomes disconnected from the network (tunneling interface).
  - ovs-vswitchd stops (ovs-vswitchd is responsible for BFD signaling).
  - ovn-controller stops (ovn-controller removes itself as a registered node).
  
  
  


Comment 2 Miro Tomaska 2023-08-22 20:43:46 UTC
Hi YoungCheol,

I have looked into the documentation you linked, and I am not sure where the Note comes from. As far as I can tell, the ML2/OVS backend also does not have the ability to detect issues on the network external to OpenStack.
Is this documentation Note the only source you have indicating that ML2/OVS has such a capability? I can work with the documentation team to have that fixed. Thank you.

Comment 5 Miro Tomaska 2023-08-31 01:38:44 UTC
Hi YoungCheol,

Thanks for sharing that KCS. To be honest, I did not know that ML2/OVN keepalived had that capability, so I looked into it.
When ha_vrrp_health_check_interval is set to anything > 0, keepalived (used with ML2/OVS) is configured with a track_script[1]: a script that keepalived runs and whose return code it factors into the overall health check (it also does some weight and priority checks, but those are not relevant here). If the script returns 0, everything is OK; otherwise something is wrong and failover starts.
In our case the script just pings the gateway IPs of the external network's subnets. Here is an example from my system:

keepalived.conf (irrelevant things are omitted):
vrrp_script ha_health_check_23 {
    script "/var/lib/neutron/ha_confs/d9652efa-8031-46f7-948e-8710c73232ad/ha_check_script_23.sh"
    interval 5
    fall 2
    rise 2
}
...
    track_script {
        ha_health_check_23
    }

where ha_check_script_23.sh contains:

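#!/bin/bash -eu
# If this address is not configured on this node, exit 0 (healthy) without pinging.
ip a | grep fe80::f816:3eff:fed4:bf7b || exit 0
# Otherwise ping the IPv4 and IPv6 gateways once each; any failure exits 1 (unhealthy).
ping -c 1 -w 1 10.0.0.1 1>/dev/null || exit 1
ping6 -c 1 -w 1 2620:52:0:13b8::fe 1>/dev/null || exit 1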
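
To illustrate how the "fall 2" / "rise 2" settings in the vrrp_script block above interact with the script's exit code, here is a minimal sketch. It is not keepalived's actual implementation: the simulate function, its initial "up" state, and the sample exit codes are assumptions made for this example. It only models the counting rule, where the check is treated as failed after FALL consecutive non-zero exits and as healthy again after RISE consecutive zero exits.

```shell
#!/bin/sh
# Sketch of fall/rise counting for a tracked health-check script.
# Hypothetical helper for illustration only; not keepalived code.

FALL=2   # consecutive failures needed to mark the check as failed
RISE=2   # consecutive successes needed to mark it healthy again

# simulate RC...: takes a sequence of script exit codes and prints the
# resulting state ("up" or "down") after each run, separated by spaces.
simulate() {
    state=up    # assumed initial state for this sketch
    fails=0
    oks=0
    out=""
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            oks=$((oks + 1)); fails=0
            if [ "$state" = down ] && [ "$oks" -ge "$RISE" ]; then state=up; fi
        else
            fails=$((fails + 1)); oks=0
            if [ "$state" = up ] && [ "$fails" -ge "$FALL" ]; then state=down; fi
        fi
        out="$out$state "
    done
    echo "${out% }"
}

# A single failed ping does not flip the state; two in a row do.
simulate 0 1 0 1 1 0 1 0 0
# prints: up up up up down down down down up
```

With "interval 5" and "fall 2" as in the configuration above, this would mean a gateway outage is acted on only after two consecutive failed runs, i.e. roughly ten seconds, and a single dropped ping does not trigger a failover.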


As far as I can tell, this functionality does not exist in ML2/OVN, so this is a valid RFE. Looking at this image[2], the OVN BFD algorithm only checks connections between nodes (interface 2) and does not check interface 3. I think this is what the customer wants in an ML2/OVN setup, i.e. to check whether the connection on interface 3 is also OK and, if not, consider the node "not healthy".




[1] https://manpages.debian.org/testing/keepalived/keepalived.conf.5.en.html
[2] https://docs.openstack.org/networking-ovn/latest/admin/routing.html#bfd-monitoring

Comment 6 Miro Tomaska 2023-08-31 01:42:00 UTC
^ Correction I meant to say " ... I did not know that ML2/OVS keepalived had that capability..."

Comment 7 youngcheol 2023-08-31 04:09:13 UTC
Hi Miro,

Thank you for your update.

Yes, your understanding of what the customer wants is correct.

Let me know if you need additional info.

Regards,
YoungCheol.