
Bug 1836209

Summary: [RFE] OVN - Multiple bridges support for different datapaths
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: ovn23.03
Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: medium
Docs Contact:
Priority: medium
Version: FDP 20.E
CC: ctrautma, dcbw, dsneddon, echaudro, ihrachys, jiji, jishi, ltomasbo, mbooth, mmichels, ralongi, william.caban
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn23.03-23.03.0-68.el8fdp ovn23.03-23.03.0-68.el9fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-21 02:08:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Daniel Alvarez Sanchez 2020-05-15 12:40:52 UTC
Right now, the configuration of the OVN bridge is read from the local OVSDB external_ids:ovn-bridge. This poses a limitation of one single datapath for everything running in that hypervisor.

A possible use case for multiple datapaths could be those where workloads would have a NIC on an accelerated datapath (netdev/OVS-DPDK) and another NIC for management (or not as latency/throughput sensitive) where ACLs and other things are applied.

This way we can better segregate traffic in a performance oriented way.

One possibility is to run multiple ovn-controller instances, each of which would read its config either from OVSDB (as today) or from a config file passed in as a parameter. All the instances could initially talk to the same ovs-vswitchd.

For CMSs (OpenStack, OpenShift), we would require RFEs to leverage this feature and plug the corresponding ports into the right OVN bridge.

Comment 3 Dan Williams 2021-05-03 13:50:18 UTC
Latest is v12 from Nov 2020: http://patchwork.ozlabs.org/project/ovn/patch/20201119032052.599236-1-ihrachys@redhat.com/

Comment 4 Ihar Hrachyshka 2021-07-27 17:24:03 UTC
This will have to be reworked if we need it. Moving back to ASSIGNED.

Comment 5 Dan Sneddon 2022-11-16 14:51:10 UTC
It is not mentioned here, but I envision this using separate name spaces for each ovn-controller and associated bridge. Is that how this would most likely be implemented? Would there be any need for separate routing tables (VRFs)? Perhaps not, since the OVN controller works with flows, but I wonder if the management bridge in the example in the description should use a different routing table than the fast dataplane bridge? This would allow separate bridges to be used for outbound traffic that is routed via a default route, for instance, or to ensure that traffic was symmetric in/out the same bridge via route rules or dynamic routing (BGP).

Comment 6 Ihar Hrachyshka 2022-11-16 15:18:51 UTC
The implementation is proposed at: https://patchwork.ozlabs.org/project/ovn/list/?series=323408 and is expected to land in the upcoming OVN 22.12 release (pending successful review in the coming weeks).

In the proposed patch series, all co-located ovn-controllers run in the same network namespace. The same vswitchd (perhaps running in the main namespace) will reuse the same routing table for both bridges, each managed by its own ovn-controller instance. There is no namespace separation between controller processes, only logical separation (controllers make sure not to step on each other's toes, e.g. by killing each other's patch ports).

I am vague on the symmetric routing concern you expressed (perhaps this should be tested?), but in general, tunnel traffic will be managed by the corresponding tunnels, each carrying unique tunnel IPs (external_ids:ovn-encap-ip-<virtual-chassis-name>) that belong to separate controller processes.

Note that the proposed implementation will be considered experimental upstream, and we are also aware of a number of deficiencies in it that will have to be resolved before it can be used in production. One obvious deficiency is that ovn-controllers reuse the same ct zone namespace without negotiating space allocation between each other. This can probably be resolved by adopting the new ovn-chassis-idx- db property into a new ct zone sharing mechanism (see the index property added in the first patch of the series: https://patchwork.ozlabs.org/project/ovn/patch/20221018183150.1213728-2-ihrachys@redhat.com/)

Let me know if this addressed your question.

Comment 7 Dan Sneddon 2022-11-22 01:31:26 UTC
(In reply to Ihar Hrachyshka from comment #6)

Thank you very much Ihar, this does answer my question, and clarifies that separate OVN controllers use the same network namespace. I took a closer look at the patches, and thought about both controllers using the same vswitchd (in the same way a single controller does). I don’t believe these changes would add any issues with symmetric routing, and if any currently exist they would be addressed elsewhere. Testing will confirm, but to me this looks like a solid and beneficial change.

Comment 8 Ihar Hrachyshka 2022-11-28 17:53:32 UTC
FYI this slipped into 2023.03 because of last minute reviews upstream pre-branching.

Comment 9 Ihar Hrachyshka 2023-01-09 17:49:29 UTC
FYI it just landed in main upstream so it will be part of 2023.03. There may still be bugs to squash (for BGP and otherwise). There are known limitations like stateful ACLs not supported in this setup (can be fixed in a follow-up).

I will update to MODIFIED when there's a 2023.03 package for OVN. The feature won't be backported.

Comment 10 Jianlin Shi 2023-07-10 07:48:28 UTC
Hi Ihar,
Which patch adds the feature to ovn23.03? How can we enable this feature and create multiple ovn-controllers on the same chassis?

Comment 11 Ihar Hrachyshka 2023-07-10 13:32:13 UTC
The list of commits that implement the feature:

- https://github.com/ovn-org/ovn/commit/dae2eb8a17f35099539bf338746e8e5b917fd4e6
- https://github.com/ovn-org/ovn/commit/48db2a7a353a81aaa1795ef4b35b0fab1f0b0ccc
- https://github.com/ovn-org/ovn/commit/3dbf5f03df5ac277b7398686e60cd9e3205359fe
- https://github.com/ovn-org/ovn/commit/8b48f7d69400ad3db7043ec383c00a19148132d5
- https://github.com/ovn-org/ovn/commit/98b436db9f0e2f732960752cd88d51baeca04bb9
- https://github.com/ovn-org/ovn/commit/b600316f252aa29f15d153d9331f8557f1f77874
- https://github.com/ovn-org/ovn/commit/ab7b0eb8ca05af4bfd3b0b9730a124ad89d4ca42 (test)
- https://github.com/ovn-org/ovn/commit/ce126c9a82108f25cb4218edada0ce2d7757e146 (docs)

To use it, you can either set a custom chassis name in system-id-override file, or pass it via CLI: ovn-controller -n <chassis-name>

Note that they should also use different bridges not to conflict with each other. This is achieved by setting ovn-bridge-<chassis-name>= for each chassis-name to point to a different bridge. You may want to configure other chassis-name-specific options for each of the instances of the service, see how this is done in the test scenario: https://github.com/ovn-org/ovn/commit/ab7b0eb8ca05af4bfd3b0b9730a124ad89d4ca42

I hope this helps.
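
The per-chassis settings described above can be sketched as a small dry-run helper. It only prints the commands for one co-located instance instead of running them. The helper name `chassis_setup_cmds`, the bridge names, addresses, and the log and pid file paths are illustrative assumptions; only the `-n` flag and the `ovn-*-<chassis-name>` key pattern come from this bug.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that would configure
# and start one ovn-controller instance under a custom chassis name.
# All argument values below are placeholders, not recommended settings.
chassis_setup_cmds() {
    chassis=$1
    bridge=$2
    remote=$3
    encap_ip=$4

    # Each instance gets its own integration bridge.
    printf 'ovs-vsctl --may-exist add-br %s\n' "$bridge"

    # Chassis-specific keys carry the chassis name as a suffix.
    printf 'ovs-vsctl set Open_vSwitch . external-ids:ovn-remote-%s=%s external-ids:ovn-encap-type-%s=geneve external-ids:ovn-encap-ip-%s=%s external-ids:ovn-bridge-%s=%s\n' \
        "$chassis" "$remote" "$chassis" "$chassis" "$encap_ip" "$chassis" "$bridge"

    # Separate log and pid files keep the instances from colliding
    # (the file names here are an assumption).
    printf 'ovn-controller unix:/run/openvswitch/db.sock --log-file=/var/log/ovn/ovn-controller-%s.log --pidfile=/run/ovn/ovn-controller-%s.pid --detach -n %s\n' \
        "$chassis" "$chassis" "$chassis"
}

chassis_setup_cmds hv1 br-hv1 tcp:192.0.2.10:6642 192.0.2.11
chassis_setup_cmds hv2 br-hv2 tcp:192.0.2.10:6642 192.0.2.12
```

Reviewing the printed commands before piping them to a shell makes it easier to confirm that every instance ends up with a distinct bridge, encap IP, and pid file.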

Comment 14 Jianlin Shi 2023-07-27 02:27:52 UTC
Hi Ihar,

Does "systemctl start ovn-controller" support this feature?
If yes, how can we configure it before starting the service?

Comment 15 Ihar Hrachyshka 2023-07-27 13:44:50 UTC
I haven't done any systemctl related changes to support this feature. If you want to drive it with systemd, you will have to define your custom unit file that would pass the appropriate chassis-name / bridge-name etc.
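
As a rough illustration of what such a custom unit might look like, the sketch below prints a per-chassis unit file to stdout. Nothing like this ships with the OVN packages: the function name `render_unit`, the unit layout, the `/usr/bin/ovn-controller` path, and the log and pid file names are all assumptions, and the command-line flags simply mirror the manual invocations shown elsewhere in this bug.

```shell
#!/bin/sh
# Sketch: render a hypothetical systemd unit for one ovn-controller
# instance. The caller decides where to write it (for example under
# /etc/systemd/system/); this function only prints the text.
render_unit() {
    chassis=$1
    cat <<EOF
[Unit]
Description=ovn-controller instance for chassis ${chassis}
After=openvswitch.service
Requires=openvswitch.service

[Service]
# Runs in the foreground (no --detach), so Type=simple is enough.
Type=simple
ExecStart=/usr/bin/ovn-controller unix:/run/openvswitch/db.sock --no-chdir --log-file=/var/log/ovn/ovn-controller-${chassis}.log --pidfile=/run/ovn/ovn-controller-${chassis}.pid -n ${chassis}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

render_unit hv1
```

The chassis-specific `external-ids` keys (bridge, encap IP, remote) would still have to be set in OVSDB before such a unit is started.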

Comment 16 Jianlin Shi 2023-07-27 23:42:06 UTC
(In reply to Ihar Hrachyshka from comment #15)
> I haven't done any systemctl related changes to support this feature. If you
> want to drive it with systemd, you will have to define your custom unit file
> that would pass the appropriate chassis-name / bridge-name etc.

OK, so to try this feature, we need to start ovn-controller manually rather than through systemd.

Comment 17 Jianlin Shi 2023-07-28 09:05:38 UTC
Is the feature supported on ovn22.12-108? It seems that ovn-controller doesn't support the -n parameter:

+ systemctl start openvswitch
+ systemctl start ovn-northd
+ ovn-nbctl set-connection ptcp:6641
+ ovn-sbctl set-connection ptcp:6642
+ ovs-vsctl add-br br-hv1
+ ip link set br-hv1 up
+ ip addr add 1.1.52.15/24 dev br-hv1
+ ovs-vsctl -- set Open_vSwitch . external-ids:ovn-remote-hv1=tcp:1.1.52.25:6642 -- set Open_vSwitch . external-ids:ovn-encap-type-hv1=geneve -- set Open_vSwitch . external-ids:ovn-encap-ip-hv1=1.1.52.15 -- set Open_vSwitch . external-ids:ovn-bridge-hv1=br-hv1
+ ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller1.log --pidfile=/run/ovn/ovn-controller1.pid --detach -n hv1
ovn-controller: invalid option -- 'n'

[root@dell-per740-42 bz1836209]# rpm -qa | grep -E "openvswitch3.1|ovn22.12"
openvswitch3.1-3.1.0-38.el9fdp.x86_64
python3-openvswitch3.1-3.1.0-38.el9fdp.x86_64
openvswitch3.1-test-3.1.0-38.el9fdp.noarch
ovn22.12-22.12.0-108.el9fdp.x86_64
ovn22.12-central-22.12.0-108.el9fdp.x86_64
ovn22.12-host-22.12.0-108.el9fdp.x86_64

Comment 18 Jianlin Shi 2023-07-28 09:06:03 UTC
and the command is supported on ovn23.03-86.el9

Comment 19 Ihar Hrachyshka 2023-07-28 13:07:25 UTC
It's a 23.03 feature, and Fixed In Version points to 23.03. Am I missing something?
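
A test script could guard against this kind of mix-up by checking the OVN version before passing `-n`. The following is a minimal sketch that assumes the 23.03 threshold stated in this bug and relies on GNU `sort -V`; the function names are hypothetical.

```shell
#!/bin/sh
# True when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption taken from this bug: the -n option first appears in 23.03.
supports_chassis_flag() {
    version_ge "$1" 23.03
}

for v in 22.12.0 23.03.0; do
    if supports_chassis_flag "$v"; then
        echo "$v: -n expected to be available"
    else
        echo "$v: -n not expected to be available"
    fi
done
```

On an RPM-based host the version string could be taken from something like `rpm -q --qf '%{VERSION}' ovn23.03`, with the package name adjusted to whatever is installed.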

Comment 20 Jianlin Shi 2023-07-28 16:28:11 UTC
the component for this bug is ovn22.12, and now it's in the errata for ovn22.12.
so maybe we need to remove this bug from the errata for ovn22.12, and add it into the errata for ovn23.03

Comment 21 Ihar Hrachyshka 2023-07-28 16:55:13 UTC
Changing component, sorry I missed it. This should not be part of 22.12 errata, the feature was not there and wasn't intended to be there.

Comment 22 Ihar Hrachyshka 2023-07-28 16:57:09 UTC
Sorry for the mess Mark. Is there anything else that should be taken care of to make sure this is not part of the 22.12 errata? Thanks.

Comment 23 Jianlin Shi 2023-08-01 03:15:41 UTC
I ran the following script to start two ovn-controller instances on one machine:

systemctl start openvswitch
systemctl start ovn-northd                                                                            
ovn-nbctl set-connection ptcp:6641                                                                    
ovn-sbctl set-connection ptcp:6642                                                                    
                                                                                                      
#echo hv1 > /etc/ovn/system-id-override                                                               
ovs-vsctl \                                                                                           
        -- set Open_vSwitch . external-ids:ovn-remote-hv1=tcp:1.1.202.25:6642 \                       
        -- set Open_vSwitch . external-ids:ovn-encap-type-hv1=geneve \       
        -- set Open_vSwitch . external-ids:ovn-encap-ip-hv1=1.1.202.15 \
        -- set Open_vSwitch . external-ids:ovn-bridge-hv1=br-hv1
                                                                                                      
ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller1.log --pidfile=/run/ovn/ovn-controller1.pid --detach -n hv1
                                                                                                      
sleep 10                                                                                              
                                                                                                      
#echo hv2 > /etc/openvswitch/system-id-override
ovs-vsctl \                                                                                           
        -- set Open_vSwitch . external-ids:ovn-remote-hv2=tcp:1.1.202.25:6642 \
        -- set Open_vSwitch . external-ids:ovn-encap-type-hv2=geneve \
        -- set Open_vSwitch . external-ids:ovn-encap-ip-hv2=1.1.202.25 \                              
        -- set Open_vSwitch . external-ids:ovn-bridge-hv2=br-hv2
                                                                                                      
ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller2.log --pidfile=/run/ovn/ovn-controller2.pid --detach -n hv2
                                                                                                      
ovn-nbctl ls-add ls1                                                                            
ovn-nbctl lsp-add ls1 ls1p1                                                                     
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01 192.168.1.1 2001::1"                   
ovn-nbctl lsp-add ls1 ls1p2                                                                  
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:01:02 192.168.1.2 2001::2"                             
ovn-nbctl lsp-add ls1 ls1p3                                                                           
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:01:03 192.168.1.3 2001::3"                             
                                                                                                      
ovn-nbctl lr-add lr1                                                                                  
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1                                                                         
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1    
                                   
ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:00:02 192.168.2.254/24 2002::a/64
                                                
ovn-nbctl ls-add ls2                                           
ovn-nbctl lsp-add ls2 ls2-lr1                               
ovn-nbctl lsp-set-addresses ls2-lr1 "00:00:00:00:00:02 192.168.2.254 2002::a"
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
                                                                
ovn-nbctl lsp-add ls2 ls2p1             
ovn-nbctl lsp-set-addresses ls2p1 "00:00:00:01:02:01 192.168.2.1 2002::1"
ovn-nbctl lsp-add ls2 ls2p2                          
ovn-nbctl lsp-set-addresses ls2p2 "00:00:00:01:02:02 192.168.2.2 2002::2"
ovn-nbctl lsp-add ls2 ls2p3
ovn-nbctl lsp-set-addresses ls2p3 "00:00:00:01:02:03 192.168.2.3 2002::3"

ovs-vsctl add-port br-hv1 ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
ovs-vsctl add-port br-hv1 ls2p1 -- set interface ls2p1 type=internal external_ids:iface-id=ls2p1
                                                                                 
ip netns add ls1p1                     
ip link set ls1p1 netns ls1p1              
ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:01:01:01
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip addr add 192.168.1.1/24 dev ls1p1
ip netns exec ls1p1 ip addr add 2001::1/64 dev ls1p1
ip netns exec ls1p1 ip route add default via 192.168.1.254 dev ls1p1
ip netns exec ls1p1 ip -6 route add default via 2001::a dev ls1p1

ip netns add ls2p1                                 
ip link set ls2p1 netns ls2p1
ip netns exec ls2p1 ip link set ls2p1 address 00:00:00:01:02:01
ip netns exec ls2p1 ip link set ls2p1 up
ip netns exec ls2p1 ip addr add 192.168.2.1/24 dev ls2p1
ip netns exec ls2p1 ip addr add 2002::1/64 dev ls2p1
ip netns exec ls2p1 ip route add default via 192.168.2.254 dev ls2p1
ip netns exec ls2p1 ip -6 route add default via 2002::a dev ls2p1

ovs-vsctl add-port br-hv2 ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2      
ip netns add ls1p2                                 
ip link set ls1p2 netns ls1p2                      
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02                                       
ip netns exec ls1p2 ip link set ls1p2 up           
ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2                                              
ip netns exec ls1p2 ip addr add 2001::2/64 dev ls1p2                                                  
ip netns exec ls1p2 ip route add default via 192.168.1.254 dev ls1p2                                  
ip netns exec ls1p2 ip -6 route add default via 2001::a                                               

ovs-vsctl add-port br-hv2 ls2p2 -- set interface ls2p2 type=internal external_ids:iface-id=ls2p2
ip netns add ls2p2                                 
ip link set ls2p2 netns ls2p2                      
ip netns exec ls2p2 ip link set ls2p2 address 00:00:00:01:02:02                                       
ip netns exec ls2p2 ip link set ls2p2 up           
ip netns exec ls2p2 ip addr add 192.168.2.2/24 dev ls2p2                                              
ip netns exec ls2p2 ip addr add 2002::2/64 dev ls2p2                                                  
ip netns exec ls2p2 ip route add default via 192.168.2.254 dev ls2p2                                  
ip netns exec ls2p2 ip -6 route add default via 2002::a dev ls2p2

and the script passed on ovn23.03-23.03.0-86.el9:

[root@wsfd-advnetlab18 bz1836209]# ovs-vsctl show                                                     
deb29006-ec8e-4688-b58f-dc4ca8c8bb34                                                                  
    Bridge br-hv1                                                                                     
        fail_mode: secure                                                                             
        datapath_type: system                                                                         
        Port ls2p1                                                                                    
            Interface ls2p1                                                                           
                type: internal                                                                        
        Port ovn-hv2-0                                                                                
            Interface ovn-hv2-0                                                                       
                type: geneve                                                                          
                options: {csum="true", key=flow, remote_ip="1.1.202.25"}                              
        Port br-hv1                                                                                   
            Interface br-hv1                                                                          
                type: internal                                                                        
        Port ls1p1                                                                                    
            Interface ls1p1                                                                           
                type: internal                                                                        
    Bridge br-hv2                                                                                     
        fail_mode: secure                                                                             
        datapath_type: system                                                                         
        Port br-hv2                                                                                   
            Interface br-hv2                                                                          
                type: internal                                                                        
        Port ls1p2                                                                                    
            Interface ls1p2                                                                           
                type: internal                                                                        
        Port ls2p2                                                                                    
            Interface ls2p2                                                                           
                type: internal                                                                        
        Port ovn0-hv1-0                                                                               
            Interface ovn0-hv1-0                                                                      
                type: geneve                                                                          
                options: {csum="true", key=flow, remote_ip="1.1.202.15"}                              
    ovs_version: "3.1.3"                                                                              
[root@wsfd-advnetlab18 bz1836209]# rpm -qa | grep -E "openvswitch3.1|ovn23.03"                        
ovn23.03-23.03.0-86.el9fdp.x86_64                                                                     
openvswitch3.1-3.1.0-38.el9fdp.x86_64                                                                 
ovn23.03-central-23.03.0-86.el9fdp.x86_64                                                             
ovn23.03-host-23.03.0-86.el9fdp.x86_64 

but if I start another controller on another system:

systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv3 external_ids:ovn-remote=tcp:1.1.202.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.202.26
systemctl start ovn-controller                                                                        

ovs-vsctl add-port br-int ls1p3 -- set interface ls1p3 type=internal external_ids:iface-id=ls1p3      
ovs-vsctl add-port br-int ls2p3 -- set interface ls2p3 type=internal external_ids:iface-id=ls2p3      
                                                                                                      
ip netns add ls1p3
ip link set ls1p3 netns ls1p3                                                                         
ip netns exec ls1p3 ip link set ls1p3 address 00:00:00:01:01:03                                       
ip netns exec ls1p3 ip link set ls1p3 up                                                              
ip netns exec ls1p3 ip addr add 192.168.1.3/24 dev ls1p3                                              
ip netns exec ls1p3 ip addr add 2001::3/64 dev ls1p3                                                  
ip netns exec ls1p3 ip route add default via 192.168.1.254 dev ls1p3                                  
ip netns exec ls1p3 ip -6 route add default via 2001::a dev ls1p3                                     

ip netns add ls2p3
ip link set ls2p3 netns ls2p3                                                                         
ip netns exec ls2p3 ip link set ls2p3 address 00:00:00:01:02:03                                       
ip netns exec ls2p3 ip link set ls2p3 up                                                              
ip netns exec ls2p3 ip addr add 192.168.2.3/24 dev ls2p3                                              
ip netns exec ls2p3 ip addr add 2002::3/64 dev ls2p3                                                  
ip netns exec ls2p3 ip route add default via 192.168.2.254 dev ls2p3                                  
ip netns exec ls2p3 ip -6 route add default via 2002::a dev ls2p3

there is an error in the ovs-vsctl output on the first system:

[root@wsfd-advnetlab18 bz1836209]# ovs-vsctl show                                                     
deb29006-ec8e-4688-b58f-dc4ca8c8bb34                                                                  
    Bridge br-hv1                                                                                     
        fail_mode: secure                                                                             
        datapath_type: system                                                                         
        Port ls2p1                                                                                    
            Interface ls2p1                                                                           
                type: internal                                                                        
        Port ovn-hv2-0                                                                                
            Interface ovn-hv2-0                                                                       
                type: geneve                                                                          
                options: {csum="true", key=flow, remote_ip="1.1.202.25"}                              
        Port ovn-hv3-0                                                                                
            Interface ovn-hv3-0                                                                       
                type: geneve                                                                          
                options: {csum="true", key=flow, remote_ip="1.1.202.26"}                              
        Port br-hv1                                                                                   
            Interface br-hv1                                                                          
                type: internal                                                                        
        Port ls1p1                                                                                    
            Interface ls1p1                                                                           
                type: internal                                                                        
    Bridge br-hv2                                                                                     
        fail_mode: secure                                                                             
        datapath_type: system                                                                         
        Port ovn0-hv3-0                                                                               
            Interface ovn0-hv3-0                                                                      
                type: geneve                                                                          
                options: {csum="true", key=flow, remote_ip="1.1.202.26"}                              
                error: "could not add network device ovn0-hv3-0 to ofproto (File exists)"  

<== both br-hv1 and br-hv2 tried to create ovn0-hv3-0 port, then it failed on one of the bridge
           
        Port br-hv2                                                                                   
            Interface br-hv2                                                                          
                type: internal                                                                        
        Port ls1p2                                                                                    
            Interface ls1p2                                                                           
                type: internal                                                                        
        Port ls2p2                                                                                    
            Interface ls2p2                                                                           
                type: internal                                                                        
        Port ovn0-hv1-0                                                                               
            Interface ovn0-hv1-0                                                                      
                type: geneve                                                                          
                options: {csum="true", key=flow, remote_ip="1.1.202.15"}                              
    ovs_version: "3.1.3"

Ihar, how could we solve this problem?

Comment 24 Ihar Hrachyshka 2023-08-01 13:42:49 UTC
Hi Jianlin,

would you mind sharing the contents of your Open_vSwitch table? It should have unique ovn-chassis-idx-* options set in other_config.

Comment 25 Ihar Hrachyshka 2023-08-02 14:24:20 UTC
Here's the list of indices picked by OVN for the two controllers on the host:

```
[root@wsfd-advnetlab18 ~]# ovs-vsctl list Open_vSwitch | grep idx
other_config        : {ovn-chassis-idx-hv1="", ovn-chassis-idx-hv2="0", vlan-limit="0"}
```

These indices are supposed to be used when determining the name of tunnel ports to create to connect to the other chassis (hv3). But the error (`error: "could not add network device ovn0-hv3-0 to ofproto (File exists)"`) seems to suggest that they tried to create a port with the exact same name, which failed.
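
To compare the indices at a glance, the `other_config` line can be reduced to `chassis=index` pairs. This is only a sketch: the function names are hypothetical, and the parsing assumes the `ovs-vsctl list` output format shown above. In this environment the empty index and index `0` appear to line up with the `ovn-` and `ovn0-` tunnel port prefixes.

```shell
#!/bin/sh
# Read `ovs-vsctl list Open_vSwitch` output on stdin and print one
# "chassis=index" pair per ovn-chassis-idx-* key.
list_chassis_idx() {
    grep -o 'ovn-chassis-idx-[^=]*="[^"]*"' |
        sed -e 's/^ovn-chassis-idx-//' -e 's/"//g'
}

# Print every index value that is used by more than one chassis.
duplicate_idx() {
    list_chassis_idx | cut -d= -f2 | sort | uniq -d
}

# Sample input copied from the output above.
sample='other_config        : {ovn-chassis-idx-hv1="", ovn-chassis-idx-hv2="0", vlan-limit="0"}'
printf '%s\n' "$sample" | list_chassis_idx
```

On a live host the same functions could be fed with `ovs-vsctl list Open_vSwitch | list_chassis_idx`; any output from `duplicate_idx` would indicate two controllers sharing an index.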

I've logged in to one of the nodes kindly provided by Jianlin, and I also see that OVS complains about an existing port for the other name too:

```
Bridge br-hv1
...
    Bridge br-hv1
        fail_mode: secure
        datapath_type: system
        Port ovn-hv3-0
            Interface ovn-hv3-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="1.1.202.26"}
                error: "could not add network device ovn-hv3-0 to ofproto (File exists)"
```

(note the prefix is ovn- not ovn0-)

Jianlin suggested above that `both br-hv1 and br-hv2 tried to create ovn0-hv3-0 port`, though I don't think that is necessarily what happens. It may be that the same ovn-controller tries to create the same tunnel port twice, failing the second time (in which case there's no cross-talk between ovn-controllers).

I will update the BZ as I make progress in investigation.

Comment 26 Ihar Hrachyshka 2023-08-02 14:37:05 UTC
A side note: the patch series that implemented the feature does not cover this particular scenario. The test case added there only validates that controllers co-located on the same host are able to talk to each other: https://github.com/ovn-org/ovn/commit/ab7b0eb8ca05af4bfd3b0b9730a124ad89d4ca42 In your scenario, you bring in another controller running on a different node, so each of the co-located controllers has to establish a tunnel to the same peer. This is something that should be covered in the upstream test suite.

Comment 27 Ihar Hrachyshka 2023-08-02 14:52:58 UTC
The reason this failed is explained in the `vswitchd` log file:

```
[root@wsfd-advnetlab18 ~]# zgrep WARN /var/log/openvswitch/ovs-vswitchd.log-20230802.gz 
2023-08-01T13:55:37.772Z|00050|tunnel|WARN|ovn-hv3-0: attempting to add tunnel port with same config as port 'ovn0-hv3-0' (::->1.1.202.26, key=flow, legacy_l2, dp port=3)
2023-08-01T13:55:37.772Z|00051|ofproto|WARN|br-hv1: could not add port ovn-hv3-0 (File exists)
2023-08-01T13:55:37.772Z|00052|bridge|WARN|could not add network device ovn-hv3-0 to ofproto (File exists)
```

I think the reason for the failure is that both tunnel ports managed by two co-located controller instances use the same wildcard local_ip (::) to match against tunnelled packets. To make the scenario work, OVN would have to pass, in addition to `remote_ip` of the peer chassis, the `local_ip` option that would use the value from `external_ids:ovn-encap-ip`.

That said, OVN already allows enforcing the `local_ip` setting for tunnel ports by setting the following in the `Open_vSwitch` object: `external_ids:ovn-set-local-ip=true`. To test this, I ran `ovs-vsctl set open . external_ids:ovn-set-local-ip=true`, after which the error in the `ovs-vsctl show` output vanished, and I can see in the `vswitchd` log that the port is created successfully.

```
2023-08-02T14:48:20.310Z|00063|bridge|INFO|bridge br-hv1: added interface ovn-hv3-0 on port 7
```

While this works, I think there are a number of things that we may follow up on:

1. Update the OVN documentation section covering multiple ovn-controller instances co-located on the same node to mention the need to set `ovn-set-local-ip=true`.
2. Add a test scenario to the upstream test suite to cover the (common) case of a co-located controller talking to another controller located on a different node.
3. Perhaps OVN could be smart enough to detect multiple co-located controllers (by inspecting the `ovn-chassis-idx-*` keys in `Open_vSwitch` records) and, once detected, enforce `local_ip` for all tunnel ports. (Perhaps this could be enforced unconditionally? What's the drawback of setting it for all tunnel ports regardless of whether multiple controllers are co-located?)
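
As a rough illustration of item 3, the detection could look something like the sketch below. It is not OVN code and does not describe current OVN behavior: it assumes `external_ids` is available as a plain dictionary, that each co-located controller leaves one `ovn-chassis-idx-<name>` key there, and that the chassis names and values shown are made up for the example.

```python
CHASSIS_IDX_PREFIX = "ovn-chassis-idx-"


def colocated_chassis(external_ids):
    """Return the sorted chassis names found in ovn-chassis-idx-* keys."""
    return sorted(
        key[len(CHASSIS_IDX_PREFIX):]
        for key in external_ids
        if key.startswith(CHASSIS_IDX_PREFIX)
    )


def should_enforce_local_ip(external_ids):
    """Hypothetical policy: honor the explicit knob, else auto-detect."""
    if external_ids.get("ovn-set-local-ip", "").lower() == "true":
        return True
    # More than one chassis index key suggests co-located controllers.
    return len(colocated_chassis(external_ids)) > 1


# Hypothetical contents of external_ids on a node with two controllers.
two_controllers = {
    "ovn-chassis-idx-hv1": "",
    "ovn-chassis-idx-hv2": "",
    "ovn-encap-type": "geneve",
}
single_controller = {"ovn-chassis-idx-hv3": "", "ovn-encap-type": "geneve"}

print(colocated_chassis(two_controllers), should_enforce_local_ip(two_controllers))
print(colocated_chassis(single_controller), should_enforce_local_ip(single_controller))
```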

Comment 28 Ihar Hrachyshka 2023-08-02 15:12:48 UTC
FYI I'll clone the BZ to track follow-up items later. As for this RFE itself, I think it can be verified.

Comment 29 Jianlin Shi 2023-08-02 23:33:12 UTC
Thanks, Ihar, for the well-explained comments. After running `ovs-vsctl set open . external_ids:ovn-set-local-ip=true`, it works.

Comment 30 Jianlin Shi 2023-08-02 23:36:56 UTC
(In reply to Ihar Hrachyshka from comment #27)
> 3. Perhaps OVN could be smart to detect multiple co-located controllers (by
> inspecting the `ovn-chassis-idx-*` keys in `Open_vSwitch` records) and -
> once detected - enforce `local_ip` for all tunnel ports. (Perhaps this could
> be enforced unconditionally? What's the drawback of setting it for all
> tunnel ports regardless of whether multiple controllers are co-located?)

I can't think of a drawback of setting ovn-set-local-ip to true.
Maybe, in some cases where the system has several IP addresses and the route is ECMP, the source IP of the tunnel packet would be chosen through the route calculated by ECMP.

Comment 31 Jianlin Shi 2023-08-04 23:41:46 UTC
NAT doesn't work well after enabling this feature. I added an issue to track it: https://issues.redhat.com/browse/FD-3083

Comment 33 errata-xmlrpc 2023-08-21 02:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.12 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4677

Comment 34 Ihar Hrachyshka 2023-09-22 15:47:37 UTC
Documenting the need for local-ip setting in upstream: https://patchwork.ozlabs.org/project/ovn/patch/20230922154655.5571-1-ihrachys@redhat.com/

Comment 35 Red Hat Bugzilla 2024-01-21 04:25:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days