Bug 1836209
| Summary: | [RFE] OVN - Multiple bridges support for different datapaths | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Daniel Alvarez Sanchez <dalvarez> |
| Component: | ovn23.03 | Assignee: | Ihar Hrachyshka <ihrachys> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | FDP 20.E | CC: | ctrautma, dcbw, dsneddon, echaudro, ihrachys, jiji, jishi, ltomasbo, mbooth, mmichels, ralongi, william.caban |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovn23.03-23.03.0-68.el8fdp ovn23.03-23.03.0-68.el9fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-08-21 02:08:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Daniel Alvarez Sanchez
2020-05-15 12:40:52 UTC
Upstream series is http://patchwork.ozlabs.org/project/ovn/list/?series=211705&state=%2A&archive=both
Latest is v12 from Nov 2020: http://patchwork.ozlabs.org/project/ovn/patch/20201119032052.599236-1-ihrachys@redhat.com/
This will have to be reworked if we need it.

Moving back to ASSIGNED.

It is not mentioned here, but I envision this using separate namespaces for each ovn-controller and associated bridge. Is that how this would most likely be implemented? Would there be any need for separate routing tables (VRFs)? Perhaps not, since the OVN controller works with flows, but I wonder if the management bridge in the example in the description should use a different routing table than the fast dataplane bridge? This would allow separate bridges to be used for outbound traffic that is routed via a default route, for instance, or to ensure that traffic is symmetric in and out of the same bridge via route rules or dynamic routing (BGP).

The proposed implementation is posted at https://patchwork.ozlabs.org/project/ovn/list/?series=323408 and is expected to land in the upcoming OVN 22.12 release (pending successful review in the coming weeks).
In the proposed patch series, all co-located ovn-controllers run in the same network namespace. The same vswitchd (perhaps running in the main namespace) will reuse the same routing table for both bridges, each managed by its own ovn-controller instance. There is no namespace separation between controller processes, only logical separation (controllers make sure not to step on each other's toes, e.g. by killing each other's patch ports).
I am unclear on the symmetric routing concern you expressed (perhaps this should be tested?), but in general, tunnel traffic will be managed by corresponding tunnels, each carrying unique tunnel IPs (external_ids:ovn-encap-ip-<virtual-chassis-name>) that belong to separate controller processes.
Note that the proposed implementation will be considered experimental upstream, and we are also aware of a number of deficiencies in it that will have to be resolved before it can be used in production. One obvious deficiency is that ovn-controllers reuse the same ct zone namespace without negotiating space allocation between each other. This can probably be resolved by adopting the new ovn-chassis-idx-* db property into a new ct zone sharing mechanism (see the index property added in the first patch of the series: https://patchwork.ozlabs.org/project/ovn/patch/20221018183150.1213728-2-ihrachys@redhat.com/).
Let me know if this addressed your question.

(In reply to Ihar Hrachyshka from comment #6)
Thank you very much Ihar, this does answer my question, and clarifies that separate OVN controllers use the same network namespace. I took a closer look at the patches, and thought about both controllers using the same vswitchd (in the same way a single controller does). I don't believe these changes would add any issues with symmetric routing, and if any currently exist they would be addressed elsewhere. Testing will confirm, but to me this looks like a solid and beneficial change.

FYI this slipped into 2023.03 because of last minute reviews upstream pre-branching.

FYI it just landed in main upstream so it will be part of 2023.03. There may still be bugs to squash (for BGP and otherwise). There are known limitations, like stateful ACLs not being supported in this setup (can be fixed in a follow-up). I will update to MODIFIED when there's a 2023.03 package for OVN. The feature won't be backported.

Hi Ihar, which patch adds the feature to ovn23.03? How can we enable this feature and run multiple ovn-controllers on the same chassis?
The list of commits that implement the feature:
- https://github.com/ovn-org/ovn/commit/dae2eb8a17f35099539bf338746e8e5b917fd4e6
- https://github.com/ovn-org/ovn/commit/48db2a7a353a81aaa1795ef4b35b0fab1f0b0ccc
- https://github.com/ovn-org/ovn/commit/3dbf5f03df5ac277b7398686e60cd9e3205359fe
- https://github.com/ovn-org/ovn/commit/8b48f7d69400ad3db7043ec383c00a19148132d5
- https://github.com/ovn-org/ovn/commit/98b436db9f0e2f732960752cd88d51baeca04bb9
- https://github.com/ovn-org/ovn/commit/b600316f252aa29f15d153d9331f8557f1f77874
- https://github.com/ovn-org/ovn/commit/ab7b0eb8ca05af4bfd3b0b9730a124ad89d4ca42 (test)
- https://github.com/ovn-org/ovn/commit/ce126c9a82108f25cb4218edada0ce2d7757e146 (docs)
To use it, you can either set a custom chassis name in the system-id-override file, or pass it via CLI: ovn-controller -n <chassis-name>
Note that the instances should also use different bridges so that they do not conflict with each other. This is achieved by setting ovn-bridge-<chassis-name>= for each chassis-name to point to a different bridge. You may want to configure other chassis-name-specific options for each of the instances of the service; see how this is done in the test scenario: https://github.com/ovn-org/ovn/commit/ab7b0eb8ca05af4bfd3b0b9730a124ad89d4ca42
I hope this helps.

Hi Ihar, does "systemctl start ovn-controller" support this feature? If yes, how should we configure it before starting the service?

I haven't done any systemctl related changes to support this feature. If you want to drive it with systemd, you will have to define your custom unit file that would pass the appropriate chassis-name / bridge-name etc.

(In reply to Ihar Hrachyshka from comment #15)
> I haven't done any systemctl related changes to support this feature. If you
> want to drive it with systemd, you will have to define your custom unit file
> that would pass the appropriate chassis-name / bridge-name etc.
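For anyone starting the instances by hand or from a custom unit file, the per-chassis settings described above can be sketched as a small helper. This is only an illustration: the external-ids key names and the `-n` option come from this thread, while the function name `ovn_instance_cmds`, its argument order, and the per-chassis log and pid file names are assumptions. It prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch: print (do not run) the commands for one co-located ovn-controller.
# Key names and the -n option are taken from this thread; the function name,
# argument order, and per-chassis log/pid file names are illustrative only.
ovn_instance_cmds() {
    if [ "$#" -ne 4 ]; then
        echo "usage: ovn_instance_cmds CHASSIS BRIDGE ENCAP_IP SB_REMOTE" >&2
        return 2
    fi
    # Per-chassis external-ids, so that each instance gets its own bridge.
    printf 'ovs-vsctl set Open_vSwitch . external-ids:ovn-remote-%s=%s external-ids:ovn-encap-type-%s=geneve external-ids:ovn-encap-ip-%s=%s external-ids:ovn-bridge-%s=%s\n' \
        "$1" "$4" "$1" "$1" "$3" "$1" "$2"
    # -n selects the chassis name this instance runs as.
    printf 'ovn-controller unix:/run/openvswitch/db.sock --log-file=/var/log/ovn/ovn-controller-%s.log --pidfile=/run/ovn/ovn-controller-%s.pid --detach -n %s\n' \
        "$1" "$1" "$1"
}

# Example: ovn_instance_cmds hv1 br-hv1 1.1.202.15 tcp:1.1.202.25:6642
```

Calling it once per chassis name (hv1, hv2, ...) and reviewing the output before piping it to a root shell keeps the per-instance bridge, encap IP, and pid file visibly distinct.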
ok, so to try this feature, we need to start ovn-controller manually rather than through systemd.

Is the feature supported on ovn22.12-108? It seems that ovn-controller doesn't support the -n parameter there:
+ systemctl start openvswitch
+ systemctl start ovn-northd
+ ovn-nbctl set-connection ptcp:6641
+ ovn-sbctl set-connection ptcp:6642
+ ovs-vsctl add-br br-hv1
+ ip link set br-hv1 up
+ ip addr add 1.1.52.15/24 dev br-hv1
+ ovs-vsctl -- set Open_vSwitch . external-ids:ovn-remote-hv1=tcp:1.1.52.25:6642 -- set Open_vSwitch . external-ids:ovn-encap-type-hv1=geneve -- set Open_vSwitch . external-ids:ovn-encap-ip-hv1=1.1.52.15 -- set Open_vSwitch . external-ids:ovn-bridge-hv1=br-hv1
+ ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller1.log --pidfile=/run/ovn/ovn-controller1.pid --detach -n hv1
ovn-controller: invalid option -- 'n'
[root@dell-per740-42 bz1836209]# rpm -qa | grep -E "openvswitch3.1|ovn22.12"
openvswitch3.1-3.1.0-38.el9fdp.x86_64
python3-openvswitch3.1-3.1.0-38.el9fdp.x86_64
openvswitch3.1-test-3.1.0-38.el9fdp.noarch
ovn22.12-22.12.0-108.el9fdp.x86_64
ovn22.12-central-22.12.0-108.el9fdp.x86_64
ovn22.12-host-22.12.0-108.el9fdp.x86_64
and the command is supported on ovn23.03-86.el9

It's a 23.03 feature, and Fixed In Version points to 23.03. Am I missing something?

The component for this bug is ovn22.12, and now it's in the errata for ovn22.12. So maybe we need to remove this bug from the errata for ovn22.12 and add it to the errata for ovn23.03.

Changing component, sorry I missed it. This should not be part of the 22.12 errata; the feature was not there and wasn't intended to be there.

Sorry for the mess, Mark. Is there anything else that should be taken care of to make sure this is not part of the 22.12 errata? Thanks.

I ran the following script to start two ovn-controller instances on one machine:
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
#echo hv1 > /etc/ovn/system-id-override
ovs-vsctl \
-- set Open_vSwitch . external-ids:ovn-remote-hv1=tcp:1.1.202.25:6642 \
-- set Open_vSwitch . external-ids:ovn-encap-type-hv1=geneve \
-- set Open_vSwitch . external-ids:ovn-encap-ip-hv1=1.1.202.15 \
-- set Open_vSwitch . external-ids:ovn-bridge-hv1=br-hv1
ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller1.log --pidfile=/run/ovn/ovn-controller1.pid --detach -n hv1
sleep 10
#echo hv2 > /etc/openvswitch/system-id-override
ovs-vsctl \
-- set Open_vSwitch . external-ids:ovn-remote-hv2=tcp:1.1.202.25:6642 \
-- set Open_vSwitch . external-ids:ovn-encap-type-hv2=geneve \
-- set Open_vSwitch . external-ids:ovn-encap-ip-hv2=1.1.202.25 \
-- set Open_vSwitch . external-ids:ovn-bridge-hv2=br-hv2
ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller2.log --pidfile=/run/ovn/ovn-controller2.pid --detach -n hv2
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:01:02 192.168.1.2 2001::2"
ovn-nbctl lsp-add ls1 ls1p3
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:01:03 192.168.1.3 2001::3"
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1
ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:00:02 192.168.2.254/24 2002::a/64
ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-lr1
ovn-nbctl lsp-set-addresses ls2-lr1 "00:00:00:00:00:02 192.168.2.254 2002::a"
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
ovn-nbctl lsp-add ls2 ls2p1
ovn-nbctl lsp-set-addresses ls2p1 "00:00:00:01:02:01 192.168.2.1 2002::1"
ovn-nbctl lsp-add ls2 ls2p2
ovn-nbctl lsp-set-addresses ls2p2 "00:00:00:01:02:02 192.168.2.2 2002::2"
ovn-nbctl lsp-add ls2 ls2p3
ovn-nbctl lsp-set-addresses ls2p3 "00:00:00:01:02:03 192.168.2.3 2002::3"
ovs-vsctl add-port br-hv1 ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
ovs-vsctl add-port br-hv1 ls2p1 -- set interface ls2p1 type=internal external_ids:iface-id=ls2p1
ip netns add ls1p1
ip link set ls1p1 netns ls1p1
ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:01:01:01
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip addr add 192.168.1.1/24 dev ls1p1
ip netns exec ls1p1 ip addr add 2001::1/64 dev ls1p1
ip netns exec ls1p1 ip route add default via 192.168.1.254 dev ls1p1
ip netns exec ls1p1 ip -6 route add default via 2001::a dev ls1p1
ip netns add ls2p1
ip link set ls2p1 netns ls2p1
ip netns exec ls2p1 ip link set ls2p1 address 00:00:00:01:02:01
ip netns exec ls2p1 ip link set ls2p1 up
ip netns exec ls2p1 ip addr add 192.168.2.1/24 dev ls2p1
ip netns exec ls2p1 ip addr add 2002::1/64 dev ls2p1
ip netns exec ls2p1 ip route add default via 192.168.2.254 dev ls2p1
ip netns exec ls2p1 ip -6 route add default via 2002::a dev ls2p1
ovs-vsctl add-port br-hv2 ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2
ip netns add ls1p2
ip link set ls1p2 netns ls1p2
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02
ip netns exec ls1p2 ip link set ls1p2 up
ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2
ip netns exec ls1p2 ip addr add 2001::2/64 dev ls1p2
ip netns exec ls1p2 ip route add default via 192.168.1.254 dev ls1p2
ip netns exec ls1p2 ip -6 route add default via 2001::a
ovs-vsctl add-port br-hv2 ls2p2 -- set interface ls2p2 type=internal external_ids:iface-id=ls2p2
ip netns add ls2p2
ip link set ls2p2 netns ls2p2
ip netns exec ls2p2 ip link set ls2p2 address 00:00:00:01:02:02
ip netns exec ls2p2 ip link set ls2p2 up
ip netns exec ls2p2 ip addr add 192.168.2.2/24 dev ls2p2
ip netns exec ls2p2 ip addr add 2002::2/64 dev ls2p2
ip netns exec ls2p2 ip route add default via 192.168.2.254 dev ls2p2
ip netns exec ls2p2 ip -6 route add default via 2002::a dev ls2p2
and the script passed on ovn23.03-23.03.0-86.el9:
[root@wsfd-advnetlab18 bz1836209]# ovs-vsctl show
deb29006-ec8e-4688-b58f-dc4ca8c8bb34
Bridge br-hv1
fail_mode: secure
datapath_type: system
Port ls2p1
Interface ls2p1
type: internal
Port ovn-hv2-0
Interface ovn-hv2-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.25"}
Port br-hv1
Interface br-hv1
type: internal
Port ls1p1
Interface ls1p1
type: internal
Bridge br-hv2
fail_mode: secure
datapath_type: system
Port br-hv2
Interface br-hv2
type: internal
Port ls1p2
Interface ls1p2
type: internal
Port ls2p2
Interface ls2p2
type: internal
Port ovn0-hv1-0
Interface ovn0-hv1-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.15"}
ovs_version: "3.1.3"
[root@wsfd-advnetlab18 bz1836209]# rpm -qa | grep -E "openvswitch3.1|ovn23.03"
ovn23.03-23.03.0-86.el9fdp.x86_64
openvswitch3.1-3.1.0-38.el9fdp.x86_64
ovn23.03-central-23.03.0-86.el9fdp.x86_64
ovn23.03-host-23.03.0-86.el9fdp.x86_64
but if I start another controller on another system:
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv3 external_ids:ovn-remote=tcp:1.1.202.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.202.26
systemctl start ovn-controller
ovs-vsctl add-port br-int ls1p3 -- set interface ls1p3 type=internal external_ids:iface-id=ls1p3
ovs-vsctl add-port br-int ls2p3 -- set interface ls2p3 type=internal external_ids:iface-id=ls2p3
ip netns add ls1p3
ip link set ls1p3 netns ls1p3
ip netns exec ls1p3 ip link set ls1p3 address 00:00:00:01:01:03
ip netns exec ls1p3 ip link set ls1p3 up
ip netns exec ls1p3 ip addr add 192.168.1.3/24 dev ls1p3
ip netns exec ls1p3 ip addr add 2001::3/64 dev ls1p3
ip netns exec ls1p3 ip route add default via 192.168.1.254 dev ls1p3
ip netns exec ls1p3 ip -6 route add default via 2001::a dev ls1p3
ip netns add ls2p3
ip link set ls2p3 netns ls2p3
ip netns exec ls2p3 ip link set ls2p3 address 00:00:00:01:02:03
ip netns exec ls2p3 ip link set ls2p3 up
ip netns exec ls2p3 ip addr add 192.168.2.3/24 dev ls2p3
ip netns exec ls2p3 ip addr add 2002::3/64 dev ls2p3
ip netns exec ls2p3 ip route add default via 192.168.2.254 dev ls2p3
ip netns exec ls2p3 ip -6 route add default via 2002::a dev ls2p3
there is an error in the ovs-vsctl show output:
[root@wsfd-advnetlab18 bz1836209]# ovs-vsctl show
deb29006-ec8e-4688-b58f-dc4ca8c8bb34
Bridge br-hv1
fail_mode: secure
datapath_type: system
Port ls2p1
Interface ls2p1
type: internal
Port ovn-hv2-0
Interface ovn-hv2-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.25"}
Port ovn-hv3-0
Interface ovn-hv3-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.26"}
Port br-hv1
Interface br-hv1
type: internal
Port ls1p1
Interface ls1p1
type: internal
Bridge br-hv2
fail_mode: secure
datapath_type: system
Port ovn0-hv3-0
Interface ovn0-hv3-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.26"}
error: "could not add network device ovn0-hv3-0 to ofproto (File exists)"
<== both br-hv1 and br-hv2 tried to create ovn0-hv3-0 port, then it failed on one of the bridge
Port br-hv2
Interface br-hv2
type: internal
Port ls1p2
Interface ls1p2
type: internal
Port ls2p2
Interface ls2p2
type: internal
Port ovn0-hv1-0
Interface ovn0-hv1-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.15"}
ovs_version: "3.1.3"
Ihar, how could we solve this problem?
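A quick way to spot this kind of failure without reading the whole listing is to filter the `ovs-vsctl show` output for `error:` lines and report the bridge and port they belong to. The sketch below assumes the layout shown above (a `Bridge` line, then `Port` lines, then an `error:` line under the failing interface); the function name `show_errors` is made up for illustration.

```sh
#!/bin/sh
# Sketch: read `ovs-vsctl show` output on stdin and print one line per
# error as "<bridge> <port> <message>". Exits 1 if any error was found.
# show_errors is a hypothetical helper, not part of OVS or OVN.
show_errors() {
    awk '
        $1 == "Bridge" { bridge = $2 }   # remember the enclosing bridge
        $1 == "Port"   { port = $2 }     # remember the enclosing port
        $1 == "error:" {
            sub(/^[ \t]*error:[ \t]*/, "")   # keep only the message text
            print bridge, port, $0
            found = 1
        }
        END { exit found ? 1 : 0 }
    '
}

# Example: ovs-vsctl show | show_errors
```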
Hi Jianlin, would you mind sharing the contents of your Open_vSwitch table? It should have unique ovn-chassis-idx-* options set in other_config.

Here's the list of indices picked by OVN for the two controllers on the host:
```
[root@wsfd-advnetlab18 ~]# ovs-vsctl list Open_vSwitch | grep idx
other_config : {ovn-chassis-idx-hv1="", ovn-chassis-idx-hv2="0", vlan-limit="0"}
```
These indices are supposed to be used when determining the name of tunnel ports to create to connect to the other chassis (hv3). But the error (`error: "could not add network device ovn0-hv3-0 to ofproto (File exists)"`) seems to suggest that they tried to create a port with the exact same name, which failed.
I've logged in to one of the nodes kindly provided by Jianlin, and I also see that OVS complains about an existing port for the other name too:
```
Bridge br-hv1
...
Bridge br-hv1
fail_mode: secure
datapath_type: system
Port ovn-hv3-0
Interface ovn-hv3-0
type: geneve
options: {csum="true", key=flow, remote_ip="1.1.202.26"}
error: "could not add network device ovn-hv3-0 to ofproto (File exists)"
```
(note the prefix is ovn- not ovn0-)
Jianlin suggested above that `both br-hv1 and br-hv2 tried to create ovn0-hv3-0 port`, though I don't think that is necessarily what happens. It may be that the same ovn-controller tries to create the same tunnel port twice, failing the second time (in which case there's no cross-talk between ovn-controllers).
I will update the BZ as I make progress in the investigation.
A side note: the patch series that implemented the feature does not cover this particular scenario. The test case added there only validates that controllers co-located on the same host are able to talk to each other: https://github.com/ovn-org/ovn/commit/ab7b0eb8ca05af4bfd3b0b9730a124ad89d4ca42
In your scenario, you bring in another controller running on a different node, so each of the co-located controllers has to establish a tunnel to the same peer. This is something that should be covered in the upstream test suite.

The reason why this failed is explained in the `vswitchd` log file:
```
[root@wsfd-advnetlab18 ~]# zgrep WARN /var/log/openvswitch/ovs-vswitchd.log-20230802.gz
2023-08-01T13:55:37.772Z|00050|tunnel|WARN|ovn-hv3-0: attempting to add tunnel port with same config as port 'ovn0-hv3-0' (::->1.1.202.26, key=flow, legacy_l2, dp port=3)
2023-08-01T13:55:37.772Z|00051|ofproto|WARN|br-hv1: could not add port ovn-hv3-0 (File exists)
2023-08-01T13:55:37.772Z|00052|bridge|WARN|could not add network device ovn-hv3-0 to ofproto (File exists)
```
I think the reason for the failure is that both tunnel ports managed by the two co-located controller instances use the same wildcard local_ip (::) to match against tunnelled packets. To make the scenario work, OVN would have to pass, in addition to the `remote_ip` of the peer chassis, the `local_ip` option set to the value from `external_ids:ovn-encap-ip`.
That said, OVN already allows requesting enforcement of the `local_ip` setting for tunnel ports by setting the following in the `Open_vSwitch` object: `external_ids:ovn-set-local-ip=true`. To test this, I executed the following: `ovs-vsctl set open . external_ids:ovn-set-local-ip=true`, after which the error in the `ovs-vsctl show` output vanished, and I can see in the `vswitchd` log that the port is successfully created.
```
2023-08-02T14:48:20.310Z|00063|bridge|INFO|bridge br-hv1: added interface ovn-hv3-0 on port 7
```
While this works, I think there are a number of things that we may follow up on:
1. Update the OVN documentation section covering multiple ovn-controllers co-located on the same node to mention the need to set `ovn-set-local-ip=true`.
2. Add a test scenario to the upstream test suite to cover the (common) case of a co-located controller talking to another controller located on a different node.
3. Perhaps OVN could be smart enough to detect multiple co-located controllers (by inspecting the `ovn-chassis-idx-*` keys in `Open_vSwitch` records) and - once detected - enforce `local_ip` for all tunnel ports. (Perhaps this could be enforced unconditionally? What's the drawback of setting it for all tunnel ports regardless of whether multiple controllers are co-located?)
FYI I'll clone the BZ to track follow-up items later. As for this RFE itself, I think it can be verified.

Thanks Ihar for the well explained comments. After running "ovs-vsctl set open . external_ids:ovn-set-local-ip=true", it works.

(In reply to Ihar Hrachyshka from comment #27)
> The reason why this failed is explained in the `vswitchd` log file:
>
> ```
> [root@wsfd-advnetlab18 ~]# zgrep WARN
> /var/log/openvswitch/ovs-vswitchd.log-20230802.gz
> 2023-08-01T13:55:37.772Z|00050|tunnel|WARN|ovn-hv3-0: attempting to add
> tunnel port with same config as port 'ovn0-hv3-0' (::->1.1.202.26, key=flow,
> legacy_l2, dp port=3)
> 2023-08-01T13:55:37.772Z|00051|ofproto|WARN|br-hv1: could not add port
> ovn-hv3-0 (File exists)
> 2023-08-01T13:55:37.772Z|00052|bridge|WARN|could not add network device
> ovn-hv3-0 to ofproto (File exists)
> ```
>
> I think the reason for the failure is that both tunnel ports managed by two
> co-located controller instances use the same wildcard local_ip (::) to match
> against tunnelled packets. To make the scenario work, OVN would have to
> pass, in addition to `remote_ip` of the peer chassis, the `local_ip` option
> that would use the value from `external_ids:ovn-encap-ip`.
>
> That said, OVN already allows to request enforcement of `local_ip` setting
> for tunnel ports by setting the following in `Open_vSwitch` object:
> `external_ids:ovn-set-local-ip=true`. To test this, I executed the
> following: `ovs-vsctl set open . external_ids:ovn-set-local-ip=true` after
> which the error in `ovs-vsctl show` output vanished, and I can see in
> `vswitchd` log that the port is successfully created.
>
> ```
> 2023-08-02T14:48:20.310Z|00063|bridge|INFO|bridge br-hv1: added interface
> ovn-hv3-0 on port 7
> ```
>
> While this works, I think there are a number of things that we may follow up
> on:
>
> 1. Update OVN documentation section covering multiple ovn-controller
> co-located on the same node to mention the need to set
> `ovn-set-local-ip=true`.
> 2. Add a test scenario in upstream test suite to cover the (common) case of
> co-located controller talking to another controller located on a different
> node.
> 3. Perhaps OVN could be smart to detect multiple co-located controllers (by
> inspecting the `ovn-chassis-idx-*` keys in `Open_vSwitch` records) and -
> once detected - enforce `local_ip` for all tunnel ports. (Perhaps this could
> be enforced unconditionally? What's the drawback of setting it for all
> tunnel ports regardless of whether multiple controllers are co-located?)
I can't think of a clear drawback of setting ovn-set-local-ip to true. Maybe, when there are several IP addresses on the system and the route is ECMP, the source IP of the tunnel packet would otherwise be chosen through the route calculated from ECMP.
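The conflict diagnosed in comment #27 can be picked out of a vswitchd log mechanically. Below is a minimal sketch that matches the `tunnel|WARN` line format quoted above and prints which port collided with which; the function name `tunnel_conflicts` is an assumption, and the message wording may differ in other OVS versions.

```sh
#!/bin/sh
# Sketch: read ovs-vswitchd log lines on stdin and print
# "<new port> conflicts with <existing port>" for every tunnel port that
# was rejected because another port already has the same tunnel config.
# tunnel_conflicts is a hypothetical helper, not an OVS tool.
tunnel_conflicts() {
    sed -n "s/.*|tunnel|WARN|\([^:]*\): attempting to add tunnel port with same config as port '\([^']*\)'.*/\1 conflicts with \2/p"
}

# Example (path as seen in this report):
#   zcat /var/log/openvswitch/ovs-vswitchd.log-20230802.gz | tunnel_conflicts
```

If it reports conflicts between `ovn-<peer>-0` and `ovn0-<peer>-0` style names on a host with co-located controllers, the workaround confirmed above is `ovs-vsctl set open . external_ids:ovn-set-local-ip=true`.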
NAT doesn't work well after enabling this feature; I filed an issue to track it: https://issues.redhat.com/browse/FD-3083

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn22.12 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:4677

Documenting the need for the local-ip setting upstream: https://patchwork.ozlabs.org/project/ovn/patch/20230922154655.5571-1-ihrachys@redhat.com/

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days