Bug 2294876
Summary: | "over 4096 resubmit actions" error occurs when there are 250 neutron routers on a provider network | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | yatanaka |
Component: | openstack-neutron | Assignee: | Lucas Alvares Gomes <lmartins> |
Status: | CLOSED ERRATA | QA Contact: | Bharath M V <bmv> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 17.1 (Wallaby) | CC: | apverma, bcafarel, bmv, chrisw, gkadam, ihrachys, i.maximets, jamsmith, lmartins, mariel, mflusche, scohen |
Target Milestone: | z4 | Keywords: | Triaged |
Target Release: | 17.1 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | openstack-neutron-18.6.1-17.1.20240918120815.85ff760.el9ost | Doc Type: | Release Note |
Doc Text: |
This update adds a configuration option called `broadcast_arps_to_all_routers` to the "[ovn]" config section.
+
This option configures the external networks with the `broadcast-arps-to-all-routers` config option that became available in OVN 23.06. This option is enabled by default. It causes OVN to flood ARP requests to all attached ports on a network.
+
----
[ovn]
broadcast_arps_to_all_routers=true
----
+
If you disable `broadcast_arps_to_all_routers`, ARP requests are only sent to routers on a network if the target MAC address matches. ARP requests that do not match a router are only forwarded to non-router ports.
|
Last Closed: | 2024-11-21 09:41:35 UTC | Type: | Bug |
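The Doc Text above says that the `[ovn]` option `broadcast_arps_to_all_routers` drives the OVN `broadcast-arps-to-all-routers` setting on external networks and is enabled by default. A minimal sketch of that mapping follows. The helper name `ls_other_config` is hypothetical and this is not Neutron's actual implementation; it only illustrates how the config value could translate into the logical switch `other_config` entry.

```python
import configparser

# OVN Logical_Switch other_config key named in this bug report.
OVN_KEY = "broadcast-arps-to-all-routers"


def ls_other_config(conf_text: str) -> dict:
    """Hypothetical helper: derive the other_config entry from [ovn] settings."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # The Doc Text states the option is enabled by default, so fall back
    # to True when the section or the option is missing.
    enabled = parser.getboolean(
        "ovn", "broadcast_arps_to_all_routers", fallback=True
    )
    # OVSDB stores other_config values as strings.
    return {OVN_KEY: "true" if enabled else "false"}


if __name__ == "__main__":
    print(ls_other_config("[ovn]\nbroadcast_arps_to_all_routers=false\n"))
```

Keeping the default at true matches the released behavior described in the Doc Text, so only deployments that explicitly disable the option would change how ARP requests reach routers.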
Description
yatanaka
2024-07-01 08:12:15 UTC
I wonder if OSP can set LS.other_config:broadcast-arps-to-all-routers to 'false' on the logical switches attached to these routers. The feature is available since OVN v23.06, so it should be available in latest RHOSP 17.1.

Commit that adds support to OVN:
https://github.com/ovn-org/ovn/commit/37d308a2074515834692d442475a8e05310a152d

(In reply to Ilya Maximets from comment #2)
> I wonder if OSP can set LS.other_config:broadcast-arps-to-all-routers to
> 'false' on the logical switches attached to these routers. The feature is
> available since OVN v23.06, so it should be available in latest RHOSP 17.1.
>
> Commit that adds support to OVN:
> https://github.com/ovn-org/ovn/commit/37d308a2074515834692d442475a8e05310a152d
Thanks Ilya, I just checked the Neutron code and we do not set that option yet, but this could be easily added.

Do you think this would solve this limitation with the resubmits?

Also, should this option be set to 'false' by default, or are there cases where we want it to still be 'true', so that we would need to make it configurable in OSP?

(In reply to Lucas Alvares Gomes from comment #3)
> Do you think this would solve this limitation with the resubmits?
It should, because we'll no longer resubmit ARP requests to all the routers.
> Also, should this option be set to 'false' by default, or are there cases
> where we want it to still be 'true', so that we would need to make it
> configurable in OSP?
We're discussing this within OVN team. The side effect will be that routers will stop learning from GARPs. So, I'm not sure if you can turn this flag on all the routers unconditionally, if you have a use case for learning.

(In reply to Ilya Maximets from comment #4)
> We're discussing this within OVN team. The side effect will be that routers
> will stop learning from GARPs. So, I'm not sure if you can turn this flag
> on all the routers unconditionally, if you have a use case for learning.
I see, yeah, definitely that would require more discussion. Perhaps making it configurable in OSP (keeping true as default) would be a way forward for OSP.

In the meantime, as a workaround for the issue and also to test whether this option works as intended: @Reporter, could you please set it to 'false' in the OVSDB and let us know if it works?
I believe the command would be:

$ ovn-nbctl set Logical_Switch neutron-<Neutron Network UUID> other_config:broadcast-arps-to-all-routers=false

Cheers,
Lucas

Forwarding the question asked by @Ihar on slack here:

Are the routers attached to networks with lots of ports with disabled port security? If so, this looks a lot like what has been discussed at this OVN ML thread [0], where, when we have many ports with the "unknown" address (port security off), ARPs will be broadcast up to the point where it will hit this limitation in OVN.

So, it would be nice if we had a better understanding of what the topology looks like and how these ports are created in the customer environment.

[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-September/050716.html

@Matt, thank you for reproducing and checking the workaround!!! Great work.

Next steps for this BZ would be:

1. write a KCS for the workaround;
2. document the (approximate?) limit for the number of routers attached to an external network, similar to what we do for port-security=off ports here: https://docs.redhat.com/en/documentation/red_hat_openstack_platform/16.1/html-single/networking_guide/index#con_limit-nonsecure-port-ovn_work-ovn
3. Expose the OVN network level configuration for broadcast-arps-to-all-routers="false" in neutron (probably as a config option for ml2/ovn)
4. (Not sure if needed) Tripleo config option to set the option value (maybe this can be done with a config snippet? whatever it is, update the KCS from 1. accordingly once implemented) (In RHOSO 18, we can already provide a custom snippet to NeutronApi CR.)

There's probably some BZ cloning to do here since we'll need to patch different components (docs, neutron, maybe tripleo).

Forgot to mention in the last comment: there's also a path to improve this and get rid of the router limit by changing the way learning is done on OVN side (by learning once on switch side instead of in each router pipeline.)
This was discussed before in upstream, e.g. here: https://mail.openvswitch.org/pipermail/ovs-dev/2023-March/402539.html

I think it's worth requesting this improvement from FDP team. (This will go into Jira, since they use it to track their work.) This would of course require some more time to get implemented, but AFAIU the team is open to consider this change, even if it's invasive and probably won't be backported to older OVN releases.

Thanks @Matt for testing the workaround and confirming that it does mitigate the issue.

Hi Ihar Hrachyshka, thank you for your help.

> 1. write a KCS for the workaround;
I've just written a KCS for this issue:
https://access.redhat.com/solutions/7077367

> 2. document the (approximate?) limit for the number of routers attached to an external network
In my RHOSP 17.1.2 lab, the limit was 237. If the number of routers was greater than 237, the issue occurred. However, the limit may vary depending on the environment. I'd prefer to state an approximate limit, like 230 or 200.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9974
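The workaround command quoted earlier in this thread targets one logical switch, named `neutron-<Neutron Network UUID>`. As a rough sketch, the snippet below builds that command line for a given network UUID so it can be reviewed before anyone runs it. The function name `workaround_command` and the placeholder UUID are assumptions made for illustration; the snippet only prints the command and does not call `ovn-nbctl`.

```python
import shlex
import uuid


def workaround_command(network_uuid: str, enabled: bool = False) -> str:
    """Hypothetical helper: build the ovn-nbctl command from this thread."""
    # Reject anything that is not a well-formed UUID (raises ValueError).
    normalized = str(uuid.UUID(network_uuid))
    # The thread names the logical switch "neutron-<Neutron Network UUID>".
    switch = f"neutron-{normalized}"
    value = "true" if enabled else "false"
    argv = [
        "ovn-nbctl",
        "set",
        "Logical_Switch",
        switch,
        f"other_config:broadcast-arps-to-all-routers={value}",
    ]
    # shlex.join quotes arguments safely for pasting into a shell.
    return shlex.join(argv)


if __name__ == "__main__":
    # Placeholder UUID, not taken from the bug report.
    print(workaround_command("00000000-0000-0000-0000-000000000000"))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; whether to apply the setting, and on which networks, depends on the GARP learning trade-off discussed in the comments.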