Description of problem: A tenant can cause network issues for other tenants by filling the connection tracking table (nf_conntrack: table full, dropping packet). In our cloud, a JMeter performance test running on two instances caused network issues for other tenants. In /var/log/messages on the compute node we see the following message: "nf_conntrack: table full, dropping packet." The Gerrit change https://review.openstack.org/#/c/275769/ increases the limit to 500,000, but this is only a workaround, as a tenant can still increase usage up to the new limit. It is possible to limit bandwidth on a port ( https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/paged/networking-guide/chapter-10-configure-quality-of-service-qos ), but you cannot limit the conntrack sessions for an instance, port or tenant.
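To see how close a compute node is to the condition described above, the global counters can be read directly. The following is a minimal sketch, assuming the nf_conntrack module is loaded so that the standard /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max files exist; the function names are illustrative and not part of any OpenStack tool.

```python
from pathlib import Path

# Standard location of the netfilter sysctl files (present only when
# the nf_conntrack module is loaded).
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table in use (0.0 to 1.0)."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


def read_conntrack_usage(proc_dir: Path = PROC_DIR) -> float:
    """Read the global entry count and limit and return the usage fraction."""
    count = int((proc_dir / "nf_conntrack_count").read_text())
    maximum = int((proc_dir / "nf_conntrack_max").read_text())
    return conntrack_usage(count, maximum)
```

Calling read_conntrack_usage() on a compute node returns a value near 1.0 shortly before the "table full" message appears. Both counters are system-wide, so this shows that the table is filling up but not which tenant is responsible.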
Assaf, can we have someone from the team look into this issue? If it makes sense, it seems like a straightforward fix on the TripleO side. Thanks, Nir
I have looked into the issue. The only option, as @jlibosva said, is to have the kernel create separate hash tables (or at least separate counts [2]) per conntrack zone. Checking [1], we can see that the kernel creates a single big table for the whole system. In more recent kernels the max count is still global [3], and so is the hash table [4].

[1] https://access.redhat.com/labs/psb/versions/kernel-3.10.0-693.11.1.el7/net/netfilter/nf_conntrack_core.c#line481
[2] https://access.redhat.com/labs/psb/versions/kernel-3.10.0-693.11.1.el7/net/netfilter/nf_conntrack_core.c#line868
[3] https://elixir.free-electrons.com/linux/v4.15-rc6/source/net/netfilter/nf_conntrack_core.c#L1109
[4] https://elixir.free-electrons.com/linux/v4.15-rc6/source/net/netfilter/nf_conntrack_core.c#L74
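The difference between a single global count and per-zone counts can be illustrated with a toy model. This is only a sketch of the idea, not the kernel's implementation: the ConntrackTable class, its parameters and the numbers are hypothetical.

```python
from collections import Counter


class ConntrackTable:
    """Toy model of a conntrack table with an optional per-zone cap."""

    def __init__(self, global_max, zone_max=None):
        self.global_max = global_max  # system-wide limit (always enforced)
        self.zone_max = zone_max      # hypothetical per-zone limit, or None
        self.per_zone = Counter()     # live entries per conntrack zone

    def try_add(self, zone):
        """Try to track one new connection; return False if it is dropped."""
        if sum(self.per_zone.values()) >= self.global_max:
            return False  # corresponds to "table full, dropping packet"
        if self.zone_max is not None and self.per_zone[zone] >= self.zone_max:
            return False  # only the zone that hit its cap is affected
        self.per_zone[zone] += 1
        return True


def other_zone_survives(table, noisy_zone=1, other_zone=2, attempts=10):
    """Flood one zone, then check whether another zone can still connect."""
    for _ in range(attempts):
        table.try_add(noisy_zone)
    return table.try_add(other_zone)
```

With ConntrackTable(global_max=10), other_zone_survives() returns False because the noisy zone consumes every slot. With ConntrackTable(global_max=10, zone_max=6), it returns True because the noisy zone is stopped at its own cap and slots remain for the other zone.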
I have filed a bug against the kernel/netfilter component on RHEL 8: https://bugzilla.redhat.com/show_bug.cgi?id=1531074
Hello Miguel, Thanks for the information. From the customer's point of view, the important question is whether the request to implement this (creation of separate hash tables) is feasible, what the requirements are, and when it can be implemented. Is there any info we can pass to the customer with regard to this? I see a new BZ was created, targeted at RHEL8, so I guess it will take time. Do you think it can be backported to RHEL7 and related OSP environments? Thanks, Petr
(In reply to Petr Barta from comment #12)
> Hello Miguel,
> Thanks for the information.
>
> From the customer point of view important question is, whether the request
> to implement this (creation of separate hash tables) is feasible, what are
> the requirements, when this can be implemented.
>
> Is there any info we are able to pass to the customer with regard to this?
> I see there was new BZ created, with target to RHEL8, so I guess it will
> take time. Do you thing that it can be backported to RHEL7 and related OSP
> environments?
>
> Thanks,
> Petr

Hey Petr, we need to ask on the RHEL bug I opened against the netfilter component. I know how it could be done, but I don't have expertise in upstream kernel development and backports. Let's ask the experts in that area.
Petr, we already have a positive answer from the kernel developers. Please keep an eye on https://bugzilla.redhat.com/show_bug.cgi?id=1531074#c1 and ask them about timelines :)
Hello Miguel, ok, thanks for the info, will monitor the kernel bz and will ask there. BR, Petr
*** Bug 1558462 has been marked as a duplicate of this bug. ***
This happened again in OSP10.
Hi Assaf, Could you provide a timeline to the customer for this feature request? The kernel now provides a way to limit connections per ct zone. If there is a blueprint on this that can be shared, that would be appreciated too. Thanks and Best Regards, Mauro S. Oddi
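For reference, one way such per-zone limits are exposed is Open vSwitch's ct-set-limits datapath command. The sketch below only builds the argument list and does not run anything. It assumes an Open vSwitch release that provides the command and a datapath that supports conntrack zone limits, and it assumes this is the kernel feature the comment refers to. The helper name, zone IDs and limits are made up for illustration.

```python
def ct_set_limits_args(zone_limits, default=None):
    """Build an argv list for `ovs-appctl dpctl/ct-set-limits`.

    zone_limits maps a conntrack zone ID to its connection limit;
    default, if given, applies to zones without an explicit limit.
    """
    args = ["ovs-appctl", "dpctl/ct-set-limits"]
    if default is not None:
        args.append(f"default={default}")
    for zone, limit in sorted(zone_limits.items()):
        args.append(f"zone={zone},limit={limit}")
    return args
```

For example, ct_set_limits_args({5: 10000}, default=50000) yields the arguments for "ovs-appctl dpctl/ct-set-limits default=50000 zone=5,limit=10000", which could then be passed to subprocess.run on a host where the command is available. Mapping a Neutron port or tenant to a zone is outside the scope of this sketch.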
This Release is retired. If this bug is still relevant, please reopen and retarget to an open release.
It's an RFE.
I believe OVN doesn't yet support zone limits set per port, so additional work is needed in OVN before we can implement this in neutron. A counterpart bug should be created against the ovn component, in addition to this one, to track the RFE there.
Thanks for your help and advice. I have reported bug #2189924 for OVN and would appreciate a second look from anyone involved. I think that this bug's metadata should be updated: there is a different blocker now and focus here should be switched to OVN.