Bug 2123226 - [Octavia] Spam of "nf_conntrack: table full, dropping packet" messages during performance tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 16.2 (Train)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: z4
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Gregory Thiemonge
QA Contact: Omer Schwartz
URL:
Whiteboard:
Depends On: 2122016 2123225
Blocks: 2125612
Reported: 2022-09-01 07:42 UTC by Gregory Thiemonge
Modified: 2023-09-19 04:25 UTC (History)

Fixed In Version: openstack-octavia-5.1.3-2.20220906154809.58e2e13.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, VM instances (amphorae) for the Red Hat OpenStack Platform (RHOSP) Load-balancing service (octavia) could experience performance issues when a lot of connections filled the network connection tracking (conntrack) table. The cause for this was that conntrack was enabled for all packet types, including TCP, which does not require conntrack. In RHOSP 16.2.4, amphora performance has improved, because conntrack is disabled for TCP packets and is only enabled for UDP and SCTP packets.
Clone Of: 2123225
Clones: 2125612
Environment:
Last Closed: 2022-12-07 19:24:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 854928 0 None MERGED Disable conntrack for TCP flows in the amphora 2022-09-01 07:51:20 UTC
Red Hat Issue Tracker OSP-18497 0 None None None 2022-09-01 08:22:04 UTC
Red Hat Product Errata RHBA-2022:8794 0 None None None 2022-12-07 19:24:50 UTC

Description Gregory Thiemonge 2022-09-01 07:42:29 UTC
+++ This bug was initially created as a clone of Bug #2123225 +++

+++ This bug was initially created as a clone of Bug #2122016 +++

Description of problem:

One of our customers is running performance tests for their web portal, which is built on top of a Shift on Stack environment. One of the problems we found, which correlates perfectly with client errors, is a flood of "nf_conntrack: table full, dropping packet" messages in the amphora's console.

The tests show that Octavia starts emitting these errors when "/proc/sys/net/netfilter/nf_conntrack_count" reaches around 32000.
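
The observation above can be turned into a quick check. The following is a minimal sketch, not part of the original report: it assumes the standard netfilter procfs files nf_conntrack_count and nf_conntrack_max are readable, and the function name conntrack_usage and its directory argument are hypothetical, introduced only for illustration. It would need to run in whichever network namespace carries the load-balancer traffic.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): print conntrack table usage.
# $1 is the directory holding nf_conntrack_count and nf_conntrack_max;
# it defaults to the standard procfs location.
conntrack_usage() {
    dir="${1:-/proc/sys/net/netfilter}"
    count=$(cat "$dir/nf_conntrack_count") || return 1
    max=$(cat "$dir/nf_conntrack_max") || return 1
    # Refuse to divide by a zero or missing limit.
    [ "$max" -gt 0 ] || return 1
    # Integer percentage of the table that is currently in use.
    pct=$(( count * 100 / max ))
    echo "conntrack: $count/$max entries (${pct}%)"
}

# Example call (reads the live values of the current namespace):
#   conntrack_usage
```

A percentage close to 100 would be consistent with the "table full, dropping packet" messages described here.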

I have found two related bugs fixed in newer versions:

Bug/fix 1:
nf_conntrack: table full, dropping packet
https://bugzilla.redhat.com/show_bug.cgi?id=1869771      (fixed in RHOSP 16.2)
https://review.opendev.org/c/openstack/octavia/+/748749/ (fix)

Bug/fix 2:
https://storyboard.openstack.org/#!/story/2008979        (is not backported to RHOSP 16)
https://review.opendev.org/c/openstack/octavia/+/796608


It doesn't look like these fixes will be released for RHOSP 13, so I am wondering whether there is a supported workaround for this problem that would prevent a DoS situation for the amphora.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 13.0.13 (Queens)

Comment 13 Omer Schwartz 2022-11-22 10:32:09 UTC
I ran the following verification steps on a SINGLE topology Octavia LB:

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-16.2-RHEL-8-20221104.n.0%


(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer create --vip-subnet external_subnet --name lb1
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2022-11-22T10:23:43                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | c19b2b7f-8479-4111-ad17-8d7a967f96ec |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | 55791abb3f5a43a2ad29f7ea68eca414     |
| provider            | amphora                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 10.0.0.202                           |
| vip_network_id      | 74a35f12-fd6d-4daf-9582-9afb72ff3618 |
| vip_port_id         | 9c237af6-c825-44fb-ac1f-a17ca2d213ae |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 09e042c3-9c4b-492e-a47b-34ac6dd96a82 |
+---------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer listener create --protocol HTTP --protocol-port 80 --name listener1 lb1
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2022-11-22T10:26:43                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 70833ca7-91c7-42b7-a4d4-2a4a4643c94d |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | c19b2b7f-8479-4111-ad17-8d7a967f96ec |
| name                        | listener1                            |
| operating_status            | OFFLINE                              |
| project_id                  | 55791abb3f5a43a2ad29f7ea68eca414     |
| protocol                    | HTTP                                 |
| protocol_port               | 80                                   |
| provisioning_status         | PENDING_CREATE                       |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | None                                 |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer pool create --protocol HTTP --listener listener1 --lb-algorithm ROUND_ROBIN --name pool1
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| admin_state_up       | True                                 |
| created_at           | 2022-11-22T10:26:48                  |
| description          |                                      |
| healthmonitor_id     |                                      |
| id                   | 46e7db8f-1733-4035-abe9-a95d83edda25 |
| lb_algorithm         | ROUND_ROBIN                          |
| listeners            | 70833ca7-91c7-42b7-a4d4-2a4a4643c94d |
| loadbalancers        | c19b2b7f-8479-4111-ad17-8d7a967f96ec |
| members              |                                      |
| name                 | pool1                                |
| operating_status     | OFFLINE                              |
| project_id           | 55791abb3f5a43a2ad29f7ea68eca414     |
| protocol             | HTTP                                 |
| provisioning_status  | PENDING_CREATE                       |
| session_persistence  | None                                 |
| updated_at           | None                                 |
| tls_container_ref    | None                                 |
| ca_tls_container_ref | None                                 |
| crl_container_ref    | None                                 |
| tls_enabled          | False                                |
+----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ curl 10.0.0.202




While curling the LB, I SSHed into the amphora and confirmed that the conntrack table did not contain any entries:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list --loadbalancer lb1
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
| 40b2a023-67d5-4a50-be46-46775e641bd8 | c19b2b7f-8479-4111-ad17-8d7a967f96ec | ALLOCATED | STANDALONE | 172.24.0.173  | 10.0.0.202 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
(overcloud) [stack@undercloud-0 ~]$ eval $(ssh-agent)
Agent pid 264617
(overcloud) [stack@undercloud-0 ~]$ ssh-add
Identity added: /home/stack/.ssh/id_rsa (/home/stack/.ssh/id_rsa)
Identity added: /home/stack/.ssh/id_ecdsa (stack.local)
(overcloud) [stack@undercloud-0 ~]$ ssh -A controller-0.ctlplane
Warning: Permanently added 'controller-0.ctlplane,192.168.24.29' (ECDSA) to the list of known hosts.
Last login: Tue Nov 22 10:20:03 2022 from 192.168.24.1
[heat-admin@controller-0 ~]$ ssh cloud-user@172.24.0.173
The authenticity of host '172.24.0.173 (172.24.0.173)' can't be established.
ECDSA key fingerprint is SHA256:wn+ML5k2TQCCVUzfg2M6AQGzc2jqDDi+wh0nu9D90Ho.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.24.0.173' (ECDSA) to the list of known hosts.
[cloud-user@amphora-40b2a023-67d5-4a50-be46-46775e641bd8 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-40b2a023-67d5-4a50-be46-46775e641bd8 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-40b2a023-67d5-4a50-be46-46775e641bd8 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-40b2a023-67d5-4a50-be46-46775e641bd8 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-40b2a023-67d5-4a50-be46-46775e641bd8 ~]$



Looks good to me. I am moving the BZ status to VERIFIED.
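
The manual check in the verification above could also be scripted. This is a minimal sketch under stated assumptions, not the procedure QA actually used: the function name conntrack_by_proto is hypothetical, and it relies on the third whitespace-separated field of each /proc/net/nf_conntrack line being the layer 4 protocol name (tcp, udp, sctp). It prints one line per protocol with its entry count, and prints nothing when the table is empty.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): count conntrack entries
# per layer 4 protocol.
# $1 is a file in /proc/net/nf_conntrack format; it defaults to the live table.
conntrack_by_proto() {
    # Field 3 of each entry is the protocol name; tally it and print the totals.
    awk '{ n[$3]++ } END { for (p in n) print p, n[p] }' \
        "${1:-/proc/net/nf_conntrack}" | sort
}

# Example on an amphora, using the namespace from the transcript above:
#   sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack > /tmp/ct.txt
#   conntrack_by_proto /tmp/ct.txt
```

With the fix in place and only an HTTP listener configured, empty output would match the empty table shown in the transcript.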

Comment 23 errata-xmlrpc 2022-12-07 19:24:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794

Comment 27 Red Hat Bugzilla 2023-09-19 04:25:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

