Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2123225

Summary: [Octavia] Spam of "nf_conntrack: table full, dropping packet" messages during performance tests
Product: Red Hat OpenStack
Reporter: Gregory Thiemonge <gthiemon>
Component: openstack-octavia
Assignee: Gregory Thiemonge <gthiemon>
Status: CLOSED ERRATA
QA Contact: Omer Schwartz <oschwart>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: astupnik, bbonguar, cmuresan, gthiemon, jelynch, jvisser, lpeer, majopela, njohnston, oschwart, rcernin, scohen
Target Milestone: z9
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: All
OS: All
Fixed In Version: openstack-octavia-5.0.3-1.20220906163309.8c32d2e.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, Conntrack was enabled in the Amphora VM for any type of packet, but it is only required for the User Datagram Protocol (UDP) and Stream Control Transmission Protocol (SCTP). With this update, Conntrack is now disabled for Transmission Control Protocol (TCP) flows, preventing some performance issues when a user generates a lot of connections that fill the Conntrack table.
Clone Of: 2122016
Clones: 2123226 (view as bug list)
Last Closed: 2022-12-07 20:27:09 UTC
Bug Depends On: 2122016    
Bug Blocks: 2123226, 2125612    

Description Gregory Thiemonge 2022-09-01 07:40:53 UTC
+++ This bug was initially created as a clone of Bug #2122016 +++

Description of problem:

One of our customers is running performance tests for their web portal, which is built on top of a Shift on Stack environment. One of the problems we have found, which correlates perfectly with client errors, is a flood of "nf_conntrack: table full, dropping packet" messages on the amphora's console.

From the tests we can see that Octavia starts spamming these errors once "/proc/sys/net/netfilter/nf_conntrack_count" reaches around 32000.
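For reference, the conntrack pressure can be watched from inside the amphora's network namespace (amphora-haproxy, the namespace also used later in this BZ); a minimal check looks like:

sudo ip netns exec amphora-haproxy cat /proc/sys/net/netfilter/nf_conntrack_count
sudo ip netns exec amphora-haproxy cat /proc/sys/net/netfilter/nf_conntrack_max

Once the count approaches the max, the kernel starts dropping new connections with the "table full" message.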

I have found two related bugs fixed in newer versions:

Bug/fix 1:
nf_conntrack: table full, dropping packet
https://bugzilla.redhat.com/show_bug.cgi?id=1869771      (fixed in RHOSP 16.2)
https://review.opendev.org/c/openstack/octavia/+/748749/ (fix)

Bug/fix 2:
https://storyboard.openstack.org/#!/story/2008979        (is not backported to RHOSP 16)
https://review.opendev.org/c/openstack/octavia/+/796608


It doesn't look like these fixes will be released for RHOSP 13, so I am wondering if there is a supported way to apply a workaround for this problem and prevent a DoS situation for the amphora?
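The obvious stopgap would be to enlarge the conntrack table inside the amphora, e.g.:

sudo sysctl -w net.netfilter.nf_conntrack_max=262144

but that only delays exhaustion, costs memory, and would be lost whenever the amphora is failed over or rebuilt, which is why I am asking about a supported approach.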

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 13.0.13 (Queens)

Comment 10 Omer Schwartz 2022-11-22 10:11:41 UTC
I ran the following verification steps on a SINGLE topology Octavia LB:

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version 
RHOS-16.1-RHEL-8-20221116.n.1

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer create --vip-subnet external_subnet --name lb1
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2022-11-22T10:03:51                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | 3318fe17-d9c7-4e82-9104-48a5279352b2 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | 5fc6b2b45c3a4dd6858ccda5db8ecc2a     |
| provider            | amphora                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 10.0.0.205                           |
| vip_network_id      | 73f82788-e281-409f-85cc-15c309a7da02 |
| vip_port_id         | 5b9a2346-c375-4228-86c2-7b78776a3079 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | c2256ac8-48e5-4788-adc6-57e5c7935691 |
+---------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer listener create --protocol HTTP --protocol-port 80 --name listener1 lb1
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2022-11-22T10:06:22                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | e4849055-6963-4649-ac3b-1509575afc72 |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | 3318fe17-d9c7-4e82-9104-48a5279352b2 |
| name                        | listener1                            |
| operating_status            | OFFLINE                              |
| project_id                  | 5fc6b2b45c3a4dd6858ccda5db8ecc2a     |
| protocol                    | HTTP                                 |
| protocol_port               | 80                                   |
| provisioning_status         | PENDING_CREATE                       |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | None                                 |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer pool create --protocol HTTP --listener listener1 --lb-algorithm ROUND_ROBIN --name pool1
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| admin_state_up       | True                                 |
| created_at           | 2022-11-22T10:06:26                  |
| description          |                                      |
| healthmonitor_id     |                                      |
| id                   | 84ac5125-2148-49ca-af34-efc67e37a86e |
| lb_algorithm         | ROUND_ROBIN                          |
| listeners            | e4849055-6963-4649-ac3b-1509575afc72 |
| loadbalancers        | 3318fe17-d9c7-4e82-9104-48a5279352b2 |
| members              |                                      |
| name                 | pool1                                |
| operating_status     | OFFLINE                              |
| project_id           | 5fc6b2b45c3a4dd6858ccda5db8ecc2a     |
| protocol             | HTTP                                 |
| provisioning_status  | PENDING_CREATE                       |
| session_persistence  | None                                 |
| updated_at           | None                                 |
| tls_container_ref    | None                                 |
| ca_tls_container_ref | None                                 |
| crl_container_ref    | None                                 |
| tls_enabled          | False                                |
+----------------------+--------------------------------------+
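Note: the member-create step is elided from this transcript; a typical command (addresses are placeholders, not taken from this environment) looks like:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member create --subnet-id <member-subnet> --address <member-ip> --protocol-port 80 pool1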
(overcloud) [stack@undercloud-0 ~]$ curl 10.0.0.217
At the same time as cURLing the LB, I SSHed into the amphora and made sure the conntrack table did not contain any entries:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list --loadbalancer lb1
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
| 41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a | 3318fe17-d9c7-4e82-9104-48a5279352b2 | ALLOCATED | STANDALONE | 172.24.2.137  | 10.0.0.205 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
(overcloud) [stack@undercloud-0 ~]$ eval $(ssh-agent)
Agent pid 636273
(overcloud) [stack@undercloud-0 ~]$ ssh-add
Identity added: /home/stack/.ssh/id_rsa (/home/stack/.ssh/id_rsa)
Identity added: /home/stack/.ssh/id_ecdsa (stack.local)
(overcloud) [stack@undercloud-0 ~]$ ssh -A controller-0.ctlplane
Warning: Permanently added 'controller-0.ctlplane,192.168.24.47' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Tue Nov 22 09:59:00 2022 from 192.168.24.1
[heat-admin@controller-0 ~]$ ssh cloud-user@172.24.2.137
The authenticity of host '172.24.2.137 (172.24.2.137)' can't be established.
ECDSA key fingerprint is SHA256:UQJnBheqp4I9MLdaA+r0B6DLRwSpDSODjXxZfw6onO4.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
[cloud-user@amphora-41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-41c4119f-4ffc-43b1-9e0d-7a7cb0ddcd0a ~]$
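For context on why the table stays empty for an HTTP (TCP) listener: per the Doc Text above, the fix bypasses connection tracking for TCP flows, so only UDP and SCTP traffic creates conntrack entries. A sketch of the kind of raw-table rule that produces this behavior (the exact rules shipped in the amphora image may differ) is:

sudo ip netns exec amphora-haproxy iptables -t raw -A PREROUTING -p tcp -j NOTRACK
sudo ip netns exec amphora-haproxy iptables -t raw -A OUTPUT -p tcp -j NOTRACK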


Looks good to me. I am moving the BZ status to VERIFIED.

Comment 16 errata-xmlrpc 2022-12-07 20:27:09 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795