Bug 2125612 - [Octavia] Spam of "nf_conntrack: table full, dropping packet" messages during performance tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 17.0 (Wallaby)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: Gregory Thiemonge
QA Contact: Omer Schwartz
URL:
Whiteboard:
Depends On: 2122016 2123225 2123226
Blocks:
Reported: 2022-09-09 13:01 UTC by Gregory Thiemonge
Modified: 2023-08-16 01:12 UTC

Fixed In Version: openstack-octavia-8.0.2-1.20221208181214.b0379d6.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, users might have seen the following warning message in the amphora log file of the Load-balancing service (octavia) when the load balancer handled many concurrent sessions: `nf_conntrack: table full, dropping packet`. When this error occurred, the amphora dropped Transmission Control Protocol (TCP) flows, which caused latency on user traffic. With this update, connection tracking (conntrack) is disabled for TCP flows in the Load-balancing service that uses amphora, and new TCP flows are not dropped. Conntrack is only required for User Datagram Protocol (UDP) flows.
Clone Of: 2123226
Environment:
Last Closed: 2023-08-16 01:12:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 809996 0 None MERGED Disable conntrack for TCP flows in the amphora 2023-01-26 09:00:59 UTC
OpenStack gerrit 847541 0 None MERGED Set sensible nf_conntrack_max value in amphora 2023-01-26 09:00:59 UTC
Red Hat Issue Tracker OSP-18639 0 None None None 2022-09-09 13:05:49 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:12:35 UTC

Description Gregory Thiemonge 2022-09-09 13:01:05 UTC
+++ This bug was initially created as a clone of Bug #2123226 +++

+++ This bug was initially created as a clone of Bug #2123225 +++

+++ This bug was initially created as a clone of Bug #2122016 +++

Description of problem:

One of our customers is running performance tests for a web portal built on top of a Shift on Stack environment. One problem we found, which correlates perfectly with client errors, is a flood of "nf_conntrack: table full, dropping packet" messages on the amphora's console.

The tests show that the amphora starts logging these errors when "/proc/sys/net/netfilter/nf_conntrack_count" reaches around 32000.
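
As a rough way to watch how close the table is to its limit, the sketch below reports the current entry count as a percentage of the maximum. The function name `conntrack_usage` and its overridable file arguments are illustrative assumptions, not part of Octavia. Inside an amphora it would presumably need to run in the `amphora-haproxy` network namespace, as in the verification steps later in this report.

```shell
# conntrack_usage [COUNT_FILE] [MAX_FILE]
# Print conntrack table usage as an integer percentage.
# Both arguments are hypothetical overrides; the defaults are the
# standard netfilter sysctl files.
conntrack_usage() {
    count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}

    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1

    # Avoid a division by zero if the limit is reported as 0.
    [ "$max" -gt 0 ] || return 1

    # Integer arithmetic: the result is rounded down.
    echo $(( count * 100 / max ))
}
```

Called with no arguments, `conntrack_usage` reads the live counters. A value approaching 100 suggests that new flows are about to be dropped.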

I have found two related bugs fixed in newer versions:

Bug/fix 1:
nf_conntrack: table full, dropping packet
https://bugzilla.redhat.com/show_bug.cgi?id=1869771      (fixed in RHOSP 16.2)
https://review.opendev.org/c/openstack/octavia/+/748749/ (fix)

Bug/fix 2:
https://storyboard.openstack.org/#!/story/2008979        (is not backported to RHOSP 16)
https://review.opendev.org/c/openstack/octavia/+/796608


It doesn't look like these fixes will be released for RHOSP 13, so I am wondering whether there is a supported workaround for this problem that prevents a DoS situation for the amphora.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 13.0.13 (Queens)

Comment 11 Omer Schwartz 2023-02-21 10:06:12 UTC
I ran the following verification steps on a SINGLE topology Octavia LB:

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-17.1-RHEL-9-20230131.n.2

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer create --vip-subnet external_subnet --name lb1 --wait
/usr/lib/python3.9/site-packages/osc_lib/utils/__init__.py:448: DeprecationWarning: The usage of formatter functions is now discouraged. Consider using cliff.columns.FormattableColumn instead. See reviews linked with bug 1687955 for more detail.
  warnings.warn(
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| availability_zone   | None                                 |
| created_at          | 2023-02-21T09:56:42                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | 6c234d54-008d-4966-b2c4-f1bfd8a3d605 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | ONLINE                               |
| pools               |                                      |
| project_id          | 946cd27e13f14b7395cac4de6dc82abe     |
| provider            | amphora                              |
| provisioning_status | ACTIVE                               |
| updated_at          | 2023-02-21T09:57:51                  |
| vip_address         | 10.0.0.159                           |
| vip_network_id      | c0c8a991-388f-447c-9a9a-59d3d0a9290a |
| vip_port_id         | 4279ca68-48f5-4117-bea9-3b59458576a7 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | c8e98308-413b-4a36-898d-7588327f02af |
| tags                |                                      |
+---------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer listener create --protocol HTTP --protocol-port 80 --name listener1 lb1
/usr/lib/python3.9/site-packages/osc_lib/utils/__init__.py:448: DeprecationWarning: The usage of formatter functions is now discouraged. Consider using cliff.columns.FormattableColumn instead. See reviews linked with bug 1687955 for more detail.
  warnings.warn(
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2023-02-21T09:59:17                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | eefd33d3-14a3-4477-b09b-0f15f82dc76b |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | 6c234d54-008d-4966-b2c4-f1bfd8a3d605 |
| name                        | listener1                            |
| operating_status            | OFFLINE                              |
| project_id                  | 946cd27e13f14b7395cac4de6dc82abe     |
| protocol                    | HTTP                                 |
| protocol_port               | 80                                   |
| provisioning_status         | PENDING_CREATE                       |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | None                                 |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
| tls_ciphers                 | None                                 |
| tls_versions                | None                                 |
| alpn_protocols              | None                                 |
| tags                        |                                      |
+-----------------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer pool create --protocol HTTP --listener listener1 --lb-algorithm ROUND_ROBIN --name pool1
/usr/lib/python3.9/site-packages/osc_lib/utils/__init__.py:448: DeprecationWarning: The usage of formatter functions is now discouraged. Consider using cliff.columns.FormattableColumn instead. See reviews linked with bug 1687955 for more detail.
  warnings.warn(
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| admin_state_up       | True                                 |
| created_at           | 2023-02-21T09:59:22                  |
| description          |                                      |
| healthmonitor_id     |                                      |
| id                   | 65276094-29b4-4832-b53c-307296d0f8e3 |
| lb_algorithm         | ROUND_ROBIN                          |
| listeners            | eefd33d3-14a3-4477-b09b-0f15f82dc76b |
| loadbalancers        | 6c234d54-008d-4966-b2c4-f1bfd8a3d605 |
| members              |                                      |
| name                 | pool1                                |
| operating_status     | OFFLINE                              |
| project_id           | 946cd27e13f14b7395cac4de6dc82abe     |
| protocol             | HTTP                                 |
| provisioning_status  | PENDING_CREATE                       |
| session_persistence  | None                                 |
| updated_at           | None                                 |
| tls_container_ref    | None                                 |
| ca_tls_container_ref | None                                 |
| crl_container_ref    | None                                 |
| tls_enabled          | False                                |
| tls_ciphers          | None                                 |
| tls_versions         | None                                 |
| tags                 |                                      |
| alpn_protocols       | None                                 |
+----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ for i in {1..500}; do curl 10.0.0.159; done
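
A variant of the loop above that also counts failed requests might look like the following sketch. The helper name `run_requests` is hypothetical; it runs whatever command it is given, so the curl options are chosen by the caller.

```shell
# run_requests N COMMAND [ARGS...]
# Run COMMAND N times and print how many runs exited non-zero.
# Hypothetical helper, not part of the original verification steps.
run_requests() {
    n=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (not executed here):
#   run_requests 500 curl --silent --max-time 5 http://10.0.0.159/
```

Without `--fail`, curl exits non-zero only for connection-level problems such as timeouts or refused connections, so HTTP error responses from the load balancer are not counted as failures.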

While running curl against the LB, I connected to the amphora over SSH and confirmed that the conntrack table did not contain any entries:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list --loadbalancer lb1
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+
| 03deb08b-6062-42fd-b623-43fdbfc3dd78 | 6c234d54-008d-4966-b2c4-f1bfd8a3d605 | ALLOCATED | STANDALONE | 172.24.0.56   | 10.0.0.159 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+------------+


[stack@undercloud-0 ~]$ eval $(ssh-agent)
Agent pid 898688
[stack@undercloud-0 ~]$ sudo -E ssh-add /etc/octavia/ssh/octavia_id_rsa
Identity added: /etc/octavia/ssh/octavia_id_rsa (root.local)
[stack@undercloud-0 ~]$ ssh -A -t tripleo-admin@controller-0.ctlplane ssh cloud-user@172.24.0.56
Warning: Permanently added 'controller-0.ctlplane' (ED25519) to the list of known hosts.
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack
[cloud-user@amphora-03deb08b-6062-42fd-b623-43fdbfc3dd78 ~]$ sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack


Looks good to me. I am moving the BZ status to VERIFIED.
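
The manual check above could be scripted roughly as follows. This is a sketch that assumes the usual `/proc/net/nf_conntrack` layout, in which the third whitespace-separated field is the protocol name. The function name `count_tcp_entries` is hypothetical.

```shell
# count_tcp_entries [FILE]
# Print the number of TCP entries in a conntrack table dump.
# FILE defaults to /proc/net/nf_conntrack; the argument is a
# hypothetical override that allows checking a saved copy.
count_tcp_entries() {
    file=${1:-/proc/net/nf_conntrack}
    # "n + 0" forces numeric output, so an empty table prints 0.
    awk '$3 == "tcp" { n++ } END { print n + 0 }' "$file"
}

# Example (not executed here), inside the amphora:
#   sudo ip netns exec amphora-haproxy cat /proc/net/nf_conntrack > /tmp/ct.txt
#   count_tcp_entries /tmp/ct.txt
```

With the fix in place, the count should stay at 0 while TCP traffic flows through the load balancer. That matches the empty output seen in the transcript.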

Comment 23 errata-xmlrpc 2023-08-16 01:12:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

