Bug 1961162 - ARP request flooding for 172.24.0.1 gateway
Summary: ARP request flooding for 172.24.0.1 gateway
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Gregory Thiemonge
QA Contact: Omer Schwartz
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-17 11:50 UTC by anil venkata
Modified: 2022-12-07 20:25 UTC (History)

Fixed In Version: tripleo-ansible-0.5.1-1.20220906163309.902c3c8.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, a nonexistent gateway address was configured on the load-balancing management network. This caused excessive Address Resolution Protocol (ARP) requests on the load-balancing management network.
Clone Of:
Environment:
Last Closed: 2022-12-07 20:24:45 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 824688 | 0 | None | MERGED | Disable gateway for Octavia management subnet | 2022-02-22 09:43:47 UTC
OpenStack gerrit | 830332 | 0 | None | MERGED | Disable gateway for Octavia management subnet | 2022-03-22 08:41:39 UTC
OpenStack gerrit | 834633 | 0 | None | MERGED | Disable Octavia management gateway on update | 2022-04-20 07:41:03 UTC
OpenStack gerrit | 838649 | 0 | None | MERGED | Disable Octavia management gateway on update | 2022-06-22 08:32:14 UTC
Red Hat Issue Tracker | OSP-3923 | 0 | None | None | None | 2021-12-13 13:23:27 UTC
Red Hat Product Errata | RHBA-2022:8795 | 0 | None | None | None | 2022-12-07 20:25:09 UTC

Description anil venkata 2021-05-17 11:50:16 UTC
Description of problem:
During Octavia scale testing, we observed all the VMs continuously sending ARP requests for the IP address 172.24.0.1, because the VMs are configured with this IP as their default gateway. Because all the VMs are in the same broadcast domain, every one of them also receives these ARP request packets.
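To illustrate why this scales badly, here is a rough sketch of the broadcast arithmetic. The VM count of 5,600 is the figure mentioned in comment 1 of this report; the rate of one unanswered ARP request per VM per second is purely an assumption for illustration, not a measured value.

```python
def arp_broadcast_deliveries(vm_count: int, requests_per_vm_per_sec: float) -> float:
    """Estimate ARP frames delivered per second in one broadcast domain.

    Every request is a broadcast, so each of the other (vm_count - 1)
    VMs receives a copy of it.
    """
    return vm_count * (vm_count - 1) * requests_per_vm_per_sec


if __name__ == "__main__":
    # The 1.0 request/s rate is an assumed value, not taken from the report.
    for n in (100, 1000, 5600):
        print(n, arp_broadcast_deliveries(n, 1.0))
```

The point is only that the delivered-frame count grows quadratically with the number of VMs on the network, whatever the real per-VM rate is.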

Because Octavia does not use the default gateway, the network should not be created with the default gateway option.

[cloud-user@amphora-f7749858-5649-466e-9cac-876210d61e7a ~]$ ip r
default via 172.24.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 172.24.0.2 dev eth0 proto dhcp metric 100 
172.24.0.0/16 dev eth0 proto kernel scope link src 172.24.19.165 metric 100 
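
For anyone checking many amphorae, a minimal sketch of extracting the default gateway from `ip r` output could look like the following. The function name `default_gateway` is hypothetical; the sample text is the output shown above.

```python
import ipaddress


def default_gateway(ip_route_output: str):
    """Return the default gateway from `ip route` text, or None if absent."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # A default route line looks like: "default via <gateway> dev <iface> ..."
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            return ipaddress.ip_address(fields[2])
    return None


SAMPLE = """\
default via 172.24.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 172.24.0.2 dev eth0 proto dhcp metric 100
172.24.0.0/16 dev eth0 proto kernel scope link src 172.24.19.165 metric 100
"""

if __name__ == "__main__":
    print(default_gateway(SAMPLE))
```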

(overcloud) [stack@undercloud ~]$ neutron subnet-show 764269e8-bf7f-46e0-8ce2-188b0790b6cb
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-------------------+--------------------------------------------------+
| Field             | Value                                            |
+-------------------+--------------------------------------------------+
| allocation_pools  | {"start": "172.24.0.2", "end": "172.24.255.254"} |
| cidr              | 172.24.0.0/16                                    |
| created_at        | 2021-05-05T20:08:26Z                             |
| description       |                                                  |
| dns_nameservers   |                                                  |
| enable_dhcp       | True                                             |
| gateway_ip        | 172.24.0.1                                       |
| host_routes       |                                                  |
| id                | 764269e8-bf7f-46e0-8ce2-188b0790b6cb             |
| ip_version        | 4                                                |
| ipv6_address_mode |                                                  |
| ipv6_ra_mode      |                                                  |
| name              | lb-mgmt-subnet                                   |
| network_id        | 8ce010d1-fc76-4008-a5b0-5294ce1b9415             |
| project_id        | d9dc980fa43f4c64998a7889cf458d8f                 |
| revision_number   | 0                                                |
| segment_id        |                                                  |
| service_types     |                                                  |
| subnetpool_id     |                                                  |
| tags              |                                                  |
| tenant_id         | d9dc980fa43f4c64998a7889cf458d8f                 |
| updated_at        | 2021-05-05T20:08:26Z                             |
+-------------------+--------------------------------------------------+
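
As a sketch of how this check could be automated, the following parses the two-column table printed by the CLI and reports whether a gateway is set. The helper names `parse_cli_table` and `has_gateway` are hypothetical, and the parser assumes cell values contain no `|` characters.

```python
def parse_cli_table(text: str) -> dict:
    """Parse a two-column 'Field | Value' ASCII table into a dict."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and any banner text
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Field":
            continue  # skip the header row and malformed rows
        rows[cells[0]] = cells[1]
    return rows


def has_gateway(subnet: dict) -> bool:
    """True when gateway_ip holds an address rather than an empty or None value."""
    return subnet.get("gateway_ip", "") not in ("", "None")


SAMPLE = """\
+-------------------+--------------------------------------------------+
| Field             | Value                                            |
+-------------------+--------------------------------------------------+
| cidr              | 172.24.0.0/16                                    |
| description       |                                                  |
| gateway_ip        | 172.24.0.1                                       |
| name              | lb-mgmt-subnet                                   |
+-------------------+--------------------------------------------------+
"""

if __name__ == "__main__":
    subnet = parse_cli_table(SAMPLE)
    print(subnet["name"], has_gateway(subnet))
```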

ovs-vswitchd CPU usage on the compute node dropped drastically, from 40%-90% to 5%, after we created a VM on lb-mgmt-net with the address 172.24.0.1 as a temporary workaround for this ARP issue (disabling the gateway with "neutron subnet-update" did not help either).

Comment 1 Michael Johnson 2021-05-17 16:30:34 UTC
lol, well, this is kind of funny. It's probably a bug in OVS that the CPU load goes up for handling normal ARP traffic (which is super small and easy to process). I wouldn't expect 5,600 VMs to cause that much trouble in OVS.

On the Octavia side, we don't touch or create these ARPs. They are all handled directly by the kernel and the network stack of RHEL.

It is tripleo that is creating the subnet with the default gateway set. If the OSP role being used does not require routing for the lb-mgmt-net, it should not be configuring a gateway on that subnet, especially without a router listening on it.

The amphorae automatically pick that up from the neutron subnet configuration at nova boot time.

I would agree, this is a tripleo bug for the role(s).
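
A subnet without a gateway, as suggested in this comment, can be sketched as a Neutron API request body in which `gateway_ip` is null. This is an illustration only: the helper name `mgmt_subnet_body` is hypothetical, and it is not claimed to be how tripleo-ansible implements the fix.

```python
import json


def mgmt_subnet_body(network_id: str, cidr: str, name: str = "lb-mgmt-subnet") -> dict:
    """Build a subnet-create body with the gateway explicitly disabled."""
    return {
        "subnet": {
            "network_id": network_id,
            "name": name,
            "cidr": cidr,
            "ip_version": 4,
            "enable_dhcp": True,
            # None serializes to JSON null, which means "no gateway" for the subnet.
            "gateway_ip": None,
        }
    }


if __name__ == "__main__":
    body = mgmt_subnet_body("8ce010d1-fc76-4008-a5b0-5294ce1b9415", "172.24.0.0/16")
    print(json.dumps(body, indent=2))
```

Leaving `gateway_ip` out of the body entirely is different from setting it to null: when it is omitted, Neutron typically assigns the first address of the subnet as the gateway, which matches the 172.24.0.1 seen in this report.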

Comment 6 Gregory Thiemonge 2022-02-22 09:23:30 UTC
Backport proposed to stable/train

Comment 11 Omer Schwartz 2022-11-14 12:24:29 UTC
After deploying the latest passed_phase2 compose:

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-16.1-RHEL-8-20221108.n.1


The lb-mgmt-subnet does not have a gateway_ip:
(overcloud) [stack@undercloud-0 ~]$ openstack subnet show -c gateway_ip lb-mgmt-subnet
+------------+-------+
| Field      | Value |
+------------+-------+
| gateway_ip | None  |
+------------+-------+


No ARP requests were sent to 172.24.0.1 when I ran
[tripleo-admin@controller-0 ~]$ sudo tcpdump -nn -i o-hm0 arp

while creating an LB.
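
A check like this can also be scripted against saved tcpdump text output. The sketch below counts ARP requests for a given target address; the function name `count_arp_requests` is hypothetical, and the sample capture lines are invented for illustration, not taken from this environment.

```python
import re

# Matches tcpdump's text form of an ARP request, e.g.
# "... ARP, Request who-has 172.24.0.1 tell 172.24.19.165, length 28"
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+)")


def count_arp_requests(tcpdump_output: str, target_ip: str) -> int:
    """Count ARP request lines asking for target_ip."""
    count = 0
    for line in tcpdump_output.splitlines():
        match = ARP_REQUEST.search(line)
        if match and match.group(1) == target_ip:
            count += 1
    return count


# Hypothetical capture lines, for illustration only.
SAMPLE = """\
12:00:00.000001 ARP, Request who-has 172.24.0.1 tell 172.24.19.165, length 28
12:00:01.000001 ARP, Request who-has 172.24.0.1 tell 172.24.19.166, length 28
12:00:01.500000 ARP, Request who-has 172.24.0.2 tell 172.24.19.165, length 28
12:00:01.500100 ARP, Reply 172.24.0.2 is-at fa:16:3e:00:00:01, length 28
"""

if __name__ == "__main__":
    print(count_arp_requests(SAMPLE, "172.24.0.1"))
```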


I am moving the status of this BZ to VERIFIED.

Comment 15 Michael Johnson 2022-12-05 16:54:20 UTC
Updating the doctext to more accurately reflect the issue resolved.

Comment 20 errata-xmlrpc 2022-12-07 20:24:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795

