Description of problem:

During Octavia scale testing, we observed all the VMs continuously sending ARP requests for the 172.24.0.1 IP address, because the VMs are configured with this IP as their default gateway. As all the VMs are in the same broadcast domain, every one of them also receives these ARP request packets. Since Octavia does not use the default gateway, the network should not be created with the default gateway option.

[cloud-user@amphora-f7749858-5649-466e-9cac-876210d61e7a ~]$ ip r
default via 172.24.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 172.24.0.2 dev eth0 proto dhcp metric 100
172.24.0.0/16 dev eth0 proto kernel scope link src 172.24.19.165 metric 100

(overcloud) [stack@undercloud ~]$ neutron subnet-show 764269e8-bf7f-46e0-8ce2-188b0790b6cb
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-------------------+--------------------------------------------------+
| Field             | Value                                            |
+-------------------+--------------------------------------------------+
| allocation_pools  | {"start": "172.24.0.2", "end": "172.24.255.254"} |
| cidr              | 172.24.0.0/16                                    |
| created_at        | 2021-05-05T20:08:26Z                             |
| description       |                                                  |
| dns_nameservers   |                                                  |
| enable_dhcp       | True                                             |
| gateway_ip        | 172.24.0.1                                       |
| host_routes       |                                                  |
| id                | 764269e8-bf7f-46e0-8ce2-188b0790b6cb             |
| ip_version        | 4                                                |
| ipv6_address_mode |                                                  |
| ipv6_ra_mode      |                                                  |
| name              | lb-mgmt-subnet                                   |
| network_id        | 8ce010d1-fc76-4008-a5b0-5294ce1b9415             |
| project_id        | d9dc980fa43f4c64998a7889cf458d8f                 |
| revision_number   | 0                                                |
| segment_id        |                                                  |
| service_types     |                                                  |
| subnetpool_id     |                                                  |
| tags              |                                                  |
| tenant_id         | d9dc980fa43f4c64998a7889cf458d8f                 |
| updated_at        | 2021-05-05T20:08:26Z                             |
+-------------------+--------------------------------------------------+

ovs-vswitchd CPU usage on the compute node dropped drastically, from 40%-90% to 5%, after we created a VM on lb-mgmt-net with the 172.24.0.1 address as a temporary fix for this ARP issue (disabling the gateway with "neutron subnet-update" did not help either).
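The routing-table check above could be automated along these lines. This is only a minimal sketch: the function name `default_gateways` is made up for illustration, and it assumes the plain-text `ip r` format shown in this report rather than any structured output.

```python
from typing import List


def default_gateways(ip_route_output: str) -> List[str]:
    """Return the next-hop address of every 'default via ...' route."""
    gateways = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        # A default route has the shape: default via <gateway> dev <iface> ...
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            gateways.append(fields[2])
    return gateways


# Route table copied from the amphora in this report.
AMPHORA_ROUTES = """\
default via 172.24.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 172.24.0.2 dev eth0 proto dhcp metric 100
172.24.0.0/16 dev eth0 proto kernel scope link src 172.24.19.165 metric 100
"""

print(default_gateways(AMPHORA_ROUTES))  # ['172.24.0.1']
```

An empty list would mean the amphora has no default route and therefore no reason to keep resolving a gateway address.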
lol, well, this is kind of funny. It's probably a bug in OVS that the CPU load goes up for handling normal ARP traffic (which is super small and easy to process). I wouldn't expect 5,600 VMs to cause that much trouble in OVS.

On the Octavia side, we don't touch or create these ARPs. They are all handled directly by the kernel and the network stack of RHEL.

It is tripleo that is creating the subnet with the default gateway set. If the OSP role being used does not require routing for the lb-mgmt-net, it should not be configuring a gateway on that subnet, especially without a router listening on it. The amphorae automatically pick that up from the neutron subnet configuration at nova boot time.

I would agree, this is a tripleo bug for the role(s).
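For a rough sense of scale, the broadcast fan-out discussed above can be estimated with a short calculation. This is a back-of-the-envelope sketch only: it assumes plain L2 flooding with no ARP suppression, that every VM sends the same number of requests, and that each broadcast reaches every other VM. The real request rate depends on the kernel's neighbor settings and is not stated in this report.

```python
def broadcast_deliveries(vm_count: int, requests_per_vm: int = 1) -> int:
    """Frames delivered to VM ports when each VM broadcasts
    `requests_per_vm` ARP requests to all other VMs on the segment."""
    return vm_count * (vm_count - 1) * requests_per_vm


# 5,600 is the VM count mentioned in this thread.
print(broadcast_deliveries(5600))  # 31354400
```

The quadratic growth is the point here: each individual ARP request is tiny, but the number of deliveries per round grows with the square of the VM count.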
Pointers:
- https://github.com/openstack/tripleo-heat-templates/blob/fe2373225f039d795970b70fe9b2f28e0e7cd6a4/deployment/octavia/octavia-deployment-config.j2.yaml#L115-L118
- https://github.com/openstack/tripleo-ansible/blob/8ef33773a2d3eaca062bb4629bf2077b6eb1349b/tripleo_ansible/roles/octavia_overcloud_config/tasks/network.yml#L26
Backport proposed to stable/train
After deploying the latest passed_phase2 compose:

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-16.1-RHEL-8-20221108.n.1

The lb-mgmt-subnet does not have a gateway_ip:

(overcloud) [stack@undercloud-0 ~]$ openstack subnet show -c gateway_ip lb-mgmt-subnet
+------------+-------+
| Field      | Value |
+------------+-------+
| gateway_ip | None  |
+------------+-------+

No ARP requests were sent to 172.24.0.1 when I ran

[tripleo-admin@controller-0 ~]$ sudo tcpdump -nn -i o-hm0 arp

while creating an LB.

I am moving the status of this BZ to VERIFIED.
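The tcpdump check above was done by eye; a scripted version might look like the sketch below. The function name `arp_requests_for` and the sample lines are made up for illustration and are not captured output from this environment. The regular expression assumes the usual text form that tcpdump prints for ARP requests ("ARP, Request who-has X tell Y").

```python
import re
from collections import Counter

# Matches ARP request lines as printed by `tcpdump -nn ... arp`, e.g.
#   ... ARP, Request who-has 172.24.0.1 tell 172.24.19.165, length 28
# The optional group skips a "(MAC)" that tcpdump may print after the target.
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+)(?: \(\S+\))? tell ([^\s,]+)")


def arp_requests_for(target_ip, tcpdump_lines):
    """Count ARP requests for target_ip, keyed by the requesting address."""
    senders = Counter()
    for line in tcpdump_lines:
        match = ARP_REQUEST.search(line)
        if match and match.group(1) == target_ip:
            senders[match.group(2)] += 1
    return senders


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "20:08:30.000001 ARP, Request who-has 172.24.0.1 tell 172.24.19.165, length 28",
    "20:08:31.000001 ARP, Request who-has 172.24.0.1 tell 172.24.19.165, length 28",
    "20:08:31.000002 ARP, Reply 172.24.0.2 is-at fa:16:3e:00:00:01, length 28",
]

counts = arp_requests_for("172.24.0.1", SAMPLE)
print(sum(counts.values()))  # 2
```

In a real check, the lines would come from the tcpdump process instead of `SAMPLE`, and a total of zero for 172.24.0.1 would match the result reported in this comment.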
Updating the doctext to more accurately reflect the issue resolved.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795