Description of problem:

I deployed the OVN hardware-offload templates in 17.1 and spawned several VMs.

(overcloud) [stack@undercloud-0 ~]$ openstack server list --all-projects
+--------------------------------------+------------------------------------------+--------+----------------------------------------------------+---------------------------------------+--------------------+
| ID                                   | Name                                     | Status | Networks                                            | Image                                 | Flavor             |
+--------------------------------------+------------------------------------------+--------+----------------------------------------------------+---------------------------------------+--------------------+
| e8a92dc1-a911-409f-b036-f4ff8d922a54 | tempest-TestNfvOffload-server-1950151452 | ACTIVE | mellanox-vlan-provider=10.46.228.38, 30.30.220.170  | rhel-guest-image-7-6-210-x86-64-qcow2 | nfv_qe_base_flavor |
| 7b8f9415-9d8f-4a37-8aa7-35badf4c3aa3 | tempest-TestNfvOffload-server-829199820  | ACTIVE | mellanox-vlan-provider=10.46.228.36, 30.30.220.171  | rhel-guest-image-7-6-210-x86-64-qcow2 | nfv_qe_base_flavor |
| a9bda131-616c-47b2-8a1f-ea513cdef1bd | tempest-TestNfvOffload-server-880070818  | ACTIVE | mellanox-vlan-provider=10.46.228.41, 30.30.220.151  | rhel-guest-image-7-6-210-x86-64-qcow2 | nfv_qe_base_flavor |
| fc1528f2-fdd8-48a2-834f-6d44b7434845 | tempest-TestNfvOffload-server-1738747327 | ACTIVE | mellanox-vlan-provider=10.46.228.39, 30.30.220.141  | rhel-guest-image-7-6-210-x86-64-qcow2 | nfv_qe_base_flavor |
+--------------------------------------+------------------------------------------+--------+----------------------------------------------------+---------------------------------------+--------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack port list | grep tempest
| 9bd78ced-2d86-4275-a77b-55ea0f19ea74 | tempest-port-smoke-141608843  | fa:16:3e:1e:cd:b7 | ip_address='30.30.220.171', subnet_id='c42a58a2-518e-4194-a1cc-bbef7c28ccc6' | ACTIVE |
| ea03a1c6-6c81-472e-a7cf-ce13490937de | tempest-port-smoke-115596582  | fa:16:3e:d6:a2:4a | ip_address='30.30.220.141', subnet_id='c42a58a2-518e-4194-a1cc-bbef7c28ccc6' | ACTIVE |
| f885e50b-af98-456a-8f0f-403fbb6ab910 | tempest-port-smoke-1825543414 | fa:16:3e:08:ef:1f | ip_address='30.30.220.151', subnet_id='c42a58a2-518e-4194-a1cc-bbef7c28ccc6' | ACTIVE |
| ff133e14-8192-4595-8d98-dc4e921f2af0 | tempest-port-smoke-1819262289 | fa:16:3e:5b:95:b8 | ip_address='30.30.220.170', subnet_id='c42a58a2-518e-4194-a1cc-bbef7c28ccc6' | ACTIVE |

(overcloud) [stack@undercloud-0 ~]$ openstack port show ff133e14-8192-4595-8d98-dc4e921f2af0
+-------------------------+-------------------------------------------------------------------------------------------------------------------+
| Field                   | Value                                                                                                             |
+-------------------------+-------------------------------------------------------------------------------------------------------------------+
| admin_state_up          | UP |
| allowed_address_pairs   |  |
| binding_host_id         | computehwoffload-r730.localdomain |
| binding_profile         | capabilities='['switchdev']', pci_slot='0000:04:03.3', pci_vendor_info='15b3:1018', physical_network='mx-network' |
| binding_vif_details     | connectivity='l2', port_filter='True' |
| binding_vif_type        | ovs |
| binding_vnic_type       | direct |
| created_at              | 2023-03-06T14:43:56Z |
| data_plane_status       | None |
| description             |  |
| device_id               | e8a92dc1-a911-409f-b036-f4ff8d922a54 |
| device_owner            | compute:nova |
| device_profile          | None |
| dns_assignment          | None |
| dns_domain              | None |
| dns_name                | None |
| extra_dhcp_opts         |  |
| fixed_ips               | ip_address='30.30.220.170', subnet_id='c42a58a2-518e-4194-a1cc-bbef7c28ccc6' |
| id                      | ff133e14-8192-4595-8d98-dc4e921f2af0 |
| ip_allocation           | None |
| mac_address             | fa:16:3e:5b:95:b8 |
| name                    | tempest-port-smoke-1819262289 |
| network_id              | 314717b6-a7ee-4055-a800-a16cc44f6841 |
| numa_affinity_policy    | None |
| port_security_enabled   | False |
| project_id              | 64ec7e3d990a4ffd89d6b8207b5663be |
| propagate_uplink_status | None |
| qos_network_policy_id   | None |
| qos_policy_id           | None |
| resource_request        | None |
| revision_number         | 4 |
| security_group_ids      |  |
| status                  | ACTIVE |
| tags                    |  |
| trunk_details           | None |
| updated_at              | 2023-03-06T14:44:09Z |
+-------------------------+-------------------------------------------------------------------------------------------------------------------+

When I ping the floating IP, I see high variability:

[stack@undercloud-0 ~]$ ping 10.46.228.38
PING 10.46.228.38 (10.46.228.38) 56(84) bytes of data.
64 bytes from 10.46.228.38: icmp_seq=1 ttl=61 time=255 ms
64 bytes from 10.46.228.38: icmp_seq=2 ttl=61 time=1303 ms
64 bytes from 10.46.228.38: icmp_seq=3 ttl=61 time=273 ms
64 bytes from 10.46.228.38: icmp_seq=4 ttl=61 time=1321 ms
64 bytes from 10.46.228.38: icmp_seq=5 ttl=61 time=274 ms
64 bytes from 10.46.228.38: icmp_seq=6 ttl=61 time=1066 ms
64 bytes from 10.46.228.38: icmp_seq=7 ttl=61 time=19.1 ms
64 bytes from 10.46.228.38: icmp_seq=8 ttl=61 time=42.1 ms
64 bytes from 10.46.228.38: icmp_seq=9 ttl=61 time=321 ms
64 bytes from 10.46.228.38: icmp_seq=10 ttl=61 time=1388 ms
64 bytes from 10.46.228.38: icmp_seq=11 ttl=61 time=359 ms
64 bytes from 10.46.228.38: icmp_seq=12 ttl=61 time=986 ms
64 bytes from 10.46.228.38: icmp_seq=13 ttl=61 time=387 ms
64 bytes from 10.46.228.38: icmp_seq=14 ttl=61 time=279 ms
64 bytes from 10.46.228.38: icmp_seq=15 ttl=61 time=398 ms
64 bytes from 10.46.228.38: icmp_seq=16 ttl=61 time=1480 ms
64 bytes from 10.46.228.38: icmp_seq=17 ttl=61 time=468 ms

If I ping from the compute node, I also get variability:

[tripleo-admin@computehwoffload-r730 ~]$ sudo ip netns exec ovnmeta-314717b6-a7ee-4055-a800-a16cc44f6841 ping 30.30.220.170
PING 30.30.220.170 (30.30.220.170) 56(84) bytes of data.
64 bytes from 30.30.220.170: icmp_seq=1 ttl=64 time=1364 ms
64 bytes from 30.30.220.170: icmp_seq=2 ttl=64 time=347 ms
64 bytes from 30.30.220.170: icmp_seq=3 ttl=64 time=231 ms
64 bytes from 30.30.220.170: icmp_seq=4 ttl=64 time=429 ms
64 bytes from 30.30.220.170: icmp_seq=5 ttl=64 time=233 ms
64 bytes from 30.30.220.170: icmp_seq=6 ttl=64 time=343 ms
64 bytes from 30.30.220.170: icmp_seq=7 ttl=64 time=227 ms
64 bytes from 30.30.220.170: icmp_seq=8 ttl=64 time=497 ms
64 bytes from 30.30.220.170: icmp_seq=9 ttl=64 time=230 ms
64 bytes from 30.30.220.170: icmp_seq=10 ttl=64 time=339 ms
64 bytes from 30.30.220.170: icmp_seq=11 ttl=64 time=224 ms
64 bytes from 30.30.220.170: icmp_seq=12 ttl=64 time=237 ms
64 bytes from 30.30.220.170: icmp_seq=13 ttl=64 time=225 ms
64 bytes from 30.30.220.170: icmp_seq=14 ttl=64 time=335 ms
64 bytes from 30.30.220.170: icmp_seq=15 ttl=64 time=219 ms
64 bytes from 30.30.220.170: icmp_seq=16 ttl=64 time=681 ms
64 bytes from 30.30.220.170: icmp_seq=17 ttl=64 time=1334 ms
64 bytes from 30.30.220.170: icmp_seq=18 ttl=64 time=283 ms
64 bytes from 30.30.220.170: icmp_seq=19 ttl=64 time=167 ms
64 bytes from 30.30.220.170: icmp_seq=20 ttl=64 time=725 ms

I have seen this with a ConnectX-5; not tested with a ConnectX-6 yet:

04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230301.n.1

How reproducible:
1. Deploy the OVN hw-offload templates: ospd-17.1-geneve-ovn-hw-offload-ctlplane-dataplane-bonding-hybrid
2. Spawn VMs using OVS hw-offload VLAN VFs (or Geneve; same issue); a command sketch follows the Expected results below
3. Ping either from the undercloud to the FIP, or from the compute namespace ovnmeta-xxxx to the instance IP address
4. Check the latency variability

Actual results:
High ping latency variability.

Expected results:
Ping latency should be lower and more stable.
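For reference, step 2 boils down to booting a VM on a direct vNIC port; a minimal sketch using the network, flavor, and image from the listings above (the port and server names here are placeholders, and <fip>/<fixed-ip> stand for the addresses Neutron assigns):

(overcloud) [stack@undercloud-0 ~]$ openstack port create --network mellanox-vlan-provider --vnic-type direct offload-port-1
(overcloud) [stack@undercloud-0 ~]$ openstack server create --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --port offload-port-1 offload-vm-1
# step 3: ping from the undercloud to the FIP, or from the compute's OVN metadata namespace to the fixed IP
[stack@undercloud-0 ~]$ ping <fip>
[tripleo-admin@computehwoffload-r730 ~]$ sudo ip netns exec ovnmeta-<network-id> ping <fixed-ip>

The port only requests vnic_type=direct; the switchdev capability seen in the binding profile above is filled in by Nova when it picks a VF from a switchdev-mode PF per the compute's PCI passthrough configuration.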
Additional info:

I opened this issue in an OVN scenario, but the same issue occurs with ml2-ovs:

64 bytes from 10.46.228.38: icmp_seq=434 ttl=61 time=92.3 ms
64 bytes from 10.46.228.38: icmp_seq=435 ttl=61 time=97.6 ms
64 bytes from 10.46.228.38: icmp_seq=436 ttl=61 time=650 ms
64 bytes from 10.46.228.38: icmp_seq=437 ttl=61 time=650 ms
64 bytes from 10.46.228.38: icmp_seq=438 ttl=61 time=650 ms
64 bytes from 10.46.228.38: icmp_seq=439 ttl=61 time=177 ms
64 bytes from 10.46.228.38: icmp_seq=440 ttl=61 time=650 ms
64 bytes from 10.46.228.38: icmp_seq=441 ttl=61 time=649 ms
64 bytes from 10.46.228.38: icmp_seq=442 ttl=61 time=649 ms
64 bytes from 10.46.228.38: icmp_seq=443 ttl=61 time=650 ms
64 bytes from 10.46.228.38: icmp_seq=444 ttl=61 time=650 ms
64 bytes from 10.46.228.38: icmp_seq=445 ttl=61 time=90.9 ms
64 bytes from 10.46.228.38: icmp_seq=446 ttl=61 time=648 ms
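Since the symptom shows up with both OVN and ml2-ovs, it is worth confirming the offload state on the compute when triaging; a minimal check, assuming the ConnectX-5 PF is enp4s0f0 (the real interface name will differ per host):

[tripleo-admin@computehwoffload-r730 ~]$ sudo ovs-vsctl get Open_vSwitch . other_config:hw-offload   # expect "true"
[tripleo-admin@computehwoffload-r730 ~]$ sudo ethtool -k enp4s0f0 | grep hw-tc-offload               # expect "on" on the PF
[tripleo-admin@computehwoffload-r730 ~]$ sudo ovs-appctl dpctl/dump-flows type=offloaded             # datapath flows currently offloaded to hardware

If the ICMP flows show up in the offloaded dump, the variability is not caused by traffic falling back to the OVS slow path.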
Ping from VM to VM in an ml2-ovs scenario:

[cloud-user@tempest-testnfvoffload-server-995490205 ~]$ ping 30.30.220.199
PING 30.30.220.199 (30.30.220.199) 56(84) bytes of data.
64 bytes from 30.30.220.199: icmp_seq=1 ttl=64 time=3595 ms
64 bytes from 30.30.220.199: icmp_seq=2 ttl=64 time=2595 ms
64 bytes from 30.30.220.199: icmp_seq=3 ttl=64 time=1595 ms
64 bytes from 30.30.220.199: icmp_seq=4 ttl=64 time=595 ms
64 bytes from 30.30.220.199: icmp_seq=5 ttl=64 time=2424 ms
64 bytes from 30.30.220.199: icmp_seq=6 ttl=64 time=1425 ms
64 bytes from 30.30.220.199: icmp_seq=7 ttl=64 time=425 ms
64 bytes from 30.30.220.199: icmp_seq=8 ttl=64 time=1893 ms
64 bytes from 30.30.220.199: icmp_seq=9 ttl=64 time=894 ms
64 bytes from 30.30.220.199: icmp_seq=10 ttl=64 time=824 ms
64 bytes from 30.30.220.199: icmp_seq=11 ttl=64 time=894 ms
64 bytes from 30.30.220.199: icmp_seq=12 ttl=64 time=435 ms
64 bytes from 30.30.220.199: icmp_seq=13 ttl=64 time=873 ms
64 bytes from 30.30.220.199: icmp_seq=14 ttl=64 time=1895 ms
64 bytes from 30.30.220.199: icmp_seq=15 ttl=64 time=896 ms
64 bytes from 30.30.220.199: icmp_seq=16 ttl=64 time=2896 ms
64 bytes from 30.30.220.199: icmp_seq=17 ttl=64 time=1896 ms
64 bytes from 30.30.220.199: icmp_seq=18 ttl=64 time=896 ms
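Another way to see whether such a ping is actually taking the hardware path is to look at the tc rules on the VF representor on the compute; a sketch, assuming the representor is named enp4s0f0_0 (pick the real name from the OVS bridge, e.g. via ovs-vsctl show):

[tripleo-admin@computehwoffload-r730 ~]$ sudo tc -s filter show dev enp4s0f0_0 ingress   # offloaded flower rules carry the in_hw flag and hardware counters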
The problem was caused by the guest VM image, which is fairly old (RHEL 7.6). After updating the guest to RHEL 9.2, I no longer see the ping issue.
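For anyone hitting the same symptom, the guest side can be checked quickly from inside the VM (eth0 is an assumption for the VF device name in the guest):

[cloud-user@vm ~]$ cat /etc/redhat-release && uname -r    # guest image and kernel version
[cloud-user@vm ~]$ ethtool -i eth0                        # VF driver in the guest; mlx5_core for a ConnectX-5 VF

A RHEL 7.6 guest runs a 3.10-era kernel with a correspondingly old mlx5 driver, which may explain why the latency problem disappears on RHEL 9.2.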