Description of problem:

Hello,

This test is failing:
neutron_tempest_plugin.scenario.test_mac_learning.MacLearningTest.test_mac_learning_vms_on_same_network
https://code.engineering.redhat.com/gerrit/gitweb?p=python-neutron-tests-tempest.git;a=blob;f=neutron_tempest_plugin/scenario/test_mac_learning.py;h=6cd894fb1320c738d0d954427606124bdbd36aff;hb=refs/heads/rhos-17.0-trunk-patches#l151

on jenkins job:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp_1ipa-ipv4-vlan-tls-dvr/

with this stacktrace:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/scenario/test_mac_learning.py", line 173, in test_mac_learning_vms_on_same_network
    self._prepare_listener(non_receiver, 2)
  File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/scenario/test_mac_learning.py", line 145, in _prepare_listener
    self._check_cmd_installed_on_server(server['ssh_client'], server,
  File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/scenario/test_mac_learning.py", line 123, in _check_cmd_installed_on_server
    ssh_client.execute_script('which %s' % cmd)
  File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/common/ssh.py", line 225, in execute_script
    channel = self.open_session()
  File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/common/ssh.py", line 150, in open_session
    client = self.connect()
  File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/common/ssh.py", line 138, in connect
    return super(Client, self)._get_ssh_connection(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/ssh.py", line 150, in _get_ssh_connection
    raise exceptions.SSHTimeout(host=self.host,
tempest.lib.exceptions.SSHTimeout: Connection to the 10.0.0.212 via SSH timed out.
User: cloud-user, Password: None

setup: ovn+vlan+dvr, core_puddle: RHOS-17.0-RHEL-9-20220825.n.1

This test creates:
- network with port_security disabled
- subnet with dhcp disabled
- 3 VMs with config-drive=True

I tried to reproduce on the host. It does not always fail; it failed in 4 out of 5 or in 1 out of 5 runs.
The test fails because a FIP is unreachable, so the test stops at this point.

3 VMs are created and each one has a FIP. Also:
- Some FIPs are not reachable from the undercloud or from other FIPs.
- The connectivity between VMs over the tenant network is OK (pings are replied).
- The traffic from the undercloud reaches the compute node (with IP dest = FIP), but it does not reach the tap interface of the VM.

Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220825.n.1

How reproducible:
Run the test: neutron_tempest_plugin.scenario.test_mac_learning.MacLearningTest.test_mac_learning_vms_on_same_network
or just create:
- network with port_security disabled
- subnet with dhcp disabled
- 3 VMs with config-drive=True + FIP
- ping from the undercloud to a FIP or between FIPs

Actual results:
Some FIPs are unreachable

Expected results:

Additional info:
For example:

(overcloud) [stack@undercloud-0 ~]$ openstack server list --all
+--------------------------------------+---------------------------------------+--------+---------------------------------------------------------+-------+-------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+---------------------------------------+--------+---------------------------------------------------------+-------+-------------+
| a528851d-db25-4c25-a8a2-91b266ba13cc | tempest-maclearning-server-1449286456 | ACTIVE | tempest-test-network--221781334=10.0.0.250, 10.100.0.11 | rhel | guest_image |
| 42c0dcc8-6cf8-4b2d-b144-1d4f90a54be2 | tempest-maclearning-server-235549508 | ACTIVE | tempest-test-network--221781334=10.0.0.234, 10.100.0.43 | rhel | guest_image |
| e6fbfcb0-0fd2-40d0-8f31-6eeef0bb3ccf | tempest-maclearning-server-524944209 | ACTIVE | tempest-test-network--221781334=10.0.0.170, 10.100.0.8 | rhel | guest_image |
+--------------------------------------+---------------------------------------+--------+---------------------------------------------------------+-------+-------------+

(overcloud) [stack@undercloud-0 ~]$ openstack network list
+--------------------------------------+---------------------------------+----------------------------------------------------------------------------+
| ID | Name | Subnets |
+--------------------------------------+---------------------------------+----------------------------------------------------------------------------+
| 0a35b315-1af8-4341-bdea-6a93e06534be | tempest-test-network--549049431 | d7dfa555-0f36-4d0b-b648-32c898c364b6 |
| 83547db5-71ac-47f7-a3a1-502a58e9e93c | tempest-test-network--221781334 | f9447ddd-9c7b-46fc-86df-5128f3369349 |
| b956bb50-2510-474f-9535-60f5d5b1afab | public | 581a83a1-084b-43f4-9f89-333c18d328a7, bc9346fb-ece7-4c95-af0f-0eba2f09308d |
| b9a4e653-af2a-4dc5-91de-a590dcbb93b4 | heat_tempestconf_network | b960906d-7d1b-4666-abd8-6f48f08c1fd4 |
+--------------------------------------+---------------------------------+----------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack subnet list
+--------------------------------------+-------------------------+--------------------------------------+---------------------+
| ID | Name | Network | Subnet |
+--------------------------------------+-------------------------+--------------------------------------+---------------------+
| 581a83a1-084b-43f4-9f89-333c18d328a7 | external_ipv6_subnet | b956bb50-2510-474f-9535-60f5d5b1afab | 2620:52:0:13b8::/64 |
| b960906d-7d1b-4666-abd8-6f48f08c1fd4 | heat_tempestconf_subnet | b9a4e653-af2a-4dc5-91de-a590dcbb93b4 | 192.168.199.0/24 |
| bc9346fb-ece7-4c95-af0f-0eba2f09308d | external_subnet | b956bb50-2510-474f-9535-60f5d5b1afab | 10.0.0.0/24 |
| d7dfa555-0f36-4d0b-b648-32c898c364b6 | | 0a35b315-1af8-4341-bdea-6a93e06534be | 10.100.0.0/26 |
| f9447ddd-9c7b-46fc-86df-5128f3369349 | | 83547db5-71ac-47f7-a3a1-502a58e9e93c | 10.100.0.0/26 |
+--------------------------------------+-------------------------+--------------------------------------+---------------------+

Undercloud to FIP:
tempest-maclearning-server-1449286456 10.0.0.250 --> ping NOK
tempest-maclearning-server-235549508 10.0.0.234 --> ping NOK
tempest-maclearning-server-524944209 10.0.0.170 --> ping OK

Compute-1 hosts 2 VMs and Compute-0 hosts 1 VM:

(overcloud) [stack@undercloud-0 ~]$ openstack server show a528851d-db25-4c25-a8a2-91b266ba13cc | grep compute
| OS-EXT-SRV-ATTR:host | compute-0.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.redhat.local |
(overcloud) [stack@undercloud-0 ~]$ openstack server show 42c0dcc8-6cf8-4b2d-b144-1d4f90a54be2 | grep compute
| OS-EXT-SRV-ATTR:host | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local |
(overcloud) [stack@undercloud-0 ~]$ openstack server show e6fbfcb0-0fd2-40d0-8f31-6eeef0bb3ccf | grep compute
| OS-EXT-SRV-ATTR:host | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local |
(overcloud) [stack@undercloud-0 ~]$

Undercloud interface:
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:27:8b:85 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 10.0.0.63/24 brd 10.0.0.255 scope global dynamic noprefixroute eth2

Compute-0 interfaces:
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:b0:eb:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feb0:eb02/64 scope link
36: tapb75c712d-99: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:8d:0c:d6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe8d:cd6/64 scope link
       valid_lft forever preferred_lft forever

Ping from undercloud [10.0.0.63] to [10.0.0.250] - [reaches the compute node but not the tap interface]

tcpdump: listening on tapb75c712d-99, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

[root@compute-0 ~]# tcpdump -i enp3s0 -vne icmp
dropped privs to tcpdump
tcpdump: listening on enp3s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:19:37.057952 52:54:00:27:8b:85 > fa:16:3e:49:ac:8b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 15088, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.250: ICMP echo request, id 505, seq 13, length 64
15:19:38.081933 52:54:00:27:8b:85 > fa:16:3e:49:ac:8b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 15749, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.250: ICMP echo request, id 505, seq 14, length 64
^C

Ping from tempest-maclearning-server-524944209 [10.100.0.8] to tempest-maclearning-server-1449286456 [10.100.0.11] -> tenant network - ping ok

[root@compute-0 ~]# tcpdump -i tapb75c712d-99 -vne icmp
dropped privs to tcpdump
tcpdump: listening on tapb75c712d-99, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:18:38.877559 fa:16:3e:74:f4:be > fa:16:3e:8d:0c:d6, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 40921, offset 0, flags [DF], proto ICMP (1), length 84)
    10.100.0.8 > 10.100.0.11: ICMP echo request, id 3, seq 25, length 64
15:18:38.877963 fa:16:3e:8d:0c:d6 > fa:16:3e:74:f4:be, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 4591, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.0.11 > 10.100.0.8: ICMP echo reply, id 3, seq 25, length 64
15:18:39.879701 fa:16:3e:74:f4:be > fa:16:3e:8d:0c:d6, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 41199, offset 0, flags [DF], proto ICMP (1), length 84)
    10.100.0.8 > 10.100.0.11: ICMP echo request, id 3, seq 26, length 64
15:18:39.879906 fa:16:3e:8d:0c:d6 > fa:16:3e:74:f4:be, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 5282, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.0.11 > 10.100.0.8: ICMP echo reply, id 3, seq 26, length 64

[root@compute-0 ~]# podman exec -ituroot nova_virtqemud virsh list --all
 Id Name State
-----------------------------------
 23 instance-00000492 running

tempest-maclearning-server-1449286456 login: root
Password:
[root@tempest-maclearning-server-1449286456 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:8d:0c:d6 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 10.100.0.11/26 brd 10.100.0.63 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8d:cd6/64 scope link
       valid_lft forever preferred_lft forever

Compute-1 interfaces:
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:6f:f3:92 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe6f:f392/64 scope link
       valid_lft forever preferred_lft forever
31: tap09ce0d1b-54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:74:f4:be brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe74:f4be/64 scope link
       valid_lft forever preferred_lft forever
32: tap33e0c10b-b2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:84:fb:7c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe84:fb7c/64 scope link
       valid_lft forever preferred_lft forever

[root@compute-1 ~]# podman exec -ituroot nova_virtqemud virsh list --all
 Id Name State
-----------------------------------
 18 instance-0000048c running
 19 instance-0000048f running

instance-0000048f: tempest-maclearning-server-235549508 interface:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:84:fb:7c brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 10.100.0.43/26 brd 10.100.0.63 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe84:fb7c/64 scope link
       valid_lft forever preferred_lft forever

instance-0000048c: tempest-maclearning-server-524944209 interface:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:74:f4:be brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 10.100.0.8/26 brd 10.100.0.63 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe74:f4be/64 scope link
       valid_lft forever preferred_lft forever

Ping from undercloud [10.0.0.63] to tempest-maclearning-server-524944209 [10.0.0.170]

[root@compute-1 ~]# tcpdump -i enp3s0 -vne icmp
dropped privs to tcpdump
tcpdump: listening on enp3s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:06:23.146377 52:54:00:27:8b:85 > fa:16:3e:f3:73:39, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 20744, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.170: ICMP echo request, id 515, seq 1, length 64
18:06:23.147680 fa:16:3e:f3:73:39 > 52:54:00:27:8b:85, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 28562, offset 0, flags [none], proto ICMP (1), length 84)
    10.0.0.170 > 10.0.0.63: ICMP echo reply, id 515, seq 1, length 64
18:06:24.147382 52:54:00:27:8b:85 > fa:16:3e:f3:73:39, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 20859, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.170: ICMP echo request, id 515, seq 2, length 64
18:06:24.147932 fa:16:3e:f3:73:39 > 52:54:00:27:8b:85, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 29055, offset 0, flags [none], proto ICMP (1), length 84)
    10.0.0.170 > 10.0.0.63: ICMP echo reply, id 515, seq 2, length 64

[root@compute-1 ~]# tcpdump -i tap09ce0d1b-54 -vne icmp
dropped privs to tcpdump
tcpdump: listening on tap09ce0d1b-54, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:06:23.146869 fa:16:3e:46:bc:58 > fa:16:3e:74:f4:be, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 20744, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.100.0.8: ICMP echo request, id 515, seq 1, length 64
18:06:23.147342 fa:16:3e:74:f4:be > fa:16:3e:46:bc:58, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 28562, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.0.8 > 10.0.0.63: ICMP echo reply, id 515, seq 1, length 64
18:06:24.147712 fa:16:3e:46:bc:58 > fa:16:3e:74:f4:be, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 20859, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.100.0.8: ICMP echo request, id 515, seq 2, length 64
18:06:24.147919 fa:16:3e:74:f4:be > fa:16:3e:46:bc:58, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 29055, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.0.8 > 10.0.0.63: ICMP echo reply, id 515, seq 2, length 64

Ping from undercloud [10.0.0.63] to tempest-maclearning-server-235549508 [10.0.0.234]

[root@compute-1 ~]# tcpdump -i enp3s0 -vne icmp
dropped privs to tcpdump
tcpdump: listening on enp3s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:09:02.041335 52:54:00:27:8b:85 > fa:16:3e:26:40:c0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 11221, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.234: ICMP echo request, id 516, seq 1, length 64
18:09:03.075585 52:54:00:27:8b:85 > fa:16:3e:26:40:c0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 11942, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.234: ICMP echo request, id 516, seq 2, length 64
18:09:04.098585 52:54:00:27:8b:85 > fa:16:3e:26:40:c0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 12256, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.63 > 10.0.0.234: ICMP echo request, id 516, seq 3, length 64

[root@compute-1 ~]# tcpdump -i tap33e0c10b-b2 -vne icmp
dropped privs to tcpdump
tcpdump: listening on tap33e0c10b-b2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

[stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
| UUID | Node Name | Allocation UUID | Hostname | State | IP Addresses |
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
| 069b96ec-0dfd-41e6-a7e9-82346cf5ba59 | compute-0 | c03efec7-6326-4b0d-8ec9-b97ca23c6cd0 | compute-0 | ACTIVE | ctlplane=192.168.24.36 |
| 16cef229-a7c8-4b2b-aebb-6ca74bebe962 | compute-1 | 8c124922-00fa-4b24-b571-357b7421e4dc | compute-1 | ACTIVE | ctlplane=192.168.24.49 |
| 6c8d2ff4-ea33-435f-bfa3-f6dd0ea3f181 | controller-0 | 653202c4-0d81-401b-8fa2-7dd36ced4599 | controller-1 | ACTIVE | ctlplane=192.168.24.29 |
| ebcbbf95-157f-45de-84c0-61ec4d596f09 | controller-1 | c9d64d19-29f5-4978-96f7-040b043e6012 | controller-0 | ACTIVE | ctlplane=192.168.24.46 |
| d46069bf-821c-464f-841e-6ac400062a78 | controller-2 | 26435a9d-1b99-4ba9-b0e8-0d855b8a63d4 | controller-2 | ACTIVE | ctlplane=192.168.24.30 |
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+

[stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| 4ead02b5-1e7f-45f5-a43b-897200b1b416 | L3 agent | undercloud-0.redhat.local | nova | :-) | UP | neutron-l3-agent |
| 52e81468-0fee-4972-b1ac-13ae3c00da87 | Baremetal Node | 16cef229-a7c8-4b2b-aebb-6ca74bebe962 | None | :-) | UP | ironic-neutron-agent |
| 6bad2464-3951-48af-96f8-4600e0c7e71f | Baremetal Node | ebcbbf95-157f-45de-84c0-61ec4d596f09 | None | :-) | UP | ironic-neutron-agent |
| 85d0d2ab-f236-4f88-b6e3-657ee0786f6f | Baremetal Node | 069b96ec-0dfd-41e6-a7e9-82346cf5ba59 | None | :-) | UP | ironic-neutron-agent |
| b18b2b60-d593-414c-a804-48c7d734304d | DHCP agent | undercloud-0.redhat.local | nova | :-) | UP | neutron-dhcp-agent |
| b70cd996-9c06-4a06-ac6e-5609bea77f76 | Baremetal Node | 6c8d2ff4-ea33-435f-bfa3-f6dd0ea3f181 | None | :-) | UP | ironic-neutron-agent |
| eb569327-eb9d-4faf-ac12-ee342437dff2 | Open vSwitch agent | undercloud-0.redhat.local | None | :-) | UP | neutron-openvswitch-agent |
| ef716bcf-2f82-4aa8-9e09-997aff366f50 | Baremetal Node | d46069bf-821c-464f-841e-6ac400062a78 | None | :-) | UP | ironic-neutron-agent |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
| bc14c1b4-0987-4102-9396-b2ace969554d | nova-conductor | controller-0.redhat.local | internal | enabled | up | 2022-09-02T17:24:14.000000 |
| 04233df9-2191-4512-a78e-6888d71e2c3f | nova-scheduler | controller-0.redhat.local | internal | enabled | up | 2022-09-02T17:24:09.000000 |
| 5d271b6e-cfed-482d-8854-345d1017dc7b | nova-conductor | controller-2.redhat.local | internal | enabled | up | 2022-09-02T17:24:16.000000 |
| f8540bb3-3fb6-425a-ae62-75b6fe12c593 | nova-conductor | controller-1.redhat.local | internal | enabled | up | 2022-09-02T17:24:17.000000 |
| dfef9e63-b14f-487a-b728-ab4e46fa03b8 | nova-scheduler | controller-2.redhat.local | internal | enabled | up | 2022-09-02T17:24:15.000000 |
| c13a1b76-c5e8-4cb8-b8be-7bd4ff65310a | nova-scheduler | controller-1.redhat.local | internal | enabled | up | 2022-09-02T17:24:15.000000 |
| c58673d3-44b7-4432-8e6f-21e19ffa6efd | nova-compute | compute-1.redhat.local | nova | enabled | up | 2022-09-02T17:24:16.000000 |
| 82bd8341-ebf4-4d67-ae9d-ae15dace0981 | nova-compute | compute-0.redhat.local | nova | enabled | up | 2022-09-02T17:24:10.000000 |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------------------+
| 6acee4e9-8549-4887-8b90-44f516008527 | OVN Controller agent | compute-0.redhat.local | | :-) | UP | ovn-controller |
| ecee67ec-831a-5a39-be2f-5edf5f98abbf | OVN Metadata agent | compute-0.redhat.local | | :-) | UP | neutron-ovn-metadata-agent |
| 45b04725-dad7-415e-affa-79648a0dfb31 | OVN Controller Gateway agent | controller-2.redhat.local | | :-) | UP | ovn-controller |
| 5b87c777-c5e3-4460-9cc8-2f7afa28ae31 | OVN Controller Gateway agent | controller-1.redhat.local | | :-) | UP | ovn-controller |
| 2a6eeb2e-8ec2-4caf-a139-3387b3016677 | OVN Controller agent | compute-1.redhat.local | | :-) | UP | ovn-controller |
| b9944a0d-0539-57ef-a3c4-b5997184a3db | OVN Metadata agent | compute-1.redhat.local | | :-) | UP | neutron-ovn-metadata-agent |
| 12e3e70b-ef22-4b7d-a9eb-7226a47340fd | OVN Controller Gateway agent | controller-0.redhat.local | | :-) | UP | ovn-controller |
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------------------+

I have an environment if more information is needed.

Thanks in advance,
Also:
- The VM that cannot be pinged from the undercloud, tempest-maclearning-server-1449286456 (10.100.0.11), cannot ping its gateway router (10.100.0.1) -> ping NOK
- The VM that can be pinged from the undercloud, tempest-maclearning-server-524944209 (10.100.0.8), can ping its gateway router (10.100.0.1) -> ping OK
I looked at this problem and the reason why one VM works and the other does not is the vlan network. Given that the ports have port_security disabled, OVN doesn't have their IPs and MACs. Therefore the router interfaces need to learn the VMs' MACs and store them in the mac_binding table in OVN. For some reason, the ARP requests generated by ovn-controller arrive at one VM tagged with the network vlan tag 1092. See the difference between the broadcast ARP requests from the router port to 10.100.0.43 (non-working) on the two VM interfaces:

[root@compute-1 ~]# tcpdump -ennvvi tap33e0c10b-b2 -c2
dropped privs to tcpdump
tcpdump: listening on tap33e0c10b-b2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:58:17.060090 fa:16:3e:95:5c:f3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1092, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.100.0.43 tell 10.100.0.1, length 28
17:58:18.084180 fa:16:3e:95:5c:f3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1092, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.100.0.43 tell 10.100.0.1, length 28
2 packets captured
2 packets received by filter
0 packets dropped by kernel

[root@compute-1 ~]# tcpdump -ennvvi tap09ce0d1b-54 -c2
dropped privs to tcpdump
tcpdump: listening on tap09ce0d1b-54, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:58:25.252080 fa:16:3e:46:bc:58 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.100.0.43 tell 10.100.0.1, length 28
17:58:26.276405 fa:16:3e:46:bc:58 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.100.0.43 tell 10.100.0.1, length 28
2 packets captured
2 packets received by filter
0 packets dropped by kernel

In the first case the packets arrive tagged, which is wrong because the vlan tag should be stripped.
I'm continuing to investigate why one VM gets the tag while the other does not. I also tried creating the mac binding entry manually and the ping started to work, so this is related only to the L2 broadcast traffic.
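To make the difference between the two captures concrete, here is a minimal sketch that builds the same broadcast ARP request with and without an 802.1Q tag and checks for the tag. The helper names (`build_arp_request`, `vlan_tag`) are made up for illustration and are not part of OVN or tempest; the MACs, IPs and vlan id are taken from the captures above.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID announcing an 802.1Q tag
ETH_P_ARP = 0x0806


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def ip_bytes(ip: str) -> bytes:
    """Convert a dotted-quad IPv4 address into 4 raw bytes."""
    return bytes(map(int, ip.split(".")))


def build_arp_request(src_mac, sender_ip, target_ip, vlan_id=None):
    """Build a broadcast ARP request, optionally carrying an 802.1Q tag."""
    header = mac_bytes("ff:ff:ff:ff:ff:ff") + mac_bytes(src_mac)
    if vlan_id is not None:
        # TPID followed by the TCI (priority 0, VLAN id in the low 12 bits).
        header += struct.pack("!HH", ETH_P_8021Q, vlan_id & 0x0FFF)
    header += struct.pack("!H", ETH_P_ARP)
    # htype=Ethernet, ptype=IPv4, hlen=6, plen=4, op=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes(src_mac) + ip_bytes(sender_ip)
    arp += bytes(6) + ip_bytes(target_ip)
    return header + arp


def vlan_tag(frame: bytes):
    """Return the VLAN id if the frame is 802.1Q tagged, otherwise None."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


# Non-working capture: chassis MAC as source, vlan 1092 still present.
tagged = build_arp_request("fa:16:3e:95:5c:f3", "10.100.0.1", "10.100.0.43", 1092)
# Working capture: router port MAC as source, no tag.
untagged = build_arp_request("fa:16:3e:46:bc:58", "10.100.0.1", "10.100.0.43")

print(len(tagged), vlan_tag(tagged))
print(len(untagged), vlan_tag(untagged))
```

The 4-byte tag accounts for the length difference (46 vs 42) that tcpdump reports on the two tap interfaces.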
After I sent my comment, I noticed that the source MAC differs too. In the first, non-working case the MAC is from the ovn-chassis-mac-mappings:

[root@compute-1 ~]# ovs-vsctl get open . external_ids:ovn-chassis-mac-mappings
"datacentre:fa:16:3e:31:f9:d8,tenant:fa:16:3e:95:5c:f3"

while the working case correctly uses the router port MAC:

[root@controller-0 /]# ovn-nbctl --no-leader get logical_router_port lrp-3101dcb6-9ed4-4200-a162-ddc12790126c mac
"fa:16:3e:46:bc:58"
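The comparison above can be scripted when checking many captures. The following is only a sketch: `parse_chassis_mac_mappings` and `classify_source_mac` are hypothetical helpers, and the input strings are the values printed by the two commands above.

```python
def parse_chassis_mac_mappings(value: str) -> dict:
    """Parse 'net1:mac1,net2:mac2' into {network: mac}."""
    mappings = {}
    for entry in value.split(","):
        # Split on the first colon only; the rest is the MAC address.
        network, _, mac = entry.strip().partition(":")
        mappings[network] = mac.lower()
    return mappings


def classify_source_mac(src_mac: str, mappings: dict, router_port_mac: str) -> str:
    """Tell whether a frame was sent with the router port MAC or a chassis MAC."""
    src_mac = src_mac.lower()
    if src_mac == router_port_mac.lower():
        return "router port MAC"
    for network, mac in mappings.items():
        if src_mac == mac:
            return f"chassis MAC ({network})"
    return "unknown"


mappings = parse_chassis_mac_mappings(
    "datacentre:fa:16:3e:31:f9:d8,tenant:fa:16:3e:95:5c:f3"
)
router_mac = "fa:16:3e:46:bc:58"

print(classify_source_mac("fa:16:3e:95:5c:f3", mappings, router_mac))
print(classify_source_mac("fa:16:3e:46:bc:58", mappings, router_mac))
```

With the values from this environment, the source MAC of the non-working capture resolves to the chassis MAC of the `tenant` physical network.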
I'm switching this BZ to the core OVN component as the problem seems to be with the flows. The OVN version in use is ovn22.03-22.03.0-69. I'm also going to attach the DBs.
ARP requests generated by ovn-controller:

working:
2022-09-07T19:31:47.651Z|00258|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_PACKET_OUT (OF1.5) (xid=0xd8e3): in_port=CONTROLLER actions=set_field:0xa640008->reg0,set_field:0xa640001->reg1,set_field:0x4->reg9,set_field:0x1->reg10,set_field:0x1->reg11,set_field:0x3->reg12,set_field:0x1->reg14,set_field:0x3->reg15,set_field:0x6->metadata,set_field:ff:ff:ff:ff:ff:ff->eth_dst,move:NXM_NX_XXREG0[64..95]->NXM_OF_ARP_SPA[],move:NXM_NX_XXREG0[96..127]->NXM_OF_ARP_TPA[],set_field:1->arp_op,resubmit(,37) data_len=42
arp,vlan_tci=0x0000,dl_src=fa:16:3e:46:bc:58,dl_dst=00:00:00:00:00:00,arp_spa=10.0.0.63,arp_tpa=10.100.0.8,arp_op=1,arp_sha=fa:16:3e:46:bc:58,arp_tha=00:00:00:00:00:00

not working:
2022-09-07T19:31:31.513Z|00249|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_PACKET_OUT (OF1.5) (xid=0xd8dd): in_port=CONTROLLER actions=set_field:0xa64002b->reg0,set_field:0xa640001->reg1,set_field:0x4->reg9,set_field:0x1->reg10,set_field:0x1->reg11,set_field:0x3->reg12,set_field:0x1->reg14,set_field:0x3->reg15,set_field:0x6->metadata,set_field:ff:ff:ff:ff:ff:ff->eth_dst,move:NXM_NX_XXREG0[64..95]->NXM_OF_ARP_SPA[],move:NXM_NX_XXREG0[96..127]->NXM_OF_ARP_TPA[],set_field:1->arp_op,resubmit(,37) data_len=42
arp,vlan_tci=0x0000,dl_src=fa:16:3e:46:bc:58,dl_dst=00:00:00:00:00:00,arp_spa=10.0.0.63,arp_tpa=10.100.0.43,arp_op=1,arp_sha=fa:16:3e:46:bc:58,arp_tha=00:00:00:00:00:00
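The hex values loaded into reg0 and reg1 are easier to compare once decoded. A small sketch follows, assuming reg0 and reg1 hold IPv4 addresses (which is how the moves from XXREG0 into ARP_TPA and ARP_SPA read); `decode_reg_ips` is a made-up helper name, and the input strings are shortened excerpts of the two log lines above.

```python
import ipaddress
import re


def decode_reg_ips(packet_out: str) -> dict:
    """Extract reg0/reg1 from a packet-out action list and decode them as IPv4."""
    regs = {}
    # \b keeps reg10, reg11, ... from matching as reg1.
    for value, reg in re.findall(r"set_field:(0x[0-9a-f]+)->(reg[01])\b", packet_out):
        regs[reg] = str(ipaddress.IPv4Address(int(value, 16)))
    return regs


working = (
    "actions=set_field:0xa640008->reg0,set_field:0xa640001->reg1,"
    "set_field:0x4->reg9,set_field:0x1->reg10,set_field:0x1->reg11"
)
not_working = (
    "actions=set_field:0xa64002b->reg0,set_field:0xa640001->reg1,"
    "set_field:0x4->reg9,set_field:0x1->reg10,set_field:0x1->reg11"
)

print(decode_reg_ips(working))
print(decode_reg_ips(not_working))
```

Decoded this way, the two packet-outs differ only in the target address (10.100.0.8 vs 10.100.0.43). Both are logged with vlan_tci=0x0000 and dl_src set to the router port MAC, so the tag and the chassis MAC seen on the tap interface are added after ovn-controller injects the packet.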
Hi, this one looks the same to me as https://bugzilla.redhat.com/show_bug.cgi?id=2119194 WDYT?
(In reply to Ales Musil from comment #7)
> Hi,
>
> this one looks the same to me as
> https://bugzilla.redhat.com/show_bug.cgi?id=2119194 WDYT?

It sounds to me like a different problem. Bug 2119194 is about ARP replies not being forwarded to the localnet port, while this BZ is about broadcast traffic in the vlan-backed network still being delivered tagged to the ports.
Correct me if I am wrong, but the ping works if the VMs are on the same compute node, right?

For DVR, the traffic leaving through the localnet port gets the vlan tag of that port + the chassis MAC. So that would explain why the tag is present there only if the traffic is going between two compute nodes.

There should be a flow on the other side which takes care of stripping the tag and changing the MAC address back to the router MAC. The flow should be present in table 0, e.g.:

cookie=0xd73de030, duration=17.164s, table=0, n_packets=0, n_bytes=0, idle_age=17, priority=180,conj_id=100,in_port=9,dl_vlan=1092 actions=strip_vlan,load:0xa->NXM_NX_REG11[],load:0x3->NXM_NX_REG12[],load:0x5->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:fa:16:3e:46:bc:58,resubmit(,8)
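As a rough illustration of the rewrite described above, the toy model below applies the egress change (vlan tag + chassis MAC) and then the table 0 action (strip_vlan + mod_dl_src). It is not OVN code: the `Frame` class and both functions are hypothetical, and the match is reduced to `dl_vlan` only, ignoring `in_port` and `conj_id`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src_mac: str
    vlan: Optional[int]  # None means untagged


def egress_localnet(frame: Frame, chassis_mac: str, vlan: int) -> Frame:
    """Sending side: add the localnet vlan tag and use the chassis MAC as source."""
    return replace(frame, src_mac=chassis_mac, vlan=vlan)


def ingress_table0(frame: Frame, dl_vlan: int, router_mac: str) -> Frame:
    """Receiving side: strip the tag and restore the router MAC if the vlan matches."""
    if frame.vlan == dl_vlan:
        return replace(frame, src_mac=router_mac, vlan=None)
    # No matching flow: the frame continues unchanged (still tagged).
    return frame


router_mac = "fa:16:3e:46:bc:58"
chassis_mac = "fa:16:3e:95:5c:f3"

original = Frame(src_mac=router_mac, vlan=None)
on_wire = egress_localnet(original, chassis_mac, 1092)
delivered = ingress_table0(on_wire, 1092, router_mac)

print(on_wire)
print(delivered)
```

In this model, a frame that skips the second step keeps the chassis MAC and vlan 1092, which is what the capture on tap33e0c10b-b2 shows.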
(In reply to Ales Musil from comment #10)
> Correct me if I am wrong, but the ping works if the VMs are on the same
> compute node right?
>

My understanding is that it doesn't matter because VM to VM works as long as it's the same network. To the best of my knowledge what doesn't work is an ARP response generated by ovn-controller from LRP to LSP on the same LS because that comes tagged.

> For the DVR the traffic leaving through localnet port will add vlan tag of
> that port + the chassis mac.
> So that would explain why the tag is present there only if the traffic is
> going between two compute nodes.
>
> There should be flow on the other side which takes care of stripping the tag
> and changing the MAC address back to the
> router MAC. The flow should be present in table 0 e.g.:
> cookie=0xd73de030, duration=17.164s, table=0, n_packets=0, n_bytes=0,
> idle_age=17, priority=180,conj_id=100,in_port=9,dl_vlan=1092
> actions=strip_vlan,load:0xa->NXM_NX_REG11[],load:0x3->NXM_NX_REG12[],load:
> 0x5->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:fa:16:3e:46:bc:58,
> resubmit(,8)
(In reply to Jakub Libosvar from comment #11)
> (In reply to Ales Musil from comment #10)
> > Correct me if I am wrong, but the ping works if the VMs are on the same
> > compute node right?
> >
>
> My understanding is that it doesn't matter because VM to VM works as long as
> it's the same network. To the best of my knowledge what doesn't work is an
> ARP response generated by ovn-controller from LRP to LSP on the same LS
> because that comes tagged.

It might matter because the ARP request is generated on the node that is doing the initial routing. In that case we would need to find out why the flow that strips the tag and changes the MAC address back is not applied to the ARP request.

Is this the first test of this scenario with ovn22.03-22.03.0-69?
Also does "ovn-appctl inc-engine/recompute" help with that issue?

Thanks,
Ales

>
> > For the DVR the traffic leaving through localnet port will add vlan tag of
> > that port + the chassis mac.
> > So that would explain why the tag is present there only if the traffic is
> > going between two compute nodes.
> >
> > There should be flow on the other side which takes care of stripping the tag
> > and changing the MAC address back to the
> > router MAC. The flow should be present in table 0 e.g.:
> > cookie=0xd73de030, duration=17.164s, table=0, n_packets=0, n_bytes=0,
> > idle_age=17, priority=180,conj_id=100,in_port=9,dl_vlan=1092
> > actions=strip_vlan,load:0xa->NXM_NX_REG11[],load:0x3->NXM_NX_REG12[],load:
> > 0x5->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:fa:16:3e:46:bc:58,
> > resubmit(,8)
Patch posted: https://patchwork.ozlabs.org/project/ovn/patch/20220916092353.889390-1-amusil@redhat.com/
Accidentally set to MODIFIED; it should be POST.
(In reply to Ales Musil from comment #12)
> (In reply to Jakub Libosvar from comment #11)
> > (In reply to Ales Musil from comment #10)
> > > Correct me if I am wrong, but the ping works if the VMs are on the same
> > > compute node right?
> > >
> >
> > My understanding is that it doesn't matter because VM to VM works as long as
> > it's the same network. To the best of my knowledge what doesn't work is an
> > ARP response generated by ovn-controller from LRP to LSP on the same LS
> > because that comes tagged.
>
> It might matter because the ARP request is generated on the node that is
> doing the
> initial routing. In that case we would need to find out why the flow that
> strips the tag
> and changes the MAC address back is not applied to the ARP request.
>
> Is this the first test of this scenario with ovn22.03-22.03.0-69?
> Also does "ovn-appctl inc-engine/recompute" help with that issue?

I *think* we tried that and it didn't help. But I don't have the environment.

>
> Thanks,
> Ales
>
> >
> > > For the DVR the traffic leaving through localnet port will add vlan tag of
> > > that port + the chassis mac.
> > > So that would explain why the tag is present there only if the traffic is
> > > going between two compute nodes.
> > >
> > > There should be flow on the other side which takes care of stripping the tag
> > > and changing the MAC address back to the
> > > router MAC. The flow should be present in table 0 e.g.:
> > > cookie=0xd73de030, duration=17.164s, table=0, n_packets=0, n_bytes=0,
> > > idle_age=17, priority=180,conj_id=100,in_port=9,dl_vlan=1092
> > > actions=strip_vlan,load:0xa->NXM_NX_REG11[],load:0x3->NXM_NX_REG12[],load:
> > > 0x5->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:fa:16:3e:46:bc:58,
> > > resubmit(,8)
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2131896
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2131897
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2131898
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2131901
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2131902
Hi Fiorella,

I'm not sure whether the fix really solves the issue on your side. Could you please help test with the fixed OVN build?
http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn22.03/22.03.0/106.el8fdp/

Thanks & Best Regards,
Jianlin Shi
Created a reproducer based on https://patchwork.ozlabs.org/project/ovn/patch/20220916092353.889390-1-amusil@redhat.com/:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.36.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.36.25
systemctl restart ovn-controller

ovs-vsctl add-br br-ext
ovs-vsctl set open . external_ids:ovn-bridge-mappings=phys:br-ext
ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="phys:ee:00:00:00:00:10"

ovn-nbctl ls-add internal
ovn-nbctl lsp-add internal ln_internal "" 100
ovn-nbctl lsp-set-addresses ln_internal unknown
ovn-nbctl lsp-set-type ln_internal localnet
ovn-nbctl lsp-set-options ln_internal network_name=phys

ovn-nbctl lsp-add internal internal-gw
ovn-nbctl lsp-set-type internal-gw router
ovn-nbctl lsp-set-addresses internal-gw router
ovn-nbctl lsp-set-options internal-gw router-port=gw-internal

ovn-nbctl lsp-add internal vif0
ovn-nbctl lsp-set-addresses vif0 unknown

ovn-nbctl lr-add gw
ovn-nbctl lrp-add gw gw-internal 00:00:00:00:20:00 192.168.20.1/24

ip netns add vif0
ovs-vsctl add-port br-int vif0 -- set interface vif0 type=internal external_ids:iface-id=vif0
ip link set vif0 netns vif0
ip netns exec vif0 ip link set vif0 address 00:00:00:00:20:10
ip netns exec vif0 ip link set vif0 up
ip netns exec vif0 ip addr add 192.168.20.10/24 dev vif0
ip netns exec vif0 ping 192.168.20.1 -c 3

Result on ovn22.03-22.03.0-101.el8:

+ ip netns exec vif0 ip addr add 192.168.20.10/24 dev vif0
+ ip netns exec vif0 ping 192.168.20.1 -c 3
PING 192.168.20.1 (192.168.20.1) 56(84) bytes of data.

--- 192.168.20.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2085ms

[root@dell-per740-12 bz2123837]# rpm -qa | grep -E "openvswitch2.17|ovn22.03"
openvswitch2.17-2.17.0-58.el8fdp.x86_64
ovn22.03-22.03.0-101.el8fdp.x86_64
ovn22.03-host-22.03.0-101.el8fdp.x86_64
ovn22.03-central-22.03.0-101.el8fdp.x86_64

Result on ovn22.03-22.03.0-106.el8:

+ ip netns exec vif0 ip addr add 192.168.20.10/24 dev vif0
+ ip netns exec vif0 ping 192.168.20.1 -c 3
PING 192.168.20.1 (192.168.20.1) 56(84) bytes of data.
64 bytes from 192.168.20.1: icmp_seq=1 ttl=254 time=1008 ms
64 bytes from 192.168.20.1: icmp_seq=2 ttl=254 time=4.100 ms
64 bytes from 192.168.20.1: icmp_seq=3 ttl=254 time=0.414 ms

--- 192.168.20.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 0.414/337.841/1008.112/473.956 ms, pipe 2

[root@dell-per740-12 bz2123837]# rpm -qa | grep -E "openvswitch2.17|ovn22.03"
ovn22.03-22.03.0-106.el8fdp.x86_64
openvswitch2.17-2.17.0-58.el8fdp.x86_64
ovn22.03-central-22.03.0-106.el8fdp.x86_64
ovn22.03-host-22.03.0-106.el8fdp.x86_64
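The two results above differ only in the ping summary line, so the reproducer could be turned into a pass/fail check by extracting the loss percentage. A rough sketch, not part of the original reproducer: the function name is hypothetical and it assumes the iputils summary format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage taken from the "N packets transmitted, ..." summary line.
ping_loss_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Usage inside the reproducer namespace (vif0 and 192.168.20.1 come from the
# reproducer steps):
#   loss=$(ip netns exec vif0 ping 192.168.20.1 -c 3 | ping_loss_percent)
#   if [ "$loss" = "0" ]; then echo "router port reachable"; else echo "loss: ${loss}%"; fi
```

With the outputs recorded in this comment, the 101.el8fdp run would yield 100 and the 106.el8fdp run would yield 0.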
Hi Ales, could you please check comment 25? Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn22.03), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7393