Description of problem:

When pinging a VM's floating IP address from the external network in an OVN DVR environment, some frames with the FIP's MAC address pass through the controller node. This is not the expected behavior: with DVR, all VM-related packets should pass through the external interface of the corresponding compute node.

Version-Release number of selected component (if applicable):
python3-networking-ovn-7.0.1-0.20191205040313.2ef5322.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Set up a router with an internal and an external network connected to it.
2. Run tcpdump on the external interfaces of all overcloud nodes.
3. Create a VM connected to the internal network and attach a floating IP from the external network to it.
4. Ping the FIP address from the external network.
5. Check the tcpdump captures on all nodes and make sure that Ethernet frames with the FIP's MAC address are present only on the external interface of the corresponding compute node (an illustrative capture sketch is appended at the end of this comment).

Actual results:
Ethernet frames with the FIP's MAC address are present on the controller's external interface.

Expected results:
All Ethernet frames with the FIP's MAC address pass through the external interface of the corresponding compute node.

Additional info:
Note, I used a downstream tempest test ([1], not merged yet) to reproduce the issue (see attached files). When pinging from a VM towards the external network, as test [2] does, the issue does not occur.

[1] neutron_plugin.tests.scenario.test_dvr_ovn.DVRBaseTest.test_validate_floatingip_compute_ingress
[2] neutron_plugin.tests.scenario.test_dvr_ovn.DVRBaseTest.test_validate_floatingip_compute_egress

VM (server) that the test created, and its properties:

{"server": {"id": "c79b5732-4e38-4c0b-b748-05d42d5dea96", "name": "tempest-server-test-1740959910", "status": "ACTIVE", "tenant_id": "cb5c8defb17d4a32aa26558f4711f6ea", "user_id": "58a28a26f75741feac5a4f3231efb138", "metadata": {}, "hostId": "26d8652787495dc0707900285583c42daf9702ce2998524a4e530262", "image": {"id": "db75b208-3f1c-40cf-9942-de5e36e78d5e", "links": [{"rel": "bookmark", "href": "http://10.0.0.116:8774/images/db75b208-3f1c-40cf-9942-de5e36e78d5e"}]}, "flavor": {"id": "f5bcd5ed-e741-4cce-aa54-4abb14dd7035", "links": [{"rel": "bookmark", "href": "http://10.0.0.116:8774/flavors/f5bcd5ed-e741-4cce-aa54-4abb14dd7035"}]}, "created": "2020-01-06T15:29:34Z", "updated": "2020-01-06T15:29:44Z", "addresses": {"tempest-test-network--1196590858": [{"version": 4, "addr": "10.100.0.12", "OS-EXT-IPS:type": "fixed", "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:04:25:f0"}]}, "accessIPv4": "", "accessIPv6": "", "links": [{"rel": "self", "href": "http://10.0.0.116:8774/v2.1/servers/c79b5732-4e38-4c0b-b748-05d42d5dea96"}, {"rel": "bookmark", "href": "http://10.0.0.116:8774/servers/c79b5732-4e38-4c0b-b748-05d42d5dea96"}], "OS-DCF:diskConfig": "MANUAL", "progress": 0, "OS-EXT-AZ:availability_zone": "nova", "config_drive": "", "key_name": "tempest-keypair-test-1648471112", "OS-SRV-USG:launched_at": "2020-01-06T15:29:44.000000", "OS-SRV-USG:terminated_at": null, "security_groups": [{"name": "tempest-secgroup-1426634481"}], "OS-EXT-SRV-ATTR:host": "compute-1.redhat.local", "OS-EXT-SRV-ATTR:instance_name": "instance-000000ef", "OS-EXT-SRV-ATTR:hypervisor_hostname": "compute-1.redhat.local", "OS-EXT-STS:task_state": null, "OS-EXT-STS:vm_state": "active", "OS-EXT-STS:power_state": 1, "os-extended-volumes:volumes_attached": []}}

2020-01-06 15:29:48.510 1022602 DEBUG neutron_plugin.tests.scenario.test_dvr_ovn [-] FIP port MAC: fa:16:3e:13:94:25 test_validate_floatingip_compute_ingress /home/stack/roman/tempest_neutron_plugin/neutron_plugin/tests/scenario/test_dvr_ovn.py:222
...
2020-01-06 15:29:52.177 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:29:55.209 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:29:58.242 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:01.272 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:04.300 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:07.329 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:10.359 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:12.391 1022602 DEBUG neutron_tempest_plugin.scenario.base [-] ping result:
PING 10.0.0.226 (10.0.0.226) 56(84) bytes of data.
64 bytes from 10.0.0.226: icmp_seq=1 ttl=63 time=1.18 ms
64 bytes from 10.0.0.226: icmp_seq=2 ttl=63 time=1.71 ms

--- 10.0.0.226 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 1.179/1.445/1.712/0.269 ms
 ping_remote /home/stack/roman/neutron-tempest-plugin/neutron_tempest_plugin/scenario/base.py:289
2020-01-06 15:30:12.392 1022602 DEBUG tempest.lib.common.utils.test_utils [-] Call ping_remote returns true in 22.438456 seconds call_until_true /home/stack/roman/tempest/tempest/lib/common/utils/test_utils.py:109
2020-01-06 15:30:12.393 1022602 DEBUG neutron_plugin.tests.scenario.test_dvr_ovn [-] Expected routing node: ['compute-1'] check_north_south_icmp_flow /home/stack/roman/tempest_neutron_plugin/neutron_plugin/tests/scenario/test_dvr_ovn.py:84
2020-01-06 15:30:12.779 1022602 DEBUG neutron_plugin.tests.scenario.test_dvr_ovn [-] Actual routing nodes: controller-0,compute-1 check_north_south_icmp_flow /home/stack/roman/tempest_neutron_plugin/neutron_plugin/tests/scenario/test_dvr_ovn.py:89

[rsafrono@rsafrono tmp]$ tshark -r /tmp/controller0.pcap
  1 0.000000  10.100.0.12 → 10.0.0.11   ICMP 98 Echo (ping) reply   id=0x9ffa, seq=2/512, ttl=63
  2 1.987270    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0xa02f, seq=1/256, ttl=64
  3 2.998717    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0xa02f, seq=2/512, ttl=64
  4 5.016402    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0xa0b6, seq=1/256, ttl=64

[rsafrono@rsafrono tmp]$ tshark -r /tmp/compute1.pcap
  1 0.000000    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9f4d, seq=1/256, ttl=64
  2 1.059799    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9f4d, seq=2/512, ttl=64
  3 3.029127    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9f65, seq=1/256, ttl=64
  4 4.067707    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9f65, seq=2/512, ttl=64
  5 6.063873    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9fc8, seq=1/256, ttl=64
  6 7.075793    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9fc8, seq=2/512, ttl=64
  7 9.094020    10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9fe2, seq=1/256, ttl=64
  8 10.147722   10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9fe2, seq=2/512, ttl=64
  9 12.121355   10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9ffa, seq=1/256, ttl=64
 10 13.155744   10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0x9ffa, seq=2/512, ttl=64
 11 19.235742   10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0xa0b6, seq=2/512, ttl=64
 12 19.237680  10.0.0.226 → 10.0.0.11   ICMP 98 Echo (ping) reply   id=0xa0b6, seq=2/512, ttl=63 (request in 11)
 13 21.210023   10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0xa0fe, seq=1/256, ttl=64
 14 21.210765  10.0.0.226 → 10.0.0.11   ICMP 98 Echo (ping) reply   id=0xa0fe, seq=1/256, ttl=63 (request in 13)
 15 22.211345   10.0.0.11 → 10.0.0.226  ICMP 98 Echo (ping) request id=0xa0fe, seq=2/512, ttl=64
 16 22.212753  10.0.0.226 → 10.0.0.11   ICMP 98 Echo (ping) reply   id=0xa0fe, seq=2/512, ttl=63 (request in 15)

[rsafrono@rsafrono tmp]$ tshark -r /tmp/controller0.pcap -T fields -e eth.src -e ip.src -e eth.dst -e ip.dst
fa:16:3e:13:94:25  10.100.0.12  52:54:00:a0:b6:e6  10.0.0.11
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226

[rsafrono@rsafrono tmp]$ tshark -r /tmp/compute1.pcap -T fields -e eth.src -e ip.src -e eth.dst -e ip.dst
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
fa:16:3e:13:94:25  10.0.0.226   52:54:00:a0:b6:e6  10.0.0.11
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
fa:16:3e:13:94:25  10.0.0.226   52:54:00:a0:b6:e6  10.0.0.11
52:54:00:a0:b6:e6  10.0.0.11    fa:16:3e:13:94:25  10.0.0.226
fa:16:3e:13:94:25  10.0.0.226   52:54:00:a0:b6:e6  10.0.0.11
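For reference, the capture steps above can be reproduced roughly as follows. This is only a sketch: the external interface name ens5 is an assumption borrowed from later comments, and the FIP address/MAC are the ones from this test run; adjust them to the environment under test.

# On every overcloud node, capture on the provider/external interface
# (interface name "ens5" is an assumption):
tcpdump -enn -i ens5 -w /tmp/$(hostname -s).pcap 'icmp or arp'

# From a host on the external network, ping the floating IP:
ping -c 10 10.0.0.226

# Then check in which capture frames carrying the FIP MAC appear;
# with DVR working correctly, only the compute node hosting the VM should match:
FIP_MAC=fa:16:3e:13:94:25
for pcap in /tmp/controller-0.pcap /tmp/compute-1.pcap; do
    echo "== $pcap =="
    tshark -r "$pcap" -Y "eth.addr == $FIP_MAC" -T fields -e frame.time_relative -e eth.src -e eth.dst -e icmp.type
done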
Created attachment 1650181 [details] tempest log when the test reproducing the issue is running
Created attachment 1650182 [details] downstream test_ovn_dvr.py code for reproducing the issue
Assigning to myself to triage
I think we're hitting the same issue in the migration job, specifically in the port-migration validation part, and I think I was able to pinpoint the problem.

The summary is that, because of flooded unknown unicast, the controller replies to the ICMP request before the compute node does, and because this is a virtualized environment, the host Linux bridge acting as the L2 switch learns the wrong port for the given FIP. The situation fixes itself once the undercloud sends an ARP request and only the compute node responds.

Details are as follows. The test is pinging the floating IP 10.0.0.238 (MAC fa:16:3e:ed:9d:7b) of a VM hosted on compute-1 from the undercloud-0 IP 10.0.0.85. The undercloud knows the MAC of the FIP from previous pre-migration tests:

[root@undercloud-0 ~]# ip neigh | grep 238
10.0.0.238 dev eth2 lladdr fa:16:3e:ed:9d:7b STALE

However, the entry for that MAC on the Linux bridge on the hypervisor has already expired, so the bridge doesn't know to which port to send packets with the FIP's destination MAC. Note the timestamps below are not well synced between the nodes.

1) Because the undercloud knows the MAC, it does not send an ARP request and sends the ICMP request right away:

undercloud:
17:56:24.520460 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43722, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 1, length 64

2) The Linux bridge on the hypervisor doesn't know the port, so it treats the frame as unknown unicast and floods it to all ports, including the port towards controller-0:

controller-0:
17:56:21.961568 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43722, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 1, length 64

compute-1:
17:56:22.374363 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43722, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 1, length 64

3) The controller treats it in a centralized fashion, NATs the packet and sends the ICMP request via Geneve to compute-1 (I haven't captured that), and the VM replies.

4) The controller replies to the ping - note the source IP being the fixed IP 172.168.199.48:

controller-0:
17:56:21.976714 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 46602, offset 0, flags [none], proto ICMP (1), length 84) 172.168.199.48 > 10.0.0.85: ICMP echo reply, id 11848, seq 1, length 64

5) The Linux bridge on the hypervisor wrongly learns that the FIP MAC fa:16:3e:ed:9d:7b is behind the controller-0 port instead of compute-1, because the reply came from there.

6) The undercloud sees a source address different from the FIP, so it doesn't treat the packet as a correct ping response.

7) All following ICMP requests destined to the FIP MAC are sent to the controller by the hypervisor:

17:56:22.995899 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43835, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 2, length 64
17:56:24.019398 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 44174, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 3, length 64
17:56:25.044132 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 45046, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 4, length 64
17:56:26.069138 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 45468, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 5, length 64
17:56:27.091393 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 46468, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 6, length 64
17:56:27.348197 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:28.115775 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 46594, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 7, length 64
17:56:28.371237 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:29.139381 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 47476, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 8, length 64
17:56:29.395205 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:30.163767 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 47935, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 9, length 64

8) The undercloud sends a broadcast ARP request for 10.0.0.238, so the hypervisor floods it to all nodes:

17:56:31.187610 52:54:00:f0:21:7b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28

9) compute-1 sees it and responds correctly:

compute-1:
17:56:31.600349 52:54:00:f0:21:7b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:31.600849 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.238 is-at fa:16:3e:ed:9d:7b, length 28

10) The hypervisor Linux bridge re-learns that MAC fa:16:3e:ed:9d:7b is in fact behind compute-1.

11) The situation is now correct and pings from the undercloud start working.

There is one thing I don't understand yet: why compute-1 didn't reply to the first ICMP request, and why controller-0 stops responding to ICMP seq 2-8.
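As a side note, a quick way to confirm on which port the hypervisor Linux bridge has learned the FIP MAC is to query its forwarding database. This is just an illustrative sketch; the bridge name "external" is taken from a later comment in this bug, and the iproute2 variant is an alternative to brctl:

# On the hypervisor: show which bridge port currently holds the FIP MAC.
FIP_MAC=fa:16:3e:ed:9d:7b
brctl showmacs external | grep -i "$FIP_MAC"
# or, with iproute2:
bridge fdb show br external | grep -i "$FIP_MAC"
# Watch it flip to the compute node's port once the ARP reply arrives:
watch -n1 "brctl showmacs external | grep -i $FIP_MAC"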
This is what I see after debugging further, plus it answers Kuba's question from his last comment :)

1) First, I delete the MAC_Binding table contents from the SB database. This 'simulates' the migration process, where the MAC address for the FIP was learnt from the undercloud (or tempest executor) and the OVN DB is freshly created, so the MAC_Binding table is empty:

()[root@compute-1 /]# for i in $(ovn-sbctl --db=tcp:172.17.1.51:6642 list mac_binding |grep _uuid |awk {'print $3'}); do ovn-sbctl --db=tcp:172.17.1.51:6642 destroy mac_binding $i ; done

2) Then ping the FIP from the undercloud, which has a stale entry for the FIP MAC address:

(overcloud) [stack@undercloud-0 ~]$ sudo ip neigh | grep 238
10.0.0.238 dev eth2 lladdr fa:16:3e:ed:9d:7b STALE
(overcloud) [stack@undercloud-0 ~]$ ping -c5 10.0.0.238
PING 10.0.0.238 (10.0.0.238) 56(84) bytes of data.

--- 10.0.0.238 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 83ms

*** Traffic flow ***

4) The ICMP echo request is sent by the undercloud and reaches the controller and compute nodes. As Kuba mentioned in c5, the ToR switch (in this case, the underlying Linux bridge on the hypervisor) doesn't know where the MAC address is, so it floods the ICMP echo request everywhere, reaching the controller and compute nodes. This is observed on the provider interface on both:

Compute1:
[root@compute-1 ~]# tcpdump -ennvvi ens5 icmp or arp
tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
21:51:50.016952 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21281, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 1, length 64

Controller0:
[root@controller-0 ~]# tcpdump -ennvvi ens5 arp or icmp
tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
21:51:49.603687 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21281, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 1, length 64

5) ovn-controller on compute-1 processes the ICMP echo request and delivers it to the VM. Verified by checking n_packets before and after:

[root@compute-1 ~]# ovs-ofctl dump-flows br-int | grep "output:7"
 cookie=0x0, duration=448633.349s, table=65, n_packets=384, n_bytes=34786, idle_age=5, hard_age=65534, priority=100,reg15=0x4,metadata=0x3 actions=output:7

6) The VM replies to the ping with an ICMP echo reply.

At the VM:
22:12:44.555837 fa:16:3e:01:45:e5 > fa:16:3e:09:27:87, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 3495, offset 0, flags [none], proto ICMP (1), length 84) 172.168.199.48 > 10.0.0.85: ICMP echo reply, id 9388, seq 1, length 64

7) ovn-controller doesn't know the MAC address of 10.0.0.85 (the undercloud) because I cleaned up the MAC_Binding table, so it sends an ARP request:

21:51:50.023766 fa:16:3e:4b:06:ef > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.85 tell 10.0.0.211, length 28

10.0.0.211 is the gateway port IP address, which is bound to controller-0:

()[root@compute-1 /]# ovn-nbctl --db=tcp:172.17.1.51:6641 list logical_router_port |grep "10.0.0.211" -C6 | grep ^name
name : "lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5"
()[root@compute-1 /]# ovn-sbctl --db=tcp:172.17.1.51:6642 show |grep lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5 -B4 | grep hostname
hostname: "controller-0.redhat.local"

8) The undercloud replies to the ARP and the reply arrives at controller-0:

[root@controller-0 ~]# tcpdump -ennvvi ens5 arp or icmp
21:51:49.610596 52:54:00:f0:21:7b > fa:16:3e:4b:06:ef, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.85 is-at 52:54:00:f0:21:7b, length 28

So ovn-controller on controller-0 adds a MAC_Binding entry to the SB database:

_uuid        : d69537a2-ecc9-4398-afec-6c0a997c799f
datapath     : b92566b8-8b4d-4a58-adee-84acff14a1c2
ip           : "10.0.0.85"
logical_port : "lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5"
mac          : "52:54:00:f0:21:7b"

9) After the MAC is learnt, the packet is resumed on compute-1 and the ICMP echo reply from the VM is pushed via the Geneve tunnel to controller-0. *THIS LOOKS LIKE THE REAL BUG*

Geneve interface:
21:51:49.614490 fa:16:3e:4b:06:ef > 52:54:00:f0:21:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 54902, offset 0, flags [none], proto ICMP (1), length 84) 172.168.199.48 > 10.0.0.85: ICMP echo reply, id 46543, seq 1, length 64

10) The ICMP echo reply is then sent out through the provider interface of controller-0 towards the undercloud.

Provider interface:
21:51:49.615709 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 54902, offset 0, flags [none], proto ICMP (1), length 84) 172.168.199.48 > 10.0.0.85: ICMP echo reply, id 46543, seq 1, length 64

11) The Linux bridge learns the MAC of the FIP (10.0.0.238, fa:16:3e:ed:9d:7b) on the port of controller-0.

12) Subsequent ICMP echo requests are observed on the provider interface of controller-0 and are dropped in the integration bridge:

[root@controller-0 ~]# tcpdump -ennvvi ens5 arp or icmp
tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
21:51:50.610001 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21713, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 2, length 64
21:51:51.634608 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21817, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 3, length 64
21:51:52.658961 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 22459, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 4, length 64
21:51:53.683261 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 23089, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 5, length 64

13) The undercloud now sends ARP requests for the FIP (10.0.0.238), compute-1 replies and the Linux bridge learns the FIP MAC on the right port:

[root@dell-r730-004 ~]# tcpdump -vvnee -i external arp
tcpdump: listening on external, link-type EN10MB (Ethernet), capture size 262144 bytes
04:35:30.516984 fa:16:3e:4b:06:ef > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.85 tell 10.0.0.211, length 28
04:35:30.517126 52:54:00:f0:21:7b > fa:16:3e:4b:06:ef, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.85 is-at 52:54:00:f0:21:7b, length 28
04:35:35.553573 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:36.498712 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:37.443864 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:39.040301 52:54:00:f0:21:7b > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:39.040874 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.238 is-at fa:16:3e:ed:9d:7b, length 28

[root@dell-r730-004 ~]# while true; do echo `date +"%T.%N"` `brctl showmacs external | grep fa:16:3e:ed:9d:7b` && sleep 1; done
04:35:35.959411345 6 fa:16:3e:ed:9d:7b no 5.43
04:35:36.963957327 6 fa:16:3e:ed:9d:7b no 6.44
04:35:37.968618892 6 fa:16:3e:ed:9d:7b no 7.44
04:35:38.973409151 6 fa:16:3e:ed:9d:7b no 8.45
04:35:39.977980409 3 fa:16:3e:ed:9d:7b no 0.01
04:35:40.983020193 3 fa:16:3e:ed:9d:7b no 0.09
04:35:41.988080917 3 fa:16:3e:ed:9d:7b no 0.17

The ARP reply from compute-1 arrives at 04:35:39.040874, which is when the MAC switches from port 6 to port 3 on the bridge, and from that moment on the ping starts working.

The problem seems to be in how the execution of the first ICMP echo reply sent by the VM is resumed once the destination IP has been resolved: this packet is sent via the Geneve tunnel while it should be sent via the provider interface. Setting NEEDINFO on Lorenzo, who added this feature to core OVN [0].

[0] https://patchwork.ozlabs.org/patch/985916/
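For reference, a quick way to check whether and where such a MAC_Binding entry was created; this is only a sketch, and the SB DB address, peer IP and logical router port name below are the ones from this comment, so adjust them to the deployment being debugged:

# Show the binding learnt for the external peer (undercloud) IP, if any.
ovn-sbctl --db=tcp:172.17.1.51:6642 find mac_binding ip="10.0.0.85"
# Show which chassis hosts the gateway router port that did the learning
# (same pattern as the ovn-sbctl show command used above).
ovn-sbctl --db=tcp:172.17.1.51:6642 show | grep -B4 lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5 | grep hostname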
Some notes on my previous comment:

- The stale MAC entry on the undercloud does not actually matter: even if an ARP request is sent first, the resumed first ICMP echo reply still goes through the tunnel, which is what causes the observed behavior.
- This does not affect only the migration from ML2/OVS to ML2/OVN; it affects essentially any traffic going from a compute node to the provider network towards a destination address that has not been learnt yet.
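To reproduce outside of the migration job, roughly the following sequence should show the first reply leaving through the gateway chassis instead of the compute node. This is a sketch based on the commands used earlier in this bug; the SB DB address, the provider interface name ens5 and the Geneve interface name genev_sys_6081 are assumptions to be adapted to the environment.

# 1) Clear the MAC_Binding table (same loop as in the comment above).
for i in $(ovn-sbctl --db=tcp:172.17.1.51:6642 list mac_binding | grep _uuid | awk '{print $3}'); do
    ovn-sbctl --db=tcp:172.17.1.51:6642 destroy mac_binding "$i"
done

# 2) On the compute node hosting the VM, watch the provider NIC and the Geneve tunnel.
tcpdump -ennvvi ens5 icmp &            # provider/external interface (assumed name)
tcpdump -ennvvi genev_sys_6081 icmp &  # Geneve tunnel interface (assumed name)

# 3) From the external host, ping the FIP. With the bug present, the first
#    echo reply shows up on the tunnel interface (towards the gateway chassis)
#    rather than on the compute node's provider interface.
ping -c 5 10.0.0.238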
Following the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1809629#c3.

Reproduced on 2.11.1-33:

[root@dell-per740-12 bz1809629]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
openvswitch2.11-2.11.0-50.el8fdp.x86_64
ovn2.11-2.11.1-33.el8fdp.x86_64
ovn2.11-central-2.11.1-33.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-acl-1.0-5.noarch
ovn2.11-host-2.11.1-33.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-20.noarch
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch

[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=3.87 ms
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.273 ms
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.242 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 25ms
rtt min/avg/max/mdev = 0.242/1.461/3.869/1.702 ms

[root@dell-per740-12 bz1809629]# ovn-sbctl find mac_binding ip=172.16.1.2
_uuid        : 5f101df7-7ecc-4929-9073-8777ac92c898
datapath     : 7e98b95e-aba9-4fd5-9cb5-618d83ef8456
ip           : "172.16.1.2"
logical_port : "lr1-ls0"
mac          : "ba:d2:12:0e:14:ba"
[root@dell-per740-12 bz1809629]# ovn-sbctl destroy mac_binding 5f101df7-7ecc-4929-9073-8777ac92c898

[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=1.20 ms
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.285 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 2 received, 33.3333% packet loss, time 31ms
rtt min/avg/max/mdev = 0.285/0.740/1.196/0.456 ms    <=== first packet lost

Verified on 2.11.1-37:

[root@hp-dl380pg8-12 bz1809629]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.11-2.11.1-37.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.11-host-2.11.1-37.el8fdp.x86_64
ovn2.11-central-2.11.1-37.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-acl-1.0-5.noarch
openvswitch2.11-2.11.0-50.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-20.noarch
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch

[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=7.50 ms
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.267 ms
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.244 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 65ms
rtt min/avg/max/mdev = 0.244/2.670/7.501/3.416 ms

[root@dell-per740-12 bz1809629]# ovn-sbctl find mac_binding ip=172.16.1.2
_uuid        : ba3cd4db-2e94-4b3d-b586-7ce8304d7c98
datapath     : 6f7e4668-46e7-4bb5-9a79-eecc0e2b5b4a
ip           : "172.16.1.2"
logical_port : "lr1-ls0"
mac          : "26:5b:8b:68:82:d5"
[root@dell-per740-12 bz1809629]# ovn-sbctl destroy mac_binding ba3cd4db-2e94-4b3d-b586-7ce8304d7c98

[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=4.37 ms
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.268 ms
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.261 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 63ms
rtt min/avg/max/mdev = 0.261/1.633/4.370/1.935 ms    <=== no packet loss
Verified on the RHEL 7 version:

[root@hp-dl380pg8-13 bz1809629]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.11-host-2.11.1-37.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
openvswitch2.11-2.11.0-50.el7fdp.x86_64
ovn2.11-central-2.11.1-37.el7fdp.x86_64
ovn2.11-2.11.1-37.el7fdp.x86_64

[root@hp-dl380pg8-13 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=6.84 ms
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.211 ms
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.169 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.169/2.407/6.841/3.135 ms

[root@dell-per740-42 bz1809629]# ovn-sbctl find mac_binding ip=172.16.1.2
_uuid        : 596166b9-f7cc-4eee-968f-f3f251810750
datapath     : a9a55239-b456-45f4-a24d-d27190efd258
ip           : "172.16.1.2"
logical_port : "lr1-ls0"
mac          : "ae:9c:a3:c4:8f:37"
[root@dell-per740-42 bz1809629]# ovn-sbctl destroy mac_binding 596166b9-f7cc-4eee-968f-f3f251810750

[root@hp-dl380pg8-13 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=3.07 ms
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.152 ms
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.086 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.086/1.105/3.079/1.396 ms    <==== no packet loss
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1458
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days