Bug 1788193

Summary: [OVN] some icmp packets are passing through controller when pinging VM FIP in DVR environment
Product: Red Hat Enterprise Linux Fast Datapath Reporter: Roman Safronov <rsafrono>
Component: ovn2.11    Assignee: lorenzo bianconi <lorenzo.bianconi>
Status: CLOSED ERRATA QA Contact: Jianlin Shi <jishi>
Severity: high Docs Contact:
Priority: unspecified    
Version: RHEL 8.0    CC: apevec, ctrautma, dalvarez, ffernand, jlibosva, lhh, lorenzo.bianconi, majopela, mmichels, scohen
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovn2.11-2.11.1-35.el7fdn Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1809629 1845839 Environment:
Last Closed: 2020-04-14 14:29:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1797685, 1809629, 1845839    
Attachments:
- tempest log when the test reproducing the issue is running
- downstream test_ovn_dvr.py code for reproducing the issue

Description Roman Safronov 2020-01-06 16:41:56 UTC
Description of problem:
When pinging a VM floating IP address from the external network in an OVN DVR environment, some frames with the FIP MAC address pass through the controller node. This is not expected behavior: with DVR, all VM-related packets should pass through the external interface of the corresponding compute node.

Version-Release number of selected component (if applicable):
python3-networking-ovn-7.0.1-0.20191205040313.2ef5322.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Setup: a router with an internal and an external network connected.
2. Run tcpdump on the external interfaces of all overcloud nodes.
3. Create a VM connected to the internal network and attach a floating IP from the external network to it.
4. Ping the FIP address from the external network.
5. Check the tcpdump captures on all nodes and make sure that Ethernet frames with the FIP's MAC address are present only on the external interface of the corresponding compute node (a rough sketch of the commands is shown below).
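
For reference, a manual equivalent of the steps above (the attached tempest test automates this); network, image, flavor and interface names here are assumptions, not taken from the actual environment:

# on each overcloud node: capture ICMP/ARP on the external (provider) NIC
tcpdump -enni ens5 -w /tmp/$(hostname -s).pcap icmp or arp

# create the VM and attach a floating IP
openstack server create --flavor m1.tiny --image cirros --network internal vm1
openstack floating ip create external
openstack server add floating ip vm1 <FIP_ADDRESS>

# ping the FIP from a host on the external network
ping -c2 <FIP_ADDRESS>

# then check on which node(s) the FIP MAC shows up
tshark -r /tmp/<node>.pcap -T fields -e eth.src -e ip.src -e eth.dst -e ip.dst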


Actual results:
Ethernet frames with the FIP's MAC address are present on the controller's external interface.


Expected results:
All Ethernet frames with the FIP's MAC address pass through the external interface of the corresponding compute node.

Additional info:
Note: I used a downstream tempest test ([1], not merged yet) to reproduce the issue (see attached files). When pinging from a VM to the external network, as test [2] does, the issue does not occur.
[1]neutron_plugin.tests.scenario.test_dvr_ovn.DVRBaseTest.test_validate_floatingip_compute_ingress
[2]neutron_plugin.tests.scenario.test_dvr_ovn.DVRBaseTest.test_validate_floatingip_compute_egress

The VM (server) that the test created, and its properties:

{"server": 
{"id": "c79b5732-4e38-4c0b-b748-05d42d5dea96", 
 "name": "tempest-server-test-1740959910", 
 "status": "ACTIVE", 
 "tenant_id": "cb5c8defb17d4a32aa26558f4711f6ea", 
 "user_id": "58a28a26f75741feac5a4f3231efb138", 
 "metadata": {}, 
 "hostId": "26d8652787495dc0707900285583c42daf9702ce2998524a4e530262", 
 "image": {"id": "db75b208-3f1c-40cf-9942-de5e36e78d5e", 
 "links": [{"rel": "bookmark", 
            "href": "http://10.0.0.116:8774/images/db75b208-3f1c-40cf-9942-de5e36e78d5e"}]}, 
 "flavor": {"id": "f5bcd5ed-e741-4cce-aa54-4abb14dd7035", 
            "links": [{"rel": "bookmark", 
                       "href": "http://10.0.0.116:8774/flavors/f5bcd5ed-e741-4cce-aa54-4abb14dd7035"}]}, 
 "created": "2020-01-06T15:29:34Z", 
 "updated": "2020-01-06T15:29:44Z", 
 "addresses": {"tempest-test-network--1196590858": 
   [{"version": 4, 
     "addr": "10.100.0.12", 
     "OS-EXT-IPS:type": "fixed", 
     "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:04:25:f0"}]}, 
 "accessIPv4": "", 
 "accessIPv6": "", 
 "links": [{"rel": "self", "href": 
 "http://10.0.0.116:8774/v2.1/servers/c79b5732-4e38-4c0b-b748-05d42d5dea96"}, 
 {"rel": "bookmark", 
  "href": "http://10.0.0.116:8774/servers/c79b5732-4e38-4c0b-b748-05d42d5dea96"}], 
  "OS-DCF:diskConfig": "MANUAL", 
  "progress": 0, 
  "OS-EXT-AZ:availability_zone": "nova", 
  "config_drive": "", 
  "key_name": "tempest-keypair-test-1648471112", 
  "OS-SRV-USG:launched_at": "2020-01-06T15:29:44.000000", 
  "OS-SRV-USG:terminated_at": null, 
  "security_groups": [{"name": "tempest-secgroup-1426634481"}], 
  "OS-EXT-SRV-ATTR:host": "compute-1.redhat.local", 
  "OS-EXT-SRV-ATTR:instance_name": "instance-000000ef", 
  "OS-EXT-SRV-ATTR:hypervisor_hostname": "compute-1.redhat.local", 
  "OS-EXT-STS:task_state": null, 
  "OS-EXT-STS:vm_state": "active", 
  "OS-EXT-STS:power_state": 1, 
  "os-extended-volumes:volumes_attached": []}}



2020-01-06 15:29:48.510 1022602 DEBUG neutron_plugin.tests.scenario.test_dvr_ovn [-] FIP port MAC: fa:16:3e:13:94:25 test_validate_floatingip_compute_ingress /home/stack/roman/tempest_neutron_plugin/neutron_plugin/tests/scenario/test_dvr_ovn.py:222

...

2020-01-06 15:29:52.177 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:29:55.209 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:29:58.242 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:01.272 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:04.300 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:07.329 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:10.359 1022602 WARNING neutron_tempest_plugin.scenario.base [-] Failed to ping IP: 10.0.0.226 via a ssh connection from: localhost.: neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'ping -c2 -w2 -s56 10.0.0.226' failed, exit status: 1, stderr:
2020-01-06 15:30:12.391 1022602 DEBUG neutron_tempest_plugin.scenario.base [-] ping result: PING 10.0.0.226 (10.0.0.226) 56(84) bytes of data.
64 bytes from 10.0.0.226: icmp_seq=1 ttl=63 time=1.18 ms
64 bytes from 10.0.0.226: icmp_seq=2 ttl=63 time=1.71 ms

--- 10.0.0.226 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 1.179/1.445/1.712/0.269 ms
 ping_remote /home/stack/roman/neutron-tempest-plugin/neutron_tempest_plugin/scenario/base.py:289
2020-01-06 15:30:12.392 1022602 DEBUG tempest.lib.common.utils.test_utils [-] Call ping_remote returns true in 22.438456 seconds call_until_true /home/stack/roman/tempest/tempest/lib/common/utils/test_utils.py:109

2020-01-06 15:30:12.393 1022602 DEBUG neutron_plugin.tests.scenario.test_dvr_ovn [-] Expected routing node: ['compute-1'] check_north_south_icmp_flow /home/stack/roman/tempest_neutron_plugin/neutron_plugin/tests/scenario/test_dvr_ovn.py:84
2020-01-06 15:30:12.779 1022602 DEBUG neutron_plugin.tests.scenario.test_dvr_ovn [-] Actual routing nodes: controller-0,compute-1 check_north_south_icmp_flow /home/stack/roman/tempest_neutron_plugin/neutron_plugin/tests/scenario/test_dvr_ovn.py:89



[rsafrono@rsafrono tmp]$ tshark -r /tmp/controller0.pcap
    1   0.000000  10.100.0.12 → 10.0.0.11    ICMP 98 Echo (ping) reply    id=0x9ffa, seq=2/512, ttl=63
    2   1.987270    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0xa02f, seq=1/256, ttl=64
    3   2.998717    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0xa02f, seq=2/512, ttl=64
    4   5.016402    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0xa0b6, seq=1/256, ttl=64

[rsafrono@rsafrono tmp]$ tshark -r /tmp/compute1.pcap
    1   0.000000    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9f4d, seq=1/256, ttl=64
    2   1.059799    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9f4d, seq=2/512, ttl=64
    3   3.029127    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9f65, seq=1/256, ttl=64
    4   4.067707    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9f65, seq=2/512, ttl=64
    5   6.063873    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9fc8, seq=1/256, ttl=64
    6   7.075793    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9fc8, seq=2/512, ttl=64
    7   9.094020    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9fe2, seq=1/256, ttl=64
    8  10.147722    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9fe2, seq=2/512, ttl=64
    9  12.121355    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9ffa, seq=1/256, ttl=64
   10  13.155744    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0x9ffa, seq=2/512, ttl=64
   11  19.235742    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0xa0b6, seq=2/512, ttl=64
   12  19.237680   10.0.0.226 → 10.0.0.11    ICMP 98 Echo (ping) reply    id=0xa0b6, seq=2/512, ttl=63 (request in 11)
   13  21.210023    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0xa0fe, seq=1/256, ttl=64
   14  21.210765   10.0.0.226 → 10.0.0.11    ICMP 98 Echo (ping) reply    id=0xa0fe, seq=1/256, ttl=63 (request in 13)
   15  22.211345    10.0.0.11 → 10.0.0.226   ICMP 98 Echo (ping) request  id=0xa0fe, seq=2/512, ttl=64
   16  22.212753   10.0.0.226 → 10.0.0.11    ICMP 98 Echo (ping) reply    id=0xa0fe, seq=2/512, ttl=63 (request in 15)


[rsafrono@rsafrono tmp]$ tshark -r /tmp/controller0.pcap -T fields -e eth.src -e ip.src -e eth.dst -e ip.dst
fa:16:3e:13:94:25	10.100.0.12	52:54:00:a0:b6:e6	10.0.0.11
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226


[rsafrono@rsafrono tmp]$ tshark -r /tmp/compute1.pcap -T fields -e eth.src -e ip.src -e eth.dst -e ip.dst
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
fa:16:3e:13:94:25	10.0.0.226	52:54:00:a0:b6:e6	10.0.0.11
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
fa:16:3e:13:94:25	10.0.0.226	52:54:00:a0:b6:e6	10.0.0.11
52:54:00:a0:b6:e6	10.0.0.11	fa:16:3e:13:94:25	10.0.0.226
fa:16:3e:13:94:25	10.0.0.226	52:54:00:a0:b6:e6	10.0.0.11

Comment 1 Roman Safronov 2020-01-06 16:43:29 UTC
Created attachment 1650181 [details]
tempest log when the test reproducing the issue is running

Comment 2 Roman Safronov 2020-01-06 16:47:20 UTC
Created attachment 1650182 [details]
downstream test_ovn_dvr.py code for reproducing the issue

Comment 3 Jakub Libosvar 2020-01-07 14:22:31 UTC
Assigning to myself to triage

Comment 5 Jakub Libosvar 2020-01-10 15:05:50 UTC
I think we're hitting the same issue in the migration job, specifically in the port-migration validation part. I think I was able to pinpoint the problem. The summary is that, because of flooded unknown unicast, the controller replies to the ICMP request before the compute node does, and since this is a virtualized environment, the host Linux bridge serving as an L2 switch learns the wrong port for the given FIP. The situation is fixed once the undercloud sends an ARP request and only the compute node responds. Details are as follows:


The test is pinging the floating IP 10.0.0.238 (MAC fa:16:3e:ed:9d:7b) of a VM hosted on compute-1 from the undercloud-0 IP 10.0.0.85

The undercloud knows the MAC of the FIP from previous pre-migration tests.
[root@undercloud-0 ~]# ip neigh | grep 238
10.0.0.238 dev eth2 lladdr fa:16:3e:ed:9d:7b STALE

However, the entry for that MAC on the hypervisor's Linux bridge has already expired, so the Linux bridge does not know to which port to send frames whose destination MAC is the FIP's.
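
(The bridge's learned ports can be checked directly on the hypervisor; a quick sketch, assuming the external bridge is named 'external' as later in comment 6:

brctl showmacs external | grep fa:16:3e:ed:9d:7b
# or with iproute2:
bridge fdb show br external | grep fa:16:3e:ed:9d:7b
)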

Note that the clocks are not well synchronized across the nodes, so the timestamps below do not line up exactly.

1) Because the undercloud knows the MAC, it is not going to send an ARP request and sends the ICMP request right away:
undercloud:
17:56:24.520460 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43722, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 1, length 64

2) The Linux bridge on the hypervisor doesn't know the port, so it treats the frame as unknown unicast and floods it to all ports, including the port to controller-0
controller-0:
17:56:21.961568 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43722, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 1, length 64

compute-1:
17:56:22.374363 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43722, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 1, length 64

3) The controller handles it in a centralized fashion, NATs the packet and sends the ICMP request via Geneve to compute-1 (I haven't captured that), and the VM replies

4) The controller replies to the ping - note that the source IP is the fixed IP 172.168.199.48
controller-0:
17:56:21.976714 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 46602, offset 0, flags [none], proto ICMP (1), length 84)
    172.168.199.48 > 10.0.0.85: ICMP echo reply, id 11848, seq 1, length 64
5) The Linux bridge on the hypervisor wrongly learns that the FIP MAC fa:16:3e:ed:9d:7b is on controller-0 instead of the compute node, because the reply came from there

6) The undercloud sees a source address different from the FIP's, so it does not treat the packet as a valid ping response

7) All subsequent ICMP requests destined to the FIP MAC are sent to the controller by the hypervisor
17:56:22.995899 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 43835, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 2, length 64
17:56:24.019398 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 44174, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 3, length 64
17:56:25.044132 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 45046, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 4, length 64
17:56:26.069138 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 45468, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 5, length 64
17:56:27.091393 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 46468, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 6, length 64
17:56:27.348197 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:28.115775 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 46594, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 7, length 64
17:56:28.371237 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:29.139381 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 47476, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 8, length 64
17:56:29.395205 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:30.163767 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 47935, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 11848, seq 9, length 64

8) The undercloud sends a broadcast ARP request for 10.0.0.238, so the hypervisor sends it to all nodes
17:56:31.187610 52:54:00:f0:21:7b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28

9) compute-0 sees it and responds correctly
compute-0:
17:56:31.600349 52:54:00:f0:21:7b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
17:56:31.600849 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.238 is-at fa:16:3e:ed:9d:7b, length 28

10) The hypervisor's Linux bridge now learns the correct port for MAC fa:16:3e:ed:9d:7b

11) The situation is now correct and pings from the undercloud start working


One thing I don't understand yet: why didn't compute-1 reply to the first ICMP request, and why does controller-0 stop responding to ICMP seq 2-8?

Comment 6 Daniel Alvarez Sanchez 2020-01-15 09:46:07 UTC
This is what I see after debugging further; it should also answer Kuba's question from his last comment :)


1) First, I delete the MAC_Binding table contents from the SB database. This 'simulates' the migration process, where the MAC address for the FIP was learnt from the undercloud (or tempest executor) and the OVN DB is freshly created, so the MAC_Binding table is empty:

()[root@compute-1 /]# for i in $(ovn-sbctl --db=tcp:172.17.1.51:6642 list mac_binding |grep _uuid |awk {'print $3'}); do ovn-sbctl --db=tcp:172.17.1.51:6642 destroy mac_binding $i ; done
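
(An equivalent, slightly shorter form of the cleanup, assuming the same --db endpoint:

ovn-sbctl --db=tcp:172.17.1.51:6642 --bare --columns=_uuid list mac_binding \
    | xargs -r -n1 ovn-sbctl --db=tcp:172.17.1.51:6642 destroy mac_binding
)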


2) Then ping the FIP from the undercloud which has a stale entry for the FIP MAC address:

(overcloud) [stack@undercloud-0 ~]$ sudo ip neigh | grep 238
10.0.0.238 dev eth2 lladdr fa:16:3e:ed:9d:7b STALE

(overcloud) [stack@undercloud-0 ~]$ ping -c5 10.0.0.238
PING 10.0.0.238 (10.0.0.238) 56(84) bytes of data.

--- 10.0.0.238 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 83ms


*** Traffic flow ***


4) The ICMP echo request is sent by the undercloud and reaches controller and compute nodes

As Kuba mentioned in c5, the ToR switch (in this case, the underlying Linux bridge on the hypervisor) doesn't know where the MAC address is, so it floods the ICMP echo request everywhere, reaching the controller and compute nodes. This is observed on the provider interface of both:


Compute1:

[root@compute-1 ~]# tcpdump -ennvvi ens5 icmp or arp
tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
21:51:50.016952 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21281, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 1, length 64


Controller0:

[root@controller-0 ~]# tcpdump -ennvvi ens5 arp or icmp
tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
21:51:49.603687 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21281, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 1, length 64



5) ovn-controller on compute-1 processes the ICMP echo request and delivers it to the VM

Verified by checking n_packets before and after:
[root@compute-1 ~]# ovs-ofctl dump-flows br-int | grep "output:7"
 cookie=0x0, duration=448633.349s, table=65, n_packets=384, n_bytes=34786, idle_age=5, hard_age=65534, priority=100,reg15=0x4,metadata=0x3 actions=output:7
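
(To confirm which interface OpenFlow port 7 corresponds to, one option is:

ovs-vsctl --columns=name,external_ids find Interface ofport=7
)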

6) The VM replies to the ping with an ICMP Echo Reply:

@ the VM:
22:12:44.555837 fa:16:3e:01:45:e5 > fa:16:3e:09:27:87, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 3495, offset 0, flags [none], proto ICMP (1), length 84)
    172.168.199.48 > 10.0.0.85: ICMP echo reply, id 9388, seq 1, length 64


7) ovn-controller doesn't know the MAC address of 10.0.0.85 (the undercloud) because I cleaned up the MAC_Binding table, so it sends an ARP request:

21:51:50.023766 fa:16:3e:4b:06:ef > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.85 tell 10.0.0.211, length 28

10.0.0.211 is the gateway port IP address which is bound to controller-0:

()[root@compute-1 /]# ovn-nbctl --db=tcp:172.17.1.51:6641 list logical_router_port |grep "10.0.0.211" -C6 | grep ^name
name                : "lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5"

()[root@compute-1 /]# ovn-sbctl --db=tcp:172.17.1.51:6642 show |grep lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5 -B4 | grep hostname
    hostname: "controller-0.redhat.local"


8) The undercloud replies to the ARP, and the reply arrives at controller-0:

[root@controller-0 ~]# tcpdump -ennvvi ens5 arp or icmp
21:51:49.610596 52:54:00:f0:21:7b > fa:16:3e:4b:06:ef, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.85 is-at 52:54:00:f0:21:7b, length 28


So ovn-controller on controller-0 adds a MAC_Binding entry to the SB database:

_uuid               : d69537a2-ecc9-4398-afec-6c0a997c799f
datapath            : b92566b8-8b4d-4a58-adee-84acff14a1c2
ip                  : "10.0.0.85"
logical_port        : "lrp-8e202c45-ccd1-472f-bb2e-00378ce667e5"
mac                 : "52:54:00:f0:21:7b"
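
(This entry can be queried directly, e.g.:

ovn-sbctl --db=tcp:172.17.1.51:6642 find mac_binding ip=10.0.0.85
)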


9) After the MAC is learnt, the packet is resumed on compute-1 and the ICMP echo reply from the VM is pushed via the Geneve tunnel to controller-0:
*THIS LOOKS LIKE THE REAL BUG*

Geneve interface:

21:51:49.614490 fa:16:3e:4b:06:ef > 52:54:00:f0:21:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 54902, offset 0, flags [none], proto ICMP (1), length 84)
    172.168.199.48 > 10.0.0.85: ICMP echo reply, id 46543, seq 1, length 64
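
(The tunnel-side traffic can be captured either on the genev_sys_6081 device or on the tunnel underlay NIC, filtering on UDP port 6081; a sketch, interface names assumed:

tcpdump -enni genev_sys_6081 icmp
# or
tcpdump -enni <tunnel-underlay-nic> udp port 6081
)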

10) The ICMP echo reply packet is sent out through the provider interface to the undercloud

Provider interface:

21:51:49.615709 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 54902, offset 0, flags [none], proto ICMP (1), length 84)
    172.168.199.48 > 10.0.0.85: ICMP echo reply, id 46543, seq 1, length 64


11) The Linux bridge learns the MAC of the FIP (10.0.0.238, fa:16:3e:ed:9d:7b) on the port of controller-0

12) Subsequent ICMP echo requests are observed on the provider interface of controller-0 and they will be dropped in the integration bridge (see the trace sketch after the capture below):


[root@controller-0 ~]# tcpdump -ennvvi ens5 arp or icmp
tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
21:51:50.610001 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21713, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 2, length 64
21:51:51.634608 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 21817, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 3, length 64
21:51:52.658961 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 22459, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 4, length 64
21:51:53.683261 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 23089, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.85 > 10.0.0.238: ICMP echo request, id 46543, seq 5, length 64
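
(Whether such a request is indeed dropped in controller-0's integration bridge could be checked with an OpenFlow trace along these lines; the in_port placeholder stands for the patch port coming from the provider bridge:

ovs-appctl ofproto/trace br-int in_port=<provider-patch-port>,icmp,dl_src=52:54:00:f0:21:7b,dl_dst=fa:16:3e:ed:9d:7b,nw_src=10.0.0.85,nw_dst=10.0.0.238
)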


13) The undercloud will now send ARP requests for the FIP (10.0.0.238), compute-1 will reply and the Linux Bridge will learn the FIP MAC on the right port:

[root@dell-r730-004 ~]# tcpdump -vvnee -i external arp
tcpdump: listening on external, link-type EN10MB (Ethernet), capture size 262144 bytes
04:35:30.516984 fa:16:3e:4b:06:ef > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.85 tell 10.0.0.211, length 28
04:35:30.517126 52:54:00:f0:21:7b > fa:16:3e:4b:06:ef, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.85 is-at 52:54:00:f0:21:7b, length 28

04:35:35.553573 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28


04:35:36.498712 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:37.443864 52:54:00:f0:21:7b > fa:16:3e:ed:9d:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:39.040301 52:54:00:f0:21:7b > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.238 tell 10.0.0.85, length 28
04:35:39.040874 fa:16:3e:ed:9d:7b > 52:54:00:f0:21:7b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.238 is-at fa:16:3e:ed:9d:7b, length 28



[root@dell-r730-004 ~]# while true; do  echo `date +"%T.%N"` `brctl showmacs external | grep fa:16:3e:ed:9d:7b` && sleep 1; done


04:35:35.959411345 6 fa:16:3e:ed:9d:7b no 5.43
04:35:36.963957327 6 fa:16:3e:ed:9d:7b no 6.44
04:35:37.968618892 6 fa:16:3e:ed:9d:7b no 7.44
04:35:38.973409151 6 fa:16:3e:ed:9d:7b no 8.45
04:35:39.977980409 3 fa:16:3e:ed:9d:7b no 0.01
04:35:40.983020193 3 fa:16:3e:ed:9d:7b no 0.09
04:35:41.988080917 3 fa:16:3e:ed:9d:7b no 0.17


The ARP reply from compute-1 arrives at 04:35:39.040874, which is when the MAC moves from port 6 to port 3 on the bridge (in the brctl showmacs output above the columns are port number, MAC address, is-local flag and ageing timer), and from this moment on the ping works.


The problem seems to be in how the first ICMP echo reply sent by the VM is resumed once the destination IP has been resolved: this packet is sent via the Geneve tunnel while it should be sent via the provider interface.

Setting NEEDINFO on Lorenzo, who added this feature in core OVN [0]


[0] https://patchwork.ozlabs.org/patch/985916/

Comment 7 Daniel Alvarez Sanchez 2020-01-15 09:50:43 UTC
Some notes on my previous comment:

- The stale MAC entry in the undercloud does not matter at all; even if there is an ARP request, the resumed first ICMP echo reply will still go through the tunnel first, which is what causes the observed behavior

- This does not affect just the migration from ML2/OVS to ML2/OVN; it affects essentially any traffic going from a compute node to the provider network towards a destination address that has not been learnt yet

Comment 13 Jianlin Shi 2020-03-18 10:32:44 UTC
Following the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1809629#c3 (a rough sketch of that kind of topology is included below).
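
The exact topology is described in the referenced comment; a rough sketch of the kind of setup it implies is shown here. All names, MACs and subnets below are illustrative assumptions, not copied from that bug:

# logical router with an "external" switch ls0 (172.16.1.0/24) and an internal switch ls1
ovn-nbctl lr-add lr1
ovn-nbctl ls-add ls0
ovn-nbctl ls-add ls1
ovn-nbctl lrp-add lr1 lr1-ls0 00:de:ad:01:00:01 172.16.1.1/24
ovn-nbctl lsp-add ls0 ls0-lr1 -- lsp-set-type ls0-lr1 router \
    -- lsp-set-addresses ls0-lr1 router -- lsp-set-options ls0-lr1 router-port=lr1-ls0
ovn-nbctl lrp-add lr1 lr1-ls1 00:de:ad:01:00:02 192.168.1.1/24
ovn-nbctl lsp-add ls1 ls1-lr1 -- lsp-set-type ls1-lr1 router \
    -- lsp-set-addresses ls1-lr1 router -- lsp-set-options ls1-lr1 router-port=lr1-ls1
# localnet port so the 172.16.1.0/24 side maps to the physical provider network
ovn-nbctl lsp-add ls0 ln-ls0 -- lsp-set-type ln-ls0 localnet \
    -- lsp-set-addresses ln-ls0 unknown -- lsp-set-options ln-ls0 network_name=provider
# "VM" port and its distributed FIP (dnat_and_snat bound to the port)
ovn-nbctl lsp-add ls1 vm1 -- lsp-set-addresses vm1 "00:de:ad:0f:01:02 192.168.1.10"
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.16.1.100 192.168.1.10 vm1 00:de:ad:0f:01:03
# gateway port pinned to the "central" chassis
ovn-nbctl lrp-set-gateway-chassis lr1-ls0 <central-chassis> 1

The pinger below (netns p2, 172.16.1.2) sits on the 172.16.1.0/24 provider network on the other host, and its MAC_Binding entry is what gets destroyed to reproduce the problem.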

reproduced on 2.11.1-33:

[root@dell-per740-12 bz1809629]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
openvswitch2.11-2.11.0-50.el8fdp.x86_64                                                               
ovn2.11-2.11.1-33.el8fdp.x86_64                                                                       
ovn2.11-central-2.11.1-33.el8fdp.x86_64                                                               
kernel-kernel-networking-openvswitch-ovn-acl-1.0-5.noarch
ovn2.11-host-2.11.1-33.el8fdp.x86_64                                                                  
kernel-kernel-networking-openvswitch-ovn-basic-1.0-20.noarch
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch


[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=3.87 ms                                            
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.273 ms                                           
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.242 ms                                           

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 25ms
rtt min/avg/max/mdev = 0.242/1.461/3.869/1.702 ms

[root@dell-per740-12 bz1809629]# ovn-sbctl find mac_binding ip=172.16.1.2
_uuid               : 5f101df7-7ecc-4929-9073-8777ac92c898
datapath            : 7e98b95e-aba9-4fd5-9cb5-618d83ef8456
ip                  : "172.16.1.2"                                                                    
logical_port        : "lr1-ls0"                                                                       
mac                 : "ba:d2:12:0e:14:ba"                                                             
[root@dell-per740-12 bz1809629]# ovn-sbctl destroy mac_binding 5f101df7-7ecc-4929-9073-8777ac92c898


[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3                              
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.                                                
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=1.20 ms                                            
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.285 ms

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 2 received, 33.3333% packet loss, time 31ms
rtt min/avg/max/mdev = 0.285/0.740/1.196/0.456 ms

<=== the first packet is lost

Verified on 2.11.1-37:

[root@hp-dl380pg8-12 bz1809629]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.11-2.11.1-37.el8fdp.x86_64                                                                       
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch                                          
ovn2.11-host-2.11.1-37.el8fdp.x86_64                                                                  
ovn2.11-central-2.11.1-37.el8fdp.x86_64                                                               
kernel-kernel-networking-openvswitch-ovn-acl-1.0-5.noarch                                             
openvswitch2.11-2.11.0-50.el8fdp.x86_64                                                               
kernel-kernel-networking-openvswitch-ovn-basic-1.0-20.noarch                                          
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch

[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3                              
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.                                                
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=7.50 ms                                            
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.267 ms                                           
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.244 ms                                           
                                                                                                      
--- 172.16.1.100 ping statistics ---                                                                  
3 packets transmitted, 3 received, 0% packet loss, time 65ms                                          
rtt min/avg/max/mdev = 0.244/2.670/7.501/3.416 ms  

[root@dell-per740-12 bz1809629]# ovn-sbctl find mac_binding ip=172.16.1.2
_uuid               : ba3cd4db-2e94-4b3d-b586-7ce8304d7c98
datapath            : 6f7e4668-46e7-4bb5-9a79-eecc0e2b5b4a
ip                  : "172.16.1.2"                                                                    
logical_port        : "lr1-ls0"                                                                       
mac                 : "26:5b:8b:68:82:d5"                                                             
[root@dell-per740-12 bz1809629]# ovn-sbctl destroy mac_binding ba3cd4db-2e94-4b3d-b586-7ce8304d7c98

[root@hp-dl380pg8-12 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3                              
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.                                                
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=4.37 ms                                            
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.268 ms                                           
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.261 ms                                           
                                                                                                      
--- 172.16.1.100 ping statistics ---                                                                  
3 packets transmitted, 3 received, 0% packet loss, time 63ms                                          
rtt min/avg/max/mdev = 0.261/1.633/4.370/1.935 ms 

<=== no packet loss

Comment 14 Jianlin Shi 2020-03-19 03:16:09 UTC
Verified on the RHEL 7 version:

[root@hp-dl380pg8-13 bz1809629]# rpm -qa | grep -E "openvswitch|ovn"                                  
ovn2.11-host-2.11.1-37.el7fdp.x86_64                                                                  
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch                                                 
openvswitch2.11-2.11.0-50.el7fdp.x86_64                                                               
ovn2.11-central-2.11.1-37.el7fdp.x86_64                                                               
ovn2.11-2.11.1-37.el7fdp.x86_64

[root@hp-dl380pg8-13 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=6.84 ms                                            
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.211 ms                                           
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.169 ms                                           

--- 172.16.1.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms                                        
rtt min/avg/max/mdev = 0.169/2.407/6.841/3.135 ms

[root@dell-per740-42 bz1809629]# ovn-sbctl find mac_binding ip=172.16.1.2
_uuid               : 596166b9-f7cc-4eee-968f-f3f251810750
datapath            : a9a55239-b456-45f4-a24d-d27190efd258
ip                  : "172.16.1.2"                                                                    
logical_port        : "lr1-ls0"                                                                       
mac                 : "ae:9c:a3:c4:8f:37"                                                             
[root@dell-per740-42 bz1809629]# ovn-sbctl destroy mac_binding 596166b9-f7cc-4eee-968f-f3f251810750

[root@hp-dl380pg8-13 bz1809629]# ip netns exec p2 ping 172.16.1.100 -c 3                              
PING 172.16.1.100 (172.16.1.100) 56(84) bytes of data.                                                
64 bytes from 172.16.1.100: icmp_seq=1 ttl=63 time=3.07 ms                                            
64 bytes from 172.16.1.100: icmp_seq=2 ttl=63 time=0.152 ms                                           
64 bytes from 172.16.1.100: icmp_seq=3 ttl=63 time=0.086 ms                                           
                                                                                                      
--- 172.16.1.100 ping statistics ---                                                                  
3 packets transmitted, 3 received, 0% packet loss, time 2000ms                                        
rtt min/avg/max/mdev = 0.086/1.105/3.079/1.396 ms 

<==== no packet loss

Comment 16 errata-xmlrpc 2020-04-14 14:29:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1458

Comment 17 Red Hat Bugzilla 2023-09-14 05:49:25 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days