Created attachment 1848849: OVN Northbound DB

Description of problem:
When using OpenStack Neutron with OVN, create 2 VMs plugged into the same L2 network and disable port security on their ports. In that case L2 connectivity between the VMs does not work properly: the local ARP responder from OVN does not reply to ARP requests, and those broadcast packets are not sent through the geneve tunnel to the other hosts.

When I enabled port security on those Neutron ports as a test, connectivity worked fine. Also, when I disabled port security and added ARP entries manually inside the guest VMs, connectivity (e.g. ICMP) worked fine.

Version-Release number of selected component (if applicable):
ovn-2021-21.09.0-20.el8fdp.x86_64

Additional info from Daniel Alvarez, who did more investigation on the OVN side:

An ovn-trace [0] seems to indicate that the traffic will be flooded to all ports, including the destination one, but I can't see that being handled in table 37, and it is not present in the geneve tunnel with tcpdump either. I'm confused, as the dest port (cfd62) is bound to a remote hypervisor, so it should be handled in table 37, yet ovn-trace shows that it'll be handled in table 64.

I then added port security to the dest port and the ARP is not coming out either:

[root@controller-0 /]# ovn-nbctl lsp-get-port-security cfd623b9-d99f-43a6-ad2d-55556f6d8365
fa:16:3e:f9:d4:a0 10.100.154.228

Note that the source port had port security enabled already:

[root@controller-0 /]# ovn-nbctl lsp-get-port-security 5867b9da-2c7f-4a35-8269-a2f923c937a7
fa:16:3e:5c:3a:36 10.100.154.98 2001:db8:0:9aae:f816:3eff:fe5c:3a36 10.100.154.89

That's probably because of the 'unknown' in the addresses of the dest port.

addresses : ["fa:16:3e:f9:d4:a0 10.100.154.228 2001:db8:0:9aae:f816:3eff:fef9:d4a0", unknown]

So the question comes down to understanding why, even though the ARP request packet seems to be flooded, it's not seen in the geneve tunnel to the hypervisor where the dest port is bound.

Have a great day!
daniel

[0]
# ovn-trace --detailed --ovs --no-friendly-names neutron-1ff423b1-3ea5-4b95-97ad-19fdf82f958e 'inport == "5867b9da-2c7f-4a35-8269-a2f923c937a7" && eth.src == fa:16:3e:5c:3a:36 && eth.dst == ff:ff:ff:ff:ff:ff && arp.op == 1 && arp.sha == fa:16:3e:5c:3a:36 && arp.spa == 10.100.154.98 && arp.tpa == 10.100.154.228'
# arp,reg14=0x7,vlan_tci=0x0000,dl_src=fa:16:3e:5c:3a:36,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.100.154.98,arp_tpa=10.100.154.228,arp_op=1,arp_sha=fa:16:3e:5c:3a:36,arp_tha=00:00:00:00:00:00

ingress(dp="08362dcb-2ecf-41ba-a8e9-233fe1636f74", inport="5867b9da-2c7f-4a35-8269-a2f923c937a7")
...
multicast(dp="08362dcb-2ecf-41ba-a8e9-233fe1636f74", mcgroup="_MC_flood")
...
egress(dp="08362dcb-2ecf-41ba-a8e9-233fe1636f74", inport="5867b9da-2c7f-4a35-8269-a2f923c937a7", outport="cfd623b9-d99f-43a6-ad2d-55556f6d8365")
------------------------------------------------------------------------------------------------------------------------------------------------
 3. ls_out_acl_hint (northd.c:5764): !ct.trk, priority 5, uuid caf21867
    cookie=0xcaf21867, duration=1041134.684s, table=43, n_packets=5218, n_bytes=615724, idle_age=408, priority=5,ct_state=-trk,metadata=0x3 actions=set_field:0x100000000000000000000000000/0x100000000000000000000000000->xxreg0,set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0,resubmit(,44)
    cookie=0xcaf21867, duration=1041134.684s, table=43, n_packets=55398, n_bytes=2326764, idle_age=0, priority=5,ct_state=-trk,metadata=0x1 actions=set_field:0x100000000000000000000000000/0x100000000000000000000000000->xxreg0,set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0,resubmit(,44)
    reg0[8] = 1;
    reg0[9] = 1;
    next;
 9. ls_out_port_sec_l2 (northd.c:5355): eth.mcast, priority 100, uuid 77373d73
    cookie=0x77373d73, duration=1134558.639s, table=49, n_packets=1209, n_bytes=50922, idle_age=65535, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
    cookie=0x77373d73, duration=975638.110s, table=49, n_packets=0, n_bytes=0, idle_age=65535, priority=100,metadata=0x8,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
    cookie=0x77373d73, duration=1041223.419s, table=49, n_packets=5174, n_bytes=610532, idle_age=191, priority=100,metadata=0x5,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
    cookie=0x77373d73, duration=1041244.111s, table=49, n_packets=5218, n_bytes=615724, idle_age=408, priority=100,metadata=0x3,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
    cookie=0x77373d73, duration=1041249.860s, table=49, n_packets=43237, n_bytes=5270922, idle_age=206, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
    output;
    /* output to "cfd623b9-d99f-43a6-ad2d-55556f6d8365", type "" */
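For anyone triaging a similar report: since the 'unknown' entry in the dest port's addresses is suspected above, a quick check for it can be scripted. This is only a sketch. The helper name `has_unknown_address` is made up for illustration, and it assumes that `ovn-nbctl lsp-get-addresses PORT` prints one address entry per line, so that 'unknown' appears on a line of its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN): reads address entries on stdin,
# one per line, and succeeds if one of them is exactly "unknown".
# Assumed live usage (needs a reachable NB database):
#   ovn-nbctl lsp-get-addresses cfd623b9-d99f-43a6-ad2d-55556f6d8365 | has_unknown_address
has_unknown_address() {
    grep -qx 'unknown'
}

# Sample input mirroring the dest port's addresses column quoted above.
sample='fa:16:3e:f9:d4:a0 10.100.154.228 2001:db8:0:9aae:f816:3eff:fef9:d4a0
unknown'

if printf '%s\n' "$sample" | has_unknown_address; then
    echo "port has 'unknown' in addresses"
else
    echo "port has only explicit addresses"
fi
```

The whole-line match (`-x`) is deliberate, so an address entry that merely contains the word would not count.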
This happens only for parents of container ports, that is, only for Port-A where "Port-B.parent_name == Port-A".
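To see which ports in a deployment are affected by that condition, the parent ports can be listed from the Northbound DB. The following is a rough sketch, not something taken from the fix. The function name `parent_ports` is hypothetical, and it assumes the input is two-column CSV (`name,parent_name`) with unquoted values, as the `ovn-nbctl` table formatting options are expected to produce for simple port names.

```shell
#!/bin/sh
# Hypothetical helper: reads "name,parent_name" CSV rows on stdin and prints
# each distinct non-empty parent_name, i.e. every Port-A for which some
# Port-B has parent_name == Port-A.
# Assumed live usage (needs a reachable NB database):
#   ovn-nbctl --format=csv --data=bare --no-headings \
#       --columns=name,parent_name list Logical_Switch_Port | parent_ports
parent_ports() {
    awk -F, '$2 != "" { print $2 }' | sort -u
}

# Sample rows using the port names from the reproducer in this report.
printf '%s\n' \
    'ls1p1,' \
    'ls1p2,' \
    'ls1p1_c100,ls1p1' \
    'ls1p2_c100,ls1p2' | parent_ports
```

Port names containing commas or quotes would be quoted in CSV output and would need a real CSV parser instead of `awk -F,`.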
Fix posted for review: http://patchwork.ozlabs.org/project/ovn/list/?series=279449&state=*
Moving to MODIFIED since this has been accepted upstream and is in a build downstream now.
Reproduced on:

[root@bz-2036970 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn-2021-21.09.0-13.el8fdp.x86_64
ovn-2021-central-21.09.0-13.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
ovn-2021-host-21.09.0-13.el8fdp.x86_64
openvswitch2.15-2.15.0-26.el8fdp.x86_64

### Created two hypervisors. Setup on first hypervisor:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:42.42.42.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=42.42.42.1
systemctl restart ovn-controller

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:01:02"
ovn-nbctl lsp-add ls1 ls1p1_c100 ls1p1 100 -- lsp-set-addresses ls1p1_c100 "00:00:00:01:11:01"
ovn-nbctl lsp-add ls1 ls1p2_c100 ls1p2 100 -- lsp-set-addresses ls1p2_c100 "00:00:00:01:11:02"

ovs-vsctl add-port br-int ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
ip netns add ls1p1
ip link set ls1p1 netns ls1p1
ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:01:01:01
ip netns exec ls1p1 ip link add link ls1p1 name ls1p1.100 type vlan id 100
ip netns exec ls1p1 ip link set ls1p1.100 address 00:00:00:01:11:01
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip link set ls1p1.100 up
ip netns exec ls1p1 ip addr add 192.168.100.1/24 dev ls1p1.100

### Setup on second hypervisor:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:42.42.42.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=42.42.42.2
systemctl restart ovn-controller

ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2
ip netns add ls1p2
ip link set ls1p2 netns ls1p2
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02
ip netns exec ls1p2 ip link add link ls1p2 name ls1p2.100 type vlan id 100
ip netns exec ls1p2 ip link set ls1p2.100 address 00:00:00:01:11:02
ip netns exec ls1p2 ip link set ls1p2 up
ip netns exec ls1p2 ip link set ls1p2.100 up
ip netns exec ls1p2 ip addr add 192.168.100.2/24 dev ls1p2.100

ip netns exec ls1p2 tcpdump -U -i any -w arpdump.pcap &

[root@bz-2036970 ~]# ip netns exec ls1p2 ping 192.168.100.1 -c 3
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.

--- 192.168.100.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2065ms

[root@bz-2036970 ~]# pkill -9 tcpdump
[root@bz-2036970 ~]# tcpdump -r arpdump.pcap -nnle
reading from file arpdump.pcap, link-type LINUX_SLL (Linux cooked v1)
dropped privs to tcpdump
07:12:51.097727 Out 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:51.097728 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:51.097892 B 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:52.138909 Out 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:52.138912 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:52.138917 B 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:53.162910 Out 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:53.162913 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:53.162918 B 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:13:07.626910 Out 00:00:00:01:11:02 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:1102 > ff02::2: ICMP6, router solicitation, length 16
07:13:07.626912 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 76: vlan 100, p 0, ethertype IPv6, fe80::200:ff:fe01:1102 > ff02::2: ICMP6, router solicitation, length 16
07:13:07.626962 Out 00:00:00:01:01:02 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:102 > ff02::2: ICMP6, router solicitation, length 16
07:13:07.627119 M 00:00:00:01:11:02 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:1102 > ff02::2: ICMP6, router solicitation, length 16
07:13:07.627156 M 00:00:00:01:01:02 ethertype 802.1Q (0x8100), length 76: vlan 100, p 0, ethertype IPv6, fe80::200:ff:fe01:102 > ff02::2: ICMP6, router solicitation, length 16
07:13:07.627157 M 00:00:00:01:01:02 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:102 > ff02::2: ICMP6, router solicitation, length 16
[1]+ Killed ip netns exec ls1p2 tcpdump -U -i any -w arpdump.pcap

<================= ARP failed. No ARP sent via geneve. Ping failed

Verified on:

[root@bz-2036970 ~]# rpm -qa | grep -E "ovn|openvswitch"
openvswitch2.15-2.15.0-53.el8fdp.x86_64
ovn-2021-central-21.12.0-11.el8fdp.x86_64
ovn-2021-host-21.12.0-11.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
ovn-2021-21.12.0-11.el8fdp.x86_64

[root@bz-2036970 ~]# ip netns exec ls1p2 ping 192.168.100.1 -c 3
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=1.45 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=0.221 ms
64 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=0.204 ms

--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2031ms
rtt min/avg/max/mdev = 0.204/0.626/1.454/0.585 ms

[root@bz-2036970 ~]# tcpdump -r arpdump.pcap -nnle
reading from file arpdump.pcap, link-type LINUX_SLL (Linux cooked v1)
dropped privs to tcpdump
05:22:04.176445 M 00:00:00:01:01:01 ethertype 802.1Q (0x8100), length 76: vlan 100, p 0, ethertype IPv6, fe80::200:ff:fe01:101 > ff02::2: ICMP6, router solicitation, length 16
05:22:04.176446 M 00:00:00:01:01:01 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:101 > ff02::2: ICMP6, router solicitation, length 16
05:22:04.176453 M 00:00:00:01:01:01 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:101 > ff02::2: ICMP6, router solicitation, length 16
05:22:13.904510 M 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 76: vlan 100, p 0, ethertype IPv6, fe80::200:ff:fe01:1101 > ff02::2: ICMP6, router solicitation, length 16
05:22:13.904511 M 00:00:00:01:11:01 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:1101 > ff02::2: ICMP6, router solicitation, length 16
05:22:13.904517 M 00:00:00:01:11:01 ethertype IPv6 (0x86dd), length 72: fe80::200:ff:fe01:1101 > ff02::2: ICMP6, router solicitation, length 16
05:22:14.524277 Out 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
05:22:14.524279 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Request who-has 192.168.100.1 tell 192.168.100.2, length 28
05:22:14.524436 B 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
05:22:14.525008 P 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Reply 192.168.100.1 is-at 00:00:00:01:11:01, length 28
05:22:14.525008 In 00:00:00:01:11:01 ethertype ARP (0x0806), length 44: Reply 192.168.100.1 is-at 00:00:00:01:11:01, length 28
05:22:14.525016 Out 00:00:00:01:11:02 ethertype IPv4 (0x0800), length 100: 192.168.100.2 > 192.168.100.1: ICMP echo request, id 32726, seq 1, length 64
05:22:14.525017 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 104: vlan 100, p 0, ethertype IPv4, 192.168.100.2 > 192.168.100.1: ICMP echo request, id 32726, seq 1, length 64
05:22:14.525716 P 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 104: vlan 100, p 0, ethertype IPv4, 192.168.100.1 > 192.168.100.2: ICMP echo reply, id 32726, seq 1, length 64
05:22:14.525716 In 00:00:00:01:11:01 ethertype IPv4 (0x0800), length 100: 192.168.100.1 > 192.168.100.2: ICMP echo reply, id 32726, seq 1, length 64
05:22:15.525882 Out 00:00:00:01:11:02 ethertype IPv4 (0x0800), length 100: 192.168.100.2 > 192.168.100.1: ICMP echo request, id 32726, seq 2, length 64
05:22:15.525885 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 104: vlan 100, p 0, ethertype IPv4, 192.168.100.2 > 192.168.100.1: ICMP echo request, id 32726, seq 2, length 64
05:22:15.526084 P 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 104: vlan 100, p 0, ethertype IPv4, 192.168.100.1 > 192.168.100.2: ICMP echo reply, id 32726, seq 2, length 64
05:22:15.526084 In 00:00:00:01:11:01 ethertype IPv4 (0x0800), length 100: 192.168.100.1 > 192.168.100.2: ICMP echo reply, id 32726, seq 2, length 64
05:22:16.554936 Out 00:00:00:01:11:02 ethertype IPv4 (0x0800), length 100: 192.168.100.2 > 192.168.100.1: ICMP echo request, id 32726, seq 3, length 64
05:22:16.554939 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 104: vlan 100, p 0, ethertype IPv4, 192.168.100.2 > 192.168.100.1: ICMP echo request, id 32726, seq 3, length 64
05:22:16.555123 P 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 104: vlan 100, p 0, ethertype IPv4, 192.168.100.1 > 192.168.100.2: ICMP echo reply, id 32726, seq 3, length 64
05:22:16.555123 In 00:00:00:01:11:01 ethertype IPv4 (0x0800), length 100: 192.168.100.1 > 192.168.100.2: ICMP echo reply, id 32726, seq 3, length 64
05:22:19.536064 P 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Request who-has 192.168.100.2 tell 192.168.100.1, length 28
05:22:19.536065 In 00:00:00:01:11:01 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.2 tell 192.168.100.1, length 28
05:22:19.536071 Out 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Reply 192.168.100.2 is-at 00:00:00:01:11:02, length 28
05:22:19.536073 Out 00:00:00:01:11:02 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Reply 192.168.100.2 is-at 00:00:00:01:11:02, length 28
[1]+ Killed ip netns exec ls1p2 tcpdump -U -i any -w arpdump.pcap

<================= ARP passed through geneve. Ping successful
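The pass/fail judgment above comes down to whether any ARP reply for 192.168.100.1 shows up in the decoded capture, so it could be automated roughly as below. This is a sketch only: `arp_reply_count` is a made-up helper name, and it assumes the text format printed by `tcpdump -r arpdump.pcap -nnle` as shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper: reads decoded tcpdump lines on stdin and prints how
# many of them are ARP replies announcing the given IP ("Reply <ip> is-at").
# Assumed live usage:
#   tcpdump -r arpdump.pcap -nnle 2>/dev/null | arp_reply_count 192.168.100.1
arp_reply_count() {
    awk -v ip="$1" '
        index($0, "ethertype ARP") && index($0, "Reply " ip " is-at") { n++ }
        END { print n + 0 }
    '
}

# Two lines taken from the failing capture: requests only, no reply.
failing='07:12:51.097727 Out 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28
07:12:51.097892 B 00:00:00:01:11:02 ethertype ARP (0x0806), length 44: Request who-has 192.168.100.1 tell 192.168.100.2, length 28'

# Two lines taken from the verified capture: the reply arrives tagged and untagged.
passing='05:22:14.525008 P 00:00:00:01:11:01 ethertype 802.1Q (0x8100), length 48: vlan 100, p 0, ethertype ARP, Reply 192.168.100.1 is-at 00:00:00:01:11:01, length 28
05:22:14.525008 In 00:00:00:01:11:01 ethertype ARP (0x0806), length 44: Reply 192.168.100.1 is-at 00:00:00:01:11:01, length 28'

echo "failing capture: $(printf '%s\n' "$failing" | arp_reply_count 192.168.100.1) replies"
echo "passing capture: $(printf '%s\n' "$passing" | arp_reply_count 192.168.100.1) replies"
```

Because the capture uses `-i any`, the same reply is counted once per interface it crosses, so the check is best read as "zero versus non-zero" rather than as an exact packet count.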
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0674
*** Bug 2066413 has been marked as a duplicate of this bug. ***