Description of problem:
With testing builds of OCP 4.3 that carry various changes to enable OVN and IPv6 support, we are seeing some IPv6 neighbor discovery flows rejected by the kernel, because OVS userspace is using a key that is only valid for the userspace datapath.

Version-Release number of selected component (if applicable):
openvswitch2.12-2.12.0-4.el7fdp.x86_64.rpm

Additional info:
Our OVS userspace is compiled with userspace (DPDK) datapath support. Is this hit when OVS is compiled with DPDK support but used with the kernel datapath?

From dmesg, repeated many times:

[10237.359250] openvswitch: netlink: Key type 30 is out of range max 29

It seems OVS is using OVS_KEY_ATTR_ND_EXTENSIONS, which is only supposed to be used with the userspace datapath and is not valid for the kernel datapath, so the kernel is rejecting it. This extension is IPv6-specific (ND == Neighbor Discovery), so it makes sense that we are not seeing this with IPv4 clusters.

From the ovs-daemons pod:

2019-11-18T10:42:00.708Z|00001|dpif(handler8)|WARN|system@ovs-system: failed to put[create] (Invalid argument) ufid:9d424c18-2399-45b1-aee8-3cd001789704 recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(11),skb_mark(0/0),ct_state(0/0x37),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=36:06:de:00:00:07,dst=33:33:ff:00:00:01),eth_type(0x86dd),ipv6(src=fd01::3406:deff:fe00:7,dst=ff02::1:ff00:1,label=0/0,proto=58,tclass=0/0,hlimit=255,frag=no),icmpv6(type=135,code=0),nd(target=fd01::1,sll=36:06:de:00:00:07,tll=00:00:00:00:00:00), actions:ct_clear,userspace(pid=3775151066,controller(reason=1,dont_send=0,continuation=0,recirc_id=53,rule_cookie=0x8bf20a99,controller_id=0,max_len=65535)),12,7,15,4,10,14,13,6,9,8
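To illustrate the failure mode described above, here is a minimal sketch in Python of a range check on flow key types. It is a simplified model, not the kernel's code: the function name and constants are hypothetical, the limit 29 is taken from the dmesg line in this report, and 30 is the rejected key type that the report attributes to OVS_KEY_ATTR_ND_EXTENSIONS.

```python
import errno

# Assumption: illustrative constants only. 29 is the "max" printed in the
# dmesg line of this report; 30 is the key type that was rejected.
KERNEL_KEY_ATTR_MAX = 29
REJECTED_KEY_TYPE = 30


def check_flow_key_types(key_types, max_type=KERNEL_KEY_ATTR_MAX):
    """Return 0 if every key type is in range, otherwise -EINVAL.

    Simplified model of a datapath that refuses the whole flow as soon as
    it meets a key type above the highest one it knows about.
    """
    for key_type in key_types:
        if key_type > max_type:
            # Same wording as the message seen in dmesg.
            print(f"Key type {key_type} is out of range max {max_type}")
            return -errno.EINVAL
    return 0
```

In this model a flow that carries key type 30 fails with EINVAL, which would be consistent with the "failed to put[create] (Invalid argument)" warning from the ovs-daemons pod, while the same flow without that key is accepted.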
Patch posted upstream:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/364914.html

Test build with the patch applied:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24845421
OK, I've tested an IPv6-configured cluster with the OCP 4.4.0-0.ci-2019-11-20-172519 release image, our IPv6 changes, and openvswitch2.12-2.12.0-4.el7fdp.bz1773598.2.x86_64.

The behavior seems to be the same as with the previous revert fix. I no longer see the messages quoted above in dmesg or in the ovs-daemons logs. Looks good!
Fixed in Version: openvswitch2.12-2.12.0-8.el7fdp
Brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24906600
with reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=1775778#c2, reproduced on openvswitch2.12.0-4:

+ ip netns exec fake_vm2 ping6 -I fake_vm2 fd01::28e7:a3ff:fe00:5 -c 5
PING fd01::28e7:a3ff:fe00:5(fd01::28e7:a3ff:fe00:5) from fd01::28e7:a3ff:fe00:8 fake_vm2: 56 data bytes
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=1 ttl=64 time=3.28 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=2 ttl=64 time=0.855 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=3 ttl=64 time=0.126 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=4 ttl=64 time=0.069 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=5 ttl=64 time=0.066 ms

--- fd01::28e7:a3ff:fe00:5 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 0.066/0.880/3.286/1.239 ms
+ dmesg
[ 1233.259918] device ovs-system entered promiscuous mode
[ 1233.353782] device br-int entered promiscuous mode
[ 1233.957868] device fake_vm1 entered promiscuous mode
[ 1234.121333] device fake_vm2 entered promiscuous mode
[ 1234.528842] Netfilter messages via NETLINK v0.30.
[ 1234.553894] ctnetlink v0.93: registering with nfnetlink.
[ 1234.596639] i40e 0000:af:00.0 p4p1: port 6081 already offloaded
[ 1234.602544] i40e 0000:af:00.1 p4p2: port 6081 already offloaded
[ 1234.609224] device genev_sys_6081 entered promiscuous mode
[ 1235.069676] openvswitch: netlink: Key type 30 is out of range max 28 <==== error dmesg

[root@dell-per740-12 bz1773598]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.12-host-2.12.0-7.el7fdp.x86_64
openvswitch2.12-2.12.0-4.el7fdp.x86_64
ovn2.12-central-2.12.0-7.el7fdp.x86_64
ovn2.12-2.12.0-7.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch

Verified on openvswitch2.12.0-8:

+ ip netns exec fake_vm2 ping6 -I fake_vm2 fd01::28e7:a3ff:fe00:5 -c 5
PING fd01::28e7:a3ff:fe00:5(fd01::28e7:a3ff:fe00:5) from fd01::28e7:a3ff:fe00:8 fake_vm2: 56 data bytes
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=1 ttl=64 time=3.14 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=2 ttl=64 time=1.01 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=3 ttl=64 time=0.110 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=4 ttl=64 time=0.104 ms
64 bytes from fd01::28e7:a3ff:fe00:5: icmp_seq=5 ttl=64 time=0.102 ms

--- fd01::28e7:a3ff:fe00:5 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 0.102/0.895/3.145/1.178 ms
+ dmesg
[ 1400.958115] device br-int left promiscuous mode
[ 1401.078285] device br-int entered promiscuous mode
[ 1401.660471] device fake_vm1 entered promiscuous mode
[ 1401.818562] device fake_vm2 entered promiscuous mode
[ 1402.226253] i40e 0000:af:00.0 p4p1: port 6081 already offloaded
[ 1402.232168] i40e 0000:af:00.1 p4p2: port 6081 already offloaded
[ 1402.238627] device genev_sys_6081 entered promiscuous mode <=== no error dmesg

[root@dell-per740-12 bz1773598]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.12-host-2.12.0-7.el7fdp.x86_64
ovn2.12-central-2.12.0-7.el7fdp.x86_64
ovn2.12-2.12.0-7.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.12-2.12.0-8.el7fdp.x86_64
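The dmesg comparison above was done by eye. A small helper along the following lines could automate it; this is only a sketch, the function name is hypothetical, and the pattern is built from the message text quoted in this report.

```python
import re

# Pattern for the kernel message quoted in this bug report.
OUT_OF_RANGE = re.compile(
    r"openvswitch: netlink: Key type (\d+) is out of range max (\d+)"
)


def find_out_of_range_keys(dmesg_text):
    """Return a list of (key_type, max_type) pairs found in dmesg output."""
    return [
        (int(key_type), int(max_type))
        for key_type, max_type in OUT_OF_RANGE.findall(dmesg_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    hits = find_out_of_range_keys(sys.stdin.read())
    for key_type, max_type in hits:
        print(f"rejected key type {key_type} (kernel max {max_type})")
    # Exit non-zero when the error is present, so a test run can fail on it.
    sys.exit(1 if hits else 0)
```

Fed the openvswitch2.12.0-4 dmesg output it would report key type 30 against a maximum of 28, and for the openvswitch2.12.0-8 output it would find nothing.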
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:4206