Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 2078026

Summary: ovn-controller is not handling the IPv6 Neigh Adv message properly for mac learning
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Numan Siddique <nusiddiq>
Component: ovn-2021
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: Ehsan Elahi <eelahi>
Severity: urgent
Docs Contact:
Priority: urgent
Version: FDP 21.K
CC: ctrautma, ealcaniz, jiji, mmichels, xzhou, yinxu
Target Milestone: ---
Keywords: CustomerScenariosInitiative
Target Release: FDP 22.D
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn-2021-21.12.0-46
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-27 18:14:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Numan Siddique 2022-04-22 21:48:07 UTC
Description of problem:

OVN learns MAC bindings from IPv6 Neighbor Solicitation and Neighbor Advertisement messages in the router pipeline.

The following logical flows are added in the router pipeline for MAC binding learning:

  table=2 (lr_in_learn_neighbor), priority=90   , match=(nd_na), action=(put_nd(inport, nd.target, nd.tll); next;)

  table=2 (lr_in_learn_neighbor), priority=90   , match=(nd_ns), action=(put_nd(inport, ip6.src, nd.sll); next;)


The logical flow for the nd_na match is translated to OpenFlow as:

priority=90,icmp6,metadata=<x>,nw_ttl=255,icmp_type=136,icmp_code=0 actions=push:NXM_NX_XXREG0[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ND_TLL[],push:NXM_NX_ND_TARGET[],pop:NXM_NX_XXREG0[],pop:NXM_OF_ETH_SRC[],controller(userdata=00.00.00.04.00.00.00.00),pop:NXM_OF_ETH_SRC[],pop:NXM_NX_XXREG0[],resubmit(,11)

The above flow, before sending the packet to the controller, stores the original eth.src on the stack and copies the IPv6 Neighbor Advertisement's ICMPv6 target link-layer option into the packet's eth.src.  After the packet-in, the flow restores the original eth.src value.

When ovn-controller receives the packet, it copies the IPv6 address from the XXREG0 register and the MAC address from dl_src.
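The effect of that push/copy/packet-in/pop sequence can be sketched in Python (a hypothetical model for illustration, not OVN code):

```python
# Hypothetical sketch of the save/copy/restore that the OpenFlow actions
# above perform around the packet-in: eth.src is pushed onto a stack,
# overwritten with the nd.tll value so ovn-controller can read the MAC to
# learn from dl_src, then popped back so the packet continues unmodified.
stack = []

def packet_in(pkt):
    # ovn-controller reads the MAC from the (temporarily overwritten)
    # Ethernet source field of the packet it receives
    return pkt['eth_src']

pkt = {'eth_src': '00:00:00:00:00:02', 'nd_tll': '00:00:00:00:02:00'}

stack.append(pkt['eth_src'])    # push:NXM_OF_ETH_SRC[]
pkt['eth_src'] = pkt['nd_tll']  # copy the TLL option into eth.src
learned = packet_in(pkt)        # controller(...) triggers the packet-in
pkt['eth_src'] = stack.pop()    # pop:NXM_OF_ETH_SRC[] restores the original

print(learned)          # 00:00:00:00:02:00
print(pkt['eth_src'])   # 00:00:00:00:00:02
```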

The main problem is that ovn-controller assumes the Neighbor Advertisement packet will have the ICMPv6 target link-layer option present.
But that is not always the case, since this option is optional.  When the option is not included, the OpenFlow field NXM_NX_ND_TLL[] is zero,
and that zero value gets copied to eth.src before the packet-in.

ovn-controller then learns or updates the MAC binding in the SB DB with an all-zero MAC address.
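Why an NA without this option ends up learned as zeros can be illustrated with a small, self-contained Python sketch (illustrative only, not ovn-controller code) that looks for the target link-layer address option (type 2) in a raw Neighbor Advertisement:

```python
# Illustrative parser for the options section of an ICMPv6 Neighbor
# Advertisement. Returns the target link-layer address if the type-2
# option is present, else None -- the case where OVN sees an all-zero
# nd.tll and, before the fix, learned a zero MAC binding.
def nd_target_lladdr(icmpv6_payload: bytes):
    # NA layout: type(1) code(1) checksum(2) flags(4) target(16), then options
    options = icmpv6_payload[24:]
    i = 0
    while i + 2 <= len(options):
        opt_type = options[i]
        opt_len = options[i + 1] * 8   # option length is in units of 8 octets
        if opt_len == 0:               # malformed option; stop parsing
            break
        if opt_type == 2:              # target link-layer address option
            return options[i + 2 : i + 8]
        i += opt_len
    return None

# NA carrying a TLL option with MAC 00:00:00:00:02:00
tll = bytes([2, 1, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00])
na_with_tll = bytes([136, 0, 0, 0]) + bytes(4) + bytes(16) + tll
na_without_tll = bytes([136, 0, 0, 0]) + bytes(4) + bytes(16)

print(nd_target_lladdr(na_with_tll).hex(':'))   # 00:00:00:00:02:00
print(nd_target_lladdr(na_without_tll))         # None
```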

This behaviour causes a huge problem with OVN-Kubernetes.
In shared gateway mode, the GW router port IP is shared between OVN and the external OVS bridge (br-ex or breth0).
When ovn-controller on worker node 1 wants to learn the MAC of the gateway IP of worker node 2, it sends a Neighbor Solicitation request.  Worker node 2 receives this packet, and a Neighbor Advertisement reply is sent both by ovn-controller and by the host Linux stack.  It appears that the worker node 2 Linux stack does not include the target link-layer option in its reply.

ovn-controller on worker node 1 receives both replies and updates the MAC binding incorrectly.

If the learnt MAC was zero, actual traffic from a pod to the service can be delayed by the MAC learning, and this can even cause a loop.

The issue can be reproduced with a kind ovn-k8s deployment with KIND_IPV6_SUPPORT=true.

To reproduce, deploy a kind setup and create a pod on worker 1.
From this pod, ping the GW router IP of worker 2.


To address this issue, ovn-controller can learn the MAC from the packet's Ethernet source when the ICMPv6 target link-layer option is not present.
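The proposed fallback can be sketched as a minimal Python helper (hypothetical, not ovn-controller code; it assumes OVN sees nd.tll as all zeros when the option is absent):

```python
def mac_to_learn(eth_src: bytes, nd_tll: bytes) -> bytes:
    """Sketch of the fixed learning rule: fall back to the Ethernet
    source when the Neighbor Advertisement carried no target
    link-layer option (nd.tll reads as all zeros in that case)."""
    if nd_tll == bytes(6):   # option absent -> nd.tll is 00:00:00:00:00:00
        return eth_src
    return nd_tll

# NA with the option: learn from nd.tll
print(mac_to_learn(bytes.fromhex("000000000002"),
                   bytes.fromhex("000000000200")).hex(":"))  # 00:00:00:02:00 option value
# NA without the option: fall back to eth.src instead of learning zeros
print(mac_to_learn(bytes.fromhex("000000000002"), bytes(6)).hex(":"))
```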







Comment 3 Numan Siddique 2022-04-23 18:15:50 UTC
Patch to fix this issue is submitted for review - https://patchwork.ozlabs.org/project/ovn/patch/20220423181452.3698721-1-numans@ovn.org/

Comment 5 Numan Siddique 2022-04-25 16:18:20 UTC
Patch is merged u/s - https://github.com/ovn-org/ovn/commit/80187a8031b6abe01fb23657a9bed2372ae23af5

Comment 8 Ehsan Elahi 2022-05-01 14:24:38 UTC
Reproduced on 
[root@bz-2078026 ~]# rpm -qa |grep -E 'ovn|openvswitch'
ovn-2021-central-21.12.0-42.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.15-2.15.0-93.el8fdp.x86_64
ovn-2021-21.12.0-42.el8fdp.x86_64
ovn-2021-host-21.12.0-42.el8fdp.x86_64

systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1

ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.1
systemctl start ovn-controller
ovn-nbctl lr-add rtr
ovn-nbctl lrp-add rtr rtr-ls1 00:00:00:00:01:00 42.42.42.1/24 2000::1/64
ovn-nbctl lrp-add rtr rtr-ls2 00:00:00:00:02:00 77.77.77.1/24 2002::1/64

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1-rtr
ovn-nbctl lsp-set-addresses ls1-rtr 00:00:00:00:01:00
ovn-nbctl lsp-set-type ls1-rtr router
ovn-nbctl lsp-set-options ls1-rtr router-port=rtr-ls1
ovn-nbctl lsp-add ls1 vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01

ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-rtr
ovn-nbctl lsp-set-addresses ls2-rtr 00:00:00:00:02:00
ovn-nbctl lsp-set-type ls2-rtr router
ovn-nbctl lsp-set-options ls2-rtr router-port=rtr-ls2
ovn-nbctl lsp-add ls2 vm2
ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip -6 addr add 2000::2/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip route add default via 42.42.42.1
ip netns exec vm1 ip -6 route add default via 2000::1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 77.77.77.2/24 dev vm2
ip netns exec vm2 ip -6 addr add 2002::2/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip link set lo up
ip netns exec vm2 ip route add default via 77.77.77.1
ip netns exec vm2 ip -6 route add default via 2002::1
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns exec vm1 ping 77.77.77.2 -c 3

[root@bz-2078026 ~]# ovn-sbctl dump-flows | grep -e lr_in_lookup_neighbor -e lr_in_learn_neighbor
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(arp.op == 2), action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(inport == "rtr-ls1" && arp.spa == 42.42.42.0/24 && arp.op == 1), action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(inport == "rtr-ls2" && arp.spa == 77.77.77.0/24 && arp.op == 1), action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(nd_na), action=(reg9[2] = lookup_nd(inport, nd.target, nd.tll); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(nd_ns), action=(reg9[2] = lookup_nd(inport, ip6.src, nd.sll); next;)
  table=1 (lr_in_lookup_neighbor), priority=0    , match=(1), action=(reg9[2] = 1; next;)
  table=2 (lr_in_learn_neighbor), priority=100  , match=(reg9[2] == 1), action=(next;)
  table=2 (lr_in_learn_neighbor), priority=90   , match=(arp), action=(put_arp(inport, arp.spa, arp.sha); next;)
  table=2 (lr_in_learn_neighbor), priority=90   , match=(nd_na), action=(put_nd(inport, nd.target, nd.tll); next;)
  table=2 (lr_in_learn_neighbor), priority=90   , match=(nd_ns), action=(put_nd(inport, ip6.src, nd.sll); next;)


Verified on:
[root@wsbz-2078026 ~]# rpm -qa |grep -E 'ovn|openvswitch'
openvswitch2.15-2.15.0-93.el8fdp.x86_64
ovn-2021-central-21.12.0-46.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
ovn-2021-host-21.12.0-46.el8fdp.x86_64
ovn-2021-21.12.0-46.el8fdp.x86_64

[root@bz-2078026 ~]# ovn-sbctl dump-flows | grep -e lr_in_lookup_neighbor -e lr_in_learn_neighbor | sort
  table=1 (lr_in_lookup_neighbor), priority=0    , match=(1), action=(reg9[2] = 1; next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(arp.op == 2), action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(inport == "rtr-ls1" && arp.spa == 42.42.42.0/24 && arp.op == 1), action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(inport == "rtr-ls2" && arp.spa == 77.77.77.0/24 && arp.op == 1), action=(reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(nd_na), action=(reg9[2] = lookup_nd(inport, nd.target, nd.tll); next;)
  table=1 (lr_in_lookup_neighbor), priority=100  , match=(nd_ns), action=(reg9[2] = lookup_nd(inport, ip6.src, nd.sll); next;)
  table=2 (lr_in_learn_neighbor), priority=100  , match=(reg9[2] == 1), action=(next;)
  table=2 (lr_in_learn_neighbor), priority=90   , match=(arp), action=(put_arp(inport, arp.spa, arp.sha); next;)
  table=2 (lr_in_learn_neighbor), priority=90   , match=(nd_na), action=(put_nd(inport, nd.target, nd.tll); next;)
  table=2 (lr_in_learn_neighbor), priority=90   , match=(nd_ns), action=(put_nd(inport, ip6.src, nd.sll); next;)
  table=2 (lr_in_learn_neighbor), priority=95   , match=(nd_na && nd.tll == 0), action=(put_nd(inport, nd.target, eth.src); next;)


<======== target link layer option is zero in neighbor advertisement so ovn-controller can learn the mac from the packet's eth source

Comment 12 errata-xmlrpc 2022-05-27 18:14:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4784