Bug 2155306 - [RFE][ovn] Add ARP/NDP Proxy capabilities
Summary: [RFE][ovn] Add ARP/NDP Proxy capabilities
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn22.12
Version: FDP 22.L
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
: ---
Assignee: Quique Llorente
QA Contact: Ehsan Elahi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-20 17:31 UTC by Daniel Alvarez Sanchez
Modified: 2023-10-19 09:05 UTC (History)

Fixed In Version: ovn23.06-23.06.0-13.el8fdp ovn23.06-23.06.0-13.el9fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2555 0 None None None 2022-12-20 17:46:03 UTC

Description Daniel Alvarez Sanchez 2022-12-20 17:31:53 UTC
OpenStack recently added support for pure L3 deployments using BGP [0]. This architecture relies on building an extra layer to route all the traffic coming out of the VMs to the leaf nodes and vice versa.


The current implementation is based on kernel networking, where the routing and the ARP/NDP proxying are done by the kernel. However, to cover customers that require acceleration such as HWOL/OVS-DPDK, we need to move this functionality into OVS.


We're currently prototyping with an architecture (attaching image to this BZ) where a small OVN cluster is running on every OSP compute node. This local OVN cluster will have 3 elements:

1. Logical Switch (localnet) connecting to the integration bridge of OSP
2. Logical Router that (ECMP) routes the traffic from the OSP workloads towards the two leafs (each leaf has a /30 network)
3. Logical Switch (localnet) connecting to an external OVS bridge where the two NICs are added



The OpenStack workloads do not know of this routing layer so the L2 connectivity is simulated by responding to ARP/NDP requests locally. For the purpose of the PoC, we're injecting ARP responder flows in the intermediate br-conn but, ideally, the local OVN cluster should do it (especially for NDP where we currently do not have the ability to do it without a controller action).

Worth mentioning that for the purpose of the PoC we're using the multibridge feature (not merged yet) here [1] which allows us to run two separate ovn-controller instances in the same host.


Example of arping from a local OpenStack VM to an external destination:

[root@vm-provider ~]# ip a sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:80:95:72 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 172.16.100.160/24 brd 172.16.100.255 scope global dynamic noprefixroute eth0


[root@vm-provider ~]# ip r get 8.8.8.8
8.8.8.8 via 172.16.100.1 dev eth0 src 172.16.100.160 uid 0

[root@vm-provider ~]# arping 172.16.100.1 -c1
ARPING 172.16.100.1 from 172.16.100.160 eth0
Unicast reply from 172.16.100.1 [40:44:00:00:00:06]  1.414ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)



The br-conn bridge will hijack this request and reply with the MAC address of the OVN logical router port bgp-router-public (40:44:00:00:00:06):



# ovs-ofctl dump-flows br-conn
 cookie=0xbadcafe, duration=4550.493s, table=0, n_packets=93, n_bytes=3906, priority=100,arp,arp_tpa=172.16.100.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:40:44:00:00:00:06,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],load:0x404400000006->NXM_NX_ARP_SHA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xac106401->NXM_OF_ARP_SPA[],IN_PORT
 cookie=0x0, duration=363766.342s, table=0, n_packets=100623, n_bytes=27886427, priority=0 actions=NORMAL
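
For readers less familiar with the NXM actions, the responder flow above can be modelled in a few lines of Python. This is only an illustrative sketch: `ArpPacket`, `proxy_reply`, `mac_to_hex` and `ip_to_hex` are hypothetical names, not part of OVS or OVN. Only the field manipulations and the two loaded constants are taken from the flow dump.

```python
from dataclasses import dataclass, replace
import ipaddress


@dataclass(frozen=True)
class ArpPacket:
    """Hypothetical, minimal view of the fields the flow touches."""
    eth_src: str
    eth_dst: str
    op: int    # 1 = request, 2 = reply
    sha: str   # sender hardware address
    spa: str   # sender protocol address
    tha: str   # target hardware address
    tpa: str   # target protocol address


def proxy_reply(req, mac, ip):
    """Mirror the br-conn actions that turn an ARP request into a reply."""
    return replace(
        req,
        eth_dst=req.eth_src,  # move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[]
        eth_src=mac,          # mod_dl_src:<mac>
        op=2,                 # load:0x2->NXM_OF_ARP_OP[]
        tha=req.sha,          # move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[]
        sha=mac,              # load:<mac as hex>->NXM_NX_ARP_SHA[]
        tpa=req.spa,          # move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[]
        spa=ip,               # load:<ip as hex>->NXM_OF_ARP_SPA[]
    )


def mac_to_hex(mac):
    """Render a MAC address the way the load action expects it."""
    return hex(int(mac.replace(":", ""), 16))


def ip_to_hex(ip):
    """Render an IPv4 address the way the load action expects it."""
    return hex(int(ipaddress.IPv4Address(ip)))
```

With these helpers, `mac_to_hex("40:44:00:00:00:06")` gives `0x404400000006` and `ip_to_hex("172.16.100.1")` gives `0xac106401`, which are the two constants loaded by the flow. The final `IN_PORT` action (sending the rewritten packet back out of the port it arrived on) is not modelled here.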



[root@vm-provider ~]# ip nei get 172.16.100.1 dev eth0
172.16.100.1 dev eth0 lladdr 40:44:00:00:00:06 REACHABLE



After this, the traffic will be processed by the OVN LR in br-bgp and routed to the leafs with a default route:


# ovn-nbctl lr-route-list bgp-router
IPv4 Routes
Route Table <main>:
                0.0.0.0/0                100.64.0.5 dst-ip ecmp
                0.0.0.0/0                100.65.3.5 dst-ip ecmp
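
As a rough illustration of what the two ecmp default routes above mean for traffic, the sketch below picks one of the two next hops per flow. The hash is an assumption made for the example (CRC32 over a 5-tuple string); this report does not describe which fields or hash OVN actually uses, and `pick_next_hop` is a hypothetical name.

```python
import zlib

# Next hops taken from the lr-route-list output above.
NEXT_HOPS = ["100.64.0.5", "100.65.3.5"]


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Pick a next hop per flow so packets of one flow stay on one path.

    Assumption: CRC32 over the 5-tuple stands in for the real ECMP hash.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

The property that matters is that the same flow always maps to the same leaf, while different flows may be spread across both leafs.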


The goal of this RFE is to request ARP/NDP Proxy functionality to be added to OVN.




# ovn-nbctl show
switch 96c723c4-1cbd-40d8-90b5-049bf62ac461 (bgp-conn)
    port conn-bgp-router
        type: router
        router-port: bgp-router-public
    port bgp-conn-localnet
        type: localnet
        addresses: ["unknown"]

switch da474a04-bd47-4b95-bfc3-3112ddbb9431 (bgp-ex)
    port bgp-ex-localnet
        type: localnet
        addresses: ["unknown"]
    port bgp-portbinding
    port ex-bgp-router-2
        type: router
        router-port: bgp-router-ex-2
    port ex-bgp-router-1
        type: router
        router-port: bgp-router-ex-1

router fcd73758-e940-4af4-9ae1-c2e98357e281 (bgp-router)
    port bgp-router-ex-1
        mac: "52:54:00:9e:ac:43"
        networks: ["100.65.3.6/30"]
    port bgp-router-public
        mac: "40:44:00:00:00:06"
        networks: ["172.16.100.1/24"]
    port bgp-router-ex-2
        mac: "52:54:00:4e:f1:eb"
        networks: ["100.64.0.6/30"]




# ip a sh enp2s0
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:9e:ac:43 brd ff:ff:ff:ff:ff:ff
    inet 100.65.3.6/30 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe9e:ac43/64 scope link
       valid_lft forever preferred_lft forever


# ip a sh enp3s0
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:4e:f1:eb brd ff:ff:ff:ff:ff:ff
    inet 100.64.0.6/30 scope global enp3s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe4e:f1eb/64 scope link
       valid_lft forever preferred_lft forever


# ovs-ofctl dump-flows br-ex
 cookie=0xbadcaf2, duration=554.855s, table=0, n_packets=537, n_bytes=52626, priority=100,ip,in_port=enp2s0,nw_dst=172.16.100.0/24 actions=mod_dl_dst:52:54:00:9e:ac:43,output:"patch-bgp-ex-lo"
 cookie=0xbadcaf2, duration=547.097s, table=0, n_packets=549, n_bytes=53940, priority=100,ip,in_port=enp3s0,nw_dst=172.16.100.0/24 actions=mod_dl_dst:52:54:00:4e:f1:eb,output:"patch-bgp-ex-lo"
 cookie=0x0, duration=364594.904s, table=0, n_packets=6216556, n_bytes=770838838, priority=0 actions=NORMAL




[0] https://developers.redhat.com/articles/2022/09/22/learn-about-new-bgp-capabilities-red-hat-openstack-17
[1] https://patchwork.ozlabs.org/project/ovn/list/?series=330752

Comment 1 Daniel Alvarez Sanchez 2022-12-20 17:55:38 UTC
Created attachment 1933800 [details]
arch

Comment 2 Numan Siddique 2022-12-20 23:27:22 UTC
Thanks for the detailed explanation and the diagram (very helpful).

So if I understand correctly, you don't want to add the OpenFlow ARP responder flow in br-conn, right?
Instead, the ARP packet from br-int will enter br-conn (via the patch port), go from br-conn to br-bgp (managed by the local OVN),
and be answered by the ARP responder flows added by the local OVN. Correct me if I'm wrong.


So IMO OVN should add these logical flows in the logical switch bgp-conn pipeline. I think it should
be straightforward to add this feature in OVN.

Right now we don't add ARP responder flows in the logical switch pipeline for the router ip (of the router it is connected to).
Instead the packet would enter the router pipeline and the arp responder flows there would reply.


In your case, if you don't add the arp responder flows in br-conn, what happens ?  I suppose the ARP request packet would enter local
OVN integration bridge br-bgp in the bgp-conn logical switch pipeline and then it would enter the router pipeline bgp-router.
And the ARP responder flows there should respond.   If this works as expected, probably we don't need to add anything in OVN.

Maybe what I'm saying is wrong (or there is a bug).  Can you please check what happens ?

Thanks
Numan

Comment 3 Daniel Alvarez Sanchez 2022-12-21 09:11:12 UTC
Thanks a lot Numan!

(In reply to Numan Siddique from comment #2)
> Thanks for the detailed explanation and the diagram (very helpful).
> 
> So If I understand correctly, you don't want to add the openflow arp
> responder flow in br-conn right ?

Exactly. There are two reasons for this:

1) Since we use OVN for pretty much everything else, it makes sense for us to not have to add extra flows of this sort 'manually'.
2) For IPv6 we can't do this and we'd need a controller action. We'd rather keep the dependency on ovn-controller than add an extra one on the OVN BGP Agent.



> Instead the ARP pkt from br-int will enter br-conn (via patch port) and
> from br-conn to br-bgp (managed by local OVN)
> and the ARP responder flows added by local OVN.  Correct me If I'm wrong.
> 
> 
> So IMO OVN should add these logical flows in the logical switch bgp-conn
> pipeline. I think it should
> be straightforward to add this feature in OVN.
> 
> Right now we don't add ARP responder flows in the logical switch pipeline
> for the router ip (of the router it is connected to).
> Instead the packet would enter the router pipeline and the arp responder
> flows there would reply.
> 
> 
> In your case, if you don't add the arp responder flows in br-conn, what
> happens ?  I suppose the ARP request packet would enter local
> OVN integration bridge br-bgp in the bgp-conn logical switch pipeline and
> then it would enter the router pipeline bgp-router.

Yes, I think this is correct.

> And the ARP responder flows there should respond.   If this works as
> expected, probably we don't need to add anything in OVN.

If we add entries to the Static_MAC_Binding table perhaps it works out of the box. The problem is that we'd need to respond to *every* IP.
The way that ARP Proxy works in the kernel (sysctl -w net.ipv4.conf.br-conn.proxy_arp=1) is that we give br-conn a loopback IP address (1.1.1.1/32) and then *all* the ARP/NDP requests will be answered by br-conn with its own MAC address.

The ask is to add some magic flows to the router pipeline (likely with less prio than the current ARP responder flows for known addresses) that respond to ARP/NDP requests with the MAC address of the port where we toggle the feature on regardless of the requested address.
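
To make the ask concrete, here is a minimal sketch of the lookup order I have in mind. The priority values (100 and 50), the function names and the tuple layout are all made up for illustration; they are not actual OVN logical flow priorities.

```python
def build_flows(known, proxy_mac, proxy_enabled):
    """Build a priority-ordered list of (priority, match_ip, reply_mac).

    known: dict mapping known IPs to their MACs (the existing responders).
    match_ip of None means "any requested address" (the proxy catch-all).
    """
    flows = [(100, ip, mac) for ip, mac in known.items()]
    if proxy_enabled:
        # Lower priority than the per-address responders.
        flows.append((50, None, proxy_mac))
    return sorted(flows, key=lambda f: f[0], reverse=True)


def arp_lookup(flows, target_ip):
    """Return the MAC to answer with, or None if nothing matches."""
    for _prio, match_ip, mac in flows:
        if match_ip is None or match_ip == target_ip:
            return mac
    return None
```

With the knob off, only known addresses get a reply. With it on, any other requested address is answered with the MAC of the port where the feature is enabled.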

Please let me know if the above makes sense. I can also show you live in a setup :)

> 
> Maybe what I'm saying is wrong (or there is a bug).  Can you please check
> what happens ?
> 
> Thanks
> Numan





I can probably mark this as public and link it in the upstream ML for wider discussion if you think it's good to go.

Comment 4 Dumitru Ceara 2022-12-21 10:50:51 UTC
(In reply to Daniel Alvarez Sanchez from comment #3)
> The ask is to add some magic flows to the router pipeline (likely with less
> prio than the current ARP responder flows for known addresses) that respond
> to ARP/NDP requests with the MAC address of the port where we toggle the
> feature on regardless of the requested address.

Is something like this what you had in mind (per LRP knob to enable replying
to any ARP by default)?

https://github.com/dceara/ovn/commit/e46ea3fbc7088ac009480e2883968383404b79e2


Comment 5 Daniel Alvarez Sanchez 2022-12-21 12:14:16 UTC
(In reply to Dumitru Ceara from comment #4)

> 
> Is something like this what you had in mind (per LRP knob to enable replying
> to any ARP by default)?
> 
> https://github.com/dceara/ovn/commit/e46ea3fbc7088ac009480e2883968383404b79e2

Wow! That was fast :)
Exactly what I had in mind yes. Thanks!

The per-LRP knob is what I thought of but it can be problematic if more than one LRP would have it on?
Another option is to have the knob at router level but then we need a way to specify which MAC address we should respond with.

Fine with me the per-LRP approach and let the user handle the fact that only one LRP should have it on for a given router.

Comment 6 Dumitru Ceara 2022-12-21 12:25:44 UTC
(In reply to Daniel Alvarez Sanchez from comment #5)
> The per-LRP knob is what I thought of but it can be problematic if more than
> one LRP would have it on?

Why is it problematic?

> Another option is to have the knob at router level but then we need a way to
> specify which MAC address we should respond with.
> 
> Fine with me the per-LRP approach and let the user handle the fact that only
> one LRP should have it on for a given router.

I don't think we need that restriction.  Different LRPs correspond to different
subnets.  Unless I'm missing something I think we should be ok with a per-lrp
option.

Comment 7 Daniel Alvarez Sanchez 2022-12-21 13:02:35 UTC
(In reply to Dumitru Ceara from comment #6)
> I don't think we need that restriction.  Different LRPs correspond to
> different
> subnets.  Unless I'm missing something I think we should be ok with a per-lrp
> option.

Most likely it's me missing something :) 
ARP requests are broadcasted so they will reach all LRPs. The way that arp proxy works in the kernel (from what I've seen at least) is that the device will reply to all the requests regardless of the source (as long as the device has an IP address configured - eg. 1.1.1.1/32 is the one we use).

Comment 8 Dumitru Ceara 2022-12-21 13:09:02 UTC
(In reply to Daniel Alvarez Sanchez from comment #7)
> Most likely it's me missing something :) 
> ARP requests are broadcasted so they will reach all LRPs. The way that arp

Hmm, how?  ARP reqs are broadcasted in the L2 domain so they only reach what
LRPs are connected to the logical switch where the ARPs are received.  I don't
think it's a valid configuration to have two LRPs from *the same router*
connected to the same logical switch.  Moreover, I don't think it's a valid
config to enable proxy-arp on two LRPs (different LRs) connected to the same
switch.  It's up to the user to avoid that IMO.

> proxy works in the kernel (from what I've seen at least) is that the device
> will reply to all the requests regardless of the source (as long as the
> device has an IP address configured - eg. 1.1.1.1/32 is the one we use).

I didn't test it but I'm quite sure that if you connect two linux interfaces
to a bridge and enable proxy-arp on both then they will both reply to ARP
reqs received on that bridge.

I think the PoC code I shared above implements that same behavior.

But if we agree that the OVN proxy-ARP implementation should match the
kernel behavior then we probably have a good enough "spec" to work with.

Comment 9 Daniel Alvarez Sanchez 2022-12-21 17:11:10 UTC
(In reply to Dumitru Ceara from comment #8)
> Hmm, how?  ARP reqs are broadcasted in the L2 domain so they only reach what
> LRPs are connected to the logical switch where the ARPs are received.  I
> don't
> think it's a valid configuration to have two LRPs from *the same router*
> connected to the same logical switch.  

Uhm we're doing this actually :)

switch da474a04-bd47-4b95-bfc3-3112ddbb9431 (bgp-ex)
    port ex-bgp-router-neutron
        type: router
        router-port: bgp-router-neutron-ex
    port bgp-ex-localnet
        type: localnet
        addresses: ["unknown"]
    port bgp-portbinding
    port ex-bgp-router-2
        type: router
        router-port: bgp-router-ex-2
    port ex-bgp-router-1
        type: router
        router-port: bgp-router-ex-1
router fcd73758-e940-4af4-9ae1-c2e98357e281 (bgp-router)
    port bgp-router-public
        mac: "40:44:00:00:00:06"
        networks: ["1.1.1.1/32"]
    port bgp-router-ex-1
        mac: "52:54:00:9e:ac:43"
        networks: ["100.65.3.6/30"]
    port bgp-router-ex-2
        mac: "52:54:00:4e:f1:eb"
        networks: ["100.64.0.6/30"]


By connecting two LRPs here we are doing ECMP out of the node. We can probably have one single LRP with the two networks but we wanted this config to match the MAC addresses of the NIC.
Why would this be wrong?



> Moreover, I don't think it's a valid
> config to enable proxy-arp on two LRPs (different LRs) connected to the same
> switch.  It's up to the user to avoid that IMO.

I think this is what I was saying with:

"Fine with me the per-LRP approach and let the user handle the fact that only
one LRP should have it on for a given router."




> 
> > proxy works in the kernel (from what I've seen at least) is that the device
> > will reply to all the requests regardless of the source (as long as the
> > device has an IP address configured - eg. 1.1.1.1/32 is the one we use).
> 
> I didn't test it but I'm quite sure that if you connect two linux interfaces
> to a bridge and enable proxy-arp on both then they will both reply to ARP
> reqs received on that bridge.
> 
> I think the PoC code I shared above implements that same behavior.
> 
> But if we agree that the OVN proxy-ARP implementation should match the
> kernel behavior then we probably have a good enough "spec" to work with.


Great!
Thanks again!

Comment 10 Dumitru Ceara 2022-12-22 09:25:23 UTC
(In reply to Daniel Alvarez Sanchez from comment #9)
> > Hmm, how?  ARP reqs are broadcasted in the L2 domain so they only reach what
> > LRPs are connected to the logical switch where the ARPs are received.  I
> > don't
> > think it's a valid configuration to have two LRPs from *the same router*
> > connected to the same logical switch.  
> 
> Uhm we're doing this actually :)
> 
> switch da474a04-bd47-4b95-bfc3-3112ddbb9431 (bgp-ex)
>     port ex-bgp-router-neutron
>         type: router
>         router-port: bgp-router-neutron-ex
>     port bgp-ex-localnet
>         type: localnet
>         addresses: ["unknown"]
>     port bgp-portbinding
>     port ex-bgp-router-2
>         type: router
>         router-port: bgp-router-ex-2
>     port ex-bgp-router-1
>         type: router
>         router-port: bgp-router-ex-1
> router fcd73758-e940-4af4-9ae1-c2e98357e281 (bgp-router)
>     port bgp-router-public
>         mac: "40:44:00:00:00:06"
>         networks: ["1.1.1.1/32"]
>     port bgp-router-ex-1
>         mac: "52:54:00:9e:ac:43"
>         networks: ["100.65.3.6/30"]
>     port bgp-router-ex-2
>         mac: "52:54:00:4e:f1:eb"
>         networks: ["100.64.0.6/30"]
> 
> 
> By connecting two LRPs here we are doing ECMP out of the node. We can
> probably have one single LRP with the two networks but we wanted this config
> to match the MAC addresses of the NIC.
> Why would this be wrong?
> 

Oh, I see now.  It's not.  I was over constraining things (at most one subnet
per LS).

> 
> 
> > Moreover, I don't think it's a valid
> > config to enable proxy-arp on two LRPs (different LRs) connected to the same
> > switch.  It's up to the user to avoid that IMO.
> 
> I think this is what I was saying with:
> 
> "Fine with me the per-LRP approach and let the user handle the fact that only
> one LRP should have it on for a given router."
> 

Thanks!  While looking a bit at what the kernel should be doing, AFAICT,
proxy-arp should only reply to ARPs targeting IPs the host can reach
(through its own routing table).  I wonder if OVN should behave in the same
way or if it's good enough to "blindly" reply to all ARP reqs for destinations
that are not owned by the LR or VIFs attached to it.
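
The two behaviours being compared can be sketched as a single decision function. This is a simplified model under stated assumptions: `should_proxy` is a hypothetical name, `routes` stands in for the routing table, `owned` for the addresses of the LR and its attached VIFs, and details of the real kernel check are deliberately left out.

```python
import ipaddress


def should_proxy(target, routes, owned, strict=True):
    """Decide whether to send a proxy ARP/ND reply for `target`.

    routes: iterable of ipaddress networks reachable via the routing table.
    owned:  set of ipaddress addresses answered by the regular responders.
    strict: True  -> kernel-like, reply only if the target is routable;
            False -> "blind", reply for anything that is not owned.
    """
    addr = ipaddress.ip_address(target)
    if addr in owned:
        return False  # the normal ARP responder handles these
    if not strict:
        return True
    return any(net.version == addr.version and addr in net for net in routes)
```

With a default route installed (as on bgp-router), the strict and blind variants give the same answer for every IPv4 target, so the difference only shows up when the routing table is more specific.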

Comment 13 Dumitru Ceara 2023-02-15 08:42:02 UTC
Summarizing our offline follow-up, we should probably extend the current LSP.options:arp_proxy option [0] to include:

a. ipv6
b. subnets instead of host IPs
c. proxy mac address different from the one of the LRP (the logical router pipeline must also be updated to support routing traffic with dmac == proxy-mac-address)

[0] https://github.com/ovn-org/ovn/blob/24cd3267c452f6b687e8c03344693709b1c7ae9f/ovn-nb.xml#L994
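
To illustrate items a to c, here is a small sketch of how such an extended option value could be interpreted. The value format is an assumption for the example: an optional leading MAC address followed by space-separated IPv4/IPv6 addresses or CIDRs. `parse_arp_proxy` and `is_proxied` are hypothetical helpers, not OVN code.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def parse_arp_proxy(value):
    """Split an arp_proxy-style value into (proxy_mac_or_None, networks)."""
    tokens = value.split()
    mac = None
    if tokens and MAC_RE.match(tokens[0]):
        mac = tokens.pop(0).lower()
    # Host addresses become /32 or /128 networks; CIDRs are kept as given.
    nets = [ipaddress.ip_network(t, strict=False) for t in tokens]
    return mac, nets


def is_proxied(value, target):
    """True if `target` falls in one of the configured addresses/subnets."""
    _mac, nets = parse_arp_proxy(value)
    addr = ipaddress.ip_address(target)
    return any(n.version == addr.version and addr in n for n in nets)
```

A value without a MAC, such as `"172.16.20.2 2001:20::2"`, would keep replying with the LRP MAC. A value with a leading MAC and subnets would cover items b and c.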

Comment 21 Jianlin Shi 2023-10-11 02:37:59 UTC
Hi Dumitru,

which specific patch is added for this issue?

Comment 22 Dumitru Ceara 2023-10-11 08:10:03 UTC
Hi Jianlin,

These commits implemented the missing features:

https://github.com/ovn-org/ovn/commit/77846b215f
https://github.com/ovn-org/ovn/commit/551439cb62
https://github.com/ovn-org/ovn/commit/9e34292895

The last one is a follow-up bug fix.

Regards,
Dumitru

Comment 24 Ehsan Elahi 2023-10-17 08:48:54 UTC
Verified On:
openvswitch-selinux-extra-policy-1.0-34.el9fdp.noarch
openvswitch2.17-2.17.0-112.el9fdp.x86_64
ovn23.06-23.06.0-69.el9fdp.x86_64
ovn23.06-host-23.06.0-69.el9fdp.x86_64
ovn23.06-central-23.06.0-69.el9fdp.x86_64

Also Verified On:
openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch
openvswitch2.17-2.17.0-108.el8fdp.x86_64
ovn23.06-23.06.0-51.el8fdp.x86_64
ovn23.06-host-23.06.0-51.el8fdp.x86_64
ovn23.06-central-23.06.0-51.el8fdp.x86_64

Here is the reproducer:

systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1
#ip a
ifconfig ens5f0 42.42.42.2 netmask 255.0.0.0
ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.2:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.2
systemctl start ovn-controller
ovn-nbctl lr-add rtr
ovn-nbctl lrp-add rtr rtr-ls 00:00:00:00:01:00 172.16.10.1/24 2001:10::1/64
ovn-nbctl lrp-add rtr rtr-ls2 00:00:00:00:02:00 172.16.20.1/24 2001:20::1/64

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls ls-rtr
ovn-nbctl lsp-set-addresses ls-rtr router
ovn-nbctl lsp-set-type ls-rtr router
ovn-nbctl lsp-set-options ls-rtr router-port=rtr-ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01

ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-rtr
ovn-nbctl lsp-set-addresses ls2-rtr router
ovn-nbctl lsp-set-type ls2-rtr router
ovn-nbctl lsp-set-options ls2-rtr router-port=rtr-ls2
ovn-nbctl lsp-add ls2 vm2
ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02

ovn-nbctl --wait=hv set Logical_Switch_Port ls-rtr options:arp_proxy="172.16.20.2 2001:20::2"

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 172.16.10.2/16 dev vm1
ip netns exec vm1 ip -6 addr add 2001:10::2/16 dev vm1
ip netns exec vm1 ip link set vm1 up
#ip netns exec vm1 ip route add default via 172.16.10.1
#ip netns exec vm1 ip -6 route add default via 2001:10::1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 172.16.20.2/24 dev vm2
ip netns exec vm2 ip -6 addr add 2001:20::2/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip route add default via 172.16.20.1
ip netns exec vm2 ip -6 route add default via 2001:20::1
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns exec vm1 tcpdump -U -i any  -w 1arp.pcap&
ip netns exec vm2 tcpdump -U -i any  -w 2arp.pcap&
sleep 2
ip netns exec vm1 ping 172.16.20.2 -c3
ip netns exec vm1 ping 2001:20::2 -c3
sleep 2
pkill tcpdump
sleep 2
tcpdump -r 1arp.pcap -nnle |grep 'Request who-has 172.16.20.2 tell 172.16.10.2'
tcpdump -r 1arp.pcap -nnle |grep 'Reply 172.16.20.2 is-at 00:00:00:00:01:00'
tcpdump -r 1arp.pcap -nnle |grep 'neighbor solicitation, who has 2001:20::2'
tcpdump -r 1arp.pcap -nnle |grep '2001:20::2 > 2001:10::2: ICMP6, neighbor advertisement'
ip netns exec vm1 arp |grep '172.16.20.2'|grep '00:00:00:00:01:00'
ip netns exec vm1 arp |grep '172.16.10.1'|grep '00:00:00:00:01:00'
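
A note on why vm1 is given /16 prefixes in the reproducer above while its default routes are commented out: with the wider prefix, the vm2 addresses look on-link to vm1, so vm1 sends ARP/NS for them directly and the proxy has something to answer. A small sketch to check that reasoning (`is_on_link` is a hypothetical helper):

```python
import ipaddress


def is_on_link(interface_addr, target):
    """True if `target` is inside the subnet configured on the interface."""
    network = ipaddress.ip_interface(interface_addr).network
    return ipaddress.ip_address(target) in network
```

For example, `is_on_link("172.16.10.2/16", "172.16.20.2")` is True, whereas with a /24 on vm1 it would be False and vm1 would need a gateway instead of resolving the target itself.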

