Bug 1947823

Summary: [ovn] ARP responder flows for virtual ports in more than one chassis causing disruption and ARP suppression in the switches
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: ovn2.13
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: high
Priority: high
Version: FDP 21.B
CC: averi, ctrautma, dcbw, dceara, eelahi, ffernand, fhallal, jishi, jmelvin, ljozsa, mchappel, mhofmann, nusiddiq, ralongi
Fixed In Version: ovn2.13-20.12.0-104 ovn-2021-21.03.0-33
Last Closed: 2021-05-20 19:28:16 UTC
Type: Bug

Description Daniel Alvarez Sanchez 2021-04-09 10:31:15 UTC
Version used:
ovn2.13-20.12.0-24.el8fdp.x86_64


In a 170-node setup, we found that ARP requests for a virtual port (one with a virtual parent) bound to chassis X were being answered from both chassis X and chassis Y.

On node Y, ARP responder flows were installed for that port, which explains why node Y was replying as well. This behavior disrupted the network, and the ToR switches enforced their ARP suppression mechanisms.

This looks like a bug in the Incremental Processing framework, since triggering a recompute on node Y solved the issue.
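
For anyone hitting the same symptom, the check and workaround described above could be wrapped roughly as follows. This is only a sketch: the function names are hypothetical, `show_virtual_bindings` assumes it runs where the Southbound database is reachable, and the `recompute` command is the same one used later in this report.

```shell
#!/bin/bash
# Hypothetical helper: list virtual ports and the chassis each one is bound to,
# as recorded in the Southbound Port_Binding table.
show_virtual_bindings() {
    ovn-sbctl --columns=logical_port,chassis,virtual_parent \
        find Port_Binding type=virtual
}

# Hypothetical helper: ask the local ovn-controller for a full recompute.
# Run this on the node that holds the stale ARP responder flows.
force_recompute() {
    ovn-appctl -t ovn-controller recompute
}
```

If the Port_Binding shows the port on chassis X only, but chassis Y still answers ARP for it, running `force_recompute` on chassis Y is the workaround that cleared the stale flows in our setup.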

Comment 7 Dan Williams 2021-04-14 14:20:13 UTC
Initial patch posted upstream: http://patchwork.ozlabs.org/project/ovn/patch/20210414133758.3410184-1-numans@ovn.org/

Comment 12 Jianlin Shi 2021-04-25 09:08:22 UTC
Tested with the following script:

#!/bin/bash

# Start OVS and the OVN central services, and expose the NB/SB databases over TCP.
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
# Register this host as chassis hv1 and start ovn-controller.
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.175.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.175.25
systemctl restart ovn-controller

# Create a logical switch with a parent port (vm1) and a virtual port (ls-vir)
# that owns the virtual IP 42.42.42.42.
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01
ovn-nbctl lsp-add ls ls-vir
ovn-nbctl lsp-set-addresses ls-vir "00:00:00:00:00:01 42.42.42.42"
ovn-nbctl lsp-set-port-security ls-vir "00:00:00:00:00:01 42.42.42.42"
ovn-nbctl lsp-set-type ls-vir virtual
ovn-nbctl set logical_switch_port ls-vir options:virtual-ip=42.42.42.42
ovn-nbctl set logical_switch_port ls-vir options:virtual-parents=vm1

# Add an ACL that matches on ls-vir being bound locally.
ovn-nbctl acl-add ls to-lport 1 'is_chassis_resident("ls-vir") && ip' allow

# Bind an internal OVS interface to vm1.
ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal -- set Interface vm1 external_ids:iface-id=vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1
ip netns exec vm1 ip a a 42.42.42.42/32 dev vm1
ovn-nbctl --wait=hv sync

# Inject a GARP from vm1 for 42.42.42.42:
ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
sleep 1

# Check that an OpenFlow flow is generated for the ACL.

ovs-ofctl --no-stats dump-flows br-int table=45 | grep priority=1001


# Release vm1, rebind it, and reinject the GARP for ls-vir. With the bug, the
# OpenFlow flow is not reinstalled unless a full recompute is triggered.
ovs-vsctl set interface vm1 external_ids:iface-id=foo
ovs-ofctl --no-stats dump-flows br-int table=45 | grep priority=1001
ovs-vsctl set interface vm1 external_ids:iface-id=vm1
ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ovs-ofctl --no-stats dump-flows br-int table=45 | grep priority=1001

Reproduced on ovn2.13-20.12.0-95.el8fdp.x86_64:

+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 1
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0x3df02489, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0x3df02489, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)
+ ovs-vsctl set interface vm1 external_ids:iface-id=foo
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
+ ovs-vsctl set interface vm1 external_ids:iface-id=vm1
+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001

<=== no flow for the ACL is added

Verified on ovn2.13-central-20.12.0-104.el8fdp.x86_64:

[root@wsfd-advnetlab21 bz1947823]# rpm -qa | grep ovn2.13
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64

+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 1
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xfe7bff24, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xfe7bff24, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)
+ ovs-vsctl set interface vm1 external_ids:iface-id=foo
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
+ ovs-vsctl set interface vm1 external_ids:iface-id=vm1
+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xfe7bff24, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xfe7bff24, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)

<=== flow added
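
The manual grep check in the runs above could be turned into a countable result with a small helper. This is a sketch, not part of the original test: `count_acl_flows` is a hypothetical name, and it assumes the ACL flows are the ones at priority 1001 in table 45, as in the output shown here.

```shell
#!/bin/bash
# Hypothetical helper: count the ACL flows (priority=1001) in a flow dump
# read from stdin. grep -c exits non-zero when it counts 0 matches, so
# "|| true" keeps the function usable under "set -e".
count_acl_flows() {
    grep -c 'priority=1001' || true
}

# Example usage on a live system (not executed here):
#   ovs-ofctl --no-stats dump-flows br-int table=45 | count_acl_flows
```

With the dumps in this comment, the helper would report 2 flows after the first GARP, 0 after the interface is released, and 2 again after rebinding on a fixed build.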

Comment 13 Jianlin Shi 2021-04-25 09:10:29 UTC
Also verified on ovn-2021-21.03.0-21.el8fdp.x86_64:

[root@wsfd-advnetlab21 bz1947823]# rpm -qa | grep ovn-2021
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64
ovn-2021-central-21.03.0-21.el8fdp.x86_64

+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 1
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xe7fd05c3, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xe7fd05c3, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)
+ ovs-vsctl set interface vm1 external_ids:iface-id=foo
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
+ ovs-vsctl set interface vm1 external_ids:iface-id=vm1
+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xe7fd05c3, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xe7fd05c3, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)
+ ovn-appctl -t ovn-controller recompute
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xe7fd05c3, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xe7fd05c3, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)

<=== flow added

Comment 14 Jianlin Shi 2021-04-25 09:12:23 UTC
Also verified on ovn2.13-20.12.0-104.el7fdp.x86_64:

[root@wsfd-advnetlab16 bz1947823]# rpm -qa | grep ovn2.13
ovn2.13-host-20.12.0-104.el7fdp.x86_64
ovn2.13-central-20.12.0-104.el7fdp.x86_64
ovn2.13-20.12.0-104.el7fdp.x86_64

+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 1
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xa34c8767, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xa34c8767, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)
+ ovs-vsctl set interface vm1 external_ids:iface-id=foo
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
+ ovs-vsctl set interface vm1 external_ids:iface-id=vm1
+ ip netns exec vm1 arping -c 1 -A -I vm1 42.42.42.42
ARPING 42.42.42.42 from 42.42.42.42 vm1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xa34c8767, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xa34c8767, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)
+ ovn-appctl -t ovn-controller recompute
+ ovs-ofctl --no-stats dump-flows br-int table=45
+ grep priority=1001
 cookie=0xa34c8767, table=45, priority=1001,ipv6,metadata=0x1 actions=resubmit(,46)
 cookie=0xa34c8767, table=45, priority=1001,ip,metadata=0x1 actions=resubmit(,46)

<=== flow added
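
The comparison done by eye in these runs (flows before and after `recompute` are identical) could be scripted along these lines. This is a rough sketch with hypothetical function names. It assumes dumps taken with `--no-stats`, so that counters and durations do not cause spurious differences.

```shell
#!/bin/bash
# Hypothetical helper: normalize a flow dump read from stdin by trimming
# leading whitespace, keeping only flow lines (those containing "table="),
# and sorting them so that ordering does not matter.
normalize_flows() {
    sed -n -e 's/^[[:space:]]*//' -e '/table=/p' | sort
}

# Hypothetical helper: return 0 if two saved dumps contain the same flows.
same_flows() {
    [ "$(normalize_flows < "$1")" = "$(normalize_flows < "$2")" ]
}

# Example usage on a live system (not executed here):
#   ovs-ofctl --no-stats dump-flows br-int table=45 > before.txt
#   ovn-appctl -t ovn-controller recompute
#   sleep 1
#   ovs-ofctl --no-stats dump-flows br-int table=45 > after.txt
#   same_flows before.txt after.txt || echo "flows changed after recompute"
```

A difference between the two dumps would suggest that incremental processing had left the flow table out of sync, which is the condition this bug describes.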

Comment 16 errata-xmlrpc 2021-05-20 19:28:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2080