Bug 2196286

Summary: tug-of-war between ovn-controllers for external gateway port causes havoc for ml2-ovn
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Ihar Hrachyshka <ihrachys>
Component: ovn-2021
Assignee: Ales Musil <amusil>
Status: CLOSED ERRATA
QA Contact: Ehsan Elahi <eelahi>
Severity: high
Docs Contact:
Priority: high
Version: FDP 20.H
CC: amusil, ctrautma, dhill, ffernand, froyo, ihrachys, jamsmith, jiji, jishi, lmartins, mmichels, ovnteam, ralongi, twilson, ykarel
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn-2021-21.12.0-136.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1974898
Environment:
Last Closed: 2023-11-30 00:16:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1974898
Bug Blocks: 1728282, 1994427, 2081631, 2189267

Comment 5 Mark Michelson 2023-07-28 18:26:28 UTC
*** Bug 2189267 has been marked as a duplicate of this bug. ***

Comment 6 Ales Musil 2023-09-15 05:32:20 UTC
Backport posted: https://patchwork.ozlabs.org/project/ovn/list/?series=372659

Comment 9 Jianlin Shi 2023-10-25 02:24:40 UTC
Hi Ales,

The link to the patch in comment 6 has expired. Which patch fixes the issue? Is there a reproducer?

Comment 10 Jianlin Shi 2023-10-26 07:43:33 UTC
Thanks to Dumitru's guidance, I found the patches: https://patchwork.ozlabs.org/project/ovn/list/?series=372659&state=*

Comment 11 Ehsan Elahi 2023-10-27 11:50:29 UTC
Reproduced on: 
[root@~ bz2196286]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch
openvswitch2.17-2.17.0-50.el8fdp.x86_64
ovn-2021-21.12.0-103.el8fdp.x86_64
ovn-2021-central-21.12.0-103.el8fdp.x86_64
ovn-2021-host-21.12.0-103.el8fdp.x86_64

Here is the reproducer:
####### HV1 ########
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1

ovs-vsctl set open . external_ids:ovn-remote=tcp:192.168.20.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=192.168.20.1
ovs-vsctl set open . external_ids:ovn-monitor-all=true
systemctl start ovn-controller

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls lsp0
ovn-nbctl lsp-set-addresses lsp0 "00:00:00:00:20:10 192.168.20.10"

ip netns add vm0
ovs-vsctl add-port br-int vm0 -- set interface vm0 type=internal
ip netns exec vm0 ip link set lo up
ip link set vm0 netns vm0
ip netns exec vm0 ip link set vm0 address 00:00:00:00:20:10
ip netns exec vm0 ip link set vm0 up
ip netns exec vm0 ip addr add 192.168.20.10/24 dev vm0
ovs-vsctl set interface vm0 external_ids:iface-id=lsp0

############ HV2 ###############
systemctl start ovn-northd
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv0

ovs-vsctl set open . external_ids:ovn-remote=tcp:192.168.20.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=192.168.20.2
ovs-vsctl set open . external_ids:ovn-monitor-all=true
systemctl start ovn-controller

ip netns add vm0
ovs-vsctl add-port br-int vm0 -- set interface vm0 type=internal
ip netns exec vm0 ip link set lo up
ip link set vm0 netns vm0
ip netns exec vm0 ip link set vm0 address 00:00:00:00:20:10
ip netns exec vm0 ip link set vm0 up
ip netns exec vm0 ip addr add 192.168.20.10/24 dev vm0
ovs-vsctl set interface vm0 external_ids:iface-id=lsp0

########## Results on non-fixed release ##################
[root@~ bz2196286]# grep -c 'Claiming\|Changing chassis' /var/log/ovn/ovn-controller.log && sleep 3 && grep -c 'Claiming\|Changing chassis' /var/log/ovn/ovn-controller.log
30460
84534

<<< ############ Within 3 seconds, the log gains 54,074 new port-claiming and chassis-changing lines ########
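
The before/after grep above can be wrapped in a small helper so that the check prints the number of new log lines directly. This is only a sketch: the function names count_claims and claim_delta are made up for illustration and are not part of OVN, and the log path is the one used in this comment.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OVN) around the grep check used above.

# Count "Claiming" / "Changing chassis" lines in an ovn-controller log.
count_claims() {
    grep -cE 'Claiming|Changing chassis' "$1"
}

# Print how many such lines were appended during an interval.
# $1 = log file, $2 = interval in seconds (defaults to 3, as in the check above).
claim_delta() {
    log=$1
    interval=${2:-3}
    before=$(count_claims "$log")
    sleep "$interval"
    after=$(count_claims "$log")
    echo $(( after - before ))
}
```

Usage would be `claim_delta /var/log/ovn/ovn-controller.log 3`. With the two counts shown above, the difference is 84534 - 30460 = 54074 lines in 3 seconds, roughly 18,000 lines per second.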


Verified on:
[root@~ bz2196286]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch
openvswitch2.17-2.17.0-134.el8fdp.x86_64
ovn-2021-21.12.0-137.el8fdp.x86_64
ovn-2021-central-21.12.0-137.el8fdp.x86_64
ovn-2021-host-21.12.0-137.el8fdp.x86_64

########## Results on fixed release ##################
[root@~ bz2196286]# grep -c 'Claiming\|Changing chassis' /var/log/ovn/ovn-controller.log && sleep 3 && grep -c 'Claiming\|Changing chassis' /var/log/ovn/ovn-controller.log
1874
1886

<<< ############ Within 3 seconds, the log gains only 12 new lines: fewer than 10 port-claiming and fewer than 10 chassis-changing messages ########

<<======== Last few lines of ovn-controller.log on HV1. They show that the port is claimed/changed only once every 0.5 seconds.

2023-10-27T10:21:45.133Z|02023|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:45.133Z|02024|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:45.633Z|02027|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:45.633Z|02028|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:46.134Z|02030|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:46.134Z|02031|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:46.634Z|02033|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:46.634Z|02034|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:47.134Z|02036|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:47.134Z|02037|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:47.635Z|02039|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:47.635Z|02040|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:48.136Z|02041|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:48.136Z|02042|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:48.637Z|02043|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:48.637Z|02044|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:49.138Z|02045|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:49.138Z|02046|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
2023-10-27T10:21:49.638Z|02047|binding|INFO|Changing chassis for lport lsp0 from hv0 to hv1.
2023-10-27T10:21:49.638Z|02048|binding|INFO|lsp0: Claiming 00:00:00:00:20:10 192.168.20.10
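
The 0.5-second spacing can also be read off the log mechanically instead of by eye. The sketch below prints the gap, in seconds, between consecutive "Claiming" lines. The function name claim_gaps is hypothetical, and the parsing assumes the timestamp layout shown in the excerpt above (ISO time in the first "|"-separated field) and that all lines fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN): print the time gap in seconds
# between consecutive "Claiming" lines of an ovn-controller log.
# Assumes field 1 looks like 2023-10-27T10:21:45.133Z and that the
# lines do not cross midnight (only the time of day is compared).
claim_gaps() {
    grep 'Claiming' "$1" | LC_ALL=C awk -F'|' '
    {
        split($1, dt, "T")          # dt[2] = "10:21:45.133Z"
        sub(/Z$/, "", dt[2])        # drop the trailing Z
        split(dt[2], hms, ":")      # hours, minutes, seconds.millis
        t = hms[1] * 3600 + hms[2] * 60 + hms[3]
        if (NR > 1) printf "%.3f\n", t - prev
        prev = t
    }'
}
```

Usage would be `claim_gaps /var/log/ovn/ovn-controller.log | tail`. For the excerpt above, every gap comes out at about 0.5 seconds.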

Comment 13 errata-xmlrpc 2023-11-30 00:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn-2021 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7591