Description of problem:

If ovn-controller is woken up (e.g., because a SB DB update needs to be processed), it will try to run the incremental processing engine. If the current SB DB transaction is still in progress, this will fail and trigger a full recompute. However, in some cases processing the changes incrementally doesn't require write operations to the SB DB. To fix this, the engine should try to run even when no SB DB txn is available.

Version-Release number of selected component (if applicable):

How reproducible:

Sometimes, depending on how long it takes ovsdb-server to process the MAC_Binding row update.

Steps to Reproduce:
1. Configure a logical switch attached to a logical router. Configure an IP subnet on the logical router port.
2. Enable debug traces in ovn-controller:
   ovn-appctl -t ovn-controller vlog/set DBG
3. Send a GARP from a VM attached to the logical switch.
4. This should not trigger a full recompute of the database, i.e., no "engine did not run, force recompute next time" log should be seen in ovn-controller.log.

Actual results:

If the transaction issued by ovn-controller is still in progress when the update is received back from ovsdb-server, ovn-controller will trigger a full recompute.

Expected results:

MAC_Binding updates should be processed incrementally even when SB DB txn is NULL.

Additional info:

Fixed by upstream commit:
https://github.com/ovn-org/ovn/commit/e2ab60e3a7c60f3adb8da40e4d1cfeb890d6f80e
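For readers unfamiliar with the engine, the difference between the old and the expected behaviour can be sketched as a toy Python 3 model. This is not the ovn-controller code (which is written in C) and not the content of the upstream commit; the names Handler, engine_run_old and engine_run_new are hypothetical and only illustrate the rule stated in the description: a missing SB DB txn should force a recompute only when handling the change actually needs to write to the SB DB.

```python
from collections import namedtuple

# Hypothetical model of a change handler: 'needs_sb_write' says whether
# processing this change incrementally must write to the SB DB.
Handler = namedtuple("Handler", ["name", "needs_sb_write"])


def engine_run_old(handlers, sb_txn):
    """Toy model of the reported behaviour: no txn means the engine does not run."""
    if sb_txn is None:
        # Corresponds to "engine did not run, force recompute next time".
        return "recompute"
    return "incremental"


def engine_run_new(handlers, sb_txn):
    """Toy model of the expected behaviour: run anyway, give up only on a write."""
    for handler in handlers:
        if handler.needs_sb_write and sb_txn is None:
            return "recompute"
    return "incremental"


# A MAC_Binding update that, per the report, needs no SB DB write.
mac_binding_update = [Handler("mac_binding", needs_sb_write=False)]
```

With sb_txn set to None, engine_run_old(mac_binding_update, None) falls back to a recompute, while engine_run_new(mac_binding_update, None) stays incremental.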
Tried to reproduce on ovn2.12.0-19 with the following steps:

#!/bin/bash
systemctl restart openvswitch
systemctl restart ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external-ids:system_id=hv1 external-ids:ovn-remote=tcp:20.0.30.25:6642 external-ids:ovn-encap-type=geneve external-ids:ovn-encap-ip=20.0.30.25
systemctl restart ovn-controller

ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lrp1 00:01:02:00:02:01 192.168.0.254/24 2001::a/64
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-options ls1-lr1 router-port=lrp1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:01:02:00:02:01 192.168.0.254 2001::a"
ovn-nbctl lsp-add ls1 lsp1
ovn-nbctl set Logical-Switch ls1 other_config:subnet=192.168.0.0/16
ovn-nbctl set Logical-switch ls1 other_config:ipv6_prefix=2001::0
ovn-nbctl lsp-set-addresses lsp1 "00:01:02:00:02:02 192.168.0.1 2001::1"

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip netns add server0
ip link set vm1 netns server0
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set vm1 up
ip netns exec server0 ip link set vm1 address 00:01:02:00:02:02
ip netns exec server0 ip addr add 192.168.0.1/24 dev vm1
ip netns exec server0 ip addr add 2001::1/64 dev vm1
ovs-vsctl set Interface vm1 external_ids:iface-id=lsp1

ovn-nbctl lsp-add ls1 lsp2
ovn-nbctl lsp-set-addresses lsp2 "00:01:02:00:02:03 192.168.0.2 2001::2"
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip netns add server1
ip link set vm2 netns server1
ip netns exec server1 ip link set lo up
ip netns exec server1 ip link set vm2 up
ip netns exec server1 ip link set vm2 address 00:01:02:00:02:03
ip netns exec server1 ip addr add 192.168.0.2/24 dev vm2
ip netns exec server1 ip addr add 2001::2/64 dev vm2
ovs-vsctl set Interface vm2 external_ids:iface-id=lsp2

ovn-appctl -t ovn-controller vlog/set DBG
ip netns exec server0 python garp.py

[root@dell-per740-12 bz1787360]# cat garp.py
from scapy.all import *
sendp(Ether(src="00:01:02:00:02:02",dst="ff:ff:ff:ff:ff:ff")/ARP(op=1,hwsrc="00:01:02:00:02:02",hwdst="00:00:00:00:00:00",psrc="192.168.0.1",pdst="192.168.0.1"),iface="vm1")

Log on ovn-2.12.0-19:

[root@dell-per740-12 bz1787360]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.12-host-2.12.0-19.el7fdp.x86_64
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-central-2.12.0-19.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
ovn2.12-2.12.0-19.el7fdp.x86_64
[root@dell-per740-12 bz1787360]# grep "engine did not run, force recompute next time" /var/log/ovn/ovn-controller.log -c
479

<==== several lines of "engine did not run, force recompute next time"

Then on 2.12.0-26:

[root@dell-per740-12 bz1787360]# grep "engine did not run, force recompute next time" /var/log/ovn/ovn-controller.log -c
1

<=== only one line of "engine did not run, force recompute next time"

Dumitru, does the above result verify the issue?
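The grep -c check above can also be scripted, for example to compare the two package versions automatically. A minimal Python 3 sketch, assuming the log message text quoted in this report; the function name count_forced_recomputes is hypothetical:

```python
# Message logged by ovn-controller when it falls back to a full recompute,
# copied from the grep commands in this report.
MESSAGE = "engine did not run, force recompute next time"


def count_forced_recomputes(lines):
    """Count the lines containing MESSAGE, like 'grep -c' with a fixed string."""
    return sum(1 for line in lines if MESSAGE in line)
```

To use it on a host, pass an open file object, e.g. count_forced_recomputes(open("/var/log/ovn/ovn-controller.log", errors="replace")).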
(In reply to Jianlin Shi from comment #3)
> [root@dell-per740-12 bz1787360]# grep "engine did not run, force recompute
> next time" /var/log/ovn/ovn-controller.log -c
> 479
>
> <==== several lines of "engine did not run, force recompute next time"
>
> then on 2.12.0-26:
>
> [root@dell-per740-12 bz1787360]# grep "engine did not run, force recompute
> next time" /var/log/ovn/ovn-controller.log -c
> 1
>
> <=== only one line of "engine did not run, force recompute next time"
>
> Dumitru, does above result verify the issue?

Looks good to me.

Thanks,
Dumitru
Set to VERIFIED per comment 4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0752