+++ This bug was initially created as a clone of Bug #1962344 +++

Description of problem:

At a scale of 300 nodes, in a steady-state cluster, we see ovn-controller running with poll intervals of anywhere between 20 and 30 seconds. According to Dumitru, the cause is that a change to any SB datapath triggers a recompute of the ct_zones and physical_flow_changes engine nodes:

https://github.com/ovn-org/ovn/commit/f9cab11d5fabe2ae321a3b4bad5972b61df958c0
https://github.com/ovn-org/ovn/commit/f9cab11d5fabe2ae321a3b4bad5972b61df958c0#diff-220cd89c1bf69b5cf68c6e9ea377[…]61c58aaf871f29f13d8cccd6cff1R2392

2021-05-19T18:27:02Z|03612|inc_proc_eng|DBG|node: ct_zones, recompute (triggered)
2021-05-19T18:27:02Z|03613|inc_proc_eng|DBG|controller/ovn-controller.c:1590: node: ct_zones, old_state Stale, new_state Updated
2021-05-19T18:27:02Z|03614|inc_proc_eng|DBG|node: physical_flow_changes, handle change for input ct_zones
2021-05-19T18:27:02Z|03615|inc_proc_eng|DBG|node: physical_flow_changes, can't handle change for input ct_zones, fall back to recompute
2021-05-19T18:27:02Z|03616|inc_proc_eng|DBG|node: physical_flow_changes, recompute (triggered)
Please also add an INFO-level log message to ovn-controller that reports when a full recompute is triggered because a change could not be handled incrementally. Please also have it name the culprit (for example, the engine node and the input that could not be handled).
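The culprit can already be recovered from inc_proc_eng debug lines like the excerpt above, by counting how often each engine node falls back to a recompute and for which input. What follows is a minimal sketch. It assumes that debug logging for the inc_proc_eng module is enabled and that log lines follow the format shown in the excerpt. The helper name summarize_recomputes and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Tally "fall back to recompute" events from ovn-controller's incremental
# processing engine (inc_proc_eng) debug log, grouped by node and input.
# Assumes inc_proc_eng debug logging is enabled, e.g.:
#   ovn-appctl -t ovn-controller vlog/set inc_proc_eng:file:dbg
# and that lines look like the excerpt in this bug, e.g.:
#   ...|inc_proc_eng|DBG|node: physical_flow_changes, can't handle change for input ct_zones, fall back to recompute
# summarize_recomputes is a hypothetical helper, not part of OVN.
summarize_recomputes() {
    # Emit "<node> <- <input>" for every fallback line read from stdin,
    # then count identical pairs and sort them by frequency (highest first).
    sed -n "s/.*node: \([^,]*\), can't handle change for input \([^,]*\), fall back to recompute.*/\1 <- \2/p" |
        sort | uniq -c | sort -rn
}

# Example (the log path is an assumption; adjust for your deployment):
#   summarize_recomputes < /var/log/ovn/ovn-controller.log
```

Each output line gives a count followed by "node <- input". A node such as physical_flow_changes that appears repeatedly with input ct_zones points at the same fallback described above.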
The patches have been posted for review: https://patchwork.ozlabs.org/project/ovn/list/?series=246413
Created attachment 1788514 (OVS conf db)
Created attachment 1788515 (NB DB)
The merged patches, which split the logical flow and physical flow processing, would definitely help. But I think we still do a full recompute of the physical flow engine node for ct zone changes, so I'm moving the BZ back to ASSIGNED. I'll start looking into this issue so that ct zone changes no longer trigger a full recompute.
Patches are up for review: https://patchwork.ozlabs.org/project/ovn/list/?series=253841
(In reply to Numan Siddique from comment #11)
> (In reply to Ehsan Elahi from comment #10)
> > On ovn-2021-21.09.0-12.el8fdp, it does not seem to work. When a port is
> > deleted, flow recalculation is not triggered until I explicitly destroy
> > the port_binding.
> >
> > # ovn-nbctl lsp-del ls1p1
> >
> > # sw_dpkey=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding ls1)
> > # lp_dpkey=$(ovn-sbctl --bare --columns tunnel_key list port_binding ls1p1)
> >
> > # ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
> > 1
> > # ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
> > "1"
> >
> > # ls1p1_uuid=$(ovn-sbctl find port_binding logical_port=ls1p1 | awk '/_uuid/{print $3}')
> > # ovn-sbctl destroy port_binding $ls1p1_uuid
> >
> > # ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
> > 0
> >
> > # ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
> > ovs-vsctl: no key "ct-zone-ls1p1" in Bridge record "br-int" column external_ids
> >
> > Am I missing something?
>
> This BZ improves the performance of ovn-controller in general.
>
> Also, ideally you should not destroy a port_binding yourself.
> When you delete a logical port (ovn-nbctl lsp-del ls1p1),
> ovn-northd should delete the port_binding in the SB DB.
>
> It is strange that this is not happening in your case.
>
> Can you please confirm whether ovn-northd is running?
>
> If you can reproduce this scenario, I can take a quick look.
>
> Thanks

Yes, you were right, Numan. I had run "ovn-appctl -t ovn-northd pause" earlier but forgot to run "ovn-appctl -t ovn-northd resume". That is why the recalculation did not occur.

Verified on ovn-2021-21.09.0-12.el8fdp.x86_64: ovn-controller triggers recomputation of the flows as soon as any change occurs.
# systemctl start openvswitch
# systemctl start ovn-northd
# ovn-nbctl set-connection ptcp:6641
# ovn-sbctl set-connection ptcp:6642
# ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:42.42.42.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=42.42.42.1
# systemctl restart ovn-controller

# ovn-nbctl ls-add ls1
# ovn-nbctl lsp-add ls1 ls1p1
# ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01 192.168.1.1 2001::1"
# ovn-nbctl lsp-add ls1 ls1p2
# ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:01:02 192.168.1.2 2001::2"
# ovs-vsctl add-port br-int ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
# ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2

# ip netns add ls1p1
# ip link set ls1p1 netns ls1p1
# ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:01:01:01
# ip netns exec ls1p1 ip link set ls1p1 up
# ip netns exec ls1p1 ip addr add 192.168.1.1/24 dev ls1p1
# ip netns exec ls1p1 ip addr add 2001::1/64 dev ls1p1
# ip netns add ls1p2
# ip link set ls1p2 netns ls1p2
# ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02
# ip netns exec ls1p2 ip link set ls1p2 up
# ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2
# ip netns exec ls1p2 ip addr add 2001::2/64 dev ls1p2
# ip netns exec ls1p1 ping 192.168.1.2 -c 3

# sw_dpkey=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding ls1)
# lp_dpkey=$(ovn-sbctl --bare --columns tunnel_key list port_binding ls1p1)
# ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
1
# ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
"1"

# ovn-nbctl lsp-del ls1p1
# ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
0
# ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
ovs-vsctl: no key "ct-zone-ls1p1" in Bridge record "br-int" column external_ids
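In the commands above, ovn-sbctl prints tunnel_key in decimal, but the dump-flows match puts 0x in front of the port key. The match therefore selects the intended flow only while that key is between 1 and 9. For example, a port key of 10 would become reg15=0x10, which is 16. What follows is a minimal sketch of a hypothetical helper, ct_zone_flow_match, that converts the port key to hex explicitly. Table 38 is copied from the reproducer and is assumed rather than checked against other OVN versions.

```shell
#!/bin/sh
# Build the ovs-ofctl match used above to find a logical port's ct-zone flow.
# ct_zone_flow_match is a hypothetical helper, not part of OVN or OVS.
# ovn-sbctl prints tunnel_key in decimal, so the port key is converted to
# hex with %x instead of prefixing the decimal string with "0x".
# Table 38 comes from the reproducer in this bug and may differ in other OVN versions.
ct_zone_flow_match() {
    # $1: datapath tunnel_key (decimal), $2: port tunnel_key (decimal)
    printf 'table=38,metadata=%d,reg15=0x%x\n' "$1" "$2"
}

# Usage on a live chassis (not run here):
#   sw_dpkey=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding ls1)
#   lp_dpkey=$(ovn-sbctl --bare --columns tunnel_key list port_binding ls1p1)
#   ovs-ofctl dump-flows br-int "$(ct_zone_flow_match "$sw_dpkey" "$lp_dpkey")" | grep -c REG13
```

For the single-digit keys in this reproducer, the result is identical to the original match. The conversion only matters once port tunnel keys reach 10 or more.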
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5059