Bug 1962345 - [SCALE] ovn-controller physical flow recalculation due to CT zone change
Summary: [SCALE] ovn-controller physical flow recalculation due to CT zone change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: FDP 21.I
Assignee: Numan Siddique
QA Contact: Ehsan Elahi
URL:
Whiteboard: perfscale-ovn
Depends On:
Blocks: 1962344
Reported: 2021-05-19 19:08 UTC by Tim Rozet
Modified: 2021-12-09 15:37 UTC (History)

Fixed In Version: ovn21.09-21.09.0-11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1962344
Environment:
Last Closed: 2021-12-09 15:37:27 UTC
Target Upstream Version:
Embargoed:


Attachments
OVS conf db (417.07 KB, text/plain), 2021-06-01 14:19 UTC, Dumitru Ceara
NB DB (2.85 MB, application/gzip), 2021-06-01 14:20 UTC, Dumitru Ceara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1320 0 None None None 2021-10-19 13:29:55 UTC
Red Hat Product Errata RHBA-2021:5059 0 None None None 2021-12-09 15:37:56 UTC

Description Tim Rozet 2021-05-19 19:08:04 UTC
+++ This bug was initially created as a clone of Bug #1962344 +++

Description of problem:
At a scale of 300 nodes, in a steady-state cluster, we see ovn-controller running with poll intervals of 20 to 30 seconds. According to Dumitru, the cause is that a change to any SB datapath triggers a recompute of the ct_zones and physical_flow_changes engine nodes.

https://github.com/ovn-org/ovn/commit/f9cab11d5fabe2ae321a3b4bad5972b61df958c0

https://github.com/ovn-org/ovn/commit/f9cab11d5fabe2ae321a3b4bad5972b61df958c0#diff-220cd89c1bf69b5cf68c6e9ea377[…]61c58aaf871f29f13d8cccd6cff1R2392

2021-05-19T18:27:02Z|03612|inc_proc_eng|DBG|node: ct_zones, recompute (triggered)
2021-05-19T18:27:02Z|03613|inc_proc_eng|DBG|controller/ovn-controller.c:1590: node: ct_zones, old_state Stale, new_state Updated
2021-05-19T18:27:02Z|03614|inc_proc_eng|DBG|node: physical_flow_changes, handle change for input ct_zones
2021-05-19T18:27:02Z|03615|inc_proc_eng|DBG|node: physical_flow_changes, can't handle change for input ct_zones, fall back to recompute
2021-05-19T18:27:02Z|03616|inc_proc_eng|DBG|node: physical_flow_changes, recompute (triggered)
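
To see how often this happens on a node, the fallback messages can be tallied per engine node and input. The following is a minimal sketch, not part of OVN: the function name is hypothetical, and the message format is assumed to match the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: reads ovn-controller debug log lines on stdin and
# prints "<count> <node> <- <input>" for every incremental-processing
# fallback, most frequent first. The pattern is taken from the
# "can't handle change for input ..., fall back to recompute" lines above.
count_recompute_fallbacks() {
    grep 'fall back to recompute' |
        sed -n 's/.*node: \([^,]*\), can.t handle change for input \([^,]*\),.*/\1 <- \2/p' |
        sort | uniq -c | sort -rn
}
```

Example use, assuming the log lives at /var/log/ovn/ovn-controller.log and that debug logging for the inc_proc_eng module has been enabled (for instance with "ovn-appctl -t ovn-controller vlog/set inc_proc_eng:file:dbg"): count_recompute_fallbacks < /var/log/ovn/ovn-controller.log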

Comment 1 Tim Rozet 2021-05-19 19:11:01 UTC
Please also add an INFO-level log message to ovn-controller that tells us when a full recompute is triggered because a change could not be handled incrementally. Please also name the culprit in that message.

Comment 2 Numan Siddique 2021-06-01 13:39:42 UTC
The patches are posted for review - https://patchwork.ozlabs.org/project/ovn/list/?series=246413

Comment 3 Dumitru Ceara 2021-06-01 14:19:45 UTC
Created attachment 1788514 [details]
OVS conf db

Comment 4 Dumitru Ceara 2021-06-01 14:20:22 UTC
Created attachment 1788515 [details]
NB DB

Comment 6 Numan Siddique 2021-06-28 19:55:57 UTC
The merged patches, which split the logical flow and physical flow processing, would definitely help.
But I think we still do a full recompute of the physical flow engine for ct zone changes.

So I am moving the BZ back to ASSIGNED. I'll start looking into how to avoid a full recompute for ct zone changes.

Comment 7 Numan Siddique 2021-07-16 15:10:42 UTC
Patches up for review - https://patchwork.ozlabs.org/project/ovn/list/?series=253841

Comment 12 Ehsan Elahi 2021-10-26 23:20:59 UTC
(In reply to Numan Siddique from comment #11)
> (In reply to Ehsan Elahi from comment #10)
> > On ovn-2021-21.09.0-12.el8fdp, it does not seem to work. When a port is
> > deleted, flow recalculation is not triggered until the port_binding is
> > explicitly destroyed.
> > 
> > # ovn-nbctl lsp-del ls1p1
> > 
> > # sw_dpkey=$(ovn-sbctl  --bare --columns tunnel_key list datapath_binding ls1)
> > # lp_dpkey=$(ovn-sbctl  --bare --columns tunnel_key list port_binding ls1p1)
> > 
> > # ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
> > 1
> > # ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
> > "1"
> > 
> > # ls1p1_uuid=$(ovn-sbctl find port_binding logical_port=ls1p1 | awk '/_uuid/{print $3}')
> > # ovn-sbctl destroy port_binding $ls1p1_uuid
> > 
> > # ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
> > 0
> > 
> > # ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
> > ovs-vsctl: no key "ct-zone-ls1p1" in Bridge record "br-int" column external_ids
> > 
> > Am I missing something?
> 
> 
> This BZ improves the performance of ovn-controller in general.
> 
> Also, ideally you should not destroy a port_binding yourself.
> When you delete a logical port (ovn-nbctl lsp-del ls1p1),
> ovn-northd should delete the port_binding in the southbound DB.
> 
> It is strange that this is not happening in your case.
> 
> Can you please confirm whether ovn-northd is running?
> 
> If you can reproduce this scenario, I can take a quick look.
> 
> Thanks

Yes, you were right, Numan. I had used "ovn-appctl -t ovn-northd pause" earlier but forgot to run "ovn-appctl -t ovn-northd resume", which is why the recalculation did not occur.
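
A small guard can catch this before running a verification. This is a hedged sketch: the function name is hypothetical, and it assumes that "ovn-appctl -t ovn-northd is-paused" prints "true" or "false".

```shell
#!/bin/sh
# Hypothetical guard: $1 is the text printed by
#   ovn-appctl -t ovn-northd is-paused
# (assumed to be "true" or "false"). Returns 0 only when northd is
# known to be running; anything else is treated as "not safe to test".
require_northd_running() {
    case "$1" in
        false)
            return 0
            ;;
        *)
            echo "ovn-northd is paused or unreachable; run: ovn-appctl -t ovn-northd resume" >&2
            return 1
            ;;
    esac
}
```

Example use at the top of a test script: require_northd_running "$(ovn-appctl -t ovn-northd is-paused)" || exit 1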

Verified on ovn-2021-21.09.0-12.el8fdp.x86_64.
ovn-controller triggers recomputation of the flows as soon as any change occurs.

systemctl start openvswitch
systemctl start ovn-northd 
ovn-nbctl set-connection ptcp:6641                                       
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:42.42.42.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=42.42.42.1
systemctl restart ovn-controller

ovn-nbctl ls-add ls1                                                                   
ovn-nbctl lsp-add ls1 ls1p1                                                                     
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2 
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:01:02 192.168.1.2 2001::2"

ovs-vsctl add-port br-int ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2

ip netns add ls1p1                     
ip link set ls1p1 netns ls1p1              
ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:01:01:01
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip addr add 192.168.1.1/24 dev ls1p1
ip netns exec ls1p1 ip addr add 2001::1/64 dev ls1p1

ip netns add ls1p2                     
ip link set ls1p2 netns ls1p2              
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02
ip netns exec ls1p2 ip link set ls1p2 up
ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2
ip netns exec ls1p2 ip addr add 2001::2/64 dev ls1p2

ip netns exec ls1p1 ping 192.168.1.2 -c 3

sw_dpkey=$(ovn-sbctl  --bare --columns tunnel_key list datapath_binding ls1)
lp_dpkey=$(ovn-sbctl  --bare --columns tunnel_key list port_binding ls1p1)

ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
1
ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
"1"

ovn-nbctl lsp-del ls1p1

ovs-ofctl dump-flows br-int table=38,metadata=${sw_dpkey},reg15=0x${lp_dpkey} | grep REG13 | wc -l
0
ovs-vsctl get bridge br-int external_ids:ct-zone-ls1p1
ovs-vsctl: no key "ct-zone-ls1p1" in Bridge record "br-int" column external_ids
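
One caveat about the match used above: ovn-sbctl prints tunnel_key in decimal, while reg15=0x${lp_dpkey} reads that value as hexadecimal. The two agree only for keys below 10, as in this two-port setup. The following sketch builds the same match with an explicit conversion. The function name is hypothetical, and table 38 is simply the table number used in this test, which may differ in other OVN versions.

```shell
#!/bin/sh
# Hypothetical helper: build the ovs-ofctl match string from decimal
# tunnel keys, converting both to hexadecimal.
#   $1 = datapath tunnel_key (decimal)
#   $2 = port tunnel_key (decimal)
flow_match() {
    printf 'table=38,metadata=0x%x,reg15=0x%x\n' "$1" "$2"
}
```

Example use: ovs-ofctl dump-flows br-int "$(flow_match "$sw_dpkey" "$lp_dpkey")" | grep REG13 | wc -l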

Comment 14 errata-xmlrpc 2021-12-09 15:37:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5059

