Bug 2069623
| Summary: | [OVN SCALE][ovn-northd] Unnecessary SB record no-op changes added to SB transaction. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Dumitru Ceara <dceara> | ||||
| Component: | OVN | Assignee: | Dumitru Ceara <dceara> | ||||
| Status: | CLOSED ERRATA | QA Contact: | ying xu <yinxu> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | FDP 22.B | CC: | ctrautma, jiji, jishi, mmichels, surya | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | ovn22.03-22.03.0-24.el8fdp ovn22.03-22.03.0-24.el9fdp | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 2073473 (view as bug list) | Environment: | |||||
| Last Closed: | 2022-06-30 18:00:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 2073473 | ||||||
| Attachments: |
|
||||||
|
Description
Dumitru Ceara
2022-03-29 09:37:38 UTC
v2 posted for review: https://patchwork.ozlabs.org/project/ovn/list/?series=292465&state=* Posted patch to add support for change tracking of write-only columns in the IDL instead: https://patchwork.ozlabs.org/project/openvswitch/list/?series=293071&state=* This is a more generic solution. Moving back to ASSIGNED until the OVS patch is accepted and a patch to bump the OVS submodule in OVN is posted. We are testing https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/ which delays setting port-up:true in case SBDB is readonly. Quoting Dumitru: It's just to avoid some unnecessary full recomputes locally. Right, if ovn-controller wakes up and processes some changes incrementally, if the result of that processing means it has to write something in the SB but the previous transaction it had sent to the SB is still in progress then we abort the incremental processing and trigger a full recompute (of everything) after the in-flight transaction is completed. We did four runs: 1st run: 8.26p99/4.44avg 2nd run: 7.5 P99 4.12avg 3rd run with logging: 7.77 P99 / 4.22 avg 4th run with logging: 8.16 P99 / 4.39 avg $ grep ovn-installed ./ovnkube-node-*/ovn-controller/ovn-controller/logs/* | grep -oE 'delayed by [0-9]+ msec' | sort -h -k 3 | tail -50 delayed by 80 msec delayed by 80 msec delayed by 82 msec delayed by 85 msec delayed by 85 msec delayed by 85 msec delayed by 87 msec delayed by 87 msec delayed by 87 msec delayed by 88 msec delayed by 88 msec delayed by 89 msec delayed by 91 msec delayed by 94 msec delayed by 95 msec delayed by 96 msec delayed by 97 msec delayed by 99 msec delayed by 102 msec delayed by 108 msec delayed by 114 msec delayed by 118 msec delayed by 119 msec delayed by 121 msec delayed by 123 msec delayed by 123 msec delayed by 132 msec delayed by 136 msec delayed by 139 msec delayed by 139 msec delayed by 147 msec delayed by 147 msec delayed by 147 msec delayed by 147 msec delayed by 149 msec delayed by 151 msec delayed by 156 msec delayed by 159 msec delayed by 168 msec delayed by 170 msec delayed by 170 msec delayed by 170 msec delayed by 171 msec delayed by 176 msec delayed by 176 msec delayed by 183 msec delayed by 208 msec delayed by 208 msec delayed by 235 msec delayed by 525 msec Xavier/Dumitru is investigating the delay on their side. Probably might not get in before FF, but will get in before CF. (In reply to Surya Seetharaman from comment #4) > We are testing > https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1- > xsimonar/ which delays setting port-up:true in case SBDB is > readonly. > I think there might be some confusion. Xavier's patch is independent of the fix for this BZ. We were just scale testing it in a build which included the fix for bug 2069623 too. The proper fix for bug 2069623 is under review upstream and has two parts: 1. changing the IDL to support change-tracking of write-only columns: https://patchwork.ozlabs.org/project/openvswitch/list/?series=293071&state=* 2. changing ovn-northd to use the new IDL mode. > Quoting Dumitru: > It's just to avoid some unnecessary full recomputes locally. Right, if > ovn-controller wakes up and processes some changes incrementally, if the > result of that processing means it has to write something in the SB but the > previous transaction it had sent to the SB is still in progress then we > abort the incremental processing and trigger a full recompute (of > everything) after the in-flight transaction is completed. > [...] > > Xavier/Dumitru is investigating the delay on their side. Probably might not > get in before FF, but will get in before CF. For bug 2069623 we don't need Xavier's change. We just need the two parts mentioned above to happen before OCP code freeze. (In reply to Dumitru Ceara from comment #5) > (In reply to Surya Seetharaman from comment #4) > > We are testing > > https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1- > > xsimonar/ which delays setting port-up:true in case SBDB is > > readonly. > > > > I think there might be some confusion. Xavier's patch is independent of > the fix for this BZ. We were just scale testing it in a build which > included the fix for bug 2069623 too. > > The proper fix for bug 2069623 is under review upstream and has two > parts: > > 1. changing the IDL to support change-tracking of write-only columns: > https://patchwork.ozlabs.org/project/openvswitch/list/?series=293071&state=* > 2. changing ovn-northd to use the new IDL mode. > > > Quoting Dumitru: > > It's just to avoid some unnecessary full recomputes locally. Right, if > > ovn-controller wakes up and processes some changes incrementally, if the > > result of that processing means it has to write something in the SB but the > > previous transaction it had sent to the SB is still in progress then we > > abort the incremental processing and trigger a full recompute (of > > everything) after the in-flight transaction is completed. > > > > [...] > > > > > Xavier/Dumitru is investigating the delay on their side. Probably might not > > get in before FF, but will get in before CF. > > For bug 2069623 we don't need Xavier's change. We just need the two > parts mentioned above to happen before OCP code freeze. Thanks Dumitru, I was the confused one, clearly I have too many things clouding my judgement and mixing up bz-es/prs v2 of the IDL patch: https://patchwork.ozlabs.org/project/openvswitch/list/?series=295369&state=* v3 of the IDL patch: https://patchwork.ozlabs.org/project/openvswitch/list/?series=296026&state=* v4 of the IDL patch: https://patchwork.ozlabs.org/project/openvswitch/list/?series=296892&state=* IDL patch was accepted, posted OVN patch to use the new IDL mode: https://patchwork.ozlabs.org/project/ovn/list/?series=297426&state=* The patch to bump the OVS submodule was accepted upstream: https://github.com/ovn-org/ovn/commit/f93206ce40ae38f26ffaf69c5d79a63abfbd5e3c And backported downstream to: ovn22.03-22.03.0-24.el8fdp ovn22.03-22.03.0-24.el9fdp reproduced on version:
# rpm -qa|grep ovn
ovn22.03-central-22.03.0-22.el8fdp.x86_64
ovn22.03-host-22.03.0-22.el8fdp.x86_64
ovn22.03-22.03.0-22.el8fdp.x86_64
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
ovs-vsctl set open . external_ids:ovn-monitor-all=true
ovn-nbctl ls-add ls -- lsp-add ls lsp
sleep 2
ovn-appctl -t ovn-northd vlog/disable-rate-limit
ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg
sleep 5
> /var/log/ovn/ovn-northd.log
ovn-nbctl --wait=sb sync
sleep 5
cat /var/log/ovn/ovn-northd.log |grep OVN_Southbound
cat /var/log/ovn/ovn-northd.log |grep OVN_Southbound'
2022-06-13T03:04:55.054Z|00015|jsonrpc|DBG|unix:/run/ovn/ovnsb_db.sock: send request, method="transact", params=["OVN_Southbound",{"where":[["_uuid","==",["uuid","2a210b2d-6365-4a29-92b4-d31f070dbca3"]]],"row":{"ports":["uuid","68807ccd-7c12-4b9e-bdb9-941b40b58d1c"]},"op":"update","table":"Multicast_Group"},{"where":[["_uuid","==",["uuid","9d3f9379-6535-4aaf-addf-ea214cd08904"]]],"row":{"query_max_resp":1,"query_interval":150,"datapath":["uuid","c5330e17-6e9f-49e1-927a-d3e8cd9270e9"],"table_size":2048,"querier":true,"idle_timeout":300,"enabled":false},"op":"update","table":"IP_Multicast"},{"where":[["_uuid","==",["uuid","68807ccd-7c12-4b9e-bdb9-941b40b58d1c"]]],"row":{"parent_port":["set",[]],"mac":["set",[]],"external_ids":["map",[]],"options":["map",[]],"nat_addresses":["set",[]],"type":"","datapath":["uuid","c5330e17-6e9f-49e1-927a-d3e8cd9270e9"],"tag":["set",[]]},"op":"update","table":"Port_Binding"},{"where":[["_uuid","==",["uuid","814e67e7-2104-4cf8-ae34-fef88f361d54"]]],"row":{"ports":["uuid","68807ccd-7c12-4b9e-bdb9-941b40b58d1c"]},"op":"update","table":"Multicast_Group"},{"where":[["_uuid","==",["uuid","b9be81e3-a57e-4ee9-a2a6-a141a407ae5a"]]],"row":{"addresses":"42:ff:bf:e2:aa:ec"},"op":"update","table":"Address_Set"},{"where":[["_uuid","==",["uuid","d318b1df-dc64-4839-b311-71b3229a1425"]]],"row":{"nb_cfg":1,"options":["map",[["mac_prefix","5a:0b:d4"],["max_tunid","16711680"],["northd_internal_version","22.03.1-20.21.0-58.3"],["svc_monitor_mac","42:ff:bf:e2:aa:ec"]]]},"op":"update","table":"SB_Global"},{"where":[["_uuid","==",["uuid","c5330e17-6e9f-49e1-927a-d3e8cd9270e9"]]],"row":{"external_ids":["map",[["logical-switch","79ed4e8e-0836-46bb-b37b-0a968fb7c5cd"],["name","ls"]]]},"op":"update","table":"Datapath_Binding"},{"comment":"ovn-northd","op":"comment"},{"lock":"ovn_northd","op":"assert"}], id=19
verified on version:
ovn22.03-central-22.03.0-52.el8fdp.x86_64
ovn22.03-host-22.03.0-52.el8fdp.x86_64
ovn22.03-22.03.0-52.el8fdp.x86_64
cat /var/log/ovn/ovn-northd.log |grep OVN_Southbound'
2022-06-13T02:27:46.536Z|00015|jsonrpc|DBG|unix:/run/ovn/ovnsb_db.sock: send request, method="transact", params=["OVN_Southbound",{"where":[["_uuid","==",["uuid","96cdaf1c-9453-446a-aca7-587e0e758bae"]]],"row":{"nb_cfg":1},"op":"update","table":"SB_Global"},{"comment":"ovn-northd","op":"comment"},{"lock":"ovn_northd","op":"assert"}], id=18
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn22.03 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5447 |