Description of problem:
The python-ovs library does not support monitor_cond_since/update3. For OSP, this means that on large deployments, where there can be 600+ connections to the SB ovsdb-server, failing over ovn-dbs-bundle breaks all of those connections and makes the single-threaded ovsdb-server dump 600+ copies of the db simultaneously. It can take over 40 minutes for this process to complete and for the system to become usable.

Version-Release number of selected component (if applicable):
ovs master as of 2021-04-12

How reproducible:
100%

Steps to Reproduce:
1. Initialize an ovsdb connection with python-ovs
2. Call force_reconnect()

Actual results:
All rows from all registered tables are transferred to the client

Expected results:
Only modifications since the latest successful request are transferred to the client

Additional info:
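To illustrate the expected behaviour, here is a minimal, library-independent sketch of the monitor_cond_since exchange as described in ovsdb-server(7): the client sends the last transaction id it has seen, and the reply is [found, last-txn-id, table-updates2]. The helper names, the "mon-1" monitor id, and the dict-based cache are hypothetical and are not python-ovs code. The "modify" handling is deliberately simplified (real update2 diffs need per-type handling for sets and maps).

```python
import json

# An all-zero id tells the server the client has no usable history.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def build_monitor_cond_since(db, monitor_id, tables, last_txn_id=None,
                             request_id=1):
    """Build a JSON-RPC monitor_cond_since request (hypothetical helper).

    tables maps a table name to the list of columns to monitor.
    """
    requests = {name: [{"columns": cols}] for name, cols in tables.items()}
    return {
        "method": "monitor_cond_since",
        "params": [db, monitor_id, requests, last_txn_id or ZERO_UUID],
        "id": request_id,
    }


def apply_monitor_reply(cache, result):
    """Apply a [found, last_txn_id, updates2] reply to a local cache.

    Returns the transaction id to send on the next reconnect.
    """
    found, last_txn_id, updates = result
    if not found:
        # The server could not find our last id, so the reply carries the
        # whole database contents and the stale local copy must be dropped.
        cache.clear()
    for table, rows in updates.items():
        table_cache = cache.setdefault(table, {})
        for uuid, change in rows.items():
            if "delete" in change:
                table_cache.pop(uuid, None)
            elif "modify" in change:
                # Simplification: treat the diff as plain column replacement.
                table_cache.setdefault(uuid, {}).update(change["modify"])
            else:
                row = change["initial"] if "initial" in change else change["insert"]
                table_cache[uuid] = dict(row)
    return last_txn_id


if __name__ == "__main__":
    request = build_monitor_cond_since(
        "OVN_Southbound", "mon-1", {"Chassis": ["name"]})
    print(json.dumps(request))
```

With found set to true, only the rows changed since the supplied transaction id are sent, which is the difference between the actual and expected results above.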
Cloned this bug for the necessary changes to ovsdb. But the python bits will need updates for monitor_cond_since too, so over to python-networking-ovn for that. These two things can proceed in parallel.
@Terry, if I'm not mistaken, the only change required here is in the OVS component: adding support for 'update3' in its Python library, right? No changes are required on the OSP side, not even in ovsdb-server. Can you please confirm and select the right component?
This should just be the openvswitch component. Specifically the Python ovs library in the openvswitch component (ovs/db/idl.py). There is nothing to do on the neutron/networking-ovn side.
(In reply to Terry Wilson from comment #5)
> This should just be the openvswitch component. Specifically the Python ovs
> library in the openvswitch component (ovs/db/idl.py). There is nothing to do
> on the neutron/networking-ovn side.

Fair enough. Should we assign this to you, since you and Ihar have the most relevant commits to that code in OVS and nobody else on the OVS/OVN team does?
Dan created https://bugzilla.redhat.com/show_bug.cgi?id=1957273 to track work on ovsdb. This BZ will be used to track Python OVS library.
Sure. I have a patch started, just working on fixing tests.
(In reply to Terry Wilson from comment #8)
> Sure. I have a patch started, just working on fixing tests.

Can you please post an update on the status of the patch and tests?
It was determined that I should focus on getting raft support tested and migration working, so I haven't looked at this in a while. There were quite a few test failures, and most of them looked like they were due to the graceful fallback from monitor_cond_since -> monitor_cond -> monitor. I'll see if I can at least get a patch up on a GitHub fork to give someone else a head start if this needs to progress without me.
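For anyone picking this up, the fallback chain mentioned above can be pictured roughly as follows. This is a simplified sketch, not the state machine in ovs/db/idl.py: send_request is a hypothetical callable that issues one JSON-RPC request and returns the decoded reply, and the assumption is that a server lacking a method answers with a non-null "error" member.

```python
# Monitor methods from newest to oldest; the client tries each in turn.
MONITOR_METHODS = ("monitor_cond_since", "monitor_cond", "monitor")


def negotiate_monitor(send_request):
    """Return (method, result) for the first monitor method the server accepts.

    send_request(method) is a hypothetical callable returning a dict with
    "result" and "error" keys, as in a JSON-RPC reply.
    """
    for method in MONITOR_METHODS:
        reply = send_request(method)
        if reply.get("error") is None:
            return method, reply["result"]
        # Otherwise assume the server does not know this method and
        # fall back to the next, older one.
    raise RuntimeError("server rejected every monitor method")


if __name__ == "__main__":
    def old_server(method):
        # Stand-in for a server that only implements plain "monitor".
        if method == "monitor":
            return {"result": {}, "error": None}
        return {"result": None, "error": "unknown method"}

    print(negotiate_monitor(old_server)[0])
```

Each step in the chain changes the shape of the reply the client has to parse (monitor_cond_since returns a three-element array, the older methods return table updates directly), which is one plausible place for tests to diverge.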
Upstream patch: https://patchwork.ozlabs.org/project/openvswitch/patch/20211201175120.3229612-1-twilson@redhat.com/
Other patches related to the monitor-cond-since support patch:

4e3966e64 python: Politely handle misuse of table.condition
d29491eeb python: idl: Set cond_changed to true if condition change requested.
e3de0bd82 python: idl: Set cond_changed to false if last id is zero.
6de8868d1 reconnect: Fix broken inactivity probe if there is no other reason to wake up.
5202710a7 python: idl: Clear last_id on reconnect if condition changes in-flight.
718dc8fca python: idl: Resend requested but not acked conditions when reconnecting.
46d44cf3b python: idl: Add monitor_cond_since support.

All of these appear to be in branch-2.17.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.17 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5445