Description of problem:

The ovsdbapp library queues up transactions that still need to be processed [0]. While investigating the bug [1] we noticed that, due to incorrect behavior, the router gateway ports started to flap between different nodes, which generated a huge number of transactions (bound/unbound/bound/unbound). We believe this caused the ovsdbapp queue to grow until there was no memory left on the system and the process was OOM-killed. This BZ is about investigating such issues.

[0] https://github.com/openstack/ovsdbapp/blob/bc06517ba37037cc21699c5e34a18b21379ea9f0/ovsdbapp/backend/ovs_idl/connection.py#L38-L39
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1670794

Version-Release number of selected component (if applicable):
OSP 13

Steps to Reproduce:
It's possible to mimic the port flap problem in a more constrained environment by:

1. Clearing the port_binding chassis column (of a port of type "chassisredirect") in the SBDB in a loop. That will result in a lot of transactions being created, for example:

   while true; do ovn-sbctl clear port_binding ae2db6eb-26d1-401e-aa0f-c866ec9974f3 chassis; done

Actual results:
Memory consumption can grow indefinitely.

Expected results:
Something to discuss. We believe we could cap the size of the queue, and if it grows beyond that number, perhaps ovsdbapp could do a full sync instead of processing each transaction in the queue?
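As a starting point for the discussion under "Expected results", here is a minimal Python sketch of what a capped queue with a full-sync fallback could look like. The class name BoundedTransactionQueue, the MAX_PENDING_TXNS constant, and the needs_full_sync flag are all hypothetical and are not part of the ovsdbapp API; the actual (unbounded in this respect) queue lives in connection.py [0].

    import queue
    import threading

    # Hypothetical cap on the number of queued transactions.
    MAX_PENDING_TXNS = 1000


    class BoundedTransactionQueue:
        """Sketch of a transaction queue that drops its backlog and asks
        for a full resync once it grows past a configured limit."""

        def __init__(self, maxsize=MAX_PENDING_TXNS):
            self._queue = queue.Queue()
            self._maxsize = maxsize
            self._lock = threading.Lock()
            # Hypothetical flag the consumer would check before processing.
            self.needs_full_sync = False

        def put(self, txn):
            with self._lock:
                if self._queue.qsize() >= self._maxsize:
                    # Instead of letting memory grow without bound, discard
                    # the backlog and signal that a full resync is required.
                    while True:
                        try:
                            self._queue.get_nowait()
                        except queue.Empty:
                            break
                    self.needs_full_sync = True
                    return
                self._queue.put(txn)

        def get(self, timeout=None):
            return self._queue.get(timeout=timeout)

Whether a full resync is actually cheaper than draining a very long backlog is part of what would need to be measured here.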
Closing this since there really isn't much we can do about it. It isn't a bug that it is technically possible to send commands faster than they can be processed. It's also possible, at an even lower level, to stream arbitrarily long JSON strings until ovsdb-server runs out of memory. When I tried to fix that in upstream OVS, it was decided that adding an arbitrary cap wasn't a good idea either. If there is a real-world situation where we get so far behind on processing that the queue grows large enough to matter, something is *super* broken.