Description of problem:

The ovsdbapp library queues up transactions that still need to be processed [0]. While investigating the bug [1] we noticed that, due to incorrect behavior, the router gateway ports started to flap between different nodes, which generated a huge number of transactions (bound/unbound/bound/unbound). We believe this caused the ovsdbapp queue to grow until there was no memory left on the system and the process was OOM-killed. This BZ is about investigating such issues.

[0] https://github.com/openstack/ovsdbapp/blob/bc06517ba37037cc21699c5e34a18b21379ea9f0/ovsdbapp/backend/ovs_idl/connection.py#L38-L39
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1670794

Version-Release number of selected component (if applicable):
OSP 13

Steps to Reproduce:
It's possible to mimic the port flap problem in a more constrained environment by:

1. Clearing the port_binding chassis column (of a port of type "chassisredirect") in the SBDB in a loop. That will result in a lot of transactions being created, for example:

   while true; do ovn-sbctl clear port_binding ae2db6eb-26d1-401e-aa0f-c866ec9974f3 chassis; done

Actual results:
Memory consumption can grow indefinitely.

Expected results:
Something to discuss. We believe we could cap the size of the queue, and if it grows beyond that number, perhaps ovsdbapp could do a full sync instead of processing each transaction in the queue?
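As a starting point for the discussion under "Expected results", here is a minimal Python sketch of what a capped queue with a full-sync fallback could look like. The class name BoundedTransactionQueue, the MAX_PENDING_TXNS constant, and the needs_full_sync flag are all hypothetical and are not part of the ovsdbapp API; the actual (unbounded in this respect) queue lives in connection.py [0].

    import queue
    import threading

    # Hypothetical cap on the number of queued transactions.
    MAX_PENDING_TXNS = 1000


    class BoundedTransactionQueue:
        """Sketch of a transaction queue that drops its backlog and asks
        for a full resync once it grows past a configured limit."""

        def __init__(self, maxsize=MAX_PENDING_TXNS):
            self._queue = queue.Queue()
            self._maxsize = maxsize
            self._lock = threading.Lock()
            # Hypothetical flag the consumer would check before processing.
            self.needs_full_sync = False

        def put(self, txn):
            with self._lock:
                if self._queue.qsize() >= self._maxsize:
                    # Instead of letting memory grow without bound, discard
                    # the backlog and signal that a full resync is required.
                    while True:
                        try:
                            self._queue.get_nowait()
                        except queue.Empty:
                            break
                    self.needs_full_sync = True
                    return
                self._queue.put(txn)

        def get(self, timeout=None):
            return self._queue.get(timeout=timeout)

Whether a full resync is actually cheaper than draining a very long backlog is part of what would need to be measured here.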
Closing this since there really isn't much we can do about it. It isn't a bug that it is technically possible to send commands faster than they can be processed. It's also possible, at an even lower level, to stream arbitrarily long JSON strings until ovsdb-server runs out of memory. When I tried to fix that in upstream OVS, it was decided that adding an arbitrary cap wasn't a good idea either. If there is a real-world situation where we get so far behind on processing that the queue grows large enough to matter, something is *super* broken.