Bug 1672606 - Transactions QUEUE in ovsdbapp doesn't have a limit and could result in OOM-killer
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ovsdbapp
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Terry Wilson
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-05 12:03 UTC by Lucas Alvares Gomes
Modified: 2019-10-17 21:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-17 21:09:03 UTC
Target Upstream Version:
Embargoed:



Description Lucas Alvares Gomes 2019-02-05 12:03:26 UTC
Description of problem:

The ovsdbapp library queues up transactions that still need to be processed [0]. While investigating bug [1], we noticed that, due to incorrect behavior, the router gateway ports started to flap between different nodes, which generated a huge number of transactions (bound/unbound/bound/unbound). We believe this caused the ovsdbapp queue to grow until there was no memory left on the system and the process was OOM-killed.

This BZ is about investigating such issues.

[0] https://github.com/openstack/ovsdbapp/blob/bc06517ba37037cc21699c5e34a18b21379ea9f0/ovsdbapp/backend/ovs_idl/connection.py#L38-L39

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1670794

Version-Release number of selected component (if applicable):

OSP 13

Steps to Reproduce:

It's possible to mimic the port flap problem in a more constrained environment by:

1. Clearing the port_binding chassis column (for a port of type "chassisredirect") in the SBDB in a loop. That will result in a lot of transactions being created, for example:

while true; do ovn-sbctl clear port_binding ae2db6eb-26d1-401e-aa0f-c866ec9974f3 chassis; done

Actual results:

Memory consumption can grow without bound.
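
To illustrate the growth pattern, here is a minimal sketch using only Python's standard library. It is not ovsdbapp code: `txn_queue` and `flap_port` are hypothetical names, and the tuples stand in for real transactions. It only shows that a `queue.Queue` created with the default `maxsize=0` accepts items without limit when nothing drains it.

```python
import queue

# Stand-in for a transaction queue. maxsize=0 (the default) means "no limit".
txn_queue = queue.Queue()


def flap_port(n):
    """Hypothetical producer: enqueue n bound/unbound events with no consumer."""
    for i in range(n):
        event = "bound" if i % 2 == 0 else "unbound"
        # put_nowait() never raises queue.Full on an unbounded queue.
        txn_queue.put_nowait((event, i))


flap_port(100_000)
backlog = txn_queue.qsize()
print(backlog)
```

Every queued item stays in memory until a consumer removes it, so a producer that outpaces the consumer for long enough will eventually exhaust memory.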

Expected results:

Something to discuss. We believe that we could cap the size of the queue and, if the queue exceeds that limit, perhaps ovsdbapp could do a full sync instead of processing each transaction in the queue?
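
The proposal above could be sketched roughly as follows. This is an assumption-laden illustration built on the standard library only, not the ovsdbapp API: `CappedTransactionQueue`, `needs_full_sync`, and the cap value are all hypothetical, and how a full sync would actually be performed is left open.

```python
import queue


class CappedTransactionQueue:
    """Hypothetical wrapper: a bounded queue that requests a full sync on overflow."""

    def __init__(self, maxsize):
        self._queue = queue.Queue(maxsize=maxsize)
        # Set when the backlog overflowed and queued work was discarded.
        self.needs_full_sync = False

    def put(self, txn):
        try:
            self._queue.put_nowait(txn)
        except queue.Full:
            # The backlog hit the cap: drop the queued transactions (and this
            # one) and flag that the consumer should resynchronize instead.
            self._drain()
            self.needs_full_sync = True

    def _drain(self):
        while True:
            try:
                self._queue.get_nowait()
            except queue.Empty:
                return

    def qsize(self):
        return self._queue.qsize()


capped = CappedTransactionQueue(maxsize=3)
for i in range(4):
    capped.put(("bound" if i % 2 == 0 else "unbound", i))
print(capped.qsize(), capped.needs_full_sync)
```

The sketch is single-threaded for clarity; a real implementation would need to coordinate the drain and the flag with the consumer thread, and would have to decide whether discarding queued transactions is safe at all.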

Comment 2 Terry Wilson 2019-10-17 21:09:03 UTC
Closing this since there really isn't much we can do about this. It isn't a bug that it is technically possible to send commands faster than they can be processed. It's also possible on an even lower level to stream arbitrarily long JSON strings until ovsdb-server runs out of memory. When I tried to fix that in upstream OVS, it was decided that adding an arbitrary cap wasn't something that they thought was a good idea either. If there is a real-world situation where we are getting so far behind on processing that the queue grows large enough to matter, something is *super* broken.

