Bug 1444609 - Journal multi threading causes neutron server to become unresponsive
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-odl
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: 12.0 (Pike)
Assigned To: Mike Kolesnik
QA Contact: Itzik Brown
Keywords: Triaged
Depends On:
Blocks:
Reported: 2017-04-23 04:19 EDT by Itzik Brown
Modified: 2018-10-18 03:19 EDT

See Also:
Fixed In Version: python-networking-odl-11.0.0-0.20170806093629.2e78dca.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
N/A
Last Closed: 2017-12-13 16:23:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Launchpad | 1683797 | None | None | None | 2017-04-23 04:19 EDT
OpenStack gerrit | 486606 | None | None | None | 2017-07-25 02:42 EDT
Red Hat Product Errata | RHEA-2017:3462 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-15 20:43:25 EST

Description Itzik Brown 2017-04-23 04:19:53 EDT
Description of problem:
During scale testing we noticed that the number of journal threads is unbounded. On a node with many cores, the API workers and the RPC workers each default to the number of cores, so a 50-core machine ends up with 100 neutron server processes. Each such process has at least one journal thread, and possibly several, depending on how many V2 drivers are in use, since each driver instantiates its own thread.

While this is already not optimal, the influx of journal threads causes the DB to misbehave: the threads either query the journal table constantly or exhaust all available DB connections (on the server side).
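
To make the scale concrete, here is a rough back-of-the-envelope sketch. The function name `journal_load`, the driver count of 3, and the assumption that each timer wake-up costs one journal query are all illustrative and not taken from networking-odl; the 5-second interval is the default timer mentioned in this report.

```python
def journal_load(cores, v2_drivers, timer_interval=5.0):
    """Estimate processes, journal threads and timer-driven DB polls per second.

    Assumes (as described in this report) that API workers and RPC workers
    each default to the number of cores, and that every V2 driver in every
    process starts its own journal thread.
    """
    api_workers = cores
    rpc_workers = cores
    processes = api_workers + rpc_workers
    threads = processes * v2_drivers
    # Assumption: each timer wake-up issues one query against the journal table.
    polls_per_second = threads / timer_interval
    return processes, threads, polls_per_second


print(journal_load(50, 1))  # (100, 100, 20.0)
print(journal_load(50, 3))  # (100, 300, 60.0)
```

Even with no API activity at all, the timer alone keeps this many threads querying the journal table, and each of them may hold a DB connection while doing so.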

Each time an operation occurs, it is written to the DB and the thread is woken to handle it right away. If there is a post-commit hook, the journal entry is processed immediately. If there is no such hook point, a race is possible in which the journal "misses" the entry.
Every 5 seconds (the default configuration), a timer wakes the thread to handle any such missed entries, as well as entries that were not handled because of conditions such as loss of network connectivity (which halts journal processing to avoid a busy loop).
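
The wake-up behaviour described above can be sketched roughly as follows. This is a toy model, not the networking-odl code: the class `JournalThread`, its method names, and the in-memory deque standing in for the journal table are all assumptions made for illustration.

```python
import threading
from collections import deque


class JournalThread:
    """Toy journal worker: wakes on an explicit signal or on a timer."""

    def __init__(self, timer_interval=5.0):
        self.timer_interval = timer_interval  # seconds between timer wake-ups
        self._pending = deque()               # stand-in for the journal DB table
        self._lock = threading.Lock()
        self._wakeup = threading.Event()
        self._stop = threading.Event()
        self.processed = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def record(self, entry, notify=True):
        """Store an entry; notify=False simulates a 'missed' wake-up."""
        with self._lock:
            self._pending.append(entry)
        if notify:
            self._wakeup.set()

    def _drain(self):
        # Process every entry currently available, oldest first.
        while True:
            with self._lock:
                if not self._pending:
                    return
                entry = self._pending.popleft()
            self.processed.append(entry)

    def _run(self):
        while not self._stop.is_set():
            # Returns early if set() was called, otherwise after the timeout.
            self._wakeup.wait(self.timer_interval)
            self._wakeup.clear()
            self._drain()
        self._drain()  # pick up anything recorded just before shutdown

    def stop(self):
        self._stop.set()
        self._wakeup.set()
        self._thread.join()
```

An entry recorded with `notify=False` is not handled immediately, but the next timer expiry picks it up, which is the safety net the 5-second timer provides.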

There's no need for more than one thread per process: a single thread is woken either by the operation callback or by the timer, and processes every journal entry it can.
Also, since Python threads don't run truly in parallel, multi-threading makes little sense in this context.
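
One way to get a single journal per process is a lazily created shared instance that every driver reuses. The sketch below only illustrates that idea; `SharedJournal` and `FakeV2Driver` are hypothetical names, and this is not necessarily how the upstream fix is implemented.

```python
import threading


class SharedJournal:
    """Hypothetical per-process journal holder shared by all drivers."""

    _instance = None
    _lock = threading.Lock()
    created = 0  # counts how many journals were actually constructed

    def __init__(self):
        type(self).created += 1

    @classmethod
    def instance(cls):
        # The lock guards against two drivers initialising concurrently.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance


class FakeV2Driver:
    """Stand-in for a V2 driver; it asks for the shared journal instead of
    starting a journal thread of its own."""

    def __init__(self, name):
        self.name = name
        self.journal = SharedJournal.instance()


# Three illustrative drivers in one process all end up with the same journal.
drivers = [FakeV2Driver(name) for name in ("ml2", "l3", "qos")]
```

With this arrangement the number of journal threads scales with the number of processes only, not with processes multiplied by drivers.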

Version-Release number of selected component (if applicable):
python-networking-odl-4.0.0-1.el7ost.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Nir Yechiel 2017-07-20 04:57:34 EDT
@Mike, was this fixed already? Should we target this for RHOSP 12?

Thanks,
Nir
Comment 2 Mike Kolesnik 2017-07-23 07:25:07 EDT
It hasn't been fixed upstream yet. I'll get to it as soon as I can and will update the bug when the fix is available.
Comment 4 Itzik Brown 2017-09-17 08:02:46 EDT
Scale testing didn't encounter this behavior.
I didn't encounter it either.

Checked with:
python-networking-odl-11.0.1-0.20170831202719.81010b8.el7ost.noarch
Comment 7 errata-xmlrpc 2017-12-13 16:23:38 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
