Bug 1778428 - Increase rabbitmq tcp listen backlog
Summary: Increase rabbitmq tcp listen backlog
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-01 00:11 UTC by John Eckersberg
Modified: 2020-03-19 14:56 UTC (History)
CC List: 7 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-42.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-17 10:36:03 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
Launchpad 1854704 None None None 2019-12-02 08:14:08 UTC
OpenStack gerrit 699916 None MERGED Increase rabbitmq tcp backlog 2020-04-01 17:27:39 UTC

Description John Eckersberg 2019-12-01 00:11:53 UTC
Quick bug during an escalation for bug 1778388 ...

We need to tune the default rabbitmq tcp listen backlog.  Currently it defaults to 128, but here's what happens...

Say we have 1500 total rabbitmq client connections spread across a 3 node cluster, evenly distributed so each node has 500 clients.

Then, we stop rabbitmq on one of the nodes.

Now those 500 client connections all immediately fail over to the other two nodes.  Assume a roughly even split, so each gets 250 connections simultaneously.  Since the tcp listen backlog is only 128, a large number of the failover connections cannot connect and get ECONNREFUSED because the kernel just drops them.
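
To make the arithmetic above concrete, here is a rough Python sketch.  The function names are made up for illustration, and it assumes the worst case where every displaced client reconnects at the same instant, before the broker has accepted any of them.

```python
def failover_burst(total_clients, nodes, failed=1):
    """Connections each surviving node receives at once when `failed` nodes stop.

    Assumes clients are evenly spread before the failure and split evenly
    across the survivors afterwards (rounded up).
    """
    per_node = total_clients // nodes
    displaced = per_node * failed
    survivors = nodes - failed
    return -(-displaced // survivors)  # ceiling division


def over_backlog(burst, backlog):
    """How many simultaneous connection attempts do not fit in the listen backlog."""
    return max(0, burst - backlog)


burst = failover_burst(1500, 3)
print(burst, over_backlog(burst, 128), over_backlog(burst, 4096))
# prints: 250 122 0
```

So with a backlog of 128, roughly half of each 250-connection burst has nowhere to queue, while a backlog of 4096 absorbs the whole burst with plenty of headroom.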

Eventually things retry and the backlog clears, but it just makes things noisy in the logs and makes failover take a little bit longer.

The upstream docs discuss this here:

https://www.rabbitmq.com/networking.html#tuning-for-large-number-of-connections-connection-backlog

I propose we increase the default to 4096 to avoid this scenario.
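
One caveat worth checking when picking the value: on Linux, the backlog passed to listen() is silently capped at net.core.somaxconn, so a larger application setting only takes full effect if that sysctl is at least as high.  A small sketch of that check follows; the helper names are hypothetical, and it assumes a Linux host where /proc/sys/net/core/somaxconn is readable.

```python
from pathlib import Path

# Linux exposes the kernel-wide cap on listen() backlogs here.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually uses: the request, capped at somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the current somaxconn value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    cap = read_somaxconn()
    if cap is None:
        print("could not read somaxconn on this host")
    else:
        print("requested 4096, effective", effective_backlog(4096, cap))
```

If the reported effective value is lower than 4096, the sysctl would need raising as well for the proposed default to help.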

Comment 5 Lon Hohberger 2020-03-17 10:36:03 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.4.1-42.el7ost.  This build is available now.

