Bug 1778428

Summary: Increase rabbitmq tcp listen backlog
Product: Red Hat OpenStack Reporter: John Eckersberg <jeckersb>
Component: openstack-tripleo-heat-templates Assignee: Michele Baldessari <michele>
Status: CLOSED CURRENTRELEASE QA Contact: nlevinki <nlevinki>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: dabarzil, jjoyce, jschluet, lmiccini, mburns, michele, slinaber, tvignaud
Target Milestone: z11 Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.4.1-42.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-17 10:36:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description John Eckersberg 2019-12-01 00:11:53 UTC
Quick bug filed during an escalation for bug 1778388...

We need to tune the default RabbitMQ TCP listen backlog. It currently defaults to 128, but here's what happens...

Say we have 1500 RabbitMQ client connections spread evenly across a 3-node cluster, so each node has 500 clients.

Then, we stop rabbitmq on one of the nodes.

Now those 500 client connections all immediately fail over to the other two nodes. Assuming a roughly even split, each surviving node receives 250 connections simultaneously. Since the TCP listen backlog is only 128, a large number of the failover connections cannot connect and get ECONNREFUSED because the kernel just drops them.
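The arithmetic above can be checked with a small Python sketch. It is a deliberately crude worst-case model (an assumption for illustration, not something measured in this bug): every displaced client reconnects at the same instant and nothing is accepted before the surge lands, so any connection beyond the backlog fails on its first attempt. The function names are hypothetical.

```python
# Crude worst-case model of the failover scenario.
# Assumption: all displaced clients reconnect at the same instant and none is
# accepted before the surge arrives, so anything beyond the listen backlog
# fails on its first attempt.


def failover_surge(total_clients: int, nodes: int, stopped: int = 1) -> int:
    """Connections each surviving node receives when `stopped` nodes go down,
    assuming clients are spread evenly before and after the failure."""
    per_node = total_clients // nodes
    displaced = per_node * stopped
    survivors = nodes - stopped
    return displaced // survivors


def overflow(surge: int, backlog: int) -> int:
    """Connections in the surge that do not fit in the listen backlog."""
    return max(0, surge - backlog)


if __name__ == "__main__":
    surge = failover_surge(total_clients=1500, nodes=3)
    for backlog in (128, 4096):
        print(f"backlog={backlog}: surge={surge}, overflow={overflow(surge, backlog)}")
```

Running it reports an overflow of 122 connections per surviving node with the current backlog of 128 and none with 4096.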

Clients eventually retry and the backlog clears, but the refused connections fill the logs with noise and make failover take a little longer.

Upstream docs discuss here:

https://www.rabbitmq.com/networking.html#tuning-for-large-number-of-connections-connection-backlog

I propose we increase the default to 4096 to avoid this scenario.
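For illustration only: the backlog RabbitMQ exposes through its TCP listen options ends up, via Erlang's gen_tcp, as the `backlog` argument to listen(2). The Python sketch below is not RabbitMQ or TripleO code, and its helper names are hypothetical. It opens a listener with the proposed 4096 and shows one caveat from the Linux listen(2) man page: the value is silently truncated to `net.core.somaxconn`, whose default is 128 on kernels before 5.4 (which includes the RHEL 7 kernel), so that sysctl may need raising as well.

```python
import socket
from pathlib import Path
from typing import Optional

# Backlog proposed in this bug for the RabbitMQ listener.
PROPOSED_BACKLOG = 4096


def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn") -> Optional[int]:
    """Return the kernel cap on listen backlogs, or None if it cannot be read
    (for example on non-Linux systems)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested: int, somaxconn: Optional[int]) -> int:
    """On Linux, listen(2) silently truncates the backlog to somaxconn."""
    if somaxconn is None:
        return requested
    return min(requested, somaxconn)


def open_listener(port: int = 0, backlog: int = PROPOSED_BACKLOG) -> socket.socket:
    """Bind a TCP listener on localhost with the given backlog.
    Port 0 lets the OS pick a free port for this demo."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    cap = read_somaxconn()
    print(f"requested={PROPOSED_BACKLOG}, somaxconn={cap}, "
          f"effective={effective_backlog(PROPOSED_BACKLOG, cap)}")
    with open_listener() as listener:
        print("listening on", listener.getsockname())
```

On a host where `somaxconn` is still 128, the sketch reports an effective backlog of 128 even though 4096 was requested.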

Comment 5 Lon Hohberger 2020-03-17 10:36:03 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.4.1-42.el7ost.  This build is available now.