Bug 2216579

Summary: deployment failed at step2 - rabbitmq_init_bundle failed on controller0
Product: Red Hat OpenStack
Reporter: Francois Palin <fpalin>
Component: puppet-tripleo
Assignee: OSP Team <rhos-maint>
Status: CLOSED CANTFIX
QA Contact: Joe H. Rahme <jhakimra>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: bshephar, jeckersb, jjoyce, jschluet, lmiccini, rhos-maint, rosingh, slinaber, tarcher, tvignaud
Keywords: Triaged
Hardware: x86_64
OS: Linux
Last Closed: 2023-08-11 12:48:20 UTC
Type: Bug

Comment 6 Luca Miccini 2023-08-11 12:48:20 UTC
We had an issue in the past where additional RabbitMQ config options were ignored; this seems to be the case here:

https://bugzilla.redhat.com/show_bug.cgi?id=1848705 

Extract from rabbitmq-env.conf, where we see that no additional options were applied (RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS only contains "+sbwt none"):

RABBITMQ_CTL_DIST_PORT_MAX=25683
RABBITMQ_CTL_DIST_PORT_MIN=25673
RABBITMQ_CTL_ERL_ARGS="+sbwt none -proto_dist inet6_tcp"
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
RABBITMQ_SERVER_ERL_ARGS="-proto_dist inet6_tcp"
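A quick way to confirm this on a controller is to check whether +zdbbl appears in RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS. Below is a minimal Python sketch, assuming rabbitmq-env.conf contains only simple KEY=value lines with optional shell-style quoting, as in the extract above. parse_env_conf and has_zdbbl are hypothetical helpers, not part of TripleO or RabbitMQ.

```python
import shlex


def parse_env_conf(text: str) -> dict:
    """Parse simple KEY=value lines (assumed format), removing shell-style quotes."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        # shlex.split strips the quotes around multi-word values.
        env[key.strip()] = " ".join(shlex.split(raw_value))
    return env


def has_zdbbl(env: dict) -> bool:
    """True if +zdbbl (distribution buffer busy limit) is in the extra server args."""
    args = env.get("RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS", "").split()
    return "+zdbbl" in args


EXTRACT = """\
RABBITMQ_CTL_DIST_PORT_MAX=25683
RABBITMQ_CTL_DIST_PORT_MIN=25673
RABBITMQ_CTL_ERL_ARGS="+sbwt none -proto_dist inet6_tcp"
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
RABBITMQ_SERVER_ERL_ARGS="-proto_dist inet6_tcp"
"""

env = parse_env_conf(EXTRACT)
print(env["RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS"])  # +sbwt none
print(has_zdbbl(env))  # False
```

On the extract from this bug, the check reports False, matching the observation that the additional options were dropped.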

This leaves the inter-node communication buffer size at too low a value, since +zdbbl (the VM flag that controls it) is never passed to the server:

https://www.rabbitmq.com/runtime.html

Inter-node Communication Buffer Size
Inter-node traffic between a pair of nodes uses a TCP connection with a buffer known as the inter-node communication buffer.

When the buffer is hovering around full capacity, nodes will log a warning mentioning a busy distribution port (busy_dist_port):
2019-04-06 22:48:19.031 [warning] <0.242.0> rabbit_sysmon_handler busy_dist_port <0.1401.0>
Increasing buffer size may help increase throughput and/or reduce latency.
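To see whether a node is actually running into this, the busy_dist_port warnings can be pulled out of its log. Here is a minimal Python sketch, assuming log lines have exactly the layout of the sample warning above (other RabbitMQ versions may format log lines differently). busy_dist_port_warnings is a hypothetical helper, and locating the log file on the controller is not covered here.

```python
import re

# Assumed layout, taken from the sample line in the RabbitMQ docs excerpt:
# "<date> <time> [warning] <pid> rabbit_sysmon_handler busy_dist_port <port pid>"
BUSY_DIST_PORT = re.compile(
    r"^(?P<ts>\S+ \S+) \[warning\] \S+ rabbit_sysmon_handler busy_dist_port (?P<port>\S+)"
)


def busy_dist_port_warnings(lines):
    """Yield (timestamp, port) pairs for each busy_dist_port warning line."""
    for line in lines:
        match = BUSY_DIST_PORT.search(line)
        if match:
            yield match.group("ts"), match.group("port")


sample = [
    "2019-04-06 22:48:19.031 [warning] <0.242.0> rabbit_sysmon_handler busy_dist_port <0.1401.0>",
    "2019-04-06 22:48:20.000 [info] <0.242.0> some unrelated message",
]
hits = list(busy_dist_port_warnings(sample))
print(len(hits))  # 1
print(hits[0])    # ('2019-04-06 22:48:19.031', '<0.1401.0>')
```

A steady stream of hits suggests the buffer is regularly near capacity; none at all does not rule out other causes of the failure.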


The solution/workaround in 16.1 is to set the following parameter (under parameter_defaults) in any of the templates used to deploy the overcloud:

RabbitAdditionalErlArgs: "'+sbwt none +zdbbl 128000 +P 1048576 +t 5000000'"

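This forces a more appropriate value for the aforementioned buffer size (+zdbbl is given in kilobytes, so 128000 is 125 MiB). The value also keeps the existing "+sbwt none" and raises the Erlang process (+P) and atom (+t) limits.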
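As a sanity check on what the workaround value sets, the string can be split into flag/argument pairs and +zdbbl converted to bytes. This is a minimal Python sketch, assuming the outer single quotes in the template value only serve to keep the string intact and that every flag takes exactly one argument (true for the four flags used here). Per the Erlang erl documentation, +zdbbl is expressed in kilobytes. parse_erl_flags and dist_buffer_bytes are hypothetical helpers.

```python
import shlex


def parse_erl_flags(value: str) -> dict:
    """Map each emulator flag (e.g. '+zdbbl') to its argument.

    Assumes every flag is followed by exactly one argument, which holds for
    +sbwt, +zdbbl, +P and +t in the RabbitAdditionalErlArgs value.
    """
    # shlex drops the outer quotes; re-split the result on whitespace.
    words = " ".join(shlex.split(value)).split()
    if len(words) % 2 != 0:
        raise ValueError(f"expected flag/argument pairs, got {words!r}")
    return dict(zip(words[0::2], words[1::2]))


def dist_buffer_bytes(flags: dict) -> int:
    """+zdbbl is in kilobytes (per the erl docs); return the limit in bytes."""
    return int(flags["+zdbbl"]) * 1024


# The parameter value as YAML would load it, single quotes included.
WORKAROUND = "'+sbwt none +zdbbl 128000 +P 1048576 +t 5000000'"
flags = parse_erl_flags(WORKAROUND)
print(flags)
# {'+sbwt': 'none', '+zdbbl': '128000', '+P': '1048576', '+t': '5000000'}
print(dist_buffer_bytes(flags))  # 131072000
```

The same parser can be pointed at RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS after redeploying to confirm the flags reached the node.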


This is already fixed in 16.2.