Bug 2216579 - deployment failed at step2 - rabbitmq_init_bundle failed on controller0
Summary: deployment failed at step2 - rabbitmq_init_bundle failed on controller0
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: OSP Team
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-21 22:21 UTC by Francois Palin
Modified: 2023-08-11 12:48 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-11 12:48:20 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   OSP-25954  0        None      None    None     2023-06-21 22:24:48 UTC

Comment 6 Luca Miccini 2023-08-11 12:48:20 UTC
We had an issue in the past where additional RabbitMQ config options were ignored, and this seems to be the case here:

https://bugzilla.redhat.com/show_bug.cgi?id=1848705 

Extract from rabbitmq-env.conf, showing that no additional options are set:

RABBITMQ_CTL_DIST_PORT_MAX=25683
RABBITMQ_CTL_DIST_PORT_MIN=25673
RABBITMQ_CTL_ERL_ARGS="+sbwt none -proto_dist inet6_tcp"
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
RABBITMQ_SERVER_ERL_ARGS="-proto_dist inet6_tcp"

This results in too low a value for the inter-node communication buffer size:

https://www.rabbitmq.com/runtime.html

Inter-node Communication Buffer Size
Inter-node traffic between a pair of nodes uses a TCP connection with a buffer known as the inter-node communication buffer.

When the buffer is hovering around full capacity, nodes will log a warning mentioning a busy distribution port (busy_dist_port):
2019-04-06 22:48:19.031 [warning] <0.242.0> rabbit_sysmon_handler busy_dist_port <0.1401.0>
Increasing buffer size may help increase throughput and/or reduce latency.
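
To spot these warnings in a node log, something like the following could be used. This is only a minimal sketch: it assumes log lines follow the format of the sample line quoted above (other RabbitMQ versions may log differently), and the function name busy_dist_port_events is made up for illustration.

```python
import re

# Pattern modelled on the sample warning quoted above; the exact log
# format is an assumption and may differ between RabbitMQ versions.
BUSY_DIST_PORT = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[warning\] <[\d.]+> rabbit_sysmon_handler busy_dist_port "
    r"(?P<pid><[\d.]+>)"
)


def busy_dist_port_events(lines):
    """Return (timestamp, pid) tuples for every busy_dist_port warning."""
    events = []
    for line in lines:
        match = BUSY_DIST_PORT.match(line.strip())
        if match:
            events.append((match.group("timestamp"), match.group("pid")))
    return events


sample_log = [
    "2019-04-06 22:48:19.031 [warning] <0.242.0> "
    "rabbit_sysmon_handler busy_dist_port <0.1401.0>",
    "2019-04-06 22:48:20.000 [info] <0.33.0> some unrelated line",
]
for timestamp, pid in busy_dist_port_events(sample_log):
    print(timestamp, pid)
```

A steady stream of matches on a controller would be consistent with the undersized buffer described here.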


The solution/workaround in 16.1 is to set the following parameter in any of the templates used to deploy the overcloud:

RabbitAdditionalErlArgs: "'+sbwt none +zdbbl 128000 +P 1048576 +t 5000000'"

This forces a more appropriate value for the aforementioned buffer size (the +zdbbl flag, given in kilobytes).
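
To check whether a given rabbitmq-env.conf actually carries the flag, a small script along these lines may help. It is a sketch under assumptions: it treats the file as simple KEY=VALUE lines with shell-style quoting (as in the extract above), and the helper names parse_env_conf and zdbbl_kb are hypothetical.

```python
import shlex


def parse_env_conf(text):
    """Parse KEY=VALUE lines (shell-style quoting) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex strips the surrounding quotes from the value.
        settings[key.strip()] = " ".join(shlex.split(raw))
    return settings


def zdbbl_kb(erl_args):
    """Return the +zdbbl value in kilobytes, or None if the flag is absent."""
    tokens = erl_args.split()
    for i, token in enumerate(tokens[:-1]):
        if token == "+zdbbl":
            return int(tokens[i + 1])
    return None


# The extract quoted earlier in this comment.
SAMPLE = '''RABBITMQ_CTL_DIST_PORT_MAX=25683
RABBITMQ_CTL_DIST_PORT_MIN=25673
RABBITMQ_CTL_ERL_ARGS="+sbwt none -proto_dist inet6_tcp"
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
RABBITMQ_SERVER_ERL_ARGS="-proto_dist inet6_tcp"
'''

conf = parse_env_conf(SAMPLE)
# Affected node: the flag is missing, so this prints None.
print(zdbbl_kb(conf["RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS"]))
# Arguments from the workaround: prints 128000.
print(zdbbl_kb("+sbwt none +zdbbl 128000 +P 1048576 +t 5000000"))
```

If the first check still returns None after redeploying with the parameter set, the additional options are probably being ignored again, as in the earlier bug.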


This is already fixed in 16.2.

