Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1591501

Summary: Default RabbitMQ timeout settings cause issues with OpenStack services.
Product: Red Hat OpenStack
Reporter: Siggy Sigwald <ssigwald>
Component: openstack-tripleo-heat-templates
Assignee: John Eckersberg <jeckersb>
Status: CLOSED ERRATA
QA Contact: Gurenko Alex <agurenko>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: jeckersb, jjoyce, jschluet, mburns, michele, mkrcmari, pkomarov, slinaber, ssigwald, tvignaud
Target Milestone: z9
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: All
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-5.3.10-14.el7ost
Doc Type: If docs needed, set a value
Clones: 1592554
Last Closed: 2018-09-17 16:56:14 UTC
Type: Bug
Bug Blocks: 1592554

Description Siggy Sigwald 2018-06-14 22:10:28 UTC
Description of problem:
The RabbitMQ TCP user timeout (the TCP_USER_TIMEOUT socket option) is set to 5000 ms (5 seconds) by default in all our deployments.
In some cases, when the system or network is under heavy load, this 5-second timeout can cause connections to be closed prematurely and force RabbitMQ into a split-brain situation.
The value is hardcoded in /usr/share/openstack-tripleo-heat-templates/puppet/services/rabbitmq.yaml.
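For context on what the value controls: in each {raw,6,18,<<5000:64/native>>} entry of the Erlang arguments, 6 is the IPPROTO_TCP level and 18 is the Linux TCP_USER_TIMEOUT option, i.e. the number of milliseconds transmitted data may stay unacknowledged before the kernel closes the connection. The <<5000:64/native>> part packs that value as an unsigned 64-bit integer in native byte order. The Python sketch below mirrors the same option outside Erlang; it assumes Linux (TCP_USER_TIMEOUT is Linux-specific), and the helper names erlang_raw_value and apply_user_timeout are hypothetical, not part of RabbitMQ or TripleO.

```python
import socket
import struct

# Option number used by the raw tuple {raw,6,18,...}; 6 is socket.IPPROTO_TCP.
# Fall back to the literal Linux value if this Python build does not expose
# socket.TCP_USER_TIMEOUT (e.g. on non-Linux platforms).
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def erlang_raw_value(timeout_ms: int) -> bytes:
    """Encode timeout_ms like Erlang's <<Timeout:64/native>>:
    an unsigned 64-bit integer in the host's native byte order."""
    return struct.pack("=Q", timeout_ms)


def apply_user_timeout(sock: socket.socket, timeout_ms: int) -> int:
    """Set TCP_USER_TIMEOUT (milliseconds) on a socket and read it back.
    Linux-only: other platforms may reject option 18 at this level."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)
```

On a Linux host, calling apply_user_timeout on a fresh TCP socket with 30000 should read back 30000, the value proposed in this report.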

Current value:
RABBITMQ_SERVER_ERL_ARGS: '"+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"'

Should be changed to:
RABBITMQ_SERVER_ERL_ARGS: '"+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<30000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<30000:64/native>>}]"'
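The proposed change has to update both raw options (the connect options and the listen options) consistently. As a minimal sketch, assuming you want to inspect or regenerate the argument string outside Heat, a single regular expression can extract and rewrite the millisecond values. The names CURRENT_ERL_ARGS, user_timeouts_ms and set_user_timeout are hypothetical and not part of tripleo-heat-templates.

```python
import re
from typing import List

# Content of RABBITMQ_SERVER_ERL_ARGS as currently shipped (inside the YAML quotes).
CURRENT_ERL_ARGS = (
    '"+K true +P 1048576 -kernel inet_default_connect_options '
    '[{nodelay,true},{raw,6,18,<<5000:64/native>>}] '
    '-kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"'
)

# Matches the TCP_USER_TIMEOUT raw option and captures its value in milliseconds.
_USER_TIMEOUT = re.compile(r"\{raw,6,18,<<(\d+):64/native>>\}")


def user_timeouts_ms(erl_args: str) -> List[int]:
    """Return every TCP_USER_TIMEOUT value found in the Erlang argument string."""
    return [int(v) for v in _USER_TIMEOUT.findall(erl_args)]


def set_user_timeout(erl_args: str, timeout_ms: int) -> str:
    """Rewrite every TCP_USER_TIMEOUT raw option to timeout_ms."""
    return _USER_TIMEOUT.sub(f"{{raw,6,18,<<{timeout_ms}:64/native>>}}", erl_args)
```

Rewriting with one pattern keeps the connect and listen options in sync, so the two sockets cannot end up with different timeouts.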

Version-Release number of selected component (if applicable):
This setting is present in all currently supported versions of RHOSP.

How reproducible:
100%

Comment 2 John Eckersberg 2018-06-18 20:02:37 UTC
This was fixed in 12/pike by increasing it from 5 to 15 seconds:

https://review.openstack.org/#/c/485248/

We should backport that for 10/newton and 11/ocata.

Note that 13/queens removes this behavior entirely, see:

https://review.openstack.org/#/c/503788/
https://bugs.launchpad.net/tripleo/+bug/1717006

Comment 14 Alex McLeod 2018-09-03 08:01:31 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 16 errata-xmlrpc 2018-09-17 16:56:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2670