1123296 – Rubygem-staypuft: HA: rabbitmq haproxy config should set its timeout must be a lot higher.

Bug 1123296 - Rubygem-staypuft: HA: rabbitmq haproxy config should set its timeout must be a lot higher.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-foreman-installer
Sub Component:
Version:	5.0 (RHEL 7)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	z1
Target Release:	Installer
Assignee:	John Eckersberg
QA Contact:	Leonid Natapov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-07-25 09:13 UTC by Leonid Natapov
Modified:	2016-04-26 18:07 UTC
CC List:	11 users
Fixed In Version:	openstack-foreman-installer-2.0.24-1.el6ost
Doc Type:	Bug Fix
Doc Text:	Previously, the timeout for RabbitMQ HAProxy was set too short. As a consequence, services disconnected and reconnected too often. This has been fixed by increasing the timeout period. Now, services will disconnect and reconnect less frequently.
Clone Of:
Environment:
Last Closed:	2014-10-01 13:25:37 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2014:1350	0	normal	SHIPPED_LIVE	Red Hat Enterprise Linux OpenStack Platform Bug Fix Advisory	2014-10-01 17:22:34 UTC

Description Leonid Natapov 2014-07-25 09:13:31 UTC

Rubygem-staypuft: HA: amqp is still set to A/P hot-standby mode. Should be A/A.
Its timeout must be a lot higher (900m or so) because the rabbitmq oslo driver is missing a rabbitmq_heartbeat feature. This is a known upstream limitation/bug (already reported). Otherwise you will end up with services reconnecting a gazillion times to rabbit.
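
To illustrate the failure mode described above, here is a toy model. It is not taken from this bug or from HAProxy itself. It assumes that a connection without heartbeats is dropped whenever the gap between two consecutive messages exceeds the proxy's inactivity timeout, and that the service then has to reconnect. The function name, the sample timestamps, and the 60-second comparison value are all hypothetical.

```python
def count_reconnects(message_times, idle_timeout):
    """Count how often a heartbeat-less connection would be dropped.

    Toy model (an assumption, not HAProxy's actual logic): the proxy closes
    the connection when no traffic is seen for more than idle_timeout
    seconds, and the client reconnects when it next needs to send.
    message_times must be sorted and expressed in seconds.
    """
    reconnects = 0
    for previous, current in zip(message_times, message_times[1:]):
        if current - previous > idle_timeout:
            reconnects += 1
    return reconnects


# Hypothetical traffic: short bursts separated by idle gaps of 5 and 10 minutes.
times = [0, 10, 310, 320, 920]

short_timeout = 60        # hypothetical short timeout, in seconds
long_timeout = 900 * 60   # 900m, the value proposed in this bug, in seconds

print(count_reconnects(times, short_timeout))  # 2
print(count_reconnects(times, long_timeout))   # 0
```

In this model a long timeout only hides idle gaps. It does not detect dead peers, which is the job a heartbeat would do.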

Comment 1 Perry Myers 2014-07-25 13:12:50 UTC

(In reply to Leonid Natapov from comment #0)
> Rubygem-staypuft: HA: amqp is still set to A/P hot-standby mode. Should be
> A/A.

A/P hot standby mode is absolutely valid for when the user has selected qpid as the messaging layer.

Did you mean for $subject of the bug to read:
HA: When using RabbitMQ, amqp is still set to A/P hot-standby mode and its timeout must be a lot higher.

Comment 2 Jason Guiditta 2014-07-31 13:42:53 UTC

(In reply to Leonid Natapov from comment #0)
> Rubygem-staypuft: HA: amqp is still set to A/P hot-standby mode. Should be
> A/A.
> its timeout must be a lot higher.  (900m or so) due to rabbitmq oslo driver
> missing a rabbitmq_heartbeat feature. This is a known upstream
> limitation/bug (already reported). Or you will end up with services
> reconnecting a gazillion times to rabbit.

Are you referring to the haproxy timeout here, or rabbit itself (or both)?  If both, I would think we want haproxy to not have exactly the same timeout value as the service it is in front of, so what are the values we want to set here?

Comment 3 Fabio Massimo Di Nitto 2014-08-01 05:29:07 UTC

(In reply to Jason Guiditta from comment #2)
> (In reply to Leonid Natapov from comment #0)
> > Rubygem-staypuft: HA: amqp is still set to A/P hot-standby mode. Should be
> > A/A.
> > its timeout must be a lot higher.  (900m or so) due to rabbitmq oslo driver
> > missing a rabbitmq_heartbeat feature. This is a known upstream
> > limitation/bug (already reported). Or you will end up with services
> > reconnecting a gazillion times to rabbit.
> 
> Are you referring to the haproxy timeout here, or rabbit itself (or both)? 
> If both, I would think we want haproxy to not have exactly the same timeout
> value as the service it is in front of, so what are the values we want to
> set here?

haproxy timeout.

Comment 5 Mike Burns 2014-08-11 14:26:04 UTC

Eck,  any updates?

Comment 6 John Eckersberg 2014-08-11 18:24:36 UTC

The A/A bit has been fixed, see bug 1121185.

The timeout bit is more involved.  The short of it is to read through this upstream bug:

https://bugs.launchpad.net/oslo.messaging/+bug/856764/

Setting the haproxy timeout higher isn't necessarily going to help things, and may hurt under some circumstances.  I need to do some more legwork to examine all the different versions of components we have for RHOS5 (amqplib, python-kombu, rabbitmq-server) and evaluate the state of oslo.messaging, which components are using it, and which are off doing their own thing.  It's a tangled mess.

Comment 7 Mike Burns 2014-09-12 14:52:58 UTC

Any updates on this?

Comment 8 John Eckersberg 2014-09-15 18:10:05 UTC

I am going to update the backend timeout to 900m to match:

https://github.com/fabbione/rhos-ha-deploy/blob/master/rhos5-rhel7/mrgcloud-setup/RHOS-RHEL-HA-how-to-mrgcloud-rhos5-on-rhel7-lb-latest.txt#L41

The oslo stuff is being tracked elsewhere.  Pretend I didn't mention it here.

Comment 9 John Eckersberg 2014-09-15 18:10:58 UTC

I should have said, update both the backend and client timeouts to 900m.

Comment 10 John Eckersberg 2014-09-16 16:38:43 UTC

https://github.com/redhat-openstack/astapor/pull/365

Comment 13 John Eckersberg 2014-09-17 15:40:35 UTC

*** Bug 1142915 has been marked as a duplicate of this bug. ***

Comment 14 Leonid Natapov 2014-09-22 12:30:37 UTC

listen amqp
  bind 192.168.0.36:5672
  mode  tcp
  option  tcplog
  timeout  client 900m
  timeout  server 900m
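
A check of the section above can be scripted. The following is a minimal sketch, assuming the configuration is laid out as in this comment (an unindented `listen <name>` line followed by indented settings). The helper names `to_seconds` and `section_timeouts` are hypothetical. It handles only the time-unit suffixes HAProxy documents (us, ms, s, m, h, d) and treats a bare number as milliseconds, which is HAProxy's default unit for timeouts.

```python
import re

# Seconds per HAProxy time-unit suffix; a bare number means milliseconds.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert an HAProxy time string such as '900m' to seconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit or "ms"]


def section_timeouts(config_text, section):
    """Return the timeouts of one listen section, e.g. {'client': 54000}."""
    timeouts = {}
    in_section = False
    for line in config_text.splitlines():
        words = line.split()
        if not words:
            continue
        if not line[0].isspace():
            # Assumption: an unindented line starts a new section.
            in_section = words == ["listen", section]
        elif in_section and words[0] == "timeout" and len(words) == 3:
            timeouts[words[1]] = to_seconds(words[2])
    return timeouts


config = """\
listen amqp
  bind 192.168.0.36:5672
  mode  tcp
  option  tcplog
  timeout  client 900m
  timeout  server 900m
"""

timeouts = section_timeouts(config, "amqp")
print(timeouts)  # {'client': 54000, 'server': 54000}
```

Both values come out as 54000 seconds, i.e. 900 minutes or 15 hours.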

Comment 16 errata-xmlrpc 2014-10-01 13:25:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1350.html