1133958 – Configure mco client conf so broker can handle activemq outages gracefully

Summary: Configure mco client conf so broker can handle activemq outages gracefully

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Documentation
Sub Component:
Version:	2.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	brice
QA Contact:	Alex Dellapenta
Docs Contact:
URL:
Whiteboard:
Depends On:	1065048
Blocks:

Reported:	2014-08-26 14:20 UTC by Luke Meyer
Modified:	2015-07-20 00:23 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	1065048
Environment:
Last Closed:	2014-09-15 04:57:21 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1122872	0	high	CLOSED	"oo-mco" does not timeout when no ActiveMQ available	2021-02-22 00:41:40 UTC

Internal Links: 1122872

Description Luke Meyer 2014-08-26 14:20:52 UTC

+++ This bug was initially created as a clone of Bug #1065048 +++

Description of problem:
Previously, when activemq was unavailable (which can happen due to any number of failures: DNS record missing, network broken, port blocked, activemq stopped or crashed...), the broker set no timeout on its attempt to reach activemq via MCollective. As a result, user requests to the broker stalled until the httpd request timed out, and users got no useful error message. There wasn't even anything in the broker logs to tell an administrator what was going on.

The installer was changed to address this by configuring the mco client so that it gives up with a relevant error message after a brief period of trying to connect. Incidentally, I also changed the default server (node) connection retry timeout so that nodes reconnect faster after an activemq outage.
https://github.com/openshift/openshift-extras/pull/440

The following related changes should be made in the relevant docs sections:

broker mco client config: add to /opt/rh/ruby193/root/etc/mcollective/client.cfg
# Broker will retry ActiveMQ connection, then report error
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_attempts = 6

node mco server config: add to /opt/rh/ruby193/root/etc/mcollective/server.cfg
# Node should retry connecting to ActiveMQ forever
plugin.activemq.max_reconnect_attempts = 0
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_delay = 4.0
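
To get a rough feel for what these values mean in practice, here is a minimal Python sketch that estimates the waits between reconnect attempts. It is an illustration, not the connector's actual code. It assumes the connector doubles the delay after each failed attempt (exponential back-off with a multiplier of 2) and caps it at max_reconnect_delay, with an assumed cap of 30.0 seconds when that setting is absent. Those assumptions are not stated in this bug and should be checked against the MCollective version in use. The function name reconnect_delays is hypothetical.

```python
def reconnect_delays(initial_delay, attempts, max_delay=30.0, multiplier=2.0):
    """Return the assumed wait before each retry.

    Assumption (not taken from this bug): the delay is multiplied by
    `multiplier` after every failed attempt and never exceeds `max_delay`.
    """
    delays = []
    delay = initial_delay
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay = min(delay * multiplier, max_delay)
    return delays


# Broker client settings from this report:
# initial_reconnect_delay = 0.1, max_reconnect_attempts = 6
broker = reconnect_delays(0.1, 6)
broker_total = sum(broker)

# Node server settings from this report: max_reconnect_attempts = 0 means
# "retry forever", so only the first 8 waits are shown, capped at 4.0 s.
node = reconnect_delays(0.1, 8, max_delay=4.0)

print("broker waits:", [round(d, 1) for d in broker])
print("broker total wait (s):", round(broker_total, 1))
print("node waits:", [round(d, 1) for d in node])
```

Under these assumptions, the broker spends only a few seconds waiting between its six attempts before reporting an error, and a node never waits more than 4 seconds between retries once activemq is back. The time spent inside each connection attempt is not included, so real totals will be somewhat longer.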

Comment 2 brice 2014-08-27 06:06:26 UTC

Hi, Luke.

I added the recommended stanzas to 7.7.2 and 8.7.

Luke, is that all that was needed for this BZ?

Thanks!

Comment 10 Luke Meyer 2014-09-08 19:58:53 UTC

Looks good.

Comment 11 brice 2014-09-09 02:06:55 UTC

Groovy, thanks Luke. 

Putting this onto QA.

Comment 12 Alex Dellapenta 2014-09-11 18:51:11 UTC

QA'd, looks good.
