+++ This bug was initially created as a clone of Bug #1065048 +++

Description of problem:

Previously, when activemq was unavailable (which can happen due to any number of failures: DNS record missing, network broken, port blocked, activemq stopped or crashed...), the broker set no timeout on its attempt to reach activemq via MCollective. As a result, user requests to the broker stalled until the httpd request timed out, and users got no useful error message. There wasn't even anything in the broker logs to tell an administrator what was going on.

The installer was changed to address this by configuring the mco client so that it gives up with a relevant error message after a brief period of trying to connect. Incidentally, I also changed the default server (node) connection retry timeout so that nodes reconnect faster after an activemq outage.

https://github.com/openshift/openshift-extras/pull/440

The following related changes should be made in the relevant docs sections:

broker mco client config: add to /opt/rh/ruby193/root/etc/mcollective/client.cfg

# Broker will retry ActiveMQ connection, then report error
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_attempts = 6

node mco server config: add to /opt/rh/ruby193/root/etc/mcollective/server.cfg

# Node should retry connecting to ActiveMQ forever
plugin.activemq.max_reconnect_attempts = 0
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_delay = 4.0
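
For a rough feel of what these settings mean in practice, here is a small sketch that models the reconnect schedule. It is not taken from the installer or from MCollective itself. It assumes the connector uses exponential back-off with a multiplier of 2 and a 30-second ceiling unless max_reconnect_delay is set, and it ignores the time each connection attempt itself takes. The function name reconnect_delays is hypothetical.

```python
def reconnect_delays(initial_delay, attempts, multiplier=2.0, max_delay=30.0):
    """Model the waits between reconnect attempts (illustrative only).

    ASSUMPTION: exponential back-off with the given multiplier, capped
    at max_delay. The real client library's timing may differ.
    """
    delays = []
    delay = initial_delay
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= multiplier
    return delays


# Broker client.cfg: initial_reconnect_delay = 0.1, max_reconnect_attempts = 6
broker_delays = reconnect_delays(0.1, 6)
broker_total = sum(broker_delays)

# Node server.cfg: max_reconnect_attempts = 0 means "retry forever", so we
# only preview the first 8 waits, capped by max_reconnect_delay = 4.0
node_preview = reconnect_delays(0.1, 8, max_delay=4.0)

print("broker waits:", broker_delays, "total:", round(broker_total, 1))
print("node waits (first 8):", node_preview)
```

Under those assumptions the broker gives up after roughly 6.3 seconds of waiting in total, and a node never waits more than 4 seconds between attempts once the back-off reaches its cap.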
Hi, Luke. I added the recommended stanzas to 7.7.2 and 8.7. Luke, is that all that was needed for this BZ? Thanks!
Looks good.
Groovy, thanks Luke. Putting this onto QA.
QA'd, looks good.