Bug 1122872 - "oo-mco" does not timeout when no ActiveMQ available
Summary: "oo-mco" does not timeout when no ActiveMQ available
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 1048148 1108462 1122876
Depends On:
Blocks:
 
Reported: 2014-07-24 09:20 UTC by Miguel Perez Colino
Modified: 2014-08-26 14:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-26 14:22:08 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1133958 None None None Never

Internal Links: 1133958

Description Miguel Perez Colino 2014-07-24 09:20:37 UTC
Description of problem:
Running "oo-mco" commands (e.g. "oo-mco ping") while all ActiveMQ services are stopped makes the utility hang indefinitely instead of returning an error after a timeout.

Version-Release number of selected component (if applicable):
OSE 2.1.2
openshift-origin-util-scl-1.17.2.3-1.el6op.noarch
ruby193-mcollective-client-2.4.1-5.el6op.noarch

How reproducible:
Shut down all ActiveMQ services and run "oo-mco ping"

Steps to Reproduce:
1. Stop all ActiveMQ services
2. Run "oo-mco ping" and wait
3. Keep waiting

Actual results:
The process never times out; it hangs instead of returning an error after a timeout.

Expected results:
The process stops with an error after a timeout.

Additional info:
Adding to "/opt/rh/ruby193/root/etc/mcollective/client.cfg" the following line solved the issue:
plugin.activemq.max_reconnect_attempts=3

Neither the installer nor the documentation sets or mentions this option.
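
To check whether a given client.cfg already bounds reconnect attempts, something like the following could be used. It is only a minimal sketch: it assumes the simple "key = value" line format shown above, treats lines starting with "#" as comments, and assumes the last occurrence of a key wins. The function name `reconnect_attempts` is hypothetical and is not part of mcollective or the OpenShift tools.

```python
def reconnect_attempts(cfg_text, key="plugin.activemq.max_reconnect_attempts"):
    """Return the configured value for `key` as an int, or None if it is absent.

    Assumptions (not confirmed by this bug report): one "key = value" pair
    per line, "#" starts a comment line, and the last occurrence wins.
    """
    value = None
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = int(raw.strip())
    return value
```

A result of None means the option is not set in the file at all, which is the situation in which the hang described above was observed.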

Comment 2 Luke Meyer 2014-07-24 13:38:54 UTC
*** Bug 1048148 has been marked as a duplicate of this bug. ***

Comment 3 Luke Meyer 2014-07-24 13:39:28 UTC
*** Bug 1108462 has been marked as a duplicate of this bug. ***

Comment 4 Luke Meyer 2014-07-24 13:39:33 UTC
*** Bug 1122876 has been marked as a duplicate of this bug. ***

Comment 5 Luke Meyer 2014-07-24 13:45:10 UTC
This problem has been observed before with several other mcollective-related commands, but not researched for a solution. It sounds like the proposed config will solve all the problems (giving us a timeout in a sane amount of time) and I can't see any downsides. I think the solution may have been overlooked because we definitely don't want the mcollective *server* to stop re-trying to connect; but clients have no need for this behavior and should time out.

When this is fixed, the following commands should all fail helpfully with ActiveMQ down (or not resolving):
oo-mco ping
oo-admin-ctl-district -c add-node
oo-stats
oo-accept-broker
oo-diagnostics

Given the hanging behavior and the simplicity of fixing it, I propose fixing this sooner rather than later.
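
One way to verify "fails instead of hanging" for each command in the list above is to run it under a hard deadline. The helper below is a rough sketch, not part of any OpenShift tooling: the name `run_with_deadline` and the 60-second limit are arbitrary choices for illustration, and the sketch only distinguishes "exited" from "still running at the deadline".

```python
import subprocess
import sys


def run_with_deadline(cmd, limit_seconds):
    """Run `cmd` (a list of arguments) and report whether it finished in time.

    Returns "hung" if the command was still running after `limit_seconds`
    (subprocess.run kills it in that case), otherwise "exit N" with the
    command's exit status.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_seconds,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "exit %d" % result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check.py oo-mco ping
    # The 60 s limit is an arbitrary illustration value.
    print(run_with_deadline(sys.argv[1:], 60.0))
```

With ActiveMQ stopped, a result of "hung" would correspond to the behavior reported here, while any "exit N" result means the command gave up on its own.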

Comment 6 Luke Meyer 2014-08-25 23:36:00 UTC
https://github.com/openshift/openshift-extras/pull/440

Changing the installer to set decent defaults for mcollective timeouts. I would consider an ose-upgrade automatic modification to mco configuration but at this time I think it may be best just to note the changes made:

broker: add to /opt/rh/ruby193/root/etc/mcollective/client.cfg
# Broker will retry ActiveMQ connection, then report error
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_attempts = 6

node: add to /opt/rh/ruby193/root/etc/mcollective/server.cfg
# Node should retry connecting to ActiveMQ forever
plugin.activemq.max_reconnect_attempts = 0
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_delay = 4.0
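
To get a feel for what these values mean in wall-clock time, the reconnect schedule can be modelled as below. This is a sketch under assumptions that are not stated in this bug: that the connector doubles the delay after every failed attempt (exponential back-off with a multiplier of 2), that the delay is capped at max_reconnect_delay (assumed to default to 30 seconds when not set), and that one delay is spent per attempt. The function name `reconnect_delays` is hypothetical.

```python
from itertools import islice


def reconnect_delays(initial, max_attempts, max_delay=30.0, multiplier=2.0):
    """Yield the assumed wait, in seconds, before each reconnect attempt.

    max_attempts == 0 is treated as "retry forever", matching the comment
    on the node settings above. The multiplier and the default cap are
    assumptions, not values taken from this bug report.
    """
    delay = initial
    attempt = 0
    while max_attempts == 0 or attempt < max_attempts:
        yield min(delay, max_delay)
        delay *= multiplier
        attempt += 1


# Broker settings: 6 attempts starting at 0.1 s.
broker = list(reconnect_delays(0.1, 6))
print(broker, "total:", round(sum(broker), 1))

# Node settings: unlimited attempts, capped at 4.0 s; show only the first 8.
print(list(islice(reconnect_delays(0.1, 0, max_delay=4.0), 8)))
```

Under these assumptions the broker spends about 6.3 seconds in reconnect delays before reporting an error (connection attempt time itself is extra), while the node settles into retrying every 4 seconds indefinitely.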

Comment 7 Ma xiaoqiang 2014-08-26 05:11:49 UTC
Check on puddle [2.1.z/2014-08-25.2]

1. Stop all ActiveMQ services
2. Run "oo-mco ping" and wait
3. Run the other related commands
#oo-admin-ctl-district -c add-node
#oo-stats
#oo-diagnostics

The output contains:
error running test_node_profiles_districts_from_broker: #<OpenShift::NodeUnavailableException: Could not connect to ActiveMQ Server: Stomp::Error::MaxReconnectAttempts>

The commands now give a useful error message.

