Bug 1065047
Summary: | broker does not handle activemq outages gracefully | |||
---|---|---|---|---|
Product: | OpenShift Online | Reporter: | Luke Meyer <lmeyer> | |
Component: | Pod | Assignee: | Lili Nader <lnader> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 1.x | CC: | dmcphers, jhou, lnader, mfisher | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1065048 | Environment: | ||
Last Closed: | 2014-05-15 15:28:11 UTC | Type: | Bug | |
Bug Blocks: | 1065048 |
Description
Luke Meyer
2014-02-13 18:28:14 UTC
It seems like the client tries indefinitely to connect (I gave up waiting after 147 attempts):

I, [2014-04-09T15:02:22.297553 #3298] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective:6163
I, [2014-04-09T15:02:22.299048 #3298] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective:6163 failed on attempt 0
. . .
I, [2014-04-09T16:07:03.439041 #3298] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 140 to stomp://mcollective:6163
I, [2014-04-09T16:07:03.439698 #3298] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective:6163 failed on attempt 140

According to https://github.com/puppetlabs/marionette-collective/blob/master/plugins/mcollective/connector/activemq.rb#L233 there are ways to configure how many times it attempts to connect before returning an error. I'm looking into how and where I can configure these settings.

Setting plugin.activemq.max_reconnect_attempts to a non-zero value in the mcollective server.cfg limits the number of attempts (0 means retry indefinitely). However, once the reconnect attempts have been exhausted, the mcollective server does not try to reconnect even if activemq comes back up, and it has to be restarted. Ideally, we would like mcollective to continue to retry but let the broker know within a certain time limit that it cannot connect, so the broker can relay the message back to the client. Now looking into mcollective configuration to see what can be done.

So there are 2 config files for mcollective: server.cfg and client.cfg.

By setting plugin.activemq.max_reconnect_attempts to a non-zero value in client.cfg we get the desired outcome, i.e. the mcollective client relays a message back to the broker after several failed attempts at connecting to activeMQ. By leaving plugin.activemq.max_reconnect_attempts=0 in server.cfg we ensure that mcollective will continue to attempt to connect to activeMQ until activeMQ is back up.
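The difference between the two settings can be sketched in a few lines of Ruby. This is only an illustration of the retry semantics described above, not the connector's or the Stomp library's actual code: `connect_with_retries` and the local `MaxReconnectAttempts` class are hypothetical stand-ins, and the exact attempt counting in the real library may differ.

```ruby
# Hypothetical stand-in for the error the Stomp library raises when it gives up.
class MaxReconnectAttempts < StandardError; end

# Calls the given block until it returns a truthy "connection".
# max_attempts == 0 : retry forever (the server.cfg behaviour described above).
# max_attempts  > 0 : raise after that many failed attempts (the client.cfg case).
def connect_with_retries(max_attempts, delay: 0)
  attempt = 0
  loop do
    connection = yield(attempt)
    return connection if connection

    attempt += 1
    if max_attempts > 0 && attempt >= max_attempts
      raise MaxReconnectAttempts, "gave up after #{attempt} attempts"
    end
    sleep(delay) if delay > 0
  end
end
```

With a non-zero limit the caller gets an exception it can report upstream; with 0 the loop only ends once a connection succeeds, which is why the server side recovers on its own when activeMQ returns.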
More info available: http://docs.puppetlabs.com/mcollective/reference/plugins/connector_activemq.html

li
----------
Added STOMP config params and set plugin.activemq.max_reconnect_attempts=10
https://github.com/openshift/li/pull/2609

origin-server
-------------
- Changed exception raised by rpcclient to NodeUnavailableException to indicate that retry is advisable, i.e. results in HTTP status code 503
- nil check on rpc_client before calling disconnect to prevent a runtime exception being thrown when the connection is unsuccessful, i.e. undefined method `disconnect' for nil:NilClass
https://github.com/openshift/origin-server/pull/5278

New message returned from broker when activeMQ is down. This relays the message from the mcollective client "as is".

rhc app start -a app
Unable to complete the requested operation due to: Could not connect to ActiveMQ Server: Stomp::Error::MaxReconnectAttempts. Please try again and contact support if the issue persists.

Commit pushed to master at https://github.com/openshift/li
https://github.com/openshift/li/commit/4149f5af518294926b6f60f37be05478721d86a7
Bug 1065047 - limit connection attempts by mcollective client to 10 (rather than indefinite)

Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/c66d9cff6a3d90e4ee6adcdc5b31eae7b680fa01
Bug 1065047 - changed exception raised to NodeUnavailableException to indicate retry advisable (503)

Verified on devenv_4866
When activemq is in an outage (stopped in my case), more reasonable messages are displayed to the end user.

Unable to complete the requested operation due to: Could not connect to ActiveMQ Server: Stomp::Error::MaxReconnectAttempts. Please try again and contact support if the issue persists.
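The two origin-server changes (translating the connection failure into a retry-advisable error, and guarding disconnect against a nil client) follow a common pattern. The sketch below is a minimal illustration under stated assumptions, not the code from the pull request: `with_rpc_client`, `http_status_for`, the `factory` callable, and the locally defined exception classes are all hypothetical, and the 500 fallback is an assumption made only for the example.

```ruby
# Hypothetical stand-ins; the real classes live in origin-server and the stomp gem.
class NodeUnavailableException < StandardError; end
class MaxReconnectAttempts < StandardError; end

# Builds an RPC client via `factory` (any callable), yields it, and always
# cleans up. `factory` raises MaxReconnectAttempts when activeMQ is unreachable.
def with_rpc_client(factory)
  rpc_client = nil
  begin
    rpc_client = factory.call
    yield rpc_client
  rescue MaxReconnectAttempts => e
    # Re-raise as an error that tells the caller a retry is advisable.
    raise NodeUnavailableException,
          "Could not connect to ActiveMQ Server: #{e.class}"
  ensure
    # rpc_client is still nil if the connection never succeeded, so guard the
    # call to avoid "undefined method `disconnect' for nil:NilClass".
    rpc_client.disconnect unless rpc_client.nil?
  end
end

# Maps the error to an HTTP status; 503 signals "try again later".
# The 500 default is an assumption for this sketch.
def http_status_for(error)
  error.is_a?(NodeUnavailableException) ? 503 : 500
end
```

Without the nil guard, the cleanup step itself would raise and mask the original connection error, so the user would see a NoMethodError instead of the "Could not connect to ActiveMQ Server" message.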