Bug 1065048
Summary: | broker does not handle activemq outages gracefully | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Luke Meyer <lmeyer> | |
Component: | Node | Assignee: | Luke Meyer <lmeyer> | |
Status: | CLOSED ERRATA | QA Contact: | libra bugs <libra-bugs> | |
Severity: | medium | Docs Contact: | ||
Priority: | low | |||
Version: | 2.0.0 | CC: | adellape, bleanhar, charles_sheridan, gpei, libra-onpremise-devel, xiama | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
The MCollective client configuration used default timeout settings, which caused the broker to wait for a prolonged period when attempting to connect to ActiveMQ. When ActiveMQ was unreachable, the broker waited and eventually failed as if the requests had timed out, without displaying a helpful error message. This bug fix updates the client configuration to set a reasonable default timeout of 6.3 seconds; broker requests now time out faster and display a helpful error message when ActiveMQ is unreachable. The configuration change is made only in the installation utility and scripts; for existing installations, an administrator must make the suggested changes manually.
|
Story Points: | --- | |
Clone Of: | 1065047 | |||
: | 1133958 | Environment: | |
Last Closed: | 2014-09-11 20:07:03 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1065047 | |||
Bug Blocks: | 1133958 |
Description
Luke Meyer
2014-02-13 18:37:07 UTC
Test this with the following packages:

[root@broker ~]# rpm -qa|grep mcollective
ruby193-mcollective-client-2.4.1-3.el6op.noarch
rubygem-openshift-origin-msg-broker-mcollective-1.22.2-1.git.167.c0332d5.el6op.noarch
ruby193-mcollective-common-2.4.1-3.el6op.noarch

[root@node1 ~]# rpm -qa|grep mcollective
ruby193-mcollective-common-2.4.1-3.el6op.noarch
openshift-origin-msg-node-mcollective-1.21.2-1.git.182.5e73e48.el6op.noarch
ruby193-mcollective-2.4.1-3.el6op.noarch

After stopping the activemq service, trying to restart the application still gives the same error on the client side:

[root@broker conf.d]# rhc app restart app1
Password: ******
An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server 'https://broker.ose-201403214.com.cn/broker/rest/application/532fa7ffcfb77f671400003c/events'.

Only mcollective reports this error, in ruby193-mcollective.log:

E, [2014-03-24T00:13:18.321231 #1084] ERROR -- : activemq.rb:133:in `on_miscerr' Unexpected error on connection stomp://mcollective.com.cn:61613: es_recv: connection.receive returning EOF as nil - resetting connection.

There is no clear error information about activemq in the httpd or broker logs; the httpd error_log contains only these entries:

[Sun Mar 23 23:52:14 2014] [error] [client 10.66.78.226] (70007)The timeout specified has expired: proxy: error reading status line from remote server 127.0.0.1
[Sun Mar 23 23:52:14 2014] [error] [client 10.66.78.226] proxy: Error reading from remote server returned by /broker/rest/application/532fa7ffcfb77f671400003c/events

https://github.com/openshift/openshift-extras/pull/440
Changing the installer to set decent defaults for mcollective timeouts. Pre-2.1 code changes allowed the timeout error to be displayed.
I would consider an automatic ose-upgrade modification to the mco configuration, but at this time I think it may be best just to note the changes made:

broker: add to /opt/rh/ruby193/root/etc/mcollective/client.cfg

# Broker will retry ActiveMQ connection, then report error
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_attempts = 6

node: add to /opt/rh/ruby193/root/etc/mcollective/server.cfg

# Node should retry connecting to ActiveMQ forever
plugin.activemq.max_reconnect_attempts = 0
plugin.activemq.initial_reconnect_delay = 0.1
plugin.activemq.max_reconnect_delay = 4.0

Checked on puddle [2.1.z/2014-08-25.2]:

1. Create an application "phpapp"
2. Stop the activemq service
3. Try various commands involving the application, e.g.:
   # rhc app-restart phpapp
   # rhc cartridge add mysql -a phpapp
   # rhc app show --gears -a phpapp
   # rhc app create rb19 ruby-1.9

The output:

Unable to complete the requested operation due to: Could not connect to ActiveMQ Server: Stomp::Error::MaxReconnectAttempts. Please try again and contact support if the issue persists. Reference ID: 3cf9f9789921ec0639dd54c4f1a81bb5

A useful message is now given.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1183.html
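
For anyone applying the settings by hand, it may help to see how the 6.3 second figure in the Doc Text relates to the broker values (initial_reconnect_delay = 0.1, max_reconnect_attempts = 6). The sketch below is only an approximation: it assumes the connector doubles the delay after each failed attempt (a back-off multiplier of 2) and caps it at a maximum delay, 30.0 seconds unless max_reconnect_delay is set. Neither assumption is stated in this report, and the function name is hypothetical.

```python
def reconnect_delays(initial_delay, max_attempts, multiplier=2.0, max_delay=30.0):
    """Return the waits (in seconds) between successive connection attempts.

    Assumes exponential back-off: each wait is the previous one times
    `multiplier`, capped at `max_delay`. The multiplier and the 30.0 cap
    are assumptions, not values taken from this bug report.
    """
    delays = []
    delay = initial_delay
    for _ in range(max_attempts):
        delays.append(min(delay, max_delay))
        delay *= multiplier
    return delays


# Broker settings from client.cfg: six attempts starting at 0.1 seconds.
broker_delays = reconnect_delays(initial_delay=0.1, max_attempts=6)
print("broker waits:", broker_delays)
print("broker total:", round(sum(broker_delays), 1), "seconds")

# Node settings from server.cfg retry forever; show the first eight waits
# with the 4.0 second cap from max_reconnect_delay.
node_delays = reconnect_delays(initial_delay=0.1, max_attempts=8, max_delay=4.0)
print("node waits:", node_delays)
```

Under these assumptions the broker waits sum to about 6.3 seconds before it gives up and reports the error, while the node settles into retrying every 4 seconds.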