Bug 1167707 - openshift-routing-daemon loses its STOMP connection after ActiveMQ is restarted and does not reconnect.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: chris alfonso
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-25 09:58 UTC by Johnny Liu
Modified: 2016-02-01 02:36 UTC (History)

Fixed In Version: rubygem-openshift-origin-routing-daemon-0.20.2.3-1.el6op
Doc Type: Bug Fix
Doc Text:
In OpenShift Enterprise environments using the routing daemon, if the ActiveMQ service is restarted, the routing daemon did not reconnect to ActiveMQ automatically, and the openshift-routing-daemon service had to be restarted as well. This bug fix updates the routing daemon to now attempt to reconnect automatically after losing its ActiveMQ connection, and as a result this scenario no longer requires manual intervention.
Clone Of:
Environment:
Last Closed: 2014-12-10 13:25:27 UTC
Target Upstream Version:
Embargoed:


Links:
  System: Red Hat Product Errata
  ID: RHBA-2014:1979
  Private: 0
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Red Hat OpenShift Enterprise 2.2.2 bug fix and enhancement update
  Last Updated: 2014-12-10 18:23:46 UTC

Description Johnny Liu 2014-11-25 09:58:43 UTC
Description of problem:


Version-Release number of selected component (if applicable):
rubygem-openshift-origin-routing-daemon-0.20.2.1-1.el6op.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set up the environment and create a scalable app to confirm that openshift-routing-daemon is working.
2. Reboot the ActiveMQ host or restart the activemq service.
3. Try to create one more scalable app.

Actual results:
No nginx config files are built for the new app.
/var/log/openshift-routing-daemon.output shows that the daemon does not reconnect; it hangs after the last ActiveMQ restart. When this happens, the user has to restart the openshift-routing-daemon service.

==> /var/log/openshift-routing-daemon.output <==
/opt/rh/ruby193/root/usr/share/gems/gems/stomp-1.2.14/lib/stomp/connection.rb:410:in `receive': no current connection exists (Stomp::Error::NoCurrentConnection)
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.20.2.1/lib/openshift/routing/daemon.rb:212:in `block in listen'
	from /opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.20.2.1/lib/openshift/routing/daemon.rb:212:in `listen'
	from /etc/init.d/openshift-routing-daemon:102:in `block (2 levels) in <main>'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:215:in `call'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:215:in `block in start_proc'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/daemonize.rb:192:in `call'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/daemonize.rb:192:in `call_as_daemon'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:219:in `start_proc'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:255:in `start'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/controller.rb:69:in `run'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons.rb:188:in `block in run_proc'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/cmdline.rb:105:in `call'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/cmdline.rb:105:in `catch_exceptions'
	from /opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons.rb:187:in `run_proc'
	from /etc/init.d/openshift-routing-daemon:101:in `block in <main>'
	from /etc/init.d/openshift-routing-daemon:45:in `block (2 levels) in locked'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.29.1.1/lib/openshift-origin-common/utils/path_utils.rb:94:in `block in flock'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.29.1.1/lib/openshift-origin-common/utils/path_utils.rb:88:in `open'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.29.1.1/lib/openshift-origin-common/utils/path_utils.rb:88:in `flock'
	from /etc/init.d/openshift-routing-daemon:44:in `block in locked'
	from /opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
	from /etc/init.d/openshift-routing-daemon:43:in `locked'
	from /etc/init.d/openshift-routing-daemon:88:in `<main>'



Expected results:
openshift-routing-daemon should reconnect to the ActiveMQ service once the service is available again.
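
The expected behavior can be illustrated with a small sketch of a listen loop that reconnects rather than exiting when the broker connection is lost. This is not the change made in the actual fix; `FakeBroker`, `ConnectionLost`, and the `listen` signature below are hypothetical stand-ins so that the example runs without a STOMP client or an ActiveMQ broker.

```ruby
# Illustrative only: a listen loop that reconnects instead of exiting when the
# broker connection is lost. FakeBroker and ConnectionLost are hypothetical
# stand-ins so the sketch runs without a real STOMP client or ActiveMQ.
class ConnectionLost < StandardError; end

class FakeBroker
  def initialize(frames)
    @frames = frames
  end

  # Returns the next message, or raises when the scripted frame is :drop.
  def receive
    frame = @frames.shift
    raise ConnectionLost, 'no current connection exists' if frame == :drop
    frame
  end
end

# Calls connect.call to obtain a client, reads until receive returns nil, and
# reconnects (after a short pause) whenever the connection is lost.
def listen(connect, max_reconnects: 5, pause: 0.01)
  received = []
  reconnects = 0
  client = connect.call
  loop do
    begin
      msg = client.receive
      break if msg.nil?
      received << msg
    rescue ConnectionLost
      # Give up only after the reconnect budget is exhausted.
      raise if reconnects >= max_reconnects
      reconnects += 1
      sleep pause
      client = connect.call
    end
  end
  [received, reconnects]
end

# First connection drops after one message; the second delivers the rest.
scripts = [['app1 created', :drop], ['app2 created', nil]]
connect = -> { FakeBroker.new(scripts.shift) }
received, reconnects = listen(connect)
puts "received=#{received.inspect} reconnects=#{reconnects}"
```

The point of the sketch is that the rescue sits inside the loop, so a lost connection costs one reconnect instead of terminating the daemon as in the stack trace above.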

Additional info:

Comment 1 chris alfonso 2014-11-25 20:02:45 UTC
PR is open for enterprise-server.
https://github.com/openshift/enterprise-server/pull/450

Comment 4 Johnny Liu 2014-11-26 03:36:54 UTC
Verified this bug with rubygem-openshift-origin-routing-daemon-0.20.2.3-1.el6op.noarch: PASS.


While the ActiveMQ host is restarting, the following log output is seen:
==> /var/log/openshift-routing-daemon.output <==
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#0) in 0.01
connect to 10.66.79.157 failed: Connection refused - connect(2) will retry(#1) in 0.02
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#2) in 0.04
connect to 10.66.79.157 failed: Connection refused - connect(2) will retry(#3) in 0.08
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#4) in 0.16
connect to 10.66.79.157 failed: Connection refused - connect(2) will retry(#5) in 0.32
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#6) in 0.64
connect to 10.66.79.157 failed: Connection refused - connect(2) will retry(#7) in 1.28
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#8) in 2.56
connect to 10.66.79.157 failed: Connection refused - connect(2) will retry(#9) in 5.12
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#10) in 10.24
connect to 10.66.79.157 failed: Connection refused - connect(2) will retry(#11) in 20.48
connect to 10.66.79.143 failed: Connection refused - connect(2) will retry(#12) in 30.0
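
The retry delays in this log double from 0.01 and level off at 30.0, and the attempts alternate between the two broker addresses. A minimal sketch that reproduces that schedule follows, assuming an initial delay of 0.01, a multiplier of 2, and a cap of 30.0. These values are inferred from the log above, not read from the daemon's configuration, and `retry_schedule` and its parameters are hypothetical names.

```ruby
# Illustrative only: reproduces the retry schedule visible in the log above.
# The defaults are inferred from that log, not taken from the daemon's config.
def retry_schedule(hosts, attempts, initial_delay: 0.01, multiplier: 2, max_delay: 30.0)
  (0...attempts).map do |n|
    # Exponential backoff, capped at max_delay.
    delay = [initial_delay * multiplier**n, max_delay].min
    # Attempts rotate through the broker list in order.
    { attempt: n, host: hosts[n % hosts.size], delay: delay.round(2) }
  end
end

schedule = retry_schedule(%w[10.66.79.143 10.66.79.157], 13)
schedule.each do |s|
  puts "connect to #{s[:host]} failed, will retry(##{s[:attempt]}) in #{s[:delay]}"
end
```

Under these assumptions the uncapped delay for attempt 12 would be 40.96, which is why the last logged retry shows the 30.0 ceiling instead.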


==> /var/log/openshift/routing-daemon.log <==
I, [2014-11-26T11:33:10.184076 #16964]  INFO -- : Subscribing to /topic/routinginfo...


After the ActiveMQ service came back, a scalable app was created successfully and its nginx conf files were built.

Comment 6 errata-xmlrpc 2014-12-10 13:25:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1979.html

