Bug 1009887 - Need to document 'registerinterval = 30' parameter
Summary: Need to document 'registerinterval = 30' parameter
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: brice
QA Contact: ecs-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-19 12:27 UTC by Arthur Enright
Modified: 2016-05-25 13:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-06 03:09:45 UTC
Target Upstream Version:
Embargoed:


Attachments
mcollective log from node (46.11 KB, text/x-log)
2013-11-20 12:11 UTC, Jim Minter
broker activemq.log (15.56 KB, text/x-log)
2013-11-20 12:13 UTC, Jim Minter
broker wrapper.log (12.42 KB, text/x-log)
2013-11-20 12:13 UTC, Jim Minter

Description Arthur Enright 2013-09-19 12:27:40 UTC
Description of problem: After rebooting the broker, nodes do not re-register, and you have to restart mcollective on each node for it to sync back up with the broker.


Version-Release number of selected component (if applicable): 1.2


How reproducible: 100%


Steps to Reproduce:
1. Reboot broker
2. Notice you have no nodes
3. Restart mcollective on nodes to have them re-register with the broker

Actual results: Nodes do not re-register with the broker until mcollective is restarted.


Expected results: Nodes re-register with the broker by themselves.


Additional info: Setting the parameter 'registerinterval = 30' in '/etc/mcollective/server.cfg' on all nodes in the environment makes the nodes re-register themselves like they should. Documenting this in the install docs will prevent this issue in the future.
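
As a rough illustration of the change described above, the following Python sketch adds the setting to the text of a server.cfg file, or updates it if it is already present. The helper name `ensure_register_interval` is hypothetical and not part of mcollective or the OpenShift install scripts; it only assumes the plain `key = value` line format shown in this report.

```python
import re


def ensure_register_interval(cfg_text: str, interval: int = 30) -> str:
    """Return cfg_text with 'registerinterval' set to the given interval.

    Hypothetical helper: any existing registerinterval line is rewritten,
    otherwise a new line is appended at the end of the file.
    """
    wanted = f"registerinterval = {interval}"
    # [ \t]* (not \s*) so the match never spans across line breaks.
    pattern = re.compile(r"^[ \t]*registerinterval[ \t]*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(wanted, cfg_text)
    # Make sure the new line starts on its own line.
    sep = "" if cfg_text == "" or cfg_text.endswith("\n") else "\n"
    return f"{cfg_text}{sep}{wanted}\n"
```

The function only transforms text; reading and writing the file, and restarting mcollective on the node so the new setting is picked up, are left to the caller.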

Comment 5 Jim Minter 2013-11-20 12:05:08 UTC
I'm seeing this on 1.2 and on vanilla 2.0beta2, and I think it's a significant issue to resolve because each node needs to have the mcollective service restarted whenever the broker is rebooted.  Can I help provide further information at all?

Comment 6 Jim Minter 2013-11-20 12:11:46 UTC
Created attachment 826592 [details]
mcollective log from node

attached mcollective log from node, commented with external events

Comment 7 Jim Minter 2013-11-20 12:13:08 UTC
Created attachment 826593 [details]
broker activemq.log

Comment 8 Jim Minter 2013-11-20 12:13:31 UTC
Created attachment 826594 [details]
broker wrapper.log

Comment 10 Luke Meyer 2013-11-21 14:00:05 UTC
I observed packet traces with and without the parameter and it does seem to resolve this particular problem. What seems to happen at reboot (as opposed to just restarting activemq) is that the ActiveMQ server does not close the connection (not sure why, since it does if simply restarted - perhaps a timing issue during shutdown; anyway, we also need to handle the "kicked the power cord" and "firewall quietly timed-out connection" scenarios). 

Without this parameter, the mcollective server does not seem to notice that it is listening to a dead connection - ever. It doesn't initiate any communications, so the line is dead until mcollective restarts. With the parameter, mcollective tries to send a packet, receives a connection reset notice, and reconnects as we would expect. So, while this parameter isn't really intended for the purpose as far as I can see, it does have the beneficial side effect of recovering from improperly dropped ActiveMQ connections.
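
The behaviour described above can be pictured with a small toy model in Python. This is not mcollective's code and involves no real networking: `FakeLink`, `tick` and `connect` are hypothetical names, and the model only assumes what the packet traces suggest, namely that a dead connection goes unnoticed until something is sent on it.

```python
class FakeLink:
    """Toy stand-in for a connection whose peer may have vanished silently."""

    def __init__(self, alive=True):
        self.alive = alive

    def send(self, payload):
        # A dead peer is only discovered when we actually try to send.
        if not self.alive:
            raise ConnectionResetError("connection reset by peer")
        return len(payload)


def tick(link, connect, payload=b"registration"):
    """Perform one periodic send; reconnect if the link turns out to be dead.

    Returns (link_to_use, reconnected).
    """
    try:
        link.send(payload)
        return link, False
    except ConnectionResetError:
        return connect(), True
```

For example, calling `tick(FakeLink(alive=False), lambda: FakeLink())` returns a fresh link and `True`, while a pure listener that never calls `send` would keep waiting on the dead link indefinitely.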

As such, I'll put it back in our install scripts, and it should go back in our instructions. If it's called out in any way, we should note that it has this side effect, and that we don't actually care about the registration itself or use it in any way in the product (cf. http://docs.puppetlabs.com/mcollective/reference/plugins/registration.html).

We may want to file a bug upstream with mcollective to note that it should detect dead connections without this apparently unrelated parameter in place.

Comment 12 Johnny Liu 2013-11-25 12:14:16 UTC
I just tested this against ose-2.0. Even though I did NOT add 'registerinterval = 30' to the mcollective server config file, after rebooting activemq, mcollective could still connect to activemq. So it looks like this only applies to ose-1.2.

Comment 13 Luke Meyer 2013-11-25 18:25:08 UTC
It is observed to happen with ose-2.0 as well. Note that our install scripts now add this parameter again. It will not happen with a "service activemq restart", as that causes the connection to be closed cleanly. However, it will happen with a host reboot. There may be a timing issue there (i.e. it may not happen every time), as I would expect activemq to close the connection cleanly if given a chance, so the outcome may just depend on timing.

Comment 15 Johnny Liu 2013-12-06 10:27:28 UTC
For 2.0, if 'registerinterval = 30' is added to /opt/rh/ruby193/root/etc/mcollective/server.cfg, the mcollective server re-registers itself automatically after the activemq machine is rebooted. Without the parameter, restarting the activemq service does not affect the mcollective connection, but rebooting the activemq machine breaks it. So 'registerinterval = 30' should be added to the mcollective server.cfg.

For 1.2, 'registerinterval = 30' definitely needs to be added to /etc/mcollective/server.cfg.

Luke's comments are right.

Note that the mcollective server config file is /etc/mcollective/server.cfg in ose-1.2, while in ose-2.0 it is /opt/rh/ruby193/root/etc/mcollective/server.cfg.
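
Since the config file location differs between the two releases, a script that applies the parameter has to pick the right path. A minimal Python sketch of that lookup, using only the two paths given in this comment; the names `SERVER_CFG_PATHS` and `server_cfg_path` are hypothetical, and other releases are deliberately rejected rather than guessed:

```python
# Paths as reported in this bug for ose-1.2 and ose-2.0 only.
SERVER_CFG_PATHS = {
    "1.2": "/etc/mcollective/server.cfg",
    "2.0": "/opt/rh/ruby193/root/etc/mcollective/server.cfg",
}


def server_cfg_path(ose_version: str) -> str:
    """Return the mcollective server.cfg path for a dotted OSE version."""
    # Reduce e.g. "2.0.1" to "2.0" before the lookup.
    major_minor = ".".join(ose_version.split(".")[:2])
    try:
        return SERVER_CFG_PATHS[major_minor]
    except KeyError:
        raise ValueError(
            f"no known server.cfg path for OSE {ose_version!r}"
        ) from None
```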

