Red Hat Bugzilla – Bug 1009887
Need to document 'registerinterval = 30' parameter
Last modified: 2016-05-25 09:24:46 EDT
Description of problem: After the broker is rebooted, nodes do not re-register, and you have to restart mcollective on them before they sync back up with the broker.
Version-Release number of selected component (if applicable): 1.2
How reproducible: 100%
Steps to Reproduce:
1. Reboot broker
2. Observe that no nodes are registered with the broker
3. Restart mcollective on nodes to have them re-register with the broker
Actual results: Nodes do not re-register with the broker until mcollective is restarted.
Expected results: Nodes re-register automatically.
Additional info: Setting the parameter 'registerinterval = 30' in '/etc/mcollective/server.cfg' on every node in the environment makes the nodes re-register as expected. Documenting this in the install docs will prevent this issue in the future.
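Since the fix is a one-line change that has to be made on every node, an install script would want to apply it idempotently. A minimal sketch, assuming server.cfg uses simple `key = value` lines with `#` comments; the helper name `ensure_register_interval` is hypothetical and not part of mcollective or any product tooling:

```python
import re

# Matches an uncommented "registerinterval = <value>" line. [ \t]* is used
# instead of \s* so the pattern never swallows preceding blank lines.
_REGISTER_RE = re.compile(r"^[ \t]*registerinterval[ \t]*=.*$", re.MULTILINE)


def ensure_register_interval(cfg_text: str, seconds: int = 30) -> str:
    """Return cfg_text with registerinterval set to `seconds`.

    Existing uncommented settings are rewritten in place; if none exist,
    a new line is appended. Running it twice gives the same result.
    """
    wanted = f"registerinterval = {seconds}"
    if _REGISTER_RE.search(cfg_text):
        return _REGISTER_RE.sub(wanted, cfg_text)
    # Make sure the appended setting starts on its own line.
    if cfg_text and not cfg_text.endswith("\n"):
        cfg_text += "\n"
    return cfg_text + wanted + "\n"
```

A caller would read '/etc/mcollective/server.cfg', pass its contents through the function, write the result back, and then restart mcollective so the new setting is picked up. Commented-out lines such as `# registerinterval = 5` are deliberately left alone.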
I'm seeing this on 1.2 and on vanilla 2.0beta2, and I think it's a significant issue to resolve because each node needs to have the mcollective service restarted whenever the broker is rebooted. Can I help provide further information at all?
Created attachment 826592
mcollective log from node
Attached the mcollective log from a node, annotated with the external events.
Created attachment 826593
Created attachment 826594
I captured packet traces with and without the parameter, and it does appear to resolve this particular problem. What seems to happen on a reboot (as opposed to simply restarting activemq) is that the ActiveMQ server does not close the connection. I'm not sure why, since it does close it when activemq is simply restarted; perhaps it is a timing issue during shutdown. In any case, we also need to handle the "kicked the power cord" and "firewall quietly timed out the connection" scenarios.
Without this parameter, the mcollective server never seems to notice that it is listening on a dead connection. It doesn't initiate any communication, so the line stays dead until mcollective is restarted. With the parameter, mcollective periodically tries to send a packet, receives a connection reset, and reconnects as we would expect. So, while as far as I can see this parameter isn't really intended for this purpose, it has the beneficial side effect of recovering from improperly dropped ActiveMQ connections.
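The failure mode described above can be illustrated with a toy simulation. Everything here is hypothetical (`FakeLink`, `Node`, and `simulate` are not mcollective code), and it simplifies TCP by assuming the first send on a half-open connection immediately raises a reset; on a real network the error may only surface after retransmissions. The point is just that a node which never sends anything never learns the broker is gone, while one that sends every `register_interval` seconds notices at most one interval later.

```python
from dataclasses import dataclass


class ConnectionReset(Exception):
    """Stands in for the TCP reset a sender gets on a half-open connection."""


@dataclass
class FakeLink:
    # Set once the broker host has gone away without closing the socket.
    peer_gone: bool = False

    def send(self, msg: str) -> None:
        if self.peer_gone:
            raise ConnectionReset(msg)


@dataclass
class Node:
    link: FakeLink
    register_interval: int  # 0 means "never send unsolicited traffic"
    reconnects: int = 0

    def tick(self, t: int) -> None:
        """Advance one second; only a periodic send can reveal a dead link."""
        if self.register_interval and t % self.register_interval == 0:
            try:
                self.link.send("registration")
            except ConnectionReset:
                self.link = FakeLink()  # reconnect to the rebooted broker
                self.reconnects += 1


def simulate(register_interval: int, reboot_at: int, duration: int) -> int:
    """Return the second at which the node recovers, or -1 if it never does."""
    node = Node(FakeLink(), register_interval)
    for t in range(1, duration + 1):
        if t == reboot_at:
            node.link.peer_gone = True  # broker host rebooted; no clean close
        node.tick(t)
        if node.reconnects:
            return t
    return -1
```

For example, `simulate(0, reboot_at=10, duration=1000)` returns -1 (the node never recovers), while `simulate(30, reboot_at=10, duration=1000)` returns 30.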
As such, I'll put it back in our install scripts, and it should go back in our instructions. If it is mentioned at all, we should note that it is there only for this side effect; the product does not actually use the registration messages in any way (cf. http://docs.puppetlabs.com/mcollective/reference/plugins/registration.html).
We may want to file a bug upstream with mcollective to note that it should detect dead connections without this apparently unrelated parameter in place.
I just tested this against ose-2.0: even without adding 'registerinterval = 30' to the mcollective server config file, after rebooting activemq, mcollective could still connect to activemq. So it looks like this only applies to ose-1.2.
This has been observed with ose-2.0 as well. Note that our install scripts now add this parameter again. It will not happen with a "service activemq restart", because that closes the connection cleanly. However, it will happen with a host reboot. There may be a timing issue there (i.e. it may happen only intermittently): I would expect activemq to close the connection cleanly if given the chance, and whether it does may just depend on timing.
For 2.0: with 'registerinterval = 30' added to /opt/rh/ruby193/root/etc/mcollective/server.cfg, the mcollective server re-registers automatically after the activemq machine is rebooted. Without it, restarting the activemq service does not affect the mcollective connection, but rebooting the activemq machine breaks it. So 'registerinterval = 30' should be added to the mcollective server.cfg.
For 1.2, 'registerinterval = 30' definitely needs to be added to /etc/mcollective/server.cfg.
Luke's comments are right.
Note that the mcollective server config file in ose-1.2 is /etc/mcollective/server.cfg, while in ose-2.0 it is /opt/rh/ruby193/root/etc/mcollective/server.cfg.
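For install scripts that have to handle both releases, the path difference could be captured in a small lookup. This is a hypothetical helper (`server_cfg_path` is not part of any product tooling); the two paths are the ones given in this report, and any other release raises an error rather than guessing a location.

```python
# server.cfg locations per release, as reported in this bug.
SERVER_CFG_PATHS = {
    "1.2": "/etc/mcollective/server.cfg",
    "2.0": "/opt/rh/ruby193/root/etc/mcollective/server.cfg",
}


def server_cfg_path(ose_version: str) -> str:
    """Return the mcollective server.cfg path for an OSE release."""
    try:
        return SERVER_CFG_PATHS[ose_version]
    except KeyError:
        raise ValueError(f"unknown OSE version: {ose_version!r}") from None
```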