Bug 1009887
Summary: | Need to document 'registerinterval = 30' parameter | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Arthur Enright <aenright>
Component: | Documentation | Assignee: | brice <bfallonf>
Status: | CLOSED CURRENTRELEASE | QA Contact: | ecs-bugs
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 2.2.0 | CC: | adellape, alyoung, baulakh, jialiu, jminter, jokerman, libra-onpremise-devel, lmeyer, mmccomas
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-01-06 03:09:45 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Arthur Enright
2013-09-19 12:27:40 UTC
I'm seeing this on 1.2 and on vanilla 2.0beta2, and I think it's a significant issue to resolve because each node needs to have the mcollective service restarted whenever the broker is rebooted. Can I help provide further information at all?
Created attachment 826592 [details]
mcollective log from node
attached mcollective log from node, commented with external events
Created attachment 826593 [details]
broker activemq.log
Created attachment 826594 [details]
broker wrapper.log
I observed packet traces with and without the parameter, and it does seem to resolve this particular problem. What seems to happen at reboot (as opposed to just restarting activemq) is that the ActiveMQ server does not close the connection. I'm not sure why, since it does if simply restarted; perhaps it is a timing issue during shutdown. In any case, we also need to handle the "kicked the power cord" and "firewall quietly timed-out connection" scenarios.

Without this parameter, the mcollective server does not seem to notice that it is listening to a dead connection - ever. It doesn't initiate any communications, so the line is dead until mcollective restarts. With the parameter, mcollective tries to send a packet, receives a connection reset notice, and reconnects as we would expect.

So, while this parameter isn't really intended for the purpose as far as I can see, it does have the beneficial side effect of recovering from improperly dropped ActiveMQ connections. As such, I'll put it back in our install scripts, and it should go back in our instructions. If it's called out in any way, we should note that it has this side effect, and that we don't actually care about the registration or use it in any way in the product (cf. http://docs.puppetlabs.com/mcollective/reference/plugins/registration.html). We may want to file a bug upstream with mcollective to note that it should detect dead connections without this apparently unrelated parameter in place.

I just tested this against ose-2.0. Even though I did NOT add 'registerinterval = 30' to the mcollective server config file and then rebooted activemq, mcollective could still connect to activemq. So it looks like this applies only to ose-1.2.

It is observed to happen with ose-2.0 as well. Note that our install scripts now add this parameter again. It will not happen with a "service activemq restart", as that causes the connection to be closed cleanly. However, it will happen with a host reboot. There may be a timing issue there (i.e. it may not happen every time), as I would expect activemq to close the connection cleanly if given a chance, and it may just depend.

For 2.0, if 'registerinterval = 30' is added to /opt/rh/ruby193/root/etc/mcollective/server.cfg and the activemq machine is rebooted, this parameter makes the mcollective server register itself again automatically. Without it, restarting the activemq service does not affect the mcollective connection, but rebooting the activemq machine breaks it. So 'registerinterval = 30' should be added to the mcollective server.cfg. For 1.2, 'registerinterval = 30' definitely needs to be added to /etc/mcollective/server.cfg. Luke's comment is right. Note that the mcollective server config file in ose-1.2 is /etc/mcollective/server.cfg, while in ose-2.0 it is /opt/rh/ruby193/root/etc/mcollective/server.cfg.
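
For anyone scripting the change described above, the following is a minimal Python sketch of adding the parameter idempotently. It is not taken from the install scripts mentioned in the comments. The helper names (`ensure_registerinterval`, `update_file`) and the `SERVER_CFG` mapping are hypothetical, and the sketch assumes server.cfg holds plain `key = value` lines; only the two file paths and the `registerinterval = 30` setting come from this bug.

```python
import re
from pathlib import Path

# Config paths quoted in the comments above, keyed by release.
# The mapping itself is a convenience assumed for this sketch.
SERVER_CFG = {
    "1.2": "/etc/mcollective/server.cfg",
    "2.0": "/opt/rh/ruby193/root/etc/mcollective/server.cfg",
}

# Matches an active (uncommented) registerinterval line.
_KEY_RE = re.compile(r"^\s*registerinterval\s*=")


def ensure_registerinterval(cfg_text: str, interval: int = 30) -> str:
    """Return cfg_text with exactly one 'registerinterval = <interval>' line."""
    wanted = f"registerinterval = {interval}"
    out = []
    seen = False
    for line in cfg_text.splitlines():
        if _KEY_RE.match(line):
            if not seen:
                out.append(wanted)  # rewrite the first occurrence in place
                seen = True
            continue  # drop any later duplicates
        out.append(line)
    if not seen:
        out.append(wanted)  # parameter was missing: append it
    return "\n".join(out) + "\n"


def update_file(path: str, interval: int = 30) -> bool:
    """Rewrite the file only when needed; return True if it was changed."""
    cfg = Path(path)
    old = cfg.read_text()
    new = ensure_registerinterval(old, interval)
    if new != old:
        cfg.write_text(new)
        return True
    return False
```

A call such as `update_file(SERVER_CFG["2.0"])` would apply it on a 2.0 node. The mcollective service would presumably need a restart afterwards for the setting to take effect.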