Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1009887

Summary:

Need to document 'registerinterval = 30' parameter

Product:

OpenShift Container Platform

Reporter:

Arthur Enright <aenright>

Component:

Documentation

Assignee:

brice <bfallonf>

Status:

CLOSED CURRENTRELEASE

QA Contact:

ecs-bugs

Severity:

unspecified

Priority:

unspecified

Version:

2.2.0

CC:

adellape, alyoung, baulakh, jialiu, jminter, jokerman, libra-onpremise-devel, lmeyer, mmccomas

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2014-01-06 03:09:45 UTC

Type:

Bug

Attachments:

Description	Flags
mcollective log from node	none
broker activemq.log	none
broker wrapper.log	none

Description Arthur Enright 2013-09-19 12:27:40 UTC

Description of problem: After the broker is rebooted, nodes do not re-register, and mcollective has to be restarted on each node for it to sync back up with the broker.


Version-Release number of selected component (if applicable): 1.2


How reproducible: 100%


Steps to Reproduce:
1. Reboot broker
2. Notice you have no nodes
3. Restart mcollective on nodes to have them re-register with the broker

Actual results: Nodes do not re-register with the broker until mcollective is restarted.


Expected results: Nodes re-register with the broker by themselves.


Additional info: Setting the parameter 'registerinterval = 30' in '/etc/mcollective/server.cfg' on every node in the environment makes the nodes re-register themselves as they should. Documenting this in the installation guide will prevent this issue in the future.
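A minimal Python sketch of how an install script might apply this setting idempotently. The helper names `set_cfg_option` and `ensure_registerinterval` are hypothetical (they are not part of mcollective or the OpenShift installer), and the sketch assumes server.cfg uses plain `key = value` lines.

```python
import re
from pathlib import Path


def set_cfg_option(text: str, key: str, value: str) -> str:
    """Return `text` with `key = value` set (hypothetical helper).

    The first uncommented `key = ...` line is rewritten in place;
    if none exists, the setting is appended at the end.
    """
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=.*$", re.MULTILINE)
    line = f"{key} = {value}"
    if pattern.search(text):
        # Use a function so the replacement is inserted literally.
        return pattern.sub(lambda _m: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


def ensure_registerinterval(path: str, seconds: int = 30) -> None:
    """Write `registerinterval = <seconds>` into server.cfg if needed (hypothetical)."""
    cfg = Path(path)
    original = cfg.read_text()
    updated = set_cfg_option(original, "registerinterval", str(seconds))
    if updated != original:
        cfg.write_text(updated)
```

Rewriting an existing line instead of always appending keeps the file free of duplicate keys when the script is run more than once.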

Comment 5 Jim Minter 2013-11-20 12:05:08 UTC

I'm seeing this on 1.2 and on vanilla 2.0beta2, and I think it's a significant issue to resolve, because the mcollective service has to be restarted on every node whenever the broker is rebooted. Can I provide any further information?

Comment 6 Jim Minter 2013-11-20 12:11:46 UTC

Created attachment 826592
mcollective log from node

Attached the mcollective log from the node, annotated with external events.

Comment 7 Jim Minter 2013-11-20 12:13:08 UTC

Created attachment 826593
broker activemq.log

Comment 8 Jim Minter 2013-11-20 12:13:31 UTC

Created attachment 826594
broker wrapper.log

Comment 10 Luke Meyer 2013-11-21 14:00:05 UTC

I captured packet traces with and without the parameter, and it does seem to resolve this particular problem. What seems to happen on reboot (as opposed to simply restarting activemq) is that the ActiveMQ server does not close the connection. I'm not sure why, since it does close it when the service is simply restarted; perhaps it is a timing issue during shutdown. In any case, we also need to handle the "kicked the power cord" and "firewall quietly timed out the connection" scenarios.

Without this parameter, the mcollective server never seems to notice that it is listening on a dead connection. It does not initiate any communication, so the connection stays dead until mcollective is restarted. With the parameter, mcollective periodically tries to send a packet, receives a connection reset, and reconnects as we would expect. So, while this parameter does not appear to be intended for this purpose, it has the beneficial side effect of recovering from improperly dropped ActiveMQ connections.
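The mechanism described above can be illustrated with a toy Python model. This is not mcollective's actual code: `SimulatedConnection` and `simulate` are hypothetical names, and the model assumes that a rebooted broker answers any packet on the stale connection with a reset, while a client that only listens sees neither data nor EOF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulatedConnection:
    """Toy stand-in for the node's TCP connection to the broker (hypothetical)."""
    peer_alive: bool = True

    def send(self, payload: bytes) -> None:
        # A rebooted peer has no state for this connection, so any
        # segment the node sends is answered with a reset.
        if not self.peer_alive:
            raise ConnectionResetError("peer sent RST")

    def recv(self) -> Optional[bytes]:
        # A silently dropped connection delivers neither data nor EOF,
        # so a purely passive listener learns nothing.
        return None


def simulate(register_interval: Optional[int], reboot_at: int = 10,
             ticks: int = 120) -> Optional[int]:
    """Return the tick at which the dead connection is detected, or None."""
    conn = SimulatedConnection()
    for tick in range(1, ticks + 1):
        if tick == reboot_at:
            conn.peer_alive = False  # broker reboots without closing the socket
        conn.recv()  # listening alone never reveals the problem
        if register_interval and tick % register_interval == 0:
            try:
                conn.send(b"registration")
            except ConnectionResetError:
                # This is the point where mcollective would reconnect.
                return tick
    return None
```

With `register_interval=None` the model never detects the dead connection; with an interval of 30 it detects it at the first registration send after the reboot.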

Accordingly, I'll put it back in our install scripts, and it should go back in our instructions. If it is called out at all, we should note that it is there for this side effect; we don't actually care about the registration or use it in any way in the product (cf. http://docs.puppetlabs.com/mcollective/reference/plugins/registration.html).

We may want to file a bug upstream with mcollective to note that it should detect dead connections without this apparently unrelated parameter in place.

Comment 12 Johnny Liu 2013-11-25 12:14:16 UTC

I just tested this against ose-2.0: even without adding 'registerinterval = 30' to the mcollective server config file, after I rebooted activemq, mcollective could still connect to activemq. So it looks like this only applies to ose-1.2.

Comment 13 Luke Meyer 2013-11-25 18:25:08 UTC

It has been observed with ose-2.0 as well. Note that our install scripts now add this parameter again. It will not happen with "service activemq restart", because that closes the connection cleanly; however, it will happen with a host reboot. There may be a timing issue there (i.e. it may not happen every time), since I would expect activemq to close the connection cleanly if given the chance; it may simply depend on timing.

Comment 15 Johnny Liu 2013-12-06 10:27:28 UTC

For 2.0: with 'registerinterval = 30' added to /opt/rh/ruby193/root/etc/mcollective/server.cfg, rebooting the activemq machine causes the mcollective server to re-register automatically. Without it, restarting the activemq service does not affect the mcollective connection, but rebooting the activemq machine breaks it. So 'registerinterval = 30' should be added to the mcollective server.cfg.

For 1.2, 'registerinterval = 30' definitely needs to be added to /etc/mcollective/server.cfg.

Luke's comments are correct.

Note that the mcollective server config file in ose-1.2 is /etc/mcollective/server.cfg, while in ose-2.0 it is /opt/rh/ruby193/root/etc/mcollective/server.cfg.
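Because the file lives in a different place on each release, an install script has to pick the path by version. A minimal Python sketch, using only the two paths given in this report; `SERVER_CFG_BY_RELEASE` and `server_cfg_path` are hypothetical names, and detecting which release is installed is left out.

```python
from typing import Dict

# mcollective server.cfg locations as stated in this bug report.
SERVER_CFG_BY_RELEASE: Dict[str, str] = {
    "1.2": "/etc/mcollective/server.cfg",
    "2.0": "/opt/rh/ruby193/root/etc/mcollective/server.cfg",
}


def server_cfg_path(release: str) -> str:
    """Return the server.cfg path for an OSE release (hypothetical helper)."""
    try:
        return SERVER_CFG_BY_RELEASE[release]
    except KeyError:
        # Fail loudly rather than silently editing the wrong file.
        raise ValueError(f"no known server.cfg location for OSE {release!r}") from None
```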