Bug 819689

Summary:	mcollective client calls can hang if /etc/qpid directory perms aren't right
Product:	OKD	Reporter:	Thomas Wiest <twiest>
Component:	Pod	Assignee:	William Henry <whenry>
Status:	CLOSED WONTFIX	QA Contact:	libra bugs <libra-bugs>
Severity:	low	Docs Contact:
Priority:	medium
Version:	2.x	CC:	bmeng, mmcgrath, rmillner, xtian
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-08-29 18:59:39 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	825075
Bug Blocks:

Description Thomas Wiest 2012-05-08 01:25:26 UTC

Description of problem:
We ran into this today in STG. If the /etc/qpid directory permissions aren't right (for example, the directory is not readable and searchable by the user running the client), mcollective client calls hang indefinitely.

This is also hard to diagnose, because the hang means nothing is logged.

This directly affects the broker, which is essentially unusable while this happens.


Version-Release number of selected component (if applicable):
qpid-cpp-client-ssl-0.12-6.el6.x86_64
qpid-cpp-client-0.12-6.el6.x86_64
mcollective-common-1.1.2-4.2.el6_0.noarch
qpid-qmf-0.12-6.el6.x86_64
mcollective-client-1.1.2-4.2.el6_0.noarch
ruby-qpid-qmf-0.12-6.el6.x86_64


How reproducible:
Very


Steps to Reproduce:
1. Create a devenv
2. Run:  sudo -u libra_passenger mc-ping
3. Notice that it works correctly
4. Run:  chmod o-rx /etc/qpid
5. Run:  sudo -u libra_passenger mc-ping
6. Notice that this time the command hangs indefinitely
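One cheap guard against this failure mode is to check, before starting the client, that the calling user can list and traverse /etc/qpid, which is exactly what `chmod o-rx` in step 4 removes. The helper below is a hypothetical sketch, not part of mcollective or qpid. It only inspects the directory permissions and does not address the underlying hang.

```ruby
# Hypothetical pre-flight check (not part of mcollective or qpid).
# Returns nil when the current user can list (r) and traverse (x) the
# qpid configuration directory, or a message describing the problem.
def qpid_conf_dir_problem(dir = "/etc/qpid")
  unless File.directory?(dir)
    return "#{dir} does not exist or is not a directory"
  end
  # File.readable? / File.executable? check the effective uid, which is
  # the target user when the client is started via `sudo -u`.
  unless File.readable?(dir)
    return "#{dir} is not readable by uid #{Process.euid}"
  end
  unless File.executable?(dir)
    return "#{dir} is not searchable by uid #{Process.euid}"
  end
  nil # no problem found
end
```

A wrapper script could print the returned message and exit non-zero instead of calling `mc-ping`. Note that for root these access checks pass regardless of the mode bits, so the check is only meaningful for unprivileged users such as libra_passenger.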


Actual results:
Command hangs forever.


Expected results:
The command should fail with an error and print a failure message.
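Until the client itself fails cleanly, a caller can at least turn an indefinite hang into an error by running the client in a child process with a deadline. The sketch below is a hypothetical wrapper, not mcollective code, and assumes Ruby 2.1 or later (`Process.spawn`, `CLOCK_MONOTONIC`). Running the command as a separate process means the deadline does not depend on in-process `Timeout` behaviour inside the Ruby client.

```ruby
# Hypothetical wrapper (not part of mcollective): run a client command in
# a child process and report :timeout instead of waiting forever.
# Assumes Ruby 2.1+ (Process.spawn, Process::CLOCK_MONOTONIC).
def run_with_deadline(argv, deadline, poll_interval = 0.1)
  pid = Process.spawn(*argv)
  stop_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline
  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    reaped = Process.wait2(pid, Process::WNOHANG)
    if reaped
      status = reaped[1]
      return status.success? ? :ok : :failed
    end
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= stop_at
    sleep poll_interval
  end
  # Still running at the deadline: kill it (KILL cannot be ignored)
  # and reap it so no zombie process is left behind.
  Process.kill("KILL", pid)
  Process.wait(pid)
  :timeout
end
```

For the reproduction steps this would be something like `run_with_deadline(["sudo", "-u", "libra_passenger", "mc-ping"], 30)`, where the 30-second deadline is an arbitrary choice.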

Comment 1 William Henry 2012-05-15 19:15:12 UTC

Can we clarify how this affects the broker? Besides the client (mc-ping) command hanging, what other symptoms show that the broker itself is hung? It's not clear that the broker is actually "basically useless when this happens."

Can you run the mc-ping command from a different machine with the correct permissions while the original mc-ping command is hung?

Comment 2 William Henry 2012-05-15 19:19:55 UTC

OK, per Mike: "broker" here means the OpenShift broker, not the qpidd broker.

That makes more sense. So this is essentially the mcollective driver.

Next step: determine whether the problem is in the qpid layers or in the mcollective driver layer.

Comment 3 William Henry 2012-05-17 15:17:05 UTC

I've reproduced the bug. We noticed in the log that the client isn't really hanging; it is attempting to reconnect indefinitely despite reconnect-timeout = 5. I added reconnect-limit=5 to the connection arguments in the hope that it would work around the alleged Ruby 1.8 timeout issue. However, it had no effect.

Working with Qpid team for more ideas.

Comment 4 William Henry 2012-05-17 15:46:10 UTC

This might be related to the alleged Ruby 1.8 vs. Ruby 1.9 timeout.rb issues.

Output snippet from the client-side mcollective log:

D, [2012-05-16T18:48:17.828268 #28087] DEBUG -- : amqp.rb:69:in `connect' Connecting to localhost.localdomain:5671,  {transport:ssl, reconnect:true, reconnect-timeout:5, reconnect-limit:5, heartbeat:1}

You can see that both the reconnect timeout and the reconnect limit are set. However, this message is logged continuously and indefinitely.


(I added reconnect-limit to see if it would override the timeout. I also tried with only the limit, removing the timeout. It had no effect.)
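For comparison, the sketch below illustrates the bounded behaviour one would expect from `reconnect-timeout:5` and `reconnect-limit:5` in the log line: stop after a fixed number of attempts or a fixed time, whichever comes first, and raise an error instead of retrying forever. It is a hypothetical illustration, not qpid's reconnect implementation. The block stands in for the real connection attempt, `ReconnectGaveUp` is an invented error class, and required keyword arguments assume Ruby 2.1 or later.

```ruby
# Hypothetical illustration (not qpid's implementation): retry a
# connection attempt, giving up after `limit` attempts or `timeout`
# seconds, whichever comes first.
class ReconnectGaveUp < StandardError; end

def connect_with_limits(limit:, timeout:, interval: 1.0)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  attempts = 0
  begin
    attempts += 1
    yield # the real connection attempt would go here
  rescue StandardError => e
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    # Stop if the attempt limit is reached or the next wait would
    # overrun the overall timeout.
    if attempts >= limit || elapsed + interval > timeout
      raise ReconnectGaveUp, "gave up after #{attempts} attempt(s): #{e.message}"
    end
    sleep interval
    retry
  end
end
```

With `limit: 5, timeout: 5`, a connection that always fails raises `ReconnectGaveUp` after at most five attempts or roughly five seconds, rather than logging the connect message forever.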

Comment 5 William Henry 2012-05-25 03:13:00 UTC

I've created a BZ for MRG Messaging:
https://bugzilla.redhat.com/show_bug.cgi?id=825075

Comment 6 Mike McGrath 2012-08-29 18:59:39 UTC

A workaround for this has been found and we continue to investigate our messaging setup.

(just doing bug cleanup)