819689 – mcollective client calls can hang if /etc/qpid directory perms aren't right


Summary: mcollective client calls can hang if /etc/qpid directory perms aren't right

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OKD
Classification:	Red Hat
Component:	Pod
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	William Henry
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	825075
Blocks:

Reported:	2012-05-08 01:25 UTC by Thomas Wiest
Modified:	2015-05-15 01:53 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-08-29 18:59:39 UTC
Target Upstream Version:
Embargoed:


Description Thomas Wiest 2012-05-08 01:25:26 UTC

Description of problem:
We ran into this today in STG. Basically, if the permissions on /etc/qpid aren't right (for example, other users lose read/execute access, as in step 4 below), then mcollective client calls will hang indefinitely.

It's also quite hard to diagnose, since nothing is logged while the call hangs.

This directly affects the broker. The broker is basically useless when this happens.


Version-Release number of selected component (if applicable):
qpid-cpp-client-ssl-0.12-6.el6.x86_64
qpid-cpp-client-0.12-6.el6.x86_64
mcollective-common-1.1.2-4.2.el6_0.noarch
qpid-qmf-0.12-6.el6.x86_64
mcollective-client-1.1.2-4.2.el6_0.noarch
ruby-qpid-qmf-0.12-6.el6.x86_64


How reproducible:
Very


Steps to Reproduce:
1. Create a devenv
2. Run:  sudo -u libra_passenger mc-ping
3. Notice that it works correctly
4. Run:  chmod o-rx /etc/qpid
5. Run:  sudo -u libra_passenger mc-ping
6. Notice that this time the command hangs indefinitely
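Because the failure mode is a hang rather than an error, a scripted check of steps 2 and 5 needs a deadline. The following is a minimal sketch, assuming only that a run still going after some fixed time counts as "hung"; the helper name `run_with_deadline` is hypothetical and not part of mcollective.

```python
import subprocess

def run_with_deadline(cmd, timeout_s):
    """Run cmd and classify the outcome as "ok", "error", or "hang".

    "hang" means the process was still running after timeout_s seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the direct child.
        return "hang"
    return "ok" if result.returncode == 0 else "error"
```

For example, `run_with_deadline(["sudo", "-u", "libra_passenger", "mc-ping"], 30)` would be expected to return "ok" at step 3 and "hang" at step 6. The 30-second deadline is arbitrary. On timeout only the direct child (sudo) is killed, so a stray mc-ping process may need separate cleanup.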


Actual results:
Command hangs forever.


Expected results:
This should be treated as an error condition and produce a failure message.
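One way a client wrapper could turn this into an immediate, explicit error is to check the directory before connecting at all. This is a minimal sketch, assuming the client only needs to read files under /etc/qpid (e.g. SSL configuration, which the transport:ssl option in the log below suggests). The names `check_config_dir` and `QpidConfigDirError` are hypothetical and do not exist in mcollective or qpid.

```python
import os

class QpidConfigDirError(RuntimeError):
    """Raised when the qpid config directory is unusable (hypothetical)."""

def check_config_dir(path="/etc/qpid"):
    """Fail fast with a message instead of letting the connection hang."""
    if not os.path.isdir(path):
        raise QpidConfigDirError(f"{path} does not exist or is not a directory")
    # Listing a directory needs read (R_OK); opening files inside it needs
    # search/execute (X_OK). "chmod o-rx" removes both for other users.
    # os.access() checks against the real uid, i.e. the sudo -u target user.
    if not os.access(path, os.R_OK | os.X_OK):
        raise QpidConfigDirError(
            f"{path} is not readable/searchable by uid {os.getuid()}; "
            "refusing to connect instead of retrying forever")
```

Note that root passes `os.access` checks regardless of mode bits, so this check only helps for non-root users such as libra_passenger.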

Comment 1 William Henry 2012-05-15 19:15:12 UTC

Can we clarify how this affects the broker? Besides the client (mc-ping) command hanging, what other symptoms are we seeing that demonstrate that the broker itself is hung? It's not clear that the broker is actually "basically useless when this happens."

Can you run the mc-ping command from a different machine with the correct permissions while the original mc-ping command is hung?

Comment 2 William Henry 2012-05-15 19:19:55 UTC

OK, so from Mike: broker != qpidd broker, but instead the OpenShift "broker".

That makes more sense. So this is essentially the mcollective driver.

Next step is to see whether the problem is in the qpid layers or the mcollective driver layer.

Comment 3 William Henry 2012-05-17 15:17:05 UTC

I've reproduced the bug. We noticed in the log that the client isn't really hanging but is attempting reconnection indefinitely despite reconnect-timeout=5. I added reconnect-limit=5 to the connection arguments in the hope that it would override the alleged Ruby 1.8 timeout issue. However, it had no effect.

Working with Qpid team for more ideas.
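For reference, the behavior being expected from these options can be sketched as a bounded retry loop. As we understand the qpid messaging connection options, reconnect-limit caps the number of reconnection attempts and reconnect-timeout caps the total time spent reconnecting, after which the client should give up with an error. This is a generic illustration of those semantics, not qpid's implementation; `connect_with_limits` and `ReconnectGaveUp` are hypothetical names, and `OSError` stands in for whatever a failed connection raises.

```python
import time

class ReconnectGaveUp(Exception):
    """Raised once the retry budget is exhausted (hypothetical name)."""

def connect_with_limits(connect, limit, timeout_s, interval_s=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds.

    Gives up after `limit` reconnection attempts (limit + 1 calls in total)
    or once the next retry would start after `timeout_s` seconds,
    whichever comes first.
    """
    deadline = clock() + timeout_s
    failures = 0
    while True:
        try:
            return connect()
        except OSError as exc:
            failures += 1
            out_of_attempts = failures > limit
            out_of_time = clock() + interval_s > deadline
            if out_of_attempts or out_of_time:
                raise ReconnectGaveUp(
                    f"giving up after {failures} failed attempts") from exc
            sleep(interval_s)
```

With limit=5 and a connect() that always fails, this loop makes six calls and then raises: a bounded failure rather than the endless reconnection seen here. The clock and sleep parameters are injectable only so the loop can be exercised without real waiting.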

Comment 4 William Henry 2012-05-17 15:46:10 UTC

This might be related to the alleged Ruby 1.8 vs. Ruby 1.9 timeout.rb issues.

Output snippet from the client-side mcollective log:

D, [2012-05-16T18:48:17.828268 #28087] DEBUG -- : amqp.rb:69:in `connect' Connecting to localhost.localdomain:5671,  {transport:ssl, reconnect:true, reconnect-timeout:5, reconnect-limit:5, heartbeat:1}

You can see that the reconnect timeout and limit are set. However, this message is logged continuously, consistently, and indefinitely.

(I added reconnect-limit to see if it would override the timeout. I also tried this with only the limit, with the timeout removed. It had no effect.)
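To quantify "logged indefinitely" from a saved log, a small scanner can count the connect lines and compare them with the reconnect-limit they carry. This is a minimal sketch, assuming every connect line has exactly the format quoted above; `count_connect_attempts` is a hypothetical helper, and treating one log line as one connection attempt is itself an assumption.

```python
import re

# Matches the mcollective debug line quoted above, e.g.
# "... amqp.rb:69:in `connect' Connecting to host:5671,  {transport:ssl, ...}"
CONNECT_RE = re.compile(r"in `connect' Connecting to \S+,\s*\{([^}]*)\}")

def count_connect_attempts(lines):
    """Return (attempts, reconnect_limit) found in mcollective log lines.

    reconnect_limit comes from the options of the last matching line,
    or is None if no matching line carries it.
    """
    attempts = 0
    limit = None
    for line in lines:
        m = CONNECT_RE.search(line)
        if not m:
            continue
        attempts += 1
        # Options look like "key:value, key:value, ..."
        opts = dict(part.strip().split(":", 1)
                    for part in m.group(1).split(",") if ":" in part)
        if "reconnect-limit" in opts:
            limit = int(opts["reconnect-limit"])
    return attempts, limit
```

If a single client run produces many more than limit + 1 such lines, the limit is evidently not bounding the retries at whatever layer emits this message.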

Comment 5 William Henry 2012-05-25 03:13:00 UTC

I've created a BZ for MRG Messaging:
https://bugzilla.redhat.com/show_bug.cgi?id=825075

Comment 6 Mike McGrath 2012-08-29 18:59:39 UTC

A workaround for this has been found and we continue to investigate our messaging setup.

(just doing bug cleanup)
