Bug 819689 - mcollective client calls can hang if /etc/qpid directory perms aren't right
Status: CLOSED WONTFIX
Product: OpenShift Origin
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified Unspecified
Priority: medium Severity: low
Assigned To: William Henry
QA Contact: libra bugs
Keywords: Triaged
Depends On: 825075
Blocks:
Reported: 2012-05-07 21:25 EDT by Thomas Wiest
Modified: 2015-05-14 21:53 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-29 14:59:39 EDT
Type: Bug


Description Thomas Wiest 2012-05-07 21:25:26 EDT
Description of problem:
We ran into this today in STG. Basically, if the /etc/qpid dir perms aren't right, then mcollective client calls will hang indefinitely.

It's also quite hard to figure this out since the hang makes it so nothing is logged.

This directly affects the broker. The broker is basically useless when this happens.


Version-Release number of selected component (if applicable):
qpid-cpp-client-ssl-0.12-6.el6.x86_64
qpid-cpp-client-0.12-6.el6.x86_64
mcollective-common-1.1.2-4.2.el6_0.noarch
qpid-qmf-0.12-6.el6.x86_64
mcollective-client-1.1.2-4.2.el6_0.noarch
ruby-qpid-qmf-0.12-6.el6.x86_64


How reproducible:
Very


Steps to Reproduce:
1. Create a devenv
2. Run:  sudo -u libra_passenger mc-ping
3. Notice that it works correctly
4. Run:  chmod o-rx /etc/qpid
5. Run:  sudo -u libra_passenger mc-ping
6. Notice that this time the command hangs indefinitely
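
While reproducing, the hang in step 6 can be bounded so that it shows up as a failure instead of blocking the shell. A minimal sketch, assuming GNU coreutils `timeout` is installed; the `run_bounded` helper name and the 30-second limit are illustrative and not part of the original report:

```shell
# Hypothetical helper: run a command, but give up after a time limit.
# Relies on coreutils `timeout`, which exits with status 124 when the limit is hit.
run_bounded() {
    limit="$1"; shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Usage during reproduction (not executed here):
#   run_bounded 30 sudo -u libra_passenger mc-ping
```

A non-zero status from `run_bounded` then distinguishes a hung call (124) from a call that failed on its own.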


Actual results:
Command hangs forever.


Expected results:
This should be an error condition and failure message.
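
One way to get an error instead of a hang is to check the directory before the client connects. A minimal sketch, assuming the check runs as the same user as the client; `check_dir_access` is a hypothetical helper, not something mcollective or qpid provides:

```shell
# Hypothetical pre-flight check: fail fast if the invoking user cannot
# read and traverse the given directory.
check_dir_access() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "error: $dir does not exist or is not a directory" >&2
        return 2
    fi
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "error: $dir is not readable/searchable by $(id -un)" >&2
        return 1
    fi
    return 0
}

# Usage (not executed here); the equivalent one-off check as the client user:
#   check_dir_access /etc/qpid || exit 1
#   sudo -u libra_passenger sh -c 'test -r /etc/qpid && test -x /etc/qpid'
```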
Comment 1 William Henry 2012-05-15 15:15:12 EDT
Can we clarify how this affects the broker? Besides the client (mc-ping) command hanging, what other symptoms are we seeing that demonstrate that the broker itself is hung? It's not clear that the broker is actually "basically useless when this happens."

Can you run the mc-ping command from a different machine with the correct permissions while the original mc-ping command is hung?
Comment 2 William Henry 2012-05-15 15:19:55 EDT
Ok so from Mike: broker != qpidd broker but instead the openshift "broker". 

That makes more sense. So this is essentially the mcollective driver.

The next step is to see whether it's in the qpid layers or the mcollective driver layer.
Comment 3 William Henry 2012-05-17 11:17:05 EDT
I've reproduced the bug. We noticed in the log that the client isn't really hanging but is attempting reconnection indefinitely, despite reconnect-timeout = 5. I added reconnect-limit=5 to the args for the connection in the hope that it would override the alleged Ruby 1.8 timeout issue. However, it had no effect.

Working with Qpid team for more ideas.
Comment 4 William Henry 2012-05-17 11:46:10 EDT
This might be related to the alleged Ruby 1.8 vs. Ruby 1.9 timeout.rb issues.

Output snippet from the client-side mcollective log:

D, [2012-05-16T18:48:17.828268 #28087] DEBUG -- : amqp.rb:69:in `connect' Connecting to localhost.localdomain:5671,  {transport:ssl, reconnect:true, reconnect-timeout:5, reconnect-limit:5, heartbeat:1}

You can see that the reconnect timeout and limit are set. However, this message is logged continuously and indefinitely.


(I added the reconnect-limit to see if it would override the timeout. I also tried this with only the limit and removed the timeout. It had no effect.)
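
For reference, the log line shows reconnect-limit:5, so the expectation is that the client gives up after five attempts rather than retrying forever. A minimal sketch of that expected behaviour as a shell loop; this is an illustration only, not qpid's or mcollective's implementation, and `retry_with_limit` is a hypothetical name:

```shell
# Hypothetical sketch of a bounded reconnect loop; not qpid's implementation.
# Tries a command up to `limit` times, sleeping `interval` seconds between tries.
retry_with_limit() {
    limit="$1"; interval="$2"; shift 2
    attempt=1
    while [ "$attempt" -le "$limit" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $limit failed" >&2
        if [ "$attempt" -lt "$limit" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    echo "giving up after $limit attempts" >&2
    return 1
}

# Usage (not executed here): five tries, five seconds apart.
#   retry_with_limit 5 5 sudo -u libra_passenger mc-ping
```

The observed behaviour differs from this in that the connect message keeps repeating with no final "giving up" condition.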
Comment 5 William Henry 2012-05-24 23:13:00 EDT
I've created a BZ for MRG Messaging:
https://bugzilla.redhat.com/show_bug.cgi?id=825075
Comment 6 Mike McGrath 2012-08-29 14:59:39 EDT
A workaround for this has been found and we continue to investigate our messaging setup.

(just doing bug cleanup)
