Bug 1303745 - Blocked channels and queues using HA
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server (Show other bugs)
Version: 8.0 (Liberty)
Hardware: Unspecified Unspecified
Priority: high Severity: high
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Peter Lemenkov
QA Contact: Asaf Hirshberg
Keywords: TestOnly, ZStream
Depends On:
Blocks: 1303746 1303747 1303748
Reported: 2016-02-01 15:05 EST by John Eckersberg
Modified: 2016-04-26 20:20 EDT (History)

See Also:
Fixed In Version: rabbitmq-server-3.3.5-16.el7ost
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1303746 1303747 1303748
Environment:
Last Closed: 2016-04-15 09:46:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
stress_python.py (1.48 KB, text/x-python)
2016-03-23 07:14 EDT, Asaf Hirshberg
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().' (63.42 KB, text/plain)
2016-03-30 08:29 EDT, Asaf Hirshberg


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0636 normal SHIPPED_LIVE Red Hat OpenStack Platform 8 release candidate Bugfix Advisory 2016-04-15 13:45:07 EDT

Description John Eckersberg 2016-02-01 15:05:52 EST
https://github.com/rabbitmq/rabbitmq-server/issues/581

Likely a problem in all OSP versions.  This would explain a lot of weird partition-related hangs and such.
Comment 1 Peter Lemenkov 2016-02-01 16:43:40 EST
A patch that partially fixes this issue is available in upstream's master branch.

https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb

15 seconds looks like too much to me, though.
Comment 2 Fabio Massimo Di Nitto 2016-02-02 01:11:10 EST
(In reply to Peter Lemenkov from comment #1)
> A patch that partially fixes this issue is available in upstream's master
> branch.
> 
> https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
> 
> 15 seconds looks like too much to me, though.

Perhaps we can suggest making that 15000 tunable?
Comment 3 Peter Lemenkov 2016-02-02 09:24:34 EST
(In reply to Fabio Massimo Di Nitto from comment #2)
> (In reply to Peter Lemenkov from comment #1)
> > A patch that partially fixes this issue is available in upstream's master
> > branch.
> > 
> > https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
> > 
> > 15 seconds looks like too much to me, though.
> 
> Perhaps we can suggest making that 15000 tunable?

Done.

https://github.com/lemenkov/rabbitmq-server/commit/940d335

I'm cherry-picking both patches now.
Comment 4 Peter Lemenkov 2016-02-03 09:46:47 EST
Current status.

These four recent patches make the situation much better. Still, it's just a workaround: when a queue hangs, we will know after 15 seconds (instead of waiting forever). The underlying issue is still there, and we certainly need more patches to address it fully.
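
To illustrate the workaround described above (a bounded wait with a tunable limit instead of waiting forever), here is a minimal Python sketch. It is not the Erlang patch itself; `call_with_timeout` and `DEFAULT_TIMEOUT_MS` are hypothetical names chosen for this example, and only the 15000 default comes from the discussion above.

```python
import queue
import threading

# Hypothetical name; the value mirrors the 15000 (ms) default discussed above.
DEFAULT_TIMEOUT_MS = 15000


def call_with_timeout(fn, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Run fn() in a worker thread and wait at most timeout_ms for its result.

    Raises TimeoutError if no result arrives in time. The worker itself is
    not cancelled, so a hang is detected but not cured.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put(("ok", fn()))
        except Exception as exc:  # hand the failure back to the caller
            result.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        status, value = result.get(timeout=timeout_ms / 1000.0)
    except queue.Empty:
        raise TimeoutError("no reply within %d ms" % timeout_ms) from None
    if status == "error":
        raise value
    return value
```

As with the patches, the caller learns about the hang after the timeout, but whatever was stuck stays stuck, which is why this only counts as a workaround.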
Comment 5 John Eckersberg 2016-02-09 17:11:35 EST
Moving to MODIFIED since a build exists
Comment 12 Asaf Hirshberg 2016-03-23 07:13:41 EDT
John,

After following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581

I ended up with a listing problem, as it hung for some time:

<rabbit@overcloud-controller-1.1.8839.0>	guest	0	0
<rabbit@overcloud-controller-1.1.8851.0>	guest	0	0
<rabbit@overcloud-controller-2.2.9382.0>	guest	0	0
<rabbit@overcloud-controller-2.2.9393.0>	guest	0	0
...done.
blocked
unblocked
START
Listing channels ...

Attaching the script used for reproducing. 

More info:
[root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
rabbitmq-server-3.3.5-19.el7ost.noarch
Comment 13 Asaf Hirshberg 2016-03-23 07:14 EDT
Created attachment 1139510 [details]
stress_python.py
Comment 14 John Eckersberg 2016-03-24 15:56:12 EDT
(In reply to Asaf Hirshberg from comment #12)
> John,
> 
> After following Gsantomaggio's comment in
> https://github.com/rabbitmq/rabbitmq-server/issues/581
> 
> I ended up with a listing problem, as it hung for some time:
> 
> <rabbit@overcloud-controller-1.1.8839.0>	guest	0	0
> <rabbit@overcloud-controller-1.1.8851.0>	guest	0	0
> <rabbit@overcloud-controller-2.2.9382.0>	guest	0	0
> <rabbit@overcloud-controller-2.2.9393.0>	guest	0	0
> ...done.
> blocked
> unblocked
> START
> Listing channels ...
> 
> Attaching the script used for reproducing. 
> 
> More info:
> [root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
> rabbitmq-server-3.3.5-19.el7ost.noarch

This may actually be a slightly different hang, the same one as in https://bugzilla.redhat.com/show_bug.cgi?id=1319334#c13, but I'll try to reproduce it myself.  Even better if you can get it to hang and then capture the output of:

rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
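
If it helps, the capture can be scripted so the output ends up in a file that can be attached here. This is only a sketch: the function name `capture_output`, the output file name, and the 120-second limit are assumptions for the example; the command is the one given above.

```python
import subprocess

# The command from this comment; run it on the affected node.
MAYBE_STUCK_CMD = ("rabbitmqctl", "eval", "rabbit_diagnostics:maybe_stuck().")


def capture_output(cmd=MAYBE_STUCK_CMD, out_path="maybe_stuck.txt", timeout_s=120):
    """Run cmd, save its combined stdout/stderr to out_path, return the text."""
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
        text = proc.stdout
    except subprocess.TimeoutExpired as exc:
        # Keep whatever was printed before the command itself hung.
        partial = exc.output or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        text = partial + "\n[timed out after %d s]\n" % timeout_s
    with open(out_path, "w") as fh:
        fh.write(text)
    return text
```

Calling `capture_output()` with no arguments on the hung controller would write `maybe_stuck.txt` in the current directory.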
Comment 15 Asaf Hirshberg 2016-03-30 08:29 EDT
Created attachment 1141733 [details]
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
Comment 16 Asaf Hirshberg 2016-03-30 08:30:50 EDT
John,
I created a new attachment with the output of:
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
Comment 17 John Eckersberg 2016-03-30 08:41:02 EDT
Yep, that looks like bug 1319334; there are 16 pids that are stuck in rabbit_amqqueue:with/3.  So for the purposes of this bug, you can ignore it getting stuck that way.
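
For anyone who wants to repeat that count on their own capture, a rough Python sketch follows. It assumes the saved output prints stack frames in the usual Erlang term form `{Module,Function,Arity,...}`; `count_frames` is a hypothetical helper, and it counts matching frames, which is only approximately the number of stuck processes.

```python
import re


def count_frames(text, module, function, arity):
    """Count stack frames of the form {module,function,arity,...} in text."""
    pattern = r"\{\s*%s\s*,\s*%s\s*,\s*%d\s*[,}]" % (
        re.escape(module),
        re.escape(function),
        arity,
    )
    return len(re.findall(pattern, text))
```

Reading the attached output into a string and calling `count_frames(text, "rabbit_amqqueue", "with", 3)` gives the frame count for the function named above.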
Comment 20 Udi Shkalim 2016-04-10 06:16:40 EDT
Regarding comment #17: ignoring bug 1319334, can we verify this bug, since we haven't seen the problem mentioned in the initial report?
Comment 21 John Eckersberg 2016-04-11 16:38:50 EDT
(In reply to Udi Shkalim from comment #20)
> Regarding comment #17: ignoring bug 1319334, can we verify this bug, since we
> haven't seen the problem mentioned in the initial report?

Yep that sounds good to me.
Comment 22 Asaf Hirshberg 2016-04-12 01:22:37 EDT
Verified based on comment #20 and following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581
Comment 24 errata-xmlrpc 2016-04-15 09:46:48 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0636.html
