Bug 1303745 - Blocked channels and queues using HA
Summary: Blocked channels and queues using HA
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Peter Lemenkov
QA Contact: Asaf Hirshberg
URL:
Whiteboard:
Depends On:
Blocks: 1303746 1303747 1303748
Reported: 2016-02-01 20:05 UTC by John Eckersberg
Modified: 2019-10-10 11:04 UTC (History)

Fixed In Version: rabbitmq-server-3.3.5-16.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1303746 1303747 1303748
Environment:
Last Closed: 2016-04-15 13:46:48 UTC
Target Upstream Version:
Embargoed:


Attachments
stress_python.py (1.48 KB, text/x-python)
2016-03-23 11:14 UTC, Asaf Hirshberg
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().' (63.42 KB, text/plain)
2016-03-30 12:29 UTC, Asaf Hirshberg


Links
System: Red Hat Product Errata
ID: RHBA-2016:0636
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenStack Platform 8 release candidate Bugfix Advisory
Last Updated: 2016-04-15 17:45:07 UTC

Description John Eckersberg 2016-02-01 20:05:52 UTC
https://github.com/rabbitmq/rabbitmq-server/issues/581

Likely a problem in all OSP versions.  This would explain a lot of weird partition-related hangs and such.

Comment 1 Peter Lemenkov 2016-02-01 21:43:40 UTC
A patch that partially fixes this issue is available in upstream's master branch.

https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb

15 seconds looks like too much to me, though.

Comment 2 Fabio Massimo Di Nitto 2016-02-02 06:11:10 UTC
(In reply to Peter Lemenkov from comment #1)
> A patch that partially fixes this issue is available in upstream's master
> branch.
> 
> https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
> 
> 15 seconds looks like too much to me, though.

Perhaps we can suggest making that 15000 tunable?

Comment 3 Peter Lemenkov 2016-02-02 14:24:34 UTC
(In reply to Fabio Massimo Di Nitto from comment #2)
> (In reply to Peter Lemenkov from comment #1)
> > A patch that partially fixes this issue is available in upstream's master
> > branch.
> > 
> > https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
> > 
> > 15 seconds looks like too much to me, though.
> 
> Perhaps we can suggest making that 15000 tunable?

Done.

https://github.com/lemenkov/rabbitmq-server/commit/940d335

I'm cherry-picking both patches now.
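
For readers who want the general shape of "making that 15000 tunable", here is a minimal Python sketch. It is not the code from the linked commits, which are against the RabbitMQ Erlang sources. The variable name `QUEUE_OP_TIMEOUT_MS` and the function `timeout_ms` are hypothetical and exist only for this illustration; the default of 15000 ms is the value discussed above.

```python
import os

# Default taken from the discussion above: 15000 milliseconds.
DEFAULT_TIMEOUT_MS = 15000


def timeout_ms(env=None, var="QUEUE_OP_TIMEOUT_MS"):
    """Return the timeout in milliseconds, falling back to the default.

    QUEUE_OP_TIMEOUT_MS is a hypothetical setting name used only in this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None:
        return DEFAULT_TIMEOUT_MS
    try:
        value = int(raw)
    except ValueError:
        # Unparseable input: keep the safe default instead of failing.
        return DEFAULT_TIMEOUT_MS
    # Reject zero and negative values rather than silently disabling the timeout.
    return value if value > 0 else DEFAULT_TIMEOUT_MS
```

Falling back to the default on bad input is a design choice for the sketch: a mistyped setting should not bring back the "wait forever" behaviour.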

Comment 4 Peter Lemenkov 2016-02-03 14:46:47 UTC
Current status.

These four recent patches made the situation much better. Still, it's just a workaround: when a queue hangs, we will know after 15 seconds (instead of waiting forever). The issue is still there, and we certainly need more patches to address it fully.
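
The difference between "waiting forever" and "knowing after 15 seconds" can be illustrated with a small Python sketch. This is only an analogy for the workaround described above, not the actual patch; `call_with_timeout` is a hypothetical helper name. As with the workaround, the stuck operation itself is not repaired: the caller just stops waiting for it.

```python
import queue
import threading


def call_with_timeout(func, timeout_s):
    """Run func() in a worker thread and wait at most timeout_s seconds.

    Returns ("ok", result), ("error", exception) or ("timeout", None).
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", func()))
        except Exception as exc:  # report failures instead of losing them
            results.put(("error", exc))

    # Daemon thread: a worker that never returns will not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return results.get(timeout=timeout_s)
    except queue.Empty:
        # The call is still stuck; we only stopped waiting for it.
        return ("timeout", None)
```

A caller would use `call_with_timeout(some_blocking_call, 15.0)` and treat a `"timeout"` status as a detected hang.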

Comment 5 John Eckersberg 2016-02-09 22:11:35 UTC
Moving to MODIFIED since a build exists

Comment 12 Asaf Hirshberg 2016-03-23 11:13:41 UTC
John,

After following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581

I ended up with a listing problem, as it hung for some time:

<rabbit.8839.0>	guest	0	0
<rabbit.8851.0>	guest	0	0
<rabbit.9382.0>	guest	0	0
<rabbit.9393.0>	guest	0	0
...done.
blocked
unblocked
START
Listing channels ...

Attaching the script used for reproducing. 

More info:
[root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
rabbitmq-server-3.3.5-19.el7ost.noarch

Comment 13 Asaf Hirshberg 2016-03-23 11:14:26 UTC
Created attachment 1139510 [details]
stress_python.py

Comment 14 John Eckersberg 2016-03-24 19:56:12 UTC
(In reply to Asaf Hirshberg from comment #12)
> John,
> 
> After following Gsantomaggio's comment in
> https://github.com/rabbitmq/rabbitmq-server/issues/581
> 
> I ended up with a listing problem, as it hung for some time:
> 
> <rabbit.8839.0>	guest	0	0
> <rabbit.8851.0>	guest	0	0
> <rabbit.9382.0>	guest	0	0
> <rabbit.9393.0>	guest	0	0
> ...done.
> blocked
> unblocked
> START
> Listing channels ...
> 
> Attaching the script used for reproducing. 
> 
> More info:
> [root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
> rabbitmq-server-3.3.5-19.el7ost.noarch

This may actually be a slightly different hang, the same one as in https://bugzilla.redhat.com/show_bug.cgi?id=1319334#c13, but I'll try to reproduce it myself.  Even better if you can get it to hang and then capture the output of:

rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
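
A minimal Python sketch for capturing that output to a file suitable for attaching to the bug. The command is the one quoted above; the helper name `capture`, the output file name, and the 120-second limit are assumptions made for this example. The time limit is there because `rabbitmqctl` itself may hang on an affected node.

```python
import subprocess

# The diagnostic command quoted above, as an argument list (no shell quoting needed).
MAYBE_STUCK = ["rabbitmqctl", "eval", "rabbit_diagnostics:maybe_stuck()."]


def capture(cmd, out_path, timeout_s=120):
    """Run cmd and write its combined stdout/stderr to out_path.

    Returns the exit code, or None if the command did not finish in time.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # Keep whatever was printed before the command was killed.
        with open(out_path, "wb") as fh:
            fh.write(exc.output or b"")
        return None
    with open(out_path, "wb") as fh:
        fh.write(proc.stdout)
    return proc.returncode
```

On a controller node this would be called as `capture(MAYBE_STUCK, "maybe_stuck.txt")`, run as a user allowed to invoke `rabbitmqctl`.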

Comment 15 Asaf Hirshberg 2016-03-30 12:29:23 UTC
Created attachment 1141733 [details]
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

Comment 16 Asaf Hirshberg 2016-03-30 12:30:50 UTC
John,
I created a new attachment with the output of:
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

Comment 17 John Eckersberg 2016-03-30 12:41:02 UTC
Yep, that looks like bug 1319334: there are 16 pids stuck in rabbit_amqqueue:with/3.  So for the purposes of this bug, you can ignore it getting stuck that way.
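
A count like this can be taken from the attached output with a few lines of Python. The sketch assumes that the diagnostic output prints Erlang stack frames as `{Module,Function,Arity,...}` tuples and that each stuck process shows the frame once; neither assumption has been checked against the attachment. `count_frames` is a hypothetical helper name.

```python
import re

# Matches an Erlang stack frame tuple for rabbit_amqqueue:with/3,
# tolerating optional whitespace around the commas.
FRAME = re.compile(r"\{\s*rabbit_amqqueue\s*,\s*with\s*,\s*3\s*,")


def count_frames(text, pattern=FRAME):
    """Count how many times the stack frame appears in the given text."""
    return len(pattern.findall(text))
```

With the attachment saved locally, `count_frames(open("maybe_stuck.txt").read())` gives the number of matching frames.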

Comment 20 Udi Shkalim 2016-04-10 10:16:40 UTC
Regarding comment #17: ignoring bug 1319334, can we verify this bug, since we haven't seen the problem mentioned in the initial report?

Comment 21 John Eckersberg 2016-04-11 20:38:50 UTC
(In reply to Udi Shkalim from comment #20)
> Regarding comment #17: ignoring bug 1319334, can we verify this bug, since we
> haven't seen the problem mentioned in the initial report?

Yep that sounds good to me.

Comment 22 Asaf Hirshberg 2016-04-12 05:22:37 UTC
Verified based on comment #20 and by following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581

Comment 24 errata-xmlrpc 2016-04-15 13:46:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0636.html

