Bug 1303745
Summary: Blocked channels and queues using HA

| Field | Value |
|---|---|
| Product | Red Hat OpenStack |
| Component | rabbitmq-server |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 8.0 (Liberty) |
| Target Milestone | ga |
| Target Release | 8.0 (Liberty) |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | John Eckersberg <jeckersb> |
| Assignee | Peter Lemenkov <plemenko> |
| QA Contact | Asaf Hirshberg <ahirshbe> |
| CC | apevec, fdinitto, jeckersb, lhh, lnatapov, oblaut, scorcora, ushkalim, yeylon |
| Keywords | TestOnly, ZStream |
| Fixed In Version | rabbitmq-server-3.3.5-16.el7ost |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-04-15 13:46:48 UTC |
| Bug Blocks | 1303746, 1303747, 1303748 |

Description
John Eckersberg
2016-02-01 20:05:52 UTC
Patch which partially fixes this issue is available in upstream's master branch.

https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb

15 seconds looks too much for me though.

(In reply to Peter Lemenkov from comment #1)
> Patch which partially fixes this issue is available in upstream's master
> branch.
>
> https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
>
> 15 seconds looks too much for me though.

Perhaps we can suggest to make that 15000 tuneable?

(In reply to Fabio Massimo Di Nitto from comment #2)
> (In reply to Peter Lemenkov from comment #1)
> > Patch which partially fixes this issue is available in upstream's master
> > branch.
> >
> > https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
> >
> > 15 seconds looks too much for me though.
>
> Perhaps we can suggest to make that 15000 tuneable?

Done.

https://github.com/lemenkov/rabbitmq-server/commit/940d335

I'm cherry-picking both patches now.

Current status. These recent four patches made the situation much better. Still, it's just a workaround - when a queue hangs we will know about it after 15 seconds (instead of waiting forever). The issue is still there and we certainly need more patches to address it fully.

Moving to MODIFIED since a build exists.

John,

After following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581 I ended up with a listing problem, as it hung for some time:

<rabbit.8839.0> guest 0 0
<rabbit.8851.0> guest 0 0
<rabbit.9382.0> guest 0 0
<rabbit.9393.0> guest 0 0
...done.
blocked
unblocked
START
Listing channels ...

Attaching the script used for reproducing.

More info:
[root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
rabbitmq-server-3.3.5-19.el7ost.noarch

Created attachment 1139510 [details]
stress_python.py
(In reply to Asaf Hirshberg from comment #12)
> John,
>
> After following Gsantomaggio's comment in
> https://github.com/rabbitmq/rabbitmq-server/issues/581
>
> I ended up with a listing problem, as it hung for some time:
>
> <rabbit.8839.0> guest 0 0
> <rabbit.8851.0> guest 0 0
> <rabbit.9382.0> guest 0 0
> <rabbit.9393.0> guest 0 0
> ...done.
> blocked
> unblocked
> START
> Listing channels ...
>
> Attaching the script used for reproducing.
>
> More info:
> [root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
> rabbitmq-server-3.3.5-19.el7ost.noarch

This may actually be a slightly different hang, the same one as in https://bugzilla.redhat.com/show_bug.cgi?id=1319334#c13, but I'll try to reproduce it myself.

Even better if you can get it to hang and then capture the output of:

rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

Created attachment 1141733 [details]
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
John,

I created a new attachment with the output of:

[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

Yep, that looks like bug 1319334: there are 16 pids that are stuck in rabbit_amqqueue:with/3. So for the purposes of this bug, you can ignore it getting stuck that way.

Regarding comment #17 - ignoring bug 1319334, can we verify it, since we haven't seen the problem mentioned in the initial report?

(In reply to Udi Shkalim from comment #20)
> Regarding comment #17 - ignoring bug 1319334, can we verify it, since we
> haven't seen the problem mentioned in the initial report?

Yep, that sounds good to me.

Verified based on comment #20 and following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0636.html
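
For anyone re-checking the listing hang from comment #12 on an updated build, here is a minimal sketch of one way to notice a stuck `rabbitmqctl list_channels` without waiting indefinitely. It is not the attached stress_python.py and not part of any patch. The helper names `run_with_timeout` and `check_list_channels` are made up for illustration, and the 15-second default is only an assumption chosen to mirror the timeout discussed above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (hung, output).

    hung is True when the command did not exit within timeout_s seconds;
    in that case the child is killed and any partial output is returned.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        # Depending on the Python version, partial output may arrive as bytes.
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return True, partial
    return False, proc.stdout


def check_list_channels(timeout_s=15.0):
    """Hypothetical wrapper: report whether `rabbitmqctl list_channels` hangs.

    Assumes rabbitmqctl is on PATH and the caller is allowed to run it.
    """
    hung, output = run_with_timeout(["rabbitmqctl", "list_channels"], timeout_s)
    if hung:
        print("list_channels did not finish in %.0fs" % timeout_s, file=sys.stderr)
    return hung, output
```

Calling `check_list_channels()` on a broker node returns `(True, partial_output)` if the listing stalls. At that point the `rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'` command requested earlier in the thread can be run by hand to capture the stuck processes.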