https://github.com/rabbitmq/rabbitmq-server/issues/581 Likely a problem in all OSP versions. This would explain a lot of weird partition-related hangs and such.
A patch which partially fixes this issue is available in upstream's master branch.
https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
15 seconds looks like too much to me, though.
(In reply to Peter Lemenkov from comment #1)
> A patch which partially fixes this issue is available in upstream's master
> branch.
>
> https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
>
> 15 seconds looks like too much to me, though.

Perhaps we can suggest making that 15000 tuneable?
(In reply to Fabio Massimo Di Nitto from comment #2)
> (In reply to Peter Lemenkov from comment #1)
> > A patch which partially fixes this issue is available in upstream's master
> > branch.
> >
> > https://github.com/rabbitmq/rabbitmq-server/commit/a540dcb
> >
> > 15 seconds looks like too much to me, though.
>
> Perhaps we can suggest making that 15000 tuneable?

Done.
https://github.com/lemenkov/rabbitmq-server/commit/940d335

I'm cherry-picking both patches now.
Current status: the four recent patches made the situation much better. Still, this is just a workaround: when a queue hangs, we will know it after 15 seconds (instead of waiting forever). The underlying issue is still there, and we certainly need more patches to address it fully.
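For readers unfamiliar with the pattern, here is a rough Python illustration of the workaround described above: a call that would otherwise block forever is given a bounded, tunable wait. This is not the actual patch (that is Erlang code in rabbitmq-server); the names `call_with_timeout` and `DEFAULT_TIMEOUT_S` are hypothetical and exist only for this sketch.

```python
import queue
import threading

# Hypothetical default, mirroring the 15000 ms value discussed in this bug.
DEFAULT_TIMEOUT_S = 15.0


def call_with_timeout(fn, timeout_s=DEFAULT_TIMEOUT_S):
    """Run fn() in a worker thread and wait at most timeout_s seconds.

    Returns ("ok", result), ("error", exception) or ("timeout", None).
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", fn()))
        except Exception as exc:  # report failures instead of losing them
            results.put(("error", exc))

    # Daemon thread: a call that never returns does not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return results.get(timeout=timeout_s)
    except queue.Empty:
        # The call is still stuck; the caller merely stops waiting for it.
        return ("timeout", None)
```

As in the comment above, the timeout only makes the hang visible; the stuck call itself is not repaired.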
Moving to MODIFIED since a build exists
John,

After following Gsantomaggio's comment in
https://github.com/rabbitmq/rabbitmq-server/issues/581
I ended up with a listing problem, as it hung for some time:

<rabbit.8839.0> guest 0 0
<rabbit.8851.0> guest 0 0
<rabbit.9382.0> guest 0 0
<rabbit.9393.0> guest 0 0
...done.
blocked
unblocked
START
Listing channels ...

Attaching the script used for reproducing.

More info:
[root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
rabbitmq-server-3.3.5-19.el7ost.noarch
Created attachment 1139510 [details] stress_python.py
(In reply to Asaf Hirshberg from comment #12)
> John,
>
> After following Gsantomaggio's comment in
> https://github.com/rabbitmq/rabbitmq-server/issues/581
> I ended up with a listing problem, as it hung for some time:
>
> <rabbit.8839.0> guest 0 0
> <rabbit.8851.0> guest 0 0
> <rabbit.9382.0> guest 0 0
> <rabbit.9393.0> guest 0 0
> ...done.
> blocked
> unblocked
> START
> Listing channels ...
>
> Attaching the script used for reproducing.
>
> More info:
> [root@overcloud-controller-2 ~]# rpm -qa |grep rabbitmq-server-
> rabbitmq-server-3.3.5-19.el7ost.noarch

This may actually be a slightly different hang, the same as the one from https://bugzilla.redhat.com/show_bug.cgi?id=1319334#c13, but I'll try to reproduce it myself. Even better if you can get it to hang and then capture the output of:

rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
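The capture requested above could be scripted roughly as follows, so that it can run right after the reproducer hangs. This is a minimal sketch: the `rabbitmqctl eval` command is the one quoted above, while the helper name `capture_diagnostics` and the 120-second limit are assumptions made for illustration.

```python
import subprocess

# Command quoted from the comment above (shell quoting is unnecessary in list form).
MAYBE_STUCK_CMD = ["rabbitmqctl", "eval", "rabbit_diagnostics:maybe_stuck()."]


def capture_diagnostics(cmd=MAYBE_STUCK_CMD, timeout_s=120.0):
    """Run cmd and return its combined stdout/stderr as text.

    Returns None if the command does not finish within timeout_s seconds
    (an assumed limit, so the capture itself cannot hang forever).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout
```

The returned text can be written to a file and attached to the bug. If `rabbitmqctl` is not on the PATH, `subprocess.run` raises `FileNotFoundError`, which this sketch deliberately does not hide.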
Created attachment 1141733 [details] [root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
John,

I created a new attachment with the output of:
[root@overcloud-controller-2 ~]# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
Yep, that looks like bug 1319334: there are 16 pids stuck in rabbit_amqqueue:with/3. So for the purposes of this bug, you can ignore it getting stuck that way.
Regarding comment #17: ignoring bug 1319334, can we verify this bug, since we haven't seen the problem mentioned in the initial report?
(In reply to Udi Shkalim from comment #20)
> Regarding comment #17: ignoring bug 1319334, can we verify this bug, since we
> haven't seen the problem mentioned in the initial report?

Yep, that sounds good to me.
Verified based on comment #20 and by following Gsantomaggio's comment in https://github.com/rabbitmq/rabbitmq-server/issues/581
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0636.html