Bug 1387474
Summary: | Queue master process terminates in rabbit_mirror_queue_master:stop_all_slaves on promotion |
---|---|
Product: | Red Hat OpenStack |
Component: | rabbitmq-server |
Version: | 5.0 (RHEL 7) |
Target Release: | 5.0 (RHEL 7) |
Target Milestone: | async |
Keywords: | ZStream |
Status: | CLOSED ERRATA |
Severity: | urgent |
Priority: | urgent |
Hardware: | Unspecified |
OS: | Unspecified |
Type: | Bug |
Reporter: | James Biao <jbiao> |
Assignee: | Peter Lemenkov <plemenko> |
QA Contact: | Asaf Hirshberg <ahirshbe> |
CC: | apevec, dmaley, fdinitto, jbiao, jeckersb, jthomas, lhh, plemenko, srevivo |
Fixed In Version: | rabbitmq-server-3.3.5-25.el7ost |
Last Closed: | 2017-01-19 13:33:13 UTC |
Clones: | 1391186 1391188 1391190 |
Bug Depends On: | 1319334 |
Description
James Biao
2016-10-21 02:02:20 UTC
Please provide all sosreports from the environment. Before backporting any fix, we need to make sure that nothing else is triggering this problem.

Fix applied in rabbitmq-server-3.3.5-23.el7ost. As soon as we verify that this is indeed caused by GH issue no. 812, we'll propose this build as a fix.

Interestingly, I've found another issue after inspecting the SOS logs, namely this one: https://github.com/rabbitmq/rabbitmq-server/issues/255. The following log message points to that issue:

    =================================
    =SUPERVISOR REPORT==== 18-Oct-2016::17:12:52 ===
         Supervisor: {local,rabbit_amqqueue_sup}
         Context:    child_terminated
         Reason:     {{case_clause,{empty,{[],[]}}},
                      [{rabbit_queue_consumers,subtract_acks,4,
                           [{file,"src/rabbit_queue_consumers.erl"},{line,274}]},
                       {rabbit_queue_consumers,subtract_acks,3,
                           [{file,"src/rabbit_queue_consumers.erl"},{line,252}]},
                       {rabbit_amqqueue_process,subtract_acks,4,
                           [{file,"src/rabbit_amqqueue_process.erl"},{line,660}]},
                       {rabbit_amqqueue_process,handle_cast,2,
                           [{file,"src/rabbit_amqqueue_process.erl"},{line,1082}]},
                       {gen_server2,handle_msg,2,
                           [{file,"src/gen_server2.erl"},{line,1022}]},
                       {proc_lib,init_p_do_apply,3,
                           [{file,"proc_lib.erl"},{line,239}]}]}
         Offender:   [{pid,<0.26511.0>},
                      {name,rabbit_amqqueue},
                      {mfargs,{rabbit_amqqueue_process,start_link,undefined}},
                      {restart_type,temporary},
                      {shutdown,4294967295},
                      {child_type,worker}]
    =================================

Patch available.

Customer has a further question. It was observed that the RabbitMQ cluster was completely unresponsive several hours after rabbit node-002 started to log "rabbit_mirror_queue_master:stop_all_slaves". Can this issue cause the cluster to stop responding to messages?

Regarding the second bug, "Unknown acks (e.g. after network partition heals) should be handled gracefully": is there any plan to backport this?

Provided sosreports from 2 sites that were having the issue. The second site went into a network partition and saw the same error.
(In reply to James Biao from comment #6)
> Customer has a further question. It was observed that the RabbitMQ cluster
> was completely unresponsive several hours after rabbit node-002 started to log
> "rabbit_mirror_queue_master:stop_all_slaves". Can this issue cause the
> cluster to stop responding to messages?
>
> Regarding the second bug, "Unknown acks (e.g. after network partition heals)
> should be handled gracefully": is there any plan to backport this?
>
> Provided sosreports from 2 sites that were having the issue. The second site
> went into a network partition and saw the same error.

James, if the customer is observing two bugs, please file two separate bugzillas. Here we will track only this specific one.

(In reply to Fabio Massimo Di Nitto from comment #7)

Sure. I'll open a new bz. Let's focus on the original issue in this one.

OK, this issue (see comment 1) is addressed in rabbitmq-server-3.3.5-23.el7ost. For details regarding the other issue mentioned here (GH#255), see bug 1387988.

(In reply to James Biao from comment #6)
> Customer has a further question. It was observed that the RabbitMQ cluster
> was completely unresponsive several hours after rabbit node-002 started to log
> "rabbit_mirror_queue_master:stop_all_slaves". Can this issue cause the
> cluster to stop responding to messages?

Short answer is yes. Please try this package: rabbitmq-server-3.3.5-25.el7ost. It should fully address this issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0167.html