Description of problem:
The rabbitmq server often fails during HA stress tests of OSP9. It seems to be caused by a regression that is already reported upstream and targeted for the upstream 3.6.3 release.

Upstream bug: https://github.com/rabbitmq/rabbitmq-server/issues/812

Version-Release number of selected component (if applicable):
rabbitmq-server-3.6.2-3.el7ost.noarch

How reproducible:
Often

Steps to Reproduce:
1. Non-gracefully reset a controller of the HA OSP9 environment.
2. Repeat step 1 until one of the rabbitmq servers crashes.

Additional info:
** Generic server <0.21044.0> terminating
** Last message in was {maybe_expire,4}
** When Server state == {q, {amqqueue, {resource,<<"/">>,queue, <<"heat-engine-listener_fanout_67a9b88061cb4a7d93cb4381fe86ec7f">>}, false,false,none, [{<<"x-expires">>,signedint,600000}, {<<"x-ha-policy">>,longstr,<<"all">>}], <0.21044.0>, [<24587.4110.0>,<24588.1171.2>], [<24587.4110.0>], ['rabbit@overcloud-controller-0', 'rabbit@overcloud-controller-2'], [{vhost,<<"/">>}, {name,<<"ha-all">>}, {pattern,<<"^(?!amq\\.).*">>}, {'apply-to',<<"all">>}, {definition,[{<<"ha-mode">>,<<"all">>}]}, {priority,0}], [{<24588.2786.2>,<24588.1171.2>}, {<24587.4111.0>,<24587.4110.0>}, {<0.21048.0>,<0.21044.0>}], [],live}, none,false,rabbit_mirror_queue_master, {state, {resource,<<"/">>,queue, <<"heat-engine-listener_fanout_67a9b88061cb4a7d93cb4381fe86ec7f">>}, <0.21048.0>,<0.3691.1>,rabbit_priority_queue, {passthrough,rabbit_variable_queue, {vqstate, {0,{[],[]}}, {0,{[],[]}}, {delta,undefined,0,undefined}, {0,{[],[]}}, {0,{[],[]}}, 0, {0,nil}, {0,nil}, {0,nil}, {qistate, "/var/lib/rabbitmq/mnesia/rabbit@overcloud-controller-1/queues/6OOTXLJWFMOE5W8XW53XWFYJT", {{dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, []}, undefined,0,32768, #Fun<rabbit_variable_queue.2.131658179>, #Fun<rabbit_variable_queue.3.131658179>, {0,nil}, {0,nil}, [],[]}, {undefined,
{client_msstate,msg_store_transient, <<254,88,125,220,97,216,175,80,36,32,92,89,170, 80,191,154>>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, {state,933956, "/var/lib/rabbitmq/mnesia/rabbit@overcloud-controller-1/msg_store_transient"}, rabbit_msg_store_ets_index, "/var/lib/rabbitmq/mnesia/rabbit@overcloud-controller-1/msg_store_transient", <0.827.0>,938051,929813,942146,946240, {2000,500}}}, false,0,4096,0,0,0,0,0,infinity,0,0,0,0,0,0, {rates,0.0,0.0,0.0,0.0,-576458772371924324}, {0,nil}, {0,nil}, {0,nil}, {0,nil}, 0,0,0,0,2048,default}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, [], {set,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, undefined}, {state,{queue,[],[],0},{active,-576458772925912,1.0}}, 600000,undefined,undefined, {erlang,#Ref<0.0.4.54411>}, {state,none,5000,undefined}, {0,nil}, undefined,undefined,undefined, {state, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, delegate}, undefined,undefined,undefined,undefined,4,running}
** Reason for termination ==
** {timeout_value,
 [{rabbit_mirror_queue_master,'-stop_all_slaves/2-lc$^1/1-1-',3, [{file,"src/rabbit_mirror_queue_master.erl"},{line,217}]},
 {rabbit_mirror_queue_master,stop_all_slaves,2, [{file,"src/rabbit_mirror_queue_master.erl"},{line,217}]},
 {rabbit_mirror_queue_master,delete_and_terminate,2, [{file,"src/rabbit_mirror_queue_master.erl"},{line,205}]},
 {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6, [{file,"src/rabbit_amqqueue_process.erl"},{line,252}]},
 {rabbit_amqqueue_process,terminate_shutdown,2, [{file,"src/rabbit_amqqueue_process.erl"},{line,277}]},
 {gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1146}]},
 {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,250}]}]}
** In 'terminate' callback with reason ==
** normal
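When repeating the reset step, it can help to check the rabbitmq log automatically for this specific crash rather than reading it by eye. The following is only a minimal sketch: the function name find_timeout_value_crashes is hypothetical, and it assumes the log text contains crash reports in the same "** Generic server <pid> terminating" form as the one above. It reports the pid of every crash report that mentions both timeout_value and stop_all_slaves.

```python
import re

# Start marker of an Erlang gen_server crash report, as seen in the log above.
CRASH_START = re.compile(r"\*\* Generic server (<[\d.]+>) terminating")


def find_timeout_value_crashes(log_text):
    """Return the pids of crash reports that match this bug's signature.

    Hypothetical helper: a report is counted when it mentions both
    'timeout_value' and 'stop_all_slaves' anywhere in its text.
    """
    pids = []
    starts = list(CRASH_START.finditer(log_text))
    for i, match in enumerate(starts):
        # A report runs until the next crash report starts (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        report = log_text[match.start():end]
        if "timeout_value" in report and "stop_all_slaves" in report:
            pids.append(match.group(1))
    return pids
```

The caller supplies the log contents as a string; where the log file lives depends on the deployment and is not assumed here.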
This build should fix the issue. Marian, please test.
(In reply to Peter Lemenkov from comment #2)
> This build should fix the issue. Marian, please test.

This particular bug seems to be fixed by the build; I was not able to reproduce it on a setup with the updated build. We should push the package into the puddle.
Verified based on comment #4
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1597.html