Bug 1348276 - Queue master process terminates in rabbit_mirror_queue_master:stop_all_slaves on promotion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: Peter Lemenkov
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks: 1319334
Reported: 2016-06-20 16:03 UTC by Marian Krcmarik
Modified: 2016-11-02 15:51 UTC (History)

Fixed In Version: rabbitmq-server-3.6.2-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-11 12:27:05 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | rabbitmq rabbitmq-server issues 812 | 0 | None | None | None | 2016-11-01 12:10:35 UTC
Red Hat Product Errata | RHEA-2016:1597 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 9 Release Candidate Advisory | 2016-08-11 16:06:52 UTC

Internal Links: 1387474 1391186 1391188 1391190

Description Marian Krcmarik 2016-06-20 16:03:28 UTC
Description of problem:
The RabbitMQ server fails frequently during HA stress tests of OSP9. The failures appear to be caused by a regression that is already reported upstream and targeted for the upstream 3.6.3 release.

Upstream bug: https://github.com/rabbitmq/rabbitmq-server/issues/812

Version-Release number of selected component (if applicable):
rabbitmq-server-3.6.2-3.el7ost.noarch

How reproducible:
Often

Steps to Reproduce:
1. Non-gracefully reset a controller in an HA OSP9 environment.
2. Repeat step 1 until one of the RabbitMQ servers crashes.

Additional info:

** Generic server <0.21044.0> terminating
** Last message in was {maybe_expire,4}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,
                           <<"heat-engine-listener_fanout_67a9b88061cb4a7d93cb4381fe86ec7f">>},
                          false,false,none,
                          [{<<"x-expires">>,signedint,600000},
                           {<<"x-ha-policy">>,longstr,<<"all">>}],
                          <0.21044.0>,
                          [<24587.4110.0>,<24588.1171.2>],
                          [<24587.4110.0>],
                          ['rabbit@overcloud-controller-0',
                           'rabbit@overcloud-controller-2'],
                          [{vhost,<<"/">>},
                           {name,<<"ha-all">>},
                           {pattern,<<"^(?!amq\\.).*">>},
                           {'apply-to',<<"all">>},
                           {definition,[{<<"ha-mode">>,<<"all">>}]},
                           {priority,0}],
                          [{<24588.2786.2>,<24588.1171.2>},
                           {<24587.4111.0>,<24587.4110.0>},
                           {<0.21048.0>,<0.21044.0>}],
                          [],live},
                         none,false,rabbit_mirror_queue_master,
                         {state,
                          {resource,<<"/">>,queue,
                           <<"heat-engine-listener_fanout_67a9b88061cb4a7d93cb4381fe86ec7f">>},
                          <0.21048.0>,<0.3691.1>,rabbit_priority_queue,
                          {passthrough,rabbit_variable_queue,
                           {vqstate,
                            {0,{[],[]}},
                            {0,{[],[]}},
                            {delta,undefined,0,undefined},
                            {0,{[],[]}},
                            {0,{[],[]}},
                            0,
                            {0,nil},
                            {0,nil},
                            {0,nil},
                            {qistate,
                             "/var/lib/rabbitmq/mnesia/rabbit@overcloud-controller-1/queues/6OOTXLJWFMOE5W8XW53XWFYJT",
                             {{dict,0,16,16,8,80,48,
                               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []},
                               {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []}}},
                              []},
                             undefined,0,32768,
                             #Fun<rabbit_variable_queue.2.131658179>,
                             #Fun<rabbit_variable_queue.3.131658179>,
                             {0,nil},
                             {0,nil},
                             [],[]},
                            {undefined,
                             {client_msstate,msg_store_transient,
                              <<254,88,125,220,97,216,175,80,36,32,92,89,170,
                                80,191,154>>,
                              {dict,0,16,16,8,80,48,
                               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []},
                               {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []}}},
                              {state,933956,
                               "/var/lib/rabbitmq/mnesia/rabbit@overcloud-controller-1/msg_store_transient"},
                              rabbit_msg_store_ets_index,
                              "/var/lib/rabbitmq/mnesia/rabbit@overcloud-controller-1/msg_store_transient",
                              <0.827.0>,938051,929813,942146,946240,
                              {2000,500}}},
                            false,0,4096,0,0,0,0,0,infinity,0,0,0,0,0,0,
                            {rates,0.0,0.0,0.0,0.0,-576458772371924324},
                            {0,nil},
                            {0,nil},
                            {0,nil},
                            {0,nil},
                            0,0,0,0,2048,default}},
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          [],
                          {set,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          undefined},
                         {state,{queue,[],[],0},{active,-576458772925912,1.0}},
                         600000,undefined,undefined,
                         {erlang,#Ref<0.0.4.54411>},
                         {state,none,5000,undefined},
                         {0,nil},
                         undefined,undefined,undefined,
                         {state,
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          delegate},
                         undefined,undefined,undefined,undefined,4,running}
** Reason for termination ==
** {timeout_value,
       [{rabbit_mirror_queue_master,'-stop_all_slaves/2-lc$^1/1-1-',3,
            [{file,"src/rabbit_mirror_queue_master.erl"},{line,217}]},
        {rabbit_mirror_queue_master,stop_all_slaves,2,
            [{file,"src/rabbit_mirror_queue_master.erl"},{line,217}]},
        {rabbit_mirror_queue_master,delete_and_terminate,2,
            [{file,"src/rabbit_mirror_queue_master.erl"},{line,205}]},
        {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,
            [{file,"src/rabbit_amqqueue_process.erl"},{line,252}]},
        {rabbit_amqqueue_process,terminate_shutdown,2,
            [{file,"src/rabbit_amqqueue_process.erl"},{line,277}]},
        {gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1146}]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,250}]}]}
** In 'terminate' callback with reason ==
** normal
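
When repeating the reproduction steps, it can help to check the RabbitMQ logs for this exact crash rather than for any crash. The following is a minimal sketch, assuming the log is available as plain text and that each crash report starts with a "** Generic server" line as in the report above. The function names are hypothetical and are not part of RabbitMQ or any Red Hat tooling.

```python
import re

# Signature taken from the crash report above: a timeout_value error whose
# stack trace passes through rabbit_mirror_queue_master:stop_all_slaves/2.
_REASON = re.compile(r"\*\* Reason for termination ==\s+\*\* \{timeout_value,")
_FRAME = re.compile(r"\{rabbit_mirror_queue_master,stop_all_slaves,2,")
_REPORT_START = re.compile(r"(?m)^\*\* Generic server ")


def split_crash_reports(log_text):
    """Split log text into chunks, one per '** Generic server' crash report."""
    starts = [m.start() for m in _REPORT_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    return [log_text[s:e] for s, e in zip(starts, ends)]


def matches_bug_signature(report):
    """True if a single crash report shows the timeout_value/stop_all_slaves pattern."""
    return bool(_REASON.search(report)) and bool(_FRAME.search(report))


def count_matching_crashes(log_text):
    """Count crash reports in the log that match the signature of this bug."""
    return sum(1 for r in split_crash_reports(log_text) if matches_bug_signature(r))
```

A caller would read the log file into a string and pass it to count_matching_crashes; a non-zero result suggests that this crash, and not an unrelated one, was reproduced.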

Comment 2 Peter Lemenkov 2016-06-29 15:42:13 UTC
This build should fix the issue. Marian, please test.

Comment 4 Marian Krcmarik 2016-07-01 22:43:24 UTC
(In reply to Peter Lemenkov from comment #2)
> This build should fix the issue. Marian, please test.

This particular bug seems to be fixed by the build; I was not able to reproduce it on a setup with the updated build. We should push the package into the puddle.

Comment 6 Udi Shkalim 2016-07-20 12:59:33 UTC
Verified based on comment #4

Comment 8 errata-xmlrpc 2016-08-11 12:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1597.html
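
To tell whether a given controller already carries the fix, the installed package build can be compared against the "Fixed In Version" value from this report (rabbitmq-server-3.6.2-4.el7ost). The sketch below is a simplified comparison that only handles numeric version components and a leading numeric release, as seen in the build strings in this report. It is not the full RPM version-comparison algorithm, and the function names are hypothetical.

```python
import re

# "Fixed In Version" as stated in this bug report.
FIXED_BUILD = "rabbitmq-server-3.6.2-4.el7ost"

# Matches e.g. "rabbitmq-server-3.6.2-3.el7ost.noarch":
# group 1 is the dotted version, group 2 the leading release number.
_NVR = re.compile(r"^rabbitmq-server-(\d+(?:\.\d+)*)-(\d+)\.")


def parse_build(nvr):
    """Return ((version ints...), release int) for a rabbitmq-server build string."""
    m = _NVR.match(nvr)
    if m is None:
        raise ValueError("unrecognized build string: %r" % nvr)
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))


def has_fix(nvr, fixed=FIXED_BUILD):
    """True if the build is at or after the fixed build (simplified comparison)."""
    return parse_build(nvr) >= parse_build(fixed)
```

For example, has_fix("rabbitmq-server-3.6.2-3.el7ost.noarch") evaluates to False for the build named in the description, while the -4 build evaluates to True.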

