Bug 1597245 - rabbitmq terminated with `{badmatch,{error,not_found}}
Summary: rabbitmq terminated with `{badmatch,{error,not_found}}
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: z9
: 10.0 (Newton)
Assignee: Peter Lemenkov
QA Contact: Udi Shkalim
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-07-02 11:12 UTC by Eduard Barrera
Modified: 2019-12-29 22:21 UTC (History)
9 users (show)

Fixed In Version: rabbitmq-server-3.6.3-10.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-17 17:00:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github rabbitmq rabbitmq-server issues 1035 0 None closed Termination of queue master fails with `{badmatch,{error,not_found}}` 2021-02-08 07:42:33 UTC
Github rabbitmq rabbitmq-server issues 950 0 None closed GM - group deleted during partial partitions 2021-02-08 07:42:33 UTC
Github rabbitmq rabbitmq-server issues 953 0 None closed Amqqueue record deleted during partial partitions using HA 2021-02-08 07:42:33 UTC
Red Hat Product Errata RHBA-2018:2671 0 None None None 2018-09-17 17:01:37 UTC

Description Eduard Barrera 2018-07-02 11:12:03 UTC
Description of problem:

Customer got a rabbitmq issue where the logs grew really huge and exhausted root partition space.

I found this trace on the logs, among much other traces

Things seems to start here  
Mirrored queue 'q-reports-plugin_fanout_fe6e03d722504b54983d5b369ee9f545' in vhost '/': Promoting slave <rabbit.1064.0> to master

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Mirrored queue 'compute.ock00001i2.cor00005.cni.ukcloud.com' in vhost '/': Slave <rabbit.9587.0> saw deaths of mirrors <rabbit.8080.0>

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Mirrored queue 'compute.ock00001i2.cor00005.cni.ukcloud.com' in vhost '/': Promoting slave <rabbit.9587.0> to master

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Mirrored queue 'reply_2478c84f57c0400bbc70ccd3d1e06d29' in vhost '/': Slave <rabbit.5786.8> saw deaths of mirrors <rabbit.2934.9>

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
node rabbit@occ00002i2 down: etimedout

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
node rabbit@occ00002i2 up

ERROR REPORT==== 28-Jun-2018::17:32:43 ===
Partial partition detected:
 * We saw DOWN from rabbit@occ00002i2
 * We can still see rabbit@occ00003i2 which can see rabbit@occ00002i2
 * pause_minority mode enabled
We will therefore pause until the *entire* cluster recovers

=WARNING REPORT==== 28-Jun-2018::17:32:43 ===
Cluster minority/secondary status detected - awaiting recovery

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Stopping RabbitMQ

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
stopped TCP Listener on 10.41.54.20:5672

=ERROR REPORT==== 28-Jun-2018::17:32:43 ===
Error on AMQP connection <0.11773.1> (10.41.54.19:58070 -> 10.41.54.20:5672, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 28-Jun-2018::17:32:43 ===
Error on AMQP connection <0.17892.24> (10.41.54.15:40896 -> 10.41.54.20:5672, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
..
..
=ERROR REPORT==== 28-Jun-2018::17:32:44 ===
** Generic server <0.28123.24> terminating
** Last message in was {'$gen_cast',join}
** When Server state == {state,
                            {4,<0.28123.24>},
                            {{4,<0.28123.24>},undefined},
                            {{4,<0.28123.24>},undefined},
                            {resource,<<"/">>,queue,
                                <<"reply_093474841b384c55833c3da39c8561e6">>},
                            rabbit_mirror_queue_slave,undefined,-1,undefined,
                            [<0.28122.24>],
                            {[],[]},
                            [],0,undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>,
                            false}
** Reason for termination ==
** {bad_return_value,
       {error,
           {{badmatch,{error,not_found}},
            [{gm,'-record_dead_member_in_group/5-fun-1-',4,
                 [{file,"src/gm.erl"},{line,1140}]},
             {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
             {mnesia_tm,execute_transaction,5,
                 [{file,"mnesia_tm.erl"},{line,808}]},
             {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
                 [{file,"src/rabbit_misc.erl"},{line,532}]},
             {worker_pool_worker,'-run/2-fun-0-',3,
                 [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}}

...
...
more traces
...
...

And then this trace that seems [1] or [2]

=ERROR REPORT==== 28-Jun-2018::17:32:44 ===


** Reason for termination ==
** {{badmatch,{error,not_found}},
    [{rabbit_mirror_queue_master,stop_all_slaves,2,
                                 [{file,"src/rabbit_mirror_queue_master.erl"},
                                  {line,214}]},
     {rabbit_mirror_queue_master,delete_and_terminate,2,
                                 [{file,"src/rabbit_mirror_queue_master.erl"},
                                  {line,210}]},
     {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,
                              [{file,"src/rabbit_amqqueue_process.erl"},
                               {line,283}]},
     {rabbit_amqqueue_process,terminate_shutdown,2,
                              [{file,"src/rabbit_amqqueue_process.erl"},
                               {line,308}]},
     {gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1129}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
** In 'terminate' callback with reason ==
** normal

Version-Release number of selected component (if applicable):
OSP 9
rabbitmq-server-3.6.3-6.el7ost


[1] https://github.com/rabbitmq/rabbitmq-server/issues/953
[2] https://github.com/rabbitmq/rabbitmq-server/issues/1035

Comment 1 Peter Lemenkov 2018-07-02 12:59:19 UTC
(In reply to Eduard Barrera from comment #0)

> ** {bad_return_value,
>        {error,
>            {{badmatch,{error,not_found}},
>             [{gm,'-record_dead_member_in_group/5-fun-1-',4,
>                  [{file,"src/gm.erl"},{line,1140}]},
>              {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
>              {mnesia_tm,execute_transaction,5,
>                  [{file,"mnesia_tm.erl"},{line,808}]},
>              {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
>                  [{file,"src/rabbit_misc.erl"},{line,532}]},
>              {worker_pool_worker,'-run/2-fun-0-',3,
>                  [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}}
> 

^^^ This issue was fixed in this upstream's ticket:

https://github.com/rabbitmq/rabbitmq-server/issues/950

The fix is available only in RHOS-13 and higher.

> ** {{badmatch,{error,not_found}},
>     [{rabbit_mirror_queue_master,stop_all_slaves,2,
>                                 
> [{file,"src/rabbit_mirror_queue_master.erl"},
>                                   {line,214}]},
>      {rabbit_mirror_queue_master,delete_and_terminate,2,
>                                 
> [{file,"src/rabbit_mirror_queue_master.erl"},
>                                   {line,210}]},
>      {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,
>                               [{file,"src/rabbit_amqqueue_process.erl"},
>                                {line,283}]},
>      {rabbit_amqqueue_process,terminate_shutdown,2,
>                               [{file,"src/rabbit_amqqueue_process.erl"},
>                                {line,308}]},
>      {gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1129}]},
>      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}

yes this issue is addressed in these two tickets (fixed completely in rabbitmq-server-3.6.3-10.el7ost):

> [1] https://github.com/rabbitmq/rabbitmq-server/issues/953
> [2] https://github.com/rabbitmq/rabbitmq-server/issues/1035

Comment 5 Alex McLeod 2018-09-03 07:57:47 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 6 Peter Lemenkov 2018-09-03 12:49:44 UTC
I don't think this particular ticket requires any documentation. So I'm going to sed requires_doc_text to '-'.

Comment 8 errata-xmlrpc 2018-09-17 17:00:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2671


Note You need to log in before you can comment on or make changes to this bug.