Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1597245

Summary: rabbitmq terminated with `{badmatch,{error,not_found}}
Product: Red Hat OpenStack Reporter: Eduard Barrera <ebarrera>
Component: rabbitmq-serverAssignee: Peter Lemenkov <plemenko>
Status: CLOSED ERRATA QA Contact: Udi Shkalim <ushkalim>
Severity: high Docs Contact:
Priority: high    
Version: 9.0 (Mitaka)CC: apevec, ebarrera, jeckersb, lhh, michele, mkrcmari, pkomarov, plemenko, srevivo
Target Milestone: z9Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rabbitmq-server-3.6.3-10.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-17 17:00:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Eduard Barrera 2018-07-02 11:12:03 UTC
Description of problem:

Customer got a rabbitmq issue where the logs grew really huge and exhausted root partition space.

I found this trace on the logs, among much other traces

Things seems to start here  
Mirrored queue 'q-reports-plugin_fanout_fe6e03d722504b54983d5b369ee9f545' in vhost '/': Promoting slave <rabbit.1064.0> to master

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Mirrored queue 'compute.ock00001i2.cor00005.cni.ukcloud.com' in vhost '/': Slave <rabbit.9587.0> saw deaths of mirrors <rabbit.8080.0>

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Mirrored queue 'compute.ock00001i2.cor00005.cni.ukcloud.com' in vhost '/': Promoting slave <rabbit.9587.0> to master

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Mirrored queue 'reply_2478c84f57c0400bbc70ccd3d1e06d29' in vhost '/': Slave <rabbit.5786.8> saw deaths of mirrors <rabbit.2934.9>

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
node rabbit@occ00002i2 down: etimedout

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
node rabbit@occ00002i2 up

ERROR REPORT==== 28-Jun-2018::17:32:43 ===
Partial partition detected:
 * We saw DOWN from rabbit@occ00002i2
 * We can still see rabbit@occ00003i2 which can see rabbit@occ00002i2
 * pause_minority mode enabled
We will therefore pause until the *entire* cluster recovers

=WARNING REPORT==== 28-Jun-2018::17:32:43 ===
Cluster minority/secondary status detected - awaiting recovery

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
Stopping RabbitMQ

=INFO REPORT==== 28-Jun-2018::17:32:43 ===
stopped TCP Listener on 10.41.54.20:5672

=ERROR REPORT==== 28-Jun-2018::17:32:43 ===
Error on AMQP connection <0.11773.1> (10.41.54.19:58070 -> 10.41.54.20:5672, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 28-Jun-2018::17:32:43 ===
Error on AMQP connection <0.17892.24> (10.41.54.15:40896 -> 10.41.54.20:5672, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
..
..
=ERROR REPORT==== 28-Jun-2018::17:32:44 ===
** Generic server <0.28123.24> terminating
** Last message in was {'$gen_cast',join}
** When Server state == {state,
                            {4,<0.28123.24>},
                            {{4,<0.28123.24>},undefined},
                            {{4,<0.28123.24>},undefined},
                            {resource,<<"/">>,queue,
                                <<"reply_093474841b384c55833c3da39c8561e6">>},
                            rabbit_mirror_queue_slave,undefined,-1,undefined,
                            [<0.28122.24>],
                            {[],[]},
                            [],0,undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>,
                            false}
** Reason for termination ==
** {bad_return_value,
       {error,
           {{badmatch,{error,not_found}},
            [{gm,'-record_dead_member_in_group/5-fun-1-',4,
                 [{file,"src/gm.erl"},{line,1140}]},
             {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
             {mnesia_tm,execute_transaction,5,
                 [{file,"mnesia_tm.erl"},{line,808}]},
             {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
                 [{file,"src/rabbit_misc.erl"},{line,532}]},
             {worker_pool_worker,'-run/2-fun-0-',3,
                 [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}}

...
...
more traces
...
...

And then this trace that seems [1] or [2]

=ERROR REPORT==== 28-Jun-2018::17:32:44 ===


** Reason for termination ==
** {{badmatch,{error,not_found}},
    [{rabbit_mirror_queue_master,stop_all_slaves,2,
                                 [{file,"src/rabbit_mirror_queue_master.erl"},
                                  {line,214}]},
     {rabbit_mirror_queue_master,delete_and_terminate,2,
                                 [{file,"src/rabbit_mirror_queue_master.erl"},
                                  {line,210}]},
     {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,
                              [{file,"src/rabbit_amqqueue_process.erl"},
                               {line,283}]},
     {rabbit_amqqueue_process,terminate_shutdown,2,
                              [{file,"src/rabbit_amqqueue_process.erl"},
                               {line,308}]},
     {gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1129}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
** In 'terminate' callback with reason ==
** normal

Version-Release number of selected component (if applicable):
OSP 9
rabbitmq-server-3.6.3-6.el7ost


[1] https://github.com/rabbitmq/rabbitmq-server/issues/953
[2] https://github.com/rabbitmq/rabbitmq-server/issues/1035

Comment 1 Peter Lemenkov 2018-07-02 12:59:19 UTC
(In reply to Eduard Barrera from comment #0)

> ** {bad_return_value,
>        {error,
>            {{badmatch,{error,not_found}},
>             [{gm,'-record_dead_member_in_group/5-fun-1-',4,
>                  [{file,"src/gm.erl"},{line,1140}]},
>              {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
>              {mnesia_tm,execute_transaction,5,
>                  [{file,"mnesia_tm.erl"},{line,808}]},
>              {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
>                  [{file,"src/rabbit_misc.erl"},{line,532}]},
>              {worker_pool_worker,'-run/2-fun-0-',3,
>                  [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}}
> 

^^^ This issue was fixed in this upstream's ticket:

https://github.com/rabbitmq/rabbitmq-server/issues/950

The fix is available only in RHOS-13 and higher.

> ** {{badmatch,{error,not_found}},
>     [{rabbit_mirror_queue_master,stop_all_slaves,2,
>                                 
> [{file,"src/rabbit_mirror_queue_master.erl"},
>                                   {line,214}]},
>      {rabbit_mirror_queue_master,delete_and_terminate,2,
>                                 
> [{file,"src/rabbit_mirror_queue_master.erl"},
>                                   {line,210}]},
>      {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,
>                               [{file,"src/rabbit_amqqueue_process.erl"},
>                                {line,283}]},
>      {rabbit_amqqueue_process,terminate_shutdown,2,
>                               [{file,"src/rabbit_amqqueue_process.erl"},
>                                {line,308}]},
>      {gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1129}]},
>      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}

yes this issue is addressed in these two tickets (fixed completely in rabbitmq-server-3.6.3-10.el7ost):

> [1] https://github.com/rabbitmq/rabbitmq-server/issues/953
> [2] https://github.com/rabbitmq/rabbitmq-server/issues/1035

Comment 5 Alex McLeod 2018-09-03 07:57:47 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 6 Peter Lemenkov 2018-09-03 12:49:44 UTC
I don't think this particular ticket requires any documentation. So I'm going to sed requires_doc_text to '-'.

Comment 8 errata-xmlrpc 2018-09-17 17:00:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2671