Bug 1357991

Summary: rabbitmq: HA-Config crash with "exception exit" with multiple errors
Product: Red Hat OpenStack
Component: rabbitmq-server
Version: 8.0 (Liberty)
Reporter: Peter Lemenkov <plemenko>
Assignee: Peter Lemenkov <plemenko>
QA Contact: Asaf Hirshberg <ahirshbe>
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Target Milestone: async
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: rabbitmq-server-3.3.5-23.el7ost
Clone Of: 1350073
Clones: 1370082, 1387985
Bug Depends On: 1311180, 1319334
CC: adhingra, apevec, chlong, cmedeiro, dmaley, fahmed, ggillies, jeckersb, lhh, pbandark, pcaruana, plemenko, srevivo, ushkalim
Last Closed: 2016-08-31 17:37:59 UTC
Type: Bug

Comment 3 Asaf Hirshberg 2016-08-25 08:38:33 UTC
Testing on OSPD-8 with the requested RPM, I ran some automation (running Rally, rebooting controllers, etc.) and got crash reports like:

=CRASH REPORT==== 24-Aug-2016::17:57:07 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.623.0>
    registered_name: []
    exception exit: {undef,
                        [{rabbit_misc,get_env,
                             [rabbit,slave_wait_timeout,15000],
                             []},
                         {rabbit_mirror_queue_master,
                             promote_backing_queue_state,8,
                             [{file,"src/rabbit_mirror_queue_master.erl"},
                              {line,452}]},
                         {rabbit_mirror_queue_slave,promote_me,2,
                             [{file,"src/rabbit_mirror_queue_slave.erl"},
                              {line,615}]},
                         {rabbit_mirror_queue_slave,handle_call,3,
                             [{file,"src/rabbit_mirror_queue_slave.erl"},
                              {line,220}]},
                         {gen_server2,handle_msg,2,
                             [{file,"src/gen_server2.erl"},{line,1001}]},
                         {proc_lib,wake_up,3,
                             [{file,"proc_lib.erl"},{line,249}]}]}
      in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1133)
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.105.0>]
    messages: [{'$gen_cast',policy_changed}]

But I'm not sure what the success/fail criteria are. Is there something specific I should look for? How can I know whether a crash is related to a controller reboot? Are there any steps to reproduce?
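One way to tell this particular crash apart from ordinary reboot fallout is to search the broker log for the `undef` signature shown above. A minimal sketch, assuming the default RHEL log location (adjust the path to your deployment):

```shell
# Count occurrences of the rabbit_misc:get_env undef crash in a broker log.
# The default log path is an assumption; pass your own path as $1.
LOG="${1:-/var/log/rabbitmq/rabbit@$(hostname -s).log}"
grep -A8 'CRASH REPORT' "$LOG" | grep -c 'rabbit_misc,get_env'
```

A non-zero count means the log contains this specific crash, regardless of any controller reboots that happened around the same time.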

Comment 4 Peter Lemenkov 2016-08-25 09:16:05 UTC
(In reply to Asaf Hirshberg from comment #3)
> Testing on OSPD-8 with the requested RPM, I ran some automation
> (running Rally, rebooting controllers, etc.) and got crash reports
> like:
>
> [crash report quoted in full in comment #3: exception exit {undef,
> [{rabbit_misc,get_env,[rabbit,slave_wait_timeout,15000],[]}, ...]}]
>
> But I'm not sure what the success/fail criteria are. Is there
> something specific I should look for? How can I know whether a crash
> is related to a controller reboot? Are there any steps to reproduce?

That's another (unrelated) issue. It was introduced during backporting: the backported code calls rabbit_misc:get_env/3, a helper function that was only added in a later upstream release, so the call fails with an undef exception. I'll provide a new build shortly.
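For reference, the missing helper is a thin wrapper around application:get_env/2 with a default fallback. A minimal sketch of what the backport needs (reconstructed for illustration from the call site in the trace, `rabbit_misc:get_env(rabbit, slave_wait_timeout, 15000)`; the upstream definition may differ in detail):

```erlang
%% Return the value of Key in App's application environment,
%% or Def if the key is unset. Sketch only; not the verbatim
%% upstream rabbit_misc:get_env/3.
get_env(App, Key, Def) ->
    case application:get_env(App, Key) of
        {ok, Val} -> Val;
        undefined -> Def
    end.
```

Shipping this helper with the backported caller (rather than relying on a newer rabbit_misc) avoids the undef exit during slave promotion.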

Comment 5 Asaf Hirshberg 2016-08-25 11:56:26 UTC
[root@overcloud-controller-0 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
                'rabbit@overcloud-controller-1',
                'rabbit@overcloud-controller-2']}]},
 {running_nodes,['rabbit@overcloud-controller-2',
                 'rabbit@overcloud-controller-1',
                 'rabbit@overcloud-controller-0']},
 {cluster_name,<<"rabbit">>},
 {partitions,[]}]
...done.
[root@overcloud-controller-0 ~]# rabbitmqctl status 
Status of node 'rabbit@overcloud-controller-0' ...
[{pid,5939},
 {running_applications,[{rabbit,"RabbitMQ","3.3.5"},
                        {mnesia,"MNESIA  CXC 138 12","4.11"},
                        {os_mon,"CPO  CXC 138 46","2.2.14"},
                        {xmerl,"XML parser","1.3.6"},
                        {sasl,"SASL  CXC 138 11","2.3.4"},
                        {stdlib,"ERTS  CXC 138 10","1.19.4"},
                        {kernel,"ERTS  CXC 138 10","2.16.4"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:12:12] [async-threads:30] [hipe] [kernel-poll:true]\n"},
 {memory,[{total,315832296},
          {connection_procs,11743080},
          {queue_procs,9151048},
          {plugins,0},
          {other_proc,14356040},
          {mnesia,1660976},
          {mgmt_db,0},
          {msg_index,295536},
          {other_ets,1482912},
          {binary,248563960},
          {code,16705858},
          {atom,654217},
          {other_system,11218669}]},
 {alarms,[]},
 {listeners,[{clustering,35672,"::"},{amqp,5672,"10.35.174.13"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,13423173632},
 {disk_free_limit,50000000},
 {disk_free,466386874368},
 {file_descriptors,[{total_limit,65436},
                    {total_used,227},
                    {sockets_limit,58890},
                    {sockets_used,225}]},
 {processes,[{limit,1048576},{used,3646}]},
 {run_queue,0},
 {uptime,2524}]
...done.
[root@overcloud-controller-0 ~]#

Comment 7 errata-xmlrpc 2016-08-31 17:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1792.html