Bug 1357991
| Summary: | rabbitmq: HA-Config crash with "exception exit" with multiple error | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Peter Lemenkov <plemenko> | |
| Component: | rabbitmq-server | Assignee: | Peter Lemenkov <plemenko> | |
| Status: | CLOSED ERRATA | QA Contact: | Asaf Hirshberg <ahirshbe> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | high | |||
| Version: | 8.0 (Liberty) | CC: | adhingra, apevec, chlong, cmedeiro, dmaley, fahmed, ggillies, jeckersb, lhh, pbandark, pcaruana, plemenko, srevivo, ushkalim | |
| Target Milestone: | async | Keywords: | ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | rabbitmq-server-3.3.5-23.el7ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1350073 | |||
| : | 1370082 1387985 | Environment: | ||
| Last Closed: | 2016-08-31 17:37:59 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1311180, 1319334 | |||
| Bug Blocks: | ||||
(In reply to Asaf Hirshberg from comment #3)
> Testing on OSPD-8 with the desired rpm, I ran some automation (running Rally,
> reboots for controllers..) and now I got some crash reports like:
>
> =CRASH REPORT==== 24-Aug-2016::17:57:07 ===
>   crasher:
>     initial call: gen:init_it/6
>     pid: <0.623.0>
>     registered_name: []
>     exception exit: {undef,
>                      [{rabbit_misc,get_env,
>                        [rabbit,slave_wait_timeout,15000],
>                        []},
>                       {rabbit_mirror_queue_master,
>                        promote_backing_queue_state,8,
>                        [{file,"src/rabbit_mirror_queue_master.erl"},
>                         {line,452}]},
>                       {rabbit_mirror_queue_slave,promote_me,2,
>                        [{file,"src/rabbit_mirror_queue_slave.erl"},
>                         {line,615}]},
>                       {rabbit_mirror_queue_slave,handle_call,3,
>                        [{file,"src/rabbit_mirror_queue_slave.erl"},
>                         {line,220}]},
>                       {gen_server2,handle_msg,2,
>                        [{file,"src/gen_server2.erl"},{line,1001}]},
>                       {proc_lib,wake_up,3,
>                        [{file,"proc_lib.erl"},{line,249}]}]}
>       in function gen_server2:terminate/3 (src/gen_server2.erl, line 1133)
>     ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.105.0>]
>     messages: [{'$gen_cast',policy_changed}]
>
> But I am not sure what the success/fail criteria are. Is there something
> specific I should look for? How can I know if the crash is not related to a
> reboot of a controller? Are there any "reproduce steps"?

That's another (unrelated) issue. It was introduced during backporting (the application calls a non-existing function that was added later). I'll provide a build shortly.

[root@overcloud-controller-0 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
'rabbit@overcloud-controller-1',
'rabbit@overcloud-controller-2']}]},
{running_nodes,['rabbit@overcloud-controller-2',
'rabbit@overcloud-controller-1',
'rabbit@overcloud-controller-0']},
{cluster_name,<<"rabbit">>},
{partitions,[]}]
...done.
[root@overcloud-controller-0 ~]# rabbitmqctl status
Status of node 'rabbit@overcloud-controller-0' ...
[{pid,5939},
{running_applications,[{rabbit,"RabbitMQ","3.3.5"},
{mnesia,"MNESIA CXC 138 12","4.11"},
{os_mon,"CPO CXC 138 46","2.2.14"},
{xmerl,"XML parser","1.3.6"},
{sasl,"SASL CXC 138 11","2.3.4"},
{stdlib,"ERTS CXC 138 10","1.19.4"},
{kernel,"ERTS CXC 138 10","2.16.4"}]},
{os,{unix,linux}},
{erlang_version,"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:12:12] [async-threads:30] [hipe] [kernel-poll:true]\n"},
{memory,[{total,315832296},
{connection_procs,11743080},
{queue_procs,9151048},
{plugins,0},
{other_proc,14356040},
{mnesia,1660976},
{mgmt_db,0},
{msg_index,295536},
{other_ets,1482912},
{binary,248563960},
{code,16705858},
{atom,654217},
{other_system,11218669}]},
{alarms,[]},
{listeners,[{clustering,35672,"::"},{amqp,5672,"10.35.174.13"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,13423173632},
{disk_free_limit,50000000},
{disk_free,466386874368},
{file_descriptors,[{total_limit,65436},
{total_used,227},
{sockets_limit,58890},
{sockets_used,225}]},
{processes,[{limit,1048576},{used,3646}]},
{run_queue,0},
{uptime,2524}]
...done.
[root@overcloud-controller-0 ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1792.html
Testing on OSPD-8 with the desired rpm, I ran some automation (running Rally, reboots for controllers..) and now I got some crash reports like:

=CRASH REPORT==== 24-Aug-2016::17:57:07 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.623.0>
    registered_name: []
    exception exit: {undef,
                     [{rabbit_misc,get_env,
                       [rabbit,slave_wait_timeout,15000],
                       []},
                      {rabbit_mirror_queue_master,
                       promote_backing_queue_state,8,
                       [{file,"src/rabbit_mirror_queue_master.erl"},
                        {line,452}]},
                      {rabbit_mirror_queue_slave,promote_me,2,
                       [{file,"src/rabbit_mirror_queue_slave.erl"},
                        {line,615}]},
                      {rabbit_mirror_queue_slave,handle_call,3,
                       [{file,"src/rabbit_mirror_queue_slave.erl"},
                        {line,220}]},
                      {gen_server2,handle_msg,2,
                       [{file,"src/gen_server2.erl"},{line,1001}]},
                      {proc_lib,wake_up,3,
                       [{file,"proc_lib.erl"},{line,249}]}]}
      in function gen_server2:terminate/3 (src/gen_server2.erl, line 1133)
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.105.0>]
    messages: [{'$gen_cast',policy_changed}]

But I am not sure what the success/fail criteria are. Is there something specific I should look for? How can I know if the crash is not related to a reboot of a controller? Are there any "reproduce steps"?
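The crash report above exits with `undef`, which in Erlang means the called function does not exist in the loaded code; the first stack frame names the module, the function, and the actual argument list. As one way to separate this class of crash from reboot-related ones when scanning logs, here is a minimal sketch that lists every missing function reported by `undef` exits. The helper names are hypothetical, and the bracket matching is deliberately naive (it assumes no brackets or commas inside quoted strings in the argument list).

```python
import re

# Start of an undef exit: {undef, [{Module, Function, [Args...
UNDEF_HEAD = re.compile(r"exception exit:\s*\{undef,\s*\[\{(\w+),\s*(\w+),\s*\[")


def balanced_list_body(text, start):
    """Return the text inside the '[' that was opened just before `start`."""
    depth = 1
    for i in range(start, len(text)):
        if text[i] in "[{":
            depth += 1
        elif text[i] in "]}":
            depth -= 1
            if depth == 0:
                return text[start:i]
    raise ValueError("unterminated argument list")


def count_top_level_args(body):
    """Count comma-separated terms at nesting depth zero."""
    if not body.strip():
        return 0
    depth, count = 0, 1
    for ch in body:
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        elif ch == "," and depth == 0:
            count += 1
    return count


def find_undef_calls(log_text):
    """Return 'module:function/arity' for every undef exit in the log text."""
    found = []
    for match in UNDEF_HEAD.finditer(log_text):
        args = balanced_list_body(log_text, match.end())
        arity = count_top_level_args(args)
        found.append(f"{match.group(1)}:{match.group(2)}/{arity}")
    return found
```

Applied to the report above, this flags the call to `get_env` in `rabbit_misc` with three arguments. A non-empty result points at a packaging or backporting problem rather than at cluster behaviour during a controller reboot.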