Testing on OSPD-8 with the desired rpm, I ran some automation (Rally runs, controller reboots, etc.) and got some crash reports like:

=CRASH REPORT==== 24-Aug-2016::17:57:07 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.623.0>
    registered_name: []
    exception exit: {undef,
                        [{rabbit_misc,get_env,
                             [rabbit,slave_wait_timeout,15000],
                             []},
                         {rabbit_mirror_queue_master,
                             promote_backing_queue_state,8,
                             [{file,"src/rabbit_mirror_queue_master.erl"},
                              {line,452}]},
                         {rabbit_mirror_queue_slave,promote_me,2,
                             [{file,"src/rabbit_mirror_queue_slave.erl"},
                              {line,615}]},
                         {rabbit_mirror_queue_slave,handle_call,3,
                             [{file,"src/rabbit_mirror_queue_slave.erl"},
                              {line,220}]},
                         {gen_server2,handle_msg,2,
                             [{file,"src/gen_server2.erl"},{line,1001}]},
                         {proc_lib,wake_up,3,
                             [{file,"proc_lib.erl"},{line,249}]}]}
      in function gen_server2:terminate/3 (src/gen_server2.erl, line 1133)
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.105.0>]
    messages: [{'$gen_cast',policy_changed}]

But I am not sure what the success/fail criteria are. Is there something specific I should look for? How can I tell whether a crash is related to a controller reboot or not? Are there any reproduce steps?
(In reply to Asaf Hirshberg from comment #3)
> exception exit: {undef,
>                     [{rabbit_misc,get_env,
>                          [rabbit,slave_wait_timeout,15000],
>                          []},
> [...]
> But I am not sure what the success/fail criteria are. Is there something
> specific I should look for?

That's another (unrelated) issue. It was introduced during backporting: the backported code calls rabbit_misc:get_env/3, a helper that does not exist in the version shipped with this build (it was added later upstream), so the slave promotion path crashes with undef. I'll provide a fixed build shortly.
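For context, rabbit_misc:get_env/3 in later RabbitMQ releases is essentially a wrapper around application:get_env/2 with a default value. A minimal sketch of such a helper is below; this is an illustration only, not the actual upstream patch that will land in the rebuilt package:

%% Hypothetical sketch of the helper the backported code expects.
%% The installed rabbit_common lacks rabbit_misc:get_env/3, hence the
%% undef when promote_backing_queue_state/8 looks up slave_wait_timeout.
-module(rabbit_misc_sketch).
-export([get_env/3]).

get_env(Application, Key, Default) ->
    case application:get_env(Application, Key) of
        {ok, Value} -> Value;
        undefined   -> Default
    end.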
[root@overcloud-controller-0 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
                'rabbit@overcloud-controller-1',
                'rabbit@overcloud-controller-2']}]},
 {running_nodes,['rabbit@overcloud-controller-2',
                 'rabbit@overcloud-controller-1',
                 'rabbit@overcloud-controller-0']},
 {cluster_name,<<"rabbit">>},
 {partitions,[]}]
...done.
[root@overcloud-controller-0 ~]# rabbitmqctl status
Status of node 'rabbit@overcloud-controller-0' ...
[{pid,5939},
 {running_applications,[{rabbit,"RabbitMQ","3.3.5"},
                        {mnesia,"MNESIA CXC 138 12","4.11"},
                        {os_mon,"CPO CXC 138 46","2.2.14"},
                        {xmerl,"XML parser","1.3.6"},
                        {sasl,"SASL CXC 138 11","2.3.4"},
                        {stdlib,"ERTS CXC 138 10","1.19.4"},
                        {kernel,"ERTS CXC 138 10","2.16.4"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:12:12] [async-threads:30] [hipe] [kernel-poll:true]\n"},
 {memory,[{total,315832296},
          {connection_procs,11743080},
          {queue_procs,9151048},
          {plugins,0},
          {other_proc,14356040},
          {mnesia,1660976},
          {mgmt_db,0},
          {msg_index,295536},
          {other_ets,1482912},
          {binary,248563960},
          {code,16705858},
          {atom,654217},
          {other_system,11218669}]},
 {alarms,[]},
 {listeners,[{clustering,35672,"::"},{amqp,5672,"10.35.174.13"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,13423173632},
 {disk_free_limit,50000000},
 {disk_free,466386874368},
 {file_descriptors,[{total_limit,65436},
                    {total_used,227},
                    {sockets_limit,58890},
                    {sockets_used,225}]},
 {processes,[{limit,1048576},{used,3646}]},
 {run_queue,0},
 {uptime,2524}]
...done.
[root@overcloud-controller-0 ~]#
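As an optional sanity check after the fixed build is installed (a suggestion, not part of the original verification above), the presence of the backported helper can be confirmed by evaluating a plain Erlang expression on any controller through "rabbitmqctl eval"; it should return true on a patched node:

%% Run on a controller via:  rabbitmqctl eval "<expression below>"
%% rabbit_misc is already loaded on a running node; this returns true
%% once rabbit_misc:get_env/3 is present in the installed package,
%% i.e. the undef from comment #3 can no longer occur during promotion.
erlang:function_exported(rabbit_misc, get_env, 3).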
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1792.html