Description of problem: Version-Release number of selected component (if applicable): RabbitMQ 3.3.5 on Erlang R16B03 How reproducible: rabbitmq-server failed to start after fence/reboot of node1 in cluster. Here is stack trace during boot visible in /var/log/messages: rabbitmq-server: RabbitMQ 3.3.5. Copyright (C) 2007-2014 GoPivotal, Inc. rabbitmq-server: ## ## Licensed under the MPL. See http://www.rabbitmq.com/ rabbitmq-server: ## ## rabbitmq-server: ########## Logs: /var/log/rabbitmq/rabbit rabbitmq-server: ###### ## /var/log/rabbitmq/rabbit rabbitmq-server: ########## rabbitmq-server: Starting broker... rabbitmq-server: BOOT FAILED rabbitmq-server: =========== rabbitmq-server: Error description: rabbitmq-server: {error,mnesia_not_running} rabbitmq-server: Log files (may contain more information): rabbitmq-server: /var/log/rabbitmq/rabbit rabbitmq-server: /var/log/rabbitmq/rabbit rabbitmq-server: Stack trace: rabbitmq-server: [{rabbit_mnesia,ensure_mnesia_running,0, rabbitmq-server: [{file,"src/rabbit_mnesia.erl"},{line,485}]}, rabbitmq-server: {rabbit_mnesia,init,0,[{file,"src/rabbit_mnesia.erl"},{line,95}]}, rabbitmq-server: {rabbit,'-run_boot_step/1-lc$^1/1-1-',1, rabbitmq-server: [{file,"src/rabbit.erl"},{line,509}]}, rabbitmq-server: {rabbit,run_boot_step,1,[{file,"src/rabbit.erl"},{line,508}]}, rabbitmq-server: {rabbit,'-start/2-lc$^0/1-0-',1,[{file,"src/rabbit.erl"},{line,475}]}, rabbitmq-server: {rabbit,start,2,[{file,"src/rabbit.erl"},{line,475}]}, rabbitmq-server: {application_master,start_it_old,4, rabbitmq-server: [{file,"application_master.erl"},{line,269}]}] rabbitmq-server: BOOT FAILED rabbitmq-server: =========== rabbitmq-server: Error description: rabbitmq-server: {could_not_start,rabbit, rabbitmq-server: {bad_return, rabbitmq-server: {{rabbit,start,[normal,[]]}, rabbitmq-server: {'EXIT', rabbitmq-server: {rabbit,failure_during_boot,{error,mnesia_not_running}}}}}} rabbitmq-server: Log files (may contain more information): rabbitmq-server: /var/log/rabbitmq/rabbit rabbitmq-server: /var/log/rabbitmq/rabbit rabbitmq-server: {"init terminating in do_boot",{rabbit,failure_during_boot,{could_not_start,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot,{error,mnesia_not_running}}}}}}}} ~ It look like to me there was problem of getting the membership information of mnesia group members. In the /var/log/rabbitmq/rabbit: Mnesia('rabbit@node1-clu'): ** ERROR ** mnesia_late_loader got unexpected message: {'$gen_cast', {'$gm', 60, check_neighbours}} Here is some more report regarding the check_neighbours: ** Generic server mnesia_snmp_sup terminating ** Last message in was {'$gen_cast',{'$gm',60,check_neighbours}} ** When Server state == {state, {local,mnesia_snmp_sup}, simple_one_for_one, [{child,undefined,mnesia_snmp_sup, {mnesia_snmp_hook,start,[]}, transient,3000,worker, [mnesia_snmp_sup,mnesia_snmp_hook, supervisor]}], undefined,0,86400000,[],mnesia_snmp_sup,[]} ** Reason for termination == ** {function_clause, [{supervisor,handle_cast, [{'$gm',60,check_neighbours}, {state, {local,mnesia_snmp_sup}, simple_one_for_one, [{child,undefined,mnesia_snmp_sup, {mnesia_snmp_hook,start,[]}, transient,3000,worker, [mnesia_snmp_sup,mnesia_snmp_hook,supervisor]}], undefined,0,86400000,[],mnesia_snmp_sup,[]}], [{file,"supervisor.erl"},{line,532}]}, {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]}, {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]} Error in process <0.408.0> on node 'rabbit@node1-clu' with exit value: {badarg,[{mnesia_tm,return_abort,3,[{file,"mnesia_tm.erl"},{line,950}]},{mnesia_loader,init_receiver,5,[{file,"mnesia_loader.erl"},{line,269}]},{mnesia_loader,do_get_network_copy,5,[{file,"mnesia_loader.erl"},{line,206}]},{mnesia_controller... ~ To me it seems that it wanted to restart mnesia, however ended up in the loop and reached_max_restart_intensity: Supervisor: {local,mnesia_kernel_sup} Context: shutdown Reason: reached_max_restart_intensity Offender: [{pid,<0.342.0>}, {name,mnesia_snmp_sup}, {mfargs,{mnesia_snmp_sup,start,[]}}, {restart_type,permanent}, {shutdown,infinity}, {child_type,supervisor}] And here is the crasher: crasher: initial call: supervisor:mnesia_snmp_sup/1 pid: <0.342.0> registered_name: mnesia_snmp_sup exception exit: {function_clause, [{supervisor,handle_cast, [{'$gm',60,check_neighbours}, {state, {local,mnesia_snmp_sup}, simple_one_for_one, [{child,undefined,mnesia_snmp_sup, {mnesia_snmp_hook,start,[]}, transient,3000,worker, [mnesia_snmp_sup,mnesia_snmp_hook, supervisor]}], undefined,0,86400000,[],mnesia_snmp_sup,[]}], [{file,"supervisor.erl"},{line,532}]}, {gen_server,handle_msg,5, [{file,"gen_server.erl"},{line,604}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,239}]}]} in function gen_server:terminate/6 (gen_server.erl, line 744) ancestors: [mnesia_kernel_sup,mnesia_sup,<0.56.0>] messages: [] links: [<0.59.0>] dictionary: [] trap_exit: true status: running heap_size: 610 stack_size: 27 reductions: 318 neighbours: I am noticing here that the neighbours: is empty here. Actual results: rabbitmq-server failed to start. Expected results: rabbitmq-server started with return code 0 Additional info: rabbitmq is clustered with pacemaker in total of 3 nodes. not sure if there was a bug in 3.3.5 which could have fix this or is it a new one which is present also in older versions. Regards, Robin Černín
Ok, here is what's going on there. The problem is that RabbitMQ core got stuck in a semi-stopped state. Please next try run: rabbitmqctl stop_app rabbitmqctl start_app on that node. Or just restart the entire node again.
Thank you, closing this BZ as it is resolved now. Thank you Kind Regards, Robin Černín