rhel-osp-director: Reboot the undercloud post 8.0->9.0 upgrade: rabbitmq-server.service fails to start.

Environment:
openstack-tripleo-heat-templates-liberty-2.0.0-29.el7ost.noarch
openstack-puppet-modules-8.1.7-2.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-29.el7ost.noarch
instack-undercloud-4.0.0-11.el7ost.noarch

Steps to reproduce:
1. Deploy 8.0 with:
   openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
     --swift-storage-scale 0 --block-storage-scale 0 \
     --neutron-tunnel-types vxlan,gre --neutron-network-type vxlan,gre \
     --neutron-network-vlan-ranges datacentre:118:143 \
     --neutron-bridge-mappings datacentre:br-ex \
     --ntp-server clock.redhat.com --timeout 90 \
     -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
     -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
     -e network-environment.yaml \
     -e /home/stack/ssl-heat-templates/environments/enable-tls.yaml \
     -e /home/stack/ssl-heat-templates/environments/inject-trust-anchor.yaml \
     --ceph-storage-scale 1
2. Populate the overcloud
3. Upgrade to 9.0
4. Reboot the setup (sanity test to see if it survives a reboot with no issues).

Result:
● rabbitmq-server.service loaded failed failed RabbitMQ broker

-- Reboot --
Aug 05 15:31:05 instack.localdomain systemd[1]: Starting RabbitMQ broker...
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},"Protocol: ~tp: register/listen error: ~tp~n",["inet_tcp",no_reg_reply_from_epmd]}
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.22.0>},{registered_name,[]},{error_info,{exi
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offende
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_chi
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},crash_report,[[{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{p
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},std_info,[{application,kernel},{exited,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,ne
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_ch
Aug 05 15:31:57 instack.localdomain rabbitmq-server[1455]: Crash dump is being written to: erl_crash.dump...done
Aug 05 15:31:57 instack.localdomain rabbitmq-server[1455]: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_chi
Aug 05 15:31:59 instack.localdomain systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: Stopping and halting node rabbit@instack ...
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: Error: unable to connect to node rabbit@instack: nodedown
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: DIAGNOSTICS
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: ===========
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: attempted to contact: [rabbit@instack]
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: rabbit@instack:
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * connected to epmd (port 4369) on instack
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * epmd reports: node 'rabbit' not running at all
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: no other nodes on instack
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * suggestion: start the node
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: current node details:
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - node name: 'rabbitmq-cli-13@instack'
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - home dir: /var/lib/rabbitmq
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - cookie hash: 75C+x/URa/VdmLmddM5KTA==
Aug 05 15:32:14 instack.localdomain systemd[1]: Failed to start RabbitMQ broker.
Aug 05 15:32:14 instack.localdomain systemd[1]: Unit rabbitmq-server.service entered failed state.
Aug 05 15:32:14 instack.localdomain systemd[1]: rabbitmq-server.service failed.
The issue reproduces.
Starting the service manually with "sudo systemctl start rabbitmq-server" works.
At a glance, it looks like maybe the network isn't up yet when rabbitmq starts (I think we've had that problem in the past). RabbitMQ tries to register itself with epmd but fails. The fact that it works later on if you manually start it makes me think it's the network thing during startup.
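One way to test this timing theory might be to probe the epmd port early in boot and record when it first becomes reachable. The following is only a rough sketch, not something taken from the product: the helper name wait_for_port and the timeout values are made up for illustration, and the host "instack" and port 4369 are copied from the rabbitmqctl diagnostics above. Note that a successful TCP connect only shows that something is listening; it does not prove epmd would answer a registration request, which is what no_reg_reply_from_epmd complains about.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connect to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration only; not part of RabbitMQ or instack.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` so a silent host cannot hang us.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, unresolved name, or timed out: retry until deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumption: host and port as shown in the rabbitmqctl output in this report.
    started = time.monotonic()
    ok = wait_for_port("instack", 4369, timeout=60.0)
    elapsed = time.monotonic() - started
    if ok:
        print("epmd port reachable after %.1f s" % elapsed)
    else:
        print("epmd port not reachable within %.1f s" % elapsed)
```

If the port only becomes reachable some seconds after systemd logs "Starting RabbitMQ broker...", that would be consistent with a startup ordering problem; if it is reachable immediately, the cause is probably elsewhere.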
I've tried to reproduce this by:
- install osp8 undercloud via quickstart
- upgrade undercloud to osp9
- reboot

and the issue did not reproduce for me. Possibly something to do with the more complex network setup in your scenario?
*** This bug has been marked as a duplicate of bug 1348700 ***