Bug 1364587 - rhel-osp-director: Reboot the undercloud post 8.0->9.0 upgrade: rabbitmq-server.service fails to start.
Keywords:
Status: CLOSED DUPLICATE of bug 1348700
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: John Eckersberg
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-05 20:29 UTC by Alexander Chuzhoy
Modified: 2016-08-09 10:39 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-09 10:39:21 UTC
Target Upstream Version:
Embargoed:


Description Alexander Chuzhoy 2016-08-05 20:29:55 UTC
rhel-osp-director:   Reboot the undercloud post 8.0->9.0 upgrade: rabbitmq-server.service fails to start.

Environment:
openstack-tripleo-heat-templates-liberty-2.0.0-29.el7ost.noarch
openstack-puppet-modules-8.1.7-2.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-29.el7ost.noarch
instack-undercloud-4.0.0-11.el7ost.noarch


Steps to reproduce:
1. Deploy 8.0 with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --swift-storage-scale 0 --block-storage-scale 0 --neutron-tunnel-types vxlan,gre --neutron-network-type vxlan,gre --neutron-network-vlan-ranges datacentre:118:143 --neutron-bridge-mappings datacentre:br-ex --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /home/stack/ssl-heat-templates/environments/enable-tls.yaml -e /home/stack/ssl-heat-templates/environments/inject-trust-anchor.yaml --ceph-storage-scale 1


2. Populate the overcloud
3. Upgrade to 9.0
4. Reboot the setup (sanity test to see if it survives a reboot with no issues).

Result:
● rabbitmq-server.service   loaded failed  failed  RabbitMQ broker


-- Reboot --
Aug 05 15:31:05 instack.localdomain systemd[1]: Starting RabbitMQ broker...
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},"Protocol: ~tp: register/listen error: ~tp~n",["inet_tcp",no_reg_reply_from_epmd]}
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.22.0>},{registered_name,[]},{error_info,{exi
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offende
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_chi
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},crash_report,[[{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{p
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},std_info,[{application,kernel},{exited,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,ne
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_ch
Aug 05 15:31:57 instack.localdomain rabbitmq-server[1455]: Crash dump is being written to: erl_crash.dump...done
Aug 05 15:31:57 instack.localdomain rabbitmq-server[1455]: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_chi
Aug 05 15:31:59 instack.localdomain systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: Stopping and halting node rabbit@instack ...
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: Error: unable to connect to node rabbit@instack: nodedown
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: DIAGNOSTICS
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: ===========
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: attempted to contact: [rabbit@instack]
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: rabbit@instack:
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * connected to epmd (port 4369) on instack
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * epmd reports: node 'rabbit' not running at all
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: no other nodes on instack
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * suggestion: start the node
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: current node details:
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - node name: 'rabbitmq-cli-13@instack'
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - home dir: /var/lib/rabbitmq
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - cookie hash: 75C+x/URa/VdmLmddM5KTA==
Aug 05 15:32:14 instack.localdomain systemd[1]: Failed to start RabbitMQ broker.
Aug 05 15:32:14 instack.localdomain systemd[1]: Unit rabbitmq-server.service entered failed state.
Aug 05 15:32:14 instack.localdomain systemd[1]: rabbitmq-server.service failed.

Comment 2 Alexander Chuzhoy 2016-08-05 20:30:23 UTC
The issue reproduces.

Comment 3 Alexander Chuzhoy 2016-08-05 20:31:55 UTC
Running manually "sudo systemctl start rabbitmq-server" works.

Comment 4 John Eckersberg 2016-08-08 16:46:50 UTC
At a glance, it looks like the network may not be up yet when rabbitmq starts (I think we've had that problem in the past). RabbitMQ tries to register itself with epmd but fails. The fact that it works later on if you start it manually makes me think the network is not ready during boot.
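
One way to probe this hypothesis is to check, right after boot, how long it takes before epmd accepts TCP connections on the node's hostname. The following is a minimal sketch, not part of the original report: the helper names epmd_reachable and wait_for_epmd are hypothetical, and port 4369 is taken from the rabbitmqctl diagnostics above. A successful TCP connect only shows that the hostname resolves and epmd is listening; it does not prove that node registration would succeed.

```python
import socket
import time

# Port on which rabbitmqctl reported reaching epmd in the log above.
EPMD_PORT = 4369


def epmd_reachable(host, port=EPMD_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False


def wait_for_epmd(host, port=EPMD_PORT, attempts=30, delay=1.0):
    """Poll until epmd is reachable.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if epmd_reachable(host, port):
            return attempt
        time.sleep(delay)
    return None
```

For example, calling wait_for_epmd("instack") early in boot and getting back a number greater than 1 would be consistent with the network (or name resolution) not being ready when rabbitmq-server.service first starts.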

Comment 5 John Eckersberg 2016-08-08 21:24:08 UTC
I've tried to reproduce this by:

- install osp8 undercloud via quickstart
- upgrade undercloud to osp9
- reboot

and the issue did not reproduce for me.

Possibly something to do with the more complex network setup in your scenario?

Comment 6 Fabio Massimo Di Nitto 2016-08-09 10:39:21 UTC

*** This bug has been marked as a duplicate of bug 1348700 ***

