Bug 1364587

Summary: rhel-osp-director: Reboot the undercloud post 8.0->9.0 upgrade: rabbitmq-server.service fails to start.
Product: Red Hat OpenStack Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director Assignee: John Eckersberg <jeckersb>
Status: CLOSED DUPLICATE QA Contact: Omri Hochman <ohochman>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.0 (Mitaka) CC: dbecker, fdinitto, jeckersb, mburns, morazi, rhel-osp-director-maint, tvignaud
Target Milestone: ga   
Target Release: 9.0 (Mitaka)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-09 10:39:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alexander Chuzhoy 2016-08-05 20:29:55 UTC
rhel-osp-director:   Reboot the undercloud post 8.0->9.0 upgrade: rabbitmq-server.service fails to start.

Environment:
openstack-tripleo-heat-templates-liberty-2.0.0-29.el7ost.noarch
openstack-puppet-modules-8.1.7-2.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-29.el7ost.noarch
instack-undercloud-4.0.0-11.el7ost.noarch


Steps to reproduce:
1. Deploy 8.0 with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --swift-storage-scale 0 --block-storage-scale 0 \
  --neutron-tunnel-types vxlan,gre --neutron-network-type vxlan,gre \
  --neutron-network-vlan-ranges datacentre:118:143 \
  --neutron-bridge-mappings datacentre:br-ex \
  --ntp-server clock.redhat.com --timeout 90 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e network-environment.yaml \
  -e /home/stack/ssl-heat-templates/environments/enable-tls.yaml \
  -e /home/stack/ssl-heat-templates/environments/inject-trust-anchor.yaml \
  --ceph-storage-scale 1


2. Populate the overcloud.
3. Upgrade to 9.0
4. Reboot the setup (sanity test to see if it survives a reboot with no issues).

Result:
● rabbitmq-server.service   loaded failed  failed  RabbitMQ broker


-- Reboot --
Aug 05 15:31:05 instack.localdomain systemd[1]: Starting RabbitMQ broker...
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},"Protocol: ~tp: register/listen error: ~tp~n",["inet_tcp",no_reg_reply_from_epmd]}
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.22.0>},{registered_name,[]},{error_info,{exi
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offende
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_chi
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},crash_report,[[{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{p
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {error_logger,{{2016,8,5},{15,31,30}},std_info,[{application,kernel},{exited,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,ne
Aug 05 15:31:46 instack.localdomain rabbitmq-server[1455]: {"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_ch
Aug 05 15:31:57 instack.localdomain rabbitmq-server[1455]: Crash dump is being written to: erl_crash.dump...done
Aug 05 15:31:57 instack.localdomain rabbitmq-server[1455]: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_chi
Aug 05 15:31:59 instack.localdomain systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: Stopping and halting node rabbit@instack ...
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: Error: unable to connect to node rabbit@instack: nodedown
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: DIAGNOSTICS
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: ===========
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: attempted to contact: [rabbit@instack]
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: rabbit@instack:
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * connected to epmd (port 4369) on instack
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * epmd reports: node 'rabbit' not running at all
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: no other nodes on instack
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: * suggestion: start the node
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: current node details:
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - node name: 'rabbitmq-cli-13@instack'
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - home dir: /var/lib/rabbitmq
Aug 05 15:32:14 instack.localdomain rabbitmqctl[5804]: - cookie hash: 75C+x/URa/VdmLmddM5KTA==
Aug 05 15:32:14 instack.localdomain systemd[1]: Failed to start RabbitMQ broker.
Aug 05 15:32:14 instack.localdomain systemd[1]: Unit rabbitmq-server.service entered failed state.
Aug 05 15:32:14 instack.localdomain systemd[1]: rabbitmq-server.service failed.
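A quick way to recheck the state shown in the DIAGNOSTICS output after the next reboot is to ask epmd directly which nodes are registered. The helper below is a hypothetical sketch (the name epmd_has_node is invented here, not part of RabbitMQ or Erlang) and assumes the usual `epmd -names` output of one "name <node> at port <port>" line per registered node.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: succeed if `epmd -names` output (read on
# stdin) lists NODE as registered. Assumes the usual output format:
#   epmd: up and running on port 4369 with data:
#   name rabbit at port 25672
# NODE is used inside a grep pattern, so pass a plain node name like "rabbit".
epmd_has_node() {
    node="$1"
    # Anchor on the full "name <node> at port " prefix so that e.g.
    # "rabbitmq-cli-13" does not count as a match for "rabbit".
    grep -q "^name ${node} at port "
}
```

For example, `epmd -names | epmd_has_node rabbit || systemctl status rabbitmq-server` would show the unit status only when the rabbit node is missing from epmd.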

Comment 2 Alexander Chuzhoy 2016-08-05 20:30:23 UTC
The issue reproduces.

Comment 3 Alexander Chuzhoy 2016-08-05 20:31:55 UTC
Manually running "sudo systemctl start rabbitmq-server" after boot works.
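Since a later manual start succeeds, one hedged way to probe the timing theory is to retry the start a few times with a delay and see whether a later attempt succeeds. retry is a hypothetical helper, not part of systemd or RabbitMQ, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if every
# attempt fails. Variables are plain (non-local) to stay POSIX sh.
retry() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

Example: `retry 5 10 sudo systemctl start rabbitmq-server`. If the first attempt fails and a later one succeeds, that points to a startup-ordering problem rather than a configuration problem.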

Comment 4 John Eckersberg 2016-08-08 16:46:50 UTC
At a glance, it looks like the network may not be up yet when RabbitMQ starts (I think we've had that problem in the past). RabbitMQ tries to register itself with epmd and fails (no_reg_reply_from_epmd). The fact that a manual start later on succeeds makes me think it's a network-readiness problem during startup.
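If the network-ordering theory holds, one possible mitigation is a systemd drop-in that orders rabbitmq-server.service after network-online.target. This is a sketch under that assumption, not the fix tracked in bug 1348700. The function name and the file name wait-for-network.conf are hypothetical, and whether network-online.target actually waits for the undercloud's interfaces depends on which network service is enabled on the host.

```shell
#!/bin/sh
# Hypothetical mitigation sketch, assuming the startup failure is caused by
# rabbitmq-server starting before the network is fully configured.

# Write a drop-in that makes rabbitmq-server.service want, and start after,
# network-online.target. UNIT_DIR defaults to the admin unit directory;
# pass another directory to try it out without root.
write_rabbitmq_network_dropin() {
    unit_dir="${1:-/etc/systemd/system}"
    dropin_dir="$unit_dir/rabbitmq-server.service.d"
    dropin_file="$dropin_dir/wait-for-network.conf"   # hypothetical file name

    mkdir -p "$dropin_dir" || return 1
    printf '%s\n' \
        '[Unit]' \
        'Wants=network-online.target' \
        'After=network-online.target' > "$dropin_file" || return 1

    # Print the path so callers can inspect or remove the drop-in later.
    echo "$dropin_file"
}
```

After writing the drop-in as root, run `systemctl daemon-reload` so systemd picks it up; `systemctl cat rabbitmq-server` should then show the extra [Unit] lines.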

Comment 5 John Eckersberg 2016-08-08 21:24:08 UTC
I've tried to reproduce this by:

- install osp8 undercloud via quickstart
- upgrade undercloud to osp9
- reboot

and the issue did not reproduce for me.

Possibly something to do with the more complex network setup in your scenario?

Comment 6 Fabio Massimo Di Nitto 2016-08-09 10:39:21 UTC

*** This bug has been marked as a duplicate of bug 1348700 ***