Bug 1247358 - rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assignee: Mike Burns
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-27 19:53 UTC by Alexander Chuzhoy
Modified: 2015-12-21 16:55 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
In rare cases, RabbitMQ fails to start on deployment. As a workaround, manually start RabbitMQ on nodes:
[stack@director ~]$ ssh heat-admin@192.168.0.20
[heat-admin@overcloud-controller-0 ~]$ pcs resource debug-start rabbitmq
Then rerun the deployment command on the director. The deployment now succeeds.
Clone Of:
Environment:
Last Closed: 2015-12-21 16:55:03 UTC
Target Upstream Version:
Embargoed:


Attachments
logs from the controller. (10.42 MB, application/x-gzip)
2015-07-27 19:58 UTC, Alexander Chuzhoy
heat logs from the undercloud machine. (17.27 MB, application/x-gzip)
2015-07-27 20:05 UTC, Alexander Chuzhoy


Links
System: Red Hat Product Errata
ID: RHBA-2015:2651
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory
Last Updated: 2015-12-21 21:50:26 UTC

Description Alexander Chuzhoy 2015-07-27 19:53:55 UTC
rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node

Environment:
rabbitmq-server-3.3.5-3.el7ost.noarch
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Attempt to deploy an overcloud consisting of 1 controller + 1 compute + 1 ceph node.

Result:
The deployment fails.
Debugging the machine revealed that rabbitmq didn't start on the controller.

An attempt to update the overcloud with the following command failed:
openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>

Manually started rabbitmq with 'pcs resource debug-start rabbitmq'.

Then re-ran the same command, which completed successfully:
openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>



Expected result:
The deployment should complete.
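
The manual recovery described under Result can be expressed as a small dry-run script. This is a minimal sketch and not part of the original report: the function name workaround_commands is hypothetical, the controller address is the one given in the Doc Text field, and the -e argument is kept as the elided <file> placeholder. It only builds and prints the two commands in order; it does not execute them.

```python
import shlex

# Values taken from this report. ENV_FILE stands in for the elided
# "-e <file>" argument and is a placeholder, not a real path.
CONTROLLER = "heat-admin@192.168.0.20"
ENV_FILE = "<file>"


def workaround_commands(controller=CONTROLLER, env_file=ENV_FILE):
    """Return the two workaround commands, in order, as argv lists."""
    # Step 1: start rabbitmq by hand on the controller.
    start_rabbit = ["ssh", controller, "pcs resource debug-start rabbitmq"]
    # Step 2: rerun the same deploy command from the director.
    redeploy = [
        "openstack", "overcloud", "deploy", "--plan", "overcloud",
        "--control-scale", "1", "--compute-scale", "1",
        "--ceph-storage-scale", "1", "--block-storage-scale", "0",
        "--swift-storage-scale", "0", "-e", env_file,
    ]
    return [start_rabbit, redeploy]


if __name__ == "__main__":
    # Dry run: print each command, shell-quoted, instead of running it.
    for argv in workaround_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the commands as argv lists means they could later be handed to subprocess.run without going through a shell, once the placeholder is replaced with a real environment file.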

Comment 3 Alexander Chuzhoy 2015-07-27 19:58:38 UTC
Created attachment 1056736
logs from the controller.

Comment 4 Alexander Chuzhoy 2015-07-27 20:05:23 UTC
Created attachment 1056738
heat logs from the undercloud machine.

Comment 5 John Eckersberg 2015-07-27 21:13:26 UTC
Looking at the rabbitmq logs, it appears it only started once. That is, there wasn't a failed start followed by a successful start; it never started at all the first time, when it was expected to. When it did start, it was up and accepting connections within two seconds:

=INFO REPORT==== 27-Jul-2015::13:34:57 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03-1

[snip ...]

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
Server startup complete; 0 plugins started.

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
accepting AMQP connection <0.295.0> (10.19.94.12:37716 -> 10.19.94.12:5672)
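
The two-second startup noted above can be checked directly from the timestamps in the log excerpt. The sketch below is an illustration only: the helper name startup_seconds is hypothetical, and it assumes each "=INFO REPORT====" header is followed by its message on the next line, as in the excerpt.

```python
import re
from datetime import datetime

# A report header followed by the first line of its message.
ENTRY = re.compile(r"^=INFO REPORT==== (\S+) ===\n(.*)$", re.MULTILINE)

# Lines copied from the excerpt in this comment (snipped part omitted).
EXCERPT = """
=INFO REPORT==== 27-Jul-2015::13:34:57 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03-1

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
Server startup complete; 0 plugins started.
"""


def startup_seconds(log_text):
    """Seconds between 'Starting RabbitMQ' and 'Server startup complete'.

    Returns None if either message is missing.
    """
    start = done = None
    for stamp, message in ENTRY.findall(log_text):
        # %b parses English month abbreviations in the default C locale.
        when = datetime.strptime(stamp, "%d-%b-%Y::%H:%M:%S")
        if start is None and message.startswith("Starting RabbitMQ"):
            start = when
        elif start is not None and message.startswith("Server startup complete"):
            done = when
            break
    if start is None or done is None:
        return None
    return (done - start).total_seconds()


if __name__ == "__main__":
    print(startup_seconds(EXCERPT))  # prints 2.0
```

Run against a full log, the same function would report only the first start it finds, which matches the observation that the log contains a single start.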

Comment 6 Ryan Brown 2015-07-28 19:14:15 UTC
If you look earlier in the log (around 10:40) you can see in the pacemaker logs that it attempts to start rabbit then stops after concluding "rabbit:0 has failed INFINITY times". 

Does that mean pacemaker is attempting to start rabbit but the request never gets far enough for the rabbit daemon to start/log a failure?

Comment 7 Ronelle Landy 2015-07-29 21:16:37 UTC
We just hit this in CI - took forever for the overcloud deployment to fail.

Comment 8 Marios Andreou 2015-07-31 15:21:27 UTC
Couldn't recreate this today on a virt env... has it only been seen on BM?

Comment 10 chris alfonso 2015-08-21 16:25:56 UTC
Sasha, is this only on baremetal, and does it happen every time?

Comment 11 Alexander Chuzhoy 2015-08-21 16:28:45 UTC
Chris,
this doesn't happen every time. 
Didn't see it happening on VM, though I don't think it matters.

Comment 12 Mike Burns 2015-10-15 11:44:24 UTC
I suspect this won't happen anymore, so marking it testonly.

Comment 14 Alexander Chuzhoy 2015-11-30 23:07:38 UTC
Verified:

Environment:
instack-0.0.7-2.el7ost.noarch
instack-undercloud-2.1.2-34.el7ost.noarch


Ran an overcloud deployment: "openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph"


Completed successfully.
The rabbitmq is UP.

Comment 16 errata-xmlrpc 2015-12-21 16:55:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651