Bug 1247358 - rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: unspecified
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assigned To: Mike Burns
QA Contact: Alexander Chuzhoy
Keywords: TestOnly, Triaged, ZStream
Depends On:
Blocks:
Reported: 2015-07-27 15:53 EDT by Alexander Chuzhoy
Modified: 2015-12-21 11:55 EST

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
In rare cases, RabbitMQ fails to start on deployment. As a workaround, manually start RabbitMQ on nodes:

[stack@director ~]$ ssh heat-admin@192.168.0.20
[heat-admin@overcloud-controller-0 ~]$ pcs resource debug-start rabbitmq

Then rerun the deployment command on the director. The deployment now succeeds.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-21 11:55:03 EST
Type: Bug

Attachments
logs from the controller. (10.42 MB, application/x-gzip)
2015-07-27 15:58 EDT, Alexander Chuzhoy
heat logs from the undercloud machine. (17.27 MB, application/x-gzip)
2015-07-27 16:05 EDT, Alexander Chuzhoy

Description Alexander Chuzhoy 2015-07-27 15:53:55 EDT
rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node

Environment:
rabbitmq-server-3.3.5-3.el7ost.noarch
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Attempt to deploy overcloud consisting of 1 controller +1 compute +1 ceph node.

Result:
The deployment fails.
Debugging the machine revealed that rabbitmq had not started on the controller.

Attempt to update the overcloud with:
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>
failed.

Manually started the rabbitmq with ' pcs resource debug-start rabbitmq '

Then re-ran:
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>  - completed successfully.



Expected result:
The deployment should complete.
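
The manual recovery described above could be scripted roughly as follows. This is a minimal sketch and not part of the original report: the function name `ensure_rabbitmq_started` is hypothetical, and it assumes that `rabbitmqctl status` exits non-zero when the broker is down and that the function runs with sufficient privileges on the controller. Only the `pcs resource debug-start rabbitmq` command is taken from the report.

```shell
#!/bin/sh
# Hypothetical helper: start rabbitmq through pacemaker only when the
# broker is not already answering. Intended to run on the controller node.
ensure_rabbitmq_started() {
    # Assumption: `rabbitmqctl status` returns non-zero when the node is down.
    if rabbitmqctl status >/dev/null 2>&1; then
        echo "rabbitmq already running"
        return 0
    fi
    echo "rabbitmq not running, starting it via pcs"
    # Command taken from the workaround in this report.
    pcs resource debug-start rabbitmq
}
```

After the function returns successfully, the same `openstack overcloud deploy` command would be rerun on the director, as in the steps above.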
Comment 3 Alexander Chuzhoy 2015-07-27 15:58:38 EDT
Created attachment 1056736
logs from the controller.
Comment 4 Alexander Chuzhoy 2015-07-27 16:05:23 EDT
Created attachment 1056738
heat logs from the undercloud machine.
Comment 5 John Eckersberg 2015-07-27 17:13:26 EDT
Looking at the rabbitmq logs, it appears it only started one time.  That is, there wasn't a failed start followed by a successful one; it simply never started the first time, when it was expected to.  When it did start, it was up and accepting connections within two seconds:

=INFO REPORT==== 27-Jul-2015::13:34:57 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03-1

[snip ...]

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
Server startup complete; 0 plugins started.

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
accepting AMQP connection <0.295.0> (10.19.94.12:37716 -> 10.19.94.12:5672)
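
One way to repeat this check on another log is to count the startup banners. The sketch below is illustrative only: the function name `count_rabbit_starts` is hypothetical, and the log path has to be supplied by the caller because its location depends on the installation. It relies only on the "Starting RabbitMQ" line shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count how many times the broker logged a startup banner.
# A result of 1 matches the observation above (one start, no earlier attempt).
count_rabbit_starts() {
    # $1: path to a RabbitMQ log file (location varies by installation)
    # grep exits 1 when the count is 0; `|| true` keeps that from being
    # treated as a failure by callers running under `set -e`.
    grep -c '^Starting RabbitMQ ' "$1" || true
}
```

A count of 0 would suggest the daemon was never launched at all, while a count above 1 would point to repeated start attempts.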
Comment 6 Ryan Brown 2015-07-28 15:14:15 EDT
If you look earlier in the log (around 10:40) you can see in the pacemaker logs that it attempts to start rabbit then stops after concluding "rabbit:0 has failed INFINITY times". 

Does that mean pacemaker is attempting to start rabbit but the request never gets far enough for the rabbit daemon to start/log a failure?
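
To look for the same symptom in another pacemaker log, a simple search for the quoted phrase may be enough. This is a sketch under assumptions: the function name `list_infinity_failures` is hypothetical, the log path must be passed in, and the only text it depends on is the phrase "has failed INFINITY times" quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print log lines where pacemaker reports that a
# resource has reached an INFINITY fail count.
list_infinity_failures() {
    # $1: path to a pacemaker log file (location varies by installation)
    # `|| true` keeps "no matches" (grep exit status 1) from being
    # treated as an error.
    grep 'has failed INFINITY times' "$1" || true
}
```

If matching lines appear, `pcs resource failcount show rabbitmq` can display the recorded fail count and `pcs resource cleanup rabbitmq` can clear it. Whether clearing the count alone would have avoided the failure in this bug is not established by the report.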
Comment 7 Ronelle Landy 2015-07-29 17:16:37 EDT
We just hit this in CI - took forever for the overcloud deployment to fail.
Comment 8 Marios Andreou 2015-07-31 11:21:27 EDT
Couldn't recreate this today on a virt env... has it only been seen on BM?
Comment 10 chris alfonso 2015-08-21 12:25:56 EDT
Sasha, is this only on baremetal, and does it happen every time?
Comment 11 Alexander Chuzhoy 2015-08-21 12:28:45 EDT
Chris,
this doesn't happen every time. 
Didn't see it happening on VM, though I don't think it matters.
Comment 12 Mike Burns 2015-10-15 07:44:24 EDT
I suspect this won't happen anymore, so marking it testonly.
Comment 14 Alexander Chuzhoy 2015-11-30 18:07:38 EST
Verified:

Environment:
instack-0.0.7-2.el7ost.noarch
instack-undercloud-2.1.2-34.el7ost.noarch


Ran an overcloud deployment: "openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph"


Completed successfully.
The rabbitmq is UP.
Comment 16 errata-xmlrpc 2015-12-21 11:55:03 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651
