Bug 1247358

Summary: rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node
Product: Red Hat OpenStack Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-directorAssignee: Mike Burns <mburns>
Status: CLOSED ERRATA QA Contact: Alexander Chuzhoy <sasha>
Severity: high Docs Contact:
Priority: high    
Version: unspecifiedCC: calfonso, dmacpher, jeckersb, mandreou, mburns, rhel-osp-director-maint, rlandy, sasha
Target Milestone: y2Keywords: TestOnly, Triaged, ZStream
Target Release: 7.0 (Kilo)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
In rare cases, RabbitMQ fails to start on deployment. As a workaround, manually start RabbitMQ on nodes: [stack@director ~]$ ssh heat-admin@192.168.0.20 [heat-admin@overcloud-controller-0 ~]$ pcs resource debug-start rabbitmq Then rerun the deployment command on the director. The deployment now succeeds.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-21 16:55:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs from the controller.
none
heat logs from the undercloud machine. none

Description Alexander Chuzhoy 2015-07-27 19:53:55 UTC
rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node

Environment:
rabbitmq-server-3.3.5-3.el7ost.noarch
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Attempt to deploy overcloud consisting of 1 controller +1 compute +1 ceph node.

Result:
The deployment fails.
Debugging the machine revealed that the rabbitmq didn't start on the controller.

Attempt to update the overcloud with:
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>
failed.

Manually started the rabbitmq with ' pcs resource debug-start rabbitmq '

Then re-ran:
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>  - completed successfully.



Expected result:
The deployment should complete.

Comment 3 Alexander Chuzhoy 2015-07-27 19:58:38 UTC
Created attachment 1056736 [details]
logs from the controller.

Comment 4 Alexander Chuzhoy 2015-07-27 20:05:23 UTC
Created attachment 1056738 [details]
heat logs from the undercloud machine.

Comment 5 John Eckersberg 2015-07-27 21:13:26 UTC
Looking at the rabbitmq logs, it appears it only started one time.  E.g. there wasn't a failed start and then a successful start, it never started at all as expected the first time.  When it did start, it was up and accepting connections within two seconds:

=INFO REPORT==== 27-Jul-2015::13:34:57 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03-1

[snip ...]

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
Server startup complete; 0 plugins started.

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
accepting AMQP connection <0.295.0> (10.19.94.12:37716 -> 10.19.94.12:5672)

Comment 6 Ryan Brown 2015-07-28 19:14:15 UTC
If you look earlier in the log (around 10:40) you can see in the pacemaker logs that it attempts to start rabbit then stops after concluding "rabbit:0 has failed INFINITY times". 

Does that mean pacemaker is attempting to start rabbit but the request never gets far enough for the rabbit daemon to start/log a failure?

Comment 7 Ronelle Landy 2015-07-29 21:16:37 UTC
We just hit this in CI - took forever for the overcloud deployment to fail.

Comment 8 Marios Andreou 2015-07-31 15:21:27 UTC
Couldn't recreate this today on a virt env... has it only been seen on BM?

Comment 10 chris alfonso 2015-08-21 16:25:56 UTC
Sasha, is this only on baremetal, and does it happen every time?

Comment 11 Alexander Chuzhoy 2015-08-21 16:28:45 UTC
Chris,
this doesn't happen every time. 
Didn't see it happening on VM, though I don't think it matters.

Comment 12 Mike Burns 2015-10-15 11:44:24 UTC
I suspect this won't happen anymore, so marking it testonly.

Comment 14 Alexander Chuzhoy 2015-11-30 23:07:38 UTC
Verified:

Environment:
instack-0.0.7-2.el7ost.noarch
instack-undercloud-2.1.2-34.el7ost.noarch


Ran an overcloud depoyment: "openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph"


Completed successfully.
The rabbitmq is UP.

Comment 16 errata-xmlrpc 2015-12-21 16:55:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651