Bug 1247358

Summary:

rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node

Product:

Red Hat OpenStack

Reporter:

Alexander Chuzhoy <sasha>

Component:

rhosp-director

Assignee:

Mike Burns <mburns>

Status:

CLOSED ERRATA

QA Contact:

Alexander Chuzhoy <sasha>

Severity:

high

Docs Contact:

Priority:

high

Version:

unspecified

CC:

calfonso, dmacpher, jeckersb, mandreou, mburns, rhel-osp-director-maint, rlandy, sasha

Target Milestone:

Keywords:

TestOnly, Triaged, ZStream

Target Release:

7.0 (Kilo)

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Known Issue

Doc Text:

In rare cases, RabbitMQ fails to start on deployment. As a workaround, manually start RabbitMQ on nodes: [stack@director ~]$ ssh heat-admin@192.168.0.20 [heat-admin@overcloud-controller-0 ~]$ pcs resource debug-start rabbitmq Then rerun the deployment command on the director. The deployment now succeeds.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-12-21 16:55:03 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
logs from the controller.	none
heat logs from the undercloud machine.	none

Description Alexander Chuzhoy 2015-07-27 19:53:55 UTC

rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node

Environment:
rabbitmq-server-3.3.5-3.el7ost.noarch
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Attempt to deploy overcloud consisting of 1 controller +1 compute +1 ceph node.

Result:
The deployment fails.
Debugging the machine revealed that the rabbitmq didn't start on the controller.

Attempt to update the overcloud with:
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>
failed.

Manually started the rabbitmq with ' pcs resource debug-start rabbitmq '

Then re-ran:
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file>  - completed successfully.



Expected result:
The deployment should complete.

Comment 3 Alexander Chuzhoy 2015-07-27 19:58:38 UTC

Created attachment 1056736 [details]
logs from the controller.

Comment 4 Alexander Chuzhoy 2015-07-27 20:05:23 UTC

Created attachment 1056738 [details]
heat logs from the undercloud machine.

Comment 5 John Eckersberg 2015-07-27 21:13:26 UTC

Looking at the rabbitmq logs, it appears it only started one time.  E.g. there wasn't a failed start and then a successful start, it never started at all as expected the first time.  When it did start, it was up and accepting connections within two seconds:

=INFO REPORT==== 27-Jul-2015::13:34:57 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03-1

[snip ...]

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
Server startup complete; 0 plugins started.

=INFO REPORT==== 27-Jul-2015::13:34:59 ===
accepting AMQP connection <0.295.0> (10.19.94.12:37716 -> 10.19.94.12:5672)

Comment 6 Ryan Brown 2015-07-28 19:14:15 UTC

If you look earlier in the log (around 10:40) you can see in the pacemaker logs that it attempts to start rabbit then stops after concluding "rabbit:0 has failed INFINITY times". 

Does that mean pacemaker is attempting to start rabbit but the request never gets far enough for the rabbit daemon to start/log a failure?

Comment 7 Ronelle Landy 2015-07-29 21:16:37 UTC

We just hit this in CI - took forever for the overcloud deployment to fail.

Comment 8 Marios Andreou 2015-07-31 15:21:27 UTC

Couldn't recreate this today on a virt env... has it only been seen on BM?

Comment 10 chris alfonso 2015-08-21 16:25:56 UTC

Sasha, is this only on baremetal, and does it happen every time?

Comment 11 Alexander Chuzhoy 2015-08-21 16:28:45 UTC

Chris,
this doesn't happen every time. 
Didn't see it happening on VM, though I don't think it matters.

Comment 12 Mike Burns 2015-10-15 11:44:24 UTC

I suspect this won't happen anymore, so marking it testonly.

Comment 14 Alexander Chuzhoy 2015-11-30 23:07:38 UTC

Verified:

Environment:
instack-0.0.7-2.el7ost.noarch
instack-undercloud-2.1.2-34.el7ost.noarch


Ran an overcloud depoyment: "openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph"


Completed successfully.
The rabbitmq is UP.

Comment 16 errata-xmlrpc 2015-12-21 16:55:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651