rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node Environment: rabbitmq-server-3.3.5-3.el7ost.noarch instack-undercloud-2.1.2-22.el7ost.noarch Steps to reproduce: 1. Attempt to deploy overcloud consisting of 1 controller +1 compute +1 ceph node. Result: The deployment fails. Debugging the machine revealed that the rabbitmq didn't start on the controller. Attempt to update the overcloud with: openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file> failed. Manually started the rabbitmq with ' pcs resource debug-start rabbitmq ' Then re-ran: openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e <file> - completed successfully. Expected result: The deployment should complete.
Created attachment 1056736 [details] logs from the controller.
Created attachment 1056738 [details] heat logs from the undercloud machine.
Looking at the rabbitmq logs, it appears it only started one time. E.g. there wasn't a failed start and then a successful start, it never started at all as expected the first time. When it did start, it was up and accepting connections within two seconds: =INFO REPORT==== 27-Jul-2015::13:34:57 === Starting RabbitMQ 3.3.5 on Erlang R16B03-1 [snip ...] =INFO REPORT==== 27-Jul-2015::13:34:59 === Server startup complete; 0 plugins started. =INFO REPORT==== 27-Jul-2015::13:34:59 === accepting AMQP connection <0.295.0> (10.19.94.12:37716 -> 10.19.94.12:5672)
If you look earlier in the log (around 10:40) you can see in the pacemaker logs that it attempts to start rabbit then stops after concluding "rabbit:0 has failed INFINITY times". Does that mean pacemaker is attempting to start rabbit but the request never gets far enough for the rabbit daemon to start/log a failure?
We just hit this in CI - took forever for the overcloud deployment to fail.
Couldn't recreate this today on a virt env... has it only been seen on BM?
Sasha, is this only on baremetal, and does it happen every time?
Chris, this doesn't happen every time. Didn't see it happening on VM, though I don't think it matters.
I suspect this won't happen anymore, so marking it testonly.
Verified: Environment: instack-0.0.7-2.el7ost.noarch instack-undercloud-2.1.2-34.el7ost.noarch Ran an overcloud depoyment: "openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph" Completed successfully. The rabbitmq is UP.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:2651