Bug 1247358
| Summary: | rhel-osp-director: rabbitmq doesn't start on the controller: deployment consists of single controller+one compute+ceph node | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> | ||||||
| Component: | rhosp-director | Assignee: | Mike Burns <mburns> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | unspecified | CC: | calfonso, dmacpher, jeckersb, mandreou, mburns, rhel-osp-director-maint, rlandy, sasha | ||||||
| Target Milestone: | y2 | Keywords: | TestOnly, Triaged, ZStream | ||||||
| Target Release: | 7.0 (Kilo) | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Known Issue | |||||||
| Doc Text: |
In rare cases, RabbitMQ fails to start on deployment. As a workaround, manually start RabbitMQ on nodes:
[stack@director ~]$ ssh heat-admin@192.168.0.20
[heat-admin@overcloud-controller-0 ~]$ pcs resource debug-start rabbitmq
Then rerun the deployment command on the director. The deployment now succeeds.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2015-12-21 16:55:03 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Alexander Chuzhoy
2015-07-27 19:53:55 UTC
Created attachment 1056736 [details]
logs from the controller.
Created attachment 1056738 [details]
heat logs from the undercloud machine.
Looking at the rabbitmq logs, it appears it only started one time. E.g. there wasn't a failed start and then a successful start, it never started at all as expected the first time. When it did start, it was up and accepting connections within two seconds: =INFO REPORT==== 27-Jul-2015::13:34:57 === Starting RabbitMQ 3.3.5 on Erlang R16B03-1 [snip ...] =INFO REPORT==== 27-Jul-2015::13:34:59 === Server startup complete; 0 plugins started. =INFO REPORT==== 27-Jul-2015::13:34:59 === accepting AMQP connection <0.295.0> (10.19.94.12:37716 -> 10.19.94.12:5672) If you look earlier in the log (around 10:40) you can see in the pacemaker logs that it attempts to start rabbit then stops after concluding "rabbit:0 has failed INFINITY times". Does that mean pacemaker is attempting to start rabbit but the request never gets far enough for the rabbit daemon to start/log a failure? We just hit this in CI - took forever for the overcloud deployment to fail. Couldn't recreate this today on a virt env... has it only been seen on BM? Sasha, is this only on baremetal, and does it happen every time? Chris, this doesn't happen every time. Didn't see it happening on VM, though I don't think it matters. I suspect this won't happen anymore, so marking it testonly. Verified: Environment: instack-0.0.7-2.el7ost.noarch instack-undercloud-2.1.2-34.el7ost.noarch Ran an overcloud depoyment: "openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph" Completed successfully. The rabbitmq is UP. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:2651 |