Bug 1258192
| Summary: | heat non-functional in pacemaker cluster deployed by OSP 7 director | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | jliberma <jliberma> |
| Component: | rhosp-director | Assignee: | James Slagle <jslagle> |
| Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 7.0 (Kilo) | CC: | ebagdasa, hbrock, jcoufal, mburns, rhel-osp-director-maint |
| Target Milestone: | ga | Keywords: | TestOnly, Triaged |
| Target Release: | 8.0 (Liberty) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-04-07 21:39:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
jliberma@redhat.com
2015-08-30 06:37:28 UTC
More investigation:
Deployed and then changed
openstack-config --set /etc/heat/heat.conf DEFAULT engine_life_check_timeout 30
openstack-config --set /etc/heat/heat.conf DEFAULT rpc_response_timeout 600
openstack-config --set /etc/heat/heat.conf DEFAULT debug true
on controllers and restarted heat-{engine,api}
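The three overrides above can be applied and read back in one pass on each controller. This is only a sketch: `CONFIG_TOOL`, `HEAT_CONF`, and `set_heat_defaults` are hypothetical names introduced here for illustration, and it assumes `openstack-config` accepts `--set` and `--get` as used elsewhere in this report.

```shell
# Sketch only: CONFIG_TOOL and HEAT_CONF are hypothetical variables so the
# tool and target file can be swapped out (for example, for a dry run).
CONFIG_TOOL="${CONFIG_TOOL:-openstack-config}"
HEAT_CONF="${HEAT_CONF:-/etc/heat/heat.conf}"

# Set each DEFAULT-section option, then read it back to confirm the value.
set_heat_defaults() {
    while read -r option value; do
        "$CONFIG_TOOL" --set "$HEAT_CONF" DEFAULT "$option" "$value" || return 1
        printf '%s = ' "$option"
        "$CONFIG_TOOL" --get "$HEAT_CONF" DEFAULT "$option" || return 1
    done <<'EOF'
engine_life_check_timeout 30
rpc_response_timeout 600
debug true
EOF
}
```

Calling `set_heat_defaults` only changes the file; heat-engine and heat-api still need a restart afterwards.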
heat stack-create is successful, but 2 of 6 instances do not execute cloud-init and get no ssh key injection. After the create completes, heat stack-list returns:
ERROR: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
Error in /var/log/messages on compute nodes:
Aug 31 00:52:40 localhost journal: internal error: missing storage backend for network files using rbd protocol
Aug 31 00:52:40 localhost ceilometer-agent-compute: libvirt: Storage Driver error : internal error: missing storage backend for network files using rbd protocol
Problem may be related to ephemeral storage on ceph.
Also numerous AMQP errors in /var/log/nova/nova-compute.log on compute nodes:
2015-08-31 00:07:01.718 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.18:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-31 00:07:02.741 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.
2015-08-31 00:07:04.760 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-31 00:07:05.778 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
However, rabbitmq seems to be running and other services are not affected.
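To see which brokers a compute node is failing to reach, the ECONNREFUSED lines can be tallied per address. A minimal sketch, assuming the log line format quoted above; `amqp_unreachable_summary` is a hypothetical helper name, not part of any OpenStack tooling.

```shell
# Sketch only: count "AMQP server on HOST:PORT is unreachable" lines per
# address. Reads the files given as arguments, or stdin if none are given.
amqp_unreachable_summary() {
    sed -n 's/.*AMQP server on \([0-9.]*:[0-9]*\) is unreachable.*/\1/p' "$@" \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'   # normalise to "COUNT HOST:PORT"
}
```

For example, `amqp_unreachable_summary /var/log/nova/nova-compute.log` prints one line per broker address. If only some addresses appear, that may point at a single controller rather than the whole RabbitMQ cluster.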
Redeploying without ceph to test. My previous deployments with single pacemaker controller and LVM backend are 100% successful.
asked slagle to test in scale lab with my eap 6 nested heat templates

zaneb asked me to try increasing the HAproxy connection timeout values along with the heat parameters.
1. deployed overcloud
2. configured the following on all controller nodes:
sed -i "/heat/a \ timeout connect 30s" /etc/haproxy/haproxy.cfg
openstack-config --set /etc/heat/heat.conf DEFAULT engine_life_check_timeout 30
openstack-config --set /etc/heat/heat.conf DEFAULT rpc_response_timeout 600
openstack-config --set /etc/heat/heat.conf DEFAULT verbose true
openstack-config --get /etc/heat/heat.conf DEFAULT engine_life_check_timeout
openstack-config --get /etc/heat/heat.conf DEFAULT rpc_response_timeout
openstack-config --get /etc/heat/heat.conf DEFAULT verbose
pcs resource restart haproxy-clone
pcs resource restart openstack-heat-api-clone
pcs resource restart openstack-heat-engine-clone
3. deployed EAP6 stack, failed with same unreachable errors
[stack@rhos0 ~(demo_member)]$ source demorc
[stack@rhos0 ~(demo_member)]$ heat stack-list
ERROR: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

unable to reproduce

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-0604.html