Bug 1258192
Summary: | heat non-functional in pacemaker cluster deployed by OSP 7 director | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | jliberma <jliberma> |
Component: | rhosp-director | Assignee: | James Slagle <jslagle> |
Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol> |
Severity: | unspecified | Docs Contact: | |
Priority: | urgent | | |
Version: | 7.0 (Kilo) | CC: | ebagdasa, hbrock, jcoufal, mburns, rhel-osp-director-maint |
Target Milestone: | ga | Keywords: | TestOnly, Triaged |
Target Release: | 8.0 (Liberty) | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-04-07 21:39:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
jliberma@redhat.com
2015-08-30 06:37:28 UTC
More investigation: I deployed, then changed the following on the controllers and restarted heat-{engine,api}:

    openstack-config --set /etc/heat/heat.conf DEFAULT engine_life_check_timeout 30
    openstack-config --set /etc/heat/heat.conf DEFAULT rpc_response_timeout 600
    openstack-config --set /etc/heat/heat.conf DEFAULT debug true

heat stack-create succeeds, but 2 of 6 instances do not execute cloud-init, so there is no SSH key injection. After the create completes, heat stack-list returns:

    ERROR: <html><body><h1>504 Gateway Time-out</h1> The server didn't respond in time. </body></html>

Error in /var/log/messages on the compute nodes:

    Aug 31 00:52:40 localhost journal: internal error: missing storage backend for network files using rbd protocol
    Aug 31 00:52:40 localhost ceilometer-agent-compute: libvirt: Storage Driver error : internal error: missing storage backend for network files using rbd protocol

The problem may be related to ephemeral storage on Ceph.

There are also numerous AMQP errors in /var/log/nova/nova-compute.log on the compute nodes:

    2015-08-31 00:07:01.718 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.18:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
    2015-08-31 00:07:02.741 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.
    2015-08-31 00:07:04.760 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
    2015-08-31 00:07:05.778 17670 ERROR oslo_messaging._drivers.impl_rabbit [req-47f2e2fc-ab09-4324-b4cd-5f5a8c5743fa - - - - -] AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.

However, RabbitMQ appears to be running, and other services are not affected.
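The AMQP errors quoted above name two different brokers, so a per-host tally is a quick way to see which controllers were refusing connections. The following is a minimal Python sketch, assuming the log line format shown above; `tally_unreachable` and `AMQP_UNREACHABLE` are hypothetical names, not part of nova or oslo.messaging.

```python
import re
from collections import Counter

# Assumed to match the oslo_messaging reconnect errors quoted in the report, e.g.
# "AMQP server on 172.16.1.17:5672 is unreachable: [Errno 111] ECONNREFUSED."
AMQP_UNREACHABLE = re.compile(
    r"AMQP server on (?P<host>[\d.]+):(?P<port>\d+) is unreachable: "
    r"\[Errno \d+\] (?P<reason>\w+)"
)


def tally_unreachable(lines):
    """Count unreachable-broker errors per (host, port, reason).

    Hypothetical helper: lines that do not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = AMQP_UNREACHABLE.search(line)
        if match:
            counts[(match["host"], int(match["port"]), match["reason"])] += 1
    return counts
```

Run against the four lines quoted above, it counts three refusals from 172.16.1.17:5672 and one from 172.16.1.18:5672. To scan a whole log, pass an open file object, for example `tally_unreachable(open("/var/log/nova/nova-compute.log"))`.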
Redeploying without Ceph to test. My previous deployments with a single Pacemaker controller and an LVM backend were 100% successful.

Asked slagle to test in the scale lab with my EAP 6 nested Heat templates.

zaneb asked me to try increasing the HAProxy connection timeout values along with the Heat parameters.

1. Deployed the overcloud.
2. Configured the following on all controller nodes:

       sed -i "/heat/a \ timeout connect 30s" /etc/haproxy/haproxy.cfg
       openstack-config --set /etc/heat/heat.conf DEFAULT engine_life_check_timeout 30
       openstack-config --set /etc/heat/heat.conf DEFAULT rpc_response_timeout 600
       openstack-config --set /etc/heat/heat.conf DEFAULT verbose true
       openstack-config --get /etc/heat/heat.conf DEFAULT engine_life_check_timeout
       openstack-config --get /etc/heat/heat.conf DEFAULT rpc_response_timeout
       openstack-config --get /etc/heat/heat.conf DEFAULT verbose
       pcs resource restart haproxy-clone
       pcs resource restart openstack-heat-api-clone
       pcs resource restart openstack-heat-engine-clone

3. Deployed the EAP6 stack; it failed with the same unreachable errors:

       [stack@rhos0 ~(demo_member)]$ source demorc
       [stack@rhos0 ~(demo_member)]$ heat stack-list
       ERROR: <html><body><h1>504 Gateway Time-out</h1> The server didn't respond in time. </body></html>

Unable to reproduce.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html
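The heat.conf changes in step 2 of the HAProxy/Heat timeout test are plain INI edits in the `[DEFAULT]` section, followed by reading the values back. The following Python sketch shows the same set-then-read-back sequence with the standard library `configparser`. `apply_heat_overrides` and `HEAT_OVERRIDES` are hypothetical names. The sketch is illustrative only: `configparser` discards comments when it rewrites the file, so it is not a drop-in replacement for `openstack-config`, and the Heat services still need to be restarted afterwards (for example with the `pcs resource restart` commands).

```python
import configparser

# The three options set on the controllers in step 2 (values copied from the report).
HEAT_OVERRIDES = {
    "engine_life_check_timeout": "30",
    "rpc_response_timeout": "600",
    "verbose": "true",
}


def apply_heat_overrides(path, overrides=HEAT_OVERRIDES):
    """Set options in [DEFAULT] of an INI file and return the values read back.

    Hypothetical helper, roughly mirroring `openstack-config --set` then `--get`.
    Note: configparser drops comments when it writes the file back out.
    """
    # interpolation=None so literal '%' in existing values is left alone;
    # strict=False tolerates duplicate options instead of raising.
    config = configparser.ConfigParser(interpolation=None, strict=False)
    config.read(path)
    for key, value in overrides.items():
        config["DEFAULT"][key] = value
    with open(path, "w") as handle:
        config.write(handle)

    # Re-read from disk to mirror the separate `--get` calls.
    check = configparser.ConfigParser(interpolation=None, strict=False)
    check.read(path)
    return {key: check["DEFAULT"][key] for key in overrides}
```

Options already present in other sections (for example `[database]`) are preserved, but any comments in the original file are lost.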