Bug 1468256
Field | Value
---|---
Summary | rhosp-director: HA Overcloud deployment with SSL fails: Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
Product | Red Hat OpenStack
Reporter | Alexander Chuzhoy <sasha>
Component | puppet-tripleo
Assignee | RHOS Maint <rhos-maint>
Status | CLOSED ERRATA
QA Contact | Alexander Chuzhoy <sasha>
Severity | high
Priority | urgent
Version | 12.0 (Pike)
CC | aschultz, dbecker, dprince, jjoyce, jschluet, m.andre, mburns, mcornea, morazi, ohochman, rhel-osp-director-maint, slinaber, tvignaud
Target Milestone | beta
Keywords | Triaged
Target Release | 12.0 (Pike)
Hardware | Unspecified
OS | Unspecified
Fixed In Version | puppet-tripleo-7.1.1-0.20170715004705.el7ost openstack-tripleo-heat-templates-7.0.0-0.20170715081739.el7ost
Doc Type | If docs needed, set a value
Last Closed | 2017-12-13 21:39:17 UTC
Type | Bug
Description
Alexander Chuzhoy
2017-07-06 13:26:35 UTC
Retried the deployment with /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml included. Same result.

In the past, the db-sync processes have been very sensitive to the I/O performance of the underlying disks. If the database is containerized and the environment is on a VM, this may cause problems. That being said, Sasha mentioned that this only seems to happen when SSL is enabled, so I'm also wondering about the performance of the database when TLS is enabled. Might want to check that as well. In the past we usually hit this with nova or neutron syncs, so I'm not sure whether the heat/cinder db sync timeouts are touched by the setting in low-memory-usage.yaml.

With OSP12, Heat should be executing the 'heat-manage db_sync' command via docker-cmd like this: http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/heat-engine.yaml#n110 The stack trace here shows that it is a Puppet resource that is failing. I would like to understand more about why this is happening, since Puppet should not be trying to execute the DB syncs unless Heat is running on bare metal.

Took a look with Sasha at the raw Puppet manifest that is failing at step 3 of the deployment. It shows that this is included in the deployment:

include ::tripleo::profile::base::heat::api_cloudwatch

AFAIK the cloudwatch API is deprecated. We haven't containerized it, nor do we have plans to, I think. So perhaps this is something we need to "stub out" for the containerized effort, so that users including the old cloudwatch role get handled gracefully for containers?

Tried a few more times to deploy with and without SSL. The HA deployment consistently fails with the same error with SSL and passes without SSL.

We've debugged with Damien and Omri and identified that the haproxy container fails to start because it's missing /etc/pki/tls/private/overcloud_endpoint.pem. We need to add the bind mount to puppet-tripleo, similar to what https://review.openstack.org/#/c/473854/ does for the non-HA case.
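To illustrate the root cause identified above, here is a minimal sketch in Python of a check for whether a container's volume list makes the certificate visible inside the container. It assumes docker-style `src:dest[:mode]` volume strings. The helper names and the sample volume lists are hypothetical and are not taken from puppet-tripleo or tripleo-heat-templates; only the certificate path comes from this report.

```python
# Hypothetical helper for illustration only; not part of puppet-tripleo.
# The certificate path is the one named in this bug report.
CERT_PATH = "/etc/pki/tls/private/overcloud_endpoint.pem"


def mount_destinations(volumes):
    """Return the container-side paths of docker-style 'src:dest[:mode]' volumes."""
    destinations = set()
    for volume in volumes:
        parts = volume.split(":")
        # A bare 'path' entry has no separate destination; use the path itself.
        destinations.add(parts[1] if len(parts) > 1 else parts[0])
    return destinations


def missing_bind_mounts(required_paths, volumes):
    """List required paths that no volume entry makes visible in the container."""
    destinations = mount_destinations(volumes)
    missing = []
    for path in required_paths:
        # A path is covered if it, or one of its parent directories, is mounted.
        covered = any(
            path == dest or path.startswith(dest.rstrip("/") + "/")
            for dest in destinations
        )
        if not covered:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumed sample volume list; a real haproxy container mounts more than this.
    sample_volumes = ["/etc/hosts:/etc/hosts:ro", "/dev/log:/dev/log"]
    for path in missing_bind_mounts([CERT_PATH], sample_volumes):
        print("not mounted into container:", path)
```

With the assumed sample list, the certificate path is reported as not mounted. Adding an entry that mounts the file itself or its parent directory makes the check pass.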
All fixes merged upstream.

Verified:

Environment:
puppet-tripleo-7.4.2-0.20171007035632.195db7c.el7ost.noarch
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch

The reported issue doesn't reproduce. Was able to deploy overcloud with SSL.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462