rhosp-director: HA Overcloud deployment with SSL fails: Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout

Environment:
openstack-tripleo-heat-templates-7.0.0-0.20170628002128.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
instack-undercloud-7.1.1-0.20170623182135.el7ost.noarch

Steps to reproduce:
Attempt an HA deployment with SSL.

Result:
The deployment fails. After splitting the very long single-line output into separate lines and grepping it for errors:

    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
    deploy_stdout: |
    ...
    TASK [Write the config_step hieradata]
    changed: [localhost]
    TASK [Run puppet host configuration for step 3]
    fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "/usr/bin/timeout -s 9 30m /usr/bin/puppet apply --detailed-exitcodes --no-noop /var/lib/tripleo-config/puppet_step_config.pp failed with return code: 6", "rc": 6,
    "stderr": "exception: connect failed
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout",
    " Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"],
    "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
    Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout
    Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
    Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout",
    " Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout",
    " Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout",
    " Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"],
    "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend

    Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
    deploy_stdout: |
    ...
    TASK [Write the config_step hieradata]
    changed: [localhost]
    TASK [Run puppet host configuration for step 3]
    fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "/usr/bin/timeout -s 9 30m /usr/bin/puppet apply --detailed-exitcodes --no-noop /var/lib/tripleo-config/puppet_step_config.pp failed with return code: 6", "rc": 6,
    "stderr": "exception: connect failed
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout",
    " Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"],
    "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend

Checking os-collect-config on the controller:

    Jul 06 03:04:25 overcloud-controller-0.redhat.local os-collect-config[3169]: module list --tree' to see information about modules
     (file & line not available)
    Warning: ModuleLoader: module 'mysql' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules
     (file & line not available)
    Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout
    Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
    Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
    ", "stderr_lines": ["exception: connect failed",
    "Warning: Facter: Could not retrieve fact='rabbitmq_nodename', resolution='<anonymous>': undefined method `[]' for nil:NilClass",
    "Warning: Undefined variable 'deploy_config_name'; ",
    " (file & line not available)",
    "Warning: ModuleLoader: module 'openstacklib' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
    " (file & line not available)",
    "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/cinder/manifests/db.pp\", 64]:[\"/etc/puppet/modules/cinder/manifests/init.pp\", 385]",
    " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')",
    "Warning: Scope(Class[Cinder]): host is deprecated, has no effect and will be removed in a future release, use backend_host instead",
    "Warning: Scope(Class[Cinder]): cinder::rabbit_host, cinder::rabbit_hosts, cinder::rabbit_password, cinder::rabbit_port, cinder
    Jul 06 03:04:25 overcloud-controller-0.redhat.local os-collect-config[3169]: e_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')",
    "Warning: ModuleLoader: module 'ssh' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
    " (file & line not available)",
    "Warning: ModuleLoader: module 'timezone' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
    " (file & line not available)",
    "Warning: ModuleLoader: module 'mysql' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
    " (file & line not available)",
    "Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout",
    "Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout",
    "Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout",
    "Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"],
    "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
    Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
    Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.
    Notice: Compiled catalog for overcloud-controller-0.redhat.local in environment production in 5.32 seconds
    Notice: /Stage[main]/Cinder/Cinder_config[DEFAULT/api_paste_config]/ensure: created
    Jul 06 03:04:25 overcloud-controller-0.redhat.local os-collect-config[3169]: [2017-07-06 03:04:25,638] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/793083e2-ffdd-4c9d-9a8e-3293d80312a9_playbook.yaml. [2]
    Jul 06 03:04:27 overcloud-controller-0.redhat.local os-collect-config[3169]: [2017-07-06 03:04:27,080] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None

    [root@overcloud-controller-0 ~]# cat /var/lib/heat-config/heat-config-ansible/793083e2-ffdd-4c9d-9a8e-3293d80312a9_playbook.yaml
    - hosts: localhost
      connection: local
      tasks:
        #####################################################
        # Per step puppet configuration of the baremetal host
        #####################################################
        - name: Write the config_step hieradata
          copy: content="{{dict(step=step|int)|to_json}}" dest=/etc/puppet/hieradata/config_step.json force=true
        - name: Run puppet host configuration for step {{step}}
          # FIXME: modulepath requires ansible 2.4, our builds currently only have 2.3
          # puppet: manifest=/var/lib/tripleo-config/puppet_step_config.pp modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
          puppet: manifest=/var/lib/tripleo-config/puppet_step_config.pp
        ######################################
        # Generate config via docker-puppet.py
        ######################################
        - name: Run docker-puppet tasks (generate config)
          shell: python /var/lib/docker-puppet/docker-puppet.py
          environment:
            NET_HOST: 'true'
            DEBUG: '{{docker_puppet_debug}}'
          when: step == "1"
          changed_when: false
          check_mode: no
        ##################################################
        # Per step starting of the containers using paunch
        ##################################################
        - name: Check if /var/lib/hashed-tripleo-config/docker-container-startup-config-step_{{step}}.json exists
          stat:
            path: /var/lib/tripleo-config/hashed-docker-container-startup-config-step_{{step}}.json
          register: docker_config_json
        # Note docker-puppet.py generates the hashed-*.json file, which is a copy of
        # the *step_n.json with a hash of the generated external config added
        # This acts as a salt to enable restarting the container if config changes
        - name: Start containers for step {{step}}
          command: paunch --debug apply --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_{{step}}.json --config-id tripleo_step{{step}} --managed-by tripleo-{{role_name}}
          when: docker_config_json.stat.exists
          changed_when: false
          check_mode: no
        ########################################################
        # Bootstrap tasks, only performed on bootstrap_server_id
        ########################################################
        - name: Run docker-puppet tasks (bootstrap tasks)
          shell: python /var/lib/docker-puppet/docker-puppet.py
          environment:
            CONFIG: /var/lib/docker-puppet/docker-puppet-tasks{{step}}.json
            NET_HOST: "true"
            NO_ARCHIVE: "true"
            STEP: "{{step}}"
          when: deploy_server_id == bootstrap_server_id
          changed_when: false
          check_mode: no
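The triage described in the report (splitting the single-line output and grepping it for errors) can be scripted. Below is a minimal Python sketch, assuming the deployment output has been saved to a text file; the script and function names are hypothetical. It strips ANSI color codes (raw, or JSON-escaped as \u001b), turns literal \n escapes into line breaks, counts the distinct Puppet 'Error: /Stage[main]/...' messages, and decodes the return code of 'puppet apply --detailed-exitcodes'. The code meanings follow Puppet's documented --detailed-exitcodes values, under which the 'rc 6' in this log means the run changed some resources and some resources (here the db-sync Exec resources) failed.

```python
import re
import sys
from collections import Counter

# ANSI SGR color sequences, either as real ESC bytes or as the literal JSON
# escape "\u001b" that appears in the os-collect-config journal output.
ANSI_RE = re.compile(r"(?:\x1b|\\u001b)\[[0-9;]*m")

# "... failed with return code: 6" in the Ansible failure message.
RC_RE = re.compile(r"failed with return code: (\d+)")

# Puppet's documented meanings for `puppet apply --detailed-exitcodes`.
DETAILED_EXITCODES = {
    0: "succeeded, no changes or failures",
    1: "failed, or was not attempted",
    2: "succeeded, some resources changed",
    4: "succeeded, some resources failed",
    6: "succeeded, some resources changed and some failed",
}


def normalize(raw):
    """Split the flattened log into one message per line."""
    text = ANSI_RE.sub("", raw)
    text = text.replace("\\n", "\n")  # literal \n escapes -> real line breaks
    # Start a new line before every Puppet log-level prefix.
    text = re.sub(r"\s*(Error|Warning|Notice):", r"\n\1:", text)
    # Drop leftover JSON quoting and list punctuation around each message.
    return [line.strip(' ",[]') for line in text.splitlines() if line.strip()]


def puppet_errors(raw):
    """Count distinct Puppet resource errors, in first-seen order."""
    return Counter(
        line for line in normalize(raw) if line.startswith("Error: /Stage[main]/")
    )


def return_codes(raw):
    """Return every puppet apply return code mentioned in the log."""
    return [int(code) for code in RC_RE.findall(raw)]


def main(path):
    with open(path, encoding="utf-8", errors="replace") as fh:
        raw = fh.read()
    for rc in return_codes(raw):
        print(f"puppet apply rc={rc}: {DETAILED_EXITCODES.get(rc, 'unknown code')}")
    for message, count in puppet_errors(raw).items():
        print(f"{count:3d}x {message}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example: python triage.py deploy-output.txt (file names hypothetical). Counting repeats instead of printing every line makes it easier to see that only the heat and cinder db-sync resources fail.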
Retried the deployment including /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml. Same result.
In the past, the db-sync processes have been very sensitive to the IO performance of the underlying disks. If the database is containerized and the environment runs on VMs, this may cause problems. That said, Sasha mentioned that this only seems to happen when SSL is enabled, so I'm also wondering about the performance of the database when TLS is enabled; might want to check that as well. In the past we usually hit this with the nova or neutron syncs, so I'm not sure whether the heat/cinder db sync timeouts are affected by the settings in low-memory-usage.yaml.
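Whether a sync is merely slow (disk IO, or TLS overhead on the database connection) rather than stuck could be checked by running it by hand and timing it against a budget, in the same spirit as the '/usr/bin/timeout -s 9 30m' wrapper in the log. This is a minimal Python sketch; the script name, budget and command are placeholders, and running 'heat-manage db_sync' directly on the controller is an assumption (on a containerized overcloud it may have to run inside the Heat container).

```python
import subprocess
import sys
import time


def timed_run(cmd, budget_s):
    """Run cmd (a list of arguments) with a time budget.

    Returns (returncode, elapsed_seconds); returncode is None if the budget
    was exceeded. On timeout, subprocess.run() kills the child (SIGKILL on
    POSIX), similar to `timeout -s 9`.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, timeout=budget_s
        )
        rc = proc.returncode
    except subprocess.TimeoutExpired:
        rc = None
    return rc, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 2:
    budget = float(sys.argv[1])
    rc, elapsed = timed_run(sys.argv[2:], budget)
    status = "TIMED OUT" if rc is None else f"rc={rc}"
    print(f"{status} after {elapsed:.1f}s (budget {budget:.0f}s)")
```

For example: python timed_run.py 1800 heat-manage db_sync. Comparing the elapsed time with SSL enabled and disabled would show whether TLS is what pushes the sync past its limit.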
With OSP12, Heat should be executing the 'heat-manage db_sync' command via docker-cmd, like this: http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/heat-engine.yaml#n110
The stack trace here shows that it is a Puppet resource that is failing. I would like to understand why this is happening, since Puppet should not be trying to execute the DB syncs unless Heat is running on baremetal.
Took a look with Sasha at the raw Puppet manifest that fails at step 3 during deployment. It shows that this is included in the deployment:
    include ::tripleo::profile::base::heat::api_cloudwatch
AFAIK the CloudWatch API is deprecated. We haven't containerized it, and I don't think we plan to. So perhaps this is something we need to "stub out" for the containerization effort, so that users who include the old cloudwatch role are handled gracefully with containers?
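The manual check above, looking at which classes the step manifest includes, can be scripted to spot profiles that still configure a service on the host. A minimal Python sketch, assuming the manifest path from the log (/var/lib/tripleo-config/puppet_step_config.pp); the SUSPECT substrings are illustrative (heat and cinder are the services whose db syncs timed out), not an authoritative list of non-containerized services.

```python
import re
import sys

# `include` statements, with or without the leading top-scope "::".
INCLUDE_RE = re.compile(r"^\s*include\s+(?:::)?([\w:]+)", re.MULTILINE)

# Illustrative substrings: the services whose Puppet db syncs timed out.
SUSPECT = ("::heat::", "::cinder::")


def included_classes(manifest_text):
    """Return the class names included by a Puppet manifest, in order."""
    return INCLUDE_RE.findall(manifest_text)


def flag_suspects(classes):
    """Keep only classes whose qualified name contains a SUSPECT substring."""
    return [c for c in classes if any(s in f"::{c}::" for s in SUSPECT)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. /var/lib/tripleo-config/puppet_step_config.pp
    with open(sys.argv[1]) as fh:
        classes = included_classes(fh.read())
    suspects = set(flag_suspects(classes))
    for cls in classes:
        marker = "  <-- suspect" if cls in suspects else ""
        print(f"include ::{cls}{marker}")
```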
Tried a few more times to deploy with and without SSL. The HA deployment consistently fails with the same error with SSL and succeeds without SSL.
We debugged this with Damien and Omri and identified that the haproxy container fails to start because /etc/pki/tls/private/overcloud_endpoint.pem is missing from it. We need to add the bind mount to puppet-tripleo, similar to what https://review.openstack.org/#/c/473854/ does for the non-HA case.
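To confirm the missing bind mount on a controller, the mounts of the running haproxy container can be compared against the certificate path. A minimal Python sketch; it relies on the 'Mounts' list (with 'Source'/'Destination' keys) that 'docker inspect' prints, and the container name passed on the command line is an assumption (for the pacemaker-managed HA haproxy it may be a bundle container, so check 'docker ps' for the real name).

```python
import json
import subprocess
import sys

PEM = "/etc/pki/tls/private/overcloud_endpoint.pem"


def mount_destinations(inspect_json):
    """In-container paths of every mount in `docker inspect` JSON output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON list
    return [m.get("Destination") for c in containers for m in c.get("Mounts", [])]


def has_mount(inspect_json, path=PEM):
    """True if `path` is mounted into the inspected container(s)."""
    return path in mount_destinations(inspect_json)


if __name__ == "__main__" and len(sys.argv) > 1:
    out = subprocess.run(
        ["docker", "inspect", sys.argv[1]],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    ).stdout
    print(f"{PEM} mounted: {has_mount(out)}")
```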
All fixes merged upstream.
Verified.
Environment:
puppet-tripleo-7.4.2-0.20171007035632.195db7c.el7ost.noarch
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch
The reported issue doesn't reproduce; I was able to deploy the overcloud with SSL.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462