Bug 1468256 - rhosp-director: HA Overcloud deployment with SSL fails: Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
Status: ON_QA
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 12.0 (Pike)
Hardware: Unspecified OS: Unspecified
Priority: urgent Severity: high
Target Milestone: rc
Target Release: 12.0 (Pike)
Assigned To: RHOS Maint
QA Contact: Alexander Chuzhoy
Keywords: Triaged
Depends On:
Blocks:
Reported: 2017-07-06 09:26 EDT by Alexander Chuzhoy
Modified: 2017-09-05 20:36 EDT

See Also:
Fixed In Version: puppet-tripleo-7.1.1-0.20170715004705.el7ost openstack-tripleo-heat-templates-7.0.0-0.20170715081739.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
OpenStack gerrit 482102 None None None 2017-07-10 09:17 EDT
OpenStack gerrit 482135 None None None 2017-07-10 09:16 EDT
OpenStack gerrit 483325 None None None 2017-07-17 10:00 EDT

Description Alexander Chuzhoy 2017-07-06 09:26:35 EDT
rhosp-director: HA Overcloud deployment with SSL fails: Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout

Environment:
openstack-tripleo-heat-templates-7.0.0-0.20170628002128.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
instack-undercloud-7.1.1-0.20170623182135.el7ost.noarch


Steps to reproduce:
Attempt an HA deployment with SSL.


Result:
The deployment fails:

After breaking the very long one-line output into many lines and grepping it for errors:
Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: | ...
TASK [Write the config_step hieradata] clouds.yaml compute.yaml container_images.yaml controller.yaml core_puddle_version customization.yaml debug.yaml enable-tls.yaml errors errors2 f hascript.sh inject-trust-anchor.yaml instackenv.json ironic-python-agent.initramfs ironic-python-agent.kernel martin network-environment.yaml overcloud_deploy.sh overcloud-full.initrd overcloud-full.qcow2 overcloud-full-rpm.manifest overcloud-full-signature.manifest overcloud-full.vmlinuz overcloudrc overcloudrc.v3 pacemaker r roles sasha stackrc tempest-deployer-input.conf tripleo tripleo-heat-templates undercloud.conf undercloud_deploy.sh undercloud_install.log undercloud-passwords.conf upgrade
changed: [localhost]
TASK [Run puppet host configuration for step 3] clouds.yaml compute.yaml container_images.yaml controller.yaml core_puddle_version customization.yaml debug.yaml enable-tls.yaml errors errors2 f hascript.sh inject-trust-anchor.yaml instackenv.json ironic-python-agent.initramfs ironic-python-agent.kernel martin network-environment.yaml overcloud_deploy.sh overcloud-full.initrd overcloud-full.qcow2 overcloud-full-rpm.manifest overcloud-full-signature.manifest overcloud-full.vmlinuz overcloudrc overcloudrc.v3 pacemaker r roles sasha stackrc tempest-deployer-input.conf tripleo tripleo-heat-templates undercloud.conf undercloud_deploy.sh undercloud_install.log undercloud-passwords.conf upgrade
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "/usr/bin/timeout -s 9 30m /usr/bin/puppet apply --detailed-exitcodes --no-noop /var/lib/tripleo-config/puppet_step_config.pp failed with return code: 6", "rc": 6, "stderr": "exception: connect failed
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout", "
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"], "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout", "
Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout", "
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout", "
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"], "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: | ...
TASK [Write the config_step hieradata] clouds.yaml compute.yaml container_images.yaml controller.yaml core_puddle_version customization.yaml debug.yaml enable-tls.yaml errors errors2 f hascript.sh inject-trust-anchor.yaml instackenv.json ironic-python-agent.initramfs ironic-python-agent.kernel martin network-environment.yaml overcloud_deploy.sh overcloud-full.initrd overcloud-full.qcow2 overcloud-full-rpm.manifest overcloud-full-signature.manifest overcloud-full.vmlinuz overcloudrc overcloudrc.v3 pacemaker r roles sasha stackrc tempest-deployer-input.conf tripleo tripleo-heat-templates undercloud.conf undercloud_deploy.sh undercloud_install.log undercloud-passwords.conf upgrade
changed: [localhost]
TASK [Run puppet host configuration for step 3] clouds.yaml compute.yaml container_images.yaml controller.yaml core_puddle_version customization.yaml debug.yaml enable-tls.yaml errors errors2 f hascript.sh inject-trust-anchor.yaml instackenv.json ironic-python-agent.initramfs ironic-python-agent.kernel martin network-environment.yaml overcloud_deploy.sh overcloud-full.initrd overcloud-full.qcow2 overcloud-full-rpm.manifest overcloud-full-signature.manifest overcloud-full.vmlinuz overcloudrc overcloudrc.v3 pacemaker r roles sasha stackrc tempest-deployer-input.conf tripleo tripleo-heat-templates undercloud.conf undercloud_deploy.sh undercloud_install.log undercloud-passwords.conf upgrade
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "/usr/bin/timeout -s 9 30m /usr/bin/puppet apply --detailed-exitcodes --no-noop /var/lib/tripleo-config/puppet_step_config.pp failed with return code: 6", "rc": 6, "stderr": "exception: connect failed
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout", "
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout"], "stdout": "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
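
The "break the one-line output into lines and grep for errors" step used above can be sketched in Python. This is only an illustration and not the command the reporter actually ran: the function name extract_errors is made up here, and it assumes the output encodes newlines as the literal text "\n" and colour codes either as real escape bytes or as the literal text "\u001b", as seen in the logs in this report.

```python
import re
from typing import List

# Matches ANSI colour codes, both as a real ESC byte and as the literal
# JSON-escaped text "\u001b" that shows up in the journal output.
ANSI_COLOUR = re.compile(r"(?:\x1b|\\u001b)\[[0-9;]*m")


def extract_errors(raw: str) -> List[str]:
    """Return the unique 'Error: ...' messages found in raw deploy output.

    Hypothetical helper for illustration; not part of TripleO.
    """
    # The output arrives as one line with newlines encoded as literal "\n".
    text = raw.replace("\\n", "\n")
    text = ANSI_COLOUR.sub("", text)
    errors = []
    for line in text.splitlines():
        start = line.find("Error: ")
        if start == -1:
            continue
        # Drop JSON list debris such as '", "' left at the end of the line.
        errors.append(line[start:].rstrip('", '))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(errors))
```

Deduplicating matters here because the same Puppet error is repeated in both the "stderr" string and the "stderr_lines" list of the Ansible result.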

Checking os-collect-config on controller:
Jul 06 03:04:25 overcloud-controller-0.redhat.local os-collect-config[3169]: module list --tree' to see information about modules\n   (file & line not available)\u001b[0m\n\u001b[1;33mWarning: ModuleLoader: module 'mysql' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\n   (file & line not available)\u001b[0m\n\u001b[1;31mError: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout\u001b[0m\n\u001b[1;31mError: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout\u001b[0m\n\u001b[1;31mError: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout\u001b[0m\n\u001b[1;31mError: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout\u001b[0m\n", "stderr_lines": ["exception: connect failed", "\u001b[1;33mWarning: Facter: Could not retrieve fact='rabbitmq_nodename', resolution='<anonymous>': undefined method `[]' for nil:NilClass\u001b[0m", "\u001b[1;33mWarning: Undefined variable 'deploy_config_name'; ", "   (file & line not available)\u001b[0m", "\u001b[1;33mWarning: ModuleLoader: module 'openstacklib' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules", "   (file & line not available)\u001b[0m", "\u001b[1;33mWarning: This method is deprecated, please use the stdlib validate_legacy function, with Pattern[]. There is further documentation for validate_legacy function in the README. 
at [\"/etc/puppet/modules/cinder/manifests/db.pp\", 64]:[\"/etc/puppet/modules/cinder/manifests/init.pp\", 385]", "   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')\u001b[0m", "\u001b[1;33mWarning: Scope(Class[Cinder]): host is deprecated, has no effect and will be removed in a future release, use backend_host instead\u001b[0m", "\u001b[1;33mWarning: Scope(Class[Cinder]): cinder::rabbit_host, cinder::rabbit_hosts, cinder::rabbit_password, cinder::rabbit_port, cinder
Jul 06 03:04:25 overcloud-controller-0.redhat.local os-collect-config[3169]: e_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", "   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')\u001b[0m", "\u001b[1;33mWarning: ModuleLoader: module 'ssh' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules", "   (file & line not available)\u001b[0m", "\u001b[1;33mWarning: ModuleLoader: module 'timezone' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules", "   (file & line not available)\u001b[0m", "\u001b[1;33mWarning: ModuleLoader: module 'mysql' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules", "   (file & line not available)\u001b[0m", "\u001b[1;31mError: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout\u001b[0m", "\u001b[1;31mError: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout\u001b[0m", "\u001b[1;31mError: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Command exceeded timeout\u001b[0m", "\u001b[1;31mError: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Command exceeded timeout\u001b[0m"], "stdout": "\u001b[mNotice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend\u001b[0m\n\u001b[mNotice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend\u001b[0m\n\u001b[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\u001b[0m\n\u001b[mNotice: Compiled catalog for overcloud-controller-0.redhat.local in environment 
production in 5.32 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Cinder/Cinder_config[DEFAULT/api_paste_config]/ensure: created\u00
Jul 06 03:04:25 overcloud-controller-0.redhat.local os-collect-config[3169]: [2017-07-06 03:04:25,638] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/793083e2-ffdd-4c9d-9a8e-3293d80312a9_playbook.yaml. [2]
Jul 06 03:04:27 overcloud-controller-0.redhat.local os-collect-config[3169]: [2017-07-06 03:04:27,080] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
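
For reading the "rc": 6 in the failed task above: the command line shows that puppet apply was run with --detailed-exitcodes, and in that mode the exit status encodes whether resources changed and whether any failed. A minimal lookup sketch follows; the dictionary and function names are made up for illustration.

```python
# Exit statuses of "puppet apply --detailed-exitcodes".
PUPPET_DETAILED_EXIT_CODES = {
    0: "run succeeded with no changes and no failures",
    1: "run failed",
    2: "run succeeded and some resources were changed",
    4: "run succeeded but some resources failed",
    6: "run succeeded with both changes and failed resources",
}


def describe_puppet_exit(rc: int) -> str:
    """Translate a detailed exit code into a short description."""
    return PUPPET_DETAILED_EXIT_CODES.get(rc, "unknown exit code %d" % rc)


def has_failures(rc: int) -> bool:
    """True when the exit code reports a failed run or failed resources."""
    return rc in (1, 4, 6)
```

So the return code 6 here means that some resources were applied and some failed, which matches the db-sync Exec timeouts in the error list.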



[root@overcloud-controller-0 ~]# cat /var/lib/heat-config/heat-config-ansible/793083e2-ffdd-4c9d-9a8e-3293d80312a9_playbook.yaml
- hosts: localhost
  connection: local
  tasks:
    #####################################################
    # Per step puppet configuration of the baremetal host
    #####################################################
    - name: Write the config_step hieradata
      copy: content="{{dict(step=step|int)|to_json}}" dest=/etc/puppet/hieradata/config_step.json force=true
    - name: Run puppet host configuration for step {{step}}
      # FIXME: modulepath requires ansible 2.4, our builds currently only have 2.3
      # puppet: manifest=/var/lib/tripleo-config/puppet_step_config.pp modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
      puppet: manifest=/var/lib/tripleo-config/puppet_step_config.pp
    ######################################
    # Generate config via docker-puppet.py
    ######################################
    - name: Run docker-puppet tasks (generate config)
      shell: python /var/lib/docker-puppet/docker-puppet.py
      environment:
        NET_HOST: 'true'
        DEBUG: '{{docker_puppet_debug}}'
      when: step == "1"
      changed_when: false
      check_mode: no
    ##################################################
    # Per step starting of the containers using paunch
    ##################################################
    - name: Check if /var/lib/hashed-tripleo-config/docker-container-startup-config-step_{{step}}.json exists
      stat:
        path: /var/lib/tripleo-config/hashed-docker-container-startup-config-step_{{step}}.json
      register: docker_config_json
    # Note docker-puppet.py generates the hashed-*.json file, which is a copy of
    # the *step_n.json with a hash of the generated external config added
    # This acts as a salt to enable restarting the container if config changes
    - name: Start containers for step {{step}}
      command: paunch --debug apply --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_{{step}}.json --config-id tripleo_step{{step}} --managed-by tripleo-{{role_name}}
      when: docker_config_json.stat.exists
      changed_when: false
      check_mode: no
    ########################################################
    # Bootstrap tasks, only performed on bootstrap_server_id
    ########################################################
    - name: Run docker-puppet tasks (bootstrap tasks)
      shell: python /var/lib/docker-puppet/docker-puppet.py
      environment:
        CONFIG: /var/lib/docker-puppet/docker-puppet-tasks{{step}}.json
        NET_HOST: "true"
        NO_ARCHIVE: "true"
        STEP: "{{step}}"
      when: deploy_server_id == bootstrap_server_id
      changed_when: false
      check_mode: no
Comment 2 Alexander Chuzhoy 2017-07-06 11:25:14 EDT
Retried the deployment, this time including /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml.


Same result.
Comment 3 Alex Schultz 2017-07-06 11:51:05 EDT
In the past, the db-sync processes have been very sensitive to the I/O performance of the underlying disks. If the database is containerized and the environment runs on a VM, this may cause problems. That said, Sasha mentioned that this only seems to happen when SSL is enabled, so I'm also wondering about the performance of the database when TLS is enabled. Might want to check that as well. In the past we usually hit this with nova or neutron syncs, so I'm not sure whether the heat/cinder db-sync timeouts are affected by the settings in low-memory-usage.yaml.
Comment 4 Dan Prince 2017-07-06 13:36:54 EDT
With OSP12, Heat should be executing the 'heat-manage db_sync' command via docker-cmd, like this:

http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/heat-engine.yaml#n110

The stack trace here shows that it is a Puppet resource that is failing. I would like to understand why this is happening, since Puppet should not be trying to execute the DB syncs unless Heat is running on bare metal.
Comment 5 Dan Prince 2017-07-06 14:27:10 EDT
Took a look with Sasha at the raw Puppet manifest that is failing at step 3 of the deployment. It shows that this is included in the deployment:

include ::tripleo::profile::base::heat::api_cloudwatch

---

AFAIK the cloudwatch API is deprecated. We haven't containerized it, and I don't think we have plans to. So perhaps this is something we need to "stub out" for the containerized effort, so that users who include the old cloudwatch role are handled gracefully with containers?
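
A rough way to repeat the manifest check described above is to list the classes that the step manifest includes and filter for the service of interest. The sketch below is an assumption-laden illustration, not an existing tool: the function names are made up, and it only recognises plain one-class-per-line "include ::some::class" statements like the one quoted in this comment.

```python
import re
from typing import List

# One "include ::a::b::c" statement per line; the leading "::" is optional.
INCLUDE_LINE = re.compile(
    r"^\s*include\s+(?:::)?([a-z]\w*(?:::\w+)*)", re.MULTILINE
)


def included_classes(manifest_text: str) -> List[str]:
    """Return the class names pulled in by 'include' statements."""
    return INCLUDE_LINE.findall(manifest_text)


def classes_matching(manifest_text: str, fragment: str) -> List[str]:
    """Return the included classes whose name contains the given fragment."""
    return [c for c in included_classes(manifest_text) if fragment in c]
```

On a controller, the text would come from /var/lib/tripleo-config/puppet_step_config.pp (the manifest named in the failing task), and calling classes_matching(text, "heat") would show whether any Heat profile is still applied by Puppet on the host.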
Comment 6 Alexander Chuzhoy 2017-07-06 17:11:39 EDT
Tried a few more times to deploy with and without SSL.

The HA deployment consistently fails with the same error with SSL and passes without SSL.
Comment 7 Martin André 2017-07-10 04:27:01 EDT
We debugged this with Damien and Omri and found that the haproxy container fails to start because it's missing /etc/pki/tls/private/overcloud_endpoint.pem. We need to add the bind mount to puppet-tripleo, similar to what https://review.openstack.org/#/c/473854/ does for the non-HA case.
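
To illustrate the missing bind mount, the check below tests whether a list of Docker-style volume specifications ("host_path:container_path[:mode]") exposes the certificate to the container. This is a sketch under assumptions: the function name and the sample volume lists are hypothetical and do not reflect the actual puppet-tripleo code or the merged fix; only the certificate path comes from this report.

```python
from typing import Iterable

# Certificate path named in this comment.
CERT_PATH = "/etc/pki/tls/private/overcloud_endpoint.pem"


def mounts_host_path(volumes: Iterable[str], host_path: str) -> bool:
    """Return True if any 'host:container[:mode]' spec binds host_path."""
    for spec in volumes:
        # The host side is everything before the first colon.
        source = spec.split(":", 1)[0]
        if source == host_path:
            return True
    return False


# Hypothetical volume lists, for illustration only.
without_cert = ["/etc/hosts:/etc/hosts:ro"]
with_cert = without_cert + [CERT_PATH + ":" + CERT_PATH + ":ro"]
```

With these sample lists, mounts_host_path(without_cert, CERT_PATH) is False and mounts_host_path(with_cert, CERT_PATH) is True, which is the difference between the failing and the fixed haproxy container configuration as described above.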
Comment 8 Martin André 2017-07-17 10:00:46 EDT
All fixes merged upstream.
