Description of problem:

This is coming from a customer's case. I was able to reproduce the issue. On a clean and fresh OSP13 installation I followed the procedure to install and configure TLS on the overcloud:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/sect-enabling_ssltls_on_the_overcloud

The deployment fails with this:

2019-04-11 16:35:51Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g-CephStorageDeployment_Step3-c6abx2pr42ag]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2019-04-11 16:35:51Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g.CephStorageDeployment_Step3]: UPDATE_COMPLETE  state changed
2019-04-11 16:36:09Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g-ControllerDeployment_Step3-s7kyvaruayay.2]: SIGNAL_IN_PROGRESS  Signal: deployment f6dec962-30d0-4434-ab2b-895b813850a5 succeeded
2019-04-11 16:36:10Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g-ControllerDeployment_Step3-s7kyvaruayay.2]: UPDATE_COMPLETE  state changed
2019-04-11 16:36:14Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g-ControllerDeployment_Step3-s7kyvaruayay.1]: SIGNAL_IN_PROGRESS  Signal: deployment 6faa301b-5485-464d-bcce-085475f1bdd6 succeeded
2019-04-11 16:36:14Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g-ControllerDeployment_Step3-s7kyvaruayay.1]: UPDATE_COMPLETE  state changed
2019-04-11 17:25:59Z [AllNodesDeploySteps]: UPDATE_FAILED  UPDATE aborted (Task update from TemplateResource "AllNodesDeploySteps" [5d539de6-0f6b-4725-bb0b-1f01612db057] Stack "overcloud" [b8920176-a660-447e-9426-d364d169edf1] Timed out)
2019-04-11 17:26:00Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g]: UPDATE_FAILED  Stack UPDATE cancelled
2019-04-11 17:26:00Z [overcloud]: UPDATE_FAILED  Timed out
2019-04-11 17:26:00Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g-ControllerDeployment_Step3-s7kyvaruayay]: UPDATE_FAILED  Stack UPDATE cancelled
2019-04-11 17:26:01Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g.ControllerDeployment_Step3]: UPDATE_FAILED
resources.ControllerDeployment_Step3: Stack UPDATE cancelled
2019-04-11 17:26:01Z [overcloud-AllNodesDeploySteps-mue3k3oadw6g]: UPDATE_FAILED  Resource UPDATE failed: resources.ControllerDeployment_Step3: Stack UPDATE cancelled

 Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.ControllerDeployment_Step3:
  resource_type: OS::TripleO::DeploymentSteps
  physical_resource_id: 63fc1f75-7b7b-4e71-9e40-4601d47af22c
  status: UPDATE_FAILED
  status_reason: |
    resources.ControllerDeployment_Step3: Stack UPDATE cancelled

Heat Stack update failed.
Heat Stack update failed.

Output of "openstack software deployment show 70049cee-a8b2-4bbc-b69b-6586355f8a6b" for the deployment that failed:

+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Field         | Value
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id            | 70049cee-a8b2-4bbc-b69b-6586355f8a6b
| server_id     | bfe73e5e-d7ad-4481-8ad0-eca26a005e08
| config_id     | d058abec-38fe-40c6-bdd6-2ac5b2cdfc7d
| creation_time | 2019-04-10T19:45:44Z
| updated_time  | 2019-04-11T18:14:43Z
| status        | FAILED
| status_reason | Deployment cancelled.
| input_values  | {u'role_data_docker_config': {u'step_1': {u'cinder_volume_image_tag': {u'start_order': 1, u'image': u'192.168.24.1:8787/rhosp13/openstack-cinder-volume:2019-02-28.1', u'co
| action        | UPDATE
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

On the controllers, the haproxy-bundle container is gone (they were all up and running fine before the update, I made sure of that):

[root@controller-0 ~]# docker ps --all|grep haproxy
a7051a82532f   192.168.24.1:8787/rhosp13/openstack-haproxy:2019-02-28.1   "/docker_puppet_ap..."   2 hours ago    Exited (0) 2 hours ago    haproxy_init_bundle
123c4c3c834b   192.168.24.1:8787/rhosp13/openstack-haproxy:2019-02-28.1   "/usr/bin/bootstra..."   2 hours ago    Exited (0) 2 hours ago    haproxy_restart_bundle
f76af7f46f0a   192.168.24.1:8787/rhosp13/openstack-haproxy:2019-02-28.1   "/bin/bash -c '/us..."   29 hours ago   Exited (0) 29 hours ago   haproxy_image_tag

So haproxy_init_bundle was triggered.
Here are the docker logs of that container:

Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy_bundle/Tripleo::Pacemaker::Resource_restart_flag[haproxy-clone]/File[/var/lib/tripleo]/ensure: created
Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy_bundle/Tripleo::Pacemaker::Resource_restart_flag[haproxy-clone]/File[/var/lib/tripleo/pacemaker-restarts]/ensure: created
Info: Tripleo::Pacemaker::Resource_restart_flag[haproxy-clone]: Unscheduling all events on Tripleo::Pacemaker::Resource_restart_flag[haproxy-clone]
Info: Creating state file /var/lib/puppet/state/state.yaml
Notice: Applied catalog in 76.56 seconds
Changes:
            Total: 9
Events:
          Success: 9
            Total: 9
Resources:
            Total: 250
          Skipped: 37
      Out of sync: 8
          Changed: 8
Time:
      Concat file: 0.00
        File line: 0.00
  Concat fragment: 0.01
         Firewall: 0.07
             File: 0.13
         Last run: 1555000471
  Pcmk constraint: 21.20
    Pcmk resource: 37.04
      Pcmk bundle: 6.01
 Config retrieval: 7.72
            Total: 81.97
    Pcmk property: 9.79
Version:
           Config: 1555000386
           Puppet: 4.8.2

And here are the docker logs of the haproxy_restart_bundle container:

Bundle: haproxy-bundle
 Docker: image=192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest network=host options="--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS" replicas=3 run-command="/bin/bash /usr/local/bin/kolla_start"
 Storage Mapping:
  options=ro source-dir=/var/lib/kolla/config_files/haproxy.json target-dir=/var/lib/kolla/config_files/config.json (haproxy-cfg-files)
  options=ro source-dir=/var/lib/config-data/puppet-generated/haproxy/ target-dir=/var/lib/kolla/config_files/src (haproxy-cfg-data)
  options=ro source-dir=/etc/hosts target-dir=/etc/hosts (haproxy-hosts)
  options=ro source-dir=/etc/localtime target-dir=/etc/localtime (haproxy-localtime)
  options=rw source-dir=/var/lib/haproxy target-dir=/var/lib/haproxy (haproxy-var-lib)
  options=ro source-dir=/etc/pki/ca-trust/extracted target-dir=/etc/pki/ca-trust/extracted (haproxy-pki-extracted)
  options=ro source-dir=/etc/pki/tls/certs/ca-bundle.crt target-dir=/etc/pki/tls/certs/ca-bundle.crt (haproxy-pki-ca-bundle-crt)
  options=ro source-dir=/etc/pki/tls/certs/ca-bundle.trust.crt target-dir=/etc/pki/tls/certs/ca-bundle.trust.crt (haproxy-pki-ca-bundle-trust-crt)
  options=ro source-dir=/etc/pki/tls/cert.pem target-dir=/etc/pki/tls/cert.pem (haproxy-pki-cert)
  options=rw source-dir=/dev/log target-dir=/dev/log (haproxy-dev-log)

haproxy-bundle successfully restarted
haproxy-bundle restart invoked

Here is the pacemaker info about haproxy:

[root@controller-0 ~]# pcs status|grep haproxy
 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Stopped
   haproxy-bundle-docker-1      (ocf::heartbeat:docker):        Stopped
   haproxy-bundle-docker-2      (ocf::heartbeat:docker):        Stopped
* haproxy-bundle-docker-0_start_0 on controller-2 'unknown error' (1): call=145, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-1_start_0 on controller-2 'unknown error' (1): call=149, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-2_start_0 on controller-2 'unknown error' (1): call=133, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-0_start_0 on controller-0 'unknown error' (1): call=139, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-1_start_0 on controller-0 'unknown error' (1): call=143, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-2_start_0 on controller-0 'unknown error' (1): call=145, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-0_start_0 on controller-1 'unknown error' (1): call=137, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-1_start_0 on controller-1 'unknown error' (1): call=133, status=complete, exitreason='Newly created docker container exited after start',
* haproxy-bundle-docker-2_start_0 on controller-1 'unknown error' (1): call=149, status=complete, exitreason='Newly created docker container exited after start',

So the containers were started but were killed by something I still can't identify.

My deploy command (the only things I added are the three new YAML files):

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy_TLS.sh
#!/bin/bash

openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack overcloud \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e /home/stack/virt/extra_templates.yaml \
  -e /home/stack/virt/docker-images.yaml \
  -e /home/stack/virt/enable-tls.yaml \
  -e /home/stack/virt/inject-trust-anchor.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
  --log-file overcloud_deployment_48.log

I need your help to troubleshoot this. I have an environment you can have access to if you want to troubleshoot it quickly. Just poke me (IRC).

Version-Release number of selected component (if applicable):
Most recent OSP13 installation.

How reproducible:
100%

Steps to Reproduce:
1. Install OSP13 without TLS.
2. Follow the procedure: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/sect-enabling_ssltls_on_the_overcloud
3. The deployment times out and the haproxy containers disappear from each controller node.
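For context, the enable-tls.yaml environment file referenced above generally has the following shape in OSP13 (a from-memory sketch of the documented template, not copied from this environment; the certificate and key material is elided, and the SSLIntermediateCertificate block is optional):

```yaml
parameter_defaults:
  SSLCertificate: |
    -----BEGIN CERTIFICATE-----
    (certificate contents elided)
    -----END CERTIFICATE-----
  SSLIntermediateCertificate: |
    -----BEGIN CERTIFICATE-----
    (intermediate CA contents elided)
    -----END CERTIFICATE-----
  SSLKey: |
    -----BEGIN RSA PRIVATE KEY-----
    (private key contents elided)
    -----END RSA PRIVATE KEY-----
```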
Actual results:
The openstack overcloud deploy command times out and the haproxy containers disappear.

Expected results:
The TLS integration completes successfully.

Additional info:
A working workaround is to apply the following to your templates:

parameter_defaults:
  ControllerExtraConfig:
    pacemaker::resource::bundle::deep_compare: true
    pacemaker::resource::ip::deep_compare: true
    pacemaker::resource::ocf::deep_compare: true

and rerun your overcloud deploy command (with your added TLS yaml files, of course).

Thank you Michele Baldessari
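Concretely, the workaround can be kept as its own environment file and appended to the existing deploy script; a minimal sketch (the file path /tmp/deep-compare.yaml is my choice, not from this report):

```shell
# Save the deep_compare hieradata as a standalone environment file
# (path is illustrative; any location readable by the stack user works).
cat > /tmp/deep-compare.yaml <<'EOF'
parameter_defaults:
  ControllerExtraConfig:
    pacemaker::resource::bundle::deep_compare: true
    pacemaker::resource::ip::deep_compare: true
    pacemaker::resource::ocf::deep_compare: true
EOF

# Then add it to the overcloud_deploy_TLS.sh invocation shown above, e.g.:
#   -e /tmp/deep-compare.yaml \
# and rerun the script.
grep -c 'deep_compare: true' /tmp/deep-compare.yaml   # prints 3
```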
You also need this package on your overcloud nodes:
puppet-pacemaker-0.7.2-0.20180423212257.el7ost.noarch.rpm