Description of problem:
Instance HA deployment on OSP13 via director fails

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. deploy OSP13 (3ctrl+2comp) via IR
2. delete the overcloud
3. re-deploy the overcloud with compute-instanceha.yaml, fencing.yaml & roles_data.yaml (for Controller and IHA role)

Actual results:
Failed with below error:-
~~~
Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.ComputeInstanceHADeployment_Step2.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3b9d347b-cf80-4dd6-8386-75abfeb065bf
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    " with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ssh/manifests/server.pp\", 12]:[\"/var/lib/tripleo-config/puppet_step_config.pp\", 41]",
    "Error: unable to get cib",
    "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-0-compute-instanceha-role]: Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20180703-18740-1uwe4us failed with code: 1 -> "
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/1b9b1b44-509a-45b0-b2f1-663a8f41bc7d_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=4 changed=1 unreachable=0 failed=1

    (truncated, view all with --long)
  deploy_stderr: |
~~~

Expected results:
Deployment should pass

Additional info:

1) Deployment Script
~~~
#!/bin/bash
nohup openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-r /home/stack/virt/roles_data.yaml \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/virt/fencing.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml \
--log-file overcloud_deployment_37.log &
~~~

2) openstack stack failures list overcloud --long >> failure.logs
http://pastebin.test.redhat.com/612917
~~~
[stack@undercloud-0 ~]$ tail -20 failure.logs
    " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:28:in `deprecation')",
    "Warning: This method is deprecated, please use the stdlib validate_legacy function,",
    " with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
    " with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ssh/manifests/server.pp\", 12]:[\"/var/lib/tripleo-config/puppet_step_config.pp\", 41]",
    "Error: unable to get cib",
    "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-0-compute-instanceha-role]: Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20180703-18740-1uwe4us failed with code: 1 -> "
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/1b9b1b44-509a-45b0-b2f1-663a8f41bc7d_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=4 changed=1 unreachable=0 failed=1

  deploy_stderr: |
[stack@undercloud-0 ~]$
~~~
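The two "Error:" entries in the saved failures output are easy to miss among the validate_legacy deprecation warnings. As a minimal sketch, assuming the output was saved to failure.logs as in the report, the helper below prints only the text that follows "Error: " up to the next double quote. The function name extract_errors is hypothetical and not part of any OpenStack tooling.

```shell
#!/bin/bash
# Print only the "Error: ..." fragments from a saved failures log.
# The default file name failure.logs matches the report; the function
# name extract_errors is a hypothetical helper chosen for illustration.
extract_errors() {
    local log_file="${1:-failure.logs}"
    # -o prints just the matching part of each line; the match stops at
    # the next double quote (or at end of line if there is none).
    grep -o 'Error: [^"]*' "$log_file"
}
```

Calling `extract_errors failure.logs` on the undercloud would then list the failing entries without the surrounding warnings. The function returns a non-zero status when no "Error: " text is present, because that is what grep does.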
*** Bug 1579469 has been marked as a duplicate of this bug. ***
Verified. Tested: https://url.corp.redhat.com/05c732c
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".

To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.
* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2574
Two people are hitting this issue again, so I am reopening the bug.
It seems that simply re-deploying over the failed environment, without changing anything, heals the stack.
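For anyone who wants to script that workaround, here is a minimal sketch of a wrapper that runs the same command again, unchanged, when it fails. The helper name retry_unchanged is hypothetical. It also assumes the deploy command runs in the foreground so that its exit status is visible, whereas the script in the original report uses nohup with a trailing &, which hides the deploy's exit status.

```shell
#!/bin/bash
# Run a command and, if it fails, run it again with the same arguments,
# up to max_attempts times in total.
# retry_unchanged is a hypothetical helper name, not an OpenStack tool.
retry_unchanged() {
    local max_attempts="$1"
    shift
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        echo "attempt ${attempt}/${max_attempts}: $*" >&2
        if "$@"; then
            return 0
        fi
    done
    # Every attempt failed.
    return 1
}
```

For example, `retry_unchanged 2 ./overcloud_deploy.sh` would run a foreground deploy script (the script name is an assumption) at most twice. This only papers over the failure and does not address the underlying cause.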
Let's track this race condition (likely introduced by another fix related to reconnect_interval) in https://bugzilla.redhat.com/show_bug.cgi?id=1624441 instead. This BZ has already sailed.