Description of problem:

Controller replacement failed after executing the overcloud deploy command with remove-controller.yaml.

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.ControllerServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: d57f3fcf-ef85-4b04-8839-4f4712af5d80
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [d57f3fcf-ef85-4b04-8839-4f4712af5d80] Stack "overcloud" [4199f63f-fa82-4814-89c4-e7f0a92298c5] Timed out)

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_replace.sh
#!/bin/bash
openstack overcloud deploy \
    --timeout 100 \
    --templates /usr/share/openstack-tripleo-heat-templates \
    --stack overcloud \
    --libvirt-type kvm \
    --ntp-server clock.redhat.com \
    -e /home/stack/virt/config_lvm.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/virt/network/network-environment.yaml \
    -e /home/stack/virt/enable-tls.yaml \
    -e /home/stack/virt/inject-trust-anchor.yaml \
    -e /home/stack/virt/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /home/stack/virt/hostnames.yml \
    -e /home/stack/virt/nodes_data.yaml \
    -e /home/stack/virt/extra_templates.yaml \
    -e /home/stack/virt/docker-images.yaml \
    -e /home/stack/remove-controller.yaml \
    --log-file overcloud_deployment_14.log

(undercloud) [stack@undercloud-0 ~]$ cat remove-controller.yaml
parameters:
  ControllerRemovalPolicies: [{'resource_list': ['1']}]
parameter_defaults:
  CorosyncSettleTries: 5

The procedure followed is step 11.4.3. Node Replacement from https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes

Version-Release number of selected component (if applicable):

OSP14 puddle - 2018-09-06.1
openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost.noarch
openstack-tripleo-validations-9.3.1-0.20180831205305.fbfd253.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180831210308.2dc678a.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20180903052248.40521ee.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch
puppet-tripleo-9.3.1-0.20180831202649.8ec6c86.el7ost.noarch
openstack-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
python-tripleoclient-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180831205939.0641fdc.el7ost.noarch
openstack-tripleo-common-containers-9.3.1-0.20180831204016.bb0582a.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Deploy an OSP14 overcloud with 3 controllers
2. Configure fencing
3. Corrupt a controller node (corrupt its disk)
4. Check that the overcloud is operable
5. Try to replace the controller using the official documentation

Actual results:
UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [d57f3fcf-ef85-4b04-8839-4f4712af5d80] Stack "overcloud" [4199f63f-fa82-4814-89c4-e7f0a92298c5] Timed out)

Expected results:
Replacement failed on Controller.deployment stage

Additional info:
sosreport: http://rhos-release.virt.bos.redhat.com/log/bz1632461
I also hit this issue when trying to configure fencing for the overcloud nodes:

2018-09-25 09:52:04Z [ControllerServiceChain]: UPDATE_FAILED UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [98d77a5e-5ac8-4526-be5e-295187ee647c] Stack "overcloud" [7ec2d8f7-d3d8-442e-b0b6-55159753f7d7] Timed out)
2018-09-25 09:52:04Z [overcloud-ControllerServiceChain-fydg2oycsm3d]: UPDATE_FAILED Stack UPDATE cancelled
2018-09-25 09:52:04Z [overcloud]: UPDATE_FAILED Timed out
2018-09-25 09:52:04Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t]: UPDATE_FAILED Stack UPDATE cancelled
2018-09-25 09:52:06Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t.1]: UPDATE_FAILED resources[1]: Stack UPDATE cancelled
2018-09-25 09:52:06Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t]: UPDATE_FAILED Resource UPDATE failed: resources[1]: Stack UPDATE cancelled

Stack overcloud/7ec2d8f7-d3d8-442e-b0b6-55159753f7d7 UPDATE_FAILED

overcloud.ControllerServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: 98d77a5e-5ac8-4526-be5e-295187ee647c
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [98d77a5e-5ac8-4526-be5e-295187ee647c] Stack "overcloud" [7ec2d8f7-d3d8-442e-b0b6-55159753f7d7] Timed out)
Heat Stack update failed.
Heat Stack update failed.
(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
    --timeout 100 \
    --templates /usr/share/openstack-tripleo-heat-templates \
    --stack overcloud \
    --libvirt-type kvm \
    --ntp-server clock.redhat.com \
    -e /home/stack/virt/config_lvm.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/virt/network/network-environment.yaml \
    -e /home/stack/virt/enable-tls.yaml \
    -e /home/stack/virt/inject-trust-anchor.yaml \
    -e /home/stack/virt/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /home/stack/virt/hostnames.yml \
    -e /home/stack/virt/nodes_data.yaml \
    -e /home/stack/virt/extra_templates.yaml \
    -e /home/stack/virt/docker-images.yaml \
    -e /home/stack/fencing.yaml \
    --log-file overcloud_deployment_14.log

(undercloud) [stack@undercloud-0 ~]$ cat fencing.yaml
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 52:54:00:f1:1b:9c
      params:
        ipaddr: 172.16.0.1
        ipport: '6234'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: compute-0
        privlvl: administrator
    - agent: fence_ipmilan
      host_mac: 52:54:00:4d:07:a9
      params:
        ipaddr: 172.16.0.1
        ipport: '6233'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: controller-2
        privlvl: administrator
    - agent: fence_ipmilan
      host_mac: 52:54:00:c9:c5:6b
      params:
        ipaddr: 172.16.0.1
        ipport: '6232'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: controller-1
        privlvl: administrator
    - agent: fence_ipmilan
      host_mac: 52:54:00:3f:f2:81
      params:
        ipaddr: 172.16.0.1
        ipport: '6230'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: controller-0
        privlvl: administrator
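All four fencing devices in fencing.yaml share the same ipaddr and differ only by ipport, so a copy-paste slip in a port, MAC address, or host name would be easy to miss. The following is a minimal sketch of a sanity check for that, not a diagnosis of the Heat timeout. DEVICES is a hand-copied, abbreviated version of the devices list, and find_duplicates is a hypothetical helper; neither is part of TripleO.

```python
from collections import Counter

# Abbreviated, hand-copied subset of FencingConfig.devices from fencing.yaml.
# Only the fields inspected by the check are kept (an assumption for brevity).
DEVICES = [
    {"host_mac": "52:54:00:f1:1b:9c",
     "params": {"ipaddr": "172.16.0.1", "ipport": "6234", "pcmk_host_list": "compute-0"}},
    {"host_mac": "52:54:00:4d:07:a9",
     "params": {"ipaddr": "172.16.0.1", "ipport": "6233", "pcmk_host_list": "controller-2"}},
    {"host_mac": "52:54:00:c9:c5:6b",
     "params": {"ipaddr": "172.16.0.1", "ipport": "6232", "pcmk_host_list": "controller-1"}},
    {"host_mac": "52:54:00:3f:f2:81",
     "params": {"ipaddr": "172.16.0.1", "ipport": "6230", "pcmk_host_list": "controller-0"}},
]


def _repeated(values):
    """Return the sorted list of values that occur more than once."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


def find_duplicates(devices):
    """Hypothetical helper: report fencing endpoints, hosts, or MACs used twice."""
    return {
        "endpoint": _repeated((d["params"]["ipaddr"], d["params"]["ipport"]) for d in devices),
        "host": _repeated(d["params"]["pcmk_host_list"] for d in devices),
        "mac": _repeated(d["host_mac"] for d in devices),
    }


if __name__ == "__main__":
    for check, duplicates in find_duplicates(DEVICES).items():
        print(check, "duplicates:", duplicates or "none")
```

An empty list for every key means each node has its own IPMI endpoint, host name, and MAC address in the file; it says nothing about whether those endpoints are actually reachable.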
Please provide the Heat logs from the undercloud.
I'm thinking this is probably a symptom of the same cause as https://bugzilla.redhat.com/show_bug.cgi?id=1629062
(In reply to James Slagle from comment #6)
> please provide Heat logs from the undercloud

sosreport: http://rhos-release.virt.bos.redhat.com/log/bz1632461
Based on the error and the data we have, I'm marking this one as a duplicate of bug 1629062. If you feel that is incorrect and you can still reproduce the issue after increasing undercloud resources, please reopen it with that additional data.

*** This bug has been marked as a duplicate of bug 1629062 ***