Bug 1632461 - [OSP14] Overcloud stack update failed: "UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" timed out)"
Summary: [OSP14] Overcloud stack update failed : "UPDATE aborted (Task update from Tem...
Keywords:
Status: CLOSED DUPLICATE of bug 1629062
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: James Slagle
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-24 20:37 UTC by Artem Hrechanychenko
Modified: 2020-01-08 18:03 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-16 21:22:12 UTC
Target Upstream Version:
Embargoed:


Description Artem Hrechanychenko 2018-09-24 20:37:39 UTC
Description of problem:

(undercloud) [stack@undercloud-0 ~]$ openstack stack  failures list overcloud
overcloud.ControllerServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: d57f3fcf-ef85-4b04-8839-4f4712af5d80
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [d57f3fcf-ef85-4b04-8839-4f4712af5d80] Stack "overcloud" [4199f63f-fa82-4814-89c4-e7f0a92298c5] Timed out)

Controller replacement failed after running the overcloud deploy command with remove-controller.yaml:

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_replace.sh 
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/extra_templates.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/remove-controller.yaml \
--log-file overcloud_deployment_14.log
(undercloud) [stack@undercloud-0 ~]$ cat remove-controller.yaml 
parameters:
  ControllerRemovalPolicies:
    [{'resource_list': ['1']}]
parameter_defaults:
  CorosyncSettleTries: 5


I followed step 11.4.3, Node Replacement, from:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes 

Version-Release number of selected component (if applicable):
OSP14 puddle - 2018-09-06.1

openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost.noarch
openstack-tripleo-validations-9.3.1-0.20180831205305.fbfd253.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180831210308.2dc678a.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20180903052248.40521ee.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch
puppet-tripleo-9.3.1-0.20180831202649.8ec6c86.el7ost.noarch
openstack-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
python-tripleoclient-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180831205939.0641fdc.el7ost.noarch
openstack-tripleo-common-containers-9.3.1-0.20180831204016.bb0582a.el7ost.noarch


How reproducible:


Steps to Reproduce:
1. Deploy an OSP14 overcloud with 3 controllers
2. Configure fencing
3. Corrupt a controller node (corrupt its disk)
4. Check that the overcloud is operable
5. Try to replace the controller using the official documentation

Actual results:
 UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [d57f3fcf-ef85-4b04-8839-4f4712af5d80] Stack "overcloud" [4199f63f-fa82-4814-89c4-e7f0a92298c5] Timed out)


Expected results:
Replacement fails at the Controller.deployment stage

Additional info:

Comment 1 Artem Hrechanychenko 2018-09-24 20:57:47 UTC
sosreport:

http://rhos-release.virt.bos.redhat.com/log/bz1632461

Comment 2 Artem Hrechanychenko 2018-09-25 10:44:07 UTC
I also hit this issue when I tried to configure fencing for the overcloud nodes:

2018-09-25 09:52:04Z [ControllerServiceChain]: UPDATE_FAILED  UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [98d77a5e-5ac8-4526-be5e-295187ee647c] Stack "overcloud" [7ec2d8f7-d3d8-442e-b0b6-55159753f7d7] Timed out)
2018-09-25 09:52:04Z [overcloud-ControllerServiceChain-fydg2oycsm3d]: UPDATE_FAILED  Stack UPDATE cancelled
2018-09-25 09:52:04Z [overcloud]: UPDATE_FAILED  Timed out
2018-09-25 09:52:04Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t]: UPDATE_FAILED  Stack UPDATE cancelled
2018-09-25 09:52:06Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t.1]: UPDATE_FAILED  resources[1]: Stack UPDATE cancelled
2018-09-25 09:52:06Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t]: UPDATE_FAILED  Resource UPDATE failed: resources[1]: Stack UPDATE cancelled

 Stack overcloud/7ec2d8f7-d3d8-442e-b0b6-55159753f7d7 UPDATE_FAILED 

overcloud.ControllerServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: 98d77a5e-5ac8-4526-be5e-295187ee647c
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [98d77a5e-5ac8-4526-be5e-295187ee647c] Stack "overcloud" [7ec2d8f7-d3d8-442e-b0b6-55159753f7d7] Timed out)
Heat Stack update failed.
Heat Stack update failed.
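
For triage, the event lines above can be split into timestamp, resource, status, and reason, which makes it easier to see which resources report the timeout itself and which were only cancelled as a consequence. The following is a minimal sketch, assuming the `TIMESTAMP [resource]: STATUS  reason` layout shown in this comment; the names `HeatEvent`, `parse_event`, `failed_events`, and `timed_out` are hypothetical helpers, not part of any OpenStack tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class HeatEvent(NamedTuple):
    timestamp: str
    resource: str
    status: str
    reason: str


# Assumed layout, inferred from the log excerpt in this comment:
#   2018-09-25 09:52:04Z [resource-name]: STATUS  free-text reason
EVENT_RE = re.compile(
    r"^\s*(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z) "
    r"\[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)


def parse_event(line: str) -> Optional[HeatEvent]:
    """Return a HeatEvent for a matching line, or None for anything else."""
    match = EVENT_RE.match(line)
    return HeatEvent(**match.groupdict()) if match else None


def failed_events(lines: Iterable[str]) -> List[HeatEvent]:
    """Keep only events whose status ends in _FAILED."""
    events = (parse_event(line) for line in lines)
    return [e for e in events if e is not None and e.status.endswith("_FAILED")]


def timed_out(events: Iterable[HeatEvent]) -> List[HeatEvent]:
    """Events whose reason mentions the timeout rather than a cancellation."""
    return [e for e in events if "Timed out" in e.reason]


# Four lines copied from the output above.
SAMPLE = """\
2018-09-25 09:52:04Z [ControllerServiceChain]: UPDATE_FAILED  UPDATE aborted (Task update from TemplateResource "ControllerServiceChain" [98d77a5e-5ac8-4526-be5e-295187ee647c] Stack "overcloud" [7ec2d8f7-d3d8-442e-b0b6-55159753f7d7] Timed out)
2018-09-25 09:52:04Z [overcloud-ControllerServiceChain-fydg2oycsm3d]: UPDATE_FAILED  Stack UPDATE cancelled
2018-09-25 09:52:04Z [overcloud]: UPDATE_FAILED  Timed out
2018-09-25 09:52:04Z [overcloud-ControllerServiceChain-fydg2oycsm3d-ServiceChain-6efl5grn3t7t]: UPDATE_FAILED  Stack UPDATE cancelled
"""

if __name__ == "__main__":
    for event in timed_out(failed_events(SAMPLE.splitlines())):
        print(event.timestamp, event.resource)
```

On this excerpt, only `ControllerServiceChain` and the top-level `overcloud` stack carry a "Timed out" reason; the nested stacks report a cancellation.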

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh 
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/extra_templates.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/fencing.yaml \
--log-file overcloud_deployment_14.log


(undercloud) [stack@undercloud-0 ~]$ cat fencing.yaml 
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 52:54:00:f1:1b:9c
      params:
        ipaddr: 172.16.0.1
        ipport: '6234'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: compute-0
        privlvl: administrator
    - agent: fence_ipmilan
      host_mac: 52:54:00:4d:07:a9
      params:
        ipaddr: 172.16.0.1
        ipport: '6233'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: controller-2
        privlvl: administrator
    - agent: fence_ipmilan
      host_mac: 52:54:00:c9:c5:6b
      params:
        ipaddr: 172.16.0.1
        ipport: '6232'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: controller-1
        privlvl: administrator
    - agent: fence_ipmilan
      host_mac: 52:54:00:3f:f2:81
      params:
        ipaddr: 172.16.0.1
        ipport: '6230'
        lanplus: true
        login: admin
        passwd: password
        pcmk_host_list: controller-0
        privlvl: administrator
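
To rule out a simple copy-paste mistake in a file like the one above, the device list can be checked for missing keys and for duplicated MAC addresses, IPMI endpoints, or host names. This is a minimal sketch under stated assumptions: the devices are transcribed into Python by hand instead of being loaded from YAML, `lanplus` and `privlvl` are omitted because the check does not use them, and `REQUIRED_PARAMS`, `device`, and `check_fencing_devices` are hypothetical names, not part of TripleO. A clean result says nothing about whether the fence agents are reachable.

```python
from collections import Counter

# Assumption: keys this sketch treats as mandatory. Not taken from the TripleO schema.
REQUIRED_PARAMS = ("ipaddr", "ipport", "login", "passwd", "pcmk_host_list")


def device(mac, port, host):
    """Build one fence_ipmilan entry shaped like those in fencing.yaml."""
    return {
        "agent": "fence_ipmilan",
        "host_mac": mac,
        "params": {
            "ipaddr": "172.16.0.1",
            "ipport": port,
            "login": "admin",
            "passwd": "password",
            "pcmk_host_list": host,
        },
    }


# Values transcribed from the fencing.yaml shown above.
DEVICES = [
    device("52:54:00:f1:1b:9c", "6234", "compute-0"),
    device("52:54:00:4d:07:a9", "6233", "controller-2"),
    device("52:54:00:c9:c5:6b", "6232", "controller-1"),
    device("52:54:00:3f:f2:81", "6230", "controller-0"),
]


def check_fencing_devices(devices):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    for index, dev in enumerate(devices):
        params = dev.get("params", {})
        for key in REQUIRED_PARAMS:
            if key not in params:
                problems.append(f"device {index}: missing params.{key}")

    def report_duplicates(values, label):
        for value, count in Counter(values).items():
            if count > 1:
                problems.append(f"duplicate {label}: {value} ({count} devices)")

    report_duplicates((d.get("host_mac") for d in devices), "host_mac")
    report_duplicates(
        ((d.get("params", {}).get("ipaddr"), d.get("params", {}).get("ipport"))
         for d in devices),
        "ipaddr/ipport",
    )
    report_duplicates(
        (d.get("params", {}).get("pcmk_host_list") for d in devices),
        "pcmk_host_list",
    )
    return problems


if __name__ == "__main__":
    print(check_fencing_devices(DEVICES))  # prints []
```

The four devices listed in this comment pass these checks, so the sketch finds no such mistake in the file as posted.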

Comment 6 James Slagle 2018-09-27 18:21:30 UTC
please provide Heat logs from the undercloud

Comment 7 James Slagle 2018-09-27 20:52:46 UTC
I'm thinking this is probably a symptom of the same cause as https://bugzilla.redhat.com/show_bug.cgi?id=1629062

Comment 8 Artem Hrechanychenko 2018-09-28 09:31:02 UTC
(In reply to James Slagle from comment #6)
> please provide Heat logs from the undercloud

sosreport:
http://rhos-release.virt.bos.redhat.com/log/bz1632461

Comment 10 James Slagle 2018-10-16 21:22:12 UTC
Based on the error and the data we have, I'm marking this one as a duplicate of bug 1629062.

If you feel that is incorrect, and you are still able to reproduce the issue after increasing undercloud resources, please reopen it with that additional data.

*** This bug has been marked as a duplicate of bug 1629062 ***

