Description of problem:

While troubleshooting Bug 1857298, we identified issues with the scale down playbook, including that the common playbook is not properly skipped if all the nodes are unavailable. Additionally, the common playbook assumes that all targeted nodes will be available, which may not be the case during scale down. Additionally, the dynamic any_errors_fatal setting does not appear to be honored.

Version-Release number of selected component (if applicable):

python3-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200611115252.08f469d.el8ost.noarch
ansible-tripleo-ipa-0.2.1-0.20200611104546.c22fc8d.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
puppet-tripleo-11.5.0-0.20200616033427.8ff1c6a.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200527003426.226ce95.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.1-0.20200527233426.bc21900.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost.noarch
tripleo-ansible-0.5.1-0.20200611113655.34b8fcc.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch

How reproducible:

Reproducible when all nodes being scaled down are unavailable.

Steps to Reproduce:
1. deploy overcloud
2. turn off compute node
3. attempt to scale down compute node

Actual results:

Failure during scale down action execution

Expected results:

Down nodes should be ignored.

Additional info:
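For illustration, a minimal sketch (not the shipped tripleo-ansible playbook) of how tasks in a scale-down style play can tolerate nodes that are already powered off. The group name "Compute", the systemd unit name, and the task name are assumptions used only for this example:

- hosts: Compute
  gather_facts: false
  # A play-level any_errors_fatal: true would abort the whole run on the
  # first failure; for scale down, nodes that are down should instead be
  # tolerated so the remaining cleanup can proceed.
  any_errors_fatal: false
  tasks:
    - name: Stop nova-compute container
      become: true
      systemd:
        name: tripleo_nova_compute   # assumed unit name for illustration
        state: stopped
      # Ansible >= 2.7: record the host as unreachable but keep the play
      # going, which produces the "skip_reason: Host ... is unreachable"
      # results seen when a node is down.
      ignore_unreachable: true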
*** Bug 1857004 has been marked as a duplicate of this bug. ***
Removing Blocker flag; this has already been approved for 16.1.1.
Had a deployment with two compute nodes. Shut down each node. Deletion of both nodes was successful:

TASK [Stop nova-compute healthcheck container] *********************************
Thursday 30 July 2020  12:07:41 -0400 (0:00:04.201)       0:02:50.683 *********
fatal: [compute-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.30\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.30 port 22: No route to host\r\n", "skip_reason": "Host compute-1 is unreachable", "unreachable": true}
fatal: [compute-2]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.54\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.54 port 22: No route to host\r\n", "skip_reason": "Host compute-2 is unreachable", "unreachable": true}

TASK [Stop nova-compute container] *********************************************
Thursday 30 July 2020  12:10:01 -0400 (0:02:20.489)       0:05:11.173 *********
fatal: [compute-2]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.54\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.54 port 22: No route to host\r\n", "skip_reason": "Host compute-2 is unreachable", "unreachable": true}
fatal: [compute-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.30\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.30 port 22: No route to host\r\n", "skip_reason": "Host compute-1 is unreachable", "unreachable": true}

TASK [Delete nova-compute service] *********************************************
Thursday 30 July 2020  12:12:21 -0400 (0:02:19.815)       0:07:30.989 *********
changed: [compute-2]
changed: [compute-1]

TASK [fail] ********************************************************************
Thursday 30 July 2020  12:12:26 -0400 (0:00:05.145)       0:07:36.134 *********
skipping: [compute-1]
skipping: [compute-2]

PLAY RECAP *********************************************************************
compute-1                  : ok=9    changed=2    unreachable=3    failed=0    skipped=5    rescued=0    ignored=0
compute-2                  : ok=8    changed=2    unreachable=3    failed=0    skipped=5    rescued=0    ignored=0

Thursday 30 July 2020  12:12:26 -0400 (0:00:00.110)       0:07:36.245 *********
===============================================================================

Ansible passed.

Prior to the fix, the same test showed:

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
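Note that "Delete nova-compute service" still reports "changed" for hosts that are unreachable over SSH. A sketch of one way such a task can succeed for down nodes (an assumption for illustration, not taken from the shipped playbook): the nova API call is delegated to a reachable host, so SSH access to the compute node itself is not needed. The variable nova_compute_service_id is hypothetical, and OpenStack CLI credentials are assumed to be available on the delegate host:

    - name: Delete nova-compute service
      command: "openstack compute service delete {{ nova_compute_service_id }}"
      # Run the API call locally (e.g. on the undercloud) instead of on the
      # powered-off compute node.
      delegate_to: localhost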
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3542
*** Bug 1856922 has been marked as a duplicate of this bug. ***