Bug 1857365 - scale down of nodes is failing if all nodes are unreachable
Summary: scale down of nodes is failing if all nodes are unreachable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Emilien Macchi
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 1856922 1857004
Depends On:
Blocks:
 
Reported: 2020-07-15 17:36 UTC by Alex Schultz
Modified: 2024-03-25 16:17 UTC
CC List: 6 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200616081533.396affd
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-27 15:19:10 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github ansible ansible issues 70663 0 None closed any_errors_fatal cannot be configured dynamically 2021-02-12 14:00:31 UTC
Launchpad 1887702 0 None None None 2020-07-15 17:48:14 UTC
OpenStack gerrit 741298 0 None MERGED deploy-steps-playbooks-common: fix logic for scale_ignore_unreachable 2021-02-12 14:00:31 UTC
Red Hat Issue Tracker OSP-31723 0 None None None 2024-03-25 16:17:32 UTC
Red Hat Product Errata RHBA-2020:3542 0 None None None 2020-08-27 15:19:28 UTC

Description Alex Schultz 2020-07-15 17:36:39 UTC
Description of problem:
While troubleshooting Bug 1857298, we identified issues with the scale down playbook, including the common playbook not being properly skipped when all of the nodes are unavailable. The common playbook also expects that all targeted nodes will be available, which may not be the case on scale down. Additionally, the dynamic any_errors_fatal setting does not appear to be honored.
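
For illustration, a minimal sketch of the two mechanisms in play (assumptions: the host pattern and task body are placeholders, not the actual deploy-steps playbook; scale_ignore_unreachable is the variable named in the upstream fix):

- hosts: Compute
  gather_facts: false
  # Broken path (ansible/ansible#70663): any_errors_fatal is resolved when
  # the play is compiled, so templating it from a runtime variable like this
  # does not reliably take effect.
  any_errors_fatal: "{{ not (scale_ignore_unreachable | default(false) | bool) }}"
  tasks:
    - name: Stop nova-compute container   # placeholder task
      command: podman stop nova_compute
      # Working path: ignore_unreachable is honored per task at execution
      # time, so a down host is recorded and skipped ("skip_reason": "Host
      # ... is unreachable", as in comment 6) instead of aborting the play.
      ignore_unreachable: "{{ scale_ignore_unreachable | default(false) | bool }}"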

Version-Release number of selected component (if applicable):
python3-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200611115252.08f469d.el8ost.noarch
ansible-tripleo-ipa-0.2.1-0.20200611104546.c22fc8d.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
puppet-tripleo-11.5.0-0.20200616033427.8ff1c6a.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200527003426.226ce95.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.1-0.20200527233426.bc21900.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost.noarch
tripleo-ansible-0.5.1-0.20200611113655.34b8fcc.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch

How reproducible:
Reproducible when all nodes being scaled down are unavailable.

Steps to Reproduce:
1. deploy overcloud
2. turn off compute node
3. attempt to scale down the compute node (see the command sketch below)
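
A rough sketch of these steps from the undercloud (assumed names: the compute-1 host from comment 6; the power-off method will vary by environment):

source ~/stackrc
# 2. Make the node unreachable, e.g. power it off out of band:
openstack baremetal node power off compute-1
# 3. Attempt the scale down; before the fix this failed once the host was unreachable:
openstack overcloud node delete --stack overcloud compute-1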

Actual results:
Failure during scale down action execution

Expected results:
Down nodes should be ignored and the scale down should succeed.


Additional info:

Comment 1 Alex Schultz 2020-07-17 14:00:16 UTC
*** Bug 1857004 has been marked as a duplicate of this bug. ***

Comment 3 spower 2020-07-22 09:41:54 UTC
Removing Blocker flag; this has already been approved for 16.1.1.

Comment 6 David Rosenfeld 2020-07-30 16:47:26 UTC
Had a deployment with two compute nodes. Shut down each node. Deletion of both nodes was successful:

TASK [Stop nova-compute healthcheck container] *********************************
Thursday 30 July 2020  12:07:41 -0400 (0:00:04.201)       0:02:50.683 ********* 
fatal: [compute-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.30\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.30 port 22: No route to host\r\n", "skip_reason": "Host compute-1 is unreachable", "unreachable": true}

fatal: [compute-2]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.54\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.54 port 22: No route to host\r\n", "skip_reason": "Host compute-2 is unreachable", "unreachable": true}

TASK [Stop nova-compute container] *********************************************
Thursday 30 July 2020  12:10:01 -0400 (0:02:20.489)       0:05:11.173 ********* 
fatal: [compute-2]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.54\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.54 port 22: No route to host\r\n", "skip_reason": "Host compute-2 is unreachable", "unreachable": true}

fatal: [compute-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.30\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.30 port 22: No route to host\r\n", "skip_reason": "Host compute-1 is unreachable", "unreachable": true}

TASK [Delete nova-compute service] *********************************************
Thursday 30 July 2020  12:12:21 -0400 (0:02:19.815)       0:07:30.989 ********* 
changed: [compute-2]
changed: [compute-1]

TASK [fail] ********************************************************************
Thursday 30 July 2020  12:12:26 -0400 (0:00:05.145)       0:07:36.134 ********* 
skipping: [compute-1]
skipping: [compute-2]

PLAY RECAP *********************************************************************
compute-1                  : ok=9    changed=2    unreachable=3    failed=0    skipped=5    rescued=0    ignored=0   
compute-2                  : ok=8    changed=2    unreachable=3    failed=0    skipped=5    rescued=0    ignored=0   

Thursday 30 July 2020  12:12:26 -0400 (0:00:00.110)       0:07:36.245 ********* 
=============================================================================== 

Ansible passed. (The recap shows unreachable=3 but failed=0 on both hosts, so the play completed despite the nodes being down.)


Prior to the fix, the same test showed: Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.

Comment 9 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542

Comment 10 Alex Schultz 2020-09-09 13:22:56 UTC
*** Bug 1856922 has been marked as a duplicate of this bug. ***

