Bug 1857298 - Usage of DeploymentServerBlacklist causes all nova-computes to be removed from an overcloud during node removal
Summary: Usage of DeploymentServerBlacklist causes all nova-computes to be removed fro...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Emilien Macchi
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-15 15:42 UTC by Alex Schultz
Modified: 2024-10-01 16:42 UTC (History)

Fixed In Version: openstack-tripleo-common-11.3.3-0.20200611110657.f7715be.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-27 15:19:10 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 741293 0 None MERGED ansible: limit_hosts now takes precedence over blacklisted_hostnames 2020-12-16 12:48:34 UTC
Red Hat Issue Tracker OSP-5711 0 None None None 2022-08-11 16:02:32 UTC
Red Hat Product Errata RHBA-2020:3542 0 None None None 2020-08-27 15:19:28 UTC

Description Alex Schultz 2020-07-15 15:42:32 UTC
Description of problem:
If DeploymentServerBlacklist is defined in a stack, then when we construct the ansible-playbook execution for the scale-down actions we end up providing two --limit options to ansible-playbook. As a result, the scale-down tasks run against all nodes not in the DeploymentServerBlacklist instead of only the nodes targeted for removal.

42430     354597  0.0  1.8 5097428 4778888 ?     R    15:15   0:00 /usr/libexec/platform-python /usr/bin/ansible-playbook-3 /var/lib/mistral/overcloud/scale_playbook.yaml --limit overcloud-fc640compute-32,overcloud-fc640compute-20,overcloud-fc640compute-118,overcloud-fc640compute-123,overcloud-fc640compute-165 --become --timeout 600 --inventory-file /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml --limit !overcloud-fc640compute-20:!overcloud-fc640compute-32:!overcloud-fc640compute-118:!overcloud-fc640compute-123:!overcloud-fc640compute-165
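A minimal sketch of the intended behaviour, based only on the title of the linked gerrit change ("limit_hosts now takes precedence over blacklisted_hostnames"). The function name build_limit_pattern and its list-based signature are hypothetical and are not taken from tripleo-common; the point is simply that a single limit pattern is produced, so ansible-playbook never receives two --limit options.

```python
def build_limit_pattern(limit_hosts=None, blacklisted_hostnames=None):
    """Return one host pattern for a single --limit option, or None.

    Hypothetical helper for illustration only; not the tripleo-common code.
    """
    if limit_hosts:
        # An explicit list of target hosts takes precedence, so the
        # blacklist is ignored and only the targeted nodes are matched.
        return ",".join(limit_hosts)
    if blacklisted_hostnames:
        # No explicit targets: exclude every blacklisted host instead.
        return ":".join("!%s" % host for host in blacklisted_hostnames)
    # Neither was supplied, so no --limit option is needed.
    return None


if __name__ == "__main__":
    # Scale down compute-0 while compute-1 is blacklisted.
    print(build_limit_pattern(["compute-0"], ["compute-1"]))
```

With both arguments supplied, only the targeted hosts appear in the pattern; the negated blacklist pattern is used only when no explicit targets are given.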

Version-Release number of selected component (if applicable):
python3-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200611115252.08f469d.el8ost.noarch
ansible-tripleo-ipa-0.2.1-0.20200611104546.c22fc8d.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
puppet-tripleo-11.5.0-0.20200616033427.8ff1c6a.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200527003426.226ce95.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.1-0.20200527233426.bc21900.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost.noarch
tripleo-ansible-0.5.1-0.20200611113655.34b8fcc.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch

How reproducible:
Always if DeploymentServerBlacklist is defined

Steps to Reproduce:
1. Deploy an overcloud with multiple compute nodes
2. Configure DeploymentServerBlacklist with one of the compute nodes
3. Scale down the selected compute nodes


Actual results:
nova-compute service removed from all computes not in the DeploymentServerBlacklist

Expected results:
nova-compute should only be removed from the targeted nodes for removal


Additional info:

Comment 7 David Rosenfeld 2020-07-31 17:59:07 UTC
Deployed 3 controller, 3 compute, and 1 ceph node.
Created blacklist.yaml for compute-1 and updated the stack.
Performed a scale down for compute-0.
The ansible command contained only one --limit option:

ansible-playbook-3 /var/lib/mistral/overcloud/scale_playbook.yaml --limit compute-0 --become --timeout 600 --inventory-file /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml "$@"

The nova-compute service on compute-2 is still listed:
 
openstack compute service list
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
| ID                                   | Binary         | Host                      | Zone     | Status  | State | Updated At                 |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
| 247c5330-b015-48cd-92c2-70d9e5247033 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2020-07-30T20:11:49.000000 |
| 8c5159a7-6527-4a5b-b682-830b027010b8 | nova-conductor | controller-1.redhat.local | internal | enabled | up    | 2020-07-30T20:11:50.000000 |
| 84453328-e2d3-46d0-8ed8-5836f21cdcd7 | nova-conductor | controller-2.redhat.local | internal | enabled | up    | 2020-07-30T20:11:51.000000 |
| 9c51b639-d49b-43dd-b16d-910fca473d2b | nova-scheduler | controller-1.redhat.local | internal | enabled | up    | 2020-07-30T20:11:54.000000 |
| 463b5060-270f-4d56-9a1f-a4fd88e5cb20 | nova-scheduler | controller-0.redhat.local | internal | enabled | up    | 2020-07-30T20:11:54.000000 |
| 487be8ce-8056-4bb0-b858-23266a0fdda6 | nova-scheduler | controller-2.redhat.local | internal | enabled | up    | 2020-07-30T20:11:55.000000 |
| ee584a44-244a-4f1d-b752-b473fa1a1faf | nova-compute   | compute-1.redhat.local    | nova     | enabled | up    | 2020-07-30T20:11:52.000000 |
| 2c2272a5-0bac-4257-9e7b-89912780a7cf | nova-compute   | compute-2.redhat.local    | nova     | enabled | up    | 2020-07-30T20:11:53.000000 |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+

Comment 10 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542

