Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1857298

Summary: Usage of DeploymentServerBlacklist causes all nova-computes to be removed from an overcloud during node removal
Product: Red Hat OpenStack
Component: openstack-tripleo-common
Version: 16.1 (Train)
Target Milestone: z1
Target Release: 16.1 (Train on RHEL 8.2)
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Hardware: Unspecified
OS: Unspecified
Reporter: Alex Schultz <aschultz>
Assignee: Emilien Macchi <emacchi>
QA Contact: David Rosenfeld <drosenfe>
CC: cjeanner, emacchi, lshort, mbarnett, mburns, slinaber, spower
Keywords: Triaged
Fixed In Version: openstack-tripleo-common-11.3.3-0.20200611110657.f7715be.el8ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2020-08-27 15:19:10 UTC

Description Alex Schultz 2020-07-15 15:42:32 UTC
Description of problem:
If DeploymentServerBlacklist is defined in a stack, then when we construct the ansible playbook execution for the scale down actions we end up passing two --limit options to the ansible-playbook invocation. As a result, the scale-down tasks are run against all nodes that are not in the DeploymentServerBlacklist instead of only the nodes targeted for removal.

42430     354597  0.0  1.8 5097428 4778888 ?     R    15:15   0:00 /usr/libexec/platform-python /usr/bin/ansible-playbook-3 /var/lib/mistral/overcloud/scale_playbook.yaml --limit overcloud-fc640compute-32,overcloud-fc640compute-20,overcloud-fc640compute-118,overcloud-fc640compute-123,overcloud-fc640compute-165 --become --timeout 600 --inventory-file /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml --limit !overcloud-fc640compute-20:!overcloud-fc640compute-32:!overcloud-fc640compute-118:!overcloud-fc640compute-123:!overcloud-fc640compute-165
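For context: ansible-playbook reads --limit as a single host pattern, and when the flag is passed twice the values are not combined (in practice the last one wins), which matches what is seen above, where only the exclusion pattern is applied. Both constraints can instead be carried in one colon-separated host pattern. The following merged invocation is purely illustrative and is not the change shipped in the fix; overcloud-fc640compute-99 stands in for a blacklisted host and is made up:

ansible-playbook-3 /var/lib/mistral/overcloud/scale_playbook.yaml \
  --limit 'overcloud-fc640compute-32:overcloud-fc640compute-20:!overcloud-fc640compute-99' \
  --become --timeout 600 \
  --inventory-file /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml

In Ansible host-pattern syntax this selects the union of the listed hosts minus any host prefixed with '!', so a single --limit can express both the scale-down targets and the DeploymentServerBlacklist exclusions.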

Version-Release number of selected component (if applicable):
python3-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200611115252.08f469d.el8ost.noarch
ansible-tripleo-ipa-0.2.1-0.20200611104546.c22fc8d.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
puppet-tripleo-11.5.0-0.20200616033427.8ff1c6a.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200527003426.226ce95.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.1-0.20200527233426.bc21900.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200611110655.f7715be.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost.noarch
tripleo-ansible-0.5.1-0.20200611113655.34b8fcc.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch

How reproducible:
Always if DeploymentServerBlacklist is defined

Steps to Reproduce:
1. Deploy an overcloud with multiple compute nodes
2. Configure DeploymentServerBlacklist with one of the compute nodes and update the stack
3. Scale down one or more of the compute nodes (see the example below)
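As a concrete illustration of steps 2 and 3 (the file name, node names, and deploy arguments below are examples only, not values taken from this report), the blacklist is an ordinary heat environment file passed on a stack update, followed by a node delete:

# blacklist.yaml
parameter_defaults:
  DeploymentServerBlacklist:
    - overcloud-novacompute-1

# re-run the deploy with the extra environment, then remove a different compute node
openstack overcloud deploy --templates <original deploy arguments> -e blacklist.yaml
openstack overcloud node delete --stack overcloud overcloud-novacompute-0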


Actual results:
The nova-compute service is removed from all compute nodes that are not in the DeploymentServerBlacklist.

Expected results:
The nova-compute service should be removed only from the nodes targeted for removal.


Additional info:

Comment 7 David Rosenfeld 2020-07-31 17:59:07 UTC
Deployed 3 controller, 3 compute, and 1 ceph node.
Created blacklist.yaml for compute-1 and updated the stack.
Performed a scale down of compute-0.
The ansible command contained only one --limit option:

ansible-playbook-3 /var/lib/mistral/overcloud/scale_playbook.yaml --limit compute-0 --become --timeout 600 --inventory-file /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml "$@"

The nova-compute services on compute-1 (the blacklisted node) and compute-2 are still present; only compute-0 was removed:
openstack compute service list
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
| ID                                   | Binary         | Host                      | Zone     | Status  | State | Updated At                 |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
| 247c5330-b015-48cd-92c2-70d9e5247033 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2020-07-30T20:11:49.000000 |
| 8c5159a7-6527-4a5b-b682-830b027010b8 | nova-conductor | controller-1.redhat.local | internal | enabled | up    | 2020-07-30T20:11:50.000000 |
| 84453328-e2d3-46d0-8ed8-5836f21cdcd7 | nova-conductor | controller-2.redhat.local | internal | enabled | up    | 2020-07-30T20:11:51.000000 |
| 9c51b639-d49b-43dd-b16d-910fca473d2b | nova-scheduler | controller-1.redhat.local | internal | enabled | up    | 2020-07-30T20:11:54.000000 |
| 463b5060-270f-4d56-9a1f-a4fd88e5cb20 | nova-scheduler | controller-0.redhat.local | internal | enabled | up    | 2020-07-30T20:11:54.000000 |
| 487be8ce-8056-4bb0-b858-23266a0fdda6 | nova-scheduler | controller-2.redhat.local | internal | enabled | up    | 2020-07-30T20:11:55.000000 |
| ee584a44-244a-4f1d-b752-b473fa1a1faf | nova-compute   | compute-1.redhat.local    | nova     | enabled | up    | 2020-07-30T20:11:52.000000 |
| 2c2272a5-0bac-4257-9e7b-89912780a7cf | nova-compute   | compute-2.redhat.local    | nova     | enabled | up    | 2020-07-30T20:11:53.000000 |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+
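For completeness, which hosts a given --limit will actually match can also be checked up front, without executing any tasks, using ansible-playbook's --list-hosts flag (the paths below simply reuse the ones from the verified command):

ansible-playbook-3 /var/lib/mistral/overcloud/scale_playbook.yaml \
  --limit compute-0 \
  --inventory-file /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml \
  --list-hosts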

Comment 10 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542