Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1968115

Summary: Unable to blacklist faulty compute node during rhosp13 to 16.1 ffu
Product: Red Hat OpenStack Reporter: rbsshasha <rbs.shashank>
Component: openshift-heat-templates Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: RHOS Maint <rhos-maint>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 16.1 (Train) CC: athomas, scollier, vchundur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: ---
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-30 12:30:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description rbsshasha 2021-06-05 13:18:16 UTC
Description of problem:
During the fast forward upgrade I am unable to blacklist a faulty compute node. I ran the upgrade prepare command again and included the blacklist.yaml file, but upgrade prepare still times out waiting for the faulty compute node's IP.

Version-Release number of selected component (if applicable):
RHEL Version:Red Hat Enterprise Linux release 8.2 (Ootpa)
RHOSP Version:16.1


Steps to Reproduce:
Below is the command I am using for upgrade prepare:
nohup openstack overcloud upgrade prepare \
  --templates /home/stack/openstack-tripleo-heat-templates-rendered_16 \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/network_data.yaml \
  -e /home/stack/containers-prepare-parameter.yaml \
  -e /home/stack/templates/upgrades-environment.yaml \
  -e /home/stack/templates/rhsm.yml \
  -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/node-info.yaml \
  -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-sriov.yaml \
  -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-ovs.yaml \
  -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/cinder-backup.yaml \
  -e /home/stack/templates/storage-config.yaml \
  -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/host-config-and-reboot.yaml \
  -e /home/stack/templates/blacklist.yaml \
  --libvirt-type kvm \
  --ntp-server pool.ntp.org -v -y &

Below is the error I am getting during upgrade prepare command execution:
2021-06-04 15:24:51.440 106715 WARNING tripleoclient.plugin [-] Waiting for messages on queue 'tripleo' with no timeout.
Overcloud Endpoint: http://192.168.21.127:5000
Overcloud Horizon Dashboard URL: http://192.168.21.127:80/dashboard
Overcloud rc file: /home/stack/templates/overcloudrc
Overcloud Deployed without error
Success
Enabling ssh admin (tripleo-admin) for hosts:
192.168.100.206 192.168.100.223 192.168.100.208 192.168.100.232 192.168.100.204 192.168.100.202 192.168.100.220 192.168.100.222
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
2021-06-04 15:26:57.296 106715 INFO tripleoclient.v1.overcloud_upgrade.UpgradePrepare [-] ssh-keygen has been run successfully
Warning: Permanently added '192.168.100.206' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.100.223' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.100.208' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.100.232' (ECDSA) to the list of known hosts.
Inserting TripleO short term key for 192.168.100.206
Inserting TripleO short term key for 192.168.100.223
Inserting TripleO short term key for 192.168.100.208
Inserting TripleO short term key for 192.168.100.232
Removing short term keys locally
2021-06-04 15:37:02.954 106715 ERROR openstack [-] Timed out waiting for port 22 from 192.168.100.204: tripleoclient.exceptions.DeploymentError: Timed out waiting for port 22 from 192.168.100.204
2021-06-04 15:37:02.984 106715 INFO osc_lib.shell [-] END return value: 1
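
The timeout above is on TCP port 22 of a single host. To confirm from the undercloud which of the listed hosts accept SSH connections before re-running the command, a quick probe along these lines may help. This is only a sketch using the Python standard library; the function names `port_open` and `unreachable` are illustrative and are not part of tripleoclient.

```python
import socket
import sys


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


def unreachable(hosts, port=22, timeout=3.0):
    """Return the subset of hosts that do not accept a connection on port."""
    return [host for host in hosts if not port_open(host, port, timeout)]


if __name__ == "__main__":
    # Hosts are passed on the command line, e.g. the IPs from the
    # "Enabling ssh admin" line of the log.
    for host in unreachable(sys.argv[1:]):
        print(f"port 22 not reachable on {host}")
```

Saved under a hypothetical name such as check_ssh.py, it could be run as `python3 check_ssh.py 192.168.100.206 192.168.100.204`. A host that is powered off, like the SHUTOFF node in this report, would be expected to show up in the output.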

[stack@manager ~]$ . stackrc
(undercloud) [stack@manager ~]$ openstack server list
+--------------------------------------+--------------------------+---------+--------------------------+----------------+--------------+
| ID                                   | Name                     | Status  | Networks                 | Image          | Flavor       |
+--------------------------------------+--------------------------+---------+--------------------------+----------------+--------------+
| e5f268b2-3f75-4a19-81f6-edd04726c7e9 | overcloud-computesriov-0 | ACTIVE  | ctlplane=192.168.100.232 | overcloud-full | compute      |
| eea9bd9f-3598-41db-8acf-7c207ce5b467 | overcloud-cephstorage-1  | ACTIVE  | ctlplane=192.168.100.220 | overcloud-full | ceph-storage |
| 00e5f7a9-3ac3-4fa4-9fbf-4e88d5bd6a8a | overcloud-cephstorage-2  | ACTIVE  | ctlplane=192.168.100.222 | overcloud-full | ceph-storage |
| 0563456d-7ad7-471d-bf26-c43417030793 | overcloud-controller-2   | ACTIVE  | ctlplane=192.168.100.208 | overcloud-full | control      |
| 860a23a5-842c-434f-a1ab-db3a0b8a493e | overcloud-controller-0   | ACTIVE  | ctlplane=192.168.100.206 | overcloud-full | control      |
| de5441d0-1807-4779-957c-3b93f8ce0b19 | overcloud-controller-1   | ACTIVE  | ctlplane=192.168.100.223 | overcloud-full | control      |
| e5818eb5-9ed1-4b19-8959-5fecbca371f3 | overcloud-computesriov-1 | SHUTOFF | ctlplane=192.168.100.204 | overcloud-full | compute      | <<-- faulty compute node
| f2aa4afd-e5f6-43f9-97d8-43c8967e1773 | overcloud-cephstorage-0  | ACTIVE  | ctlplane=192.168.100.202 | overcloud-full | ceph-storage |
+--------------------------------------+--------------------------+---------+--------------------------+----------------+--------------+

(undercloud) [stack@manager templates]$ cat blacklist.yaml
parameter_defaults:
  DeploymentServerBlacklist:
    - overcloud-computesriov-1
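
To make the mismatch explicit, the blacklist can be cross-checked against the server list and the hosts named in the "Enabling ssh admin" log line. The sketch below is illustrative only: the helper names are hypothetical, and the sample data is copied from this report (two rows of the table, the ssh admin host list, and the blacklist entry). It does not say whether the ssh admin step is expected to honor DeploymentServerBlacklist; it only shows that the blacklisted node's IP appears in that host list.

```python
SERVER_TABLE = """\
| ID                                   | Name                     | Status  | Networks                 | Image          | Flavor       |
| e5f268b2-3f75-4a19-81f6-edd04726c7e9 | overcloud-computesriov-0 | ACTIVE  | ctlplane=192.168.100.232 | overcloud-full | compute      |
| e5818eb5-9ed1-4b19-8959-5fecbca371f3 | overcloud-computesriov-1 | SHUTOFF | ctlplane=192.168.100.204 | overcloud-full | compute      |
"""

# Hosts from the "Enabling ssh admin (tripleo-admin) for hosts:" log line.
SSH_ADMIN_HOSTS = (
    "192.168.100.206 192.168.100.223 192.168.100.208 192.168.100.232 "
    "192.168.100.204 192.168.100.202 192.168.100.220 192.168.100.222"
).split()

# Entries under DeploymentServerBlacklist in blacklist.yaml.
BLACKLIST = ["overcloud-computesriov-1"]


def parse_server_table(text):
    """Map server name -> (status, ctlplane IP) from `openstack server list` output."""
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blanks
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if len(cells) < 4 or cells[0] == "ID":
            continue  # skip the header row
        name, status, networks = cells[1], cells[2], cells[3]
        ip = networks.split("=", 1)[1] if "=" in networks else networks
        servers[name] = (status, ip)
    return servers


def blacklisted_but_targeted(servers, blacklist, ssh_hosts):
    """Return IPs of blacklisted servers that still appear in ssh_hosts."""
    targeted = set(ssh_hosts)
    return sorted(
        servers[name][1]
        for name in blacklist
        if name in servers and servers[name][1] in targeted
    )


servers = parse_server_table(SERVER_TABLE)
still_targeted = blacklisted_but_targeted(servers, BLACKLIST, SSH_ADMIN_HOSTS)
print(still_targeted)
```

With the data from this report, the result contains 192.168.100.204, the same address named in the port 22 timeout.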

Actual results:
The upgrade prepare command fails with exit code 1 because the blacklisted compute node's IP is not reachable on port 22.

Expected results:
After adding the blacklist.yaml file, upgrade prepare should complete successfully.