Bug 1968115 - Unable to blacklist faulty compute node during rhosp13 to 16.1 ffu
Summary: Unable to blacklist faulty compute node during rhosp13 to 16.1 ffu
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: RHOS Maint
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-05 13:18 UTC by rbsshasha
Modified: 2022-08-17 13:49 UTC (History)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-30 12:30:21 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-4397 0 None None None 2022-08-17 13:49:00 UTC

Description rbsshasha 2021-06-05 13:18:16 UTC
Description of problem: During a fast forward upgrade (FFU) I am unable to blacklist a faulty compute node. I ran the upgrade prepare command again with a blacklist.yaml file included, but upgrade prepare still times out waiting on the faulty compute node's IP.

Version-Release number of selected component (if applicable):
RHEL Version: Red Hat Enterprise Linux release 8.2 (Ootpa)
RHOSP Version: 16.1


Steps to Reproduce:
Below is the command I am using for upgrade prepare:
nohup openstack overcloud upgrade prepare \
--templates /home/stack/openstack-tripleo-heat-templates-rendered_16 \
-r /home/stack/templates/roles_data.yaml \
-n /home/stack/templates/network_data.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/templates/upgrades-environment.yaml \
-e /home/stack/templates/rhsm.yml \
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/node-info.yaml \
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-sriov.yaml \
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-ovs.yaml \
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/cinder-backup.yaml \
-e /home/stack/templates/storage-config.yaml \
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/host-config-and-reboot.yaml \
-e /home/stack/templates/blacklist.yaml \
--libvirt-type kvm \
--ntp-server pool.ntp.org -v -y &

Below is the error I am getting during upgrade prepare command execution:
2021-06-04 15:24:51.440 106715 WARNING tripleoclient.plugin [-] Waiting for messages on queue 'tripleo' with no timeout.
Overcloud Endpoint: http://192.168.21.127:5000
Overcloud Horizon Dashboard URL: http://192.168.21.127:80/dashboard
Overcloud rc file: /home/stack/templates/overcloudrc
Overcloud Deployed without error
Success
Enabling ssh admin (tripleo-admin) for hosts:
192.168.100.206 192.168.100.223 192.168.100.208 192.168.100.232 192.168.100.204 192.168.100.202 192.168.100.220 192.168.100.222
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
2021-06-04 15:26:57.296 106715 INFO tripleoclient.v1.overcloud_upgrade.UpgradePrepare [-] ssh-keygen has been run successfully
Warning: Permanently added '192.168.100.206' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.100.223' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.100.208' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.100.232' (ECDSA) to the list of known hosts.
Inserting TripleO short term key for 192.168.100.206
Inserting TripleO short term key for 192.168.100.223
Inserting TripleO short term key for 192.168.100.208
Inserting TripleO short term key for 192.168.100.232
Removing short term keys locally
2021-06-04 15:37:02.954 106715 ERROR openstack [-] Timed out waiting for port 22 from 192.168.100.204: tripleoclient.exceptions.DeploymentError: Timed out waiting for port 22 from 192.168.100.204
2021-06-04 15:37:02.984 106715 INFO osc_lib.shell [-] END return value: 1
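
The failure above is a plain TCP timeout on port 22, so it can be reproduced outside the upgrade tooling. The following is a minimal sketch, not part of tripleoclient: the helper names `port_open` and `unreachable_hosts` are hypothetical, and it assumes the ctlplane addresses are routable from the host where it runs.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, no route to host, and timeouts.
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the subset of hosts that do not accept connections on port."""
    return [h for h in hosts if not port_open(h, port, timeout)]
```

Calling `unreachable_hosts` with the eight addresses from the "Enabling ssh admin" line would be expected to return only the hosts that are down or filtered, which helps confirm that a timeout is caused by the node itself rather than by the undercloud.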

[stack@manager ~]$ . stackrc
(undercloud) [stack@manager ~]$ openstack server list
+--------------------------------------+--------------------------+---------+--------------------------+----------------+--------------+
| ID                                   | Name                     | Status  | Networks                 | Image          | Flavor       |
+--------------------------------------+--------------------------+---------+--------------------------+----------------+--------------+
| e5f268b2-3f75-4a19-81f6-edd04726c7e9 | overcloud-computesriov-0 | ACTIVE  | ctlplane=192.168.100.232 | overcloud-full | compute      |
| eea9bd9f-3598-41db-8acf-7c207ce5b467 | overcloud-cephstorage-1  | ACTIVE  | ctlplane=192.168.100.220 | overcloud-full | ceph-storage |
| 00e5f7a9-3ac3-4fa4-9fbf-4e88d5bd6a8a | overcloud-cephstorage-2  | ACTIVE  | ctlplane=192.168.100.222 | overcloud-full | ceph-storage |
| 0563456d-7ad7-471d-bf26-c43417030793 | overcloud-controller-2   | ACTIVE  | ctlplane=192.168.100.208 | overcloud-full | control      |
| 860a23a5-842c-434f-a1ab-db3a0b8a493e | overcloud-controller-0   | ACTIVE  | ctlplane=192.168.100.206 | overcloud-full | control      |
| de5441d0-1807-4779-957c-3b93f8ce0b19 | overcloud-controller-1   | ACTIVE  | ctlplane=192.168.100.223 | overcloud-full | control      |
| e5818eb5-9ed1-4b19-8959-5fecbca371f3 | overcloud-computesriov-1 | SHUTOFF | ctlplane=192.168.100.204 | overcloud-full | compute      | <<-- faulty compute node
| f2aa4afd-e5f6-43f9-97d8-43c8967e1773 | overcloud-cephstorage-0  | ACTIVE  | ctlplane=192.168.100.202 | overcloud-full | ceph-storage |
+--------------------------------------+--------------------------+---------+--------------------------+----------------+--------------+

(undercloud) [stack@manager templates]$ cat blacklist.yaml
parameter_defaults:
  DeploymentServerBlacklist:
    - overcloud-computesriov-1
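
For reference, a blacklist file like the one above could be derived from the `openstack server list` table instead of being written by hand. This is only a sketch under assumptions: the function names `non_active_servers` and `blacklist_yaml` are hypothetical, the input is the table text exactly as printed earlier in this report, and every server whose Status is not ACTIVE is treated as a blacklist candidate.

```python
def non_active_servers(table_text):
    """Return the Name of every table row whose Status is not ACTIVE."""
    names = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines, prompts, and blanks
        # Drop the text before the first '|' and after the last '|'
        # (the latter may hold a trailing annotation).
        cells = [c.strip() for c in line.split("|")[1:-1]]
        if header is None:
            header = cells  # first '|' row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("Status") != "ACTIVE":
            names.append(row["Name"])
    return names


def blacklist_yaml(names):
    """Render a DeploymentServerBlacklist environment file as text."""
    if not names:
        return "parameter_defaults:\n  DeploymentServerBlacklist: []\n"
    lines = ["parameter_defaults:", "  DeploymentServerBlacklist:"]
    lines += ["    - " + n for n in names]
    return "\n".join(lines) + "\n"
```

The list entries are server names, matching the format of the blacklist.yaml shown in this report.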

Actual results:
The upgrade prepare command fails with exit code 1 because the blacklisted compute node's IP is not reachable.

Expected results:
With the blacklist.yaml file included, upgrade prepare should skip the faulty compute node and complete successfully.

