Bug 2092063

Summary: OSP17 Delete of unreachable node fails
Product: Red Hat OpenStack Reporter: David Rosenfeld <drosenfe>
Component: python-tripleoclient Assignee: Rabi Mishra <ramishra>
Status: CLOSED ERRATA QA Contact: David Rosenfeld <drosenfe>
Severity: high Docs Contact:
Priority: high    
Version: 17.0 (Wallaby) CC: cjeanner, hbrock, jschluet, jslagle, mburns, ramishra, slinaber
Target Milestone: beta Keywords: Triaged
Target Release: 17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-tripleoclient-16.4.1-0.20220629155516.d451aaa.el9ost Doc Type: If docs needed, set a value
Last Closed: 2022-09-21 12:22:13 UTC Type: Bug

Description David Rosenfeld 2022-05-31 16:33:40 UTC
Description of problem: I followed the directions here (scroll to the bottom) to delete an unreachable compute node:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#deploying-the-overcloud

The delete fails because the node is unreachable. The directions do not mention any additional steps for deleting an unreachable node.

This instance is used in baremetal_deployment.yaml:

  - hostname: compute-1
    name: compute-1
    provisioned: false
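
For context, the entry above sits inside a role's instances list. A minimal Python sketch of how such a file selects nodes for removal, assuming the usual role layout of name plus instances and assuming an instance counts as provisioned unless it says otherwise. The role name Compute, the compute-2 entry, and the function name are illustrative assumptions, and the data is inlined rather than read from the real file:

```python
# Hypothetical in-memory copy of baremetal_deployment.yaml. Only the
# compute-1 entry comes from this report; the rest is illustrative.
deployment = [
    {
        "name": "Compute",
        "instances": [
            {"hostname": "compute-1", "name": "compute-1", "provisioned": False},
            {"hostname": "compute-2", "name": "compute-2"},
        ],
    },
]


def instances_to_unprovision(roles):
    """Return hostnames whose entry sets provisioned to false."""
    hostnames = []
    for role in roles:
        for instance in role.get("instances", []):
            # Assumption: a missing "provisioned" key means provisioned.
            if not instance.get("provisioned", True):
                hostnames.append(instance["hostname"])
    return hostnames


print(instances_to_unprovision(deployment))
```

This matches the confirmation table printed by the delete command further down, which lists only compute-1.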


(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node power off compute-1
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| cddeb6f2-3b87-4f9b-a569-2cda2620b37e | ceph-0       | 2332a816-606c-40d2-93b2-03912f353982 | power on    | active             | False       |
| 437d8f0d-5e5e-430e-9d8a-73d77304461f | compute-0    | None                                 | power off   | available          | False       |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | power off   | active             | False       |
| d49acfbd-95c0-4ead-9ebd-d758bcc87952 | compute-2    | 2dad81dd-749b-47fa-bd22-8a85696505b1 | power on    | active             | False       |
| d42c9628-8ef4-41a3-b6bf-33fabb661967 | controller-0 | 15a325a6-68b2-4a7a-bcb7-28d33e0126f5 | power on    | active             | False       |
| 6d98a2f5-da20-42d6-9da5-cbe45ea86358 | controller-1 | c163d432-3688-493c-8f46-2671cabaa3a8 | power on    | active             | False       |
| c7bc3f00-9104-4ed8-8738-d1f3769ecac0 | controller-2 | d0cff020-feb1-4926-9924-f960542bb38c | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+




(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node delete  --stack overcloud --baremetal-deployment /home/stack/virt/network/baremetal_deployment.yaml
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
[DEPRECATION WARNING]: ANSIBLE_CALLBACK_WHITELIST option, normalizing names to 
new standard, use ANSIBLE_CALLBACKS_ENABLED instead. This feature will be 
removed from ansible-core in version 2.15. Deprecation warnings can be disabled
 by setting deprecation_warnings=False in ansible.cfg.

PLAY [Overcloud Node Unprovision] **********************************************
2022-05-31 15:51:33.215066 | 525400c0-f3c0-88fe-cbcb-000000000008 |    SKIPPED | fail | localhost
2022-05-31 15:51:33.216423 | 525400c0-f3c0-88fe-cbcb-000000000008 |     TIMING | fail | localhost | 0:00:00.088713 | 0.04s
2022-05-31 15:51:33.260320 | 525400c0-f3c0-88fe-cbcb-000000000009 |    SKIPPED | fail | localhost
2022-05-31 15:51:33.262243 | 525400c0-f3c0-88fe-cbcb-000000000009 |     TIMING | fail | localhost | 0:00:00.134533 | 0.04s
2022-05-31 15:51:33.309222 | 525400c0-f3c0-88fe-cbcb-00000000000a |    SKIPPED | fail | localhost
2022-05-31 15:51:33.310852 | 525400c0-f3c0-88fe-cbcb-00000000000a |     TIMING | fail | localhost | 0:00:00.183117 | 0.04s
2022-05-31 15:51:33.323024 | 525400c0-f3c0-88fe-cbcb-00000000000c |       TASK | Expand roles
2022-05-31 15:51:34.577774 | 525400c0-f3c0-88fe-cbcb-00000000000c |    CHANGED | Expand roles | localhost
2022-05-31 15:51:34.579922 | 525400c0-f3c0-88fe-cbcb-00000000000c |     TIMING | Expand roles | localhost | 0:00:01.452214 | 1.26s
2022-05-31 15:51:34.587421 | 525400c0-f3c0-88fe-cbcb-00000000000d |       TASK | Find existing instances
2022-05-31 15:51:36.788735 | 525400c0-f3c0-88fe-cbcb-00000000000d |         OK | Find existing instances | localhost
2022-05-31 15:51:36.791012 | 525400c0-f3c0-88fe-cbcb-00000000000d |     TIMING | Find existing instances | localhost | 0:00:03.663303 | 2.20s
2022-05-31 15:51:36.799056 | 525400c0-f3c0-88fe-cbcb-00000000000e |       TASK | Write unprovision confirmation
2022-05-31 15:51:37.708836 | 525400c0-f3c0-88fe-cbcb-00000000000e |    CHANGED | Write unprovision confirmation | localhost
2022-05-31 15:51:37.710755 | 525400c0-f3c0-88fe-cbcb-00000000000e |     TIMING | Write unprovision confirmation | localhost | 0:00:04.583047 | 0.91s
2022-05-31 15:51:37.716332 | 525400c0-f3c0-88fe-cbcb-00000000000f |       TASK | Unprovision instances
2022-05-31 15:51:37.742690 | 525400c0-f3c0-88fe-cbcb-00000000000f |    SKIPPED | Unprovision instances | localhost
2022-05-31 15:51:37.743813 | 525400c0-f3c0-88fe-cbcb-00000000000f |     TIMING | Unprovision instances | localhost | 0:00:04.616105 | 0.03s
2022-05-31 15:51:37.749045 | 525400c0-f3c0-88fe-cbcb-000000000010 |       TASK | Unprovision instance network ports
2022-05-31 15:51:37.774711 | 525400c0-f3c0-88fe-cbcb-000000000010 |    SKIPPED | Unprovision instance network ports | localhost
2022-05-31 15:51:37.775966 | 525400c0-f3c0-88fe-cbcb-000000000010 |     TIMING | Unprovision instance network ports | localhost | 0:00:04.648260 | 0.03s

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   
2022-05-31 15:51:37.788688 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:51:37.789307 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 8          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:51:37.789867 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:00:04.662173 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:51:37.790359 |                                 UUID |       Info |       Host |   Task Name |   Run Time
2022-05-31 15:51:37.790840 | 525400c0-f3c0-88fe-cbcb-00000000000d |    SUMMARY |  localhost | Find existing instances | 2.20s
2022-05-31 15:51:37.791287 | 525400c0-f3c0-88fe-cbcb-00000000000c |    SUMMARY |  localhost | Expand roles | 1.26s
2022-05-31 15:51:37.791778 | 525400c0-f3c0-88fe-cbcb-00000000000e |    SUMMARY |  localhost | Write unprovision confirmation | 0.91s
2022-05-31 15:51:37.792254 | 525400c0-f3c0-88fe-cbcb-00000000000a |    SUMMARY |  localhost | fail | 0.04s
2022-05-31 15:51:37.792776 | 525400c0-f3c0-88fe-cbcb-000000000009 |    SUMMARY |  localhost | fail | 0.04s
2022-05-31 15:51:37.793320 | 525400c0-f3c0-88fe-cbcb-000000000008 |    SUMMARY |  localhost | fail | 0.04s
2022-05-31 15:51:37.793902 | 525400c0-f3c0-88fe-cbcb-00000000000f |    SUMMARY |  localhost | Unprovision instances | 0.03s
2022-05-31 15:51:37.794503 | 525400c0-f3c0-88fe-cbcb-000000000010 |    SUMMARY |  localhost | Unprovision instance network ports | 0.03s
2022-05-31 15:51:37.795070 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------+-----------+--------------------------------------+
| hostname  | name      | id                                   |
+-----------+-----------+--------------------------------------+
| compute-1 | compute-1 | 9526fc3d-5df2-47b7-a413-59e8d56ac39d |
+-----------+-----------+--------------------------------------+

Are you sure you want to delete these overcloud nodes [y/N]? y
[DEPRECATION WARNING]: ANSIBLE_CALLBACK_WHITELIST option, normalizing names to 
new standard, use ANSIBLE_CALLBACKS_ENABLED instead. This feature will be 
removed from ansible-core in version 2.15. Deprecation warnings can be disabled
 by setting deprecation_warnings=False in ansible.cfg.

PLAY [Check if required variables are defined] *********************************
skipping: no hosts matched

PLAY [Clear cached facts] ******************************************************

PLAY [Gather facts] ************************************************************
2022-05-31 15:51:44.924762 | 525400c0-f3c0-b250-b407-00000000003e |       TASK | Gathering Facts
[WARNING]: Unhandled error in Python interpreter discovery for host compute-1:
Failed to connect to the host via ssh: ssh: connect to host 192.168.24.20 port
22: No route to host
2022-05-31 15:52:25.402838 | 525400c0-f3c0-b250-b407-00000000003e | UNREACHABLE | Gathering Facts | compute-1
2022-05-31 15:52:25.404282 | 525400c0-f3c0-b250-b407-00000000003e |     TIMING | Gathering Facts | compute-1 | 0:00:40.582438 | 40.48s

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
compute-1                  : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
2022-05-31 15:52:25.409865 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.410409 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.410953 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:00:40.589116 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.411646 |                                 UUID |       Info |       Host |   Task Name |   Run Time
2022-05-31 15:52:25.412137 | 525400c0-f3c0-b250-b407-00000000003e |    SUMMARY |  compute-1 | Gathering Facts | 40.48s
2022-05-31 15:52:25.412945 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.413633 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.414352 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.415004 |  The following node(s) had failures: compute-1
2022-05-31 15:52:25.415749 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/scale_playbook.yaml, Run Status: failed, Return Code: 4, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/ansible-playbook-command.sh
Exception occured while running the command
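
The PLAY RECAP line is what identifies the host that could not be reached. A rough Python sketch for pulling unreachable hosts out of captured output like the above; the function names are hypothetical, and the sample lines are copied from this report:

```python
import re

# One recap line looks like: "<host> : ok=0 changed=0 unreachable=1 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap_line(line):
    """Return (host, {counter: value}) for a recap line, else None."""
    match = RECAP_LINE.match(line.strip())
    if match is None:
        return None
    counters = {}
    for pair in match.group("counters").split():
        key, value = pair.split("=")
        counters[key] = int(value)
    return match.group("host"), counters


def unreachable_hosts(lines):
    """List hosts whose recap line reports unreachable > 0."""
    hosts = []
    for line in lines:
        parsed = parse_recap_line(line)
        if parsed and parsed[1].get("unreachable", 0) > 0:
            hosts.append(parsed[0])
    return hosts


sample = [
    "NO MORE HOSTS LEFT *************************************************************",
    "localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   ",
    "compute-1                  : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   ",
]
print(unreachable_hosts(sample))
```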


(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
| UUID                                 | Node Name    | Allocation UUID                      | Hostname     | State  | IP Addresses           |
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
| cddeb6f2-3b87-4f9b-a569-2cda2620b37e | ceph-0       | 2332a816-606c-40d2-93b2-03912f353982 | ceph-0       | ACTIVE | ctlplane=192.168.24.49 |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | compute-1    | ACTIVE | ctlplane=192.168.24.20 |
| d49acfbd-95c0-4ead-9ebd-d758bcc87952 | compute-2    | 2dad81dd-749b-47fa-bd22-8a85696505b1 | compute-2    | ACTIVE | ctlplane=192.168.24.32 |
| d42c9628-8ef4-41a3-b6bf-33fabb661967 | controller-0 | 15a325a6-68b2-4a7a-bcb7-28d33e0126f5 | controller-2 | ACTIVE | ctlplane=192.168.24.39 |
| 6d98a2f5-da20-42d6-9da5-cbe45ea86358 | controller-1 | c163d432-3688-493c-8f46-2671cabaa3a8 | controller-0 | ACTIVE | ctlplane=192.168.24.25 |
| c7bc3f00-9104-4ed8-8738-d1f3769ecac0 | controller-2 | d0cff020-feb1-4926-9924-f960542bb38c | controller-1 | ACTIVE | ctlplane=192.168.24.10 |
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| cddeb6f2-3b87-4f9b-a569-2cda2620b37e | ceph-0       | 2332a816-606c-40d2-93b2-03912f353982 | power on    | active             | False       |
| 437d8f0d-5e5e-430e-9d8a-73d77304461f | compute-0    | None                                 | power off   | available          | False       |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | power off   | active             | False       |
| d49acfbd-95c0-4ead-9ebd-d758bcc87952 | compute-2    | 2dad81dd-749b-47fa-bd22-8a85696505b1 | power on    | active             | False       |
| d42c9628-8ef4-41a3-b6bf-33fabb661967 | controller-0 | 15a325a6-68b2-4a7a-bcb7-28d33e0126f5 | power on    | active             | False       |
| 6d98a2f5-da20-42d6-9da5-cbe45ea86358 | controller-1 | c163d432-3688-493c-8f46-2671cabaa3a8 | power on    | active             | False       |
| c7bc3f00-9104-4ed8-8738-d1f3769ecac0 | controller-2 | d0cff020-feb1-4926-9924-f960542bb38c | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
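
The listing above still shows compute-1 as powered off yet active with an instance UUID, which is the sign that the delete did not go through. A small Python sketch that spots this in the table text; it parses the printed table rather than calling any API, the function names are hypothetical, and the two sample rows are taken from the output above:

```python
def parse_table(text):
    """Parse an ASCII table as printed by the CLI into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def stale_nodes(nodes):
    """Names of nodes that are powered off but still hold an instance."""
    return [
        node["Name"]
        for node in nodes
        if node["Power State"] == "power off" and node["Instance UUID"] != "None"
    ]


SAMPLE = """
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
| 437d8f0d-5e5e-430e-9d8a-73d77304461f | compute-0    | None                                 | power off   | available          | False       |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | power off   | active             | False       |
"""

print(stale_nodes(parse_table(SAMPLE)))
```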



Version-Release number of selected component (if applicable): RHOS-17.0-RHEL-9-20220526.n.0


How reproducible: Every time


Steps to Reproduce:
1. Power off a node, then use the command openstack overcloud node delete to delete the node.
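
The reproduction can be scripted. A minimal Python sketch that only assembles the two command lines seen in this report and prints them without running anything; the helper names are hypothetical, and the stack name and file path are the ones used in this environment:

```python
import shlex


def power_off_command(node):
    """Command line that powers off a baremetal node."""
    return ["openstack", "baremetal", "node", "power", "off", node]


def delete_command(stack, deployment_file, assume_yes=False):
    """Command line that deletes the nodes marked in the deployment file."""
    cmd = ["openstack", "overcloud", "node", "delete"]
    if assume_yes:
        # -y skips the interactive confirmation prompt.
        cmd.append("-y")
    cmd += ["--stack", stack, "--baremetal-deployment", deployment_file]
    return cmd


steps = [
    power_off_command("compute-1"),
    delete_command(
        "overcloud",
        "/home/stack/virt/network/baremetal_deployment.yaml",
    ),
]
for step in steps:
    # Print a copy-pasteable shell line; nothing is executed here.
    print(shlex.join(step))
```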

Actual results: Unreachable node is not deleted.


Expected results: Unreachable node is deleted.


Additional info:

Comment 5 David Rosenfeld 2022-07-11 12:49:47 UTC
This is from the logs of the Phase 3 job that deletes an unreachable node:

- Power off compute-0:

2022-07-09 02:53:52.843 | TASK [wait for node "compute-0" to go down] ************************************
2022-07-09 02:53:52.845 | task path: /home/rhos-ci/jenkins/workspace/DFG-df-rfe-17.0-virsh-3cont_3comp_1ceph-blacklist-1compute-scaledown/infrared/plugins/cloud-config/post_tasks/scale_down.yml:75
2022-07-09 02:53:52.848 | Saturday 09 July 2022  02:53:52 +0000 (0:00:03.585)       0:00:40.651 ********* 
2022-07-09 02:53:56.216 | FAILED - RETRYING: wait for node "compute-0" to go down (20 retries left).
2022-07-09 02:54:02.775 | FAILED - RETRYING: wait for node "compute-0" to go down (19 retries left).
2022-07-09 02:54:09.257 | changed: [undercloud-0] => {
2022-07-09 02:54:09.259 |     "attempts": 3,
2022-07-09 02:54:09.261 |     "changed": true,
2022-07-09 02:54:09.263 |     "cmd": "source ~/stackrc\nopenstack baremetal node show compute-0 -c power_state -f value\n",
2022-07-09 02:54:09.265 |     "delta": "0:00:03.226230",
2022-07-09 02:54:09.267 |     "end": "2022-07-09 02:54:09.225864",
2022-07-09 02:54:09.270 |     "rc": 0,
2022-07-09 02:54:09.272 |     "start": "2022-07-09 02:54:05.999634"
2022-07-09 02:54:09.273 | }
2022-07-09 02:54:09.275 | 
2022-07-09 02:54:09.277 | STDOUT:
2022-07-09 02:54:09.280 | 
2022-07-09 02:54:09.282 | power off


- Execute delete command:
openstack overcloud node delete -y --stack overcloud --baremetal-deployment /home/stack/virt/network/baremetal_deployment.yaml

- See in logs that it couldn't reach compute-0:
2022-07-09 03:10:14.114 | PLAY [Gather facts] ************************************************************
2022-07-09 03:10:14.116 | 2022-07-09 02:54:29.903770 | 52540072-e6b4-b269-807d-00000000003e |       TASK | Gathering Facts
2022-07-09 03:10:14.118 | [WARNING]: Unhandled error in Python interpreter discovery for host compute-0:
2022-07-09 03:10:14.120 | Failed to connect to the host via ssh: ssh: connect to host 192.168.24.25 port
2022-07-09 03:10:14.122 | 22: No route to host
2022-07-09 03:10:14.124 | 2022-07-09 03:09:57.027282 | 52540072-e6b4-b269-807d-00000000003e | UNREACHABLE | Gathering Facts | compute-0
2022-07-09 03:10:14.126 | 2022-07-09 03:09:57.029641 | 52540072-e6b4-b269-807d-00000000003e |     TIMING | Gathering Facts | compute-0 | 0:15:27.219228 | 927.12s
2022-07-09 03:10:14.128 | 
2022-07-09 03:10:14.130 | NO MORE HOSTS LEFT *************************************************************

- See that compute-0 was set to available:

[stack@undercloud-0 ~]$ openstack baremetal node list | grep compute-0
| 6696911a-589f-4b7a-ac0f-dc36e5651545 | compute-0    | None                                 | power off   | available          | False       |

Comment 10 errata-xmlrpc 2022-09-21 12:22:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543