Bug 2092063 - OSP17 Delete of unreachable node fails
Summary: OSP17 Delete of unreachable node fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.0
Assignee: Rabi Mishra
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-31 16:33 UTC by David Rosenfeld
Modified: 2022-09-21 12:22 UTC (History)

Fixed In Version: python-tripleoclient-16.4.1-0.20220629155516.d451aaa.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:22:13 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 846598 | 0 | None | stable/wallaby: MERGED | python-tripleoclient: Ignore unreachable errors for scale playbook (Iea2022bf8dd27eb8762158c04cd3e0186da4fe0c) | 2022-06-28 14:36:36 UTC
Red Hat Issue Tracker | OSP-15455 | 0 | None | None | None | 2022-05-31 16:38:51 UTC
Red Hat Product Errata | RHEA-2022:6543 | 0 | None | None | None | 2022-09-21 12:22:37 UTC

Description David Rosenfeld 2022-05-31 16:33:40 UTC
Description of problem: I used the directions at the link below (scroll to the bottom of the page) to delete an unreachable compute node:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#deploying-the-overcloud

The delete fails due to the node being unreachable. The directions do not say anything different needs to be done to delete an unreachable node.

This instance is used in baremetal_deployment.yaml:

  - hostname: compute-1
    name: compute-1
    provisioned: false
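
For illustration only, here is a minimal Python sketch of how entries flagged with provisioned: false could be picked out of an already-parsed deployment definition. Only the compute-1 entry comes from this report. The surrounding role structure, the compute-2 entry, and the helper name hosts_marked_for_removal are assumptions made for the example, and this is not how the client itself is implemented.

```python
# Hypothetical, already-parsed form of a baremetal deployment definition.
# Only the compute-1 entry is taken from this report; the role wrapper and
# the compute-2 entry are assumed for illustration.
ROLES = [
    {
        "name": "Compute",
        "instances": [
            {"hostname": "compute-1", "name": "compute-1", "provisioned": False},
            {"hostname": "compute-2", "name": "compute-2"},
        ],
    },
]


def hosts_marked_for_removal(roles):
    """Return hostnames whose instance entry sets provisioned to False."""
    marked = []
    for role in roles:
        for instance in role.get("instances", []):
            # A missing key is treated as "provisioned" (assumed default).
            if instance.get("provisioned", True) is False:
                marked.append(instance["hostname"])
    return marked


print(hosts_marked_for_removal(ROLES))
```

With the sample data, only compute-1 is selected, which matches the confirmation table that the delete command prints before asking for confirmation.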


(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node power off compute-1
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| cddeb6f2-3b87-4f9b-a569-2cda2620b37e | ceph-0       | 2332a816-606c-40d2-93b2-03912f353982 | power on    | active             | False       |
| 437d8f0d-5e5e-430e-9d8a-73d77304461f | compute-0    | None                                 | power off   | available          | False       |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | power off   | active             | False       |
| d49acfbd-95c0-4ead-9ebd-d758bcc87952 | compute-2    | 2dad81dd-749b-47fa-bd22-8a85696505b1 | power on    | active             | False       |
| d42c9628-8ef4-41a3-b6bf-33fabb661967 | controller-0 | 15a325a6-68b2-4a7a-bcb7-28d33e0126f5 | power on    | active             | False       |
| 6d98a2f5-da20-42d6-9da5-cbe45ea86358 | controller-1 | c163d432-3688-493c-8f46-2671cabaa3a8 | power on    | active             | False       |
| c7bc3f00-9104-4ed8-8738-d1f3769ecac0 | controller-2 | d0cff020-feb1-4926-9924-f960542bb38c | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+




(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node delete  --stack overcloud --baremetal-deployment /home/stack/virt/network/baremetal_deployment.yaml
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
[DEPRECATION WARNING]: ANSIBLE_CALLBACK_WHITELIST option, normalizing names to 
new standard, use ANSIBLE_CALLBACKS_ENABLED instead. This feature will be 
removed from ansible-core in version 2.15. Deprecation warnings can be disabled
 by setting deprecation_warnings=False in ansible.cfg.

PLAY [Overcloud Node Unprovision] **********************************************
2022-05-31 15:51:33.215066 | 525400c0-f3c0-88fe-cbcb-000000000008 |    SKIPPED | fail | localhost
2022-05-31 15:51:33.216423 | 525400c0-f3c0-88fe-cbcb-000000000008 |     TIMING | fail | localhost | 0:00:00.088713 | 0.04s
2022-05-31 15:51:33.260320 | 525400c0-f3c0-88fe-cbcb-000000000009 |    SKIPPED | fail | localhost
2022-05-31 15:51:33.262243 | 525400c0-f3c0-88fe-cbcb-000000000009 |     TIMING | fail | localhost | 0:00:00.134533 | 0.04s
2022-05-31 15:51:33.309222 | 525400c0-f3c0-88fe-cbcb-00000000000a |    SKIPPED | fail | localhost
2022-05-31 15:51:33.310852 | 525400c0-f3c0-88fe-cbcb-00000000000a |     TIMING | fail | localhost | 0:00:00.183117 | 0.04s
2022-05-31 15:51:33.323024 | 525400c0-f3c0-88fe-cbcb-00000000000c |       TASK | Expand roles
2022-05-31 15:51:34.577774 | 525400c0-f3c0-88fe-cbcb-00000000000c |    CHANGED | Expand roles | localhost
2022-05-31 15:51:34.579922 | 525400c0-f3c0-88fe-cbcb-00000000000c |     TIMING | Expand roles | localhost | 0:00:01.452214 | 1.26s
2022-05-31 15:51:34.587421 | 525400c0-f3c0-88fe-cbcb-00000000000d |       TASK | Find existing instances
2022-05-31 15:51:36.788735 | 525400c0-f3c0-88fe-cbcb-00000000000d |         OK | Find existing instances | localhost
2022-05-31 15:51:36.791012 | 525400c0-f3c0-88fe-cbcb-00000000000d |     TIMING | Find existing instances | localhost | 0:00:03.663303 | 2.20s
2022-05-31 15:51:36.799056 | 525400c0-f3c0-88fe-cbcb-00000000000e |       TASK | Write unprovision confirmation
2022-05-31 15:51:37.708836 | 525400c0-f3c0-88fe-cbcb-00000000000e |    CHANGED | Write unprovision confirmation | localhost
2022-05-31 15:51:37.710755 | 525400c0-f3c0-88fe-cbcb-00000000000e |     TIMING | Write unprovision confirmation | localhost | 0:00:04.583047 | 0.91s
2022-05-31 15:51:37.716332 | 525400c0-f3c0-88fe-cbcb-00000000000f |       TASK | Unprovision instances
2022-05-31 15:51:37.742690 | 525400c0-f3c0-88fe-cbcb-00000000000f |    SKIPPED | Unprovision instances | localhost
2022-05-31 15:51:37.743813 | 525400c0-f3c0-88fe-cbcb-00000000000f |     TIMING | Unprovision instances | localhost | 0:00:04.616105 | 0.03s
2022-05-31 15:51:37.749045 | 525400c0-f3c0-88fe-cbcb-000000000010 |       TASK | Unprovision instance network ports
2022-05-31 15:51:37.774711 | 525400c0-f3c0-88fe-cbcb-000000000010 |    SKIPPED | Unprovision instance network ports | localhost
2022-05-31 15:51:37.775966 | 525400c0-f3c0-88fe-cbcb-000000000010 |     TIMING | Unprovision instance network ports | localhost | 0:00:04.648260 | 0.03s

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   
2022-05-31 15:51:37.788688 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:51:37.789307 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 8          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:51:37.789867 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:00:04.662173 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:51:37.790359 |                                 UUID |       Info |       Host |   Task Name |   Run Time
2022-05-31 15:51:37.790840 | 525400c0-f3c0-88fe-cbcb-00000000000d |    SUMMARY |  localhost | Find existing instances | 2.20s
2022-05-31 15:51:37.791287 | 525400c0-f3c0-88fe-cbcb-00000000000c |    SUMMARY |  localhost | Expand roles | 1.26s
2022-05-31 15:51:37.791778 | 525400c0-f3c0-88fe-cbcb-00000000000e |    SUMMARY |  localhost | Write unprovision confirmation | 0.91s
2022-05-31 15:51:37.792254 | 525400c0-f3c0-88fe-cbcb-00000000000a |    SUMMARY |  localhost | fail | 0.04s
2022-05-31 15:51:37.792776 | 525400c0-f3c0-88fe-cbcb-000000000009 |    SUMMARY |  localhost | fail | 0.04s
2022-05-31 15:51:37.793320 | 525400c0-f3c0-88fe-cbcb-000000000008 |    SUMMARY |  localhost | fail | 0.04s
2022-05-31 15:51:37.793902 | 525400c0-f3c0-88fe-cbcb-00000000000f |    SUMMARY |  localhost | Unprovision instances | 0.03s
2022-05-31 15:51:37.794503 | 525400c0-f3c0-88fe-cbcb-000000000010 |    SUMMARY |  localhost | Unprovision instance network ports | 0.03s
2022-05-31 15:51:37.795070 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------+-----------+--------------------------------------+
| hostname  | name      | id                                   |
+-----------+-----------+--------------------------------------+
| compute-1 | compute-1 | 9526fc3d-5df2-47b7-a413-59e8d56ac39d |
+-----------+-----------+--------------------------------------+

Are you sure you want to delete these overcloud nodes [y/N]? y
[DEPRECATION WARNING]: ANSIBLE_CALLBACK_WHITELIST option, normalizing names to 
new standard, use ANSIBLE_CALLBACKS_ENABLED instead. This feature will be 
removed from ansible-core in version 2.15. Deprecation warnings can be disabled
 by setting deprecation_warnings=False in ansible.cfg.

PLAY [Check if required variables are defined] *********************************
skipping: no hosts matched

PLAY [Clear cached facts] ******************************************************

PLAY [Gather facts] ************************************************************
2022-05-31 15:51:44.924762 | 525400c0-f3c0-b250-b407-00000000003e |       TASK | Gathering Facts
[WARNING]: Unhandled error in Python interpreter discovery for host compute-1:
Failed to connect to the host via ssh: ssh: connect to host 192.168.24.20 port
22: No route to host
2022-05-31 15:52:25.402838 | 525400c0-f3c0-b250-b407-00000000003e | UNREACHABLE | Gathering Facts | compute-1
2022-05-31 15:52:25.404282 | 525400c0-f3c0-b250-b407-00000000003e |     TIMING | Gathering Facts | compute-1 | 0:00:40.582438 | 40.48s

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
compute-1                  : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
2022-05-31 15:52:25.409865 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.410409 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.410953 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:00:40.589116 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.411646 |                                 UUID |       Info |       Host |   Task Name |   Run Time
2022-05-31 15:52:25.412137 | 525400c0-f3c0-b250-b407-00000000003e |    SUMMARY |  compute-1 | Gathering Facts | 40.48s
2022-05-31 15:52:25.412945 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.413633 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.414352 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-05-31 15:52:25.415004 |  The following node(s) had failures: compute-1
2022-05-31 15:52:25.415749 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/scale_playbook.yaml, Run Status: failed, Return Code: 4, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/ansible-playbook-command.sh
Exception occured while running the command
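
When triaging output like the above, it can help to pull the unreachable hosts out of the task log automatically. The following is a minimal Python sketch, assuming the pipe-separated layout seen in this log (status, then task name, then host); the function name unreachable_hosts is hypothetical.

```python
def unreachable_hosts(log_text):
    """Collect host names from task log lines marked UNREACHABLE."""
    hosts = []
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if "UNREACHABLE" not in fields:
            continue
        idx = fields.index("UNREACHABLE")
        # Layout seen in the log: ... | UNREACHABLE | <task name> | <host>
        if len(fields) > idx + 2 and fields[idx + 2] not in hosts:
            hosts.append(fields[idx + 2])
    return hosts


# Three lines copied from the failed run in this report.
LOG = """\
2022-05-31 15:51:44.924762 | 525400c0-f3c0-b250-b407-00000000003e |       TASK | Gathering Facts
2022-05-31 15:52:25.402838 | 525400c0-f3c0-b250-b407-00000000003e | UNREACHABLE | Gathering Facts | compute-1
2022-05-31 15:52:25.404282 | 525400c0-f3c0-b250-b407-00000000003e |     TIMING | Gathering Facts | compute-1 | 0:00:40.582438 | 40.48s
"""

print(unreachable_hosts(LOG))
```

Because the status field is located by name rather than by position, the same function also copes with lines that carry an extra leading timestamp column.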


(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
| UUID                                 | Node Name    | Allocation UUID                      | Hostname     | State  | IP Addresses           |
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
| cddeb6f2-3b87-4f9b-a569-2cda2620b37e | ceph-0       | 2332a816-606c-40d2-93b2-03912f353982 | ceph-0       | ACTIVE | ctlplane=192.168.24.49 |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | compute-1    | ACTIVE | ctlplane=192.168.24.20 |
| d49acfbd-95c0-4ead-9ebd-d758bcc87952 | compute-2    | 2dad81dd-749b-47fa-bd22-8a85696505b1 | compute-2    | ACTIVE | ctlplane=192.168.24.32 |
| d42c9628-8ef4-41a3-b6bf-33fabb661967 | controller-0 | 15a325a6-68b2-4a7a-bcb7-28d33e0126f5 | controller-2 | ACTIVE | ctlplane=192.168.24.39 |
| 6d98a2f5-da20-42d6-9da5-cbe45ea86358 | controller-1 | c163d432-3688-493c-8f46-2671cabaa3a8 | controller-0 | ACTIVE | ctlplane=192.168.24.25 |
| c7bc3f00-9104-4ed8-8738-d1f3769ecac0 | controller-2 | d0cff020-feb1-4926-9924-f960542bb38c | controller-1 | ACTIVE | ctlplane=192.168.24.10 |
+--------------------------------------+--------------+--------------------------------------+--------------+--------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
/usr/lib/python3.9/site-packages/ansible/_vendor/__init__.py:42: UserWarning: One or more Python packages bundled by this ansible-core distribution were already loaded (pyparsing). This may result in undefined behavior.
  warnings.warn('One or more Python packages bundled by this ansible-core distribution were already '
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| cddeb6f2-3b87-4f9b-a569-2cda2620b37e | ceph-0       | 2332a816-606c-40d2-93b2-03912f353982 | power on    | active             | False       |
| 437d8f0d-5e5e-430e-9d8a-73d77304461f | compute-0    | None                                 | power off   | available          | False       |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1    | 3c08a53d-03fa-4897-ade4-8725b31ed44d | power off   | active             | False       |
| d49acfbd-95c0-4ead-9ebd-d758bcc87952 | compute-2    | 2dad81dd-749b-47fa-bd22-8a85696505b1 | power on    | active             | False       |
| d42c9628-8ef4-41a3-b6bf-33fabb661967 | controller-0 | 15a325a6-68b2-4a7a-bcb7-28d33e0126f5 | power on    | active             | False       |
| 6d98a2f5-da20-42d6-9da5-cbe45ea86358 | controller-1 | c163d432-3688-493c-8f46-2671cabaa3a8 | power on    | active             | False       |
| c7bc3f00-9104-4ed8-8738-d1f3769ecac0 | controller-2 | d0cff020-feb1-4926-9924-f960542bb38c | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
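
The symptom in the listing above is a node that is powered off but still in the active provisioning state with an instance attached. As a rough sketch, the table can be parsed and checked for that combination in Python. The helper names parse_cli_table and still_deployed_but_off are hypothetical, and the sample contains only two rows copied from the output above.

```python
def parse_cli_table(text):
    """Turn an ASCII table as printed by the openstack CLI into dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, warnings, and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def still_deployed_but_off(nodes):
    """Names of nodes that are powered off yet still in the 'active' state."""
    return [
        n["Name"]
        for n in nodes
        if n["Power State"] == "power off" and n["Provisioning State"] == "active"
    ]


SAMPLE = """
+--------------------------------------+-----------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name      | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------+--------------------------------------+-------------+--------------------+-------------+
| 437d8f0d-5e5e-430e-9d8a-73d77304461f | compute-0 | None                                 | power off   | available          | False       |
| 9526fc3d-5df2-47b7-a413-59e8d56ac39d | compute-1 | 3c08a53d-03fa-4897-ade4-8725b31ed44d | power off   | active             | False       |
+--------------------------------------+-----------+--------------------------------------+-------------+--------------------+-------------+
"""

nodes = parse_cli_table(SAMPLE)
print(still_deployed_but_off(nodes))
```

Here compute-0 is ignored because it is already available, while compute-1 is reported as the node the delete left behind.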



Version-Release number of selected component (if applicable): RHOS-17.0-RHEL-9-20220526.n.0


How reproducible: Every time


Steps to Reproduce:
1. Power off a node, then use the command openstack overcloud node delete to delete the node.

Actual results: Unreachable node is not deleted.


Expected results: Unreachable node is deleted.
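
One way to think about the expected behaviour: an unreachable result from the scale playbook should not abort the delete when the node being removed is the one that is down. The Python sketch below illustrates that idea only. It is not the actual patch, the names are hypothetical, and it assumes that return code 4 signals unreachable hosts, as suggested by "Return Code: 4" appearing alongside unreachable=1 in the failed run.

```python
# Assumption: 4 is the playbook return code for unreachable hosts,
# matching "Return Code: 4" next to "unreachable=1" in the failed run.
RC_OK = 0
RC_UNREACHABLE = 4


def run_is_fatal(return_code, ignore_unreachable=False):
    """Decide whether a playbook return code should abort the node delete."""
    if return_code == RC_OK:
        return False
    if ignore_unreachable and return_code == RC_UNREACHABLE:
        # The node being removed is already down; carry on with unprovisioning.
        return False
    return True


print(run_is_fatal(4))
print(run_is_fatal(4, ignore_unreachable=True))
```

Any other non-zero code stays fatal in this sketch, so genuine task failures on reachable nodes would still stop the operation.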


Additional info:

Comment 5 David Rosenfeld 2022-07-11 12:49:47 UTC
This is from the logs of the Phase 3 job that deletes an unreachable node:

- Power off compute-0:

2022-07-09 02:53:52.843 | TASK [wait for node "compute-0" to go down] ************************************
2022-07-09 02:53:52.845 | task path: /home/rhos-ci/jenkins/workspace/DFG-df-rfe-17.0-virsh-3cont_3comp_1ceph-blacklist-1compute-scaledown/infrared/plugins/cloud-config/post_tasks/scale_down.yml:75
2022-07-09 02:53:52.848 | Saturday 09 July 2022  02:53:52 +0000 (0:00:03.585)       0:00:40.651 ********* 
2022-07-09 02:53:56.216 | FAILED - RETRYING: wait for node "compute-0" to go down (20 retries left).
2022-07-09 02:54:02.775 | FAILED - RETRYING: wait for node "compute-0" to go down (19 retries left).
2022-07-09 02:54:09.257 | changed: [undercloud-0] => {
2022-07-09 02:54:09.259 |     "attempts": 3,
2022-07-09 02:54:09.261 |     "changed": true,
2022-07-09 02:54:09.263 |     "cmd": "source ~/stackrc\nopenstack baremetal node show compute-0 -c power_state -f value\n",
2022-07-09 02:54:09.265 |     "delta": "0:00:03.226230",
2022-07-09 02:54:09.267 |     "end": "2022-07-09 02:54:09.225864",
2022-07-09 02:54:09.270 |     "rc": 0,
2022-07-09 02:54:09.272 |     "start": "2022-07-09 02:54:05.999634"
2022-07-09 02:54:09.273 | }
2022-07-09 02:54:09.275 | 
2022-07-09 02:54:09.277 | STDOUT:
2022-07-09 02:54:09.280 | 
2022-07-09 02:54:09.282 | power off


- Execute delete command:
openstack overcloud node delete -y --stack overcloud --baremetal-deployment /home/stack/virt/network/baremetal_deployment.yaml

- See in logs that it couldn't reach compute-0:
2022-07-09 03:10:14.114 | PLAY [Gather facts] ************************************************************
2022-07-09 03:10:14.116 | 2022-07-09 02:54:29.903770 | 52540072-e6b4-b269-807d-00000000003e |       TASK | Gathering Facts
2022-07-09 03:10:14.118 | [WARNING]: Unhandled error in Python interpreter discovery for host compute-0:
2022-07-09 03:10:14.120 | Failed to connect to the host via ssh: ssh: connect to host 192.168.24.25 port
2022-07-09 03:10:14.122 | 22: No route to host
2022-07-09 03:10:14.124 | 2022-07-09 03:09:57.027282 | 52540072-e6b4-b269-807d-00000000003e | UNREACHABLE | Gathering Facts | compute-0
2022-07-09 03:10:14.126 | 2022-07-09 03:09:57.029641 | 52540072-e6b4-b269-807d-00000000003e |     TIMING | Gathering Facts | compute-0 | 0:15:27.219228 | 927.12s
2022-07-09 03:10:14.128 | 
2022-07-09 03:10:14.130 | NO MORE HOSTS LEFT *************************************************************

- See that compute-0 was set to available:

 [stack@undercloud-0 ~]$  openstack baremetal node list | grep compute-0
| 6696911a-589f-4b7a-ac0f-dc36e5651545 | compute-0    | None                                 | power off   | available          | False       |
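
The "wait for node to go down" step in the job log above polls the power state until it reads power off. A minimal Python sketch of that retry pattern follows. The function name wait_for, the default delay, and the simulated sequence of states are assumptions for illustration; the real job runs the openstack baremetal node show command shown in the log.

```python
import time


def wait_for(check, retries=20, delay=5.0, sleep=time.sleep):
    """Call check() until it returns True or the retries run out.

    Returns the number of attempts used; raises TimeoutError on exhaustion.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"condition not met after {retries} attempts")


# Simulated power states: two polls see "power on", the third "power off".
states = iter(["power on", "power on", "power off"])
attempts = wait_for(lambda: next(states) == "power off", delay=0)
print(attempts)
```

With the simulated states, the check succeeds on the third attempt, which mirrors the "attempts": 3 value recorded in the job log.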

Comment 10 errata-xmlrpc 2022-09-21 12:22:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

