Bug 2148400

Summary: Failed to scale down or delete compute node which is in unreachable state.
Product: Red Hat OpenStack
Reporter: Piyush Shukla <t_piyush.shukla>
Component: documentation
Assignee: fallen
Status: CLOSED NOTABUG
Severity: low
Priority: unspecified
Version: 17.0 (Wallaby)
CC: drosenfe, fallen, hjensas, rahulxp22, ramishra
Keywords: Triaged
Hardware: x86_64
OS: Linux
Last Closed: 2022-11-30 12:20:05 UTC
Type: Bug

Description Piyush Shukla 2022-11-25 11:06:32 UTC
Document URL: 
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/director_installation_and_usage/assembly_scaling-overcloud-nodes#proc_scaling-down-bare-metal-nodes_scaling-overcloud-nodes

Section Number and Name: 
Section Number: 19.4
Section Name: Removing or replacing a Compute node

Describe the issue: 
I tried to delete a Compute node that is shut down and unreachable, following section 19.4 of the RHOSP 17.0 document, but the `openstack overcloud node delete` command failed with an SSH error.


Steps to reproduce:

Steps as described in Section 19.4 of the document:
1. Disable the Compute service for the node to be deleted:
(overcloud)[stack@manager ~]$ openstack compute service set overcloud-novacompute-0.example.com nova-compute --disable
 
2. Verify that the service is disabled:
(overcloud)[stack@manager ~]$ openstack compute service list
+--------------------------------------+----------------+-------------------------------------+----------+----------+-------+----------------------------+
| ID                                   | Binary         | Host                                | Zone     | Status   | State | Updated At                 |
+--------------------------------------+----------------+-------------------------------------+----------+----------+-------+----------------------------+
| 1d50d9f3-c871-4d95-ad1a-1692f98673e9 | nova-compute   | overcloud-novacompute-0.example.com | nova     | disabled | up    | 2022-11-24T12:27:48.000000 |
+--------------------------------------+----------------+-------------------------------------+----------+----------+-------+----------------------------+

3. Power off the Compute node with the baremetal node command:
(undercloud) [stack@manager ~]$ openstack baremetal node power off a53f0d4b-3436-44ac-b83a-74a7413d4863

(undercloud) [stack@manager ~]$ openstack baremetal node list
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| a53f0d4b-3436-44ac-b83a-74a7413d4863 | compute2    | 39522868-301d-468d-a1e1-33e91a7e6e37 | power off   | active             | False       |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+

4. Check the node's reachability with ping (the node is unreachable):
(undercloud) [stack@manager ~]$ ping 192.168.100.188
PING 192.168.100.188 (192.168.100.188) 56(84) bytes of data.
From 192.168.100.30 icmp_seq=10 Destination Host Unreachable
From 192.168.100.30 icmp_seq=11 Destination Host Unreachable

5. Update the node count in the "overcloud-baremetal-deploy.yaml" file and add the parameter "provisioned: false":
- name: Compute
  count: 1
  instances:
  - hostname: overcloud-novacompute-0
    name: compute2
    provisioned: false

6. Execute the overcloud deployment command; it failed with an SSH error message:
[WARNING]: Unhandled error in Python interpreter discovery for host
192.168.100.188: Failed to connect to the host via ssh: ssh: connect to host
192.168.100.188 port 22: No route to host
2022-11-24 18:56:27.812466 | 52540049-b3bc-047a-4d73-000000000038 |      FATAL | Wait for connection to become available | 192.168.100.188 | error={"changed": false, "elapsed": 2416, "msg": "timed out waiting for ping module test: Data could not be sent to remote host \"192.168.100.188\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.100.188 port 22: No route to host\r\n"}
2022-11-24 18:56:27.817609 | 52540049-b3bc-047a-4d73-000000000038 |     TIMING | Wait for connection to become available | 192.168.100.188 | 0:40:23.580331 | 2416.99s

7. Execute the overcloud node delete command; it also failed, with the error message below:
(overcloud)[stack@manager ~]$ openstack overcloud node delete --stack overcloud --baremetal-deployment /home/stack/templates/overcloud-baremetal-deploy.yaml


2022-11-24 19:09:01.596360 | 52540049-b3bc-81e2-d9b9-00000000000c |     TIMING | Expand roles | localhost | 0:00:02.554421 | 2.29s
2022-11-24 19:09:01.611634 | 52540049-b3bc-81e2-d9b9-00000000000d |       TASK | Find existing instances
2022-11-24 19:09:04.965473 | 52540049-b3bc-81e2-d9b9-00000000000d |      FATAL | Find existing instances | localhost | error={"changed": false, "msg": "Instance overcloud-novacompute-0 is not specified as pre-provisioned (managed: False), and no connection to the baremetal service was provided."}
2022-11-24 19:09:04.968353 | 52540049-b3bc-81e2-d9b9-00000000000d |     TIMING | Find existing instances | localhost | 0:00:05.926413 | 3.35s

8. As described in Section 19.4.1 of the document, if the overcloud node delete command fails because of an unreachable node, proceed to the manual node deletion procedure.

9. Set the bare-metal node to maintenance mode:
(undercloud) [stack@manager ~]$ openstack baremetal node maintenance set a53f0d4b-3436-44ac-b83a-74a7413d4863

10. Verify the status:
(undercloud) [stack@manager ~]$ openstack baremetal node list
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| ccf89e1a-b45f-437e-8e9f-3a605b614b1e | compute1    | fadcbce5-e925-4846-a996-f616a6b26ff5 | power on    | active             | False       |
| 825b1dc8-9875-44c5-b32c-2548d74797d4 | controller0 | b146a976-2258-4684-a0b3-157ecfe16beb | power on    | active             | False       |
| 316247a9-a06a-4e29-aefe-9a92ee77eb2c | controller1 | 96021f32-a27f-423b-ab07-ccc211f5875c | power on    | active             | False       |
| aace3156-d428-4464-a590-81d69a15d5d1 | controller2 | 9f802b46-8c21-44d7-b125-fb1700fe5d77 | power on    | active             | False       |
| 777a0672-ce34-49db-93ee-caec2a6dcf03 | storage1    | 05e73efe-139c-4635-9f1d-469832015355 | power on    | active             | False       |
| bc35858a-adf8-437c-8de3-d498bbff621d | storage2    | 17a28eb9-8b23-4c26-b27a-703cab2d50a8 | power on    | active             | False       |
| 54fff889-5d8b-496e-a0fb-affdea006bc1 | storage0    | 440fa182-7dc7-4ed4-bc1e-3695ed34e644 | power on    | active             | False       |
| a53f0d4b-3436-44ac-b83a-74a7413d4863 | compute2    | 39522868-301d-468d-a1e1-33e91a7e6e37 | power off   | active             | True        |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+

11. Delete the network agents for the node:
(overcloud) [stack@manager ~]$ for AGENT in $(openstack network agent list --host overcloud-novacompute-0.example.com  -c ID -f value) ; do openstack network agent delete $AGENT ; done

(overcloud) [stack@manager ~]$ openstack network agent list
+--------------------------------------+------------------------------+-------------------------------------+-------------------+-------+-------+----------------------------+
| ID                                   | Agent Type                   | Host                                | Availability Zone | Alive | State | Binary                     |
+--------------------------------------+------------------------------+-------------------------------------+-------------------+-------+-------+----------------------------+
| 8c0778ab-bf49-4a35-ba0a-4f9921c06536 | OVN Controller Gateway agent | overcloud-controller-0.example.com  |                   | :-)   | UP    | ovn-controller             |
| 8c433437-ee33-4970-a7ab-6777813c987d | OVN Controller agent         | overcloud-novacompute-1.example.com |                   | :-)   | UP    | ovn-controller             |
| 867828e6-b41e-5ba7-94d1-ff9eede53b01 | OVN Metadata agent           | overcloud-novacompute-1.example.com |                   | :-)   | UP    | neutron-ovn-metadata-agent |
| 3a60c798-8325-41ef-9551-b831f714b05b | OVN Controller Gateway agent | overcloud-controller-2.example.com  |                   | :-)   | UP    | ovn-controller             |
| 6f5ab488-3b77-4802-a798-5eb3b997a620 | OVN Controller Gateway agent | overcloud-controller-1.example.com  |                   | :-)   | UP    | ovn-controller             |
+--------------------------------------+------------------------------+-------------------------------------+-------------------+-------+-------+----------------------------+


12. Delete the resource provider for the node:

(overcloud) [stack@manager ~]$ openstack resource provider list
+--------------------------------------+-------------------------------------+------------+
| uuid                                 | name                                | generation |
+--------------------------------------+-------------------------------------+------------+
| 434c4c7a-cbd0-42c4-815c-3d85199f9ee9 | overcloud-novacompute-1.example.com |        699 |
| 54522276-0d1d-491d-9760-ba7863d90611 | overcloud-novacompute-0.example.com |         17 |
+--------------------------------------+-------------------------------------+------------+
(overcloud) [stack@manager ~]$ openstack resource provider delete 54522276-0d1d-491d-9760-ba7863d90611

(overcloud) [stack@manager ~]$ openstack resource provider list
+--------------------------------------+-------------------------------------+------------+
| uuid                                 | name                                | generation |
+--------------------------------------+-------------------------------------+------------+
| 434c4c7a-cbd0-42c4-815c-3d85199f9ee9 | overcloud-novacompute-1.example.com |        699 |
+--------------------------------------+-------------------------------------+------------+

13. Execute the overcloud node delete command again:
(undercloud) [stack@manager ~]$ openstack overcloud node delete --stack overcloud overcloud-novacompute-0
Are you sure you want to delete these overcloud nodes [y/N]? y


[DEPRECATION WARNING]: ANSIBLE_CALLBACK_WHITELIST option, normalizing names to
new standard, use ANSIBLE_CALLBACKS_ENABLED instead. This feature will be
removed from ansible-core in version 2.15. Deprecation warnings can be disabled
 by setting deprecation_warnings=False in ansible.cfg.

PLAY [Check if required variables are defined] *********************************
skipping: no hosts matched

PLAY [Clear cached facts] ******************************************************

PLAY [Gather facts] ************************************************************
2022-11-25 11:25:22.141140 | 52540049-b3bc-b519-c4be-00000000003f |       TASK | Gathering Facts
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
novacompute-0: Failed to connect to the host via ssh: ssh: connect to host
192.168.100.188 port 22: No route to host
2022-11-25 11:25:47.518861 | 52540049-b3bc-b519-c4be-00000000003f | UNREACHABLE | Gathering Facts | overcloud-novacompute-0
2022-11-25 11:25:47.525352 | 52540049-b3bc-b519-c4be-00000000003f |     TIMING | Gathering Facts | overcloud-novacompute-0 | 0:00:25.484844 | 25.38s

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
overcloud-novacompute-0    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
2022-11-25 11:25:47.537067 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-11-25 11:25:47.538103 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-11-25 11:25:47.539144 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:00:25.498694 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-11-25 11:25:47.540284 |                                 UUID |       Info |       Host |   Task Name |   Run Time
2022-11-25 11:25:47.541269 | 52540049-b3bc-b519-c4be-00000000003f |    SUMMARY | overcloud-novacompute-0 | Gathering Facts | 25.38s
2022-11-25 11:25:47.542267 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-11-25 11:25:47.543349 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-11-25 11:25:47.544421 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-11-25 11:25:47.545436 |  The following node(s) had failures: overcloud-novacompute-0
2022-11-25 11:25:47.546457 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Again, the node was not deleted from the system and the overcloud node delete command failed.
(undercloud) [stack@manager ~]$ openstack baremetal node list
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| ccf89e1a-b45f-437e-8e9f-3a605b614b1e | compute1    | fadcbce5-e925-4846-a996-f616a6b26ff5 | power on    | active             | False       |
| 825b1dc8-9875-44c5-b32c-2548d74797d4 | controller0 | b146a976-2258-4684-a0b3-157ecfe16beb | power on    | active             | False       |
| 316247a9-a06a-4e29-aefe-9a92ee77eb2c | controller1 | 96021f32-a27f-423b-ab07-ccc211f5875c | power on    | active             | False       |
| aace3156-d428-4464-a590-81d69a15d5d1 | controller2 | 9f802b46-8c21-44d7-b125-fb1700fe5d77 | power on    | active             | False       |
| 777a0672-ce34-49db-93ee-caec2a6dcf03 | storage1    | 05e73efe-139c-4635-9f1d-469832015355 | power on    | active             | False       |
| bc35858a-adf8-437c-8de3-d498bbff621d | storage2    | 17a28eb9-8b23-4c26-b27a-703cab2d50a8 | power on    | active             | False       |
| 54fff889-5d8b-496e-a0fb-affdea006bc1 | storage0    | 440fa182-7dc7-4ed4-bc1e-3695ed34e644 | power on    | active             | False       |
| a53f0d4b-3436-44ac-b83a-74a7413d4863 | compute2    | 39522868-301d-468d-a1e1-33e91a7e6e37 | power off   | active             | True        |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+


Actual Result:
The overcloud node delete command failed with an error message.

Expected Result:
The overcloud node delete command completes successfully.

Comment 1 Andy Stillman 2022-11-28 15:02:10 UTC
If it incurs a procedural change, please include this in the documentation change log.

Thanks,
Andy

Comment 2 Rahul Kaushal 2022-11-29 07:26:23 UTC
(In reply to Andy Stillman from comment #1)
> If it incurs a procedural change, please include this in the documentation
> change log.
> 
> Thanks,
> Andy

Hi Andy,

Thanks for the response.

Yes, this is a procedural change: the procedure as documented is not working as expected and needs to be updated with the correct steps.

We have already provided reproduction steps in the description.

Requesting the Red Hat team to look into it.

>>please include this in the documentation change log.
This statement is not clear to us. Can you explain it a little more?

If this bug does not fall into the documentation category, please assign it to the correct owner.

Thanks & Regards
Rahul Kaushal

Comment 3 fallen 2022-11-29 12:00:45 UTC
This bug is linked: https://bugzilla.redhat.com/show_bug.cgi?id=2147614

Hey Harald, 

Could you help with this procedure?

Thanks
Fiona

Comment 8 Rabi Mishra 2022-11-29 14:15:45 UTC
You're running node delete after sourcing overcloudrc [1]. You should source stackrc before running the command. The error is clear:

7. Execute overcloud node delete command, this command also failed below error mesg "no connection to the baremetal service was provided."

[1] (overcloud)[stack@manager ~]$ openstack overcloud node delete --stack overcloud --baremetal-deployment /home/stack/templates/overcloud-baremetal-deploy.yaml
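The root cause above (running the command with overcloud credentials instead of undercloud credentials) can be guarded against with a small shell check before invoking node delete. This is only a sketch: it assumes stackrc exports OS_CLOUD=undercloud, which is typical on RHOSP 17 but should be verified against your own stackrc, and the function name is illustrative.

```shell
# Guard: `openstack overcloud node delete` must run with undercloud
# credentials (stackrc), not overcloudrc. Assumes stackrc sets
# OS_CLOUD=undercloud; adjust the check to your environment.
check_undercloud_env() {
  if [ "${OS_CLOUD:-}" = "undercloud" ]; then
    echo "OK: undercloud credentials sourced"
    return 0
  fi
  echo "ERROR: source ~/stackrc before running overcloud node delete" >&2
  return 1
}

# Intended usage (commands not run here):
#   source ~/stackrc
#   check_undercloud_env && openstack overcloud node delete --stack overcloud \
#     --baremetal-deployment /home/stack/templates/overcloud-baremetal-deploy.yaml
```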

Comment 10 Rahul Kaushal 2022-11-30 04:14:03 UTC
(In reply to Rabi Mishra from comment #8)
> You're running node delete after sourcing overcloudrc[1]. You should source
> stackrc before running the command. The error is clear 
> 
> 7. Execute overcloud node delete command, this command also failed below
> error mesg "no connection to the baremetal service was provided."
> 
> [1] (overcloud)[stack@manager ~]$ openstack overcloud node delete --stack
> overcloud --baremetal-deployment
> /home/stack/templates/overcloud-baremetal-deploy.yaml

Thanks, Rabi, for pointing out the mistake in our execution procedure.

We will execute the procedure again after sourcing stackrc and share our findings on this bug.

Comment 11 Piyush Shukla 2022-11-30 12:20:05 UTC
Dear All,

Thank you for your investigation and analysis on this bug.

We have tried the node deletion procedure after sourcing stackrc, and this time the procedure works according to the steps in the document.
The functionality is working as expected, so we are closing this bug.

Thanks & Regards,
Piyush