Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1856922

Summary: [OSP16.1 RC] Unable to remove a compute node that was scaled out with the --stack-only option and failed during SSH admin key insertion.
Product: Red Hat OpenStack
Reporter: Pradipta Kumar Sahoo <psahoo>
Component: openstack-tripleo
Assignee: James Slagle <jslagle>
Status: CLOSED DUPLICATE
QA Contact: Arik Chernetsky <achernet>
Severity: high
Docs Contact:
Priority: high
Version: 16.1 (Train)
CC: aschultz, bdobreli, dwilson, mburns, psahoo, smalleni
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-09 13:22:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Pradipta Kumar Sahoo 2020-07-14 17:32:10 UTC
Description of problem:
In a large-scale test, a few nodes failed during admin SSH key insertion, although the overcloud stack had been updated successfully with the "--stack-only" option.
These faulty nodes cannot be removed from the overcloud stack, because the node delete command fails during config-download.

There appears to be no option to remove such a node, even though it has already been added to the overcloud Heat stack.


Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.0 RC (Train)
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch

How reproducible: 100%

Steps to Reproduce:
1. During admin SSH key insertion, the IP address below failed with an SSH timeout. It maps to the faulty node shown in the following output.
$ grep "Timed out" overcloud_admin_key.log
2020-07-14 12:33:05.842 458660 ERROR openstack [-] Timed out waiting for port 22 from 192.168.3.113: tripleoclient.exceptions.DeploymentError: Timed out waiting for port 22 from 192.168.3.113

$ openstack server list|grep 192.168.3.113
| c37455ff-bf6e-451f-a063-e39006eaceca | overcloud-fc640compute-32   | ACTIVE | ctlplane=192.168.3.113 | overcloud-full | fc640-compute   |

$ openstack baremetal node list |grep c37455ff-bf6e-451f-a063-e39006eaceca
| 3d20ddcf-ad24-4cc3-912b-98dd5fe189fe | None | c37455ff-bf6e-451f-a063-e39006eaceca | power on    | active             | False       |

2. Attempt to delete the faulty node from the overcloud stack:
$ openstack overcloud node delete --stack 94a1e1aa-c10e-4597-8050-4c95b8118388 c37455ff-bf6e-451f-a063-e39006eaceca
Are you sure you want to delete these overcloud nodes [y/N]? y
Deleting the following nodes from stack overcloud:
- c37455ff-bf6e-451f-a063-e39006eaceca
Waiting for messages on queue 'tripleo' with no timeout.
Config downloaded at /var/lib/mistral/overcloud
Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
Running ansible playbook at /var/lib/mistral/overcloud/scale_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress. ...


PLAY [Gather facts from undercloud] ********************************************
skipping: no hosts matched
[WARNING]: Found variable using reserved name: ignore_unreachable

PLAY [Gather facts from overcloud] *********************************************

TASK [Gathering Facts] *********************************************************
Tuesday 14 July 2020  12:52:26 +0000 (0:00:00.091)       0:00:00.091 **********
[WARNING]: Failure using method (v2_runner_on_start) in callback plugin
(<ansible.plugins.callback.tripleo.CallbackModule object at 0x7f85d8c1ae48>):
'show_per_host_start'
[WARNING]: Unhandled error in Python interpreter discovery for host
overcloud-fc640compute-32: Failed to connect to the host via ssh: ssh: connect
to host 192.168.3.113 port 22: No route to host
fatal: [overcloud-fc640compute-32]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.3.113\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.3.113 port 22: No route to host\r\n", "skip_reason": "Host overcloud-fc640compute-32 is unreachable", "unreachable": true}


NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
overcloud-fc640compute-32  : ok=0    changed=0    unreachable=1    failed=0    skipped=1    rescued=0    ignored=0

Tuesday 14 July 2020  12:57:06 +0000 (0:04:40.113)       0:04:40.205 **********
===============================================================================

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Scale-down configuration failed.
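
When many nodes are involved, the two outputs quoted above can be checked mechanically rather than by eye. The following is a minimal Python sketch and not part of tripleoclient. The function names `timed_out_ips` and `unreachable_hosts` are hypothetical, and the patterns assume the exact message formats shown in this report (the "Timed out waiting for port 22" line and the Ansible PLAY RECAP line).

```python
import re

# Matches the timeout message seen in overcloud_admin_key.log in this report.
# Assumption: the message format is exactly as quoted above (IPv4 only).
TIMEOUT_RE = re.compile(
    r"Timed out waiting for port 22 from (\d{1,3}(?:\.\d{1,3}){3})"
)

# Matches one host line of an Ansible PLAY RECAP and captures the
# host name and its "unreachable" counter.
RECAP_RE = re.compile(r"^(\S+)\s*:\s*ok=\d+.*?\bunreachable=(\d+)")


def timed_out_ips(log_text):
    """Return the sorted, de-duplicated IPs that timed out on port 22."""
    return sorted(set(TIMEOUT_RE.findall(log_text)))


def unreachable_hosts(ansible_output):
    """Return host names whose PLAY RECAP line reports unreachable > 0."""
    hosts = []
    for line in ansible_output.splitlines():
        match = RECAP_RE.match(line.strip())
        if match and int(match.group(2)) > 0:
            hosts.append(match.group(1))
    return hosts


if __name__ == "__main__":
    # Sample lines copied from this report.
    key_log = (
        "2020-07-14 12:33:05.842 458660 ERROR openstack [-] Timed out "
        "waiting for port 22 from 192.168.3.113: "
        "tripleoclient.exceptions.DeploymentError: Timed out waiting for "
        "port 22 from 192.168.3.113\n"
    )
    recap = (
        "overcloud-fc640compute-32  : ok=0    changed=0    unreachable=1    "
        "failed=0    skipped=1    rescued=0    ignored=0\n"
    )
    print(timed_out_ips(key_log))
    print(unreachable_hosts(recap))
```

The IPs returned by the first function can then be matched against `openstack server list` output, as done manually in step 1.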


Expected results: Compute nodes that were deployed with the --stack-only option can be removed from the overcloud stack.


Additional info:

Comment 1 Alex Schultz 2020-09-09 13:22:55 UTC

*** This bug has been marked as a duplicate of bug 1857365 ***