Bug 1856922
| Summary: | [OSP16.1 RC] Unable to remove compute which has scale-out with --stack-only option and failed during ssh admin key insertion. | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Pradipta Kumar Sahoo <psahoo> |
| Component: | openstack-tripleo | Assignee: | James Slagle <jslagle> |
| Status: | CLOSED DUPLICATE | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 16.1 (Train) | CC: | aschultz, bdobreli, dwilson, mburns, psahoo, smalleni |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-09 13:22:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
*** This bug has been marked as a duplicate of bug 1857365 ***
Description of problem:
In a large-scale test, a few nodes failed during admin SSH key insertion, even though the overcloud stack was updated successfully with the "--stack-only" option. These faulty nodes cannot be removed from the overcloud stack, because the node delete command fails during config-download. There appears to be no option for removing a node once it has been added to the overcloud Heat stack.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.0 RC (Train)
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch

How reproducible:
100% reproduced

Steps to Reproduce:
1. During the admin SSH key insertion, the IP below failed with an SSH timeout. The faulty node was identified as follows:

$ grep "Timed out" overcloud_admin_key.log
2020-07-14 12:33:05.842 458660 ERROR openstack [-] Timed out waiting for port 22 from 192.168.3.113: tripleoclient.exceptions.DeploymentError: Timed out waiting for port 22 from 192.168.3.113

$ openstack server list|grep 192.168.3.113
| c37455ff-bf6e-451f-a063-e39006eaceca | overcloud-fc640compute-32 | ACTIVE | ctlplane=192.168.3.113 | overcloud-full | fc640-compute |

$ openstack baremetal node list |grep c37455ff-bf6e-451f-a063-e39006eaceca
| 3d20ddcf-ad24-4cc3-912b-98dd5fe189fe | None | c37455ff-bf6e-451f-a063-e39006eaceca | power on | active | False |

2. Attempt to delete the faulty node:

$ openstack overcloud node delete --stack 94a1e1aa-c10e-4597-8050-4c95b8118388 c37455ff-bf6e-451f-a063-e39006eaceca
Are you sure you want to delete these overcloud nodes [y/N]? y
Deleting the following nodes from stack overcloud:
- c37455ff-bf6e-451f-a063-e39006eaceca
Waiting for messages on queue 'tripleo' with no timeout.
Config downloaded at /var/lib/mistral/overcloud
Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
Running ansible playbook at /var/lib/mistral/overcloud/scale_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress.
...

PLAY [Gather facts from undercloud] ********************************************
skipping: no hosts matched
[WARNING]: Found variable using reserved name: ignore_unreachable

PLAY [Gather facts from overcloud] *********************************************

TASK [Gathering Facts] *********************************************************
Tuesday 14 July 2020 12:52:26 +0000 (0:00:00.091) 0:00:00.091 **********
[WARNING]: Failure using method (v2_runner_on_start) in callback plugin (<ansible.plugins.callback.tripleo.CallbackModule object at 0x7f85d8c1ae48>): 'show_per_host_start'
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-fc640compute-32: Failed to connect to the host via ssh: ssh: connect to host 192.168.3.113 port 22: No route to host
fatal: [overcloud-fc640compute-32]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.3.113\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.3.113 port 22: No route to host\r\n", "skip_reason": "Host overcloud-fc640compute-32 is unreachable", "unreachable": true}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
overcloud-fc640compute-32 : ok=0 changed=0 unreachable=1 failed=0 skipped=1 rescued=0 ignored=0
Tuesday 14 July 2020 12:57:06 +0000 (0:04:40.113) 0:04:40.205 **********
===============================================================================
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Scale-down configuration failed.

Expected results:
The compute node can be removed even though it was deployed with the --stack-only option.

Additional info:
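When many nodes are affected, the manual grep in step 1 can be scripted. The following is a minimal sketch, not part of tripleoclient: it assumes the log line format and the PLAY RECAP row format exactly as quoted in this report, and the function names `timed_out_ips` and `unreachable_hosts` are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed format, taken from the log line quoted in this report:
#   "Timed out waiting for port 22 from 192.168.3.113"
TIMEOUT_RE = re.compile(
    r"Timed out waiting for port 22 from (\d{1,3}(?:\.\d{1,3}){3})"
)

# Assumed format of a PLAY RECAP row, as quoted in this report:
#   "overcloud-fc640compute-32 : ok=0 changed=0 unreachable=1 failed=0 ..."
RECAP_RE = re.compile(
    r"(\S+)\s*:\s*ok=\d+\s+changed=\d+\s+unreachable=(\d+)"
)


def timed_out_ips(log_text):
    """Return the unique IPs that timed out on port 22, in first-seen order."""
    seen = []
    for ip in TIMEOUT_RE.findall(log_text):
        if ip not in seen:
            seen.append(ip)
    return seen


def unreachable_hosts(ansible_text):
    """Return host names whose PLAY RECAP row reports unreachable > 0."""
    hosts = []
    for line in ansible_text.splitlines():
        match = RECAP_RE.search(line)
        if match and int(match.group(2)) > 0 and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


if __name__ == "__main__":
    # Sample inputs copied from the output quoted in this report.
    sample_key_log = (
        "2020-07-14 12:33:05.842 458660 ERROR openstack [-] Timed out waiting "
        "for port 22 from 192.168.3.113: "
        "tripleoclient.exceptions.DeploymentError: "
        "Timed out waiting for port 22 from 192.168.3.113"
    )
    sample_recap = (
        "overcloud-fc640compute-32 : ok=0 changed=0 unreachable=1 failed=0 "
        "skipped=1 rescued=0 ignored=0"
    )
    print(timed_out_ips(sample_key_log))
    print(unreachable_hosts(sample_recap))
```

The resulting IPs and host names can then be matched against `openstack server list` output by hand, as in step 1. The sketch only reads log text and does not call any OpenStack API.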