Bug 1579785
| Field | Value |
|---|---|
| Summary | On split-stack setups, leftover node information prevents a node from rejoining the cloud |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 12.0 (Pike) |
| Target Release | 12.0 (Pike) |
| Target Milestone | z3 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Keywords | Triaged, ZStream |
| Reporter | Sven Michels <svmichel> |
| Assignee | Martin Schuppert <mschuppe> |
| QA Contact | OSP DFG:Compute <osp-dfg-compute> |
| CC | awaugama, berrange, dasmith, eglynn, gferrazs, ipetrova, jamsmith, jhakimra, kchamart, lmarsh, lyarwood, madgupta, mschuppe, sbauza, sferdjao, sgordon, srevivo, svmichel, tmicheli, vromanso |
| Fixed In Version | openstack-nova-16.1.3-1.el7ost |
| Doc Type | Bug Fix |
| Cloned As | 1591788 (view as bug list) |
| Bug Blocks | 1591788, 1731150, 1781142 |
| Type | Bug |
| Last Closed | 2018-08-20 12:55:30 UTC |

Doc Text:

Prior to this update, to re-discover a compute node record after deleting a host mapping from the API database, the compute node record had to be manually marked as unmapped. Otherwise, a compute node with the same hostname could not be mapped back to the cell from which it was removed.

With this update, the compute node record is automatically marked as unmapped when you delete a host from a cell, enabling a compute node with the same hostname to be added to the cell during host discovery.
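To illustrate the Doc Text: before this fix, an operator who had deleted a host mapping had to mark the stale compute node record as unmapped by hand before host discovery would consider the host again. A minimal sketch of that manual step, assuming direct database access from a controller; the database name (nova), the hostname, and the exact schema details are illustrative and should be verified against your deployment before running anything like this:

~~~
# Hypothetical pre-fix workaround: flip the 'mapped' flag on the stale
# compute node record so discover_hosts will pick the host up again.
# 'compute-b.example.com' is an illustrative hostname.
$ mysql nova -e "UPDATE compute_nodes SET mapped = 0 \
    WHERE hypervisor_hostname = 'compute-b.example.com' AND deleted = 0;"

# Then re-run host discovery so the returning node is mapped into the cell.
$ nova-manage cell_v2 discover_hosts --verbose
~~~

With the fix, delete_host performs the unmapping itself, so this database surgery is no longer needed.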
Description (Sven Michels, 2018-05-18 10:35:14 UTC)
Comment (Dan Smith):

This was fixed in upstream pike: https://review.openstack.org/#/c/553829/

Comment (Sven Michels):

(In reply to Dan Smith from comment #1)
> This was fixed in upstream pike:
> https://review.openstack.org/#/c/553829/

Hey Dan, but this was only for cell deletion; the issue we see is when a node is deleted, right? We would need to add a node delete from the cell to the "service delete" of compute, or to the templates when we scale down a node. The first would probably be the best solution, as this can also happen without director.

Cheers, Sven

Comment (Dan Smith):

Sorry, this is the upstream fix for delete_host that I was thinking of, which unmaps the node when deleting the host mapping: https://review.openstack.org/#/c/527560/

Maybe you could elaborate on what you mean by "remove compute resource" so I know what you're deleting and re-adding? The thing is, if the compute node and service come back in the same configuration (i.e. the same hostname), the old host mapping should still be sufficient to find it again (i.e. you shouldn't need to discover again).

Comment (Sven Michels):

Hey Dan, sorry for the delay, missed that one :(

To clarify: the issue first surfaced when we started to do some scale-in and scale-out tests in our test environment. What we basically did:

- install RHOSP12 with 2 compute nodes
- scale to 3 nodes (adding node c)
- scale to 2 nodes (removing a, b or c)
- reinstall the removed node
- scale to 3 nodes (adding the removed node back)

This is in a split-stack environment, so the nodes are not installed by ironic but externally by the customer. For that reason the node name won't change: if you remove compute-b and reinstall it, it will be compute-b again.

In this scenario, the node is only removed from the environment within heat. Since no delete_host is executed, the whole mapping stays as is. So if you try to bring a node "back" (or, to simplify: if you add a node which has the exact same hostname as a node had before), it doesn't work. The existing, orphaned entry prevents the node from being re-added to the cell.

So the commit you're referring to might remove the need to fiddle around in the database manually, as a delete_host would then be enough. But we would need to add delete_host as a required step to our documentation, or add a task to our templates that executes exactly that delete_host command. Since we already ask the customer to disable and remove the service manually, the first option might be easier (except that this needs to be executed inside a container, as there is no external command for it, right?).

If you still miss something, please let me know.

Cheers and thanks, Sven
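The orphaned entry described above can be confirmed from a controller: the removed host still appears in the cell's host list even after its compute service was deleted. A quick diagnostic sketch, reusing the containerized nova-manage invocation from the procedure below; the hostname is illustrative:

~~~
# List the hosts mapped to each cell. If compute-b still shows up here
# after its service was deleted, its stale host mapping is what blocks
# the re-added node from being discovered.
$ nova-manage --config-dir /var/lib/config-data/puppet-generated/nova/etc/nova \
    cell_v2 list_hosts
~~~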
Comment:

This is not an issue only in the deployed-server scenario. It can also happen when we use hostname mappings to keep the same hostname even if the internal index goes up. What we'd need is:

1) https://review.openstack.org/#/c/527560/ as part of the next OSP12 maintenance release (openstack-nova 16.1.x).
2) a doc bug to enhance our scale-down procedure [1] so that the compute is also removed from the cell.

The existing instructions read:

~~~
...
Finally, remove the node's Compute service:

(undercloud) $ source ~/stack/overcloudrc
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service delete [service-id]
~~~

Here we need to add:

~~~
Log in to one of the overcloud controllers and delete the removed host:

$ ssh heat-admin@overcloud-controller-X
$ nova-manage --config-dir /var/lib/config-data/puppet-generated/nova/etc/nova cell_v2 list_hosts
$ nova-manage --config-dir /var/lib/config-data/puppet-generated/nova/etc/nova cell_v2 delete_host --cell_uuid <Cell UUID> --host <Hostname>
~~~

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html-single/director_installation_and_usage/#sect-Removing_Compute_Nodes

Comment:

This bugzilla has been removed from the release since it has not been triaged, and needs to be reviewed for targeting another release.

Comment:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2332
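To complete the scale-down-and-re-add cycle from the proposed procedure above: once delete_host has removed the stale mapping, a node returning with the same hostname should be picked up by normal host discovery. A hedged sketch, reusing the containerized config path from that procedure; run it from a controller after nova-compute is running on the re-added node:

~~~
# Map the re-added compute node into the cell.
$ nova-manage --config-dir /var/lib/config-data/puppet-generated/nova/etc/nova \
    cell_v2 discover_hosts --verbose

# Confirm the host mapping was recreated.
$ nova-manage --config-dir /var/lib/config-data/puppet-generated/nova/etc/nova \
    cell_v2 list_hosts
~~~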