Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2240825

Summary: [DOWNSTREAM-ONLY] Block nova-compute startup if symptom of host rename has been detected
Product: Red Hat OpenStack
Reporter: Artom Lifshitz <alifshit>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED ERRATA
QA Contact: Jason Grosso <jgrosso>
Severity: high
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: dasmith, eglynn, jgrosso, jhakimra, joflynn, kchamart, kgilliga, mariel, osp-dfg-compute, pgrist, sbauza, sgordon, tvignaud, vromanso
Target Milestone: z6
Keywords: Patch, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-20.6.2-2.20230814165223.el8ost
Doc Type: Enhancement
Doc Text:
This enhancement blocks the Compute service (nova) startup if symptoms of a host rename have been detected. Compute hosts in a running deployment should never be renamed, because a rename has catastrophic consequences for resource tracking and for the ability to create new instances or migrate existing ones. Before this enhancement, it was technically possible to rename a Compute host. With this update, the Compute service attempts to detect symptoms of its Compute host being renamed and does not start if a host rename is detected. This prevents resource tracking corruption and allows the operator to undo the rename before any damage occurs to the deployment. For more information, see link:https://access.redhat.com/articles/7040980[Troubleshooting Compute host name change detection].
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-11-08 19:19:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2242123

Description Artom Lifshitz 2023-09-26 19:28:38 UTC
Description of problem:

All of Nova's resource tracking (and adjacent tracking of things, like the records we write to various resources in Cinder, Neutron, and Placement) presupposes permanent, unchanging host names.

Sometimes intentionally, more often than not by accident, the host name changes. Nova has no safeguards in place for that case, starts up as normal, and when instances are created on or migrated to that host, resource tracking breaks.

There is a new safeguard in place as of OSP 18 (upstream Antelope) [1], and it is in the process of being backported to 17.1 [2], but backporting it to 16.2 is not realistic.

In its stead, implement a much simpler downstream-only workaround that aborts startup if there are libvirt domains on the node but no compute node record exists. Such a situation can only arise if something went horribly wrong, most likely a compute host rename, so aborting startup makes sense.
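
To make the proposed check concrete, here is a rough sketch of the decision it describes. This is not the actual downstream patch: the function name `abort_if_host_renamed`, the `HostRenameSuspected` exception, and the way the libvirt domains and the compute node record are passed in are all assumptions made for illustration.

```python
from typing import Optional, Sequence


class HostRenameSuspected(RuntimeError):
    """Hypothetical exception used to abort startup."""


def abort_if_host_renamed(
    hostname: str,
    libvirt_domains: Sequence[str],
    compute_node_record: Optional[dict],
) -> None:
    """Raise if guests exist locally but no compute node record matches hostname.

    All parameters are illustrative; a real service would look the domains up
    through libvirt and the record up in its own database.
    """
    if compute_node_record is not None:
        # Known host: a record already exists under the current name.
        return
    if not libvirt_domains:
        # Brand-new host: no record yet and no guests, so startup may proceed
        # and create the record as usual.
        return
    # Guests are present but nothing is recorded under this name, which is
    # the symptom described above.
    raise HostRenameSuspected(
        f"{len(libvirt_domains)} libvirt domain(s) found on {hostname!r}, "
        "but no compute node record exists for that name; "
        "refusing to start (possible host rename)"
    )
```

Note that a freshly deployed host, with neither domains nor a record, passes the check, so new computes can still register normally.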

Version-Release number of selected component (if applicable):

16.2

How reproducible:

100%

Steps to Reproduce:
1. Rename a compute host (this normally happens by accident).
2. Restart nova-compute.
3. Do instance operations (create, migrate, etc.) on the renamed compute host.

Actual results:

Resource tracking explodes and everything is on fire.

Expected results:

Not fires?

Additional info:

[1] https://review.opendev.org/c/openstack/nova/+/863920/17
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2192710

Comment 22 errata-xmlrpc 2023-11-08 19:19:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 (Train) bug fix and enhancement advisory) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6307

Comment 23 Red Hat Bugzilla 2024-03-08 04:26:15 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.