Bug 2120768 - [RFE] configurable scheduling back to a compute node after instance ha fences it
Summary: [RFE] configurable scheduling back to a compute node after instance ha fences it
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 18.0 (Zed)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-23 17:22 UTC by Jeremy
Modified: 2023-07-24 05:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-18340 | 0 | None | None | None | 2022-08-23 17:24:24 UTC

Description Jeremy 2022-08-23 17:22:29 UTC
Description of problem:



The customer does not want Instance HA to automatically start nova-compute after a compute node comes back up from fencing. Most of the failures they see involve memory going bad. If a DIMM fails and the compute node comes back up without it, and therefore with less RAM, the node is still not ready for use. If VMs are scheduled there again after fencing, the admin still has to manually disable the compute service and migrate the VMs off in order to fix the hardware.

Is there a way to have the admin confirm that the compute node is good before scheduling to that node is allowed to resume? We tried disabling the compute unfence trigger, since the docs say that is what unfences the node when it comes back up, but that did not work. Manually disabling the compute service does not seem like a good option either, since the admin may not know exactly when fencing happens.
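
As a rough illustration of the requested behavior (this is not an existing Instance HA feature), the gate could look something like the Python sketch below. The function names, the 2% tolerance, and the memory figures are hypothetical. The only real interface assumed is the `openstack compute service set --disable --disable-reason` command, which the sketch builds but does not run.

```python
from typing import List


def memory_looks_healthy(expected_mb: int, reported_mb: int,
                         tolerance: float = 0.02) -> bool:
    """Return True if reported RAM is within `tolerance` of the expected baseline.

    `expected_mb` is a hypothetical per-node baseline recorded before the failure.
    """
    return reported_mb >= expected_mb * (1.0 - tolerance)


def allow_scheduling(expected_mb: int, reported_mb: int,
                     admin_confirmed: bool) -> bool:
    """Resume scheduling only if memory is intact AND an admin has signed off."""
    return admin_confirmed and memory_looks_healthy(expected_mb, reported_mb)


def disable_command(host: str, reason: str) -> List[str]:
    """Build the CLI call that keeps nova-compute on `host` out of scheduling."""
    return ["openstack", "compute", "service", "set",
            "--disable", "--disable-reason", reason,
            host, "nova-compute"]


if __name__ == "__main__":
    # Hypothetical node: 256 GiB expected, one 16 GiB DIMM missing after fencing.
    if not allow_scheduling(262144, 245760, admin_confirmed=False):
        print(" ".join(disable_command("compute-0", "awaiting admin check after fencing")))
```

The point of the sketch is that both conditions must hold: a node that comes back with less RAM stays disabled even if someone confirms it, and a node with full RAM stays disabled until someone does.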

