Bug 2120768

Summary: [RFE] configurable scheduling back to a compute node after instance ha fences it
Product: Red Hat OpenStack
Reporter: Jeremy <jmelvin>
Component: openstack-tripleo-common
Assignee: Nobody <nobody>
Status: NEW
QA Contact: David Rosenfeld <drosenfe>
Severity: medium
Docs Contact:
Priority: medium
Version: 18.0 (Zed)
CC: bgibizer, dasmith, eglynn, jhakimra, kchamart, lmiccini, mburns, sbauza, sgordon, slinaber, spower, vromanso
Target Milestone: ---
Keywords: FutureFeature, RFE
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug

Description Jeremy 2022-08-23 17:22:29 UTC
Description of problem:

Customer does not like the fact that Instance HA automatically starts nova-compute after the compute node comes back up from fencing. The reasoning is that most of the failures they see are caused by bad memory: if memory goes bad and the compute comes back up without that DIMM (and therefore with less RAM), the node still isn't ready for use. If VMs start getting scheduled there again right after fencing, they still have to manually disable the compute service and migrate the VMs off in order to fix the hardware. We are wondering if there is a way to have the admin confirm the compute is good before scheduling to that node is allowed to resume. We tried disabling the compute-unfence-trigger resource, since the docs say that is what unfences the node when it comes back up, but that didn't work. Manually disabling the compute service doesn't seem like a good option either, since the admin may not know exactly when fencing happens.
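
For reference, the manual workaround described above (keeping the recovered node out of scheduling until an operator confirms it is healthy) could in principle be scripted. Below is a minimal sketch using the openstacksdk Python library; the cloud name "overcloud", the host name, and the idea of invoking it from some post-recovery hook are illustrative assumptions, not existing Instance HA behavior.

# Sketch only: disable nova-compute on a recovered host so the scheduler
# will not place new instances there until an admin re-enables the service.
# Assumes a clouds.yaml entry named "overcloud" with admin credentials;
# the host name below is a placeholder.
import openstack

HOST = "compute-0.example.com"  # hypothetical fenced/recovered compute host

conn = openstack.connect(cloud="overcloud")

# Find the nova-compute service record for the recovered host.
for svc in conn.compute.services():
    if svc.binary == "nova-compute" and svc.host == HOST:
        # Disable the service; scheduling to this node stops until an
        # operator verifies the hardware and enables it again.
        conn.compute.disable_service(
            svc,
            host=svc.host,
            binary=svc.binary,
            disabled_reason="post-fencing hardware check pending",
        )
        print(f"Disabled nova-compute on {HOST}; re-enable after verification.")
        break

The same effect is available from the CLI with "openstack compute service set --disable --disable-reason <reason> <host> nova-compute", which is essentially the manual step the customer wants to avoid having to time by hand.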