Bug 1942079
| Summary: | Instances are stuck in scheduling/building state when scheduled on specific compute host | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alex Stupnikov <astupnik> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED NOTABUG | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 13.0 (Queens) | CC: | alifshit, dasmith, dhruv, eglynn, hberaud, igallagh, jhakimra, kchamart, lmiccini, ltamagno, mwitt, sbauza, sgordon, smooney, vromanso |
| Target Milestone: | --- | Flags: | ltamagno: needinfo? |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-21 15:14:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Alex Stupnikov, 2021-03-23 15:39:26 UTC
We discussed this on the DFG:Compute bug triage call today. If there are no (needed) instances on the affected compute host (C1F-OPS-CMPC20/c1f-ops-cmpc20), the simplest way to fix this would be to scale in (remove the compute) and then scale out again to re-add it to the environment. Is this something the customer can do?

Unfortunately, there are instances on the affected compute. I am not sure whether the remaining computes have enough resources to host them, or how migration would work with this RPC problem in place. I am wondering what other options we have, and how time-consuming and disruptive they would be. Kind Regards, Alex.

We discussed this some more during bug triage today. Could we also get a placement database dump, please? We need to understand how the compute host renames have affected its resource provider in Placement. That being said, if you can convince the customer to drain the node, scale in, and scale back out, that would be the safest way to fix this. By draining first, they shouldn't need extra hardware (assuming the cloud isn't at full capacity).

Now that the customer case is resolved, I am closing this as NOTABUG, as the root cause was determined to be an unsupported operation. The DB has now been fixed and the configs updated, so I am closing this to reflect that.
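For reference, the drain-then-scale approach discussed above can be sketched with standard OpenStack client commands. This is a hedged outline only, not from the case itself: the host and server names are placeholders, `openstack server migrate --live-migration` assumes a recent python-openstackclient (older Queens-era clients used `--live <host>`), and the `resource provider` commands require the osc-placement plugin.

```shell
# Sketch: drain a compute host before scale-in (hypothetical names).

# 1. Stop the scheduler from placing new instances on the affected host.
openstack compute service set --disable \
    --disable-reason "drain before scale-in (BZ 1942079)" \
    c1f-ops-cmpc20.localdomain nova-compute

# 2. Move existing instances off the host, one per server.
openstack server migrate --live-migration <server-id>

# 3. Inspect what Placement thinks the host's resource provider is,
#    to see how the host rename affected it (needs osc-placement).
openstack resource provider list --name c1f-ops-cmpc20.localdomain
```

Once the host is empty, it can be scaled in and scaled back out; a fresh nova-compute registration then creates a clean resource provider in Placement.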