Bug 1856270
| Summary: | Machine cannot be deleted if it is stuck in Provisioning status | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
| Component: | Cloud Compute | Assignee: | egarcia |
| Cloud Compute sub component: | OpenStack Provider | QA Contact: | David Sanz <dsanzmor> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | low | CC: | adduarte, ansverma, egarcia, m.andre, mfedosin, mgugino, oarribas, pprinett |
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-10-27 16:13:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
sunzhaohua
2020-07-13 08:55:25 UTC
So, what we can do as a stopgap is remove the finalizer when we get a "Resource not found" delete error, and force a manual delete. However, I am curious as to why it is failing to delete the instance stuck in provisioning in the first place. Is there more info about the instance, or about why you think that might have happened, that you can give me?

Removing the finalizer if there is still a VM that needs to be removed is not what we want to do. The finalizer should only be removed if we know the instance is gone. If there is a situation that requires an OpenStack administrator to remove the instance (e.g., we can't do it from the actuator/provider), then we should not remove the finalizer, and we should let the machine continue to fail. This would be a bug in OpenStack, and the machine being stuck in deleting is exactly what we want. After the user removes the instance from the cloud, the actuator will work as normal and the machine will go away, because the cloud (OpenStack) is now returning the proper information.

If there is something that can be done inside the actuator to either 1) verify the instance is actually gone or 2) make the instance go away via some other API call, we need to do one of those two things. In any case, removing the finalizer for an unhandled error is not what we want.

If the cloud will always return this phantom instance (bug in OpenStack), and we cannot detect this condition via the API, the answer is to let the machine continue to fail, create some documentation around this as a known issue, and instruct the user (not the machine-controller) to remove this finalizer if this condition is encountered.

In this case, we will just document the workaround.

Verified, as the fix is in the docs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
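
The rule argued for in the comments above (remove the finalizer only once the instance is known to be gone, never merely because a delete call failed) can be sketched roughly as follows. This is an illustrative Python model, not the actual actuator code: `Machine`, `CloudError`, `InstanceState`, `reconcile_delete`, and the two callables are hypothetical names, and the finalizer string is used only as an example value.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List

# Example finalizer name, assumed here purely for illustration.
FINALIZER = "machine.machine.openshift.io"


class CloudError(Exception):
    """Hypothetical error raised by the cloud client on a failed call."""


class InstanceState(Enum):
    PRESENT = "present"   # the cloud still reports the instance
    GONE = "gone"         # the cloud positively reports no such instance
    UNKNOWN = "unknown"   # the lookup itself failed or was inconclusive


@dataclass
class Machine:
    name: str
    finalizers: List[str] = field(default_factory=list)


def reconcile_delete(
    machine: Machine,
    delete_instance: Callable[[str], None],
    instance_state: Callable[[str], InstanceState],
) -> bool:
    """Try to delete the instance; drop the finalizer only if it is gone.

    Returns True when the finalizer was removed, False when the machine
    should stay in the deleting phase and be retried later.
    """
    try:
        delete_instance(machine.name)
    except CloudError:
        # A failed delete (even "Resource not found") is not treated as
        # proof that the instance is gone; fall through to the check below.
        pass

    if instance_state(machine.name) is InstanceState.GONE:
        machine.finalizers = [f for f in machine.finalizers if f != FINALIZER]
        return True

    # Instance still present or state unknown: keep the finalizer so the
    # machine keeps failing visibly until the cloud is cleaned up.
    return False
```

With this shape, a phantom instance that the cloud keeps reporting leaves the machine stuck in deleting, which matches the outcome described above: a user, not the controller, removes the finalizer by hand after following the documented workaround.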