Bug 1868104
Summary: | Baremetal actuator should not delete Machine objects | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Zane Bitter <zbitter> |
Component: | Cloud Compute | Assignee: | Zane Bitter <zbitter> |
Cloud Compute sub component: | BareMetal Provider | QA Contact: | Daniel <dmaizel> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | medium | CC: | abeekhof, beth.white, dhellmann, mgugino, msluiter, nyehia, sdasu, shardy, stbenjam, vsibirsk |
Version: | 4.6 | Keywords: | TestBlockerForLayeredProduct, Triaged |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
The baremetal actuator in CAPBM was written before the relevant error-handling code was available in the Cluster API. As a result, when the underlying Host is deleted, the actuator responds by deleting the Machine object as well.
Consequence:
This conflicts with the intended operation of the Machine controller, in which a Machine object is never deleted except by the user. A Machine that fails should be put into a failed state and left for the machine remediation controller to try to recover, and ultimately for the user to delete.
Fix:
1. Set "InsufficientResourcesMachineError" on Machines that are searching (unsuccessfully) for an available host. This ensures that such Machines are the first victims on scale down.
2. Move Machines into the "Failed" phase if the Host is deprovisioned.
3. Don't delete failed Machines; leave this task to the MachineHealthCheck (see openshift/machine-api-operator#688).
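The three steps above can be sketched roughly as follows. This is a minimal illustration only, not the actual CAPBM code: the `Machine` struct, the `HostState` values, and the `reconcile` function are hypothetical names invented for this example, and the real actuator may record the error and the phase through different fields or APIs. Only the "InsufficientResourcesMachineError" reason and the "Failed" phase come from the fix description.

```go
package main

import "fmt"

// HostState is a hypothetical summary of what the actuator found when it
// looked for the Host backing a Machine.
type HostState int

const (
	HostNotFound      HostState = iota // still searching; no available host
	HostProvisioned                    // host assigned and provisioned
	HostDeprovisioned                  // host was deleted or deprovisioned
)

// Phase names used in this sketch (assumed, apart from "Failed").
const (
	PhaseProvisioning = "Provisioning"
	PhaseRunning      = "Running"
	PhaseFailed       = "Failed"
)

// InsufficientResources is the error reason named in the fix description.
const InsufficientResources = "InsufficientResourcesMachineError"

// Machine is a hypothetical, heavily simplified stand-in for a Machine object.
type Machine struct {
	Name        string
	Phase       string
	ErrorReason string
}

// reconcile updates the Machine's status from the host state. It never
// deletes the Machine: removal of failed Machines is left to the
// MachineHealthCheck or to the user.
func reconcile(m *Machine, host HostState) {
	switch host {
	case HostNotFound:
		// Step 1: flag the Machine so it is preferred on scale down.
		m.ErrorReason = InsufficientResources
	case HostDeprovisioned:
		// Step 2: the host is gone, so mark the Machine as failed.
		m.Phase = PhaseFailed
	case HostProvisioned:
		m.Phase = PhaseRunning
		m.ErrorReason = ""
	}
	// Step 3: there is deliberately no deletion path here.
}

func main() {
	waiting := &Machine{Name: "worker-0", Phase: PhaseProvisioning}
	reconcile(waiting, HostNotFound)
	fmt.Printf("%s: phase=%s reason=%q\n", waiting.Name, waiting.Phase, waiting.ErrorReason)

	orphaned := &Machine{Name: "worker-1", Phase: PhaseRunning}
	reconcile(orphaned, HostDeprovisioned)
	fmt.Printf("%s: phase=%s reason=%q\n", orphaned.Name, orphaned.Phase, orphaned.ErrorReason)
}
```

The point of the sketch is that every branch only changes status on the Machine; what happens to a failed Machine afterwards is decided elsewhere.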
Result:
Machine objects are no longer deleted automatically. Failed Machines are now handled as described above, which matches the intended behavior.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:15:27 UTC | Type: | Bug |
Bug Depends On: | 1901040, 1909682 | ||
Bug Blocks: |
Description
Zane Bitter
2020-08-11 17:09:53 UTC
(In reply to Zane Bitter from comment #0)
> Potential complications:
> * On baremetal, the Machine Remediation controller will attempt to remediate
>   by rebooting, which obviously is not the appropriate way to handle the case
>   where the Host has been deleted.

We're hoping to have an escalation path from reboot to deletion in 4.7.

*** Bug 1840581 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633