Bug 1838430

Summary: [Metal] Support Machine Remediation
Product: OpenShift Container Platform Reporter: Nir <nyehia>
Component: Bare Metal Hardware Provisioning Assignee: Nir <nyehia>
Bare Metal Hardware Provisioning sub component: cluster-api-provider QA Contact: mlammon
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: abeekhof, augol, gharden
Version: 4.5 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
URL: https://github.com/openshift/cluster-api-provider-baremetal/pull/59
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Feature: Remediate unhealthy baremetal machines by rebooting them
Reason: Automatic recovery from transient errors
Result: Unhealthy baremetal machines will be automatically rebooted
Story Points: ---
Clone Of:
: 1838431 (view as bug list) Environment:
Last Closed: 2020-07-13 17:40:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1831603    
Bug Blocks: 1838431    

Description Nir 2020-05-21 06:57:49 UTC
We introduced baremetal machine remediation logic into openshift/CAPBM in
https://github.com/openshift/cluster-api-provider-baremetal/pull/59

This basically power-cycles unhealthy hosts (as detected by the Machine Healthcheck Controller).

We would like to backport this to 4.4 and we need a BZ for that.

This feature depends on Baremetal Operator Reboot API:
https://bugzilla.redhat.com/show_bug.cgi?id=1831603
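
For readers unfamiliar with the flow, here is a rough sketch of what "power-cycle an unhealthy host" can look like as a reconcile loop. It is not the code from the PR above. The annotation keys, phase names, and the Host/Machine classes are all hypothetical and exist only for this illustration; the real controller and the Baremetal Operator Reboot API may work differently.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical annotation keys, invented for this sketch only.
REMEDIATION = "example.invalid/remediation"  # set on the machine when it is unhealthy
POWER_OFF = "example.invalid/power-off"      # set on the host to ask for power-off


@dataclass
class Host:
    powered_on: bool = True
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Machine:
    host: Host
    annotations: Dict[str, str] = field(default_factory=dict)


def apply_power(host: Host) -> None:
    """Stand-in for the component that owns host power state:
    the host stays off for as long as the power-off request is present."""
    host.powered_on = POWER_OFF not in host.annotations


def reconcile(machine: Machine) -> str:
    """Run one reconcile pass and return a description of the step taken."""
    host = machine.host
    phase = machine.annotations.get(REMEDIATION)

    if phase is None:
        return "healthy"

    if phase == "requested":
        # The health check flagged the machine: ask for the host to be powered off.
        host.annotations[POWER_OFF] = ""
        machine.annotations[REMEDIATION] = "powering-off"
        return "power-off requested"

    if phase == "powering-off":
        if host.powered_on:
            return "waiting for power-off"
        # Host is off: drop the request so it gets powered back on.
        del host.annotations[POWER_OFF]
        machine.annotations[REMEDIATION] = "powering-on"
        return "power-on requested"

    if phase == "powering-on":
        if not host.powered_on:
            return "waiting for power-on"
        # Host is back: the power cycle is complete.
        del machine.annotations[REMEDIATION]
        return "remediated"

    raise ValueError(f"unknown remediation phase: {phase!r}")
```

In this sketch the health check side would set the remediation annotation to "requested", and repeated calls to reconcile() interleaved with apply_power() walk the host through power-off and power-on until the annotation is cleared.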

Comment 5 mlammon 2020-06-02 17:18:20 UTC
Successfully tested on nightly build 4.5.0-0.nightly-2020-06-01-111748

Comment 6 errata-xmlrpc 2020-07-13 17:40:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409