Bug 1872147 - Disable health checks if BMO is not running or BMH is in unmanaged state
Summary: Disable health checks if BMO is not running or BMH is in unmanaged state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Console Metal3 Plugin
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Rastislav Wagner
QA Contact: Yanping Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-25 07:56 UTC by Rastislav Wagner
Modified: 2020-10-27 16:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:32:58 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift console pull 6664 | 0 | None | closed | Bug 1872147: Disable health checks if BMO is not running or there's no power mgmt | 2020-09-28 11:15:42 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:33:00 UTC

Description Rastislav Wagner 2020-08-25 07:56:16 UTC
When power actions are not available, we cannot remediate the host, so health checks should be disabled.

Comment 1 Rastislav Wagner 2020-09-18 07:19:01 UTC
A little more info: BMO is not running if there's no provisioning-configuration CR (the BMO pods are running, but BMO is disabled because the CR is missing).
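
The condition described above could be sketched roughly as follows. This is only an illustration, not the code from the linked pull request: the type shapes and the helper names `isBMOEnabled`, `isUnmanaged`, and `healthChecksAvailable` are hypothetical, and treating a host without a BMC address or credentials as unmanaged is an assumption.

```typescript
// Hypothetical, minimal resource shapes: only the fields this sketch reads.
type BareMetalHost = {
  spec?: { bmc?: { address?: string; credentialsName?: string } };
  status?: { provisioning?: { state?: string } };
};

type Provisioning = { metadata: { name: string } };

// Assumption: BMO counts as enabled only when the
// provisioning-configuration CR exists.
const isBMOEnabled = (provisionings: Provisioning[]): boolean =>
  provisionings.some((p) => p.metadata.name === 'provisioning-configuration');

// Assumption: a host is unmanaged (no power actions) when it lacks a BMC
// address or credentials, or reports the 'unmanaged' provisioning state.
const isUnmanaged = (host: BareMetalHost): boolean =>
  !host.spec?.bmc?.address ||
  !host.spec?.bmc?.credentialsName ||
  host.status?.provisioning?.state === 'unmanaged';

// Health checks are only meaningful when the host can be remediated.
const healthChecksAvailable = (
  provisionings: Provisioning[],
  host: BareMetalHost,
): boolean => isBMOEnabled(provisionings) && !isUnmanaged(host);
```

Under these assumptions, a host with BMC details and an existing provisioning-configuration CR reports health checks as available, while an empty list of Provisioning resources makes them unavailable.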

Comment 3 Yadan Pei 2020-09-29 06:09:33 UTC
1. Create a MachineHealthCheck that matches two worker nodes
$ cat mhc.yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <your cluster name>
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: <your machine set>
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: 'False'
      timeout: 300s
  maxUnhealthy: 40%
$ oc create -f mhc.yaml

2. Check the `Health Checks` status: go to Nodes -> click the name of worker-0-1 or worker-0-0 -> check 'Health Checks' in the 'Status' card. It shows "2 conditions passing" and a green OK icon.
3. Delete provisioning-configuration
$ oc delete provisioning.metal3.io provisioning-configuration
provisioning.metal3.io "provisioning-configuration" deleted
4. Check the `Health Checks` status again, following the same steps as in step 2. This time it shows 'Not available' and a grey question mark icon.


Moving to VERIFIED 4.6.0-0.nightly-2020-09-27-075304

Comment 6 errata-xmlrpc 2020-10-27 16:32:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

