Bug 1834172

Summary: Node Health Check does not account for different remediation strategies
Product: OpenShift Container Platform
Reporter: Rastislav Wagner <rawagner>
Component: Management Console
Assignee: Yadan Pei <yapei>
Status: CLOSED ERRATA
QA Contact: Yadan Pei <yapei>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.5
CC: aos-bugs, jokerman, yapei
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: No Doc Update
Last Closed: 2020-07-13 17:36:48 UTC
Type: Bug
Attachments:
MachineHealthCheck external-baremetal (flags: none)

Description Rastislav Wagner 2020-05-11 08:46:30 UTC
A MachineHealthCheck CR can be set to the `external-baremetal` remediation strategy, which means that an unhealthy machine is rebooted instead of reprovisioned. The Health Checks item on the Node Overview page does not recognize this strategy and always shows that a reprovision is pending.
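
The expected behaviour can be sketched roughly as below. This is a minimal illustration, not the console's actual code: the `MachineHealthCheckLike` type, the function names, and the exact label strings are hypothetical, and the sketch assumes the strategy is set through the `machine.openshift.io/remediation-strategy` annotation on the MachineHealthCheck.

```typescript
// Hypothetical minimal shape: only the fields this sketch needs.
interface MachineHealthCheckLike {
  metadata: {
    name: string;
    annotations?: Record<string, string>;
  };
}

// Assumption: the remediation strategy is carried by this annotation.
const REMEDIATION_ANNOTATION = 'machine.openshift.io/remediation-strategy';
const EXTERNAL_BAREMETAL = 'external-baremetal';

// True when the health check reboots the machine instead of reprovisioning it.
function usesRebootRemediation(mhc: MachineHealthCheckLike): boolean {
  return mhc.metadata.annotations?.[REMEDIATION_ANNOTATION] === EXTERNAL_BAREMETAL;
}

// Pick the pending-remediation label for the Health Checks item.
// The label strings are illustrative, not the console's actual text.
function pendingRemediationLabel(mhc: MachineHealthCheckLike): string {
  return usesRebootRemediation(mhc) ? 'Reboot pending' : 'Reprovision pending';
}
```

With a check along these lines, a health check without the annotation keeps the reprovision message, and only the `external-baremetal` case switches to the reboot message.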

Comment 3 Yadan Pei 2020-05-18 06:47:09 UTC
Created attachment 1689534: MachineHealthCheck external-baremetal

1. Create a MachineHealthCheck that targets one machine, using the YAML below:
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: qe-ui4-l99xm
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: qe-ui4-l99xm-w-c
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: 'False'
      timeout: 300s
  maxUnhealthy: 40%

2. Go to the cloud provider console and stop the instances targeted by the MHC.
3. The node first becomes NotReady. After 300s the MHC reports the accurate status: on the Node Overview Status card, clicking 'Health Checks' shows `Warning alert:Reboot pending` instead of `reprovisioning pending`.

Verified on 4.5.0-0.nightly-2020-05-17-220731

Comment 4 errata-xmlrpc 2020-07-13 17:36:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409