Bug 1834172

Summary: Node Health Check does not account for different remediation strategies
Product: OpenShift Container Platform
Reporter: Rastislav Wagner <rawagner>
Component: Management Console
Assignee: Yadan Pei <yapei>
Status: CLOSED ERRATA
QA Contact: Yadan Pei <yapei>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.5
CC: aos-bugs, jokerman, yapei
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:36:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:

Description	Flags
MachineHealthCheck external-baremetal	none

Description Rastislav Wagner 2020-05-11 08:46:30 UTC

A MachineHealthCheck CR can be set to the `external-baremetal` remediation strategy, which means that an unhealthy machine is rebooted instead of reprovisioned. The Health Checks item on the Node's Overview page does not recognize this strategy and always shows that a reprovision is pending.
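
For context, the fragment below sketches how the strategy might be selected on a MachineHealthCheck. The annotation key `machine.openshift.io/remediation-strategy` is an assumption and is not quoted anywhere in this report. The `name` is a placeholder, and the `spec` section is omitted, so this is a fragment rather than a complete CR.

```yaml
# Sketch only: the annotation key below is an assumption, not taken from this report.
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example                      # placeholder name
  namespace: openshift-machine-api
  annotations:
    # Assumed annotation selecting reboot-based remediation instead of reprovisioning
    machine.openshift.io/remediation-strategy: external-baremetal
# spec (selector, unhealthyConditions, maxUnhealthy) omitted in this fragment
```

With an annotation like this present, the Health Checks item would be expected to report a pending reboot rather than a pending reprovision.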

Comment 3 Yadan Pei 2020-05-18 06:47:09 UTC

Created attachment 1689534
MachineHealthCheck external-baremetal

1. Create a MachineHealthCheck with the YAML below, which targets one machine:
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: qe-ui4-l99xm
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: qe-ui4-l99xm-w-c
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: 'False'
      timeout: 300s
  maxUnhealthy: 40%

2. Go to the cloud provider console and stop the instance targeted by the MHC.
3. The node first becomes NotReady. After 300s the MHC reports the accurate status: on the Node's Overview Status card, clicking 'Health Checks' shows `Warning alert: Reboot pending` instead of `reprovisioning pending`.

Verified on 4.5.0-0.nightly-2020-05-17-220731

Comment 4 errata-xmlrpc 2020-07-13 17:36:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409