1834172 – Node Health Check does not account for different remediation strategies

Summary: Node Health Check does not account for different remediation strategies

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Management Console
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Yadan Pei
QA Contact:	Yadan Pei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-05-11 08:46 UTC by Rastislav Wagner
Modified:	2020-07-13 17:37 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-13 17:36:48 UTC
Target Upstream Version:
Embargoed:

Attachments
MachineHealthCheck external-baremetal (462.72 KB, image/png) 2020-05-18 06:47 UTC, Yadan Pei

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift console pull 5373	0	None	closed	Bug 1834172: Detect remediation strategy for Machine	2020-06-24 02:15:33 UTC
Red Hat Product Errata	RHBA-2020:2409	0	None	None	None	2020-07-13 17:37:26 UTC

Description Rastislav Wagner 2020-05-11 08:46:30 UTC

A MachineHealthCheck CR can be set to the `external-baremetal` remediation strategy, which means that the machine is rebooted instead of reprovisioned. The Health Checks item on the Node Overview page does not recognize this strategy and always shows that reprovisioning is pending.
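
For context, this strategy is normally selected with an annotation on the MachineHealthCheck itself. A minimal sketch follows, assuming the `machine.openshift.io/remediation-strategy` annotation key used on bare-metal clusters. The resource name and the selector label are placeholders, not values taken from this report.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example                      # placeholder name
  namespace: openshift-machine-api
  annotations:
    # Assumed annotation key: requests a reboot instead of reprovisioning
    machine.openshift.io/remediation-strategy: external-baremetal
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker   # placeholder selector
  unhealthyConditions:
    - type: Ready
      status: 'False'
      timeout: 300s
  maxUnhealthy: 40%
```

Under that assumption, a health check without the annotation would fall back to the default behaviour of reprovisioning the machine, which is the only case the console originally displayed.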

Comment 3 Yadan Pei 2020-05-18 06:47:09 UTC

Created attachment 1689534
MachineHealthCheck external-baremetal

1. Create a MachineHealthCheck with the YAML below, which targets one machine:
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: qe-ui4-l99xm
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: qe-ui4-l99xm-w-c
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: 'False'
      timeout: 300s
  maxUnhealthy: 40%

2. Go to the cloud provider console and stop the instances targeted by the MHC.
3. The node first becomes NotReady. After 300s the MHC reports the accurate status. On the Node Overview Status card, click 'Health Checks'; it shows `Warning alert:Reboot pending` instead of `reprovisioning pending`.

Verified on 4.5.0-0.nightly-2020-05-17-220731

Comment 4 errata-xmlrpc 2020-07-13 17:36:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
