1889333 – [CNV][Chaos] Integrate with MachineHealthCheck

Summary: [CNV][Chaos] Integrate with MachineHealthCheck

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	assisted-installer
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Piotr Kliczewski
QA Contact:	Yuri Obshansky
Docs Contact:
URL:
Whiteboard:	AI-Team-Projects
Depends On:
Blocks:	1908661

Reported:	2020-10-19 12:23 UTC by Piotr Kliczewski
Modified:	2021-05-06 12:13 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-05-06 12:13:32 UTC
Target Upstream Version:
Embargoed:

Description Piotr Kliczewski 2020-10-19 12:23:18 UTC

To make sure chaos scenarios do not affect user workloads, we need to enable machine health checks by default on freshly installed clusters.

Comment 1 Michael Filanov 2020-10-20 06:48:20 UTC

I'm not sure what this means. Are you talking about https://github.com/openshift/assisted-service/blob/master/deploy/assisted-service.yaml#L29 ?

Comment 2 Piotr Kliczewski 2020-10-20 07:07:37 UTC

Michael, I am talking about https://docs.openshift.com/container-platform/4.5/machine_management/deploying-machine-health-checks.html

Comment 3 Eran Cohen 2020-10-20 08:00:36 UTC

@yshnaidm  I guess we can do it the same way we create the BMH?
alazar, rom if we want to add it we should probably do it during the ignition generation.
Thoughts?

Comment 4 Dan Kenigsberg 2020-10-20 10:07:26 UTC

I think that the problem here is more profound: since the assisted installer is not an IPI, it does not integrate at all with the MachineHealthCheck (MHC). I think this bz should be changed to a request for extension: let assisted-installed clusters integrate with MHC, so that non-responsive nodes can be automatically recycled/restarted.

Comment 5 Moti Asayag 2020-11-25 19:16:40 UTC

(In reply to Eran Cohen from comment #3)
> @yshnaidm  I guess we can do it the same way we create the BMH?
> alazar, rom if we want to add it we should probably do
> it during the ignition generation.
> Thoughts?

In terms of implementation, if the purpose is to add a custom manifest to the cluster, such as the one described here:
https://docs.openshift.com/container-platform/4.5/machine_management/deploying-machine-health-checks.html#machine-health-checks-resource_deploying-machine-health-checks

It can be achieved by using the manifest API to provide it after the cluster is created; it will then be rendered into the ign file by:
https://github.com/openshift/assisted-service/blob/master/internal/ignition/ignition.go#L141
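For illustration only, here is a minimal sketch of what such a custom manifest could contain, built as a Python dict and printed as JSON. The field layout follows the MachineHealthCheck sample in the documentation linked above; the helper name build_machine_health_check, the resource name "example", the "worker" role label, the 300s timeout and the 40% threshold are assumptions picked for the example, not values agreed on in this bug. The sketch only builds the document; it does not call the manifest API.

```python
import json


def build_machine_health_check(name="example", role="worker",
                               timeout="300s", max_unhealthy="40%"):
    """Return a MachineHealthCheck manifest as a plain dict.

    All default values are illustrative assumptions.
    """
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {
            "name": name,
            "namespace": "openshift-machine-api",
        },
        "spec": {
            # Select the machines this check should watch.
            "selector": {
                "matchLabels": {
                    "machine.openshift.io/cluster-api-machine-role": role,
                },
            },
            # A machine is unhealthy when its node reports Ready=False
            # or Ready=Unknown for longer than the timeout.
            "unhealthyConditions": [
                {"type": "Ready", "status": status, "timeout": timeout}
                for status in ("False", "Unknown")
            ],
            # Stop remediating when more than this share is unhealthy.
            "maxUnhealthy": max_unhealthy,
        },
    }


if __name__ == "__main__":
    # Kubernetes accepts JSON manifests as well as YAML.
    print(json.dumps(build_machine_health_check(), indent=2))
```

Whether a manifest like this has any effect still depends on the Machine API being able to remediate the selected machines.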

Comment 6 yevgeny shnaidman 2020-11-26 06:33:12 UTC

@ercohen why are we integrated with Machine Health from the start? I mean, why should we create any specific manifest? Shouldn't openshift-installer do it, through some kind of configuration?

Comment 7 Piotr Kliczewski 2020-11-26 08:39:24 UTC

Please take a look at the comments on BZ #1889651 for more context about this change.

Comment 13 Andrew Beekhof 2021-01-20 02:12:41 UTC

The lack of a provisioning network isn't specifically an issue, but we do need the Machine API to be functional and able to provision/destroy nodes.

Adding the Lifecycle squad for visibility

Comment 15 Angus Salkeld 2021-01-24 23:23:35 UTC

(In reply to Andrew Beekhof from comment #13)
> The lack of a provisioning network isn't specifically an issue, but we do
> need the Machine API to be functional and able to provision/destroy nodes.
> 
> Adding the Lifecycle squad for visibility

Currently in AI, creating/deleting machine objects has no effect as the bmh entities
have no BMC details or any other ability to provision (they are discovered, but in an unmanaged state).
There is work ahead to enable day 2 provisioning, but this is a while off.

Comment 16 Angus Salkeld 2021-03-22 21:44:22 UTC

Not working directly on AI at the moment. Releasing so someone else can work on it.

Comment 17 Piotr Kliczewski 2021-05-06 12:13:32 UTC

This feature is tracked by https://issues.redhat.com/browse/MGMT-4811 and we have decided to wait on node health check.
