To make sure chaos scenarios do not affect user workloads, we need to enable machine health checks by default on freshly installed clusters.
I'm not sure what this means. Are you talking about https://github.com/openshift/assisted-service/blob/master/deploy/assisted-service.yaml#L29 ?
Michael, I am talking about https://docs.openshift.com/container-platform/4.5/machine_management/deploying-machine-health-checks.html
@yshnaidm I guess we can do it the same way we create the BMH? alazar, rom: if we want to add it, we should probably do it during ignition generation. Thoughts?
I think the problem here is more profound: since the assisted installer is not an IPI installer, it does not integrate with MachineHealthCheck (MHC) at all. I think this bz should be changed to a request for extension: let assisted-installed clusters integrate with MHC, so that non-responsive nodes can be automatically recycled/restarted.
(In reply to Eran Cohen from comment #3)
> @yshnaidm I guess we can do it the same way we create the BMH?
> alazar, rom if we want to add it we should probably do
> it during the ignition generation.
> Thoughts?

In terms of implementation, if the purpose is to add a custom manifest to the cluster, such as the one described here:
https://docs.openshift.com/container-platform/4.5/machine_management/deploying-machine-health-checks.html#machine-health-checks-resource_deploying-machine-health-checks
it can be achieved by using the manifest API to provide it after the cluster is created, and it will then be rendered into the ignition file by:
https://github.com/openshift/assisted-service/blob/master/internal/ignition/ignition.go#L141
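For illustration only, here is a minimal sketch of what such a custom manifest could contain, built as a plain Python dict and rendered to JSON. The field layout follows the MachineHealthCheck resource in the linked 4.5 documentation. The resource name `example-worker-mhc`, the worker role selector, the 300s timeouts and the 40% `maxUnhealthy` value are assumptions chosen for the example, not values agreed on in this bug. The helper names `build_machine_health_check` and `render_manifest` are hypothetical and are not part of assisted-service.

```python
import json


def build_machine_health_check(name="example-worker-mhc", role="worker",
                               timeout="300s", max_unhealthy="40%"):
    """Return a MachineHealthCheck manifest as a plain dict.

    All default values are illustrative assumptions, not project defaults.
    """
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {
            "name": name,
            "namespace": "openshift-machine-api",
        },
        "spec": {
            # Select the machines this check should watch.
            "selector": {
                "matchLabels": {
                    "machine.openshift.io/cluster-api-machine-role": role,
                },
            },
            # A machine is unhealthy if its node reports Ready=False or
            # Ready=Unknown for longer than the timeout.
            "unhealthyConditions": [
                {"type": "Ready", "status": "False", "timeout": timeout},
                {"type": "Ready", "status": "Unknown", "timeout": timeout},
            ],
            # Stop remediating when more than this share of the selected
            # machines is unhealthy at the same time.
            "maxUnhealthy": max_unhealthy,
        },
    }


def render_manifest(manifest):
    """Serialize the manifest to JSON text."""
    return json.dumps(manifest, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_manifest(build_machine_health_check()))
```

The rendered text is what would be handed to the manifest API as the custom manifest content. How the upload call itself looks is not shown here and should be taken from the assisted-service API definition.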
@ercohen why are we integrating with Machine Health from the start? I mean, why should we create a specific manifest? Shouldn't openshift-installer do it, through some kind of configuration?
Please take a look at BZ #1889651 comments to have more context about this change.
The lack of a provisioning network isn't specifically an issue, but we do need the Machine API to be functional and able to provision/destroy nodes. Adding the Lifecycle squad for visibility.
(In reply to Andrew Beekhof from comment #13)
> The lack of a provisioning network isn't specifically an issue, but we do
> need the Machine API to be functional and able to provision/destroy nodes.
>
> Adding the Lifecycle squad for visibility

Currently in AI, creating or deleting machine objects has no effect, because the BMH entities have no BMC details or any other means of provisioning (they are discovered, but in an unmanaged state). There is work ahead to enable day 2 provisioning, but this is a while off.
Not working directly on AI at the moment. Releasing so someone else can work on it.
This feature is tracked by https://issues.redhat.com/browse/MGMT-4811, and we have decided to wait on node health check.