1848755 – [IPI][OSP] Worker deleted on openstack is not recreated

Summary: [IPI][OSP] Worker deleted on openstack is not recreated

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.5.0
Assignee:	egarcia
QA Contact:	David Sanz
Docs Contact:
URL:
Whiteboard:
Depends On:	1843597
Blocks:	1832999

Reported:	2020-06-18 21:30 UTC by OpenShift BugZilla Robot
Modified:	2020-07-13 17:44 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-13 17:44:18 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-api-provider-openstack pull 102	0	None	closed	Bug 1848755: Revendor MAO and client-go	2020-07-06 13:02:51 UTC
Red Hat Product Errata	RHBA-2020:2409	0	None	None	None	2020-07-13 17:44:40 UTC

Comment 5 Martin André 2020-06-25 14:13:29 UTC

This will be looked into during the upcoming sprint.

Comment 8 Alberto 2020-07-01 14:34:18 UTC

> NAMESPACE               NAME                     MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
> openshift-machine-api   openstack-health-check   40%            2                  1

This seems to me to be working as expected. 1 unhealthy out of 2 means you have 50% unhealthy. 50% is above the 40% MAXUNHEALTHY you have set. If you check the MHC controller logs, you should see it short-circuiting.
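
The arithmetic in this comment can be checked with a small sketch. This is only an illustration of the comparison described here, not the MHC controller's code; the function names `parse_percent` and `is_short_circuited` are hypothetical, and treating a value exactly equal to the threshold as still allowed is an assumption based on the word "above".

```python
def parse_percent(value: str) -> int:
    """Turn a string such as '40%' into the integer 40."""
    return int(value.strip().rstrip("%"))


def is_short_circuited(expected: int, healthy: int, max_unhealthy: str) -> bool:
    """Return True when the unhealthy share is above the max_unhealthy percentage.

    Assumption: a share exactly equal to the threshold is still allowed.
    """
    unhealthy = expected - healthy
    # Integer form of: unhealthy / expected > percent / 100
    return unhealthy * 100 > parse_percent(max_unhealthy) * expected


# Values from the table row above: 2 expected machines, 1 currently healthy.
print(is_short_circuited(expected=2, healthy=1, max_unhealthy="40%"))  # True
```

With 2 expected machines and 1 healthy, the unhealthy share is 50%, which is above 40%, so the check reports a short-circuit.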

Comment 9 Luis Tomas Bolivar 2020-07-01 14:46:44 UTC

(In reply to Alberto from comment #8)
> > NAMESPACE               NAME                     MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
> > openshift-machine-api   openstack-health-check   40%            2                  1
> 
> This seems to me to be working as expected. 1 unhealthy out of 2 means you have
> 50% unhealthy. 50% is above the 40% MAXUNHEALTHY you have set. If you check
> the MHC controller logs you should see it short-circuiting.

Then maxUnhealthy is very misleading. In your example, 50% of the machines are unhealthy, and that is above the "max allowed unhealthy" machines. If it works the other way around, it should be "min_healthy", right?

Comment 10 Alberto 2020-07-01 15:00:46 UTC

maxUnhealthy is a threshold of your choice. It represents the maximum number of concurrently unhealthy machines in a given pool that the MHC is allowed to operate over.

If you are above that threshold, it short-circuits so that manual intervention can take place.

We'll probably be reflecting the short-circuit state in the status/conditions of the MHC to make it more visible.

Comment 11 Luis Tomas Bolivar 2020-07-01 15:12:36 UTC

(In reply to Alberto from comment #10)
> maxUnhealthy is a threshold of your choice. It represents the maximum
> number of concurrently unhealthy machines in a given pool that the MHC is
> allowed to operate over.
> 
> If you are above that threshold, it short-circuits so that manual
> intervention can take place.
> 
> We'll probably be reflecting the short-circuit state in the
> status/conditions of the MHC to make it more visible.

Got it! Thanks!

I was reading that as how many machines we allow to be unhealthy before the MHC kicks in and fixes them. The meaning is the other way around: we allow the MHC to do its job unless there are a lot of unhealthy machines, in which case we prefer manual intervention. The limit on unhealthy machines above which the MHC stops doing the recovery for us is defined by maxUnhealthy.
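
As a rough sketch of this reading, the threshold acts as a gate on remediation: below or at the limit, every unhealthy machine is a remediation candidate; above it, nothing is touched. The function `machines_to_remediate` and the machine names are hypothetical and only model the behaviour described in this thread, not the actual controller code.

```python
from typing import Dict, List


def machines_to_remediate(health: Dict[str, bool], max_unhealthy_percent: int) -> List[str]:
    """Return the unhealthy machines to recover, or [] when short-circuited.

    `health` maps a machine name to True (healthy) or False (unhealthy).
    """
    unhealthy = sorted(name for name, ok in health.items() if not ok)
    # Above the threshold: stop automatic recovery and leave it to an operator.
    if len(unhealthy) * 100 > max_unhealthy_percent * len(health):
        return []
    return unhealthy


# 1 of 2 unhealthy is 50%, above 40%: nothing is remediated automatically.
print(machines_to_remediate({"worker-0": True, "worker-1": False}, 40))

# 1 of 5 unhealthy is 20%, within 40%: the unhealthy machine is a candidate.
pool = {f"worker-{i}": i != 2 for i in range(5)}
print(machines_to_remediate(pool, 40))
```

The first call prints an empty list and the second prints `['worker-2']`, which matches the two-machine pool in this bug: losing one worker of two already exceeds a 40% limit.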

Comment 12 David Sanz 2020-07-01 16:25:22 UTC

OK, based on the explanation of how the MHC works, this is verified on the latest 4.5 nightly.

Comment 14 errata-xmlrpc 2020-07-13 17:44:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
