Bug 1848755
Summary: | [IPI][OSP] Worker deleted on openstack is not recreated | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
Component: | Cloud Compute | Assignee: | egarcia |
Cloud Compute sub component: | OpenStack Provider | QA Contact: | David Sanz <dsanzmor> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | | |
Priority: | high | CC: | adduarte, agarcial, egarcia, ltomasbo, m.andre, mfedosin, pprinett |
Version: | 4.5 | Keywords: | UpcomingSprint |
Target Milestone: | --- | | |
Target Release: | 4.5.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-13 17:44:18 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1843597 | | |
Bug Blocks: | 1832999 | | |
Comment 5
Martin André
2020-06-25 14:13:29 UTC
> NAMESPACE               NAME                     MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
> openshift-machine-api   openstack-health-check   40%            2                  1
This seems to me to be working as expected. 1 unhealthy out of 2 means you have 50% unhealthy. 50% is above the 40% MAXUNHEALTHY you have set. If you check the MHC controller logs you should see it short-circuiting.
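The arithmetic above can be sketched in a few lines. This is only an illustration of the check as described in this comment, not the MHC controller's actual code: the function name `remediation_allowed` and its parameters are hypothetical, and the real controller may round a percentage threshold differently.

```python
def remediation_allowed(expected_machines: int,
                        current_healthy: int,
                        max_unhealthy_percent: int) -> bool:
    """Return True if the unhealthy share is within the maxUnhealthy threshold.

    Hypothetical helper mirroring the reasoning in this thread: when the
    unhealthy share exceeds the threshold, the MHC short-circuits and leaves
    remediation to manual intervention.
    """
    if expected_machines <= 0:
        raise ValueError("expected_machines must be positive")
    unhealthy = expected_machines - current_healthy
    # Compare unhealthy/expected against max_unhealthy_percent/100
    # using integer arithmetic to avoid floating-point rounding.
    return unhealthy * 100 <= max_unhealthy_percent * expected_machines


# Values from the MHC status above: 40% threshold, 2 expected, 1 healthy.
print(remediation_allowed(2, 1, 40))  # 50% unhealthy > 40%: short-circuit
# Raising the threshold to 50% would let the MHC remediate the machine.
print(remediation_allowed(2, 1, 50))
```

With only two machines in the pool, a single unhealthy machine is already 50% of the pool, so under this reading any percentage threshold below 50% short-circuits.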
(In reply to Alberto from comment #8)
> >NAMESPACE NAME MAXUNHEALTHY EXPECTEDMACHINES CURRENTHEALTHY
> openshift-machine-api openstack-health-check 40% 2 1
>
> This seems to me working as expected. 1 unhealthy out of 2 means you have
> 50% unhealthy. 50% is above the 40% MAXUNHEALTHY you have set. If you check
> the MHC controller logs you should see it short-circuiting.

Then maxunhealthy is very misleading. In your example you have 50% unhealthy machines, and that is above the "max allowed unhealthy" machines. If it works the other way around, it should be "min_healthy", right?

maxUnhealthy is a threshold of your choice. It represents the maximum number of concurrently unhealthy machines in a given pool that the MHC is allowed to operate over.

If you are above that threshold it short-circuits so manual intervention can take place.

We'll probably be reflecting the short-circuit state in the status/conditions of the MHC to make it more visible.

(In reply to Alberto from comment #10)
> maxUnhealthy is a threshold of your choice. It represents the max of
> concurrent of unhealthy machines in a given pool that the MHC is allowed to
> operate over.
>
> If you are above that threshold it short-circuits so manual intervention can
> take place.
>
> We'll be probably reflecting short-circuit state in the status/conditions of
> the MHC to make it more visual.

Got it! Thanks! I was reading that as how many machines we allow to be unhealthy before MHC kicks in and fixes them. The meaning is the other way around: we allow MHC to do its job unless there are a lot of unhealthy machines, in which case we prefer manual intervention. The limit of unhealthy machines above which the MHC stops doing the recovery for us is defined by maxUnhealthy.

OK, based on the explanation of how MHC works, this is verified on the latest 4.5 nightly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2409