Bug 1816606
| Summary: | MHC MaxUnhealthy string value can have unexpected behaviour | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> |
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
| Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | unspecified | CC: | agarcial, jhou |
| Version: | 4.4 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.4.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: |
Cause: The MaxUnhealthy field on a MachineHealthCheck accepts several value formats (e.g. 10, "10", "10%"). Any quoted value was interpreted as a percentage, even if it did not contain a percent sign.
Consequence: The interpreted value of MaxUnhealthy might not match the user's intention, so Machines could be remediated when they were not meant to be, or not remediated when they were meant to be.
Fix: Check whether the value contains a percent sign before treating it as a percentage.
Result: A value of 10 or "10" now behaves the same way; "10" is no longer interpreted as "10%".
|
| Story Points: | --- | ||
| Clone Of: | 1812862 | Environment: | |
| Last Closed: | 2020-06-29 15:33:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1812862 | ||
| Bug Blocks: | |||
|
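The Cause and Fix above come down to how a string MaxUnhealthy is resolved into a machine count. Below is a minimal Go sketch. It assumes the field behaves like a Kubernetes `intstr.IntOrString`, which is modelled here by a hypothetical `maxUnhealthyValue` type. It also assumes that percentages are rounded down. `resolveBuggy` and `resolveFixed` are illustrative names, not the machine-api-operator code from PR 539.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// maxUnhealthyValue is a hypothetical stand-in for an IntOrString:
// YAML `maxUnhealthy: 10` decodes to an int, while "10" and "10%" decode to strings.
type maxUnhealthyValue struct {
	isString bool
	intVal   int
	strVal   string
}

// String renders the value the way it would appear in YAML (strings quoted).
func (v maxUnhealthyValue) String() string {
	if v.isString {
		return strconv.Quote(v.strVal)
	}
	return strconv.Itoa(v.intVal)
}

// scale converts pct% of total into a machine count, rounding down
// (the rounding mode is an assumption of this sketch).
func scale(pct, total int) int {
	return int(math.Floor(float64(pct) * float64(total) / 100))
}

// resolveBuggy mirrors the pre-fix behaviour described in the Doc Text:
// every string is treated as a percentage, with or without a "%".
func resolveBuggy(v maxUnhealthyValue, total int) (int, error) {
	if !v.isString {
		return v.intVal, nil
	}
	pct, err := strconv.Atoi(strings.TrimSuffix(v.strVal, "%"))
	if err != nil {
		return 0, fmt.Errorf("invalid maxUnhealthy %q: %w", v.strVal, err)
	}
	return scale(pct, total), nil
}

// resolveFixed treats a string as a percentage only when it ends in "%";
// otherwise "10" is parsed as the absolute count 10, like the int 10.
func resolveFixed(v maxUnhealthyValue, total int) (int, error) {
	if !v.isString {
		return v.intVal, nil
	}
	if strings.HasSuffix(v.strVal, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(v.strVal, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid maxUnhealthy %q: %w", v.strVal, err)
		}
		return scale(pct, total), nil
	}
	n, err := strconv.Atoi(v.strVal)
	if err != nil {
		return 0, fmt.Errorf("invalid maxUnhealthy %q: %w", v.strVal, err)
	}
	return n, nil
}

func main() {
	const total = 20
	for _, v := range []maxUnhealthyValue{
		{intVal: 10},
		{isString: true, strVal: "10"},
		{isString: true, strVal: "10%"},
	} {
		// Errors are ignored here because all three inputs are valid.
		before, _ := resolveBuggy(v, total)
		after, _ := resolveFixed(v, total)
		fmt.Printf("maxUnhealthy %v of %d: before fix=%d, after fix=%d\n", v, total, before, after)
	}
}
```

With 20 target machines, only the quoted "10" changes. It resolves to 2 machines (10%) before the fix and to 10 machines after it, which matches the unquoted 10.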
Description
Joel Speed
2020-03-24 11:08:30 UTC
PR https://github.com/openshift/machine-api-operator/pull/539; will update the BZ once the 4.4.z release branch opens.
Validated at:
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.4.0-0.nightly-2020-06-21-210301 True False 36m Cluster version is 4.4.0-0.nightly-2020-06-21-210301
Step 1: Create an MHC with maxUnhealthy set to the quoted string "1", using the following YAML:
---
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  creationTimestamp: "2020-02-14T09:47:08Z"
  generation: 1
  name: mhc1
  namespace: openshift-machine-api
  resourceVersion: "71059"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinehealthchecks/mhc-miyadav-1402-drlvf-worker-us-east-2c
  uid: ef74b735-e58e-4c24-aa69-015d90998b77
spec:
  maxUnhealthy: "1"
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-0622-cpsfs
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: miyadav-0622-cpsfs-worker-us-east-2a
  unhealthyConditions:
    -
      status: "False"
      timeout: 300s
      type: Ready
    -
      status: Unknown
      timeout: 300s
      type: Ready
[miyadav@miyadav bugzilla]$ oc create -f mhc_1816606.yml
machinehealthcheck.machine.openshift.io/mhc1 created
[miyadav@miyadav bugzilla]$ oc get mhc
NAME MAXUNHEALTHY EXPECTEDMACHINES CURRENTHEALTHY
mhc1 1 1 1
Step 2: Delete a machine targeted by the MHC:
[miyadav@miyadav bugzilla]$ oc delete machine miyadav-0622-cpsfs-worker-us-east-2a-72qnw
machine.machine.openshift.io "miyadav-0622-cpsfs-worker-us-east-2a-72qnw" deleted
Then check the machine-healthcheck-controller logs:
oc logs -f machine-api-controllers-77d9ccd587-d6hp6 -c machine-healthcheck-controller
...
I0622 04:15:52.678252 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:15:52.678389 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:15:52.678452 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 5m0.321562927s
I0622 04:15:52.685076 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:15:52.685114 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 5m0.321562927s
I0622 04:15:53.027627 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
I0622 04:15:53.028123 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:15:53.028245 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:15:53.028278 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 4m59.971736102s
I0622 04:15:53.037795 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:15:53.037830 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m59.971736102s
I0622 04:16:02.602147 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
I0622 04:16:02.602182 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:16:02.602263 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:16:02.602288 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 4m50.397725768s
I0622 04:16:02.608958 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:16:02.608992 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m50.397725768s
I0622 04:16:52.747694 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
I0622 04:16:52.748552 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:16:52.748671 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:16:52.748701 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 4m0.251310018s
I0622 04:16:52.755874 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:16:52.755969 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m0.251310018s
...
Actual and expected: Remediation is allowed ("Remediations are allowed" in the log), because maxUnhealthy "1" is interpreted as 1 machine rather than 1%.
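The "Remediations are allowed" decision in the log above compares the number of unhealthy targets against the resolved maxUnhealthy count. Below is a minimal Go sketch. It assumes the rule is `unhealthy <= maxUnhealthy`, which is consistent with this log: 1 unhealthy target is allowed when maxUnhealthy is 1. `remediationAllowed` is a hypothetical helper, not the controller's API.

```go
package main

import "fmt"

// remediationAllowed is a hypothetical helper: remediation proceeds only
// while the unhealthy count does not exceed the resolved maxUnhealthy count.
func remediationAllowed(unhealthy, maxUnhealthy int) bool {
	return unhealthy <= maxUnhealthy
}

func main() {
	// Step 2 log values: maxUnhealthy "1" resolved as 1 machine, 1 unhealthy target.
	fmt.Println(remediationAllowed(1, 1)) // true

	// Before the fix, "1" would have been read as 1% of 1 target, which is
	// 0 machines if percentages round down (an assumption), so it short-circuits.
	fmt.Println(remediationAllowed(1, 0)) // false
}
```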
Step 3: Edit mhc1 and set maxUnhealthy to "1%":
[miyadav@miyadav bugzilla]$ oc edit mhc mhc1
machinehealthcheck.machine.openshift.io/mhc1 edited
Step 4: Repeat step 2
Step 5: Monitor the MHC logs:
oc logs -f machine-api-controllers-77d9ccd587-d6hp6 -c machine-healthcheck-controller
I0622 04:27:51.415666 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-5pb2d/: health checking
I0622 04:27:51.415716 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-5pb2d/: is likely to go unhealthy in 9m56.584292844s
W0622 04:27:51.421869 1 machinehealthcheck_controller.go:182] Reconciling openshift-machine-api/mhc1: total targets: 2, maxUnhealthy: 1%, unhealthy: 2. Short-circuiting remediation
Actual: Remediation did not happen, because maxUnhealthy is 1% and the 2 unhealthy targets exceed 1% of the 2 total targets.
Expected: Remediation should not happen, because the number of unhealthy targets exceeds the 1% maxUnhealthy threshold.
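The step 5 short-circuit can be checked by hand: 1% of 2 targets is 0.02 machines. The Go sketch below uses the values from the log line above. The rounding mode is an assumption, so the sketch computes both. `percentAllowance` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// percentAllowance converts pct% of total targets into a machine count.
// The controller's rounding mode is assumed, so the caller chooses it.
func percentAllowance(pct, total int, roundUp bool) int {
	exact := float64(pct) * float64(total) / 100
	if roundUp {
		return int(math.Ceil(exact))
	}
	return int(math.Floor(exact))
}

func main() {
	// Step 5 log values: 2 targets, maxUnhealthy "1%", 2 unhealthy.
	const total, pct, unhealthy = 2, 1, 2
	down := percentAllowance(pct, total, false) // 0.02 rounds down to 0
	up := percentAllowance(pct, total, true)    // 0.02 rounds up to 1
	// In both cases unhealthy exceeds the allowance, so remediation short-circuits.
	fmt.Println(down, up, unhealthy > down, unhealthy > up) // 0 1 true true
}
```

Whether 0.02 rounds to 0 or 1, the allowance stays below the 2 unhealthy targets. The verified outcome therefore does not depend on the rounding assumption.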
Additional Info:
Moved to VERIFIED
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-28859
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2713