Bug 1816606
Summary: | MHC MaxUnhealthy string value can have unexpected behaviour | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> |
Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | unspecified | CC: | agarcial, jhou |
Version: | 4.4 | ||
Target Milestone: | --- | ||
Target Release: | 4.4.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The MaxUnhealthy field on a MachineHealthCheck accepts multiple value formats (e.g. 10, "10", "10%"). Any quoted value was interpreted as a percentage even if it did not contain a percentage sign.
Consequence: The interpreted value of MaxUnhealthy may not have matched the user's intention, so Machines may have been remediated when they were not meant to be, or not remediated when they were meant to be.
Fix: Check if the value contains a percentage sign before marking the value as a percentage value.
Result: A value of 10 or "10" now has the same behaviour and "10" is not interpreted as "10%".
|
Story Points: | --- |
Clone Of: | 1812862 | Environment: | |
Last Closed: | 2020-06-29 15:33:54 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1812862 | ||
Bug Blocks: |
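The fix described in the Doc Text can be illustrated with a minimal sketch. This is not the code from the machine-api-operator PR; `interpretMaxUnhealthy` is a hypothetical helper, and it assumes the field value has already been rendered as a string (an unquoted 10 would arrive as "10"). The point is only that a trailing percentage sign, not the quoting, decides whether the value is a percentage.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// interpretMaxUnhealthy is a hypothetical helper, not the operator's code.
// It returns the numeric part of raw and whether raw is a percentage.
func interpretMaxUnhealthy(raw string) (value int, isPercent bool, err error) {
	s := strings.TrimSpace(raw)
	// Only a trailing percentage sign marks the value as a percentage.
	// A quoted value without the sign, such as "10", stays an absolute count.
	if strings.HasSuffix(s, "%") {
		isPercent = true
		s = strings.TrimSuffix(s, "%")
	}
	value, err = strconv.Atoi(s)
	if err != nil {
		return 0, false, fmt.Errorf("invalid maxUnhealthy %q: %w", raw, err)
	}
	return value, isPercent, nil
}

func main() {
	for _, raw := range []string{"10", "10%"} {
		value, isPercent, err := interpretMaxUnhealthy(raw)
		fmt.Printf("raw=%q value=%d percent=%t err=%v\n", raw, value, isPercent, err)
	}
}
```

With this check, "10" and "10%" share the numeric part 10 but only the second is flagged as a percentage, which matches the Result stated in the Doc Text.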
Description
Joel Speed
2020-03-24 11:08:30 UTC
PR: https://github.com/openshift/machine-api-operator/pull/539
Will update BZ once the 4.4.z release branch opens.

Validated at:

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-06-21-210301   True        False         36m     Cluster version is 4.4.0-0.nightly-2020-06-21-210301

Step 1: Create an MHC with maxUnhealthy set to "1". Refer to the following YAML:

---
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  creationTimestamp: "2020-02-14T09:47:08Z"
  generation: 1
  name: mhc1
  namespace: openshift-machine-api
  resourceVersion: "71059"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinehealthchecks/mhc-miyadav-1402-drlvf-worker-us-east-2c
  uid: ef74b735-e58e-4c24-aa69-015d90998b77
spec:
  maxUnhealthy: "1"
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-0622-cpsfs
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: miyadav-0622-cpsfs-worker-us-east-2a
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: Unknown
    timeout: 300s
    type: Ready

[miyadav@miyadav bugzilla]$ oc create -f mhc_1816606.yml
machinehealthcheck.machine.openshift.io/mhc1 created
[miyadav@miyadav bugzilla]$ oc get mhc
NAME   MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
mhc1   1              1                  1

Step 2: Delete a machine, then check the logs.

[miyadav@miyadav bugzilla]$ oc delete machine miyadav-0622-cpsfs-worker-us-east-2a-72qnw
machine.machine.openshift.io "miyadav-0622-cpsfs-worker-us-east-2a-72qnw" deleted

oc logs -f machine-api-controllers-77d9ccd587-d6hp6 -c machine-healthcheck-controller
. . .
I0622 04:15:52.678252 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:15:52.678389 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:15:52.678452 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 5m0.321562927s
I0622 04:15:52.685076 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:15:52.685114 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 5m0.321562927s
I0622 04:15:53.027627 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
I0622 04:15:53.028123 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:15:53.028245 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:15:53.028278 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 4m59.971736102s
I0622 04:15:53.037795 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:15:53.037830 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m59.971736102s
I0622 04:16:02.602147 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
I0622 04:16:02.602182 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:16:02.602263 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:16:02.602288 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 4m50.397725768s
I0622 04:16:02.608958 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:16:02.608992 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m50.397725768s
I0622 04:16:52.747694 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
I0622 04:16:52.748552 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
I0622 04:16:52.748671 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: health checking
I0622 04:16:52.748701 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-mpcmm/ip-10-0-135-30.us-east-2.compute.internal: is likely to go unhealthy in 4m0.251310018s
I0622 04:16:52.755874 1 machinehealthcheck_controller.go:199] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
I0622 04:16:52.755969 1 machinehealthcheck_controller.go:223] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m0.251310018s
. . .

Actual & Expected: Remediation happened successfully, as the maxUnhealthy value is 1.

Step 3: Edit the MHC mhc1 and set maxUnhealthy to "1%".

[miyadav@miyadav bugzilla]$ oc edit mhc mhc1
machinehealthcheck.machine.openshift.io/mhc1 edited

Step 4: Repeat step 2.

Step 5: Monitor the MHC logs.

oc logs -f machine-api-controllers-77d9ccd587-d6hp6 -c machine-healthcheck-controller
I0622 04:27:51.415666 1 machinehealthcheck_controller.go:272] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-5pb2d/: health checking
I0622 04:27:51.415716 1 machinehealthcheck_controller.go:286] Reconciling openshift-machine-api/mhc1/miyadav-0622-cpsfs-worker-us-east-2a-5pb2d/: is likely to go unhealthy in 9m56.584292844s
W0622 04:27:51.421869 1 machinehealthcheck_controller.go:182] Reconciling openshift-machine-api/mhc1: total targets: 2, maxUnhealthy: 1%, unhealthy: 2. Short-circuiting remediation

Actual: Remediation did not happen, as the maxUnhealthy value is 1 percent.
Expected: Remediation should not happen, as the maxUnhealthy value is 1 percent (the number of unhealthy targets exceeded the maximum that allows remediation).

Additional Info:
Moved to VERIFIED
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-28859

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2713
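The two outcomes in the verification logs ("Remediations are allowed" versus "Short-circuiting remediation") can be reproduced with a small sketch of the threshold check. This is an illustration under assumptions, not the controller's implementation: `allowedUnhealthy` and `remediationAllowed` are hypothetical names, percentages are assumed to round down, and remediation is assumed to be allowed while the unhealthy count does not exceed the limit. The inputs are the totals reported in the logs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// allowedUnhealthy converts a maxUnhealthy string into an absolute count.
// Hypothetical helper; rounding percentages down is an assumption.
func allowedUnhealthy(maxUnhealthy string, totalTargets int) (int, error) {
	s := strings.TrimSpace(maxUnhealthy)
	if strings.HasSuffix(s, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
		if err != nil {
			return 0, err
		}
		// Integer division rounds down for non-negative inputs.
		return pct * totalTargets / 100, nil
	}
	return strconv.Atoi(s)
}

// remediationAllowed reports whether the unhealthy count is within the limit.
func remediationAllowed(maxUnhealthy string, totalTargets, unhealthy int) (bool, error) {
	limit, err := allowedUnhealthy(maxUnhealthy, totalTargets)
	if err != nil {
		return false, err
	}
	return unhealthy <= limit, nil
}

func main() {
	// total targets: 1, maxUnhealthy: 1, unhealthy: 1
	ok, err := remediationAllowed("1", 1, 1)
	fmt.Println(ok, err)
	// total targets: 2, maxUnhealthy: 1%, unhealthy: 2
	ok, err = remediationAllowed("1%", 2, 2)
	fmt.Println(ok, err)
}
```

For the second case, 1% of 2 targets is less than one machine under either rounding direction, so 2 unhealthy targets exceed the limit regardless of the rounding assumption.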