Bug 1812862
Summary: | MHC MaxUnhealthy string value can have unexpected behaviour | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> | |
Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> | |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | unspecified | |||
Priority: | unspecified | |||
Version: | 4.5 | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: The MaxUnhealthy field on a MachineHealthCheck accepts multiple value formats (e.g. 10, "10", "10%"). Any quoted value was interpreted as a percentage, even if it did not contain a percentage sign.
Consequence: The interpreted value of MaxUnhealthy may not have matched the user's intention. Machines may have been remediated when they were not meant to be, or may not have been remediated when they were meant to be.
Fix: Check if the value contains a percentage sign before marking the value as a percentage value.
Result: A value of 10 or "10" now has the same behaviour and "10" is not interpreted as "10%".
|
Story Points: | --- | |
Clone Of: | ||||
: | 1816606 | Environment: | ||
Last Closed: | 2020-08-04 18:05:00 UTC | Type: | Bug | |
Bug Blocks: | 1816606 |
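The Fix in the Doc Text above (only treat a value as a percentage if it carries a percentage sign) can be illustrated with a minimal Python sketch. This is not the controller's code: `parse_max_unhealthy` is a hypothetical helper, and checking for a trailing sign rather than a sign anywhere in the string is an assumption made for the illustration.

```python
def parse_max_unhealthy(raw):
    """Return (value, is_percent) for a maxUnhealthy-style setting.

    Hypothetical helper for illustration only. Accepts an int (10),
    a quoted int ("10"), or a percent string ("10%").
    """
    # A bare integer is always an absolute count.
    if isinstance(raw, int) and not isinstance(raw, bool):
        return raw, False
    text = str(raw).strip()
    # Only a trailing percentage sign marks the value as a percentage;
    # being quoted is not enough on its own.
    if text.endswith("%"):
        return int(text[:-1]), True
    return int(text), False


for raw in (10, "10", "10%"):
    print(repr(raw), "->", parse_max_unhealthy(raw))
# 10 -> (10, False)
# '10' -> (10, False)
# '10%' -> (10, True)
```

With this rule, 10 and "10" yield the same absolute value, which matches the Result described in the Doc Text.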
Description
Joel Speed
2020-03-12 11:19:12 UTC
Validated at:

    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.0-0.nightly-2020-03-25-223812   True        False         64m     Cluster version is 4.5.0-0.nightly-2020-03-25-223812

Step 1: Create an MHC with a maxUnhealthy value of "1", using the following YAML:

    ---
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineHealthCheck
    metadata:
      creationTimestamp: "2020-02-14T09:47:08Z"
      generation: 1
      name: mhc1
      namespace: openshift-machine-api
      resourceVersion: "71059"
      selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinehealthchecks/mhc-miyadav-1402-drlvf-worker-us-east-2c
      uid: ef74b735-e58e-4c24-aa69-015d90998b77
    spec:
      maxUnhealthy: "1"
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: miyadav-2603-gcsjd
          machine.openshift.io/cluster-api-machine-role: worker
          machine.openshift.io/cluster-api-machine-type: worker
          machine.openshift.io/cluster-api-machineset: miyadav-2603-gcsjd-worker-us-east-2c
      unhealthyConditions:
      - status: "False"
        timeout: 300s
        type: Ready
      - status: Unknown
        timeout: 300s
        type: Ready

    [miyadav@miyadav ManualRun]$ oc create -f mhc_bz.yml
    machinehealthcheck.machine.openshift.io/mhc1 created
    [miyadav@miyadav ManualRun]$ oc get mhc
    NAME   MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
    mhc1   1              1                  1

Step 2: Go to the IaaS console and terminate the machine of the machineset being monitored, then check the logs:

    [miyadav@miyadav ManualRun]$ oc logs -f machine-api-controllers-54bb9448c-vlhsq -c machine-healthcheck-controller
    .
    .
    .
    I0326 03:06:40.649760 1 machinehealthcheck_controller.go:292] Reconciling openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: is likely to go unhealthy in 5m0.350253619s
    I0326 03:06:40.658447 1 machinehealthcheck_controller.go:205] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
    I0326 03:06:40.658480 1 machinehealthcheck_controller.go:229] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 5m0.350253619s
    I0326 03:06:44.325958 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
    I0326 03:06:44.325993 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
    I0326 03:06:44.326163 1 machinehealthcheck_controller.go:278] Reconciling openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: health checking
    I0326 03:06:44.326193 1 machinehealthcheck_controller.go:292] Reconciling openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: is likely to go unhealthy in 4m56.67381994s
    I0326 03:06:44.332319 1 machinehealthcheck_controller.go:205] Reconciling openshift-machine-api/mhc1: monitoring MHC: total targets: 1, maxUnhealthy: 1, unhealthy: 1. Remediations are allowed
    I0326 03:06:44.332354 1 machinehealthcheck_controller.go:229] Reconciling openshift-machine-api/mhc1: some targets might go unhealthy. Ensuring a requeue happens in 4m56.67381994s.
    .
    .
    .

Actual: Remediation happened successfully, as the maxUnhealthy value is 1.
Expected: Remediation should happen, as the maxUnhealthy value is 1 (the max condition to allow remediation is met).

Step 3: Edit the MHC mhc1 and set maxUnhealthy to "1%":

    [miyadav@miyadav ManualRun]$ oc edit mhc mhc1
    machinehealthcheck.machine.openshift.io/mhc1 edited

Step 4: Repeat step 2.

Step 5: Monitor the MHC logs:

    oc logs -f machine-api-controllers-54bb9448c-vlhsq -c machine-healthcheck-controller
    I0326 03:11:38.352849 1 machinehealthcheck_controller.go:278] Reconciling openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: health checking
    I0326 03:11:38.352937 1 machinehealthcheck_controller.go:568] openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: unhealthy: machine phase is "Failed"
    W0326 03:11:38.358387 1 machinehealthcheck_controller.go:188] Reconciling openshift-machine-api/mhc1: total targets: 1, maxUnhealthy: 1%, unhealthy: 1. Short-circuiting remediation
    I0326 03:11:54.047499 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
    I0326 03:11:54.047595 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
    I0326 03:11:54.047811 1 machinehealthcheck_controller.go:278] Reconciling openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: health checking
    I0326 03:11:54.047838 1 machinehealthcheck_controller.go:568] openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: unhealthy: machine phase is "Failed"
    W0326 03:11:54.053284 1 machinehealthcheck_controller.go:188] Reconciling openshift-machine-api/mhc1: total targets: 1, maxUnhealthy: 1%, unhealthy: 1. Short-circuiting remediation
    I0326 03:13:15.973549 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/mhc1
    I0326 03:13:15.973604 1 machinehealthcheck_controller.go:166] Reconciling openshift-machine-api/mhc1: finding targets
    I0326 03:13:15.973717 1 machinehealthcheck_controller.go:278] Reconciling openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: health checking
    I0326 03:13:15.973800 1 machinehealthcheck_controller.go:568] openshift-machine-api/mhc1/miyadav-2603-gcsjd-worker-us-east-2c-rlk76/ip-10-0-163-4.us-east-2.compute.internal: unhealthy: machine phase is "Failed"
    W0326 03:13:15.979594 1 machinehealthcheck_controller.go:188] Reconciling openshift-machine-api/mhc1: total targets: 1, maxUnhealthy: 1%, unhealthy: 1. Short-circuiting remediation

Actual: Remediation did not happen, as the maxUnhealthy value is 1 percent.
Expected: Remediation should not happen, as the maxUnhealthy value is 1 percent (the max condition to allow remediation is exceeded).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
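The two outcomes in the validation logs (maxUnhealthy 1 allows remediation, maxUnhealthy 1% short-circuits it, both with 1 total target and 1 unhealthy) can be reproduced with a small Python sketch. The function name `remediation_allowed` is hypothetical, and rounding the percentage down is an assumption chosen because it is consistent with the logged results. It is not taken from the controller source.

```python
def remediation_allowed(total, unhealthy, value, is_percent):
    """Return True if the unhealthy count is within the maxUnhealthy limit.

    Hypothetical helper for illustration only.
    """
    # Assumption: a percentage is scaled by the number of targets and
    # rounded down, so 1% of 1 target gives a limit of 0.
    limit = (value * total) // 100 if is_percent else value
    return unhealthy <= limit


# maxUnhealthy: 1 (absolute) with 1 target, 1 unhealthy
print(remediation_allowed(1, 1, 1, False))  # True

# maxUnhealthy: 1% with 1 target, 1 unhealthy
print(remediation_allowed(1, 1, 1, True))   # False
```

Under these assumptions, treating a quoted "1" as 1% (the behaviour described in the Cause) would turn the first case into the second and block a remediation the user intended to allow.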