Description of problem:
If the number of unhealthy machines exceeds maxUnhealthy, the mhc status will not set RemediationsAllowed = 0

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-26-221840

How reproducible:
Always

Steps to Reproduce:
1. Create a new machineset with replicas=3
2. Create an mhc with maxUnhealthy: 2
---
apiVersion: "machine.openshift.io/v1beta1"
kind: "MachineHealthCheck"
metadata:
  name: mhc1
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: zhsunaws27-x965j
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: zhsunaws27-x965j-worker-us-east-2bb
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 300s
  maxUnhealthy: 2
3. In the cloud provider console, terminate all three worker nodes (the ones that have machine references and are checked by the mhc)
4. Check the mhc status

Actual results:
The number of unhealthy machines exceeds maxUnhealthy, but the mhc status doesn't set RemediationsAllowed = 0

$ oc get mhc mhc1 -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2020-11-27T08:21:12Z"
    message: 'Remediation is not allowed, the number of not started or unhealthy machines
      exceeds maxUnhealthy (total: 3, unhealthy: 3, maxUnhealthy: 2)'
    reason: TooManyUnhealthy
    severity: Warning
    status: "False"
    type: RemediationAllowed
  currentHealthy: 0
  expectedMachines: 3

Expected results:
The mhc status will set RemediationsAllowed = 0

Additional info:
This field is being omitted because 0 is the zero value of the field's type, so the JSON encoder drops it. We could make this work by changing the value to a pointer, but that would diverge us further from upstream and complicate the code somewhat. Is there a strong prior preference for showing this kind of field when it is zero? Since we already have the condition showing that remediation is being restricted, I wonder whether this field is actually needed at this point.
Verified clusterversion: 4.7.0-0.nightly-2020-12-03-205004

Status:
  Conditions:
    Last Transition Time:  2020-12-04T07:38:06Z
    Status:                True
    Type:                  RemediationAllowed
  Current Healthy:         1
  Expected Machines:       2
  Remediations Allowed:    0

Status:
  Conditions:
    Last Transition Time:  2020-12-04T08:02:35Z
    Message:               Remediation is not allowed, the number of not started or unhealthy machines exceeds maxUnhealthy (total: 2, unhealthy: 2, maxUnhealthy: 1)
    Reason:                TooManyUnhealthy
    Severity:              Warning
    Status:                False
    Type:                  RemediationAllowed
  Current Healthy:         0
  Expected Machines:       2
  Remediations Allowed:    0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633