Bug 1835160
Summary: | MachineSets: missing replicas in spec breaks autoscaler | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> |
Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | low | ||
Priority: | unspecified | CC: | agarcial, deads, hongkliu, jhou, jspeed, mgugino, zhsun |
Version: | 4.4 | ||
Target Milestone: | --- | ||
Target Release: | 4.4.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: A MachineSet can have a nil replicas field
Consequence: The autoscaler cannot determine the size of the MachineSet as it currently is
Fix: Allow the autoscaler to read the replica count from the status replica field
Result: The autoscaler will always be able to determine the current size of a MachineSet
|
Story Points: | --- |
Clone Of: | 1820654 | Environment: | |
Last Closed: | 2020-08-18 10:08:33 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1820654, 1852061 | ||
Bug Blocks: |
Comment 3
sunzhaohua
2020-06-29 06:01:55 UTC
I've just taken a look at the codepaths that are being followed to lead to the log line `pre_filtering_processor.go:62] Error while checking node group size openshift-machine-api/zhsun62944-fvwvq-worker-us-east-2c: group size not found` and I think this is an unrelated issue to this bug That log comes from here https://github.com/openshift/kubernetes-autoscaler/blob/2ec541e3e31778428a75e0fc16469ba22d94e4bd/cluster-autoscaler/processors/nodes/pre_filtering_processor.go#L60-L64, checking the nodegroup ID as a key in a map That map come from https://github.com/openshift/kubernetes-autoscaler/blob/2ec541e3e31778428a75e0fc16469ba22d94e4bd/cluster-autoscaler/utils/utils.go#L26-L38, which suggests the only reason the group size would not be present is if there is an error in `TargetSize()`, except that in our implementation, it's impossible for that method to return an error I'll spin up a cluster and see if I can reproduce this Ok, I've found the issue, the bug isn't fully resolved, but it is hidden in the later releases by the autoscaling from zero work that was done. https://github.com/openshift/kubernetes-autoscaler/blob/2ec541e3e31778428a75e0fc16469ba22d94e4bd/cluster-autoscaler/cloudprovider/openshiftmachineapi/machineapi_controller.go#L365 This line requires that the replicas be set, otherwise it will fail https://github.com/openshift/kubernetes-autoscaler/blob/4abdca547be45a251ba33486324f0cac8664ca25/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller.go#L510 In the master branch, because this bug was verified on a platform that supports scaling from zero, this was hidden We've decided we need to revert this as the fix is not complete for 4.4. We will complete the fix in master and start the backport process, but it is unlikely to be worth completing for 4.4 by the time it is backported to 4.5 This is fixed in newer releases and is of low priority and low impact to users. It is not worth back porting this at this point. |