Bug 1854907
| Summary: | Config logic for skip-nodes-with-local-storage is flawed | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Marcel Härri <mharri> | |
| Component: | Cloud Compute | Assignee: | Michael McCune <mimccune> | |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | unspecified | CC: | mimccune | |
| Version: | 4.4 | Keywords: | UpcomingSprint | |
| Target Milestone: | --- | |||
| Target Release: | 4.6.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Cause: Setting any of the ClusterAutoscaler resource values "balanceSimilarNodeGroups", "ignoreDaemonsetsUtilization", or "skipNodesWithLocalStorage" to "false". Consequence: The false setting is not respected when the cluster autoscaler is deployed. Fix: The cluster-autoscaler-operator has been patched to ensure these values are read properly when deploying the cluster-autoscaler. Result: The cluster-autoscaler now properly reads the "false" value. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1879162 (view as bug list) | Environment: | ||
| Last Closed: | 2020-10-27 16:12:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1879162 | |||
Thanks for posting this, Marcel. I am taking a look at the issue and pull request.

We need to get another review on this from our team, but we should be able to merge it soon.

Verified
clusterversion: 4.6.0-0.nightly-2020-08-05-013608
spec:
balanceSimilarNodeGroups: false
skipNodesWithLocalStorage: false
ignoreDaemonsetsUtilization: false
$ oc edit deploy cluster-autoscaler-default
- --balance-similar-node-groups=false
- --ignore-daemonsets-utilization=false
- --skip-nodes-with-local-storage=false
spec:
balanceSimilarNodeGroups: true
skipNodesWithLocalStorage: true
ignoreDaemonsetsUtilization: true
- --balance-similar-node-groups=true
- --ignore-daemonsets-utilization=true
- --skip-nodes-with-local-storage=true
Can we get this backported to 4.4 / 4.5?

I think this is a good candidate for backport; it should be possible to do this sprint.

Planning to get this backported during the upcoming sprint.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196
There is an example to set the following option:

cluster-autoscaler-operator/examples/clusterautoscaler.yaml, line 9 in 9c4a47c:
    skipNodesWithLocalStorage: true

However, when setting this option to false nothing happens. The deployment is not getting updated. This is because the configuration logic is flawed:

cluster-autoscaler-operator/pkg/controller/clusterautoscaler/clusterautoscaler.go, lines 95 to 97 in 9c4a47c:
    if ca.Spec.SkipNodesWithLocalStorage != nil && *ca.Spec.SkipNodesWithLocalStorage {
        args = append(args, SkipNodesWithLocalStorage.String())
    }

But you want the autoscaler to run with --skip-nodes-with-local-storage=false if you want to scale down nodes with pods using emptyDir.

There is already a fix available: https://github.com/openshift/cluster-autoscaler-operator/pull/156

It would be nice to have it backported at least down to 4.4.
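
The difference between the condition quoted in the report and a value-preserving variant can be illustrated with a minimal, self-contained Go sketch. It is not the operator's code and not necessarily what the linked pull request does: the `Spec` type, the `flagName` constant, and the functions `flawedArgs` and `fixedArgs` are hypothetical names introduced here, and the sketch assumes the flag is rendered as `--skip-nodes-with-local-storage`.

```go
package main

import "fmt"

// Spec is a hypothetical, trimmed-down stand-in for the ClusterAutoscaler
// spec; only the one optional boolean discussed in the report is modeled.
type Spec struct {
	SkipNodesWithLocalStorage *bool
}

// flagName is an assumption about how the flag is rendered on the command line.
const flagName = "--skip-nodes-with-local-storage"

// flawedArgs mirrors the shape of the condition quoted in the report: the
// flag is appended only when the pointer is non-nil AND the value is true,
// so an explicit false produces no argument at all.
func flawedArgs(s Spec) []string {
	var args []string
	if s.SkipNodesWithLocalStorage != nil && *s.SkipNodesWithLocalStorage {
		args = append(args, flagName)
	}
	return args
}

// fixedArgs is one possible correction: whenever the field is set, the flag
// is appended together with its explicit value, so false is passed through.
func fixedArgs(s Spec) []string {
	var args []string
	if s.SkipNodesWithLocalStorage != nil {
		args = append(args, fmt.Sprintf("%s=%t", flagName, *s.SkipNodesWithLocalStorage))
	}
	return args
}

func main() {
	off := false
	s := Spec{SkipNodesWithLocalStorage: &off}
	fmt.Println("flawed:", flawedArgs(s))
	fmt.Println("fixed: ", fixedArgs(s))
}
```

With the field set to false, the first function yields no arguments while the second yields the flag with `=false`, which matches the form of the deployment arguments shown in the verification comment. An unset (nil) field produces no argument in either variant, leaving the autoscaler's own default in effect.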