Bug 1854907 - Config logic for skip-nodes-with-local-storage is flawed
Summary: Config logic for skip-nodes-with-local-storage is flawed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Michael McCune
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks: 1879162
 
Reported: 2020-07-08 12:33 UTC by Marcel Härri
Modified: 2020-10-27 16:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Any of the ClusterAutoscaler resource values "balanceSimilarNodeGroups", "ignoreDaemonsetsUtilization", or "skipNodesWithLocalStorage" is set to "false".
Consequence: The "false" setting is not respected when the cluster autoscaler is deployed.
Fix: The cluster-autoscaler-operator has been patched to ensure these values are read properly when deploying the cluster-autoscaler.
Result: The cluster-autoscaler now properly reads the "false" value.
Clone Of:
Clones: 1879162
Environment:
Last Closed: 2020-10-27 16:12:56 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-autoscaler-operator issues 155 | 0 | None | closed | Config logic for skip-nodes-with-local-storage is flawed | 2020-11-03 07:58:06 UTC
Github | openshift cluster-autoscaler-operator pull 156 | 0 | None | closed | Bug 1854907: Fix #155 - ensure full boolean arguments | 2020-11-03 07:58:07 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:13:15 UTC

Description Marcel Härri 2020-07-08 12:33:49 UTC
The example configuration sets the following option:

cluster-autoscaler-operator/examples/clusterautoscaler.yaml

Line 9 in 9c4a47c

 skipNodesWithLocalStorage: true 

However, when this option is set to false, nothing happens: the deployment is not updated.

This is because the configuration logic is flawed:

cluster-autoscaler-operator/pkg/controller/clusterautoscaler/clusterautoscaler.go

Lines 95 to 97 in 9c4a47c

if ca.Spec.SkipNodesWithLocalStorage != nil && *ca.Spec.SkipNodesWithLocalStorage {
	args = append(args, SkipNodesWithLocalStorage.String())
}

The flag is only appended when the value is true, so setting it to false has the same effect as leaving it unset. But to scale down nodes with pods using emptyDir, the autoscaler has to run with --skip-nodes-with-local-storage=false.


There is already a fix available: https://github.com/openshift/cluster-autoscaler-operator/pull/156

It would be nice to have it backported at least down to 4.4.

Comment 1 Michael McCune 2020-07-08 13:16:24 UTC
thanks for posting this Marcel, i am taking a look at the issue and pull request.

Comment 2 Michael McCune 2020-07-28 21:02:20 UTC
we need to get another review on this from our team, but we should be able to merge it soon.

Comment 5 sunzhaohua 2020-08-05 07:37:06 UTC
Verified
clusterversion: 4.6.0-0.nightly-2020-08-05-013608
spec:
  balanceSimilarNodeGroups: false
  skipNodesWithLocalStorage: false
  ignoreDaemonsetsUtilization: false
$ oc edit deploy cluster-autoscaler-default
        - --balance-similar-node-groups=false
        - --ignore-daemonsets-utilization=false
        - --skip-nodes-with-local-storage=false
spec:
  balanceSimilarNodeGroups: true
  skipNodesWithLocalStorage: true
  ignoreDaemonsetsUtilization: true
        - --balance-similar-node-groups=true
        - --ignore-daemonsets-utilization=true
        - --skip-nodes-with-local-storage=true

Comment 6 Marcel Härri 2020-08-05 14:33:54 UTC
Can we get this backported to 4.4 / 4.5?

Comment 7 Michael McCune 2020-08-05 16:06:57 UTC
i think this is a good candidate for backport, it should be possible to do this sprint.

Comment 8 Michael McCune 2020-08-17 19:15:16 UTC
planning to get this backported during the upcoming sprint.

Comment 10 errata-xmlrpc 2020-10-27 16:12:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

