Description of problem:
The cluster autoscaler can scale the cluster up to more nodes than the configured maxNodesTotal.

Version-Release number of selected component (if applicable):
$ bin/openshift-install version
bin/openshift-install v0.5.0-master-2-g78e2c8b144352b1bef854501d3760a9daaaa2eb0
Terraform v0.11.8

How reproducible:
Always

Steps to Reproduce:
1. Create clusterautoscaler resource, set maxNodesTotal=10
2. Create pod to scale up the cluster
3. Check node number

Actual results:
Node number is greater than the set value

$ oc edit clusterautoscaler default
apiVersion: autoscaling.openshift.io/v1alpha1
kind: ClusterAutoscaler
metadata:
  creationTimestamp: 2018-12-04T04:47:54Z
  generation: 1
  name: default
  resourceVersion: "85156"
  selfLink: /apis/autoscaling.openshift.io/v1alpha1/clusterautoscalers/default
  uid: c3263c80-f77f-11e8-ba7f-0644519597a8
spec:
  resourceLimits:
    maxNodesTotal: 10
  scaleDown:
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    enabled: true

$ oc logs -f cluster-autoscaler-default-77f666c784-t5svt
I1204 06:57:18.921091 1 leaderelection.go:187] attempting to acquire leader lease openshift-cluster-api/cluster-autoscaler...
I1204 06:57:35.138441 1 leaderelection.go:196] successfully acquired lease openshift-cluster-api/cluster-autoscaler
I1204 06:57:45.479845 1 scale_up.go:584] Scale-up: setting group qe-zhsun-worker-us-east-2a size to 3
I1204 06:57:56.302130 1 scale_up.go:584] Scale-up: setting group qe-zhsun-worker-us-east-2b size to 3
I1204 06:58:06.400558 1 scale_up.go:584] Scale-up: setting group qe-zhsun-worker-us-east-2c size to 3

$ oc get node
NAME                                         STATUS    ROLES     AGE       VERSION
ip-10-0-129-142.us-east-2.compute.internal   Ready     worker    6m        v1.11.0+b74cbdf
ip-10-0-135-191.us-east-2.compute.internal   Ready     worker    2h        v1.11.0+b74cbdf
ip-10-0-139-100.us-east-2.compute.internal   Ready     worker    6m        v1.11.0+b74cbdf
ip-10-0-146-243.us-east-2.compute.internal   Ready     worker    27m       v1.11.0+b74cbdf
ip-10-0-148-83.us-east-2.compute.internal    Ready     worker    6m        v1.11.0+b74cbdf
ip-10-0-15-241.us-east-2.compute.internal    Ready     master    2h        v1.11.0+b74cbdf
ip-10-0-150-26.us-east-2.compute.internal    Ready     worker    6m        v1.11.0+b74cbdf
ip-10-0-160-98.us-east-2.compute.internal    Ready     worker    27m       v1.11.0+b74cbdf
ip-10-0-161-210.us-east-2.compute.internal   Ready     worker    6m        v1.11.0+b74cbdf
ip-10-0-166-156.us-east-2.compute.internal   Ready     worker    6m        v1.11.0+b74cbdf
ip-10-0-21-79.us-east-2.compute.internal     Ready     master    2h        v1.11.0+b74cbdf
ip-10-0-40-58.us-east-2.compute.internal     Ready     master    2h        v1.11.0+b74cbdf

$ oc get machine
NAME                               AGE
qe-zhsun-master-0                  2h
qe-zhsun-master-1                  2h
qe-zhsun-master-2                  2h
qe-zhsun-worker-us-east-2a-mv5t9   9m
qe-zhsun-worker-us-east-2a-rcqf6   9m
qe-zhsun-worker-us-east-2a-xc49n   2h
qe-zhsun-worker-us-east-2b-9l2bs   9m
qe-zhsun-worker-us-east-2b-m5fc9   9m
qe-zhsun-worker-us-east-2b-n4jzn   30m
qe-zhsun-worker-us-east-2c-2xp5f   9m
qe-zhsun-worker-us-east-2c-b5vvw   9m
qe-zhsun-worker-us-east-2c-tx4jp   30m

Expected results:
Node number does not exceed the set value

Additional info:
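The comparison made in this report can be checked mechanically. The following is a minimal sketch, not part of `oc` or the autoscaler: `count_nodes` and `exceeds_limit` are hypothetical helper names, and the sketch assumes that maxNodesTotal counts every node in the cluster, masters included. The sample text is the `oc get node` output from this report.

```python
# Hypothetical helpers for illustration only; they parse the plain-text
# output of `oc get node` as pasted in this report.

OC_GET_NODE_OUTPUT = """\
NAME STATUS ROLES AGE VERSION
ip-10-0-129-142.us-east-2.compute.internal Ready worker 6m v1.11.0+b74cbdf
ip-10-0-135-191.us-east-2.compute.internal Ready worker 2h v1.11.0+b74cbdf
ip-10-0-139-100.us-east-2.compute.internal Ready worker 6m v1.11.0+b74cbdf
ip-10-0-146-243.us-east-2.compute.internal Ready worker 27m v1.11.0+b74cbdf
ip-10-0-148-83.us-east-2.compute.internal Ready worker 6m v1.11.0+b74cbdf
ip-10-0-15-241.us-east-2.compute.internal Ready master 2h v1.11.0+b74cbdf
ip-10-0-150-26.us-east-2.compute.internal Ready worker 6m v1.11.0+b74cbdf
ip-10-0-160-98.us-east-2.compute.internal Ready worker 27m v1.11.0+b74cbdf
ip-10-0-161-210.us-east-2.compute.internal Ready worker 6m v1.11.0+b74cbdf
ip-10-0-166-156.us-east-2.compute.internal Ready worker 6m v1.11.0+b74cbdf
ip-10-0-21-79.us-east-2.compute.internal Ready master 2h v1.11.0+b74cbdf
ip-10-0-40-58.us-east-2.compute.internal Ready master 2h v1.11.0+b74cbdf
"""


def count_nodes(oc_get_node_output):
    """Return a dict mapping role -> node count, skipping the header row."""
    counts = {}
    for line in oc_get_node_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        role = fields[2]  # columns: NAME STATUS ROLES AGE VERSION
        counts[role] = counts.get(role, 0) + 1
    return counts


def exceeds_limit(counts, max_nodes_total):
    """Assumption: the limit applies to all nodes, masters included."""
    return sum(counts.values()) > max_nodes_total


counts = count_nodes(OC_GET_NODE_OUTPUT)
total = sum(counts.values())
over_limit = exceeds_limit(counts, max_nodes_total=10)
```

With the output above, the sketch counts 9 workers and 3 masters, 12 nodes in total, which is over the configured limit of 10.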
PR - https://github.com/openshift/kubernetes-autoscaler/pull/16
Verified

$ bin/openshift-install version
bin/openshift-install v0.7.0-master-35-gead9f4b779a20dc32d51c3b2429d8d71d48ea043

$ oc version
oc v4.0.0-alpha.0+a2218fc-788
kubernetes v1.11.0+a2218fc
features: Basic-Auth GSSAPI Kerberos SPNEGO

1. Create clusterautoscaler
apiVersion: "autoscaling.openshift.io/v1alpha1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  resourceLimits:
    maxNodesTotal: 7
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s

2. Create machineautoscaler
apiVersion: autoscaling.openshift.io/v1alpha1
kind: MachineAutoscaler
metadata:
  finalizers:
  - machinetarget.autoscaling.openshift.io
  name: autoscale-us-east-2a
  namespace: openshift-cluster-api
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: cluster.k8s.io/v1alpha1
    kind: MachineSet
    name: qe-zhsun-1-worker-us-east-2a
status: {}

3. Create pod to scale up

4. Check logs and node
$ oc logs -f cluster-autoscaler-default-5777b87c56-kg6sh
I1219 02:27:27.268286 1 scale_up.go:584] Scale-up: setting group openshift-cluster-api/qe-zhsun-1-worker-us-east-2a size to 2
E1219 02:27:37.416081 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
E1219 02:27:47.504674 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
E1219 02:27:57.574213 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
E1219 02:28:07.642231 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached

$ oc get node
NAME                                         STATUS    ROLES     AGE       VERSION
ip-10-0-1-191.us-east-2.compute.internal     Ready     master    23m       v1.11.0+a2218fc
ip-10-0-141-194.us-east-2.compute.internal   Ready     worker    19m       v1.11.0+a2218fc
ip-10-0-148-215.us-east-2.compute.internal   Ready     worker    10m       v1.11.0+a2218fc
ip-10-0-150-181.us-east-2.compute.internal   Ready     worker    19m       v1.11.0+a2218fc
ip-10-0-164-244.us-east-2.compute.internal   Ready     worker    19m       v1.11.0+a2218fc
ip-10-0-27-8.us-east-2.compute.internal      Ready     master    23m       v1.11.0+a2218fc
ip-10-0-32-233.us-east-2.compute.internal    Ready     master    23m       v1.11.0+a2218fc
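The verified behaviour (one scale-up to reach 7 nodes, then repeated "max node total count already reached" errors) is what a cap on the total node count would produce. The following is a minimal sketch of such a cap, written for illustration and not taken from the autoscaler source. `cap_scale_up` and `MaxNodesReached` are hypothetical names, and treating a limit of 0 or less as "no limit" is an assumption of this sketch. The figures in the usage lines (6 nodes before the scale-up, 7 after) are inferred from the node ages in the output above.

```python
class MaxNodesReached(Exception):
    """Raised when no further node can be added under the limit."""


def cap_scale_up(current_nodes, requested_new, max_nodes_total):
    """Return how many of the requested nodes may be added.

    Assumption of this sketch: max_nodes_total <= 0 means "no limit".
    """
    if max_nodes_total <= 0:
        return requested_new
    if current_nodes + requested_new <= max_nodes_total:
        return requested_new
    allowed = max_nodes_total - current_nodes
    if allowed < 1:
        # Same wording as the error seen in the verification log.
        raise MaxNodesReached("max node total count already reached")
    return allowed


# Verification run: 6 nodes, limit 7, so only one more node fits
# even if the pending pods would justify more.
first_round = cap_scale_up(current_nodes=6, requested_new=3, max_nodes_total=7)

# Next loop iteration: 7 nodes, limit 7, so the scale-up is refused.
try:
    cap_scale_up(current_nodes=7, requested_new=1, max_nodes_total=7)
    second_round = "scaled"
except MaxNodesReached as err:
    second_round = str(err)
```

Clamping the request rather than rejecting it outright lets the cluster grow right up to the limit, which matches the single successful scale-up followed by errors in the log.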
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758