Bug 1656270
| Summary: | [cloud-CA] ClusterAutoscaler maxNodesTotal does not work | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
| Component: | Cloud Compute | Assignee: | Andrew McDermott <amcdermo> |
| Status: | CLOSED ERRATA | QA Contact: | sunzhaohua <zhsun> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.1.0 | CC: | amcdermo, jhou |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:41:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Verified
$ bin/openshift-install version
bin/openshift-install v0.7.0-master-35-gead9f4b779a20dc32d51c3b2429d8d71d48ea043
$ oc version
oc v4.0.0-alpha.0+a2218fc-788
kubernetes v1.11.0+a2218fc
features: Basic-Auth GSSAPI Kerberos SPNEGO
1. Create clusterautoscaler
apiVersion: "autoscaling.openshift.io/v1alpha1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  resourceLimits:
    maxNodesTotal: 7
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
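The "max node total count already reached" errors in the log output below come from the autoscaler's node-count guard. A minimal sketch of that guard (a simplified illustration, not the actual cluster-autoscaler code; the function name is hypothetical — upstream cluster-autoscaler reportedly treats a limit of 0 as "no limit"):

```python
def can_scale_up(current_node_count: int, max_nodes_total: int) -> bool:
    """Sketch of the maxNodesTotal guard: scale-up is refused once the
    total node count (masters included) reaches the configured limit."""
    # Assumption: 0 disables the limit, mirroring upstream flag semantics.
    if max_nodes_total == 0:
        return True
    return current_node_count < max_nodes_total

# With maxNodesTotal: 7 and the 7 nodes (3 masters + 4 workers) listed
# below, any further scale-up attempt is rejected.
print(can_scale_up(7, 7))  # False
print(can_scale_up(6, 7))  # True
```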
2. Create machineautoscaler
apiVersion: autoscaling.openshift.io/v1alpha1
kind: MachineAutoscaler
metadata:
  finalizers:
  - machinetarget.autoscaling.openshift.io
  name: autoscale-us-east-2a
  namespace: openshift-cluster-api
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: cluster.k8s.io/v1alpha1
    kind: MachineSet
    name: qe-zhsun-1-worker-us-east-2a
status: {}
3. Create a pod to trigger scale-up
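The pod used in this step is not shown; a typical way to force a scale-up is a workload whose aggregate resource requests exceed current capacity. A hypothetical example (the name, replica count, and request sizes are illustrative, not from the report):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up-test
spec:
  replicas: 20
  selector:
    matchLabels:
      app: scale-up-test
  template:
    metadata:
      labels:
        app: scale-up-test
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
```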
4. Check the autoscaler logs and the nodes
$ oc logs -f cluster-autoscaler-default-5777b87c56-kg6sh
I1219 02:27:27.268286 1 scale_up.go:584] Scale-up: setting group openshift-cluster-api/qe-zhsun-1-worker-us-east-2a size to 2
E1219 02:27:37.416081 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
E1219 02:27:47.504674 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
E1219 02:27:57.574213 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
E1219 02:28:07.642231 1 static_autoscaler.go:275] Failed to scale up: max node total count already reached
$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-191.us-east-2.compute.internal     Ready    master   23m   v1.11.0+a2218fc
ip-10-0-141-194.us-east-2.compute.internal   Ready    worker   19m   v1.11.0+a2218fc
ip-10-0-148-215.us-east-2.compute.internal   Ready    worker   10m   v1.11.0+a2218fc
ip-10-0-150-181.us-east-2.compute.internal   Ready    worker   19m   v1.11.0+a2218fc
ip-10-0-164-244.us-east-2.compute.internal   Ready    worker   19m   v1.11.0+a2218fc
ip-10-0-27-8.us-east-2.compute.internal      Ready    master   23m   v1.11.0+a2218fc
ip-10-0-32-233.us-east-2.compute.internal    Ready    master   23m   v1.11.0+a2218fc
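The node list is consistent with the limit: 3 masters plus 4 workers is 7 nodes, exactly the configured maxNodesTotal of 7, which is why every further scale-up attempt in the log is rejected. A quick check of the arithmetic (counts taken from the output above):

```python
# Counts from the `oc get node` output above.
nodes = {"master": 3, "worker": 4}
max_nodes_total = 7  # from the ClusterAutoscaler spec

total = sum(nodes.values())
# The limit appears to count all nodes, masters included, so once
# total == max_nodes_total no further scale-up is allowed.
print(total)  # 7
print(total >= max_nodes_total)  # True
```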
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758
Description of problem:
The cluster autoscaler can scale up to a node count greater than the configured maximum.

Version-Release number of selected component (if applicable):
$ bin/openshift-install version
bin/openshift-install v0.5.0-master-2-g78e2c8b144352b1bef854501d3760a9daaaa2eb0
Terraform v0.11.8

How reproducible:
Always

Steps to Reproduce:
1. Create a clusterautoscaler resource with maxNodesTotal=10
2. Create a pod to scale up the cluster
3. Check the node count

Actual results:
The node count exceeds the configured value.

$ oc edit clusterautoscaler default
apiVersion: autoscaling.openshift.io/v1alpha1
kind: ClusterAutoscaler
metadata:
  creationTimestamp: 2018-12-04T04:47:54Z
  generation: 1
  name: default
  resourceVersion: "85156"
  selfLink: /apis/autoscaling.openshift.io/v1alpha1/clusterautoscalers/default
  uid: c3263c80-f77f-11e8-ba7f-0644519597a8
spec:
  resourceLimits:
    maxNodesTotal: 10
  scaleDown:
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    enabled: true

$ oc logs -f cluster-autoscaler-default-77f666c784-t5svt
I1204 06:57:18.921091 1 leaderelection.go:187] attempting to acquire leader lease openshift-cluster-api/cluster-autoscaler...
I1204 06:57:35.138441 1 leaderelection.go:196] successfully acquired lease openshift-cluster-api/cluster-autoscaler
I1204 06:57:45.479845 1 scale_up.go:584] Scale-up: setting group qe-zhsun-worker-us-east-2a size to 3
I1204 06:57:56.302130 1 scale_up.go:584] Scale-up: setting group qe-zhsun-worker-us-east-2b size to 3
I1204 06:58:06.400558 1 scale_up.go:584] Scale-up: setting group qe-zhsun-worker-us-east-2c size to 3

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-142.us-east-2.compute.internal   Ready    worker   6m    v1.11.0+b74cbdf
ip-10-0-135-191.us-east-2.compute.internal   Ready    worker   2h    v1.11.0+b74cbdf
ip-10-0-139-100.us-east-2.compute.internal   Ready    worker   6m    v1.11.0+b74cbdf
ip-10-0-146-243.us-east-2.compute.internal   Ready    worker   27m   v1.11.0+b74cbdf
ip-10-0-148-83.us-east-2.compute.internal    Ready    worker   6m    v1.11.0+b74cbdf
ip-10-0-15-241.us-east-2.compute.internal    Ready    master   2h    v1.11.0+b74cbdf
ip-10-0-150-26.us-east-2.compute.internal    Ready    worker   6m    v1.11.0+b74cbdf
ip-10-0-160-98.us-east-2.compute.internal    Ready    worker   27m   v1.11.0+b74cbdf
ip-10-0-161-210.us-east-2.compute.internal   Ready    worker   6m    v1.11.0+b74cbdf
ip-10-0-166-156.us-east-2.compute.internal   Ready    worker   6m    v1.11.0+b74cbdf
ip-10-0-21-79.us-east-2.compute.internal     Ready    master   2h    v1.11.0+b74cbdf
ip-10-0-40-58.us-east-2.compute.internal     Ready    master   2h    v1.11.0+b74cbdf

$ oc get machine
NAME                               AGE
qe-zhsun-master-0                  2h
qe-zhsun-master-1                  2h
qe-zhsun-master-2                  2h
qe-zhsun-worker-us-east-2a-mv5t9   9m
qe-zhsun-worker-us-east-2a-rcqf6   9m
qe-zhsun-worker-us-east-2a-xc49n   2h
qe-zhsun-worker-us-east-2b-9l2bs   9m
qe-zhsun-worker-us-east-2b-m5fc9   9m
qe-zhsun-worker-us-east-2b-n4jzn   30m
qe-zhsun-worker-us-east-2c-2xp5f   9m
qe-zhsun-worker-us-east-2c-b5vvw   9m
qe-zhsun-worker-us-east-2c-tx4jp   30m

Expected results:
The node count should not exceed the configured value.

Additional info:
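The log above shows three node groups each being set to size 3 in consecutive loop iterations, ending with 12 nodes against a limit of 10. One plausible way such an overshoot arises (a simplified sketch, not the actual autoscaler code; the function and variable names are hypothetical) is checking the limit only against the *observed* node count, while nodes requested in earlier iterations are still booting and not yet counted:

```python
# Hypothetical sketch of how a per-iteration limit check can overshoot
# maxNodesTotal when previously requested nodes have not yet registered.
def run_iterations(observed_nodes, max_nodes_total, groups, grow_by):
    """Each iteration re-checks only the observed node count, so nodes
    requested in earlier iterations (still booting) are not included."""
    requested = 0
    for group in groups:
        # Buggy check: ignores 'requested' nodes that are not Ready yet.
        if observed_nodes < max_nodes_total:
            requested += grow_by
    return observed_nodes + requested

# 6 nodes observed (3 masters + 3 workers), limit 10, three zonal
# machinesets each grown by 2: every check sees 6 < 10 and passes,
# so the cluster ends up with 12 nodes, as in the output above.
print(run_iterations(6, 10, ["us-east-2a", "us-east-2b", "us-east-2c"], 2))  # 12
```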