Description of problem: Autoscaler failed to scale up: could not compute total resources Version-Release number of selected component (if applicable): 4.4.0-0.nightly-2020-03-26-041820 How reproducible: Always Steps to Reproduce: 1. Create clusterautoscaler and machineautoscaler 2. Create workload to scale up the cluster 3. Check autoscale logs Actual results: $ oc get machineset NAME DESIRED CURRENT READY AVAILABLE AGE leopard-hw72r-worker-0 3 3 3 3 4d12h leopard-hw72r-worker-00 1 1 1 1 49m $ oc get machineautoscaler NAME REF KIND REF NAME MIN MAX AGE worker-b MachineSet leopard-hw72r-worker-0 1 3 55m I0408 09:26:35.540214 1 scale_up.go:271] Pod openshift-machine-api/scale-up-788b7f7c75-fcc52 is unschedulable I0408 09:26:35.540223 1 scale_up.go:271] Pod openshift-machine-api/scale-up-788b7f7c75-qms7z is unschedulable I0408 09:26:35.540232 1 scale_up.go:271] Pod openshift-machine-api/scale-up-788b7f7c75-fj9ws is unschedulable I0408 09:26:35.540240 1 scale_up.go:271] Pod openshift-machine-api/scale-up-788b7f7c75-hv4cl is unschedulable I0408 09:26:35.540249 1 scale_up.go:271] Pod openshift-machine-api/scale-up-788b7f7c75-xn4jx is unschedulable E0408 09:26:35.540401 1 static_autoscaler.go:369] Failed to scale up: Could not compute total resources: No node info for: openshift-machine-api/leopard-hw72r-worker-0 W0408 09:26:35.540466 1 clusterstate.go:389] Failed to find readiness information for openshift-machine-api/leopard-hw72r-worker-0 W0408 09:26:35.540492 1 clusterstate.go:446] Failed to find readiness information for openshift-machine-api/leopard-hw72r-worker-0 W0408 09:26:35.540503 1 clusterstate.go:389] Failed to find readiness information for openshift-machine-api/leopard-hw72r-worker-0 Expected results: Autoscaler could scale up and down. Additional info:
Almost all the autoscaler cases are blocked by this.
due to capacity constraints we will be revisiting this bug in the upcoming sprint
Once we got a RHV env, we will test this.
Verify clusterversion: 4.6.0-0.nightly-2020-08-12-062953 1. Create clusterautoscaler and machineautoscaler --- apiVersion: "autoscaling.openshift.io/v1" kind: "ClusterAutoscaler" metadata: name: "default" spec: scaleDown: enabled: true delayAfterAdd: 10s delayAfterDelete: 10s delayAfterFailure: 10s unneededTime: 10s $ oc get machineautoscaler NAME REF KIND REF NAME MIN MAX AGE worker-c MachineSet primary-zbpb5-worker-1 1 3 4h9m 2. Create workload to scale up the cluster 3. Check machine status, csrs, autoscale logs I0817 06:24:58.795113 1 scale_up.go:456] Best option to resize: openshift-machine-api/primary-zbpb5-worker-1 I0817 06:24:58.795140 1 scale_up.go:460] Estimated 3 nodes needed in openshift-machine-api/primary-zbpb5-worker-1 I0817 06:24:58.997943 1 scale_up.go:574] Final scale-up plan: [{openshift-machine-api/primary-zbpb5-worker-1 1->3 (max: 3)}] I0817 06:24:58.998168 1 scale_up.go:663] Scale-up: setting group openshift-machine-api/primary-zbpb5-worker-1 size to 3 $ oc get machine NAME PHASE TYPE REGION ZONE AGE primary-zbpb5-master-0 Running 4d23h primary-zbpb5-master-1 Running 4d23h primary-zbpb5-master-2 Running 4d23h primary-zbpb5-worker-0-ctv4b Running 4d22h primary-zbpb5-worker-0-f9dht Running 4d22h primary-zbpb5-worker-0-mcl94 Running 4d22h primary-zbpb5-worker-1-fzfdf Running 16m primary-zbpb5-worker-1-nvm77 Running 16m primary-zbpb5-worker-1-v4s9p Running 50m $ oc get node NAME STATUS ROLES AGE VERSION primary-zbpb5-master-0 Ready master 4d22h v1.19.0-rc.2+edbf229-dirty primary-zbpb5-master-1 Ready master 4d22h v1.19.0-rc.2+edbf229-dirty primary-zbpb5-master-2 Ready master 4d22h v1.19.0-rc.2+edbf229-dirty primary-zbpb5-worker-0-ctv4b Ready worker 4d22h v1.19.0-rc.2+edbf229-dirty primary-zbpb5-worker-0-f9dht Ready worker 4d22h v1.19.0-rc.2+edbf229-dirty primary-zbpb5-worker-0-mcl94 Ready worker 4d22h v1.19.0-rc.2+edbf229-dirty primary-zbpb5-worker-1-fzfdf Ready worker 4m57s v1.19.0-rc.2+edbf229-dirty primary-zbpb5-worker-1-nvm77 Ready worker 2m53s v1.19.0-rc.2+edbf229-dirty primary-zbpb5-worker-1-v4s9p Ready worker 34m v1.19.0-rc.2+edbf229-dirty
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196