Description of problem:
The cluster autoscaler couldn't scale up the cluster.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-01-10-030010   True        False         24m     Cluster version is 4.0.0-0.alpha-2019-01-10-030010

How reproducible:
Always

Steps to Reproduce:
1. Create a clusterautoscaler resource
$ oc get clusterautoscaler -o yaml
apiVersion: "autoscaling.openshift.io/v1alpha1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s

2. Create a machineautoscaler resource
$ oc get machineautoscaler worker-us-east-2a -o yaml
apiVersion: "autoscaling.openshift.io/v1alpha1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-2a"
  namespace: "openshift-cluster-api"
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: cluster.k8s.io/v1alpha1
    kind: MachineSet
    name: zhsun-worker-us-east-2a

3. Create a deployment whose pods should force the cluster to scale up
$ oc get deploy scale-up -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: scale-up
  labels:
    app: scale-up
spec:
  replicas: 20
  selector:
    matchLabels:
      app: scale-up
  template:
    metadata:
      labels:
        app: scale-up
    spec:
      containers:
      - name: busybox
        image: docker.io/library/busybox
        resources:
          requests:
            memory: 2Gi
        command:
        - /bin/sh
        - "-c"
        - "echo 'this should be in the logs' && sleep 86400"
      terminationGracePeriodSeconds: 0

4. Check machines, nodes and logs

Actual results:
The cluster couldn't scale up.

$ oc get machine
NAME                            AGE
zhsun-master-0                  39m
zhsun-master-1                  39m
zhsun-master-2                  39m
zhsun-worker-us-east-2a-8cj64   38m
zhsun-worker-us-east-2b-6mqq8   38m
zhsun-worker-us-east-2b-jxc2k   18m
zhsun-worker-us-east-2b-xf2qd   18m
zhsun-worker-us-east-2c-b7gxz   18m
zhsun-worker-us-east-2c-cdq4q   18m
zhsun-worker-us-east-2c-lv2f7   38m

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-114.us-east-2.compute.internal   Ready    worker   37m   v1.11.0+f67f40dbad
ip-10-0-147-190.us-east-2.compute.internal   Ready    worker   37m   v1.11.0+f67f40dbad
ip-10-0-169-58.us-east-2.compute.internal    Ready    worker   37m   v1.11.0+f67f40dbad
ip-10-0-22-114.us-east-2.compute.internal    Ready    master   41m   v1.11.0+f67f40dbad
ip-10-0-37-96.us-east-2.compute.internal     Ready    master   41m   v1.11.0+f67f40dbad
ip-10-0-9-142.us-east-2.compute.internal     Ready    master   41m   v1.11.0+f67f40dbad

$ oc logs -f cluster-autoscaler-default-56c9cd4b6d-cvz84
I0110 04:44:48.271569       1 scale_up.go:584] Scale-up: setting group openshift-cluster-api/zhsun-worker-us-east-2b size to 3
I0110 04:44:58.422215       1 scale_up.go:584] Scale-up: setting group openshift-cluster-api/zhsun-worker-us-east-2c size to 3
W0110 04:59:50.716321       1 clusterstate.go:201] Scale-up timed out for node group openshift-cluster-api/zhsun-worker-us-east-2b after 15m2.424890837s
W0110 04:59:50.721855       1 clusterstate.go:223] Disabling scale-up for node group openshift-cluster-api/zhsun-worker-us-east-2b until 2019-01-10 05:04:50.715266058 +0000 UTC m=+1685.106703017
W0110 04:59:50.784942       1 scale_up.go:327] Node group openshift-cluster-api/zhsun-worker-us-east-2b is not ready for scaleup - backoff
W0110 05:00:00.803310       1 clusterstate.go:201] Scale-up timed out for node group openshift-cluster-api/zhsun-worker-us-east-2c after 15m2.365596705s
W0110 05:00:00.803370       1 clusterstate.go:223] Disabling scale-up for node group openshift-cluster-api/zhsun-worker-us-east-2c until 2019-01-10 05:05:00.802366978 +0000 UTC m=+1695.193804101
W0110 05:00:00.865659       1 scale_up.go:327] Node group openshift-cluster-api/zhsun-worker-us-east-2b is not ready for scaleup - backoff
W0110 05:00:00.865692       1 scale_up.go:327] Node group openshift-cluster-api/zhsun-worker-us-east-2c is not ready for scaleup - backoff

$ oc logs -f clusterapi-manager-controllers-6f9cf4dd7c-lsz8f -c machine-controller
I0110 04:56:23.731044       1 utils.go:151] Falling to providerConfig
E0110 04:56:23.731054       1 actuator.go:384] error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderConfig.Value nor Spec.ProviderConfig.ValueFrom set
E0110 04:56:23.731063       1 actuator.go:351] error getting running instances: unable to find machine provider config: neither Spec.ProviderConfig.Value nor Spec.ProviderConfig.ValueFrom set
E0110 04:56:23.731072       1 controller.go:166] Error checking existence of machine instance for machine object zhsun-worker-us-east-2c-cdq4q; unable to find machine provider config: neither Spec.ProviderConfig.Value nor Spec.ProviderConfig.ValueFrom set
I0110 04:56:24.731448       1 actuator.go:347] checking if machine exists

The machineset's "providerSpec" field disappeared:

$ oc edit machineset zhsun-worker-us-east-2b
spec:
  replicas: 3
  selector:
    matchLabels:
      sigs.k8s.io/cluster-api-cluster: zhsun
      sigs.k8s.io/cluster-api-machineset: zhsun-worker-us-east-2b
  template:
    metadata:
      creationTimestamp: null
      labels:
        sigs.k8s.io/cluster-api-cluster: zhsun
        sigs.k8s.io/cluster-api-machine-role: worker
        sigs.k8s.io/cluster-api-machine-type: worker
        sigs.k8s.io/cluster-api-machineset: zhsun-worker-us-east-2b
    spec:
      metadata:
        creationTimestamp: null
      providerConfig: {}
      versions:
        kubelet: ""
status:
  availableReplicas: 1
  fullyLabeledReplicas: 3
  observedGeneration: 2
  readyReplicas: 1
  replicas: 3

Expected results:
The cluster should scale up normally.

Additional info:
As soon as we create a machineautoscaler, the machineset's providerSpec field disappears.
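To make the symptom easy to spot, the provider spec can be read directly off the machineset after the machineautoscaler is created. A minimal check, assuming the object name and namespace shown above (depending on the API version in use the field may be providerSpec or the older providerConfig, so both paths are queried):

$ oc get machineset zhsun-worker-us-east-2b -n openshift-cluster-api \
    -o jsonpath='{.spec.template.spec.providerSpec}{"\n"}{.spec.template.spec.providerConfig}{"\n"}'

If both expressions come back empty (or only an empty map), the machine controller has nothing to decode, which matches the "neither Spec.ProviderConfig.Value nor Spec.ProviderConfig.ValueFrom set" errors in the logs above.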
Hi sunzhaohua, can you share the machineset CRD definition? `kubectl get crd machinesets.cluster.k8s.io -o yaml` will do. This will help confirm whether the providerSpec field is defined in the schema or missing. Thanks.
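A quick way to check this without pasting the whole CRD (just a grep over the dumped YAML; adjust the CRD name if it differs on your cluster) might be:

$ kubectl get crd machinesets.cluster.k8s.io -o yaml | grep -n -A2 providerSpec

If there is no match, the CRD's schema never mentions providerSpec, which answers the question above.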
Verified. With the new version I couldn't reproduce this issue; the cluster scales up and down normally. If it is reproduced again, I will reopen the bug and check the CRD definition.

$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-01-15-001217   True        False         2h      Cluster version is 4.0.0-0.alpha-2019-01-15-001217

$ oc get machine
NAME                            INSTANCE              STATE     TYPE       REGION      ZONE         AGE
zhsun-master-0                  i-080a7bb622af5dbf7   running   m4.large   us-east-2   us-east-2a   29m
zhsun-master-1                  i-086bb1037011b5e66   running   m4.large   us-east-2   us-east-2b   29m
zhsun-master-2                  i-044ba37ef3c01df49   running   m4.large   us-east-2   us-east-2c   29m
zhsun-worker-us-east-2a-5s7wd   i-0e37fa6f833672972   running   m4.large   us-east-2   us-east-2a   28m
zhsun-worker-us-east-2a-8lszv   i-019e3f765a1149f66   running   m4.large   us-east-2   us-east-2a   5m
zhsun-worker-us-east-2a-dsqsj   i-062f9d90e4e545117   running   m4.large   us-east-2   us-east-2a   5m
zhsun-worker-us-east-2a-gmgx2   i-027057005cf3c4263   running   m4.large   us-east-2   us-east-2a   5m
zhsun-worker-us-east-2a-z5drr   i-0af94036444067f47   running   m4.large   us-east-2   us-east-2a   5m
zhsun-worker-us-east-2b-z8wkp   i-096d82f8a0ad0050a   running   m4.large   us-east-2   us-east-2b   28m
zhsun-worker-us-east-2c-kfns2   i-0c1dc48eb2b1d2346   running   m4.large   us-east-2   us-east-2c   28m

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-168.us-east-2.compute.internal   Ready    worker   27m   v1.11.0+c69f926354
ip-10-0-134-248.us-east-2.compute.internal   Ready    worker   4m    v1.11.0+c69f926354
ip-10-0-134-252.us-east-2.compute.internal   Ready    worker   4m    v1.11.0+c69f926354
ip-10-0-134-67.us-east-2.compute.internal    Ready    worker   5m    v1.11.0+c69f926354
ip-10-0-139-238.us-east-2.compute.internal   Ready    worker   4m    v1.11.0+c69f926354
ip-10-0-15-49.us-east-2.compute.internal     Ready    master   37m   v1.11.0+c69f926354
ip-10-0-151-196.us-east-2.compute.internal   Ready    worker   27m   v1.11.0+c69f926354
ip-10-0-171-213.us-east-2.compute.internal   Ready    worker   27m   v1.11.0+c69f926354
ip-10-0-20-128.us-east-2.compute.internal    Ready    master   37m   v1.11.0+c69f926354
ip-10-0-36-74.us-east-2.compute.internal     Ready    master   37m   v1.11.0+c69f926354

$ oc logs -f cluster-autoscaler-default-56c9cd4b6d-vt7d8
I0115 03:50:15.993741       1 leaderelection.go:187] attempting to acquire leader lease openshift-cluster-api/cluster-autoscaler...
I0115 03:50:16.056782       1 leaderelection.go:196] successfully acquired lease openshift-cluster-api/cluster-autoscaler
I0115 03:51:50.058660       1 scale_up.go:584] Scale-up: setting group openshift-cluster-api/zhsun-worker-us-east-2a size to 5
I0115 04:08:07.807776       1 scale_down.go:791] Scale-down: removing empty node ip-10-0-134-67.us-east-2.compute.internal
I0115 04:08:07.807922       1 scale_down.go:791] Scale-down: removing empty node ip-10-0-139-238.us-east-2.compute.internal
I0115 04:08:07.807998       1 scale_down.go:791] Scale-down: removing empty node ip-10-0-134-252.us-east-2.compute.internal
I0115 04:08:07.809036       1 scale_down.go:791] Scale-down: removing empty node ip-10-0-134-248.us-east-2.compute.internal
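For anyone re-running this verification, the scale-down half can be driven by removing the test load and watching the machines drain; the namespace and deployment names below are the ones used earlier in this bug, and the autoscaler deployment name is inferred from the pod name above, so treat them as assumptions for your cluster:

$ oc delete deployment scale-up
$ oc get machine -n openshift-cluster-api -w
$ oc logs -f deployment/cluster-autoscaler-default -n openshift-cluster-api | grep -i 'Scale-down'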
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758