Description of problem:
The node names removed during scale-down do not match the node names reported in the autoscaler log.

Version-Release number of selected component (if applicable):
$ bin/openshift-install version
bin/openshift-install v0.7.0-master-35-gead9f4b779a20dc32d51c3b2429d8d71d48ea043

How reproducible:
Sometimes

Steps to Reproduce:
1. Deploy the clusterautoscaler and machineautoscaler.
2. Create pods to scale up the cluster and check the nodes:
$ oc get node
NAME                                         STATUS    ROLES     AGE       VERSION
ip-10-0-1-191.us-east-2.compute.internal     Ready     master    3h        v1.11.0+a2218fc
ip-10-0-128-25.us-east-2.compute.internal    Ready     worker    8m        v1.11.0+a2218fc
ip-10-0-139-245.us-east-2.compute.internal   Ready     worker    8m        v1.11.0+a2218fc
ip-10-0-139-69.us-east-2.compute.internal    Ready     worker    3h        v1.11.0+a2218fc
ip-10-0-150-181.us-east-2.compute.internal   Ready     worker    3h        v1.11.0+a2218fc
ip-10-0-162-109.us-east-2.compute.internal   Ready     worker    8m        v1.11.0+a2218fc
ip-10-0-164-244.us-east-2.compute.internal   Ready     worker    3h        v1.11.0+a2218fc
ip-10-0-169-114.us-east-2.compute.internal   Ready     worker    8m        v1.11.0+a2218fc
ip-10-0-27-8.us-east-2.compute.internal      Ready     master    3h        v1.11.0+a2218fc
ip-10-0-32-233.us-east-2.compute.internal    Ready     master    3h        v1.11.0+a2218fc
3. Delete the pods and wait for the cluster to scale down.
4. Compare the node names with the autoscaler log output:
$ oc get node
NAME                                         STATUS    ROLES     AGE       VERSION
ip-10-0-1-191.us-east-2.compute.internal     Ready     master    4h        v1.11.0+a2218fc
ip-10-0-139-69.us-east-2.compute.internal    Ready     worker    3h        v1.11.0+a2218fc
ip-10-0-150-181.us-east-2.compute.internal   Ready     worker    3h        v1.11.0+a2218fc
ip-10-0-162-109.us-east-2.compute.internal   Ready     worker    9m        v1.11.0+a2218fc
ip-10-0-27-8.us-east-2.compute.internal      Ready     master    4h        v1.11.0+a2218fc
ip-10-0-32-233.us-east-2.compute.internal    Ready     master    4h        v1.11.0+a2218fc

$ oc logs -f cluster-autoscaler-default-7c88c947bc-7vp7d
I1219 05:59:17.414597 1 scale_down.go:791] Scale-down: removing empty node ip-10-0-139-245.us-east-2.compute.internal
I1219 05:59:17.415418 1 scale_down.go:791] Scale-down: removing empty node ip-10-0-162-109.us-east-2.compute.internal
I1219 05:59:17.416561 1 scale_down.go:791] Scale-down: removing empty node ip-10-0-128-25.us-east-2.compute.internal
E1219 05:59:17.564286 1 scale_down.go:841] Problem with empty node deletion: failed to delete ip-10-0-139-245.us-east-2.compute.internal: unable to update number of replicas of machineset "openshift-cluster-api/qe-zhsun-1-worker-us-east-2a": Operation cannot be fulfilled on machinesets.cluster.k8s.io "qe-zhsun-1-worker-us-east-2a": the object has been modified; please apply your changes to the latest version and try again
E1219 05:59:17.569363 1 static_autoscaler.go:341] Failed to scale down: <nil>
I1219 05:59:38.746554 1 scale_down.go:791] Scale-down: removing empty node ip-10-0-162-109.us-east-2.compute.internal
I1219 05:59:38.746620 1 scale_down.go:791] Scale-down: removing empty node ip-10-0-128-25.us-east-2.compute.internal

Actual results:
The node names removed during scale-down do not match the node names in the autoscaler log. The nodes that were actually removed are:
ip-10-0-128-25.us-east-2.compute.internal    Ready     worker    8m        v1.11.0+a2218fc
ip-10-0-139-245.us-east-2.compute.internal   Ready     worker    8m        v1.11.0+a2218fc
ip-10-0-164-244.us-east-2.compute.internal   Ready     worker    3h        v1.11.0+a2218fc
ip-10-0-169-114.us-east-2.compute.internal   Ready     worker    8m        v1.11.0+a2218fc

Expected results:
The node names removed during scale-down match the node names reported in the autoscaler log.

Additional info:
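For context, the "Operation cannot be fulfilled ... the object has been modified; please apply your changes to the latest version and try again" error in the log is the API server's standard optimistic-concurrency conflict: the autoscaler tried to update a MachineSet using a stale resourceVersion. The Go sketch below is hypothetical and is not the autoscaler's actual code; it only illustrates the usual client-go pattern for this situation (re-read the object and reapply the change via RetryOnConflict). The package name, function name, and the cluster.k8s.io/v1alpha1 group/version are assumptions based on the machinesets.cluster.k8s.io resource named in the error.

// Package scaledown: a minimal, hypothetical sketch of retrying a MachineSet
// replica decrement when the API server rejects the update with a conflict.
package scaledown

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// decrementMachineSetReplicas lowers spec.replicas by one, retrying whenever
// the update races with another writer and fails with a 409 Conflict.
func decrementMachineSetReplicas(ctx context.Context, c client.Client, namespace, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		ms := &unstructured.Unstructured{}
		// Assumed group/version for machinesets.cluster.k8s.io at the time of this report.
		ms.SetGroupVersionKind(schema.GroupVersionKind{
			Group:   "cluster.k8s.io",
			Version: "v1alpha1",
			Kind:    "MachineSet",
		})
		// Fetch the latest copy so the update carries a current resourceVersion.
		if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, ms); err != nil {
			return err
		}
		replicas, found, err := unstructured.NestedInt64(ms.Object, "spec", "replicas")
		if err != nil {
			return err
		}
		if !found || replicas == 0 {
			return fmt.Errorf("machineset %s/%s has no replicas to remove", namespace, name)
		}
		if err := unstructured.SetNestedField(ms.Object, replicas-1, "spec", "replicas"); err != nil {
			return err
		}
		// On a conflict this returns a Conflict error and RetryOnConflict
		// runs the closure again against a freshly read object.
		return c.Update(ctx, ms)
	})
}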
I wasn't able to reproduce this. I started out with:

ip-10-0-131-4.us-east-2.compute.internal     Ready   master   69m   v1.12.4+2a194a0f02
ip-10-0-153-24.us-east-2.compute.internal    Ready   master   69m   v1.12.4+2a194a0f02
ip-10-0-162-238.us-east-2.compute.internal   Ready   master   69m   v1.12.4+2a194a0f02
ip-10-0-130-106.us-east-2.compute.internal   Ready   worker   56m   v1.12.4+2a194a0f02
ip-10-0-170-45.us-east-2.compute.internal    Ready   worker   56m   v1.12.4+2a194a0f02
ip-10-0-157-225.us-east-2.compute.internal   Ready   worker   56m   v1.12.4+2a194a0f02

I scaled out to:

ip-10-0-153-24.us-east-2.compute.internal    Ready   master   78m     v1.12.4+2a194a0f02
ip-10-0-131-4.us-east-2.compute.internal     Ready   master   78m     v1.12.4+2a194a0f02
ip-10-0-162-238.us-east-2.compute.internal   Ready   master   78m     v1.12.4+2a194a0f02
ip-10-0-130-106.us-east-2.compute.internal   Ready   worker   65m     v1.12.4+2a194a0f02
ip-10-0-170-45.us-east-2.compute.internal    Ready   worker   65m     v1.12.4+2a194a0f02
ip-10-0-157-225.us-east-2.compute.internal   Ready   worker   65m     v1.12.4+2a194a0f02
ip-10-0-140-62.us-east-2.compute.internal    Ready   worker   6m58s   v1.12.4+2a194a0f02
ip-10-0-140-54.us-east-2.compute.internal    Ready   worker   6m58s   v1.12.4+2a194a0f02
ip-10-0-129-55.us-east-2.compute.internal    Ready   worker   6m58s   v1.12.4+2a194a0f02
ip-10-0-128-202.us-east-2.compute.internal   Ready   worker   6m57s   v1.12.4+2a194a0f02
ip-10-0-139-109.us-east-2.compute.internal   Ready   worker   6m57s   v1.12.4+2a194a0f02
ip-10-0-136-66.us-east-2.compute.internal    Ready   worker   6m57s   v1.12.4+2a194a0f02
ip-10-0-133-132.us-east-2.compute.internal   Ready   worker   6m57s   v1.12.4+2a194a0f02
ip-10-0-138-177.us-east-2.compute.internal   Ready   worker   6m56s   v1.12.4+2a194a0f02
ip-10-0-139-100.us-east-2.compute.internal   Ready   worker   6m56s   v1.12.4+2a194a0f02
ip-10-0-137-149.us-east-2.compute.internal   Ready   worker   6m55s   v1.12.4+2a194a0f02
ip-10-0-139-77.us-east-2.compute.internal    Ready   worker   6m47s   v1.12.4+2a194a0f02

and, after scale-down, I ended up with all the new nodes deleted:

ip-10-0-131-4.us-east-2.compute.internal     Ready   master   96m   v1.12.4+2a194a0f02
ip-10-0-153-24.us-east-2.compute.internal    Ready   master   96m   v1.12.4+2a194a0f02
ip-10-0-162-238.us-east-2.compute.internal   Ready   master   96m   v1.12.4+2a194a0f02
ip-10-0-130-106.us-east-2.compute.internal   Ready   worker   83m   v1.12.4+2a194a0f02
ip-10-0-170-45.us-east-2.compute.internal    Ready   worker   83m   v1.12.4+2a194a0f02
ip-10-0-157-225.us-east-2.compute.internal   Ready   worker   83m   v1.12.4+2a194a0f02
I wasn't able to reproduce this in the new version either. Closing this bug.

Clusterversion: 4.0.0-0.nightly-2019-03-13-233958